[01:19:32] <wikibugs_>	 10Quarry, 10Data-Services: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3633999 (10bd808)
[01:19:49] <wikibugs_>	 10Quarry, 10Data-Services: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3634013 (10bd808)
[03:44:33] <wikibugs_>	 10Quarry, 10Data-Services: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3633999 (10zhuyifei1999) I donno is the config.yaml is puppetized somewhere (probably not), but this change should do the switch: ``` $ sed 's/enwiki.labsdb/enwi...
[03:45:38] <wikibugs_>	 (03CR) 10Nuria: [V: 031 C: 031] Add draft creation config and query (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/379441 (https://phabricator.wikimedia.org/T176375) (owner: 10Nettrom)
[03:46:00] <wikibugs_>	 (03CR) 10Nuria: [V: 032 C: 032] Add draft creation config and query [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/379441 (https://phabricator.wikimedia.org/T176375) (owner: 10Nettrom)
[03:51:08] <wikibugs_>	 (03CR) 10Nuria: Add oozie jobs loading druid monthly uniques (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/348052 (https://phabricator.wikimedia.org/T159471) (owner: 10Joal)
[04:10:46] <wikibugs_>	 10Quarry, 10Data-Services: Switch Quarry to use *.analytics.db.svc.eqiad.wmflabs as replica database host - https://phabricator.wikimedia.org/T176694#3634206 (10zhuyifei1999) 05Open>03Resolved a:03zhuyifei1999 LGTM
[08:48:24] <joal>	 Hi a-team
[08:51:29] <wikibugs_>	 10Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3634730 (10elukey) >>! In T176639#3632819, @mpopov wrote: >>>! In T176639#3632795, @elukey wrote: >> Is there a valid reason why `analytics-slave` shouldn't be used? Are we talking ab...
[08:58:35] <elukey>	 joal: o/
[09:02:28] <joal>	 elukey: Do you mind me self-merging some patches and deploying?
[09:03:00] <wikibugs_>	 (03CR) 10Joal: "Comments inline." (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/348052 (https://phabricator.wikimedia.org/T159471) (owner: 10Joal)
[09:05:13] <joal>	 The two patches I'd like to merge are: https://gerrit.wikimedia.org/r/#/c/379836/ and https://gerrit.wikimedia.org/r/#/c/379835/
[09:07:24] <elukey>	 nope
[09:07:35] <elukey>	 (nope == green light from me)
[09:07:43] <joal>	 Okey
[09:09:01] <joal>	 elukey: here is my plan: self-merging & deploying refinery-source, update refinery for jar versions and then deploy. Will ask for quick reviews on changelogs etc if you don't mind
[09:09:20] <elukey>	 +1
[09:09:32] <elukey>	 joal: what is the fix for the dt:null issue?
[09:09:41] <elukey>	 (asking because I didn't follow the resolution)
[09:09:54] <joal>	 elukey: https://gerrit.wikimedia.org/r/#/c/380533/
[09:10:17] <elukey>	 ah nice
[09:12:38] <wikibugs_>	 (03CR) 10Joal: [C: 032] "Self merging for deploy." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/379835 (owner: 10Joal)
[09:17:34] <wikibugs_>	 (03Merged) 10jenkins-bot: Correct field names in mediawiki-history spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/379835 (owner: 10Joal)
[09:18:19] <joal>	 git up
[09:18:22] <joal>	 oops
[09:21:19] <wikibugs_>	 (03PS1) 10Joal: Update changelog.md for v0.0.53 deployment [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/380709
[09:21:42] <joal>	 elukey: --^ when you have a minute
[09:23:51] <wikibugs_>	 (03CR) 10Elukey: [C: 031] Update changelog.md for v0.0.53 deployment [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/380709 (owner: 10Joal)
[09:24:22] <wikibugs_>	 (03CR) 10Joal: [V: 032 C: 032] "Self merging for deploy." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/380709 (owner: 10Joal)
[09:33:34] <joal>	 !log Releasing refinery-source v0.0.53 with Jenskins
[09:33:36] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:10:20] <wikibugs_>	 (03PS1) 10Addshore: instanceof.php improved extra logging on sparql result [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/380715 (https://phabricator.wikimedia.org/T176577)
[10:10:35] <wikibugs_>	 (03PS1) 10Addshore: instanceof.php improved extra logging on sparql result [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/380716 (https://phabricator.wikimedia.org/T176577)
[10:10:44] <wikibugs_>	 (03CR) 10Addshore: [C: 032] instanceof.php improved extra logging on sparql result [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/380716 (https://phabricator.wikimedia.org/T176577) (owner: 10Addshore)
[10:10:47] <wikibugs_>	 (03CR) 10Addshore: [C: 032] instanceof.php improved extra logging on sparql result [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/380715 (https://phabricator.wikimedia.org/T176577) (owner: 10Addshore)
[10:10:50] <wikibugs_>	 (03Merged) 10jenkins-bot: instanceof.php improved extra logging on sparql result [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/380716 (https://phabricator.wikimedia.org/T176577) (owner: 10Addshore)
[10:10:53] <wikibugs_>	 (03Merged) 10jenkins-bot: instanceof.php improved extra logging on sparql result [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/380715 (https://phabricator.wikimedia.org/T176577) (owner: 10Addshore)
[10:13:00] <wikibugs_>	 (03PS2) 10Joal: Correct field names in mediawiki-history jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379836
[10:15:44] <wikibugs_>	 (03CR) 10Joal: [V: 032 C: 032] "Self merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379836 (owner: 10Joal)
[10:18:51] <wikibugs_>	 (03PS1) 10Joal: Bump jar version to 0.0.53 in oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/380718 (https://phabricator.wikimedia.org/T175707)
[10:19:20] <joal>	 elukey: --^ if you have aminute
[10:23:35] <wikibugs_>	 (03CR) 10Elukey: [C: 031] Bump jar version to 0.0.53 in oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/380718 (https://phabricator.wikimedia.org/T175707) (owner: 10Joal)
[10:23:59] <joal>	 elukey: Thanks :)
[10:24:22] <wikibugs_>	 (03CR) 10Joal: [V: 032 C: 032] "Self merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/380718 (https://phabricator.wikimedia.org/T175707) (owner: 10Joal)
[10:25:04] <joal>	 !log Deploy Refinery with scap
[10:25:06] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:29:07] <elukey>	 email sent joal :)
[10:29:18] <joal>	 Great elukey, thanks a lot !
[10:29:30] <joal>	 elukey: I hope we'll have a go, and we'll be able o move fast :)
[10:32:25] <joal>	 y
[10:35:00] <joal>	 !log Deploying refinery onto HDFS
[10:35:01] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:53:52] <wikibugs_>	 (03CR) 10GoranSMilovanovic: [C: 032] WDCM Usage Dashboard - Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380652 (owner: 10GoranSMilovanovic)
[10:53:57] <wikibugs_>	 (03CR) 10GoranSMilovanovic: [V: 032 C: 032] WDCM Usage Dashboard - Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380652 (owner: 10GoranSMilovanovic)
[10:53:59] <wikibugs_>	 (03Merged) 10jenkins-bot: WDCM Usage Dashboard - Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380652 (owner: 10GoranSMilovanovic)
[10:56:54] * elukey lunch!
[10:57:09] <elukey>	 err joal, forgot to ask if everything is ok with the deployment
[10:57:30] <joal>	 Everything good elukey - Currently restarting jobs
[10:57:42] <elukey>	 all right, be back in 30 mins :)
[10:58:28] <joal>	 !log Restart webrequest load job after deploy
[10:58:29] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:01:36] <joal>	 !log Restart mediawiki-history-denormalize and mediawiki-history-druid jobs after deploy
[11:01:37] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:36:23] <joal>	 Ok a-team, taking a break, everything look good afer deploy (still a job to be restarted, but waiting for current to finish
[12:19:27] <wikibugs_>	 (03Abandoned) 10Addshore: DNM JENKINS JOB VALIDATION [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380306 (owner: 10Addshore)
[12:28:46] <wikibugs_>	 (03CR) 10Addshore: "check experimental" [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380306 (owner: 10Addshore)
[12:29:11] <wikibugs_>	 (03CR) 10jenkins-bot: DNM JENKINS JOB VALIDATION [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380306 (owner: 10Addshore)
[12:29:41] <wikibugs_>	 (03CR) 10Addshore: "check experimental" [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380652 (owner: 10GoranSMilovanovic)
[12:32:24] <wikibugs_>	 (03CR) 10jenkins-bot: WDCM Usage Dashboard - Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380652 (owner: 10GoranSMilovanovic)
[13:16:15] <wikibugs_>	 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3635207 (10fgiunchedi) >>! In T175922#3630917, @fgiunchedi wrote: > Yeah the idea is to have dedicated Prometheus instances roughl...
[13:36:43] <ottomata>	 elukey:  HIIiiiI!
[13:44:25] <elukey>	 o/
[13:54:17] <ottomata>	 elukey:  heyyyy
[13:54:22] <ottomata>	 got a min to talk about druid profile stuf?
[13:54:44] <ottomata>	 hmm, wanna do ops sycn today instead of tomorrow?
[13:54:49] <ottomata>	 tomorrow i'll be at strata
[13:56:18] <elukey>	 sure!
[14:00:27] <ottomata>	 bc?
[14:00:46] <ottomata>	 oh bc is taken elukey
[14:00:55] <ottomata>	 going to bc 2
[14:00:55] <ottomata>	 https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave-2
[14:02:18] <elukey>	 ottomata: grabbing the headphones
[14:16:18] <wikibugs_>	 10Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3635520 (10mforns) @mpopov  And regarding an alternative that we considered during the sinc-up meeting: Using `db1047.eqiad.wmnet` instead of `analytics-slave.eqiad.wmnet` would work,...
[14:17:56] <GoranSM>	 Hi, did anyone notice any problems in connecting from Labs to tools.labsdb recently? One of my Shiny Dashboards cannot connect there since 10 minutes ago or so. Thanks. 
[14:20:46] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/380751 (https://phabricator.wikimedia.org/T176639)
[14:22:01] <GoranSM>	 Just checked w. #wikimedia-cloud there is definitely a problem with tools.labsdb and they're working to resolve it. 
[14:22:08] <ottomata>	 GoranSM:  i dont' know why it wouldn't work, but there is also now https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/
[14:22:11] <ottomata>	 if you haven't yet seen
[14:22:13] <ottomata>	 cool
[14:22:13] <ottomata>	 :)
[14:22:30] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/limn-edit-data] - 10https://gerrit.wikimedia.org/r/380755 (https://phabricator.wikimedia.org/T176639)
[14:23:19] <GoranSM>	 ottomata: Thanks, I've seen the announcement but didn't check out whether do I need to change the server that I connect to from Labs or not. Will check it out. 
[14:24:16] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/380756 (https://phabricator.wikimedia.org/T176639)
[14:24:20] <GoranSM>	 ottomata: tools.labsdb) will continue to support user created databases and tables. (from the announcement). Anyways, I'm sure they will take care of everything.
[14:24:27] <ottomata>	 ya
[14:26:18] <elukey>	 ottomata: done!
[14:26:25] <ottomata>	 merged and cool?
[14:26:40] <elukey>	 yep!
[14:26:43] <ottomata>	 gr8
[14:29:59] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/limn-ee-data] - 10https://gerrit.wikimedia.org/r/380757 (https://phabricator.wikimedia.org/T176639)
[14:31:36] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/limn-multimedia-data] - 10https://gerrit.wikimedia.org/r/380758 (https://phabricator.wikimedia.org/T176639)
[14:32:09] <joal>	 elukey, ottomata: sorry for having missed ops standup
[14:32:23] <joal>	 anything I can help with to speed up druid process?
[14:32:35] <ottomata>	 hmm, joal don't think so
[14:32:41] <joal>	 ok
[14:33:22] <elukey>	 ottomata: is cluster: jumbo_kafka final? 
[14:33:38] <elukey>	 (Filippo is asking it to me since it will change how metrics are stored)
[14:34:31] <ottomata>	 jumbo_kafka?
[14:34:43] <ottomata>	 i still don't fully understand what puppet 'cluster' is
[14:34:55] <ottomata>	 the official kafka cluster name is
[14:34:57] <ottomata>	 jumbo-eqiad
[14:36:18] <elukey>	 I think it is a logical grouping of hosts in puppet, that might reflect on other things.. for example, IIUC ganglia clusters == puppet clusters
[14:37:38] <elukey>	 ottomata: we also have profile::kafka::broker::kafka_cluster_name: jumbo
[14:37:45] <elukey>	 not jumbo-eqiad
[14:37:55] <elukey>	 is it ok/
[14:37:55] <elukey>	 ?
[14:39:04] <ottomata>	 yes, because that is passed to the kafka_cluster_name function
[14:39:11] <ottomata>	 which auto suffixes the site
[14:39:17] <ottomata>	 that way you can use the same role in multiple sites
[14:39:59] <elukey>	 super
[14:40:06] <elukey>	 so for analytics we currently have cluster: analytics_kafka
[14:40:11] <ottomata>	 yes, but that is a historical problem
[14:40:20] <ottomata>	 and i'm not really sure where that came from (did I make that? i don't know)
[14:40:25] <ottomata>	 what about main?
[14:40:56] <elukey>	 I didn't find any
[14:41:03] <ottomata>	 no?
[14:41:13] <elukey>	 I mean cluster: naming
[14:41:13] <ottomata>	 also, we do'nt even includ eganglia i think
[14:41:18] <ottomata>	 on these nodes
[14:41:30] <ottomata>	     if !defined(Class['::standard']) {
[14:41:30] <ottomata>	         class { '::standard':
[14:41:31] <ottomata>	             has_ganglia => false
[14:41:31] <ottomata>	         }
[14:41:31] <ottomata>	     }
[14:41:35] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/380759 (https://phabricator.wikimedia.org/T176639)
[14:41:39] <ottomata>	 would like to know what this cluster_name is used for
[14:41:44] <elukey>	 sure sure ganglia was an example
[14:42:05] <elukey>	 other clusters are like "logstash, cache_text, etc.."
[14:43:24] <ottomata>	 yes but why?  what is using cluster name?
[14:44:23] <ottomata>	 elukey:  if we had to pick one, maybe I would do
[14:44:27] <ottomata>	 kafka_jumbo-eqiad
[14:44:38] <ottomata>	 or kafka-jumbo-eqiad (not sure which is better there)
[14:45:14] <ottomata>	 not sure if cluster_name wants the DC suffix or not
[14:46:08] <elukey>	 ahhaha kafka_jumbo-eqiad made my eyes bleeding
[14:48:32] <ottomata>	 haha
[14:48:35] <ottomata>	 but, it makes sense!
[14:48:38] <ottomata>	 they are just separator
[14:48:40] <ottomata>	 ss
[14:48:46] <ottomata>	 'jumbo-eqiad' is the cluster name
[14:48:48] <ottomata>	 kafka_ is a prefix
[14:48:58] <ottomata>	 but yeah, i knew it would make your eyes bleed (mine too a bid)
[14:49:02] <ottomata>	 bit*
[14:49:11] <joal>	 !log restart 
[14:49:12] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:49:18] <joal>	 !log restart mobile_apps session_metrics bundle
[14:49:19] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:55:18] <elukey>	 maybe we can just set kafka-jumbo
[14:55:39] <elukey>	 I am trying to get if this will be our cluster name in the prometheus dashboard
[14:55:51] <elukey>	 in that case kafka_jumbo_eqiad or kafka-jumbo-eqiad
[14:56:52] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3635646 (10mforns) @elukey  @mpopov  In my opinion, the technical nomenclature master/slave is a really unhappy choice (from decades ago). But nowadays it's used...
[14:58:46] <wikibugs_>	 10Analytics-Kanban, 10Patch-For-Review: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3632150 (10Milimetric) primary/replica, primary/secondary, master/doctor (HA!!!) are all fine alternatives.  As others have pointed out, we might also have to ch...
[14:58:58] <wikibugs_>	 (03PS1) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/380762 (https://phabricator.wikimedia.org/T176639)
[15:01:27] <ottomata>	 elukey:  if you have to, do kafka-jumbo-eqiad, as that will be easier to use with the kafka_cluster_name  in puppet
[15:01:32] <ottomata>	 since it uses hyphen jumbo-eqiad
[15:10:54] <elukey>	 maybe kafka_jumbo is fine? it seems more consistent to other cluster naming across puppet, plus IIUC we'll need to choose the prometheus source as variable in our dashboards (like https://grafana.wikimedia.org/dashboard/db/prometheus-apache-hhvm-dc-stats -> datasource)
[15:13:14] <elukey>	 ahhh the cluster is also a cumin target: https://puppet-compiler.wmflabs.org/compiler02/8039/kafka-jumbo1001.eqiad.wmnet/
[15:20:32] <joal>	 mforns: batcave-2 for talk about mw-reduced?
[15:20:40] <mforns>	 joal, ok!
[15:32:30] <ottomata>	 a-team gonna do talk if you want
[15:34:17] <ottomata>	 joal:  you wanna?
[15:34:21] <joal>	 I do
[15:34:44] <joal>	 joining ottomata 
[15:39:04] <wikibugs_>	 (03PS1) 10GoranSMilovanovic: Sep 26 2017 Add logo + Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380770
[15:40:28] <wikibugs_>	 (03CR) 10GoranSMilovanovic: [V: 032 C: 032] Sep 26 2017 Add logo + Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380770 (owner: 10GoranSMilovanovic)
[15:40:34] <wikibugs_>	 (03Merged) 10jenkins-bot: Sep 26 2017 Add logo + Crosstabs [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380770 (owner: 10GoranSMilovanovic)
[15:57:28] <wikibugs_>	 (03CR) 10Bearloga: [C: 031] "Just a heads up that this repo is inactive and that if I had admin rights on it, I would have marked it as read-only when we deprecated it" [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/380762 (https://phabricator.wikimedia.org/T176639) (owner: 10Mforns)
[16:02:34] <joal>	 mforns, milimetric: Time for a talk on metrics definition (again)??
[16:02:39] <mforns>	 yeeeo
[16:02:43] <mforns>	 yep
[16:02:51] <milimetric>	 joal: I'm in the manager's thing
[16:02:54] <mforns>	 oh
[16:03:08] <milimetric>	 you guys got it, I trust you!
[16:03:09] <joal>	 milimetric: ok no prob, we'll wait for you to finish
[16:03:23] <joal>	 milimetric: That one is bit more tricky, we'll wait
[16:03:32] <travis-ci>	 wikimedia/mediawiki-extensions-EventLogging#698 (wmf/1.31.0-wmf.1 - 902a144 : libraryupgrader): The build has errored.
[16:03:36] <travis-ci>	 Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.31.0-wmf.1
[16:03:36] <travis-ci>	 Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/280031142
[16:05:34] <bearloga>	 mforns elukey: how long is dbstore1002 aka analytics-store.eqiad.wmnet going to keep running?
[16:06:35] <bearloga>	 like, is there an estimated date at which it will be shut down and that CNAME removed?
[16:07:04] <mforns>	 hi bearloga, I'm not sure...
[16:07:46] <mforns>	 bearloga, I'd say within next quarter
[16:07:59] <mforns>	 elukey?
[16:10:55] <elukey>	 so ideally in our plans the log database will stop being replicated on dbstore1002 when the new db1047 replacement will be ready 
[16:11:06] <elukey>	 so dbstore1002 will only replicate wikis
[16:11:21] <elukey>	 we might keep the analytics-store cname
[16:12:58] <elukey>	 eventually, hopefully after few of the next quarters, we'll be able to migrate mysql users of the log database to HDFS
[16:13:39] <elukey>	 only then we'll start talking if it is possible to stop maintaining mysql for the log database (so cleaning up analytics-slave)
[16:14:20] <wikibugs_>	 (03CR) 10Mforns: "Oh, OK!" [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/380762 (https://phabricator.wikimedia.org/T176639) (owner: 10Mforns)
[16:20:07] <wikibugs_>	 (03CR) 10Mforns: "@Bearloga here's the puppet patch, added you as a reviewer:" [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/380762 (https://phabricator.wikimedia.org/T176639) (owner: 10Mforns)
[16:21:12] <wikibugs_>	 (03Abandoned) 10Mforns: Replace references to dbstore1002 by db1047 [analytics/discovery-stats] - 10https://gerrit.wikimedia.org/r/380762 (https://phabricator.wikimedia.org/T176639) (owner: 10Mforns)
[16:21:34] <AndyRussG>	 Hi! What's the status of https://wikitech.wikimedia.org/wiki/SWAP? Can/should I use it instead of Ipython for some quick CentralNotice-y checks I need to do? Thanks!!!
[16:21:54] <chasemp>	 madhuvishy: ^
[16:22:34] <AndyRussG>	 (The doc talks about "plans"...)
[16:22:48] <AndyRussG>	 Mmm gonna try it NEway :)
[16:23:04] <madhuvishy>	 AndyRussG: yeah if its to talk to analytics stores, use it
[16:23:28] <AndyRussG>	 madhuvishy: K fantastic :) thx!! Yep that's what it's for
[16:23:29] <madhuvishy>	 AndyRussG: https://gist.github.com/madhuvishy/d349c472de1279568534e4fb2b5bf505
[16:24:16] <milimetric>	 ok, joal, done
[16:24:31] <milimetric>	 I'll go to the cave and maybe chop some veggies for lunch at the same time
[16:24:36] <milimetric>	 give me 4 minutes
[16:24:43] <AndyRussG>	 madhuvishy: neat!
[16:24:45] <madhuvishy>	 Nettrom: did you ever get your ldap access squared away?
[16:25:29] <Nettrom>	 madhuvishy: yes, I got that the day after I requested it, and since then I’ve been able to access the notebook, thanks!
[16:25:42] <madhuvishy>	 Nettrom: oh perfect :) yw
[16:29:11] <bearloga>	 mforns elukey: got it, okay, thanks
[16:29:49] <joal>	 milimetric, mforns: batcve ?
[16:30:18] <mforns>	 joal, omw
[16:41:29] <gwicke>	 quick question: is there a way to figure out how many requests hit api.php a year ago?
[16:42:36] <gwicke>	 doing this for the last three months is fairly easy via the webrequest table & hive, but there is no historical data beyond that
[16:48:25] <joal>	 gwicke: you said it - no historical data beyond that
[16:49:48] <gwicke>	 okay, just double checking that there is no other place to look for this kind of thing
[16:49:52] <gwicke>	 thanks!
[16:50:32] <joal>	 np gwicke 
[16:50:53] <joal>	 sorry for not having answered positively gwicke )
[17:09:07] * elukey off!
[17:43:29] <AndyRussG>	 joal: hi! how's it going? :) Can I query Druid directly? (I could get the same data from Hive, but if I can get it instantaneously and using less resources from Druid, it seems better)
[17:43:43] <AndyRussG>	 I recall seeing some doc somewhere, haven't found it again yet
[17:43:45] <AndyRussG>	 thx in advance!
[17:45:46] <ottomata>	 AndyRussG:  yes you can, but it is a json query language
[17:46:07] <ottomata>	 http://druid.io/docs/latest/querying/querying.html
[17:46:17] <ottomata>	 we are in the process of setting up an lvs balanced service, but you can hit any druid broker node
[17:46:19] <ottomata>	 so try
[17:46:22] <ottomata>	 druid1001.eqiad.wmnet 
[17:46:22] <ottomata>	 as the hostname
[17:46:30] <ottomata>	 oh and 8082 as the port
[17:54:04] <wikibugs_>	 10Analytics, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#1713344 (10ksmith) @Nuria : Have you read the [[ https://docs.google.com/document/d/1R3G04PCe3xZAR2azWPzdWVO4vQQKraZUrlOMQidfh0w/edit...
[18:01:00] <AndyRussG>	 ottomata: cool fantastic! aaaaaand... Caaaan I do it from SWAP/Jupyter? /me writes letter to Santa, folds it into a paper airplane, and flies to the North Pole
[18:01:44] <AndyRussG>	 or, has anyone tried to do it from SWAP/Jupyter yet?
[18:02:53] <ottomata>	 AndyRussG:  i think you shoudl be able to
[18:03:06] <ottomata>	 but i don't know of anyone who has
[18:03:08] <ottomata>	 if you do, i would LOVE to see it
[18:03:12] <ottomata>	 that would be super cool
[18:06:11] <AndyRussG>	 ottomata: cool, will do! :)
[18:39:19] <icinga-wm>	 PROBLEM - HDFS active Namenode JVM Heap usage on analytics1001 is CRITICAL: CRITICAL: 62.71% of data above the critical threshold [3891.2]
[18:43:06] <elukey>	 hello an1001
[18:43:15] <joal>	 Arf :(
[18:43:37] <joal>	 ottomata: looks like I can't connect from notebook1001 to druid1001 on port 8082 / expected?
[18:43:45] <elukey>	 https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&from=now-24h&to=now
[18:43:59] <elukey>	 something seems to have changed in metrics from ~17:30 UTC onwards
[18:44:10] <elukey>	 nothing super horrible but enough to trip alarms
[18:44:35] <joal>	 elukey: Huge job running - Might be that?
[18:46:17] <elukey>	 could be yeah
[18:46:26] <joal>	 hm
[18:46:47] <joal>	 gwicke: by the way, not sure if you're aware - the hive queries you run are huge
[18:46:48] <ottomata>	 joal:  not expected
[18:46:51] <ottomata>	 i can make that connection
[18:47:24] <ottomata>	 joal ya it works fine for me from CLI
[18:47:33] <joal>	 ottomata: Will try agian
[18:47:45] <gwicke>	 joal: yeah, unfortunately we don't have live metrics for the action API
[18:48:21] <gwicke>	 or filtered logs for the api only
[18:48:58] <elukey>	 gwicke: sure, but not a good reason to make huge queries without letting us know first :)
[18:49:05] <gwicke>	 I'm only looking at the september numbers for action api vs. rest api
[18:49:32] <joal>	 gwicke: other way to think of it is doing daily queries instead of requesting september data multiple times over the month
[18:50:00] <gwicke>	 yeah, we have that for the rest api, but not the action api
[18:50:30] <joal>	 other question for you gwicke: /api/rest_v1/% is used in both text and upload ? 
[18:51:16] <gwicke>	 I think it's pretty much all text
[18:51:21] <gwicke>	 err, sorry
[18:51:27] <gwicke>	 rest api is text only
[18:51:45] <gwicke>	 not 100% sure about api.php
[18:51:58] <joal>	 gwicke: the query you run now is "uri_path like /api/rest_v1/%"
[18:52:14] <gwicke>	 yep
[18:52:35] <gwicke>	 is there a way to limit queries to the text cluster?
[18:52:38] <joal>	 asking that gwicke because we have a partition by webrequest_source (text, upload, misc) - half of it is text, the other one upload (misc is almost nothin)
[18:53:10] <joal>	 If you can add a filtering clause "AND webrequest = 'text'" you'll drop the query size by 2
[18:53:15] <joal>	 gwicke: --^
[18:53:15] <gwicke>	 so basically just add something like webrequest_source = "text" ?
[18:53:21] <joal>	 indeed
[18:53:34] <joal>	 hat would be a first and easy improvement :)
[18:53:55] <elukey>	 I'd also reduce the scope of the query and not use the whole month
[18:53:59] <gwicke>	 happy to add that, but the current query is at 78%
[18:54:02] <gwicke>	 worth canceling?
[18:54:07] <joal>	 gwicke: nope
[18:54:21] <gwicke>	 okay
[18:54:36] <joal>	 gwicke: however I second elukey : run queries every week on one week of data, it'll be faset on your side, and easier for he cluster :)
[18:54:48] <gwicke>	 not using the whole month means that we'd need to set up some kind of cron tool
[18:55:01] <gwicke>	 and then aggregate the daily counts
[18:55:10] <gwicke>	 (or weekly)
[18:55:14] <joal>	 gwicke: We use oozie for that, and other teams also use reportupdater
[18:55:23] <joal>	 gwicke: I think it's easy
[18:55:43] <joal>	 gwicke: lastly, I hope you won't run a full month query when september ends for real :)
[18:55:54] <gwicke>	 are there docs on how to set that up?
[18:56:22] <joal>	 gwicke: oozie is not super easy, I'd go for reportupdater
[18:56:23] <gwicke>	 the current queries are ad-hoc, for some slides Victoria is preparing
[18:56:35] <gwicke>	 ideally, we'd have action api metrics every couple minutes
[18:57:05] <gwicke>	 that would avoid the need for the manual querying, and would also give us historical data beyond three months, similar to the rest api
[18:58:18] <elukey>	 gwicke: what would be the issue of aggregate numbers in daily/weekly counts?
[18:58:24] <joal>	 gwicke: I thimnk what you're after is exactly what you have in graphite (from the job I see running every hour on he cluster)
[18:58:29] <elukey>	 (I am trying to understand the issue)
[18:58:51] <gwicke>	 joal: yeah, but that's only for the rest api
[18:59:00] <gwicke>	 we'd need the same for the action api
[18:59:02] <gwicke>	 so that we can compare
[18:59:07] <joal>	 gwicke: Isn't that wha you are currently querying ?
[18:59:13] <joal>	 Ah
[18:59:20] <gwicke>	 I'm querying both rest & action, sequentially
[18:59:37] <gwicke>	 to make sure that the methodology is comparable
[19:00:15] <joal>	 gwicke: Given the size of he query (you read ~35Tb of data), getting rest results out of graphite is certainly less easy but also less costly in term of computation
[19:00:20] <joal>	 hm
[19:00:32] <gwicke>	 I don't fully trust the rough "multiply avg request rate from graphite" thing, especially when we do that only for one of the numbers
[19:00:37] <joal>	 makes sense, but really 2 quesries over 1 month of webrequest is not nice for the cluster
[19:01:10] <ottomata>	 (joal coudl we add tags for those two things?  if we don't already?)
[19:01:20] <joal>	 ottomata: Yes, we will
[19:01:21] * elukey off again
[19:01:37] <joal>	 ottomata: aggrgation will stillbe needed post-tagging, but yes, we will
[19:01:43] <gwicke>	 joal: yeah, I'll definitely add the webrequest (or webrequest_source?) clause
[19:02:09] <gwicke>	 also, is there a way to count both in a single pass?
[19:02:13] <joal>	 gwicke: thank you
[19:02:18] <joal>	 gwicke: TRhere is
[19:02:55] <elukey>	 also please query less data in one go, it is important
[19:03:06] <elukey>	 we are currently getting alarms firing 
[19:03:26] <gwicke>	 hmm, from a single sequential query?
[19:04:06] <joal>	 gwicke: webrequest text only, for both action and rtest?
[19:04:21] <gwicke>	 is there something you can tune to avoid that setting off alarms?
[19:04:43] <joal>	 gwicke: We have alarms, but we're not sure it was triggered by your job
[19:04:50] <gwicke>	 joal: yes
[19:05:19] <joal>	 There currently are multiple big jobs running - it might be the all-of-them together triggering
[19:05:37] <joal>	 gwicke: last time you ran those type of queries it didn't fire
[19:05:40] <elukey>	 gwicke: sure there is, people collaborating and not making huge queries. I mean, what is really the problem of not querying an entire month of webrequest data?
[19:05:55] <joal>	 so I'm assuming you're not the culprit by yourself :)
[19:06:18] <elukey>	 joal: this time there are multiple big jobs, but gwicke's query is really big and by far the most resource consuming
[19:06:49] <elukey>	 again, we support all the users needs, we just ask to avoid querying too much data
[19:06:53] <elukey>	 in one go
[19:06:59] <gwicke>	 elukey: as far as I understand it, that would only really help if data was stored & incrementally rolled up
[19:07:37] <gwicke>	 just doing a loop from 1 to 31 in a shell would still touch the same data
[19:08:18] <joal>	 gwicke: doing it sequentially would prevent burst somewhat
[19:08:30] <joal>	 gwicke: but yes, we'd rather do it incrementally
[19:08:42] <elukey>	 gwicke: it wouldn't use all the containers that are currently allocated in once
[19:08:48] <elukey>	 but in small batches
[19:09:07] <elukey>	 so if the cluster is already busy, it will not get into alarms warning zone
[19:09:21] <elukey>	 and it wouldn't change much your results
[19:09:45] <gwicke>	 alternatively, is it possible to tell hadoop to be smarter about parallelism?
[19:09:51] <elukey>	 ....
[19:10:49] <elukey>	 need to go now sorry! Joseph let me know later on if the issue is still ongoing, I'll check after dinner (need to go out with some friends now)
[19:10:58] <gwicke>	 in the meantime, my query is all done
[19:12:20] <joal>	 good gwicke 
[19:12:22] <ottomata>	 hm, the alert was for JVM namenode heap size, which is interesting, and does not usually happen just for large queries
[19:12:29] <joal>	 What's the pattern for rest_action ?
[19:12:33] <ottomata>	 the cluster is smart about parallelism and assigning workers
[19:12:40] <ottomata>	 you could try using the 'nice' queue
[19:12:45] <ottomata>	 but i'm not totally sure if htat would help or not
[19:12:46] <elukey>	 ottomata: look at all the other metrics, they are all spiking from 17:30 UTC
[19:13:07] <elukey>	 anyhow, will check tomorrow :)
[19:13:08] <elukey>	 byyyeee
[19:13:08] <gwicke>	 joal: /api/rest_v1/%
[19:13:28] <joal>	 gwicke: this one is rest, action?
[19:13:47] <gwicke>	 rest_v1 ;)
[19:14:04] <joal>	 meeeeeh gwicke :)
[19:14:27] * joal no understanding
[19:14:56] <gwicke>	 rest
[19:15:06] <gwicke>	 action is /w/api.php%
[19:15:23] <joal>	 ok gwicke 
[19:17:13] <joal>	 gwicke: https://gist.github.com/jobar/f371ef9179699c41bf86fdbe1c493b9e
[19:17:22] <joal>	 gwicke: this gives you both results in one pass
[19:18:01] <gwicke>	 nice, thank you!
[19:18:13] <joal>	 gwicke: we plan to have smaller datasets to query for what you wish soon (we'll tag rows in webrequest and split the data)
[19:18:47] <joal>	 gwicke: the last part will then be to add final aggregations in order to be able to keep data long term
[19:19:17] <gwicke>	 updated https://wikitech.wikimedia.org/wiki/RESTBase#Analytics_and_metrics
[19:20:00] <gwicke>	 nod, that would be nice
[19:20:53] <gwicke>	 would it be hard to add a graphite metric for /w/api.php% requests, possibly even in the same job as the rest api one?
[19:20:57] <joal>	 gwicke: do you have results for rest_v1 for august?
[19:21:04] <joal>	 gwicke: supper easy
[19:21:20] <gwicke>	 that one runs every 30 minutes already
[19:21:34] <joal>	 gwicke: we already have a job doing so for rest_v1, adding action is easy
[19:21:53] <joal>	 gwicke: would you have number of request done to rest_v1 in augus?
[19:22:09] <gwicke>	 and powers the varnish part of https://grafana-admin.wikimedia.org/dashboard/db/api-summary?panelId=1&fullscreen&orgId=1
[19:22:22] <gwicke>	 approximately, from graphite
[19:22:27] <gwicke>	 via avg
[19:22:39] <joal>	 I think we could have exact numbers
[19:23:11] <gwicke>	 if we had live metrics for both action & rest api, I think that would be good enough for our purposes
[19:23:26] <gwicke>	 any small errors like packet loss would affect both metrics
[19:25:13] <joal>	 gwicke: I have monthly numbers from graphite for rest_v1 using summarize
[19:26:54] <gwicke>	 so far, I have 14.3 bn from hive, ~14.9bn via graphite avg
[19:27:38] <joal>	 gwicke: for september 01 to 26?
[19:27:47] <gwicke>	 yeah
[19:27:55] <gwicke>	 to now
[19:27:57] <gwicke>	 in both cases
[19:29:07] <wikibugs_>	 10Analytics: Add action api counts to graphite-restbase job - https://phabricator.wikimedia.org/T176785#3636597 (10JAllemandou)
[19:29:13] <joal>	 gwicke: --^
[19:29:29] <icinga-wm>	 RECOVERY - HDFS active Namenode JVM Heap usage on analytics1001 is OK: OK: Less than 60.00% above the threshold [3686.4]
[19:29:40] <joal>	 gwicke: --^ as well
[19:30:12] <gwicke>	 joal: cool, thank you for your help!
[19:30:47] <joal>	 gwicke: np, sorry for bothering, it was really big !
[19:30:54] <joal>	 my graphite number gwicke : 14293820969
[19:31:06] <wikibugs_>	 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Fix duplicated enrollments in database - https://phabricator.wikimedia.org/T176786#3636616 (10Aklapper)
[19:31:19] <gwicke>	 joal: that's a lot closer
[19:31:35] <gwicke>	 just summarize(metricname, "1month")?
[19:31:46] <joal>	 gwicke: correct :)
[19:31:50] <joal>	 gwicke: single stat mode
[19:31:54] <joal>	 to get a single number
[19:32:11] <joal>	 And, from=true
[19:32:27] <joal>	 actually gwicke :summarize(metricname, "1month", sum, true)?
[19:34:18] <gwicke>	 ahh, yes
[19:34:50] <gwicke>	 in a dashboard it shows up as "avg"
[19:34:53] <gwicke>	 ;)
[19:35:09] <joal>	 Didn't know gwicke :)
[19:37:10] <joal>	 ottomata: I get a 503 querying druid from paws
[19:37:25] <joal>	 ottomata: I think it's because it runs inside a docker
[19:39:41] <ottomata>	 that is probably it
[19:39:46] <ottomata>	 isolated network stuff
[19:39:51] <ottomata>	 dunno how madhu makes that magic work
[19:39:52] <joal>	 yup
[19:40:02] <ottomata>	 she gets it to connect to hive though...
[19:40:04] <joal>	 ottomata: I have no idea
[19:40:09] <joal>	 ottomata: very true !
[19:41:08] <ottomata>	 hmmmm joal can you try from notebook1002
[19:41:13] <ottomata>	 ?
[19:41:15] <joal>	 ottomata: mayve there is some special connection
[19:41:18] <ottomata>	 thye have slightly different configs
[19:41:22] <ottomata>	 and it looks like only 1002 has a hadoop client
[19:42:47] <joal>	 ottomata: I can access hive from notebook1001
[19:43:07] <joal>	 ottomata: issue is connecting to http://druid1001.eqiad.wmnet:8082
[19:48:40] <ottomata>	 huh werid
[19:48:58] <ottomata>	 it must be configured in the paws internal wheels deploy?
[19:49:01] <ottomata>	 i don't see anything in puppet for it
[19:51:47] <joal>	 weirdoh
[19:55:47] <wikibugs_>	 (03PS7) 10Joal: Update mediawiki-history-reduced oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174)
[19:55:56] <joal>	 mforns, milimetric --^ if you have a minute
[19:56:17] <joal>	 I updated the docs and example queries, and also added a paragraph for us to remember having to add deletion drift
[19:56:22] <joal>	 cd ../../aqs
[19:56:24] <joal>	 oops
[19:56:36] <milimetric>	 Directory not found
[19:57:21] <joal>	 LOL milimetric 
[20:00:54] <joal>	 and now the aqs one as expected :)
[20:01:00] <wikibugs_>	 (03PS10) 10Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - 10https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805)
[21:19:34] <wikibugs_>	 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Fix duplicated enrollments in database - https://phabricator.wikimedia.org/T176786#3637294 (10Aklapper) In theory done: ``` To https://github.com/Bitergia/mediawiki-identities.git    f44cdb9..50e1ccc  master -> master ``` In practice let...
[22:12:19] <icinga-wm>	 PROBLEM - HDFS active Namenode JVM Heap usage on analytics1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [3891.2]
[22:21:29] <icinga-wm>	 RECOVERY - HDFS active Namenode JVM Heap usage on analytics1001 is OK: OK: Less than 60.00% above the threshold [3686.4]
[22:49:53] <wikibugs_>	 (03PS1) 10GoranSMilovanovic: Fix Tabs/Crosstabs Init [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380892
[22:50:11] <wikibugs_>	 (03CR) 10GoranSMilovanovic: [V: 032 C: 032] Fix Tabs/Crosstabs Init [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380892 (owner: 10GoranSMilovanovic)
[22:50:17] <wikibugs_>	 (03Merged) 10jenkins-bot: Fix Tabs/Crosstabs Init [analytics/wmde/WDCM] - 10https://gerrit.wikimedia.org/r/380892 (owner: 10GoranSMilovanovic)