[03:09:42] (PS1) GoranSMilovanovic: Semantics Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/386121
[03:09:59] (CR) GoranSMilovanovic: [V: 2 C: 2] Semantics Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/386121 (owner: GoranSMilovanovic)
[03:27:16] (PS1) GoranSMilovanovic: fix Overview Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/386122
[03:27:38] (CR) GoranSMilovanovic: [V: 2 C: 2] fix Overview Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/386122 (owner: GoranSMilovanovic)
[09:40:57] Analytics, Operations, ops-eqiad, User-Elukey: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3705846 (elukey)
[09:42:18] Analytics, Operations, ops-eqiad, User-Elukey: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3701391 (elukey) p:Triage>Normal Tried to force a learn cycle again, not much joy.. ``` elukey@analytics1029:~$ sudo megacli -AdpBbuCmd -a0 BBU status for Ada...
[09:43:06] Analytics-Kanban, User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3705849 (elukey)
[09:44:21] Analytics-Kanban, User-Elukey: Add a prometheus metric exporter to all the Druid daemons - https://phabricator.wikimedia.org/T177459#3659740 (elukey) p:Triage>High a:elukey
[09:57:05] heloooo
[10:07:38] Analytics-Kanban, User-Elukey: Test and possibly raise the Xmx/Xms settings for the Hadoop Yarn Namenode and HDFS datanode daemons - https://phabricator.wikimedia.org/T178876#3705888 (elukey)
[10:08:59] o/
[10:13:54] joal: o/ whenever you have time would you mind to check these numbers https://gerrit.wikimedia.org/r/#/c/386147/1/hieradata/hosts/analytics1030.yaml ? :)
[10:29:20] * elukey lunch!
[10:40:43] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Set up ChangeProp for JobQueue in beta - https://phabricator.wikimedia.org/T178881#3705976 (mobrovac)
[12:17:25] Analytics, Analytics-Wikistats: German derivative of Wikistats report shows marked difference for new editors in Aug vs Sep - https://phabricator.wikimedia.org/T178891#3706248 (Erik_Zachte)
[13:08:31] Analytics, Analytics-Wikistats: German derivative of Wikistats report shows marked difference for new editors in Aug vs Sep - https://phabricator.wikimedia.org/T178891#3706382 (Erik_Zachte) Yes the basic principle has changed a bit, albeit longer ago, start 2017. In preparation for Wikistats-2 one Wiki...
[13:10:25] hii elukey :)
[13:11:39] ottomata: o/
[13:11:56] I am going to merge https://gerrit.wikimedia.org/r/#/c/385173/ now, will be back in 10 mins if nothing goes on fire :D
[13:12:13] ok, lemme know when you got some prometheus foo time for me :)
[13:21:47] suuuuper smooth! all went fine, big refactoring landed without issues :)
[13:28:02] ottomata: I have time now!
[13:30:08] great!
[13:30:14] batcave elukey?
[13:32:52] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3706417 (elukey) Next steps: 1) Create unit files and systemd config for eventlogging_sync.sh and add the guards in puppet to allow trusty/stre...
[13:33:33] ottomata: sure
[13:34:05] k am there
[14:02:18] mforns: o/
[14:02:23] helo elukey :]
[14:02:30] sup
[14:02:38] I added the next steps to bring db1108 alive in https://phabricator.wikimedia.org/T177405#3706417
[14:02:57] today I merged the big puppet patch to refactor the roles etc..
[14:03:06] now I only need to add systemd config
[14:03:15] and we should be ready to bring it alive :D
[14:15:36] Analytics, Patch-For-Review, User-Elukey: Move away from jmxtrans in favor of prometheus jmx_exporter - https://phabricator.wikimedia.org/T175344#3706529 (Ottomata) Heya @fgiunchedi, https://gerrit.wikimedia.org/r/#/c/386190/ adds a default jmx exporter config file, which actually lets jmx exporter...
[14:24:44] elukey, that is amazing :D
[14:54:01] Hi joal, are you in?
[14:54:15] Hi Shilad - I am
[14:54:26] Shilad: but we'll be in standup in minutes
[14:54:45] Got it. Thanks again for the code review. I made the revisions. Things are remarkably cleaner now.
[14:54:46] I'll have time in 1h05 Shilad
[14:55:40] That's all I have!
[14:56:18] No prob for CR Shilad, it's my pleasure to try to help :)
[14:56:33] I'll read your code again and continue to comment
[14:56:47] Talk soon Shilad, thanks for the heads-up
[14:57:11] Thank you!
[14:58:54] Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3706744 (Nuria) Edit_13457736_15423246 can also be dropped
[15:00:35] ping joal elukey
[15:00:42] ping fdans
[15:00:53] sorry!
[15:01:40] joining sorry
[15:03:32] Analytics-Kanban: Backup some files from HDFS with checksumming on/after copy - https://phabricator.wikimedia.org/T177224#3706757 (Nuria)
[15:06:08] Analytics-Kanban, Patch-For-Review: Fix banner activity success file cleaner to allow for email alerts - https://phabricator.wikimedia.org/T178302#3706766 (Nuria)
[15:29:05] Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3706905 (elukey) Sanity check: ``` 0: jdbc:hive2://analytics1003.eqiad.wmnet:100> select count(*) from edit_13457736_15423246; INFO : Compiling c...
[16:17:16] wikimedia/mediawiki-extensions-EventLogging#707 (wmf/1.31.0-wmf.5 - d19bd44 : James D. Forrester): The build has errored.
[16:17:16] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.31.0-wmf.5
[16:17:16] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/292164067
[16:26:05] nuria_: tables dropped!
[16:26:26] Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3707104 (elukey) All tables dropped!
[16:26:41] moved --^ to done
[16:26:57] elukey: all right
[16:33:23] elukey: so glad!
[16:34:22] nuria_: thanks a lot !
[17:04:33] ottomata: forgot to ask at standup: I'm assuming we wait for an answer from legal before deleting old data, right?
[17:07:06] joal: if you have time https://gerrit.wikimedia.org/r/#/c/386147/
[17:07:34] elukey: did it a while ago ;)
[17:08:16] elukey: just forgot to ping back
[17:09:34] ahh okok :)
[17:09:36] thanks :)
[17:09:40] will do it tomorrow
[17:09:56] no prob
[17:10:56] joal: yes
[17:11:02] joal yes
[17:16:23] * elukey off!
[17:33:38] nuria_, ping?
[18:50:15] joal: i was going to stop/restart jobs
[19:07:47] nuria_: unique-devices ones?
[19:07:52] joal: yes
[19:08:20] ok for me
[19:08:41] those are druid-loading on pre-computed data - very small, no risk :)
[19:17:57] nuria_: do you need help?
[19:18:23] joal: i think i can do it, i am in the middle of something but can get to it in 15 mins
[19:18:48] nuria_: ok - I'll wait :)
[19:26:37] joal: ok back!
[19:26:42] joal: stopping hue jobs
[19:28:41] nuria_: can you please log in the chan as well :)
[19:33:47] joal: yes
[19:33:55] joal: boy hue is slow
[19:34:09] yes, I've seen that as well - I wonder why though :(
[19:34:44] ottomata: in case you're around - it seems we're experiencing something weird with hue ... Super ultra slow ...
[19:36:11] joal: so do i kill jobs from ui (oozie job kill )
[19:36:15] joal: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0066771-170829140538136-oozie-oozi-C/
[19:36:33] joal: or is there a kill job button that i am not seeing?
[19:37:13] nuria_: normally there are kill jobs buttons on hue, but I can't access it now
[19:37:28] nuria_: in those cases (hue not accessible) I use oozie CLI
[19:37:35] nuria_: not super fun, but works
[19:37:54] joal: ya, i do that always as hue is normally slow
[19:38:00] :D
[19:38:04] ottomata: ping us when you are back
[19:38:17] I wonder if ottomata did something or not - seems solved :)
[19:39:06] pingggg
[19:39:13] eh i have done nothing!
[19:39:15] in hue nuria_ I kill jobs with the button on bottom left (kill)
[19:39:21] ok - awesome
[19:39:33] ottomata: as usual, just mentioning your name, systems get afraid :)
[19:40:49] info
[19:44:03] !log killing druid coordinators uniques-monthly and per-project-family: 0066771-170829140538136-oozie-oozi-C,0066767-170829140538136-oozie-oozi-C,0010139-170621131133576-oozie-oozi-C
[19:44:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:10:56] joal: i start jobs with sudo -u hdfs right?
[20:11:04] yes m'dame
[20:11:44] it should work with your user, since you're not writing data onto hdfs in places where you don't have rights, but we prefer to run prod jobs as hdfs
[20:13:35] joal: ok
[20:13:40] https://www.irccloud.com/pastebin/GjPIiXXP/
[20:13:46] does this look good?
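Note on the coordinator kills logged at 19:44:03: joal mentions falling back to the Oozie CLI when Hue is unreachable. A minimal sketch of what those invocations might look like, written in Python so the commands are only printed, never run. The Oozie server URL is a placeholder assumption (the log never states it); the coordinator ids are the three from the !log entry.

```python
import shlex

# Assumption: placeholder Oozie server URL, not taken from the log.
OOZIE_URL = "http://oozie.example.org:11000/oozie"


def oozie_kill_commands(job_ids, oozie_url=OOZIE_URL):
    """Return one `oozie job -kill` argv list per coordinator id."""
    return [
        ["oozie", "job", "-oozie", oozie_url, "-kill", job_id]
        for job_id in job_ids
    ]


# Coordinator ids copied from the 19:44:03 !log message.
coordinator_ids = [
    "0066771-170829140538136-oozie-oozi-C",
    "0066767-170829140538136-oozie-oozi-C",
    "0010139-170621131133576-oozie-oozi-C",
]

# Print the commands for review instead of executing them.
for argv in oozie_kill_commands(coordinator_ids):
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; on the real cluster the commands would presumably be prefixed with `sudo -u hdfs`, as discussed at 20:10:56.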
[20:14:14] * joal reads
[20:16:15] nuria_: error in the sub-command of the refinery_directory parameter - folder should say /wmf/refinery/2017*, not /wmf/refinery/2016*
[20:16:20] except for that, looks good
[20:16:25] ah yeah
[20:17:55] joal: ok, started let me see how does this work before i start the others
[20:18:03] sure nuria_
[20:19:18] nuria_: so far so good, druid indexation for 2016-01 has started :)
[20:21:24] joal: ok, let me start one of teh perfamily ones
[20:21:29] *the per family
[20:23:27] !log restarted job uniques-monthly-per-domain-druid 0102785-170829140538136-oozie-oozi-C
[20:23:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:24:20] nuria_: https://pivot.wikimedia.org/#unique_devices_per_domain_monthly
[20:24:23] :)
[20:24:26] !log restarted job unique_devices-per_project_family-druid-monthly-coord 0102799-170829140538136-oozie-oozi-C
[20:24:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:24:41] joal: WOW super fast
[20:24:43] :)
[20:24:53] nuria_: only 2 months done, but yes, relatively fast :)
[20:25:04] joal: should we delete the old data now too?
[20:25:15] nuria_: we can do that
[20:25:27] joal: ok let me restart the third job
[20:27:07] nuria_: error in start date for the per_project_family_monthly one - It should have started in April 2017
[20:27:15] and it seems to have started for june only
[20:27:37] joal: argh, i had that wrong one sec
[20:29:13] joal: i can do another job that only runs for 2 months no? with -Dend_date=2016-06-01
[20:29:18] correct
[20:29:41] !log started unique_devices-per_project_family-druid-daily-coord 0102816-170829140538136-oozie-oozi-C
[20:29:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:29:46] nuria_: same for per_project_family_daily
[20:29:56] joal: ya same mistake in both
[20:30:13] ok :)
[20:31:40] joal: stop_time not end_time, let me fix
[20:37:04] joal: ok, will baby sit jobs
[20:37:16] they look good to me :)
[20:37:35] they should finish soon-ish (longer for daily ones, obviously)
[20:38:20] !Log added indexing jobs to process couple months missed on start date of "per project family " jobs: 0102849-170829140538136-oozie-oozi-C, 0102849-170829140538136-oozie-oozi-C
[20:38:49] joal: the catchup jobs should finish quick
[20:38:56] yup
[20:39:03] monthly already did
[20:39:44] joal: after this is ongoing, i will 1) disable datasources on admin ui 2) delete datasources with: curl -X DELETE
[20:39:50] joal: correct?
[20:39:59] almost :)
[20:40:04] joal: ajajajq
[20:40:06] ok
[20:40:12] the 2 steps you are mentioning are actually the same
[20:40:12] ahem... what am i missing?
[20:40:25] disabling in UI = curl -X DELETE
[20:40:33] joal: i had .. some.. faint..recollection...
[20:40:43] that we needed to disable before deleting?
[20:40:55] To remove deep-storage segments: curl -X post {kill-task}
[20:41:22] We indeed need to disable - but whether you do it on UI or with curl -X DELETE is the same
[20:41:37] then you want to drop deep storage posting kill-tasks to overlord
[20:41:38] joal: ah wait, curl -X delete just "diables"
[20:41:42] *disables
[20:41:43] yes
[20:43:14] nuria_: Just had a look at our doc in wikistats - It's not super evident that it's the same, but the two steps are mentioned
[20:43:32] in wikitech, not wikistats, obviously hem
[20:44:01] joal: ok, i was looking at my notes
[20:53:34] (PS3) Joal: [FUN] AQS for druid only [analytics/aqs] - https://gerrit.wikimedia.org/r/384113
[20:54:57] joal: for some definition of fun, sure
[20:55:03] :D
[21:05:40] (PS1) Jgreen: fix missing " in kafkatee.upstart [analytics/kafkatee] - https://gerrit.wikimedia.org/r/386277
[21:15:49] nuria_: I'm gonna go to sleep, except if you need help :)
[21:20:32] Gone, then :) See you tomorrow team
[21:45:30] (CR) Ottomata: [V: 2 C: 2] fix missing " in kafkatee.upstart [analytics/kafkatee] - https://gerrit.wikimedia.org/r/386277 (owner: Jgreen)
[23:27:19] Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3708517 (Nuria) Open>Resolved
[23:27:31] Analytics-Kanban: Backup some files from HDFS with checksumming on/after copy - https://phabricator.wikimedia.org/T177224#3708518 (Nuria) Open>Resolved
[23:28:10] Analytics-Kanban, Patch-For-Review: Fix banner activity success file cleaner to allow for email alerts - https://phabricator.wikimedia.org/T178302#3708521 (Nuria) Open>Resolved
[23:28:24] Analytics-Kanban, Patch-For-Review: Fix MediaWiki snapshot cleaner cron job - https://phabricator.wikimedia.org/T178256#3708522 (Nuria) Open>Resolved
[23:28:37] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Eventlogging refine popups, temporary cron - https://phabricator.wikimedia.org/T177783#3708524 (Nuria) Open>Resolved
[23:30:28] Analytics-Kanban, Patch-For-Review: Add mediawiki-history metrics to AQS - https://phabricator.wikimedia.org/T175805#3708530 (Nuria) Open>Resolved
[23:30:40] Analytics-Cluster, Analytics-Kanban, monitoring, Patch-For-Review, User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3708533 (Nuria)
[23:30:42] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Port Kafka alerts from check_graphite to check_prometheus - https://phabricator.wikimedia.org/T175923#3708532 (Nuria) Open>Resolved
[23:31:00] Analytics-Kanban, Analytics-Wikistats: Implement pageview metric in Wikistats UI - https://phabricator.wikimedia.org/T163817#3708535 (Nuria)
[23:31:02] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Implement Topic Selector Widget - https://phabricator.wikimedia.org/T167676#3708534 (Nuria) Open>Resolved
[23:47:36] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Alpha Release: Breakdown selection disapears when you change timespam - https://phabricator.wikimedia.org/T177646#3708580 (Nuria) Open>Resolved
[23:47:48] Analytics-Kanban, Patch-For-Review: Add link to footer of wikistats with "file a bug" - https://phabricator.wikimedia.org/T177642#3708582 (Nuria) Open>Resolved
[23:48:17] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0 UI second deployment/iteration - https://phabricator.wikimedia.org/T170460#3708584 (Nuria)
[23:48:19] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Wikistats2 bugs (4/4) - Detail page - https://phabricator.wikimedia.org/T170940#3708583 (Nuria) Open>Resolved
[23:48:37] Analytics-Kanban, Analytics-Wikistats: Productionise list view - https://phabricator.wikimedia.org/T175265#3708585 (Nuria) Open>Resolved
[23:49:06] (PS1) GoranSMilovanovic: Semantics - t-SNE Maps [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/386323
[23:49:20] (CR) GoranSMilovanovic: [V: 2 C: 2] Semantics - t-SNE Maps [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/386323 (owner: GoranSMilovanovic)
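
Note on the Druid cleanup discussed between 20:39:44 and 20:41:43: the exchange settles on two distinct steps, a `DELETE` against the coordinator (which only disables the datasource, same as the UI button) followed by a kill task posted to the overlord to drop the deep-storage segments. A rough sketch of the two requests, built but not sent. The host names, ports, datasource name, and interval are all placeholder assumptions; the log does not name the old datasources or the hosts.

```python
import json

# Assumptions: placeholder endpoints, not taken from the log.
COORDINATOR = "http://druid-coordinator.example.org:8081"
OVERLORD = "http://druid-overlord.example.org:8090"


def disable_datasource_request(datasource):
    """Step 1: DELETE on the coordinator disables the datasource."""
    url = f"{COORDINATOR}/druid/coordinator/v1/datasources/{datasource}"
    return ("DELETE", url, None)


def kill_task_request(datasource, interval):
    """Step 2: a kill task on the overlord removes deep-storage segments."""
    body = json.dumps(
        {"type": "kill", "dataSource": datasource, "interval": interval}
    )
    return ("POST", f"{OVERLORD}/druid/indexer/v1/task", body)


# Hypothetical datasource name and interval, for illustration only.
steps = [
    disable_datasource_request("old_datasource_name"),
    kill_task_request("old_datasource_name", "2016-01-01/2017-11-01"),
]

for method, url, body in steps:
    print(method, url, body or "")
```

Keeping the two steps as separate functions mirrors the point joal makes: disabling is reversible bookkeeping on the coordinator, while the kill task is what actually deletes data.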