[07:27:20] Piwik is running with the new puppet config, all good \o/
[07:27:38] now the only part that is left is mysql config and backup
[07:27:59] ah and I have just re-pooled aqs1006 and aqs1009
[08:45:27] !log Restart Workflow webrequest-load-wf-maps-2017-4-28-1
[08:45:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:57:40] (CR) Joal: [C: 1] "Looks good to me." [analytics/refinery] - https://gerrit.wikimedia.org/r/349723 (https://phabricator.wikimedia.org/T143119) (owner: Milimetric)
[08:58:19] thanks joal :)
[08:58:27] didn't check the emails this morning :(
[08:58:32] no problem elukey :)
[08:58:48] in addition to me usually checking, it's my ops week ;)
[09:00:38] joal: I have a permanent ops week assignment :P
[09:00:43] huhuhu :)
[09:01:20] Thanks elukey for having made this week's big mess work like a charm
[09:01:36] elukey: raw move etc could have gone way more wrong
[09:03:01] well we all did a ton of things to make it relatively smooth :)
[09:03:33] elukey: my feeling is that you did most of it :)
[09:04:25] not feeling the same but thanks :)
[09:17:43] joal: there is a big query from Filippo for an ops task in https://yarn.wikimedia.org/proxy/application_1492691387549_24204/mapreduce/job/job_1492691387549_24204
[09:17:50] let me know if it is not ok for you
[09:18:13] elukey: reading
[09:19:18] elukey: It's big, but it seems correct - Only improvement I see would be to aggregate results instead of keeping one line per matching event
[09:20:01] elukey: That way querying the new table would be faster since the table would be smaller
[09:20:20] okok :)
[09:20:29] just wanted to make sure that wasn't super huge
[09:20:35] elukey: it is
[09:20:42] elukey: but done once, it's ok
[09:50:35] joal: do you have a min for a data privacy/retention consult?
[09:50:45] sure elukey
[09:51:02] thanks!
[09:52:03] So on oxygen we have a kafkatee instance that dumps 5XX webrequest logs to a file
[09:52:19] this is extremely useful for ops when we need to figure out what is breaking
[09:52:40] but as you can imagine using grep/awk/jq is really annoying during emergencies
[09:53:18] so Filippo came up with the idea of pushing 5XX webrequests to logstash, in order to be able to use Kibana as a dashboard
[09:53:51] the solution is simple but I am wondering if there are any issues from the privacy/data-retention point of view
[09:53:57] of having data in logstash/EL
[09:54:04] (that is already protected by NDA)
[09:54:09] not sure about the retention
[09:54:18] but afaik it should be 90 days
[09:54:34] elukey: it's a subset of webrequest, non anonymized: We should keep it more than 90 days
[09:54:41] we should NOT sorry
[09:55:33] nice, I wanted to double check before doing anything :)
[09:56:38] sure
[09:59:04] (CR) Joal: "Another round of comments :)" (8 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[10:27:48] (PS4) Joal: [WIP] Update banner monthly job to reuse index [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727)
[10:28:19] (PS5) Joal: [WIP] Update banner monthly job to reuse index [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727)
[10:48:34] (PS6) Joal: Update banner monthly job to reuse index [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727)
[11:09:31] taking a break a-team
[11:25:00] * elukey lunch!
[12:51:00] fdans: you wanna meet up in 40 minutes or so to start pairing on the scaffold?
[12:51:28] @milimetric: sounds good!
[13:11:28] joal: when you get back, could I chat with you for a few minutes about the sitelink query?
[13:22:45] Analytics-Kanban, DC-Ops, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3220724 (Cmjohnson) @elukey Is this okay to power off?
[13:25:54] helloooo
[13:29:05] hey mforns
[13:29:19] was trying to figure out where we're putting the code for the new wikistats
[13:29:25] aha
[13:29:51] options: stay on my github prototype repo, new github, new gerrit, branch of wikistats gerrit
[13:29:52] milimetric: batcave?
[13:29:59] yeah, omw
[13:30:03] * fdans votes github pls
[13:30:08] :]
[13:30:31] heh, if we're voting you'll get outvoted
[13:31:28] elukey: o/
[13:32:37] urandom: o/
[13:34:17] elukey: question: how do you feel about https://gerrit.wikimedia.org/r/#/c/350632 ?
[13:34:29] elukey: is that something you'd rather postpone until monday?
[13:35:15] urandom: nah we can do it
[13:35:47] although if you are going to apply it $everywhere it might be wise to avoid Friday :)
[13:36:45] yeah, friday is why i was asking
[13:36:51] and it would go out everywhere
[13:37:56] also asking Filippo
[13:38:04] elukey: :)
[13:38:58] elukey: i asked him, fwiw
[13:40:14] he is ok :)
[13:40:22] i posed it as "if it were OK with elukey" (since it'll hit aqs too), and he said "+1 if elukey is OK with it" (paraphrasing)
[13:40:27] qq - did we deploy the collector everywhere?
[13:40:39] the jar is everywhere
[13:40:47] all right I think we can go then
[13:40:51] ready?
[13:40:53] so, the worst-case is that somehow it would break
[13:40:58] and we'd lose metrics
[13:41:07] and we'd just roll back to the old jar
[13:41:11] yep yep
[13:41:16] and of course blame you
[13:41:20] so, yeah, ready!
[13:41:23] :D
[13:41:24] yes, yes, of course :)
[13:41:25] merging
[13:41:33] and The Curse Of Cassandra
[13:41:43] and Java
[13:43:29] urandom: merged!
[13:43:41] sweet, let me canary it on a restbase node real quick
[13:43:42] let's run puppet on a couple of nodes
[13:43:47] super
[13:47:37] elukey: fyi; it requires a manual restart of the collector, so it's only live on restbase1007 atm
[13:47:52] yep yep, I saw the #ops logs
[13:55:05] elukey: ok, i'm going to restart the collector on all restbase nodes
[13:58:19] super
[13:58:22] let me know if you need help
[13:59:35] Analytics-Kanban: Label mediawiki_history snapshots for the last month they include - https://phabricator.wikimedia.org/T163483#3220802 (JAllemandou) a:JAllemandou
[14:00:29] milimetric, sent the invite
[14:00:33] k
[14:00:44] elukey: everything looks good; thanks!
[14:00:45] mforns / joal: batcave?
[14:01:03] milimetric: we're here: https://hangouts.google.com/hangouts/_/wikimedia.org/gather-metrics
[14:13:34] Analytics-Kanban: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220845 (elukey)
[14:16:15] Analytics-Kanban, Patch-For-Review, User-Elukey: Piwik puppet configuration refactoring and updates - https://phabricator.wikimedia.org/T159136#3220875 (elukey)
[14:30:25] Analytics-Kanban, DBA: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220920 (elukey)
[14:35:19] Analytics-Kanban, DC-Ops, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3220951 (Ottomata) @Cmjohnson yes, it is already 'off' from our point of view :)
[14:48:07] Analytics-Kanban, DBA, Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220992 (elukey)
[14:49:17] Analytics-Kanban, DBA, Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220845 (elukey) @akosiaris: After a chat with Jaime I'd like to explore the possibility of using bacula,
but I was told to double check with you requirements. Do yo...
[14:51:30] Analytics-Kanban, DC-Ops, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3221001 (Cmjohnson) Failed to get through post, fails at initializing idrac that eventually times out and tries again. Most likely a system board replacement...
[14:52:18] Analytics-Cluster, Analytics-Kanban, Operations, User-Elukey: Reimage the Hadoop Cluster to Debian Jessie - https://phabricator.wikimedia.org/T160333#3221004 (elukey) analytics1003 was done, so to complete the work we'd need to reimage: * stat100[23] * analytics1030 (down for maintenance) All t...
[14:52:33] kafka1018 and kafka1020 have ferm stopped, is that intentional?
[14:52:41] or fallout from the row D move?
[14:53:20] Analytics-Cluster, Analytics-Kanban, Operations, User-Elukey: Reimage the Hadoop Cluster to Debian Jessie - https://phabricator.wikimedia.org/T160333#3221010 (Ottomata) We don't need to reimage stat100[23]. They should be decommed this quarter.
[14:53:31] Analytics-Kanban, DC-Ops, Operations, ops-eqiad: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3221011 (Cmjohnson) Also, receive a message that bbu is discharged
[14:55:15] Analytics-Kanban, User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3221021 (elukey)
[14:56:35] Hi schana - I've been in meetings so far, now again for 1/2h, then I'm away for ~1h - We can spend some time around 7pm CEST?
[14:56:47] fdans: http://danielkummer.github.io/git-flow-cheatsheet/
[14:56:48] perfect, joal
[14:56:51] mforns: ^
[14:57:13] hey?
[14:57:21] oh ok
[15:02:19] fdans: standup?
[15:02:36] omw
[15:13:24] Analytics-Kanban, DBA, Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3221132 (akosiaris) Depends on how often you want it backed up and the rate of growth. So mysql needs to be dumped in some way before it is backed up as backing up...
[15:23:47] Analytics-Kanban, DBA, Operations: Puppetize Piwik's Database and set up periodical backups - https://phabricator.wikimedia.org/T164073#3220845 (jcrespo) I agree with most of things said, and I actually mentioned some of those to luka on IRC. BTW, for the record- the best way to move forward regardi...
[15:27:31] fdans: so I initialized git flow on that repo, send me your remote when you're set up, mforns you too. Then we can organize the plan into feature branches and get goin
[16:10:36] milimetric, thanks, sent it
[16:14:52] going offline people, byeeeee o/
[17:03:12] joal: ping?
[17:25:19] schana - Here now
[17:25:35] I'm in the calendar hangout
[17:55:14] Hi milimetric - Do you have a minute to say Hello to Lino?
[17:55:47] yes! to the bc
[20:32:33] (PS1) Milimetric: Revert "Add dty.wikipedia to pageview whitelist" [analytics/refinery] - https://gerrit.wikimedia.org/r/350898
[20:33:10] (Abandoned) Milimetric: Revert "Add dty.wikipedia to pageview whitelist" [analytics/refinery] - https://gerrit.wikimedia.org/r/350898 (owner: Milimetric)
[20:38:00] (PS1) Milimetric: Fix dv.wikipedia added date [analytics/refinery] - https://gerrit.wikimedia.org/r/350900
[20:39:33] (CR) Milimetric: [V: 2 C: 2] Fix dv.wikipedia added date [analytics/refinery] - https://gerrit.wikimedia.org/r/350900 (owner: Milimetric)