[06:18:31] PROBLEM - Check the last execution of mediawiki-history-drop-snapshot on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit mediawiki-history-drop-snapshot https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [06:18:56] good morning an-coord :D [06:41:59] Oct 15 06:15:18 an-coord1001 refinery-drop-mediawiki-snapshots[76647]: File "/srv/deployment/analytics/refinery/python/refinery/hive.py", line 287, in partition_datetime_from_spec [06:42:03] Oct 15 06:15:18 an-coord1001 refinery-drop-mediawiki-snapshots[76647]: if isinstance(regex, basestring): [06:42:06] Oct 15 06:15:18 an-coord1001 refinery-drop-mediawiki-snapshots[76647]: NameError: name 'basestring' is not defined [06:42:13] again python3 migration issues :( [06:50:52] (03PS1) 10Elukey: hive.py: avoid using 'basestring' in python3 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/543030 [07:00:01] RECOVERY - Check the last execution of mediawiki-history-drop-snapshot on an-coord1001 is OK: OK: Status of the systemd unit mediawiki-history-drop-snapshot https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [07:44:04] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) [07:48:32] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) @Isaac thanks a lot! I just created a kerberos account for you, you should have an email with you tmp password. You c... [08:53:58] brb [09:49:41] (03CR) 10Mforns: Add hive query for wmcs edits (032 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/543008 (https://phabricator.wikimedia.org/T232671) (owner: 10Srishakatux) [09:54:33] (03CR) 10Mforns: [C: 03+1] "LGTM!" 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/543030 (owner: 10Elukey) [09:56:59] mforns: o/ [09:57:05] 10Analytics, 10Analytics-Kanban: Data Quality Alarms - https://phabricator.wikimedia.org/T198986 (10mforns) [09:57:09] (03CR) 10Elukey: [V: 03+2 C: 03+2] hive.py: avoid using 'basestring' in python3 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/543030 (owner: 10Elukey) [10:00:50] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10mforns) [10:03:10] 10Analytics: Add data quality metric: distribution of page view article titles - https://phabricator.wikimedia.org/T235483 (10mforns) [10:04:11] 10Analytics, 10Analytics-Kanban, 10Research: Add data quality metric: traffic variations per country - https://phabricator.wikimedia.org/T234484 (10mforns) [10:05:15] 10Analytics: Add data quality metric: distribution of page view article titles - https://phabricator.wikimedia.org/T235483 (10mforns) [10:05:18] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Coarse alarm on data quality for refined data based on entrophy calculations - https://phabricator.wikimedia.org/T215863 (10mforns) [10:16:37] 10Analytics, 10Analytics-Kanban: Hive data quality alarms pipeline - https://phabricator.wikimedia.org/T235486 (10mforns) [10:19:22] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Proof of concept: Entropy calculations can be used to alarm on anomalies for data quality metrics - https://phabricator.wikimedia.org/T215863 (10mforns) [10:20:57] * elukey lunch! [10:22:50] 10Analytics, 10Analytics-Kanban: Alarming scripts for entropy alarms. Anomaly detection and reporting. 
- https://phabricator.wikimedia.org/T227357 (10mforns) [10:22:53] 10Analytics, 10Analytics-Kanban: Hive data quality alarms pipeline - https://phabricator.wikimedia.org/T235486 (10mforns) [10:23:34] 10Analytics, 10Analytics-Kanban: Hive data quality alarms pipeline - https://phabricator.wikimedia.org/T235486 (10mforns) [10:23:36] 10Analytics, 10Analytics-Kanban: Data Quality Alarms - https://phabricator.wikimedia.org/T198986 (10mforns) [10:24:09] 10Analytics, 10Analytics-Kanban: Alarming scripts for entropy alarms. Anomaly detection and reporting. - https://phabricator.wikimedia.org/T227357 (10mforns) [10:24:13] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Proof of concept: Entropy calculations can be used to alarm on anomalies for data quality metrics - https://phabricator.wikimedia.org/T215863 (10mforns) [10:25:35] 10Analytics, 10Analytics-Kanban, 10Research: Add data quality metric: traffic variations per country - https://phabricator.wikimedia.org/T234484 (10mforns) [10:25:37] 10Analytics, 10Analytics-Kanban: Data Quality Alarms - https://phabricator.wikimedia.org/T198986 (10mforns) [10:25:39] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Proof of concept: Entropy calculations can be used to alarm on anomalies for data quality metrics - https://phabricator.wikimedia.org/T215863 (10mforns) [10:25:48] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10mforns) [10:25:50] 10Analytics, 10Analytics-Kanban: Data Quality Alarms - https://phabricator.wikimedia.org/T198986 (10mforns) [10:25:52] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Proof of concept: Entropy calculations can be used to alarm on anomalies for data quality metrics - https://phabricator.wikimedia.org/T215863 (10mforns) [10:25:58] 10Analytics: Add data quality metric: distribution of page view article titles - https://phabricator.wikimedia.org/T235483 (10mforns) [10:26:00] 10Analytics, 
10Analytics-Kanban: Data Quality Alarms - https://phabricator.wikimedia.org/T198986 (10mforns) [10:26:02] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Proof of concept: Entropy calculations can be used to alarm on anomalies for data quality metrics - https://phabricator.wikimedia.org/T215863 (10mforns) [10:29:08] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10mforns) What's missing in this task is to decide which of the 4 (one or more) metrics considered for the proof of concept of entropy-based quality metrics are we goi... [10:30:12] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Proof of concept: Entropy calculations can be used to alarm on anomalies for data quality metrics - https://phabricator.wikimedia.org/T215863 (10mforns) I still need to post the results of the proof of concept to Wikitech. [10:42:50] !log backfilling January 1st 2015 for mediarequests per referer daily, proceeding with all days until May 2019 if successful [10:42:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:20:30] 10Analytics, 10MinervaNeue, 10Performance-Team (Radar), 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (10pmiazga) @JoeWalsh is it something Product Infrastructure could look at ? [11:22:22] fdans: o/ [11:22:30] hellooo [11:22:44] qq - in ~1h there is a hadoop maintenance that requires to stop the masters [11:22:52] will it impact your backfill? 
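The NameError logged at 06:42 comes from `basestring`, which exists only in Python 2. A minimal sketch of one common way to make such a check work on both interpreters follows; it is not necessarily what the hive.py patch does, and `string_types` and `compile_if_needed` are hypothetical names used only for illustration.

```python
import re

# basestring exists only in Python 2; on Python 3 the lookup raises
# NameError and we fall back to str.
try:
    string_types = (basestring,)  # noqa: F821 (defined on Python 2 only)
except NameError:
    string_types = (str,)


def compile_if_needed(regex):
    """Return a compiled pattern, given either a pattern string or a compiled pattern."""
    if isinstance(regex, string_types):
        return re.compile(regex)
    return regex
```

Code that only has to run on Python 3 can drop the fallback and test against `str` directly.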
[11:23:57] elukey: yes but that's alright, I'll just restart the failed jobs [11:24:06] I'm closely monitoring them as they run [11:24:17] alarms will come to my inbox so that's fine too [11:25:08] okok [11:25:24] !log stop all systemd timers on an-coord1001 as prep step for hadoop maintenance [11:25:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:27:13] just stopped all the jobs starting from the an coordinator, including camus [11:27:24] the maintenance will start in ~1h [11:30:29] elukey: I'm done traumatizing students :D [11:34:20] 10Analytics, 10Product-Analytics: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (10Neil_P._Quinn_WMF) [11:41:54] joal: :D [11:43:38] the cluster is reasonably empty except Fran's oozie job [11:44:03] elukey: It'll be good to have all those empty spark drivers killed ;) [11:45:28] joal: not now though right? [11:47:26] elukey: as part of the restart [11:47:58] elukey: oh, mileading thought from me - They'll stay! [11:57:48] fdans: as part of the procedure, we'll need to enable HDFS safe mode, so hdfs will go read only [11:57:55] I think that we should stop your job [11:58:02] elukey: sounds good, killing [11:58:06] or suspend? [11:58:16] fdans: suspend should be enough (at coord level) [11:58:46] joal: elukey coordinator suspended [11:59:25] Thanks fdans :) [11:59:39] thank you! [12:00:17] joal: select queries on hive are fine? I'm testing some stuff [12:01:03] when we start it is better not to issue any query etc.. please :) [12:01:12] it should last 10/15 mins hopefully [12:03:38] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (10Pcoombe) Thanks @Nuria ! [12:07:33] ok hosts downtimed [12:07:38] puppet disabled on the hadoop masters [12:10:32] elukey: understood! 
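The read-only step mentioned at 11:57 can be sketched as a short command sequence. This assumes the stock `hdfs dfsadmin -safemode` and `-saveNamespace` subcommands run as the hdfs user; the exact commands used for this maintenance are not shown in the log, and `safe_mode_commands` is a hypothetical helper that only builds the argument lists without running anything.

```python
def safe_mode_commands(hdfs_bin="/usr/bin/hdfs", run_as="hdfs"):
    """Build the argv lists for putting HDFS in safe mode and checkpointing.

    Nothing is executed here; the caller decides how to run each command.
    """
    prefix = ["sudo", "-u", run_as, hdfs_bin, "dfsadmin"]
    return [
        prefix + ["-safemode", "enter"],  # HDFS becomes read-only
        prefix + ["-safemode", "get"],    # confirm the state before going on
        prefix + ["-saveNamespace"],      # write a fresh fsimage while read-only
    ]
```

Printing `" ".join(cmd)` for each entry gives the lines an operator would type on the active master.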
[12:11:05] excuse me fdans -I missed your ping [12:11:13] nono all good [12:15:32] joal: there is only one select ongoing, then we can start? [12:15:55] finished :) [12:16:32] ok starting maintenance [12:16:49] \o/ [12:21:41] safe mode on, saved namespace [12:21:47] stopping yarn and hdfs on 1002 [12:23:26] stopping on 1001 [12:25:28] starting 1001 again [12:30:06] starting 1002 [12:33:00] ok we should be back in the game, datanodes are starting to collaborate [12:34:00] elukey@an-master1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -getServiceState an-master1001-eqiad-wmnet [12:34:03] active [12:34:05] elukey@an-master1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -getServiceState an-master1002-eqiad-wmnet [12:34:08] standby [12:34:11] elukey@an-master1001:~$ sudo -u hdfs /usr/bin/yarn rmadmin -getServiceState an-master1001-eqiad-wmnet [12:34:14] active [12:34:16] elukey@an-master1001:~$ sudo -u hdfs /usr/bin/yarn rmadmin -getServiceState an-master1002-eqiad-wmnet [12:34:19] standby [12:34:22] joal: --^ [12:35:30] elukey: you rock my friend, as usual :) [12:36:00] let's wait a sec, the hdfs master still thinks that there are a ton of under replicated blocks sigh [12:36:15] :/ [12:42:47] joal: I think that it might be due to https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/540149/2/hieradata/common.yaml [12:42:57] it was in the default rack [12:43:04] and this is the first restart of the namenodes [12:43:14] from the NN logs I can see a lot of reshuffling [12:43:22] hm - right - it needs to move stuff around to enforce rack-dispersion [12:43:26] good catch elukey [12:45:31] it will take a couple of hours I think [12:45:35] but the no of files is the same [12:45:41] elukey: haha! spark drivers gone! 
Actually I was right but for reasons I had not thought about :) [12:45:42] and the other metrics look healthy [12:46:31] !log moved hadoop cluster to new zookeeper cluster [12:46:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:46:46] elukey: let's restart timers? [12:47:46] joal: I'd prefer to wait a bit to watch metrics, say 10/20 mins, is it ok? [12:47:52] no problem :) [12:48:08] elukey: shall I do some manual testing? [12:49:08] sure! [12:49:13] the druid coordinators seem happy [12:51:42] elukey: you total sauceboss :) [12:53:08] ahaha thanks :D [12:53:42] fdans: does this have a link to mforns mayonese-flight? [12:54:34] joal: ha, not really, i think it's an old school expression I heard from someone sometime [12:55:09] ah ok :) I think it's great to have a mayonese-flighter and a sauceboss both in our team ) [12:56:17] elukey: whenever you can let me know if i can resume backfilling :) [12:57:04] fdans: please proceed so we'll see if everything looks ok :) [12:57:31] wololo! [12:57:34] * fdans resuming [12:57:59] !log resumed backfilling of mediarequests per referer daily [12:58:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:05:40] joal: added https://grafana.wikimedia.org/d/000000585/hadoop?panelId=93&fullscreen&orgId=1 [13:06:04] so we have a good comparison between blocks underreplicated/missing/etc.. with their total number [13:06:38] hm - I don't get it elukey :( [13:06:59] I added the total number of blocks actually used by our data [13:07:18] Ah! Makes sense :) [13:08:10] elukey: the thing that doesn't is https://grafana.wikimedia.org/d/000000585/hadoop?panelId=41&fullscreen&orgId=1 [13:09:21] mgerlach: Hello :) Have my comments on the task about new-editors make sense? [13:09:43] * joal hides for bad conjugation [13:10:20] joal: doesn't ? :D [13:10:24] not getting sorry [13:10:54] my idea is basically to know if 66k blocks under replicated are a lot a few etc.. 
compared to the overall number [13:11:09] just to get an idea from the graphs about the problem [13:11:13] elukey: the number of "Under-replicated-blocks" is the same as the number of blocks [13:11:21] in my view [13:11:28] can you refresh? [13:11:37] I just did, way better thanks :) [13:11:38] you may have an outdated view [13:11:41] ahhh okok :) [13:11:46] sorry :) [13:12:05] nono my bad! [13:12:21] indeed, with 50K blocks to fix, it feels like the rack-change thing [13:12:26] elukey: --^ [13:15:17] all right glad that we triple checked :) [13:15:56] !log re-enable timers on an-coord1001 [13:15:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:17:33] all good then :) [13:17:54] elukey: I'll keep an eye on ongoing jobs :) [13:23:44] 10Analytics, 10MinervaNeue, 10Performance-Team (Radar), 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (10JoeWalsh) @pmiazga do you mean the CORS issue with the request to maps.wikimedia.org? Looking at the page referenced... [13:26:38] joal: yes, that was very helpful. haven't implemented it in the code yet, but the way you describe it makes a lot of sense [13:27:19] great mgerlach :) Sorry for the delay in answering however - I'll try to be better next time :) [13:27:53] joal: no worries, there are always other things to do in the meantime ; ) [13:28:07] joal: again thanks for taking the time [13:28:51] mgerlach: I'll let you know if at some point I manage to put my finger on that text-processing memory issue :) [13:29:44] joal: I see that this is not high-priority for both of us, so no rush, but would be curious to learn if you find out more [13:30:31] mgerlach: so would I!!!
We did some investigating with Andrew and found interesting directions, but no precise answer yet [13:30:57] mgerlach: And, giving more overhead memory to spark workers makes the jobs succeed, so it's actually a non-blocker [13:31:18] But, I'd really like to understand, so at some point I'll spend the time [13:36:14] going afk for a bit, please ping me on whap/etc.. if anything comes up :) [13:45:29] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (10elukey) [13:48:19] joal: did you ever see a meeting invite for you and me and martin and djellel? [13:48:30] i made it last week but now I can't find any evidence of it! [13:48:33] i'm going crazy! [13:49:01] ottomata: I've not seen it no - And forgot to let you know :)( [13:49:09] ottomata: when? [13:49:16] for tomorrow [13:49:25] nope [13:49:25] there's another meeting i made at the same time for stream config stuff that is gone [13:49:27] what time? [13:49:30] i must be crazy [13:49:36] :s [13:49:37] or google cal is pranking me [13:50:08] tomorrow I can't do it before standup (kids) [13:50:17] And post-standup is pretty busy [13:50:23] can be late, no problem [13:50:40] oh, but not early? [13:50:45] earlier? [13:50:48] was thinking about this time [13:50:55] starting 35 minutes ago [13:51:02] can't do it before standup :( [13:51:04] ok [13:51:07] thursday then? [13:51:11] Ye! [13:51:13] +s [13:51:49] Like 51 minutes before now would be great (again, kids before standup :S) [13:52:05] ottomata: They don't tell you that before you have kids... [13:52:23] Or didn't listen - Don't know [13:52:47] Also - will miss standup and send an e-scrum this evening [13:53:06] ottomata: will you have some time for me in your afternoon for spark 2.4.4?
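The workaround of giving Spark workers more overhead memory can be sketched as a spark-submit invocation. This assumes Spark 2.3 or later, where the setting is named `spark.executor.memoryOverhead`; the memory sizes, the job file name and the helper `spark_submit_args` are hypothetical and are not the values used on the cluster.

```python
def spark_submit_args(app, executor_memory="4g", memory_overhead="2g"):
    """Build a spark-submit argv that raises the off-heap overhead per executor."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--executor-memory", executor_memory,
        # Extra non-heap memory YARN grants each executor container.
        "--conf", "spark.executor.memoryOverhead={}".format(memory_overhead),
        app,
    ]
```

For example, `spark_submit_args("text_processing_job.py")` yields the argument list for a YARN run with 2g of overhead on top of the 4g heap.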
[13:56:04] Hi, i'm a senior data analyst working in fundraising and my username is Jerrie Kumalah and I am having trouble logging into Hue [14:03:41] hola nuria and/or analytics as a whole :) I have a quick question: what is the difference between the mediacounts and the mediarequests table? [14:09:45] ping fdans ^ [14:09:57] joal: yes for sure [14:11:32] hellooo miriam_ [14:11:48] ciao fdans :) [14:13:22] miriam_ mediacounts is an old dataset that tracks webrequests to media files, without taking into account the wiki projects that they were requested from [14:13:50] plus the data is pivoted [14:14:53] _miriam: mediarequests does kind of the same thing but in a denormalised way, plus keeping track of file types and the wikis that the files were requested from [14:22:14] fdans thanks! question: do requests include only users' actively clicking on images, say in Wikipedia pages, or they record everytime an image is loaded, e.g. when a page is viewed? [14:23:34] miriam_: every time an image is requested [14:24:06] ok, so I won't know if a reader is actively clicking on the image just by looking at those tables, right? [14:24:21] it’s just values in webrequest hitting upload.wikimedia.org [14:24:41] miriam_: no but you can use pageviews for that [14:25:11] fdans: yes yes, nice, thanks :) [14:25:25] fdans: you told me you were coming to London in october, right? [14:30:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Decouple analytics zookeeper cluster from kafka zookeeper cluster [2019-2020] - https://phabricator.wikimedia.org/T217057 (10elukey) Last step is to clean up the zookeeper main eqiad cluster from old hadoop zones (~30k, a lot) to complete... [14:34:44] miriam_: in the end only my wife went! 
it was a lil impractical for me to go too for such a short time [14:57:10] started the clean up of znodes on the SRE cluster [14:57:12] https://grafana.wikimedia.org/d/000000261/zookeeper?refresh=5m&panelId=42&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=main-eqiad&var-zookeeper_hosts=All&from=now-1h&to=now [14:57:18] very gently [14:57:34] should remove a ton of znodes related to yarn [15:01:41] 10Analytics, 10Analytics-Kanban, 10Operations, 10SRE-Access-Requests: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (10Nuria) ping #sre-access-requests after verification of employment (grant is our new CTO) his user should be added to the wmf ldap group [15:03:06] 10Analytics, 10Analytics-Kanban, 10Operations, 10SRE-Access-Requests: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (10Reedy) Doesn't this technically need Katherine to sign it off? [15:06:50] hey a-team: could we get an ETA on https://phabricator.wikimedia.org/T234870 ? we'd like to give Legal an update on this [15:08:20] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10Nuria) Given your results I would choose the combined metric. [15:09:03] Nettrom, yes, I can delete the data today. I saw your confirmation on Phab that all data previous to Oct 1st has to be deleted, right? Just to be super-sure. [15:09:33] mforns: awesome! and yes, everything prior to 2019-10-01 can be deleted. Thanks so much! :) [15:09:34] *from event_sanitized.helppanel [15:09:38] ok [15:12:59] mforns: since it is joal ops week you can assign that ticket to joal [15:13:13] nuria, was doing that right now, will take 5 mins [15:13:25] is that ok? [15:15:42] mforns: yayaya [15:15:45] of course [15:15:47] k :] [15:16:56] elukey: fyi presto is set up on analytics1030 [15:17:07] haven't tried to get it working with kerb [15:17:42] ack! 
[15:19:00] Nettrom, I just deleted the data up to (not including) Oct 1st. It will stay for a bit in the trash folder, in case you need to recover something. [15:20:54] 10Analytics, 10Analytics-Kanban: Upgrade matomo to its latest upstream version - https://phabricator.wikimedia.org/T234607 (10elukey) [15:22:04] Nettrom, now I'll proceed to delete the Hive table partitions, the table may be inaccessible for a couple minutes [15:29:06] Nettrom, done, will update the task. Thanks for the ping! [15:29:33] mforns: no problem, thanks so much for getting it done so quickly, very much appreciated! :) [15:29:56] as mentioned in the phab task, we'll work on our end to make sure this isn't an urgent task in the future, apologies for that [15:32:26] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10mforns) I have deleted all data directories and Hive partitions for event_sanitized.helppanel up to Oct 1st 2019 (not included). I checked that... [15:34:10] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10mforns) a:03mforns [15:35:49] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10mforns) @Nuria Should I remove the rest of the useragent metrics from the data quality Oozie pipeline then? [15:35:56] ottomata: if you have time in these days, can you check https://gerrit.wikimedia.org/r/#/c/eventlogging/+/540871/ ? 
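Deleting everything up to, but not including, a cutoff date comes down to a strict less-than comparison on each partition's timestamp. A minimal sketch, assuming partition specs of the form `year=2019/month=9/day=30/hour=23`; that layout and the helper names are assumptions for illustration and are not taken from the refinery code.

```python
from datetime import datetime


def partition_datetime(spec):
    """Parse a 'year=.../month=.../day=.../hour=...' spec into a datetime."""
    parts = dict(item.split("=", 1) for item in spec.split("/"))
    return datetime(
        int(parts["year"]),
        int(parts.get("month", 1)),
        int(parts.get("day", 1)),
        int(parts.get("hour", 0)),
    )


def partitions_before(specs, cutoff):
    """Keep only the partitions strictly older than the cutoff (cutoff excluded)."""
    return [spec for spec in specs if partition_datetime(spec) < cutoff]
```

With a cutoff of `datetime(2019, 10, 1)`, the partition for 30 September hour 23 is selected for deletion and the one for 1 October hour 0 is kept.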
[15:37:10] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (10Milimetric) 05Declined→03Resolved [15:40:49] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10Nuria) @mforns I think they are good to see changes like the recent UA update that changed OS values more than anything. Thus while we alarm on the "overall" metric... [15:42:31] elukey: +1! [15:43:32] ottomata: thanks! Did you see the processor's new error message when validating? [15:43:45] you can see it in deployment prep [15:43:54] it is very verbose [15:44:20] I find it super clear to read (especially for logstash) but not sure if we need something less verbose [15:45:14] that could also help in https://phabricator.wikimedia.org/T116719 (super old task) [15:47:19] CindyCicaleseWMF: can you take a look at comments on https://phabricator.wikimedia.org/T223414 and advice? [15:47:58] nuria: o/ zookeeper swapped, all good! [15:48:23] elukey: WOW, let's make sure to document This sucess! (will do) [15:49:28] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Ottomata) Ok, time for naming bike shed! We need to name both the new 'common' schema repository, as wel... 
[15:49:47] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Ottomata) Ping @Pchelolo @Milimetric @jlinehan about ^ [15:53:35] (03PS7) 10Milimetric: Publish monthly geoeditor numbers [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) [15:53:54] 10Analytics, 10MinervaNeue, 10Performance-Team (Radar), 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (10pmiazga) The problem is that in some scenarios we get an event with malformed URL. The statsv client looks like it's... [15:54:02] (03CR) 10Milimetric: "oh my gosh, I was so sloppy, thank you!" (038 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: 10Milimetric) [15:57:10] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop - https://phabricator.wikimedia.org/T223414 (10CCicalese_WMF) The trends over time are one of the important aspects of the pingback data.... [15:57:18] nuria: done! [15:57:25] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10nettrom_WMF) 05Open→03Resolved @mforns : thanks for taking care of this! I've verified that the table doesn't contain... 
[15:57:27] CindyCicaleseWMF: grasiasssss [15:58:21] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Milimetric) I like a single namespace, especially because having "common" as a root would be too vague.... [15:59:58] nuria: de nada ;-) [16:00:06] CindyCicaleseWMF: jajajaja [16:01:22] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Ottomata) > schemas/event/wikimedia (better name than "mediawiki" for our schemas, in my opinion) Well, k... [16:05:37] 10Analytics, 10Analytics-EventLogging, 10QuickSurveys, 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (10ovasileva) p:05Triage→03Normal [16:28:08] 10Analytics, 10MobileFrontend, 10Readers-Web-Backlog: Having trouble setting up MobileFrontend for development - https://phabricator.wikimedia.org/T226071 (10pmiazga) @Milimetric do you still see this issue? [16:48:44] (03CR) 10Nuria: Publish monthly geoeditor numbers (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: 10Milimetric) [17:09:50] does anyone happen to know the best way for me to get access to Hue? 
[17:11:08] 10Analytics, 10Desktop Improvements, 10Event-Platform, 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (10MBinder_WMF) [17:22:46] fdans: or elukey: any advice on the ^ [17:23:33] Hi jkumalah - I'm the one who should help you this week (we rotate what we call ops weeks) [17:23:48] jkumalah: give me a few minutes and I'll be here and try to help [17:23:55] thanks joal: [17:31:39] https://grafana.wikimedia.org/d/000000261/zookeeper?refresh=5m&panelId=42&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=main-eqiad&var-zookeeper_hosts=All&from=now-6h&to=now [17:31:51] joal: i have a meeting in 1h20 mins, in case you want to talk about spark 2.4 before then [17:31:53] officially out from zookeeper! [17:32:08] Nice elukey! [17:33:28] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Operations, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (10Ottomata) Ping @ema and/or @BBlack. Should I make some patches? [17:35:17] Here I am [17:35:49] jkumalah: do you have a wikitech account giving you shell access? [17:35:59] yes i do [17:36:05] my user name is Jerrie Kumalah [17:36:20] jkumalah: I'd need your shell username please [17:36:30] joal: jkumalah [17:36:45] One could have guessed, but it could have been different :) [17:37:10] :) [17:37:31] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Operations, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) I'm inclined to just use the existent eventgate-analytics backend endpoint for now. We can consider setting up... [17:37:43] jkumalah: you can try, hopefully it should be good [17:40:36] I'm in! [17:41:54] \o/ ! [17:42:08] Sorry for the delay in answering jkumalah :) [17:42:17] joal: thanks again! [17:42:25] * elukey off! 
[17:44:12] (03CR) 10Joal: [V: 03+2] "Let's merge (already pushed to prod)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/542998 (https://phabricator.wikimedia.org/T235448) (owner: 10Joal) [17:45:31] jkumalah: something that i'd like to point out is that we do not recommend running queries on hue, jupyter notebook is a much better venue for that [17:46:09] jkumalah: hue has little capacity and it has crashed a couple of times as of late and we suspect that was due to query execution [17:46:24] jkumalah: jupyter notebooks docs [17:46:38] jkumalah: https://wikitech.wikimedia.org/wiki/SWAP [17:46:43] nuria: thanks for the tip. will explore jupyter notebooks as i continue to get oriented to our tools [17:47:26] jkumalah: syncing with kzeta team would help [17:47:35] nuria: got an overview from Mikhail and will probably use it in a limited capacity for writing up code and iteration before actually running [17:47:54] nuria: thank you [17:49:57] (03CR) 10Joal: "2 things left for me to be happy :)" (033 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: 10Milimetric) [18:17:09] Heya ottomata - Would now be a good moment? [18:38:37] jkumalah: for exploring our databases, you might like DataGrip - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#DataGrip [18:39:31] kzeta: thanks for sharing. Mikhail shared that with me last week so I'm touching base with him later this week to make sure i have it set-up correctly [18:39:54] 10Analytics, 10Analytics-Kanban, 10Operations, 10SRE-Access-Requests: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (10Nuria) @Reedy not sure if you are for real but while "technically" that might be a requirement I vote for proceeding w/o it in this case so as not to have to wait... [18:40:33] trying again: Heya ottomata - Would now be a good moment?
[18:42:12] jkumalah: cool, wanted to make sure you had that :) [18:51:11] joal: ah just swa this [18:51:14] meeting starting [18:51:18] free in 40 mins [18:51:21] (or less maybe) [18:51:28] np ottomata - ping me when ready [18:51:41] ping milimetric coming to meeting? [19:25:11] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop - https://phabricator.wikimedia.org/T223414 (10mforns) @CCicalese_WMF > The trends over time are one of the important aspects of the pin... [19:27:55] (03CR) 10Joal: [C: 04-1] "Comments on start date and indentation - almost there :)" (033 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/543008 (https://phabricator.wikimedia.org/T232671) (owner: 10Srishakatux) [19:31:54] 10Analytics, 10Analytics-Kanban: Add data quality metric: distribution of eventlogging user agents - https://phabricator.wikimedia.org/T234588 (10mforns) @Nuria Ok, makes sense. I guess, this scenario where we want to calculate secondary data quality metrics that help troubleshoot, or we want to first plot so... 
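The entropy-based metrics discussed in these tasks rest on the Shannon entropy of a categorical distribution. A minimal sketch, assuming the input is a mapping from category (for example a user agent family) to event count; the function name is made up and this is not the pipeline's actual query.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

Two equally frequent categories give 1 bit and a single category gives 0, so a sharp change in the value from one day to the next suggests that the shape of the distribution shifted.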
[19:49:48] 10Analytics, 10Core Platform Team, 10Event-Platform, 10WMF-JobQueue, and 2 others: EventBus extension must not send batches that are too large - https://phabricator.wikimedia.org/T232392 (10WDoranWMF) [20:07:04] 10Analytics, 10Better Use Of Data, 10Product-Infrastructure-Team-Backlog, 10Epic, 10Performance-Team (Radar): Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Gilles) [20:07:32] 10Analytics, 10Event-Platform, 10WMF-JobQueue, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: EventBus extension must not send batches that are too large - https://phabricator.wikimedia.org/T232392 (10daniel) [20:10:05] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 4 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Gilles) [20:15:00] (03CR) 10MNeisler: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: 10MNeisler) [20:31:25] 10Analytics, 10Fundraising-Backlog, 10Fundraising Sprint U 2019: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (10DStrine) [20:32:05] (03PS1) 10Joal: [WIP] Add partition pruning to hive queries [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/543210 (https://phabricator.wikimedia.org/T235283) [20:32:07] (03CR) 10Milimetric: Publish monthly geoeditor numbers (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: 10Milimetric) [20:32:11] (03PS8) 10Milimetric: Publish monthly geoeditor numbers [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) [20:32:49] mforns_, nuria - If ou have a minute, can you 
confirm the way I plan to apply partition pruning looks ok to you before I change all the queries? See patch above [20:33:10] (CR) Milimetric: "I'm sad that you weren't happy before, but happy that it's as easy as some xml snippets :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: Milimetric) [20:34:22] OOOF sorry joal [20:34:26] we talked for a long time [20:34:42] (CR) Joal: [C: +1] ":) +1 for me - no +2 as nuria had comments" [analytics/refinery] - https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: Milimetric) [20:34:46] Thanks a lot milimetric :) [20:34:55] ottomata: :) [20:35:01] thank you jo [20:35:09] maybe you had enough talking for the day? [20:35:12] ottomata: -^ [20:35:56] joal: i can talk now [20:36:06] Let's do that :) [20:36:08] batcave? [20:36:25] ya [20:39:15] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (Milimetric) I wouldn't worry too much about how others are using this in their own mediawiki installs. N... [20:39:31] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (Ottomata) @elukey mind if we upgrade to Spark 2.4.4 in the analytics test cluster and do some tests there? [20:40:12] (CR) Joal: "Only 2 files updated to check with @mforns and @nuria that the change is ok, as it'll mostly be copy/paste to other queries (by folder)."
[analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/543210 (https://phabricator.wikimedia.org/T235283) (owner: Joal) [20:42:13] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (Ottomata) Re the 'mediawiki schemas' vs 'wikimedia-schemas', I kind of agree with you, but I think @Pchel... [20:52:01] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 4 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (Ottomata) Ok @Krinkle we talked a bit more after the meeting today. I think we understand how EventLogging... [20:58:07] Analytics, Operations, decommission, ops-eqiad, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Jclark-ctr) a:Cmjohnson→Jclark-ctr [20:58:46] Analytics, DC-Ops, Operations, decommission, ops-eqiad: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (Jclark-ctr) a:Cmjohnson→Jclark-ctr [21:07:31] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (JFishback_WMF) @Ijon I'm working on an explanation to the community for why we feel this is a low risk data set to release in spit... [21:07:43] (PS1) Joal: Update mediawiki-history dumper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543216 (https://phabricator.wikimedia.org/T235269) [21:14:51] (PS2) Srishakatux: Add hive query for wmcs edits [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/543008 (https://phabricator.wikimedia.org/T232671) [21:20:35] (CR) Joal: [C: +1] "LGTM! @mforns can you double check please?"
[analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/543008 (https://phabricator.wikimedia.org/T232671) (owner: Srishakatux) [21:21:20] Analytics, Patch-For-Review: Use spark to split webrequest on tags - https://phabricator.wikimedia.org/T164020 (JAllemandou) [21:22:01] Analytics, Patch-For-Review: Use spark to split webrequest on tags - https://phabricator.wikimedia.org/T164020 (JAllemandou) a:JAllemandou→None [21:26:37] (CR) Nuria: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler) [21:28:28] (CR) Nuria: [C: +1] [WIP] Add partition pruning to hive queries (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/543210 (https://phabricator.wikimedia.org/T235283) (owner: Joal) [21:30:54] (CR) Nuria: Update mediawiki-history dumper (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543216 (https://phabricator.wikimedia.org/T235269) (owner: Joal) [21:40:03] Gone for tonight team - see you tomorrow [21:56:03] Analytics, Analytics-EventLogging, Operations, decommission, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul) [21:56:17] Analytics, Analytics-EventLogging, Operations, decommission, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul) Open→Resolved Complete [22:06:42] musikanimal: yt? [22:49:08] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Ijon) No, I don't think that's what happened at all. This has been held back for several years now (I have been requesting this s...
[23:17:47] Analytics, Growth-Team, GrowthExperiments, Product-Analytics: Homepage: ensure data retention is in line with the guideline exception - https://phabricator.wikimedia.org/T235577 (nettrom_WMF) [23:18:46] Analytics, Growth-Team, GrowthExperiments, Product-Analytics: Homepage: ensure data retention is in line with the guideline exception - https://phabricator.wikimedia.org/T235577 (nettrom_WMF) a:nettrom_WMF→None