[01:37:03] Analytics, Event-Platform, Growth-Team, MediaWiki-Revision-backend, and 7 others: Replace LinksUpdate Revision methods with RevisionRecord - https://phabricator.wikimedia.org/T249397 (DannyS712)
[01:37:10] Analytics, Event-Platform, Growth-Team, MediaWiki-Revision-backend, and 7 others: Replace LinksUpdate Revision methods with RevisionRecord - https://phabricator.wikimedia.org/T249397 (DannyS712) Open→Resolved
[06:32:45] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Run a script to check REFINE_FAILED flags daily - https://phabricator.wikimedia.org/T240230 (elukey) All jobs deployed, added documentation in https://wikitech.wikimedia.org/wiki/Analytics/Ops_week#Refine_failure_report
[06:32:47] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Run a script to check REFINE_FAILED flags daily - https://phabricator.wikimedia.org/T240230 (elukey)
[06:32:49] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Run a script to check REFINE_FAILED flags daily - https://phabricator.wikimedia.org/T240230 (elukey)
[06:57:25] (PS1) Gilles: Retain FirstInputTiming and NavigationTiming hardwareConcurrency [analytics/refinery] - https://gerrit.wikimedia.org/r/587914 (https://phabricator.wikimedia.org/T238091)
[07:38:39] !log move hdfs-cleaner from an-coord1001 to an-launcher1001
[07:38:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:44:49] PROBLEM - Check the last execution of hdfs-cleaner on an-coord1001 is CRITICAL: NRPE: Command check_check_hdfs-cleaner_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:45:27] this is me --^
[07:46:48] Analytics, Analytics-Kanban, Patch-For-Review: Move systemd timer from an-coord1001 to an-launcher1001 - https://phabricator.wikimedia.org/T249593 (elukey)
[08:08:35] !log move project_namespace_map from an-coord1001 to an-launcher1001
[08:08:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:10:11] PROBLEM - Check the last execution of refinery-download-project-namespace-map on an-coord1001 is CRITICAL: NRPE: Command check_check_refinery-download-project-namespace-map_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:14:52] me again
[08:15:15] Analytics, Analytics-Kanban, Patch-For-Review: Move systemd timer from an-coord1001 to an-launcher1001 - https://phabricator.wikimedia.org/T249593 (elukey)
[08:22:55] brb
[08:49:26] PROBLEM - Check the last execution of refinery-sqoop-mediawiki-private on an-coord1001 is CRITICAL: NRPE: Command check_check_refinery-sqoop-mediawiki-private_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:55:06] Analytics, Analytics-Kanban, Patch-For-Review: Move systemd timer from an-coord1001 to an-launcher1001 - https://phabricator.wikimedia.org/T249593 (elukey)
[08:57:10] PROBLEM - Check the last execution of refinery-sqoop-whole-mediawiki on an-coord1001 is CRITICAL: NRPE: Command check_check_refinery-sqoop-whole-mediawiki_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:11:22] !log move druid_load jobs from an-coord1001 to an-launcher1001
[09:11:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:12:32] Analytics, Analytics-Kanban: Move systemd timer from an-coord1001 to an-launcher1001 - https://phabricator.wikimedia.org/T249593 (elukey)
[10:43:08] * elukey lunch!
[10:56:10] Thanks a lot Andrew and Luca for figuring out the camus-flagger stuff
[11:42:41] (CR) Joal: [V: +2 C: +2] "> Since this is merged, please make note of it in the train page: https://etherpad.wikimedia.org/p/analytics-weekly-train" [analytics/refinery] - https://gerrit.wikimedia.org/r/586432 (https://phabricator.wikimedia.org/T243090) (owner: Mforns)
[11:48:48] joal: that was Andrew's magic, I didn't even think about it :)
[11:55:38] joal: https://www.youtube.com/watch?v=T5RmMNDR5XI, really interesting (David gave it to me)
[11:56:30] Nice elukey :)
[11:57:42] All these French people interested in text search
[11:57:45] :D
[11:57:50] hihihi :D
[11:58:07] jokes aside, ES/Lucene is really interesting, I'd love to know more
[11:58:45] elukey: I have already done some of that exploration myself (long ago), but please keep me posted and updated ;)
[11:59:26] joal: I was sure you did :D
[12:09:16] joal: when you have a moment I have an idea about Kafka, the one that I was chatting during the last ops sync
[12:09:51] brb
[12:11:52] elukey: ping when you're back, we can talk :)
[12:13:56] ah yes I am!
[12:14:10] heya :)
[12:14:17] very quick - I saw in https://docs.confluent.io/current/kafka/authorization.html that we can use something like --allow-host for various ACLs
[12:15:26] we could use it for an-launcher1001 (where camus is), using its ipv4/ipv6 addresses for webrequest
[12:15:54] the other host that use webrequest is weblog1001, for SRE, but I don't know any other use cases
[12:16:01] does it sound legit?
[12:16:06] I'd test it first
[12:16:17] it is just a temporary measure before kerberos
[12:16:24] to limit a bit Kafka
[12:16:27] It does elukey - My understanding is that the allow-host is related to an ACL, therefore username etc
[12:16:44] Can we implement that as well (will camus support?)
[12:17:17] yeah we can use User:ANONYMOUS
[12:17:24] ah ok
[12:17:46] Ideally Camus should eventually support SASL + TLS + Kafka 0.9+
[12:17:54] ok
[12:18:02] the test that we should do it to try to upgrade the kafka client
[12:18:14] gobblin did something similar IIUC https://gobblin.readthedocs.io/en/latest/developer-guide/HighLevelConsumer/
[12:18:31] ack
[12:19:13] all right if it makes sense I'll do some tests, thanks :)
[13:13:59] be back in a bit!
[14:29:03] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (lexnasser) Just wanted to write everything I figured out in the past 3 days. I would love your feedback! If you don't want to read several paragraphs...
[14:31:04] (CR) Nuria: [C: +2] Retain FirstInputTiming and NavigationTiming hardwareConcurrency [analytics/refinery] - https://gerrit.wikimedia.org/r/587914 (https://phabricator.wikimedia.org/T238091) (owner: Gilles)
[14:31:06] (CR) Nuria: [V: +2 C: +2] Retain FirstInputTiming and NavigationTiming hardwareConcurrency [analytics/refinery] - https://gerrit.wikimedia.org/r/587914 (https://phabricator.wikimedia.org/T238091) (owner: Gilles)
[14:45:19] lexnasser: nice work on your ticket, it will take me a bit to read it all
[14:46:54] nuria: ha, thanks!
[15:14:59] Analytics: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (Nuria)
[15:15:51] Analytics, Analytics-Kanban: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (Nuria) a:elukey
[15:16:57] lexnasser: here I am
[15:17:08] so I'd ask you to go https://superset.wikimedia.org/superset/sqllab
[15:17:26] you can see some dropdowns at the top left corner
[15:17:55] the sources dropdown?
[15:18:12] correct, so as database please select "presto_analytics_hive"
[15:18:16] then schema "event"
[15:18:23] and table one that you want
[15:18:24] random
[15:18:39] in theory you should see after a bit "unknown error"
[15:18:56] and possibly basic auth repeated multiple time
[15:20:02] how do I select it, I just see "show records"?
[15:21:01] mmm in https://superset.wikimedia.org/superset/sqllab ?
[15:21:28] elukey: ah, I see now
[15:21:34] super :)
[15:22:10] yeah I see unknown error
[15:22:52] ok that confirms my theory then
[15:23:45] lexnasser: can you retry?
[15:24:40] still unknown error
[15:26:47] lexnasser: another time if you have patience (I am setting some stuff on the fly to see if anything changes)
[15:27:40] unknown error still
[15:27:53] sigh thanks
[15:28:37] I'm just chilling watching some YouTube, ping me whenever no problem
[15:29:44] lexnasser: thanks a lot
[15:30:36] elukey: if you want to do the thaing, I'm finished with the meeting
[15:30:58] mforns: sure so I'd merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/587975/ first
[15:31:07] to absent timers on an-coord1001
[15:31:11] lookin
[15:31:13] then I'll add it to an-launcher1001
[15:31:21] should work out of the box right?
[15:33:50] yes! :D theoretically :S
[15:33:57] lovely
[15:37:05] Analytics, Cloud-Services, Developer-Advocacy (Jan-Mar 2020), Goal, Patch-For-Review: Create a WMCS edits dashboard via Dashiki - https://phabricator.wikimedia.org/T226663 (Aklapper)
[15:38:53] mforns: and last step https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/588010/
[15:39:05] lookin
[15:40:12] I +1ed but mostly because I didn't see any typos
[15:41:49] thanks! when jenkins is pleased I'll merge
[15:42:59] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (Milimetric) Great explanation, thank you, every time someone says "unicode" I usually have to read for an hour to remember all the tricks. So, my tho...
[15:45:12] mforns: all done!
[15:45:22] migrated?
[15:45:26] yep!
[15:45:38] !log migrate data_purge timers from an-coord1001 to an-launcher1001
[15:45:39] no fireworks?
[15:45:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:45:50] mforns: well I haven't execute any of them yet :D
[15:46:00] can we run the sanitization ones to see if they work?
[15:46:08] sure!
[15:46:50] IIRC saltrotate only runs at midnight
[15:47:06] the sanitization jobs will just look at the salts in hdfs
[15:47:12] that are currently there
[15:47:29] mforns: can we anticipate the run of saltrotate?
[15:48:16] elukey: I think we can
[15:48:24] no salts should be rotated
[15:48:33] the same salts will be re-synced to hdfs
[15:49:06] mforns: to be sure we can just copy from hdfs to the host in say your home, run and compare after
[15:49:21] elukey: makes sense, doing now
[15:50:50] elukey: I know nothing about superset at all really, but in my profile, my email is listed as "lexnasser@email.notfound", maybe this might be an issue? Not sure at all
[15:51:18] lexnasser: nono that should be fine, it is auto-create when you log in..
[15:51:32] it works for all members of analytics-admins
[15:51:36] that are on the host
[15:51:44] where presto run
[15:51:51] so I tried to manually add you but nothing :(
[15:52:02] the chain is:
[15:52:11] Superset LDAP Auth (via basic auth)
[15:52:50] then the user (in your case, lexnasser) will be proxied by superset (authenticated via kerberos as daemon) to presto on an-coord1001 (another daemon authenticated via kerberos)
[15:53:00] superset is configured so it can query on your behalf
[15:53:23] elukey: done, the current salts are in an-launcher1001 under /home/mforns/salts/pre_migration
[15:53:25] so in theory the only error should be if you weren't in analytics-privatedata or similar
[15:53:38] mforns: super, let's restart then
[15:54:01] lexnasser: but in practice it works only for analytics-admins users :D
[15:54:39] I tried to check presto logs but they are a mess
[15:57:49] elukey:thanks for the explanation!
[15:58:25] lexnasser: if you think about anything let me know :)
[15:59:10] (CR) Mforns: "Left some comments, please ping me if you need clarifications!" (6 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/576618 (https://phabricator.wikimedia.org/T244597) (owner: Lex Nasser)
[16:01:13] mforns: Thanks for the cr! I'll let you know if I have any questions
[16:01:29] lexnasser: ok!
[16:01:50] lexnasser: Have you tested an oozie job before?
[16:02:26] mforns: do you want me to restart?
[16:02:31] if so, what unit?
[16:03:09] mforns: I tested the one you just reviewed, with the exception of the datasets and _SUCCESS and all that, couldn't figure out how to test that
[16:03:12] elukey: refinery-eventlogging-saltrotate?
[16:03:56] lexnasser: if you want, we can pair on testing it
[16:04:00] mforns: done!
[16:04:07] O.o!
[16:04:23] mforns: Sure thing, that would be great! I'll let you know when I get there
[16:05:02] mforns: shas are ok!
[16:05:14] (of files I mean, pre migration and current one under /etc)
[16:05:16] lexnasser: great
[16:05:16] \o/
[16:05:22] elukey: cool!
[16:07:11] elukey: I also checked that the salts in HDFS are the expected ones
[16:07:20] really nice
[16:07:52] elukey: wanna force someting else?
[16:10:27] Analytics, Cloud-Services, Developer-Advocacy (Jan-Mar 2020), Goal: Further improvements to the WMCS edits dashboard - https://phabricator.wikimedia.org/T240040 (Aklapper)
[16:13:24] elukey: the alerts about refinery-sqoop-mediawiki-private, refinery-download-project-namespace-map and hdfs-cleaner are because of the migration too?
[16:17:43] mforns: yes yes
[16:17:50] oh ok, thx
[16:17:54] mforns: about force, I think we are ok
[16:18:01] all right then :]
[16:31:40] !log enable TLS from kafkatee to Kafka on analytics1030 (test instance)
[16:31:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:32:39] elukey: not urgent. whenever you have a minute, could you +2 this patch putting my user on the ci whitelist? thanks https://gerrit.wikimedia.org/r/#/c/integration/config/+/588014/
[16:34:56] Gone for diner, will be back after
[16:35:11] Analytics, Analytics-Kanban, Analytics-SWAP, Product-Analytics: pip not accessible in new SWAP virtual environments - https://phabricator.wikimedia.org/T247752 (nshahquinn-wmf) Resolved→Open Actually, this is still happening. I just tested in the Jupyter terminals on stat1004 and stat1005...
[16:37:04] Analytics, Growth-Team, Product-Analytics: Growth: validate that data is purged after 270 days - https://phabricator.wikimedia.org/T249666 (LGoto) p:Triage→Medium
[16:40:54] hey a-team: we've found data missing from EventBus streams (e.g. mediawiki_revision_create) starting some time after 20:00 UTC on 2019-09-23, and ending some time after 18:00 UTC on 2019-09-29. Is this a known outage already with a phab task?
[16:43:36] Analytics, Product-Analytics: Can't publish my draft dashboard on superset - https://phabricator.wikimedia.org/T248904 (kzimmerman) Thank you for filing this, @Esanders, and for digging in, @Milimetric! A few of us encountered the same issue and hadn't thought to file it :facepalm:
[16:44:59] Nettrom: can't remember from the back of my head
[16:45:33] checking in https://tools.wmflabs.org/sal/analytics?p=0&q=&d=2019-09-29
[16:46:10] Nettrom: see https://tools.wmflabs.org/sal/log/AW1f6UWsEHTBTPG-WrJ_
[16:46:39] now I recall, there was a problem with big volume of data and camus wasn't able to catch up, so we had to create another instance of it
[16:47:08] so it is probably related
[16:47:34] yeah, I found https://phabricator.wikimedia.org/T233718 was mentioned the next day, looking at that now
[16:48:47] Analytics, Analytics-Kanban: Move systemd timer from an-coord1001 to an-launcher1001 - https://phabricator.wikimedia.org/T249593 (elukey)
[16:49:01] elukey: thanks for digging into this, I'll read up on this later
[16:49:49] ack!
[16:49:55] I am logging off! o/
[16:57:31] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (Nuria) I disagree with @Milimetric, I think #3 would be a performance drag for code that might be executed 6000+ times a second, plus as a philosophi...
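The saltrotate check earlier in the log (15:49 to 16:07) copied the salts to /home/mforns/salts/pre_migration before re-running refinery-eventlogging-saltrotate on an-launcher1001, then compared checksums with the files under /etc. The log does not say which tool produced the "shas", so this is only a minimal sketch of that comparison, assuming SHA-256 via Python's hashlib and flat directories of salt files; `sha256_of_dir` and `compare_dirs` are hypothetical helper names.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict


def sha256_of_dir(directory: Path) -> Dict[str, str]:
    """Map each regular file name in `directory` to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(directory.iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_dirs(before: Path, after: Path) -> Dict[str, str]:
    """Return a per-file verdict: 'same', 'changed', 'missing' (only before) or 'new' (only after)."""
    old, new = sha256_of_dir(before), sha256_of_dir(after)
    verdict = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            verdict[name] = "missing"
        elif name not in old:
            verdict[name] = "new"
        else:
            verdict[name] = "same" if old[name] == new[name] else "changed"
    return verdict


if __name__ == "__main__":
    # Usage: python compare_salts.py <pre_migration_dir> <current_dir>
    if len(sys.argv) == 3:
        for name, state in compare_dirs(Path(sys.argv[1]), Path(sys.argv[2])).items():
            print(f"{state:8} {name}")
```

Since "no salts should be rotated" on an early run, every file would be expected to report `same`; the exact /etc directory is not named in the log, so it is passed as an argument rather than hardcoded.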
[17:36:03] Analytics, Analytics-Kanban, Product-Analytics: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (kzimmerman)
[18:27:19] (PS1) Mforns: Make RSVDAnomalyDetection ignore too short timeseries [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759)
[18:27:34] Analytics, Analytics-Data-Quality, Analytics-EventLogging, WikiEditor, Mobile: WikiEditor records all edits as desktop edits in EventLogging - https://phabricator.wikimedia.org/T249944 (kaldari)
[18:29:01] (CR) Mforns: [V: +2] "Tested on the cluster :]" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
[18:29:52] (CR) Mforns: [V: +2] "I tested this on the cluster together with the companion refinery-source change (https://gerrit.wikimedia.org/r/#/c/analytics/refinery/sou" [analytics/refinery] - https://gerrit.wikimedia.org/r/587844 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
[18:31:40] (CR) Mforns: [V: +2] Add traffic entropy data quality stats in hourly resolution (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/587844 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
[18:37:22] Analytics: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (Aklapper) Assuming this is about #analytics, feel free to correct
[18:47:43] Analytics: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (dr0ptp4kt) Thanks @Aklapper, that's right!
[18:49:05] Analytics: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (dr0ptp4kt) WIP at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/587984 . I think CI will post that here shortly, too.
[19:10:08] (CR) Nuria: [C: -1] Make RSVDAnomalyDetection ignore too short timeseries (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
[19:28:53] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (awight) Thank you @lexnasser for the great write-up! I'm pinning as my new favorite overview of the crazy things that can happen with string encoding...
[19:30:55] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (Milimetric) Good point Nuria, performance matters a lot. But JRegex might be fast, we should profile it. I think my point about keeping it a whiteli...
[19:34:37] (PS2) Mforns: Make RSVDAnomalyDetection ignore too short timeseries [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759)
[19:35:00] (CR) Mforns: Make RSVDAnomalyDetection ignore too short timeseries (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
[19:35:07] (CR) Mforns: [V: +2] Make RSVDAnomalyDetection ignore too short timeseries [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
[20:10:50] Analytics, Analytics-Kanban, Pageviews-API: Pageviews missing for titles with emojis since April 23, 2019 - https://phabricator.wikimedia.org/T245468 (lexnasser) Thanks everyone for your input! I'm a bit busy right now, but I'll be sure to address each of your points later today. Just for context, o...
[21:11:12] Analytics, Analytics-Kanban, Product-Analytics: Users having issues with presto dashboards on superset - https://phabricator.wikimedia.org/T249923 (Nuria)
[21:14:27] (CR) Nuria: [C: +2] Make RSVDAnomalyDetection ignore too short timeseries [analytics/refinery/source] - https://gerrit.wikimedia.org/r/588027 (https://phabricator.wikimedia.org/T249759) (owner: Mforns)
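The --allow-host idea raised at 12:14 (restrict webrequest consumers by source address, using User:ANONYMOUS until Kerberos is in place) can be sketched as a hypothetical Python helper that only builds a kafka-acls argument list. The topic name, ZooKeeper address and IPs (documentation ranges) are placeholders, not the real an-launcher1001 or weblog1001 values, and the script is named kafka-acls in Confluent packaging but kafka-acls.sh in Apache Kafka. This assumes the brokers run an ACL authorizer with allow.everyone.if.no.acl.found=true, so once a topic carries an ACL, anonymous connections matching none of its rules are denied; that would apply to producers too, which would need their own Write ACL.

```python
import shlex
from typing import List, Sequence


def build_acl_command(
    topic: str,
    allowed_hosts: Sequence[str],
    zookeeper_connect: str,
    principal: str = "User:ANONYMOUS",
) -> List[str]:
    """Build a kafka-acls argv letting `principal` read `topic` only from `allowed_hosts`."""
    if not allowed_hosts:
        raise ValueError("at least one allowed host is required")
    argv = [
        "kafka-acls",  # assumption: Confluent name; Apache Kafka ships kafka-acls.sh
        "--authorizer-properties", f"zookeeper.connect={zookeeper_connect}",
        "--add",
        "--allow-principal", principal,
    ]
    # --allow-host can be repeated: list every IPv4 and IPv6 address of each client host.
    for host in allowed_hosts:
        argv += ["--allow-host", host]
    # Read on the topic only; a consumer using group management would
    # also need a separate --group ACL, which is not built here.
    argv += ["--operation", "Read", "--topic", topic]
    return argv


if __name__ == "__main__":
    # Placeholder values: hypothetical topic name and documentation-range IPs.
    cmd = build_acl_command(
        topic="webrequest_text",
        allowed_hosts=["192.0.2.10", "2001:db8::10"],
        zookeeper_connect="zookeeper.example.org:2181",
    )
    print(shlex.join(cmd))
```

Building the argv instead of running it keeps the sketch side-effect free, so the command can be reviewed (and tested first, as suggested at 12:16) before anything is applied to a cluster.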