[07:11:37] good morning! [07:21:23] Hello [07:26:19] !log execute 'sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/mediawiki' on launcher to fix dir perms [07:26:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:26:50] elukey: still an issue? [07:27:48] joal: bonjour! Only the parent dir, the timer wasn't failing since it thought data was not there yet (but the snapshot for Nov is already there) [07:27:59] Ah [07:28:04] makes sense [07:28:08] Thanks for that :) [07:28:20] The rest seems ok, isn't it? [07:30:06] joal: yep yep I am checking the output of the rsyncs, I had to chown a couple of parent dir on saturday but nothing more [07:30:09] no errors so far [07:30:17] \o/ [07:32:13] yep I think we can call it done [07:34:58] elukey: IMO we can call operations fixing the stuff done - We should review perms and ownership of archive and possibly other prod folders [07:36:47] joal: I'd like to follow up on why druid indexations didn't alarm, but it is not burning.. Also the /tmp stuff is still todo, I'll try to ask some prioritization for that during the standup [07:37:25] I'll also follow up with discovery and product analytics to ensure that their deploy scripts are updated [07:38:25] ack elukey - Let's make sure the /tmp problem is described as being an issue for both spark and hive data (see hdfs dfs -ls /wmf/data/archive/pageview/legacy/hourly/2021/2021-01 ownership [07:39:35] yep sure, I created one task for DataFrameToDruid but I can add a note [07:39:40] <3 [08:04:42] 10Analytics, 10Datasets-Archiving, 10Datasets-General-or-Unknown, 10Internet-Archive: Mediacounts dumps missing since January 4, 2021 - https://phabricator.wikimedia.org/T271511 (10elukey) 05Open→03Resolved a:03elukey This should be solved now, please re-open if anything is missing. Thanks for the re... 
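(Editor's note) The permission review suggested above ("review perms and ownership of archive") could be scripted roughly as follows. This is a minimal sketch, not the team's tooling: it assumes the usual `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path), and the sample listing, the `example_*` paths, and the function names are hypothetical placeholders. It flags entries whose "other" bits lack read access (and, for directories, traverse access), which is the condition the `chmod o+rx` fix addressed.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    perms: str  # e.g. "drwxr-x---"
    owner: str
    group: str
    path: str


def parse_ls_line(line: str) -> Optional[Entry]:
    """Parse one line of `hdfs dfs -ls` output; return None for non-entry lines."""
    parts = line.strip().split(None, 7)
    if len(parts) < 8 or len(parts[0]) < 10:
        return None  # e.g. the "Found N items" header
    return Entry(perms=parts[0], owner=parts[2], group=parts[3], path=parts[7])


def lacks_world_read(entry: Entry) -> bool:
    """True if 'other' cannot read the entry (or cannot traverse a directory)."""
    other = entry.perms[7:10]
    if other[0] != "r":
        return True
    # Directories also need the execute (or sticky 't') bit to be listable.
    return entry.perms[0] == "d" and other[2] not in ("x", "t")


def find_unreadable(listing: str) -> List[str]:
    """Return the paths in a listing that are not world-readable."""
    entries = (parse_ls_line(line) for line in listing.splitlines())
    return [e.path for e in entries if e is not None and lacks_world_read(e)]


# Hypothetical listing for illustration only; not real cluster output.
SAMPLE_LISTING = """\
Found 3 items
drwxr-x---   - analytics hadoop          0 2021-01-09 10:00 /wmf/data/archive/example_a
drwxr-xr-x   - analytics hadoop          0 2021-01-09 10:00 /wmf/data/archive/example_b
-rw-r-----   3 hdfs      hadoop       1024 2021-01-09 10:00 /wmf/data/archive/example_c.gz
"""

if __name__ == "__main__":
    for path in find_unreadable(SAMPLE_LISTING):
        print(path)
```

The listing would in practice be captured from the real command run under Kerberos; extending the check to compare `owner` against an expected user would cover the ownership side of the review.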
[08:08:40] (03CR) 10Elukey: "Neil: After T270629 the default permission scheme is to create files without world-readable (other) bits set. In the Analytics use case, o" [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) (owner: 10Neil P. Quinn-WMF) [08:24:09] 10Analytics, 10SRE-tools, 10IPv6, 10User-crusnov: Some Analytics clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271133 (10elukey) The Druid nodes are part of different clusters already using ipv6, so we can add the records anytime (we'll also need to update some rules in puppe... [08:28:35] 10Analytics, 10Analytics-Kanban, 10Dumps-Generation, 10Wikimedia-Portals, 10cloud-services-team (Kanban): dumps.wikimedia.org/other/pageviews/ appears to be stalled at 20210106-140000 - https://phabricator.wikimedia.org/T271561 (10JAllemandou) 05duplicate→03Resolved This should be solved now. Please reo... [08:30:53] 10Analytics: dumps::web::fetches::stats job should use a user to pull from HDFS that exists in Hadoop cluster - https://phabricator.wikimedia.org/T271362 (10elukey) p:05High→03Medium @Ottomata the remaining step to do would be to address the issue that you outlined in the task description, namely adding dums... [09:00:32] joal: is the analytics:hdfs ownership issue specific to some hive action? I am writing the task and trying to figure out how to best describe it [09:01:11] or is it that in refinery we leverage /tmp for some jobs? [09:01:26] like pageview legacy [09:01:50] elukey: Nah - the ownership issue is due to using a non-controlled /tmp folder, therefore ending up with the files owned by the parent-folder owner [09:02:34] joal: sure that part is understood, but why does it happen only for some jobs? [09:02:52] I mean, when do we leverage the non-controlled /tmp? 
[09:02:55] It happens on all jobs using archive I think [09:03:41] ah okok [09:09:39] elukey: and all jobs using /tmp not managed by oozie [09:11:49] we use /tmp/etc.. extensively in workflows afaics in refinery, I am trying to see if we have weird ownership elsewhere [09:12:23] ack elukey - If jobs use /tmp for data that is deleted after (not using archive) it should be ok [09:13:23] yeah I think so, hopefully we have only a few use cases to fix [09:13:50] I am going to send an email later on to internal with the things to do [09:14:08] we should prioritize the work in my opinion, just to avoid surprises later on [09:14:28] I fully agree elukey [09:43:39] 10Analytics, 10SRE-tools, 10IPv6, 10User-crusnov: Some Analytics clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271133 (10Volans) >>! In T271133#6735120, @elukey wrote: > What is the correct procedure to fix the records? Add them in netbox and then run the cookbook? Yes, I've... [09:45:59] 10Analytics: Review /tmp usage when using hive in oozie workflows - https://phabricator.wikimedia.org/T271687 (10elukey) [10:39:58] Thanks a lot elukey for the email summary on hdfs-perms :) [10:40:15] <3 [10:53:59] 10Analytics, 10SRE-tools, 10IPv6, 10User-crusnov: Some Analytics clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271133 (10elukey) @crusnov I checked all the druid nodes and they have the AAAA records, and the notebook nodes have been repurposed, so I think that we are good fro... [10:56:02] Wow - https://openartbrowser.org/ [10:56:08] * joal loves wikidata [11:03:57] really nice [11:35:57] * elukey lunch! [13:34:21] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Performance-Team: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Gilles) We would like to retain IP and/or geocoded data in the NavigationTiming schema. The other schemas listed in the tas... [13:44:19] (03CR) 10Neil P. 
Quinn-WMF: "> Patch Set 2:" [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) (owner: 10Neil P. Quinn-WMF) [13:46:03] (03CR) 10Neil P. Quinn-WMF: "> Patch Set 2:" [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) (owner: 10Neil P. Quinn-WMF) [13:46:13] (03CR) 10Elukey: "> Patch Set 2:" [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) (owner: 10Neil P. Quinn-WMF) [14:04:01] elukey: happy Monday :D three for when you have a bit of free time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/655073 https://gerrit.wikimedia.org/r/c/operations/puppet/+/655065 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/655193 [14:21:14] razzi: can review these too! added him as reviewer :) [14:26:14] that'd be amazing [14:26:51] 10Analytics, 10Beta-Cluster-Infrastructure, 10Event-Platform: [Cloud VPS alert] Puppet failure on deployment-schema-2.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T271508 (10Ottomata) Edited horizon hiera for this node to say port 80 instead of '80'. [14:31:23] 10Analytics: dumps::web::fetches::stats job should use a user to pull from HDFS that exists in Hadoop cluster - https://phabricator.wikimedia.org/T271362 (10Ottomata) I think adding back readable perms to public datasets is a better solution. I think this is done! [14:33:31] (03PS3) 10Neil P. Quinn-WMF: Set up and document deployment strategy for jobs [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) [14:34:21] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Performance-Team: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Ottomata) Deprecate means we won't migrate them, and they will eventually stop collecting data. 
If you want to keep these... [14:35:54] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Performance-Team: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Ottomata) > We would like to retain IP and/or geocoded data in the NavigationTiming schema. The other schemas listed in the... [14:36:17] 10Analytics, 10Beta-Cluster-Infrastructure, 10Event-Platform: [Cloud VPS alert] Puppet failure on deployment-schema-2.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T271508 (10Ottomata) 05Open→03Resolved a:03Ottomata [14:58:26] superset 1.0 is under voting! [14:59:57] \o/ [15:00:14] Gone for kids team - see you at standup [15:09:51] (03PS4) 10Neil P. Quinn-WMF: Set up and document deployment strategy for jobs [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/651794 (https://phabricator.wikimedia.org/T261953) [15:33:02] holaaa teamm [15:39:21] 10Analytics, 10Event-Platform, 10MediaWiki-Core-Testing, 10Patch-For-Review: Integration tests die with "HTTP requests to blocked. Use MockHttpTrait." when EventBus extension is installed - https://phabricator.wikimedia.org/T270801 (10Mholloway) It looks like most if not all of the remaining offender... [15:41:28] 10Analytics, 10Anti-Harassment, 10Event-Platform, 10MW-1.36-notes (1.36.0-wmf.22; 2020-12-15), 10Patch-For-Review: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 (10Ottomata) [15:42:15] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (10Ottomata) [15:51:52] 10Analytics, 10Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10Milimetric) Thanks @Isaac, hi @bmansurov!! Actually, this should be an oozie job. 
They're a bit more of a pain to write, but I can help with that. The major... [16:00:51] 10Analytics, 10Event-Platform, 10Fundraising-Backlog: CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (10Ottomata) > @AndyRussG, FYI we plan to perform this migration next week (week of Jan 11th), unless you have objections. We will... [16:01:38] ping ottomata [16:01:43] ack coming [16:05:02] 10Analytics, 10Event-Platform, 10Language-analytics: UniversalLanguageSelector Event Platform Migration - https://phabricator.wikimedia.org/T267352 (10Ottomata) Hello! @nshahquinn-wmf would you be ok with me migrating this schema this week? We try to give folks at least a weeks notice but another schema I... [16:36:59] razzi: ops grooming? [16:37:01] razzi: we are https://meet.google.com/qdx-cxwy-feo [16:37:55] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Infrastructure-Data, 10Product-Infrastructure-Team-Backlog: Develop test environment solution for MEP analytics events - https://phabricator.wikimedia.org/T238837 (10Ottomata) [16:38:06] 10Analytics, 10Event-Platform, 10MediaWiki-Vagrant: Vagrant's /var/log/daemon.log filling up with kafka errors - https://phabricator.wikimedia.org/T187102 (10Ottomata) 05Open→03Resolved Vagrant no longer installs kafka by default with event* roles. 
[16:52:54] 10Analytics: Remove support for the (deprecated) Druid datasources (in favor of Druid Tables) on Superset - https://phabricator.wikimedia.org/T263972 (10Ottomata) [16:52:56] 10Analytics, 10Analytics-Kanban: Analytics Ops Technical Debt - https://phabricator.wikimedia.org/T240437 (10Ottomata) [16:52:58] 10Analytics: Remove support for the (deprecated) Druid datasources (in favor of Druid Tables) on Superset - https://phabricator.wikimedia.org/T263972 (10Ottomata) p:05Triage→03Medium [16:53:32] 10Analytics, 10Analytics-Kanban: AQS pageview default caching is one day - https://phabricator.wikimedia.org/T268809 (10Ottomata) [16:53:51] 10Analytics: dumps::web::fetches::stats job should use a user to pull from HDFS that exists in Hadoop cluster - https://phabricator.wikimedia.org/T271362 (10fdans) 05Open→03Resolved a:03fdans [16:56:21] 10Analytics, 10Analytics-Kanban: Alter table for navigation timing errors out in Hadoop test - https://phabricator.wikimedia.org/T268733 (10Ottomata) a:03elukey [16:58:16] 10Analytics, 10User-Elukey: Add python urllib_kerberos package to analytics clients - https://phabricator.wikimedia.org/T240891 (10Ottomata) p:05High→03Low [16:58:41] 10Analytics, 10Analytics-Kanban, 10Anti-Harassment, 10Event-Platform, and 2 others: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 (10fdans) [16:59:13] 10Analytics: Create a kibana dashboard for AQS hyperswitch's logs - https://phabricator.wikimedia.org/T262012 (10Ottomata) 05Open→03Declined [16:59:49] 10Analytics, 10Analytics-Kanban: Replace Camus by Gobblin - https://phabricator.wikimedia.org/T271232 (10fdans) p:05Triage→03High [17:00:38] 10Analytics: Gather all data-purge into a single job - https://phabricator.wikimedia.org/T262201 (10Ottomata) This sounds pretty big, and I think is related to our desire to refactor our sanitization pipeline and processes. 
I'm moving this back into incoming and we can groom this as a non ops-excellence task. [17:01:53] 10Analytics: Update geocode UDF to NOT lookup some addresses - https://phabricator.wikimedia.org/T271340 (10fdans) p:05Triage→03Medium [17:05:09] 10Analytics, 10Product-Analytics, 10Epic: Replace Oozie with better workflow scheduler - https://phabricator.wikimedia.org/T271429 (10fdans) p:05Triage→03High [17:10:20] 10Analytics: Add Spark metrics - https://phabricator.wikimedia.org/T271513 (10fdans) p:05Triage→03Medium [17:15:39] 10Analytics: Follow up on /tmp/DataFrameToDruid permissions after umask change - https://phabricator.wikimedia.org/T271558 (10fdans) p:05Triage→03High a:03Milimetric [17:17:00] 10Analytics: Follow up on /tmp/analytics permissions after umask change on HDFS - https://phabricator.wikimedia.org/T271560 (10fdans) p:05Triage→03High [17:17:28] 10Analytics: Follow up on Druid alarms not firing when Druid indexations were failing due to permission issues - https://phabricator.wikimedia.org/T271568 (10fdans) p:05Triage→03High [17:19:30] 10Analytics, 10Product-Analytics: Update Image usage metric - https://phabricator.wikimedia.org/T271571 (10fdans) p:05Triage→03High [17:20:07] 10Analytics: Review /tmp usage when using hive in oozie workflows - https://phabricator.wikimedia.org/T271687 (10fdans) p:05Triage→03High [18:22:34] elukey: Thank you so much. Sorry for bothering a lot [18:23:06] Amir1: no please don't say that, sorry for the lag, you are doing a ton of work for us! [18:23:54] I wish I could do more :( [18:24:04] this is really simple stuff [18:26:24] Gone for today team - see y'all tomorrow [18:27:12] o/ [18:36:34] * elukey afk! 
[19:32:16] (03PS1) 10Ottomata: Add UniversalLanguageSelector to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/655495 (https://phabricator.wikimedia.org/T267352) [19:32:53] (03PS2) 10Ottomata: Add UniversalLanguageSelector to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/655495 (https://phabricator.wikimedia.org/T267352) [19:33:31] (03CR) 10jerkins-bot: [V: 04-1] Add UniversalLanguageSelector to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/655495 (https://phabricator.wikimedia.org/T267352) (owner: 10Ottomata) [19:34:18] (03PS3) 10Ottomata: Add UniversalLanguageSelector to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/655495 (https://phabricator.wikimedia.org/T267352) [19:35:13] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Performance-Team: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Gilles) a:03Gilles [19:35:24] (03CR) 10Ottomata: [C: 03+2] Add UniversalLanguageSelector to analytics/legacy [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/655495 (https://phabricator.wikimedia.org/T267352) (owner: 10Ottomata) [19:58:00] 10Analytics, 10Event-Platform, 10Language-analytics, 10Patch-For-Review: UniversalLanguageSelector Event Platform Migration - https://phabricator.wikimedia.org/T267352 (10Ottomata) The first steps are done. Tomorrow (after JS caches have updated and all events come through EventGate), I'll proceed with th... 
[19:59:58] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 4 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (10Ottomata) [20:03:00] 10Analytics, 10Event-Platform, 10Language-analytics, 10Patch-For-Review: UniversalLanguageSelector Event Platform Migration - https://phabricator.wikimedia.org/T267352 (10Ottomata) (FYI: Neil approved this migration to me in Slack) [20:06:10] ottomata: you said that the remaining Growth schemas (the PHP ones) can be migrated now normally? [20:06:14] yup! [20:06:22] ok, will do tomorrow [20:06:29] i just did a PHP migration for SpecialMuteSubmit [20:06:31] but it is hard to test [20:06:38] together with SuggestedTagsAction [20:06:41] esp since there is no data [20:06:47] ah [20:06:59] but let's see what growth ones are PHP? [20:07:16] HomepageVisit and ServerSideAccountCreation [20:07:39] those both have real recent data [20:07:40] you beat me to it [20:07:49] so should be easy enough to verify that it works :p [20:07:52] ok [20:08:11] just wait for production to send more events? [20:08:21] yeah...unless you can figure out how to trigger those events [20:08:31] maybe ServerSideAccountCreation is easy enough? create a new account on testwiki? [20:09:29] 10Analytics, 10Event-Platform, 10Product-Infrastructure-Data: PrefUpdate Event Platform Migration - https://phabricator.wikimedia.org/T267348 (10Ottomata) I'd like to migrate this schema soon. If possible this week! If not, next week is fine. 
[20:10:13] ottomata: makes sense [20:44:14] 10Analytics, 10Privacy Engineering, 10Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10JFishback_WMF) [20:57:18] (03PS1) 10Fdans: Count only non-locked dimensions when enabling the only dimension [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/655504 [21:04:45] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Performance-Team: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Krinkle) >>! In T271208#6736133, @Ottomata wrote: >> We would like to retain IP and/or geocoded data in the NavigationTimin... [21:06:44] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Performance-Team: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (10Ottomata) Sounds good. I'm mostly focusing on the near term of this migration at the moment and keeping things compatible.... [22:30:35] hello world [22:41:58] hi there tltaylor, welcome! [22:42:11] thanks razzi! [23:49:15] 10Analytics, 10Privacy Engineering, 10Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10bmansurov) o/ @Milimetric! I think I created an Oozie job once ;) Would you say this is a good starting point: https://wikitech.wikim...