[00:40:40] (PS3) Zabe: querypage: Add UnusedCategories.hql [analytics/refinery] - https://gerrit.wikimedia.org/r/1268292 (https://phabricator.wikimedia.org/T422448)
[00:40:55] (CR) Zabe: querypage: Add UnusedCategories.hql (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1268292 (https://phabricator.wikimedia.org/T422448) (owner: Zabe)
[00:44:06] Data-Engineering, DBA: Move SpecialUnusedTemplates computation to Hadoop - https://phabricator.wikimedia.org/T422443#11853584 (Zabe)
[00:46:57] (PS1) Zabe: Add UnusedTemplates.hql [analytics/refinery] - https://gerrit.wikimedia.org/r/1276836 (https://phabricator.wikimedia.org/T422443)
[00:47:14] (PS2) Zabe: querypage: Add UnusedTemplates.hql [analytics/refinery] - https://gerrit.wikimedia.org/r/1276836 (https://phabricator.wikimedia.org/T422443)
[00:50:57] (CR) TChin: [C:+1] "If the attribution research stuff is going to be one of those long-lived experiments, how much should we worry about new namespaces being " [analytics/refinery] - https://gerrit.wikimedia.org/r/1276803 (https://phabricator.wikimedia.org/T417050) (owner: Milimetric)
[07:35:15] Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11854198 (GGoncalves-WMF) That's the MIME type of the response for a particular transcoding, is that right? So if we s...
[07:46:45] Data-Engineering, Wikidata, WMDE Analytics, Data-Platform-SRE (2026-03-27 - 2026-04-17): Bug: Published datasets/reports directory sync fails when incompatible files are saved - https://phabricator.wikimedia.org/T412108#11854221 (Gehel)
[09:15:11] Data-Engineering, Discovery-Search, Java-Scala-Standardization, Data-Platform-SRE (2026-04-24 - 2026-05-15), and 2 others: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#11854440 (Gehel)
[09:15:43] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Carry out end-user testing of spark on kubernetes - https://phabricator.wikimedia.org/T412925#11854462 (Gehel)
[09:15:48] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Engineering-Radar, ServiceOps new, ServiceOps-Services-Oids, and 2 others: Make eventstreams-internal available to WMF staff without an ssh tunnel - https://phabricator.wikimedia.org/T348763#11854464 (Gehel)
[09:15:54] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323#11854466 (Gehel)
[09:16:01] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-04-24 - 2026-05-15): Optimize enqueueing of refine_webrequest_hourly pipeline - https://phabricator.wikimedia.org/T419050#11854468 (Gehel)
[09:16:19] Data-Engineering, Data-Engineering-Radar, Privacy Engineering, Security-Team, and 2 others: Privacy review of x1 tables in preparation of adding them to wikireplicas - https://phabricator.wikimedia.org/T415219#11854474 (Gehel)
[09:16:33] Data-Engineering, Data-Engineering-Radar, cloud-services-team, Data-Persistence, and 3 others: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#11854478 (Gehel)
[09:16:47] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services, and 3 others: Set up x1 replication to Wiki Replicas - https://phabricator.wikimedia.org/T395881#11854480 (Gehel)
[09:16:57] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-04-24 - 2026-05-15), Patch-For-Review: Requesting Kerberos access for SCardenas (WMF) - https://phabricator.wikimedia.org/T418664#11854484 (Gehel)
[09:18:29] Data-Engineering, Test Kitchen, Data-Platform-SRE (2026-04-24 - 2026-05-15): Airflow instance for Experiment Platform - https://phabricator.wikimedia.org/T416709#11854522 (Gehel)
[09:18:47] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Do performance testing of a big Hadoop Table hosted by Ceph - https://phabricator.wikimedia.org/T381416#11854526 (Gehel)
[09:20:28] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-04-24 - 2026-05-15): Investigate Gobblin failures - https://phabricator.wikimedia.org/T419436#11854553 (Gehel)
[09:20:34] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data-Platform-SRE (2026-04-24 - 2026-05-15): Task Tries and Logs for Airflow DAGs sometimes unavailable - https://phabricator.wikimedia.org/T419162#11854554 (Gehel)
[09:20:52] Data-Engineering, Technical-blog-posts, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603#11854559 (Gehel)
[09:58:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Wikimedia Enterprise, Wikimedia Enterprise - Content Integrity, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Implement an Airflow operator for moving data from point A to B - https://phabricator.wikimedia.org/T405360#11854662 (...
[09:58:27] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Data Pipelines, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Airflow dynamic task mapping logs mix up when, on rerun, an id is mapped to a different map_index_template - https://phabricator.wikimedia.org/T408802#11854672 (Gehel)
[09:58:47] Data-Engineering, Dumps-Generation, Wikidata, Data-Platform-SRE (2026-04-24 - 2026-05-15): Wikidata full .json.gz dumps not published since 20250625 - https://phabricator.wikimedia.org/T412428#11854678 (Gehel)
[09:59:26] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Move the dumps_v1 DAGs from the Airflow test_k8s instance to the main instance - https://phabricator.wikimedia.org/T404084#11854704 (Gehel)
[09:59:32] Data-Engineering, BetaFeatures, cloud-services-team, Data-Services, and 2 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#11854700 (Gehel)
[09:59:53] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: Superset "track job" button leads to broken URL - https://phabricator.wikimedia.org/T410149#11854716 (Gehel)
[10:00:19] Data-Engineering, Data-Platform-SRE (2026-04-24 - 2026-05-15), Essential-Work: ERROR AsyncEventQueue: Listener DatahubSparkListener threw an exception - https://phabricator.wikimedia.org/T400207#11854730 (Gehel)
[10:13:41] !log Test Kitchen edge-unique experiments (poll 152648) - adds: logged-out-retention-round8; removes: none; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[10:13:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:48:16] Data-Engineering, Data-Engineering-Radar, Data-Platform, Growth-Team, and 5 others: Image Suggestions uses AI-generated images from Commons when adding images on English Wikipedia - https://phabricator.wikimedia.org/T422513#11854864 (mfossati)
[10:55:28] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: HTML Enrichment: Create wikitech page - https://phabricator.wikimedia.org/T424206#11854873 (JMonton-WMF) Page added: https://wikitech.wikimedia.org/wiki/MediaWiki_Event_Enrichment/HTML_Enrichment
[11:52:32] Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11855042 (Ladsgroup) Yup. Exactly. I guess we should use both then :D
[13:49:57] (CR) Xcollazo: [V:+2 C:+2] querypage: Add UnusedTemplates.hql [analytics/refinery] - https://gerrit.wikimedia.org/r/1276836 (https://phabricator.wikimedia.org/T422443) (owner: Zabe)
[13:51:04] (CR) Xcollazo: [V:+2 C:+2] "Added to https://wikitech.wikimedia.org/wiki/Data_Platform_Engineering/Ops_week/Analytics_weekly_train" [analytics/refinery] - https://gerrit.wikimedia.org/r/1276836 (https://phabricator.wikimedia.org/T422443) (owner: Zabe)
[13:53:02] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review: Monthly reconcile continues to emit a really large amount of events after user_id changes - https://phabricator.wikimedia.org/T419055#11855501 (xcollazo) Open→Resolved
[13:54:56] Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11855535 (Snwachukwu) Thank you for your input @Ladsgroup. You are right @GGoncalves-WMF, those signals answer differe...
[13:55:01] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Patch-For-Review: Sunset in-pipeline metrics computation for the MW Content Pipelines - https://phabricator.wikimedia.org/T401010#11855536 (xcollazo) In progress→Resolved
[13:58:37] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Wikidata, Wikidata-Query-Service: Add a --output-dir argument to wikibase rdf and json dumps - https://phabricator.wikimedia.org/T401296#11855569 (xcollazo) In progress→Resolved
[13:59:45] Data-Engineering, Data-Platform-SRE: Presto cluster improvements for concurrency and workload - https://phabricator.wikimedia.org/T424112#11855573 (Gehel) @JAllemandou should have a look into this when back from vacation
[14:06:23] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-Mediawiki-Content: Missing/inconsistent page_redirect_target field for redirects in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T400632#11855621 (xcollazo) >>! In T400632#11697165, @APizzata-WMF wrote: >>>! In T400632#116...
[14:21:18] Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11855707 (GGoncalves-WMF) Sounds good to me!
[14:21:30] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Weekly delivery cadence of core contributor metrics - https://phabricator.wikimedia.org/T418032#11855709 (xcollazo)
[14:22:15] Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11855724 (Ladsgroup) Same!
[14:23:47] Data-Engineering: Requesting Kerberos password reset - https://phabricator.wikimedia.org/T424268#11855733 (ssingh) Open→Resolved a:ssingh It seems like @SKaram-WMF's Kerberos credentials were never created initially: ` sukhe@krb1002:~$ sudo manage_principals.py reset-password skaramwmf --ema...
[14:30:54] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Incremental MediaWiki History - https://phabricator.wikimedia.org/T424350#11855769 (xcollazo) p:Triage→High
[14:35:19] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Accelerate sqoop landing for MediaWiki History private tables - https://phabricator.wikimedia.org/T424355 (xcollazo) NEW
[14:37:01] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-Mediawiki-Content: Missing/inconsistent page_redirect_target field for redirects in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T400632#11855831 (APizzata-WMF) >Do we need to do anything else here @APizzata-WMF? Not real...
[14:48:24] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Investigate oversized driver logs for `mediawiki_history_denormalize` - https://phabricator.wikimedia.org/T424358 (xcollazo) NEW
[14:55:51] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Architectural design agreement: Incremental MediaWiki History - https://phabricator.wikimedia.org/T424359 (xcollazo) NEW
[15:16:50] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: Edit type enrichment: Update stream to use new deployment config - https://phabricator.wikimedia.org/T424223#11855981 (brouberol) ` brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid=mw-page-html-feature...
[15:17:49] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: Edit type enrichment: Update stream to use new deployment config - https://phabricator.wikimedia.org/T424223#11855984 (brouberol) ` brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid=mw-page-html-feature...
[15:20:26] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: Edit type enrichment: Update stream to use new deployment config - https://phabricator.wikimedia.org/T424223#11855988 (brouberol) The secrets were added to the private puppet repo.
[15:58:12] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Architectural design agreement: Incremental MediaWiki History - https://phabricator.wikimedia.org/T424359#11856132 (xcollazo) # Pitch: Moving MediaWiki History to a faster cadence (monthly → daily) ## Context **Problem.** The MediaWiki History pipeline (...
[16:01:02] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Architectural design agreement: Incremental MediaWiki History - https://phabricator.wikimedia.org/T424359#11856142 (xcollazo)
[16:01:36] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: Delete old edit-type stream - https://phabricator.wikimedia.org/T424364 (AKhatun_WMF) NEW
[16:02:07] Data-Engineering (Q4 FS25/26 April 1st - June 30st), AQS2.0: Sqoop file and filetypes tables from Commonswiki to the Hdfs - https://phabricator.wikimedia.org/T424365 (Snwachukwu) NEW
[16:04:44] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Backfill Movement Insights dbt models 5 years back - https://phabricator.wikimedia.org/T424366 (amastilovic) NEW
[16:12:42] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Backfill Movement Insights dbt models 5 years back - https://phabricator.wikimedia.org/T424366#11856217 (amastilovic) In order to establish practical patterns for backfilling `dbt-jobs` models, I'm running backfills using both the Airflow's CLI backfill co...
[16:18:06] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Incremental MediaWiki History - https://phabricator.wikimedia.org/T424350#11856230 (xcollazo)
[16:28:46] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Architectural design agreement: Incremental MediaWiki History - https://phabricator.wikimedia.org/T424359#11856247 (Ahoelzl) I think I need AI to review this :-)
[16:34:38] Data-Engineering (Q4 FS25/26 April 1st - June 30st), DPE-Mediawiki-Content: Missing/inconsistent page_redirect_target field for redirects in Mediawiki content current v1 dumps - https://phabricator.wikimedia.org/T400632#11856295 (Isaac) Just an FYI that @MGerlach will be out until May 6th -- let me know...
[17:30:13] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: Edit type Enrichment: Update flink code - https://phabricator.wikimedia.org/T424231#11856469 (AKhatun_WMF) @Ottomata, @JMonton-WMF There are some events in error sink. Not a new error, there are events in old edit-type error sink with t...
[17:34:55] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: Edit type enrichment: Update stream to use new deployment config - https://phabricator.wikimedia.org/T424223#11856497 (AKhatun_WMF)
[17:35:17] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: Edit type Enrichment: Update flink code - https://phabricator.wikimedia.org/T424231#11856499 (AKhatun_WMF)
[17:47:39] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: Edit type Enrichment: Update flink code - https://phabricator.wikimedia.org/T424231#11856550 (Ottomata) This might be a case of undeleting a page with a single revision, which is more like a page create than an edit. Let's treat `page_...
[17:48:58] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform: Edit type Enrichment: Update flink code - https://phabricator.wikimedia.org/T424231#11856559 (AKhatun_WMF) Yes, the html pipeline is just not adding the diff, the current html is present.
[18:09:01] Data-Engineering, Data-Engineering-Radar, MW-Interfaces-Team, Event-Platform, and 2 others: Ad-hoc EventBus event submissions should be batched - https://phabricator.wikimedia.org/T382970#11856642 (Ottomata)
[19:05:49] Data-Engineering (Q4 FS25/26 April 1st - June 30st): A more recent Spark + Iceberg will make Incremental MediaWiki History much more efficient - https://phabricator.wikimedia.org/T424381 (xcollazo) NEW
[19:16:33] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Draft: Architectural design agreement: Incremental MediaWiki History - https://phabricator.wikimedia.org/T424359#11856868 (xcollazo)
[21:30:55] Data-Engineering-Roadmap, Epic, OKR-Work (WE1 FY2025-26): dbt DPE work - https://phabricator.wikimedia.org/T416679#11857307 (ldelench_wmf)
[23:05:48] Data-Engineering (Q4 FS25/26 April 1st - June 30st): Backfill Movement Insights dbt models 5 years back - https://phabricator.wikimedia.org/T424366#11857422 (amastilovic) To backfill using `dbt` itself, I'm using the following procedure: ` ssh an-launcher1003.eqiad.wmnet ` Create a separate `profiles.yaml`...