[00:16:48] !log Test Kitchen edge-unique experiments (poll 262989) - adds: attribution-research-short-baseline-run; removes: none; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[00:16:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[00:30:18] Analytics-Radar, Data-Engineering, Data-Engineering-Icebox, Pageviews-API, Tool-Pageviews: massviews hits 429 Too Many Requests despite making requests synchronously - https://phabricator.wikimedia.org/T219857#11716619 (MusikAnimal) Open→Resolved a:MusikAnimal For most users,...
[00:30:47] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Build a set of configurable pre-scheduled DBT Airflow DAGs executing dbt-jobs models - https://phabricator.wikimedia.org/T419925#11716627 (amastilovic) > As I understand it, orchestrating models would depend on submitting MRs to two repositories: dbt-jo...
[00:38:06] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Build a set of configurable pre-scheduled DBT Airflow DAGs executing dbt-jobs models - https://phabricator.wikimedia.org/T419925#11716643 (amastilovic) >I personally like the idea of running a single dbt run and letting dbt handle all the dependencies,...
[00:40:07] Data-Engineering, Event-Platform: Upgrade eventstreams and eventstreams-internal to node24 (or node22) - https://phabricator.wikimedia.org/T420257#11716649 (Ottomata) Yeah let's try node24 first!
[00:40:10] Data-Engineering, Event-Platform: Upgrade eventstreams and eventstreams-internal to node24 (or node22) - https://phabricator.wikimedia.org/T420257#11716650 (Ottomata)
[00:42:50] Data-Engineering, Event-Platform: Upgrade eventgate-* to node24 - https://phabricator.wikimedia.org/T420296 (Ottomata) NEW
[01:11:51] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[01:11:51] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[03:02:18] Analytics-Radar, Data-Engineering, Data-Engineering-Icebox, Pageviews-API, Tool-Pageviews: massviews hits 429 Too Many Requests despite making requests synchronously - https://phabricator.wikimedia.org/T219857#11716872 (Hawkeye7) @MusikAnimal: I tried to run a massviews and got: https://e...
[05:11:51] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[05:11:51] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[08:46:34] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to >= Flink 1.20.3 and Java 17 - https://phabricator.wikimedia.org/T408918#11717375 (JMonton-WMF) Thanks @xcollazo, I was about to do that deployment t...
[09:03:23] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Build a set of configurable pre-scheduled DBT Airflow DAGs executing dbt-jobs models - https://phabricator.wikimedia.org/T419925#11717413 (JMonton-WMF) > Cons: > > - More complicated to develop support for. Requires somewhat elaborate Airflow Python co...
[09:06:08] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635#11717419 (FCeratto-WMF) p:Triage→Medium
[09:11:51] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[09:11:51] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[10:26:51] RESOLVED: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[10:26:51] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[10:28:06] FIRING: [3x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[10:28:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[10:33:06] FIRING: [4x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[10:33:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[10:50:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to >= Flink 1.20.3 and Java 17 - https://phabricator.wikimedia.org/T408918#11717802 (JMonton-WMF) The issue was that some time ago, between v1.43 and v...
[12:14:28] (PS14) Andrew McAllister (WMDE): Add contributing guide and update readme [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181
[12:16:37] Data-Engineering, Data-Persistence, ServiceOps new, ServiceOps-Mediawiki, and 2 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11718125 (Raine) >>! In T419980#11713167, @Ladsgroup wrote: > - I think this needs to be done gradually. We have never do...
[12:24:59] Data-Engineering, Data-Persistence, ServiceOps new, ServiceOps-Mediawiki, and 2 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11718195 (Ladsgroup) >>! In T419980#11718125, @Raine wrote: >>>! In T419980#11713167, @Ladsgroup wrote: >> - I think this...
[13:46:18] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Reconcile Flink Job: Too many warnings - https://phabricator.wikimedia.org/T420353 (JMonton-WMF) NEW
[13:50:25] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291#11718641 (xcollazo) a:xcollazo
[13:50:51] Data-Engineering (Q3 FY25/26 January 1st - March 31th): HDFS usage dashboard is quadruple counting file counts and file sizes - https://phabricator.wikimedia.org/T418780#11718644 (xcollazo) p:Triage→High a:xcollazo
[13:51:50] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE, Event-Platform: Increase Max Message Size in Kafka Jumbo to 20MB - https://phabricator.wikimedia.org/T420356 (Ottomata) NEW
[14:00:13] Data-Engineering, DBA, Machine-Learning-Team, ORES, Schema-change-in-production: Drop ORES tables from wikis without ORES - https://phabricator.wikimedia.org/T420093#11718703 (Dreamy_Jazz)
[14:26:19] Analytics, Data-Engineering, Event-Platform: [Event Platform] Add expiry info to mediawiki.page-restrictions-change stream - https://phabricator.wikimedia.org/T282057#11718828 (Ottomata) @Isaac we are considering just doing this. The ticket is a bit old though. Would this be valuable to just do? Do...
[14:33:06] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[14:33:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[14:35:33] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-06 - 2026-03-27): Transfer ownership of Watchlist CTR dashboard to Mikhail - https://phabricator.wikimedia.org/T418485#11718886 (mpopov) In progress→Resolved I was able to fix everything I needed to fix. Sorry for the ve...
[14:35:43] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-06 - 2026-03-27): Transfer ownership of Watchlist CTR dashboard to Mikhail - https://phabricator.wikimedia.org/T418485#11718890 (mpopov)
[14:51:58] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Test Kitchen: GrowthBook experiment analysis keeps failing/stalling - https://phabricator.wikimedia.org/T419286#11718988 (mpopov) Open→Resolved @amastilovic: Thank you for taking a look and making the change that fixed the problem! (Sorr...
[15:22:09] Data-Engineering (Q3 FY25/26 January 1st - March 31th): HDFS usage dashboard is quadruple counting file counts and file sizes - https://phabricator.wikimedia.org/T418780#11719190 (xcollazo) Context: this dashboard was put together as part of {T381707}. Root cause: The base_directories CTE in the Superset vi...
[15:27:58] Data-Engineering: [Maintenance] Add a deletion job for `hdfs_usage` data - https://phabricator.wikimedia.org/T348774#11719232 (xcollazo) This continues to be an issue: ` hive (default)> show partitions wmf.hdfs_usage; OK partition snapshot=2022-10-16 ... snapshot=2026-03-02 snapshot=2026-03-09 Time taken: 0...
[15:35:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Build a set of configurable pre-scheduled DBT Airflow DAGs executing dbt-jobs models - https://phabricator.wikimedia.org/T419925#11719311 (amastilovic) >I'm now wondering, should we give a try to Cosmos? It is supposed to split dbt models into Airflow t...
[16:22:42] (CR) Lucas Werkmeister (WMDE): T416680 Split active_user_changes.sql into user/temp account versions and run both Bug: T416680 (2 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680) (owner: Andrew McAllister (WMDE))
[16:24:40] (CR) Lucas Werkmeister (WMDE): Add contributing guide and update readme (3 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181 (owner: Andrew McAllister (WMDE))
[16:29:39] (PS7) Andrew McAllister (WMDE): Split active_user_changes.sql into user/temp account versions and run both [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680)
[17:29:03] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform: Unbalanced partitions in eqiad.mediawiki.content_history_reconcile.v1 topic - https://phabricator.wikimedia.org/T420359#11720254 (JMonton-WMF)
[17:50:58] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291#11720370 (xcollazo) >>! In T419291#11720295, @CodeReviewBot wrote: > xcollazo updated https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/...
[18:33:36] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[18:33:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[19:23:54] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291#11720616 (xcollazo) Ran the following as a one off: ` $ hostname -f an-launcher1003.eqiad.wmnet $ whoami analytics hdfs dfs -rm -R /wmf/data/exports/mediawiki_content_...
[19:34:45] Data-Engineering (Q3 FY25/26 January 1st - March 31th): enwiki File Export failed for 2026-03-01 - https://phabricator.wikimedia.org/T419291#11720643 (xcollazo) Rerunning via https://yarn.wikimedia.org/proxy/application_1764064841637_2255221/jobs/
[20:24:53] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform, Patch-For-Review: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11720787 (Ottomata) @AKhatun_WMF and I met today to try and make some progress on the 'edit t...
[20:27:02] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline - https://phabricator.wikimedia.org/T420412 (mforns) NEW
[20:43:51] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline - https://phabricator.wikimedia.org/T420412#11720858 (mforns) One question related to creating new tables in the new Iceberg databases. I want to create a new ta...
[20:48:06] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[20:48:06] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[22:00:11] Data-Engineering, Data-Persistence, DBA, ServiceOps new, and 4 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11721089 (Raine)
[22:15:32] PROBLEM - statsv Varnishkafka log producer on cp3071 is CRITICAL: PROCS CRITICAL: 3 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[22:16:32] RECOVERY - statsv Varnishkafka log producer on cp3071 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[23:17:10] Data-Engineering, DBA, Machine-Learning-Team, ORES, Schema-change-in-production: Drop ORES tables from wikis without ORES - https://phabricator.wikimedia.org/T420093#11721270 (Ladsgroup) p:Triage→Medium