[07:01:51] Data-Engineering, Data-Engineering-Radar, cloud-services-team, Data-Services, DBA: Re-run maintainviews on all clouddb* and an-redacteddb1001.eqiad.wmnet - https://phabricator.wikimedia.org/T422459#11832571 (FCeratto-WMF)
[07:13:09] Data-Engineering, Data-Engineering-Radar, Commons, Data-Persistence, and 6 others: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) - https://phabricator.wikimedia.org/T28741#11832594 (Nemoralis)
[07:36:36] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635#11832646 (FCeratto-WMF)
[07:56:16] Data-Engineering, Data-Engineering-Radar, cloud-services-team, Data-Services, DBA: Re-run maintainviews on all clouddb* and an-redacteddb1001.eqiad.wmnet - https://phabricator.wikimedia.org/T422459#11832656 (FCeratto-WMF) The run on s5 had an error that needs further investigation: https://ph...
[08:32:02] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11832710 (atsuko) @Ottomata shall we move this step to the base pack...
[08:44:06] FIRING: EventgateProduceRateAnomaly: Significant produce rate deviation (+-25%) on eventgate-analytics-external. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-dc=000000026&var-service=eventgate-analytics-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateProduceRateAnomaly
[08:49:06] RESOLVED: EventgateProduceRateAnomaly: Significant produce rate deviation (+-25%) on eventgate-analytics-external. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-dc=000000026&var-service=eventgate-analytics-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateProduceRateAnomaly
[09:11:45] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635#11832765 (FCeratto-WMF) The run on s5 had an error that needs further investigation: https://phabricator.wikimedia.org/...
[09:31:03] Data-Engineering, Product-Analytics: Iceberg table maintenance for tables under wmf_product database - https://phabricator.wikimedia.org/T391135#11832809 (KCVelaga_WMF) Open→Resolved a:KCVelaga_WMF
[09:35:23] Data-Engineering, Product-Analytics: Iceberg table maintenance for tables under wmf_product database - https://phabricator.wikimedia.org/T391135#11832822 (KCVelaga_WMF) All looks good to me. The maintenance DAGs have been running successfully. Thank you
[09:38:51] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and SQL Lab for AnnieKim_WMDE - https://phabricator.wikimedia.org/T420500#11832838 (Martyn.ranyard) I approve this, but I think @Scott_French or someone else on that team still needs to add you to the gro...
[09:44:37] PROBLEM - statsv Varnishkafka log producer on cp3071 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[09:45:37] RECOVERY - statsv Varnishkafka log producer on cp3071 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[09:58:14] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work, Patch-For-Review: Provide an access to MaxMind GeoIP in DSE K8S pods - https://phabricator.wikimedia.org/T405509#11832918 (BTullis) Ah, since {T423279} was completed, the head of the reposit...
[09:59:09] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11832924 (atsuko) ` Successfully published image docker-registry.dis...
[10:20:00] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635#11832999 (Marostegui) >>! In T419635#11832765, @FCeratto-WMF wrote: > The run on s5 had an error that needs further inv...
[11:46:38] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11833278 (atsuko)
[12:24:25] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work, Patch-For-Review: Provide an access to MaxMind GeoIP in DSE K8S pods - https://phabricator.wikimedia.org/T405509#11833354 (BTullis) Open→Resolved I'm happy with the DAG and the f...
[13:04:28] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11833445 (Ottomata) Interesting data point: Last night at about 00:00 both staging and prod had a spike in checkpoint...
[13:07:03] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11833454 (Ottomata) Also, a TM pod in production container OOMed and the job was auto restarted. Staging has paralle...
[13:28:09] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11833531 (Ottomata) >> I fixed the process function metrics. I accidentally renamed them. > Errr, I thought I fixed...
[13:46:43] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Pipeline - Performance improvements - https://phabricator.wikimedia.org/T422928#11833564 (Ottomata) Thanks @jmeybohm and @dcausse. I'd like to try this! So, IIUC we should: - set `x-envoy-max-retries: 0` -...
[14:43:57] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11833754 (Ottomata) > I'd like to experiment with python thread worker mode ` helmfile apply -e dse-k8s-eqiad \ --s...
[14:44:36] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Pipeline - Performance improvements - https://phabricator.wikimedia.org/T422928#11833756 (Ottomata)
[14:51:22] Data-Engineering, Data-Engineering-Radar, cloud-services-team, Data-Services, DBA: Re-run maintainviews on all clouddb* and an-redacteddb1001.eqiad.wmnet - https://phabricator.wikimedia.org/T422459#11833785 (FCeratto-WMF)
[14:56:04] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11833801 (Ottomata) Trying JEMALLOC in prod: ` helmfile apply -e dse-k8s-eqiad \ --set app.version=v1.51.0 \...
[14:58:30] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Pipeline - Performance improvements - https://phabricator.wikimedia.org/T422928#11833819 (Ottomata)
[15:03:53] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Pipeline - Performance improvements - https://phabricator.wikimedia.org/T422928#11833845 (Ottomata)
[15:29:17] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11833924 (Ottomata) Attempting a backfill on staging with ` helmfile apply -e dse-k8s-eqiad \ --set app.version=v1...
[15:51:03] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11833996 (atsuko) To update: * [search-platform/cirrus-streaming-upd...
[16:14:50] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11834095 (Ottomata) mediawiki-event-enrichment MR is deployed for mw...
[16:22:10] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11834097 (atsuko) In progress→Resolved
[16:22:23] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform, good first task, Patch-For-Review: Flink base image should not install into system python environment - https://phabricator.wikimedia.org/T418525#11834098 (atsuko)
[17:05:00] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users and SQL Lab for AnnieKim_WMDE - https://phabricator.wikimedia.org/T420500#11834353 (Dzahn) Resolved→Open a:Scott_French→None
[17:21:52] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11834406 (Ottomata) [[ https://grafana.wikimedia.org/goto/efjdltpva8qv4b?orgId=1 | Checkpoint times up to about 2 min...
[18:10:38] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11834556 (Ottomata) Hm: https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/state/checkpointing_unde...
[19:34:10] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11834745 (Ottomata) I'm not entirely sure what to do other than increase checkpoint timeout. Even if we do the unord...
[19:51:53] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11834776 (Ottomata) I usually don't see skew: why would we? Unless somehow all very busy pages get mapped to the same...
[19:55:31] Data-Engineering, Data-Engineering-Radar, Commons, Data-Persistence, and 6 others: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) - https://phabricator.wikimedia.org/T28741#11834779 (Quiddity)
[21:29:28] !log Test Kitchen mw-user experiment (poll 124783) - adds: none; removes: ab-test-email-confirmation-banner; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[21:29:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:07:41] Data-Engineering: Inconsistent country name for Netherlands in Turnilo pageviews - https://phabricator.wikimedia.org/T423751 (Pcoombe) NEW
[23:10:56] Data-Engineering (Q4 FS25/26 April 1st - June 30st), Event-Platform, Patch-For-Review: HTML Enrichment - Tuning & Backfilling configuration - https://phabricator.wikimedia.org/T421216#11835230 (Ottomata) Checkpoints are being triggered about every 60 seconds, even though we are under backpressure. Goi...