[01:13:29] Data-Engineering, Patch-For-Review: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T413349#11490744 (Pppery)
[08:01:56] (CR) Joal: [V:+2 C:+2] "LGTM! Merging for later deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1222506 (https://phabricator.wikimedia.org/T413349) (owner: Cicalese)
[09:53:23] Data-Engineering, DBA, Schema-change-in-production: Add status columns to transcode table in wmf production - https://phabricator.wikimedia.org/T413688#11491159 (FCeratto-WMF) Open→In progress
[10:07:06] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Platform-SRE (2025.11.07 - 2025.11.28): Airflow-main scheduler loop sometimes slows down markedly - https://phabricator.wikimedia.org/T412003#11491197 (BTullis) a:BTullis→JAllemandou
[10:25:39] Data-Engineering, MediaWiki-General, Patch-For-Review: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T413349#11491216 (A_smart_kitten) (retagging for visibility as this affects MW reporting, even though the change isn't being made to MW itself)
[10:31:54] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Platform-SRE: Update druid `mediawiki_history_reduced` dataset retention to 2 - https://phabricator.wikimedia.org/T413752 (JAllemandou) NEW
[10:32:10] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Platform-SRE: Update druid `mediawiki_history_reduced` dataset retention to 2 - https://phabricator.wikimedia.org/T413752#11491252 (JAllemandou) a:JAllemandou
[10:45:41] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Platform-SRE: Update sqoop to gather the `centralauth` tables before tables non needed for `mediawiki_history` - https://phabricator.wikimedia.org/T413754 (JAllemandou) NEW
[10:46:03] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Platform-SRE: Update sqoop to gather the `centralauth` tables before tables non needed for `mediawiki_history` - https://phabricator.wikimedia.org/T413754#11491302 (JAllemandou) a:JAllemandou
[10:50:16] Data-Engineering, DBA, Schema-change-in-production: Drop the cupe_private column from the cu_private_event table on WMF wikis - https://phabricator.wikimedia.org/T412943#11491312 (FCeratto-WMF) Open→In progress
[11:18:44] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review: Update druid `mediawiki_history_reduced` dataset retention to 2 - https://phabricator.wikimedia.org/T413752#11491359 (BTullis)
[12:12:27] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2026.01.05 - 2026.01.30), and 2 others: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies ... - https://phabricator.wikimedia.org/T367405#11491570
[12:12:30] Data-Engineering, CirrusSearch, Data-Platform-SRE, DPE-Mediawiki-Content, and 4 others: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script - https://phabricator.wikimedia.org/T366248#11491568 (pfischer)
[12:25:22] Data-Engineering, CheckUser, Essential-Work, Product Safety and Integrity (Essential Work Sprint (Dec 15th - Jan 9th)), Schema-change: Add a default for the *_agent columns in the CheckUser result tables - https://phabricator.wikimedia.org/T413764 (Dreamy_Jazz) NEW
[12:25:28] Data-Engineering, CheckUser, Essential-Work, Product Safety and Integrity (Essential Work Sprint (Dec 15th - Jan 9th)), Schema-change: Add a default for the *_agent columns in the CheckUser result tables - https://phabricator.wikimedia.org/T413764#11491642 (Dreamy_Jazz)
[12:25:43] Data-Engineering, CheckUser, Essential-Work, Product Safety and Integrity (Essential Work Sprint (Dec 15th - Jan 9th)), Schema-change: Add a default for the *_agent columns in the CheckUser result tables - https://phabricator.wikimedia.org/T413764#11491644 (Dreamy_Jazz) a:Dreamy_Jazz
[12:26:34] Data-Engineering, CheckUser, Essential-Work, Product Safety and Integrity (Essential Work Sprint (Dec 15th - Jan 9th)), Schema-change: Add a default for the *_agent columns in the CheckUser result tables - https://phabricator.wikimedia.org/T413764#11491647 (Dreamy_Jazz)
[12:45:30] Data-Engineering, Dumps-Generation: Some wikimedia 20260101 dumps missing - https://phabricator.wikimedia.org/T413767 (mykhal) NEW
[12:53:09] Data-Engineering, Dumps-Generation: Some wikimedia 20260101 dumps missing - https://phabricator.wikimedia.org/T413767#11491749 (mykhal)
[13:08:58] Data-Engineering, CheckUser, Essential-Work, Product Safety and Integrity (Essential Work Sprint (Dec 15th - Jan 9th)), Schema-change: Add a default for the *_agent columns in the CheckUser result tables - https://phabricator.wikimedia.org/T413764#11491811 (Dreamy_Jazz) Open→Declin...
[13:09:10] Data-Engineering, CheckUser, Essential-Work, Product Safety and Integrity (Essential Work Sprint (Dec 15th - Jan 9th)), Schema-change: Add a default for the *_agent columns in the CheckUser result tables - https://phabricator.wikimedia.org/T413764#11491814 (Dreamy_Jazz) Declined→In...
[14:55:09] Data-Engineering (Q2 FY25/26 October 1st - December 31th): Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T410431#11492207 (xcollazo) @MGerlach [[ https://wikimedia.slack.com/archives/CSV483812/p1766423597871369 | identified another re...
[15:05:18] Data-Engineering, Reader Growth Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, and 4 others: Add page_id and namespace to X-Analytics header in Mobile App requests (2025 remake) - https://phabricator.wikimedia.org/T409358#11492251 (Ottomata) @Milimetric, @Jgiannelos has a questio...
[16:49:45] Data-Engineering, CirrusSearch, Data-Platform-SRE, DPE-Mediawiki-Content, and 4 others: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script - https://phabricator.wikimedia.org/T366248#11492787 (BTullis) Should we start by disabling the legacy cirrussearch dumps in t...
[16:59:23] Data-Engineering, Test Kitchen: json schema tools: Can we allow changing the regex pattern for a schema field - https://phabricator.wikimedia.org/T411518#11492846 (Ottomata) We could! I think technically changing the valid regex is backwards incompatible though, because it can potentially break consumer...
[17:17:47] Data-Engineering, CirrusSearch, Data-Platform-SRE, DPE-Mediawiki-Content, and 4 others: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script - https://phabricator.wikimedia.org/T366248#11492927 (EBernhardson) >>! In T366248#11492787, @BTullis wrote: > Should we start...
[18:20:55] Data-Engineering, Dumps-Generation: Some wikimedia 20260101 dumps missing - https://phabricator.wikimedia.org/T413767#11493335 (mykhal) These are currently not found (HTTP 404): ` https://dumps.wikimedia.org/enwiki/20260101/enwiki-20260101-pages-logging.xml.gz https://dumps.wikimedia.org/enwiki/20260101/...
[18:35:24] Data-Engineering, Dumps-Generation: Some wikimedia 20260101 dumps missing - https://phabricator.wikimedia.org/T413767#11493422 (xcollazo) IIRC, the status page updating before it is is actually ready for download is a side effect of T388378. We do not rsync the data files as often as we used to before th...
[18:56:58] Data-Engineering, Dumps-Generation, MW-on-K8s: Orchestrate dumps v1 from an airflow instance - https://phabricator.wikimedia.org/T388378#11493577 (mykhal)
[19:03:21] Data-Engineering, Machine-Learning-Team, serviceops: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11493601 (Ottomata) > as it stands, moving such a big topic to kavka-main is an option. If we want to have such fat topics (on top of all...
[19:07:49] Data-Engineering, Event-Platform: Schema validation for EventStreamConfig - https://phabricator.wikimedia.org/T405516#11493631 (Ottomata) > How would EventGate or Kafka or whatever obtain in (in cases where topics is not explicitly set) if it's omitted from the API response? Do they not query the API? `...
[19:08:52] Data-Engineering, Event-Platform, Patch-For-Review, Wikimedia-Performance-recommendation: StreamConfig::validate() eating 0.5% of index.php time - https://phabricator.wikimedia.org/T413350#11493634 (Ottomata) Indeed! There is no reason to do this on every request. It would be better to do this...
[19:09:01] Data-Engineering, Event-Platform, Patch-For-Review, Wikimedia-Performance-recommendation: StreamConfig::validate() eating 0.5% of index.php time - https://phabricator.wikimedia.org/T413350#11493636 (Ottomata)
[19:14:59] Data-Engineering, Tool-Pageviews: Generate 2025 topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11493666 (MusikAnimal) Open→In progress
[21:25:54] Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review: Fix reconcile bug where user_id is not being populated correctly. - https://phabricator.wikimedia.org/T411803#11493987 (xcollazo) From [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1903 | M...