[01:26:01] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Airflow Refine: re-run of failed mapped tasks with changed stream config can cause silent data loss or data duplication - https://phabricator.wikimedia.org/T418021#11638980 (10Ottomata) >> Short-Term Mitigations > Avoid rerunning entire DAG runs when con... [01:35:02] 06Data-Engineering: spark-sql warns about mismatching table schema for event.EditAttemptStep - https://phabricator.wikimedia.org/T418065#11638986 (10Ottomata) Aye ya thanks for reporting! This is very annoying but it a known problem. It is due to this very annoying bug: {T209453} The ultimate fix is to switch e... [01:55:19] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 07OKR-Work (WE1 FY2025-26): WE1.5.3 Productize Data for Monthly Active Moderator Actions - https://phabricator.wikimedia.org/T410940#11638990 (10AKhatun_WMF) [01:55:26] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 06Research, 10Event-Platform, 13Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11638991 (10AKhatun_WMF) [06:25:38] 06Data-Engineering, 06DBA, 07Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11639073 (10Marostegui) [09:25:01] 06Data-Engineering, 10Dumps-Generation: Data missing from en.wiktionary.org February 2026 "MediaWiki Content File Exports" compared to "XML Database dump" - https://phabricator.wikimedia.org/T417596#11639301 (10APizzata-WMF) >@APizzata-WMF: FYI. I suspect these may be issues with our reconciliation mechanism.... [10:05:36] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 10Data Pipelines, 10Wikidata, 10Wikidata Analytics: NEW FEATURE REQUEST: sqoop (all) user properties from mariadb to wmf_raw.mediawiki_user_properties - https://phabricator.wikimedia.org/T323456#11639676 (10AndrewTavis_WMDE) > Alternatively, Wikidat... 
[12:01:33] 06Data-Engineering, 10Event-Platform: EventBus' stream config destination_event_service setting should move into producers.mediawikI_eventbus specific settings. - https://phabricator.wikimedia.org/T321557#11640127 (10Aklapper) a:05Ottomata→03None @Ottomata Removing task assignee as this open task has been... [12:01:36] 06Data-Engineering, 06Data-Platform-SRE, 06SRE: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in all Kafka clusters - https://phabricator.wikimedia.org/T334733#11640125 (10Aklapper) a:05Ottomata→03None @Ottomata Removing task assignee as this open task has been assigned for more than two year... [12:08:35] 06Data-Engineering, 06Data-Platform-SRE, 07Epic, 13Patch-For-Review: Upgrade Spark to a version with long term Iceberg support, and with fixes to support Dumps 2.0 - https://phabricator.wikimedia.org/T338057#11640179 (10awight) One more +1 for Spark 3.5. My use case is that I want to write directly to the... [14:15:43] 06Data-Engineering: Requesting a Kerberos identity for shell user lerickson (lerickson@wikimedia.org) - https://phabricator.wikimedia.org/T418135 (10lerickson) 03NEW [14:51:04] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 10DPE-Mediawiki-Content: Use wmf.mediawiki_history as baseline for slo completeness - https://phabricator.wikimedia.org/T416312#11640724 (10Milimetric) technically speaking, MW History is as accurate as our monthly sqoop of the MariaDB replicas. I'm no... 
[15:58:54] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refine: persist resolved stream configurations to make reruns deterministic - https://phabricator.wikimedia.org/T418151 (10Antoine_Quhen) 03NEW [16:09:54] 06Data-Engineering, 06Data-Platform-SRE: Reduce noise from HdfsRpcQueueLength alert - https://phabricator.wikimedia.org/T418152 (10Ottomata) 03NEW [16:11:07] 06Data-Engineering, 06Data-Platform-SRE: Reduce noise from HdfsRpcQueueLength alert - https://phabricator.wikimedia.org/T418152#11641231 (10Ottomata) [16:26:35] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Weekly delivery cadence of core contributor metrics - https://phabricator.wikimedia.org/T418032#11641311 (10Ottomata) Related: {T258511} and subtasks [16:30:18] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 10Research-engineering, 06Research-Freezer, 10Event-Platform: Productionized Edit Types - https://phabricator.wikimedia.org/T351225#11641321 (10Ottomata) a:05Ottomata→03AKhatun_WMF [16:38:27] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create intermediary table for weighted bot signal scoring (pageviews) - https://phabricator.wikimedia.org/T418154 (10Antoine_Quhen) 03NEW [16:51:49] 06Data-Engineering, 10Data Pipelines: Provide an easy way for MediaWiki to fetch aggregate data from the data lake - https://phabricator.wikimedia.org/T341649#11641446 (10Ottomata) FYI, OW6.2 is a 'Derived Data' related KR in FY 2026-2027 that will be focused on this kind of issue. 
[16:53:50] 06Data-Engineering, 06Data-Platform-SRE: Requesting a Kerberos identity for shell user lerickson (lerickson@wikimedia.org) - https://phabricator.wikimedia.org/T418135#11641457 (10Ottomata) [17:37:20] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refine: persist resolved stream configurations to make reruns deterministic - https://phabricator.wikimedia.org/T418151#11641653 (10Ahoelzl) Duplicate with more detailed problem statement: https://phabricator.wikimedia.org/T418021 [17:40:36] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Airflow Refine: re-run of failed mapped tasks with changed stream config can cause silent data loss or data duplication - https://phabricator.wikimedia.org/T418021#11641684 (10Ahoelzl) →14Duplicate dup:03T418151 [17:40:38] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refine: persist resolved stream configurations to make reruns deterministic - https://phabricator.wikimedia.org/T418151#11641686 (10Ahoelzl) [17:47:19] 06Data-Engineering, 10Data Pipelines, 06Data-Platform-SRE (2026-02-13 - 2026-03-06), 07Essential-Work: Airflow dynamic task mapping logs mix up when, on rerun, an id is mapped to a different map_index_template - https://phabricator.wikimedia.org/T408802#11641715 (10Antoine_Quhen) We can try to implement th... 
[17:48:52] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refine: persist resolved stream configurations to make reruns deterministic - https://phabricator.wikimedia.org/T418151#11641728 (10Antoine_Quhen) We could do the same for file exporter jobs following same pattern as Refine: T408802 [17:49:04] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Feb 2026, Extend webrequest retention time to 180 days - https://phabricator.wikimedia.org/T418162 (10Ahoelzl) 03NEW [17:49:42] 06Data-Engineering, 06Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11641749 (10Ahoelzl) Extended retention time change tracking: https://phabricator.wikimedia.org/T418162 [17:51:22] 06Data-Engineering, 06Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11641767 (10Ahoelzl) [17:55:55] 06Data-Engineering, 06Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11641776 (10Ahoelzl) [18:59:07] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting Kerberos access for ASanford-WMF - https://phabricator.wikimedia.org/T417447#11641981 (10BTullis) a:03BTullis [19:23:20] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 13Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11642076 (10Snwachukwu) I did a manual CIM run just comparing the number of rows alone, only the `wmf_contributors.commons_category_metrics_snapsho... 
[19:42:10] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Airflow Refine: re-run of failed mapped tasks with changed stream config can cause silent data loss or data duplication - https://phabricator.wikimedia.org/T418021#11642173 (10Antoine_Quhen) [19:42:11] 06Data-Engineering, 10Data Pipelines, 06Data-Platform-SRE (2026-02-13 - 2026-03-06), 07Essential-Work: Airflow dynamic task mapping logs mix up when, on rerun, an id is mapped to a different map_index_template - https://phabricator.wikimedia.org/T408802#11642174 (10Antoine_Quhen) [19:43:38] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refine: persist resolved stream configurations to make reruns deterministic - https://phabricator.wikimedia.org/T418151#11642190 (10Antoine_Quhen) [19:43:39] 06Data-Engineering, 10Data Pipelines, 06Data-Platform-SRE (2026-02-13 - 2026-03-06), 07Essential-Work: Airflow dynamic task mapping logs mix up when, on rerun, an id is mapped to a different map_index_template - https://phabricator.wikimedia.org/T408802#11642191 (10Antoine_Quhen) [19:45:56] 06Data-Engineering (Q3 FY25/26 January 1st - March 31th), 13Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11642209 (10Snwachukwu) Seems like this may have to do with the fact that all usage_map values in the intermidiate table category_and_media_with_us... [21:26:08] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting Kerberos access for ASanford-WMF - https://phabricator.wikimedia.org/T417447#11642583 (10BTullis) I have actioned this request, since creating a kerberos principal doesn't confer any greater privileges now, compared with membership o... [21:32:36] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting Kerberos access for ASanford-WMF - https://phabricator.wikimedia.org/T417447#11642651 (10BTullis) 05Open→03Resolved @ASanford-WMF - This should all be complete now. 
You should have received an email with a temporary kerberos p... [22:59:23] 06Data-Engineering: spark-sql warns about mismatching table schema for event.EditAttemptStep - https://phabricator.wikimedia.org/T418065#11642869 (10amastilovic) A somewhat better way to manually fix this is to determine the difference in Hive vs Spark schemas, and apply `ALTER TABLE ... ALTER COLUMN` in Spark S... [23:06:55] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting a Kerberos identity for shell user lerickson (lerickson@wikimedia.org) - https://phabricator.wikimedia.org/T418135#11642876 (10BTullis) [23:07:08] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting a Kerberos identity for shell user lerickson (lerickson@wikimedia.org) - https://phabricator.wikimedia.org/T418135#11642877 (10BTullis) a:03BTullis [23:08:50] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting a Kerberos identity for shell user lerickson (lerickson@wikimedia.org) - https://phabricator.wikimedia.org/T418135#11642882 (10BTullis) All approvals have already been obtained on the linked ticket, so I have gone ahead and created t... [23:13:24] 06Data-Engineering, 06Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting a Kerberos identity for shell user lerickson (lerickson@wikimedia.org) - https://phabricator.wikimedia.org/T418135#11642911 (10BTullis) 05Open→03Resolved The required kerberos flag in `data.yaml` was already added in https://...