[00:37:22] Data-Engineering, Data-Persistence, MediaWiki-extensions-OATHAuth, Security-Team, and 4 others: New database table for tracking WebAuthn userHandle values (oathauth_user_handles) - https://phabricator.wikimedia.org/T416544#11630381 (Catrope)
[00:37:46] Data-Engineering, Data-Persistence, MediaWiki-extensions-OATHAuth, Security-Team, and 4 others: New database table for tracking WebAuthn userHandle values (oathauth_user_handles) - https://phabricator.wikimedia.org/T416544#11630385 (Catrope) In progress→Resolved
[00:38:02] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11630387 (fkaelin) The content diff pipeline uses [[ https://github.com/google/diff-match-patch...
[09:30:11] (Abandoned) Aqu: Maintain Spark schema after Hive DDL operations [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1199473 (https://phabricator.wikimedia.org/T307040) (owner: Aqu)
[09:37:51] Data-Engineering, MW-Interfaces-Team, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864 (daniel) NEW
[09:53:24] Data-Engineering, Data-Platform-SRE, Wikidata, Wikidata Analytics: WMDE test Airflow instance can't egress out to external URLs - https://phabricator.wikimedia.org/T417630#11631150 (AndrewTavis_WMDE) Open→Resolved a:AndrewTavis_WMDE The changes for T417754 have fixed the issue 😊 S...
[09:53:57] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Wikidata, Wikidata Analytics: Airflow dev environment configuration doesn't have skein as default launcher - https://phabricator.wikimedia.org/T417754#11631168 (AndrewTavis_WMDE) The changes have fixed the issue 😊 See screenshot below for a fini...
[11:29:50] !log Test Kitchen edge-unique experiments (poll 153734) - adds: mobile-toc-abc; removes: none; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[11:29:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:23:23] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11631902 (JAllemandou)
[13:24:16] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: strip x-wmf-* headers from responses - https://phabricator.wikimedia.org/T417781#11631905 (JAllemandou)
[13:31:23] (CR) Joal: [V:+2 C:+2] "<3" [analytics/refinery] - https://gerrit.wikimedia.org/r/1240253 (https://phabricator.wikimedia.org/T414478) (owner: Joal)
[13:56:43] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11632039 (JAllemandou)
[14:11:18] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Wikidata, Wikidata Analytics: Airflow dev environment configuration doesn't have skein as default launcher - https://phabricator.wikimedia.org/T417754#11632113 (xcollazo) Thanks for verifying @AndrewTavis_WMDE!
[14:17:18] Data-Engineering (Q3 FY25/26 January 1st - March 31th), DPE-Mediawiki-Content: Use wmf.mediawiki_history as baseline for slo completeness - https://phabricator.wikimedia.org/T416312#11632134 (APizzata-WMF) Tested the split pipelines pushed in the MR and got a completeness of `0.9974045052252971` for the...
[14:21:01] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11632139 (daniel)
[14:21:25] Data-Engineering, DBA, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11632147 (fnegri)
[14:22:07] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11632148 (daniel) Let's postpone capturing x-wmf-user-id. x-wmf-ratelimit-class is the urgent one. Not...
[14:36:26] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T410431#11632227 (xcollazo) Just checked again all `wmf_content` tables for duplicates since the monthly reconcile is now done. All...
[14:41:53] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T410431#11632260 (APizzata-WMF) 🎉🎉🎉🎉🎉🎉🎉🎉
[14:56:40] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11632377 (Clement_Goubert)
[14:59:25] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11632385 (Clement_Goubert) In the merged task, I was proposing not creating a new header, but instead a...
[15:32:23] Data-Engineering, Machine-Learning-Team, Event-Platform: Emit article quality predictions as a stream and expose in EventStreams API. - https://phabricator.wikimedia.org/T417794#11632569 (Isaac) Thanks @Ottomata! Chiming in with some more details and to connect dots: * Tagging @ifried as Connection T...
[16:08:35] Data-Engineering, MediaWiki-extensions-EventLogging, Essential-Work, Technical-Debt, Test Kitchen (Experiment Platform Sprint 19): Migrate 1 instrument using mw.eventLog.newInstrument() to mw.xLab.getInstrument() - https://phabricator.wikimedia.org/T408096#11632723 (KReid-WMF) Open→0...
[16:10:29] Data-Engineering, MediaWiki-extensions-EventLogging, Essential-Work, Patch-For-Review, and 2 others: Decouple mw.xLab.getInstrument() and mw.eventLog.newInstrument() - https://phabricator.wikimedia.org/T408094#11632759 (KReid-WMF)
[16:38:09] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Troubleshoot duplicates issue in mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T410431#11632884 (xcollazo) In progress→Resolved
[17:12:54] Data-Engineering, Data-Platform, Moderator-Tools-Team, Product-Analytics (Kanban): Personal Dashboard Instrumentation Superset Dashboard - https://phabricator.wikimedia.org/T412137#11633029 (jsn.sherman) >>! In T412137#11550018, @MNeisler wrote: > @Kgraessle @DMburugu - Have the events needed for...
[18:42:07] Data-Engineering (Q3 FY25/26 January 1st - March 31th), OKR-Work, Patch-For-Review: SDS 2.2.6 Improve experiment event data data lake management - https://phabricator.wikimedia.org/T414105#11633655 (AKhatun_WMF) ===== Done: ===== - MR for HQL files in `product-analytics/data-pipelines` repo is merge...
[18:49:09] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Sqlfluff Rules for dbt - https://phabricator.wikimedia.org/T417152#11633668 (Mayakp.wiki) I ran sqlfluff on the cluster today using the steps @JMonton-WMF provided 2 days ago and was successfully able to pass all the linter rules in the current version...
[19:05:59] Data-Engineering-Roadmap, Data Pipelines, Epic: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#11633769 (Ahoelzl) a:Ahoelzl→None
[19:06:43] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines, Wikidata, Wikidata Analytics: NEW FEATURE REQUEST: sqoop (all) user properties from mariadb to wmf_raw.mediawiki_user_properties - https://phabricator.wikimedia.org/T323456#11633774 (Milimetric) Yes, this would constitute a b...
[19:27:55] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Sqlfluff Rules for dbt - https://phabricator.wikimedia.org/T417152#11633832 (Ahoelzl) Open→Resolved
[19:28:00] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Wikidata, Wikidata Analytics: Airflow dev environment configuration doesn't have skein as default launcher - https://phabricator.wikimedia.org/T417754#11633834 (Ahoelzl) In progress→Resolved
[19:28:02] Data-Engineering (Q3 FY25/26 January 1st - March 31th): File Export files with revision ranges do not show end revision in filename - https://phabricator.wikimedia.org/T416176#11633836 (Ahoelzl) In progress→Resolved
[19:28:04] Data-Engineering (Q3 FY25/26 January 1st - March 31th): OpsWeek: wikidatawiki fails to export due to OOM on partitionByXMLFileBoundaries - https://phabricator.wikimedia.org/T416642#11633837 (Ahoelzl) Open→Resolved
[19:28:05] Data-Engineering (Q3 FY25/26 January 1st - March 31th): [Iceberg Migration] Extend Iceberg table maintenance mechanism to support multiple Airflow instances - https://phabricator.wikimedia.org/T373693#11633839 (Ahoelzl) Open→Resolved
[19:28:07] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a `DbtSkeinOperator` in the Airflow `wmf_airflow_common` library - https://phabricator.wikimedia.org/T415194#11633843 (Ahoelzl) Open→Resolved
[19:28:11] Data-Engineering (Q3 FY25/26 January 1st - March 31th), CheckUser, Product Safety and Integrity: Changes to the cuc_agent column in the cu_changes table - https://phabricator.wikimedia.org/T361210#11633838 (Ahoelzl) Open→Resolved
[19:28:15] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Sqlufluff on Stat hosts - https://phabricator.wikimedia.org/T417171#11633845 (Ahoelzl) Open→Resolved
[19:28:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MW-1.46-notes (1.46.0-wmf.14; 2026-02-03): Send client signals in various ways to understand new data - https://phabricator.wikimedia.org/T416472#11633846 (Ahoelzl) Open→Resolved
[19:28:27] Data-Engineering (Q3 FY25/26 January 1st - March 31th): OpsWeek: Bump memory of refine job for product_metrics.web_base_with_ip to avoid recent OOMs - https://phabricator.wikimedia.org/T416719#11633849 (Ahoelzl) Open→Resolved
[19:28:31] Data-Engineering (Q3 FY25/26 January 1st - March 31th): aggregate_for_fundraising_hourly failing for last 24 hours - https://phabricator.wikimedia.org/T415267#11633851 (Ahoelzl) Open→Resolved
[19:51:26] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11634010 (Ottomata) I like that difflib generates a standard (unified?) diff that can be applied...
[19:53:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11634037 (Ottomata) Another q: Which direction should the diff be created? On edit, we save la...
[20:30:32] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Content-Transform-Team, MW-Interfaces-Team, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11634270 (Ottomata) > Also, If we keep an e-tag, don't we also...
[20:33:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Content-Transform-Team, MW-Interfaces-Team, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11634284 (Ottomata) I'm currently leaning towards Option D (T4...
[20:45:04] Data-Engineering, Machine-Learning-Team, Event-Platform: Emit article quality predictions as a stream and expose in EventStreams API. - https://phabricator.wikimedia.org/T417794#11634314 (fkaelin) For reference, there are also offline components to this: - [[ https://datahub.wikimedia.org/dataset/urn...
[21:08:00] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Content-Transform-Team, MW-Interfaces-Team, Event-Platform: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11634355 (Ottomata) > I don't know if a RESTy URL is the best...
[21:25:31] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#11634479 (Isaac) > Which direction should the diff be created? On edit, we save latest rev html....
[21:48:34] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Content-Transform-Team, MW-Interfaces-Team, Event-Platform, Patch-For-Review: Common event data model for data derived from parsed page revision content - https://phabricator.wikimedia.org/T415158#11634562 (Ottomata) [[ https://gitlab.wi...
[21:52:16] Data-Engineering (Q3 FY25/26 January 1st - March 31th): OpsWeek: Bump memory of refine job for product_metrics.web_base_with_ip to avoid recent OOMs - https://phabricator.wikimedia.org/T416719#11634569 (Milimetric) we found another trickier issue - rerunning some of these jobs fetched new config but caus...