[00:51:15] Data-Engineering: spark-sql warns about mismatching table schema for event.EditAttemptStep - https://phabricator.wikimedia.org/T418065#11643088 (Ottomata) So, the HiveMetastore had the fields, but Spark's version in TBLPROPERTIES did not, is that correct? If so, then that makes sense. But @amastilovic! AL...
[01:40:47] Data-Engineering: spark-sql warns about mismatching table schema for event.EditAttemptStep - https://phabricator.wikimedia.org/T418065#11643152 (amastilovic) @Ottomata yeah I just ran that command above, via `wmf.spark.run` in my Jupyter notebook. The trick with the `struct`s is that you have to provide a wh...
[01:42:16] Data-Engineering: spark-sql warns about mismatching table schema for event.EditAttemptStep - https://phabricator.wikimedia.org/T418065#11643155 (amastilovic) >>! In T418065#11643088, @Ottomata wrote: > So, the HiveMetastore had the fields, but Spark's version in TBLPROPERTIES did not, is that correct? If so...
[02:03:34] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refactor pingback reports pipelines using dbt - https://phabricator.wikimedia.org/T418190 (amastilovic) NEW
[02:17:59] Data-Engineering: spark-sql warns about mismatching table schema for event.EditAttemptStep - https://phabricator.wikimedia.org/T418065#11643237 (Ottomata) @amastilovic uh thank you. Continuing convo over in {T209453}.
[02:19:52] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#11643241 (Ottomata) @amastilovic, if I understand your explanation correctly in, if we modify this [[ https://github.com/wikimedia/analytics-refinery-sourc...
[02:25:27] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#11643250 (amastilovic) @Ottomata I'm not sure if it will work when using a Hive adapter, but it should work through a Spark adapter since it works from Jup...
[02:54:06] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#11643278 (Ottomata) Right! That is the hope of this task. To run Hive DDLs via spark.sql rather than over Hive JDBC.
[06:25:24] Data-Engineering, DBA, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11643486 (Marostegui)
[09:09:32] !log Test Kitchen edge-unique experiments (poll 174781) - adds: mobile-toc-abc; removes: none; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[09:09:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:22:54] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights (FY25-26 H2): dbt repository structure (Milestone 3) - https://phabricator.wikimedia.org/T416672#11643976 (JMonton-WMF) I also think the Team-first approach would be better, and I like the proposal of sticking to dbt's proposal of...
[09:39:26] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work: Refine generates very large XCOM values - https://phabricator.wikimedia.org/T414953#11644052 (brouberol)
[10:31:41] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refactor pingback reports pipelines using dbt - https://phabricator.wikimedia.org/T418190#11644291 (GGoncalves-WMF) Nice! Just wondering, do we know who uses that output CSV and where? I'm asking for my own education (I don't know much about pingback),...
[12:35:39] !log failing over HDFS namenode services to an-master1004 for T414948
[12:35:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:35:42] T414948: Decommission an-worker11[17-41] - https://phabricator.wikimedia.org/T414948
[13:52:59] Data-Engineering, Machine-Learning-Team, Event-Platform: Create new mediawiki links change streams based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399#11645088 (Ottomata) > there is no way to fetch the renderId from the revision API ? Answering my own question:...
[14:12:58] Data-Engineering, Wikidata, Wikidata Analytics: Add rcshowwikidata property to the existing PrefUpdate instrumentation for wmf_raw.mediawiki_user_properties - https://phabricator.wikimedia.org/T418246 (AndrewTavis_WMDE) NEW
[14:14:35] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines, Wikidata, Wikidata Analytics: NEW FEATURE REQUEST: sqoop (all) user properties from mariadb to wmf_raw.mediawiki_user_properties - https://phabricator.wikimedia.org/T323456#11645208 (AndrewTavis_WMDE) Thanks for the direction...
[14:24:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE, OKR-Work, Patch-For-Review: Provide a Spark-on-k8s access for sql tools (dbt) - https://phabricator.wikimedia.org/T410017#11645298 (BTullis) a:BTullis→None I'm moving this back to the main #data-platform-sre backlo...
[14:25:55] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#11645313 (Ottomata) FYI, the query @amastilovic ran was in T418065#11642869. We should investigate to see if this will help us solve this bug.
[14:38:52] !log failing over HDFS namenode services to an-master1003 for T414948
[14:38:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:38:55] T414948: Decommission an-worker11[17-41] but reuse an-worker11[17,18,31,33,34] as dse-k8s-workers - https://phabricator.wikimedia.org/T414948
[14:44:55] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11645483 (xcollazo) @Snwachukwu can you share the parameters you used to run this test? I suspect perhaps the previous step, [[ https://airflow....
[15:19:26] Data-Engineering, ChangeProp, EventStreams, iPoid-Service, and 16 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11645938 (Dreamy_Jazz)
[15:20:12] Data-Engineering, ChangeProp, EventStreams, MediaWiki-Engineering, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11645944 (Dreamy_Jazz) Per {T416623}, #ipoid-service will soon be undeployed. #similarusers is archived so I think that...
[15:20:20] Data-Engineering, ChangeProp, EventStreams, MediaWiki-Engineering, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11645946 (Dreamy_Jazz)
[15:23:59] Data-Engineering, ChangeProp, EventStreams, MediaWiki-Engineering, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11645965 (brouberol)
[15:32:39] Data-Engineering, Machine-Learning-Team, Event-Platform, Patch-For-Review: Add Multilingual RevertRisk predictions to mediawiki.page_revert_risk_prediction_change - https://phabricator.wikimedia.org/T415892#11646013 (gkyziridis) === Update === I updated and test the model on docker using newer bl...
[15:34:27] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Product Safety and Integrity, Product-Analytics (Kanban): Add mediawiki_product_metrics_incident_reporting_system_interaction to the sanitization allowlist - https://phabricator.wikimedia.org/T384650#11646050 (Dreamy_Jazz)
[15:36:43] Data-Engineering, Data-Engineering-Radar, CampaignEvents, Connection-Team, and 17 others: Add namespace descriptions for Special:NamespaceInfo in WMF-deployed extensions - https://phabricator.wikimedia.org/T373070#11646112 (Dreamy_Jazz)
[15:37:16] Data-Engineering, Data-Engineering-Radar, DPE Temporary Accounts, Product Safety and Integrity, and 3 others: Ensure performer attributes in schemas clarify if the user is a temporary account - https://phabricator.wikimedia.org/T374940#11646122 (Dreamy_Jazz)
[15:47:06] Data-Engineering, Data-Engineering-Icebox, Product Safety and Integrity, SRE, and 3 others: Include User-Agent Client Hints in WebRequest logs - https://phabricator.wikimedia.org/T337947#11646291 (Dreamy_Jazz)
[15:47:36] Data-Engineering, Data-Engineering-Radar, Product Safety and Integrity: Mitigate unwanted side effects to anti abuse work from Google Chrome's IP Protection rollout - https://phabricator.wikimedia.org/T350428#11646298 (Dreamy_Jazz)
[15:48:00] Data-Engineering, Data-Engineering-Radar, EventStreams, MediaWiki-Page-protection, and 5 others: Create Mediawiki "oversightprotect" action that suppresses usernames of all edits of a page - https://phabricator.wikimedia.org/T354577#11646301 (Dreamy_Jazz)
[16:34:27] Data-Engineering, MW-Interfaces-Team, Traffic, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11646998 (Vgutierrez) >>! In T417864#11632385, @Clement_Goubert wrote: > In the merged task, I was prop...
[16:56:40] (PS1) Andrew McAllister (WMDE): T416680 Split active_user_changes.sql into user/temp account versions and run both Bug: T416680 [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680)
[16:57:25] (CR) CI reject: [V:-1] T416680 Split active_user_changes.sql into user/temp account versions and run both Bug: T416680 [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680) (owner: Andrew McAllister (WMDE))
[17:05:24] (PS2) Andrew McAllister (WMDE): T416680 Split active_user_changes.sql into user/temp account versions and run both Bug: T416680 [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680)
[17:10:16] (PS3) Andrew McAllister (WMDE): T416680 Split active_user_changes.sql into user/temp account versions and run both Bug: T416680 [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680)
[17:14:08] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to Flink 1.20.2 and Java 17 - https://phabricator.wikimedia.org/T408918#11647286 (Ottomata) Okay, I've submitted another [[ https://gitlab.wikimedia.or...
[17:24:35] (PS1) Andrew McAllister (WMDE): Add license and contributing guide + update readme [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181
[17:38:14] FIRING: HaproxykafkaNoMessages: [WARNING] - Haproxykafka on cp2043 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2043 - https://alerts.wikimedia.org/?q=alertname%3DHaproxykafkaNoMessages
[17:40:01] (PS2) Andrew McAllister (WMDE): Add license and contributing guide + update readme [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181
[17:40:17] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Feb 2026, Extend webrequest retention time to 180 days - https://phabricator.wikimedia.org/T418162#11647499 (Ahoelzl)
[17:49:02] (PS4) Andrew McAllister (WMDE): T416680 Split active_user_changes.sql into user/temp account versions and run both Bug: T416680 [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243177 (https://phabricator.wikimedia.org/T416680)
[17:51:51] (PS3) Andrew McAllister (WMDE): Add license and contributing guide + update readme [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181
[18:25:50] (PS4) Andrew McAllister (WMDE): Add license and contributing guide + update readme [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181
[18:41:37] (PS5) Andrew McAllister (WMDE): Add license and contributing guide + update readme [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1243181
[18:53:06] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to Superset for mikez - https://phabricator.wikimedia.org/T418098#11647956 (Dzahn)
[18:53:23] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-platform-eng-admins for milimetric - https://phabricator.wikimedia.org/T417906#11647957 (Dzahn)
[19:11:20] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11648018 (Snwachukwu) @xcollazo Found the culprit. 😔 That new inner join produced no rows. Of course that's because the mediawiki_imagelinks ta...
[19:19:51] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11648097 (Snwachukwu) Okay, For now, let's not change the il_to field to null. We keep it so that I can do a comparison for next sqoop run. Then...
[19:24:00] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11648102 (Snwachukwu) I'd quickly run a sqoop for mediawiki_imagelinks table
[19:28:17] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Feb 2026, Extend webrequest retention time to 180 days - https://phabricator.wikimedia.org/T418162#11648119 (JAllemandou)
[20:20:11] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Content-Transform-Team, MW-Interfaces-Team, Event-Platform, Patch-For-Review: Common event data model for data derived from parsed page revision html (and more!) - https://phabricator.wikimedia.org/T415158#11648356 (Ottomata)
[20:24:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Essential-Work, Event-Platform, Patch-For-Review: Upgrade mediawiki-event-enrichment jobs to Flink 1.20.2 and Java 17 - https://phabricator.wikimedia.org/T408918#11648375 (Ottomata) > Next up: dive into the test failures to figure out what i...
[20:27:58] !log increase webrequest retention from 90 to 180 days with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1243205 for T418162
[20:28:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:28:02] T418162: Feb 2026, Extend webrequest retention time to 180 days - https://phabricator.wikimedia.org/T418162
[20:28:52] Data-Engineering, Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11648386 (BTullis)
[21:16:45] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11648537 (Ahoelzl)
[21:16:53] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11648538 (Ahoelzl) Open→In progress
[22:37:02] Data-Engineering (Q3 FY25/26 January 1st - March 31th): File BAR for Cursor and Claude Code - https://phabricator.wikimedia.org/T416937#11648827 (Ahoelzl) https://wikimedia.coupahost.com/contract/easy_form_responses/8455