[01:02:29] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list March 2025 - https://phabricator.wikimedia.org/T390646 (GFontenelle_WMF) NEW
[06:27:49] (CR) Joal: [C:+1] "I think you should have waited to merge tis, in order to sync with the puppet one. No harm, but if a deploy happens this morning without p" [analytics/refinery] - https://gerrit.wikimedia.org/r/1132031 (https://phabricator.wikimedia.org/T390247) (owner: Aleksandar Mastilovic)
[08:01:37] (PS1) Joal: Update webrequest schemas for HAProxy migration [analytics/refinery] - https://gerrit.wikimedia.org/r/1133068 (https://phabricator.wikimedia.org/T386177)
[08:27:26] (CR) Aqu: [C:+1] "Looks good." [analytics/refinery] - https://gerrit.wikimedia.org/r/1133068 (https://phabricator.wikimedia.org/T386177) (owner: Joal)
[08:32:23] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1133068 (https://phabricator.wikimedia.org/T386177) (owner: Joal)
[08:34:36] !log Drop wmf_raw.webrequest table
[08:34:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:36:04] !log Drop wmf_staging.webrequest_frontend table (raw for HAProxy)
[08:36:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:38:30] !log Recreate wmf_raw.webrequest and create wmf_deprecated.webrequest_raw
[08:38:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:55:53] !log Drop wmf.webrequest and wmf_staging.webrequest
[08:55:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:58:25] !log Recreate wmf.webrequest and wmf_deprecated.webrequest
[08:58:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:59:23] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10697494 (brouberol) I'm going to adjust retention of the following topics: - `webrequest_text` goes from 7d to 3d - `...
[09:04:27] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10697534 (JAllemandou) > All good? YES :)
[09:07:04] !log Rebuild 1 day of wmf_raw.webrequest partitions for both text and upload
[09:07:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:07:32] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10697542 (brouberol) ` brouberol@kafka-jumbo1014:~$ kafka configs --alter --entity-type topics --entity-name webrequest...
[09:07:48] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10697543 (brouberol) ` brouberol@kafka-jumbo1014:~$ kafka configs --alter --entity-type topics --entity-name webrequest...
[09:10:08] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10697548 (brouberol) The `webrequest_{text,upload}` topics now have a retention of 3d. ` brouberol@kafka-jumbo1014:~$...
[09:50:24] (PS1) Joal: Hotfix for webrequest migration [analytics/refinery] - https://gerrit.wikimedia.org/r/1133084 (https://phabricator.wikimedia.org/T386177)
[09:50:58] (CR) Aqu: [C:+1] Hotfix for webrequest migration [analytics/refinery] - https://gerrit.wikimedia.org/r/1133084 (https://phabricator.wikimedia.org/T386177) (owner: Joal)
[09:51:48] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1133084 (https://phabricator.wikimedia.org/T386177) (owner: Joal)
[09:53:22] !log moved, dropped and recreating webrequest_sequence_stats tables
[09:53:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:55:19] !log Deploying REfinery using scap
[09:55:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:02:06] !log Deploying refinery on HDFS on both test and production
[10:02:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:07:16] Data-Engineering (Q3 2025 January 1st - March 31th): Provide a list of Phabricator tags relevant for Data-Engineering and use them on Data-Engineering board - https://phabricator.wikimedia.org/T386752#10697777 (BTullis) >>! In T386752#10683033, @Ottomata wrote: > cc also @BTullis who was [[ https://wikimedia...
[10:26:37] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10697849 (Lucas_Werkmeister_WMDE) >>! In T3...
[10:57:06] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10697955 (Ladsgroup)
[12:28:05] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Research, Research-engineering: Update knowledge gaps pipeline to use article quality features V2 - https://phabricator.wikimedia.org/T386003#10698488 (fkaelin) Stalled→Resolved An updated content gap metrics...
[13:08:39] (PS1) Gerrit maintenance bot: Add nup.wikipedia to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1133139 (https://phabricator.wikimedia.org/T390711)
[13:16:19] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10698908 (Ladsgroup) Open→Resolved
[13:50:22] Data-Engineering, Data-Platform-SRE: Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727 (Snwachukwu) NEW
[14:20:37] Data-Engineering: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734 (Ben.buchenau) NEW
[14:25:12] Data-Engineering, Data-Platform-SRE: Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10699379 (xcollazo) This should be fixed via {T380624}. Let's wait on conclusion of that work.
[14:27:32] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10699390 (Stevemunene) a:Stevemunene
[14:29:48] Data-Engineering, CheckUser, DBA, Trust and Safety Product Team, Schema-change-in-production: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903#10699400 (Ladsgroup) Open→Resolved
[15:39:46] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10699855 (achou) > Just curious, why is this the case? Are there non deterministic inputs to the article count...
[15:40:05] (CR) Aleksandar Mastilovic: [V:+2 C:+2] "Oh, sorry about that!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1132031 (https://phabricator.wikimedia.org/T390247) (owner: Aleksandar Mastilovic)
[15:45:35] Data-Engineering (Q4 2025 April 1st - June 30th), DPE HAProxy Migration: [HAProxy migration] Drop webrequest_deprecated dag & linked wmf_deprecated tables (202) - https://phabricator.wikimedia.org/T390752#10699887 (Ahoelzl)
[15:48:13] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10699907 (Ottomata) > what kinds of changes would the mediawiki.revision_change.v1 stream capture besides visi...
[15:52:53] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10699928 (Ottomata) > Can we say that the revert risk score is deterministic on revision content alone and is...
[16:04:18] (CR) Joal: [V:+2 C:+2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1133139 (https://phabricator.wikimedia.org/T390711) (owner: Gerrit maintenance bot)
[16:24:58] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10700071 (achou) > I think the only revision properties that can change directly are visibility (rev_deleted)...
[16:27:30] Data-Engineering (Q3 2025 January 1st - March 31th), Experimentation Lab: NEW/CHANGE FEATURE REQUEST: Documentation for v1 Enterprise endpoint deprecation - https://phabricator.wikimedia.org/T389542#10700085 (xcollazo) Open→Resolved Closing, as I can see the changes reflected at https://dumps...
[16:30:24] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10700094 (Ottomata) Right, if this is being used for moderator tools, it is unlikely that moderators are looki...
[16:46:20] Data-Engineering (Q3 2025 January 1st - March 31th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10700176 (tchin) Currently encountering this error, with the search instance, think it's the same issue we encountered when migrating the analytics instance to...
[17:26:41] Data-Engineering, Data-Platform-SRE, SRE-Access-Requests: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#10700350 (Ottomata)
[17:26:55] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, SRE-Access-Requests: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#10700353 (Ottomata)
[17:29:23] Data-Engineering, Data-Engineering-Radar, MW-Interfaces-Team, WMF-JobQueue, Event-Platform: Bug: event validation error: bad mediawiki.job.* meta.request_id field - https://phabricator.wikimedia.org/T390013#10700368 (Ottomata)
[17:30:07] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, SRE: Rebuild Spark images with Bookworm / bullseye-backports deprecation - https://phabricator.wikimedia.org/T390139#10700376 (Ottomata)
[17:30:53] Data-Engineering (Q4 2025 April 1st - June 30th): Migrate Gobblin job repository to GitLab - https://phabricator.wikimedia.org/T390247#10700385 (Ottomata)
[17:32:56] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-Campaigns, MediaWiki-extensions-EventLogging, and 3 others: Remove deprecated parameters from ServerSideAccountCreation - https://phabricator.wikimedia.org/T389819#10700393 (Ottomata)
[17:32:58] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-Campaigns, Event-Platform, Technical-Debt: ServerSideAccountCreation violates Identifier Naming Rules? - https://phabricator.wikimedia.org/T389822#10700396 (Ottomata)
[17:34:40] Data-Engineering, Data-Platform-SRE: Create views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10700401 (Ottomata)
[17:35:45] Data-Engineering, Data-Engineering-Radar, MediaWiki-General, Schema-change: Change data type of namespace fields from int to smallint or mediumint - https://phabricator.wikimedia.org/T383507#10700405 (Ottomata)
[17:36:09] Data-Engineering, Data-Engineering-Radar, Commons, Data-Persistence, and 5 others: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) - https://phabricator.wikimedia.org/T28741#10700410 (Ottomata)
[17:43:03] Data-Engineering, Event-Platform: Event Platform - Support JSON Schema draft-2019-09 schemas - https://phabricator.wikimedia.org/T390232#10700445 (VirginiaPoundstone) Great thinking in here. This will move to backlog for now as a very nice to have.
[17:44:11] Data-Engineering: NEW/CHANGE FEATURE REQUEST: Make Event Registration Tool's data available in Data Lake - https://phabricator.wikimedia.org/T389662#10700455 (VirginiaPoundstone) Open→Declined
[17:45:04] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10700458 (diego) >> Another Q about revertrisk. Are visibilty settings relevant to possible revert risk? E.g....
[17:47:41] Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Observability-Metrics, Essential-Work, and 2 others: Disable Data Platform Engineering generated graphite metrics and dashboards - https://phabricator.wikimedia.org/T372855#10700465 (Snwachukwu) @AndrewTavis_WMDE This is good ne...
[17:50:53] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content: Figure root cause of silent failures when computing metrics for mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T387033#10700469 (xcollazo) Resolved→Open Reopening as we had another instance of skein OOM:...
[17:51:31] (CR) Snwachukwu: 1.Add a closed flag to the project namespace map dataset 2. Add a whether to sqoop flag by checking if wikidb exists in cloud replica. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1125184 (https://phabricator.wikimedia.org/T241741) (owner: Milimetric)
[17:52:01] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content: Figure root cause of silent failures when computing metrics for mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T387033#10700474 (xcollazo) CCing @tchin for awareness.
[17:52:35] Data-Engineering, Data-Engineering-Radar, MW-Interfaces-Team, Trust and Safety Product Team, and 3 others: Bug: event validation error: bad mediawiki.job.* meta.request_id field - https://phabricator.wikimedia.org/T390013#10700479 (Dreamy_Jazz) Open→Resolved a:Dreamy_Jazz These lo...
[18:01:31] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE: Do performance testing of a big Hadoop Table hosted by Ceph - https://phabricator.wikimedia.org/T381416#10700505 (xcollazo) Copy pasting part of conversation on {T381389} when testing a ~42TB query on top of Presto, for future reference: >...
[18:24:12] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10700609 (Ottomata) Thanks @diego. A more specific question about this stream: Do we ever intend to produce...
[18:35:26] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10700697 (diego) >>! In T326179#10700609, @Ottomata wrote: > Do we ever intend to produce messages into this...
[19:08:31] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10700773 (Ottomata) Great! Then @AikoChou are you okay with either of the names proposed in https://phabricato...
[19:43:57] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, MediaWiki-Platform-Team, serviceops: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10700944 (Jdforrester-WMF)
[20:14:07] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: mw_content_reconcile_mw_content_history_monthly is not sensing correctly - https://phabricator.wikimedia.org/T390783 (xcollazo) NEW
[20:41:35] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: mw_content_reconcile_mw_content_history_monthly is not sensing correctly - https://phabricator.wikimedia.org/T390783#10701296 (xcollazo)
[20:54:06] Data-Engineering (Q3 2025 January 1st - March 31th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10701370 (tchin) Actually nevermind, the error we had last time was ` Caused by: datahub.shaded.org.apache.kafka.common.config.ConfigException: No resolvable bo...
[21:58:57] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, SRE, SRE-Access-Requests: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#10701569 (Dzahn)
[22:15:31] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, SRE, and 2 others: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#10701615 (Dzahn) ACK! per [[https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#Analytics_Groups | .. one of the...
[22:21:23] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, SRE, and 2 others: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#10701619 (Dzahn) @Ben.buchenau I ran the `manage_principals.py create benbuchenau ..` command to create a Kerberos principal for...
[22:22:47] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, SRE, and 2 others: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#10701621 (Dzahn) Open→In progress a:Ben.buchenau Please let us know if everything works for you now.