[00:06:40] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list June 2025 - https://phabricator.wikimedia.org/T398149#10956950 (GFontenelle_WMF)
[03:18:31] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[06:40:58] Data-Engineering, SRE: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562#10957224 (Joe) >>! In T396562#10907844, @JAllemandou wrote: > I think this would be feasible as most of frontend data is already available in th...
[06:50:06] Data-Engineering-Roadmap, Discovery-Search, DPE-Mediawiki-Content, Epic: SUP: Use flink 1.20.1 - https://phabricator.wikimedia.org/T398159 (pfischer) NEW
[07:18:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[07:28:06] Data-Engineering-Roadmap, Discovery-Search, DPE-Mediawiki-Content: Flink: Update k8s operator to 1.12.0 - https://phabricator.wikimedia.org/T398162 (pfischer) NEW
[07:28:46] Data-Engineering-Roadmap, Discovery-Search, DPE-Mediawiki-Content: SUP: Use flink 1.20.1 - https://phabricator.wikimedia.org/T398159#10957301 (pfischer)
[08:04:51] Data-Engineering, Data-Engineering-Radar, Discovery-Search, Infrastructure-Foundations, and 2 others: Elasticsearch dependency upgrade in spicerack - https://phabricator.wikimedia.org/T390860#10957358 (RKemper) a:RKemper This is our top priority for this week. We'll be looking into replacing...
[08:19:21] Data-Engineering (Q4 2025 April 1st - June 30th): Grow airflow-main global parallelism - https://phabricator.wikimedia.org/T398164 (JAllemandou) NEW
[08:19:39] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE: Grow airflow-main global parallelism - https://phabricator.wikimedia.org/T398164#10957394 (JAllemandou)
[08:49:29] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.06.13 - 2025.07.04): Grow airflow-main global parallelism - https://phabricator.wikimedia.org/T398164#10957528 (BTullis) a:BTullis
[08:49:52] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.06.13 - 2025.07.04): Grow airflow-main global parallelism - https://phabricator.wikimedia.org/T398164#10957531 (BTullis) p:Triage→High
[09:12:02] Data-Engineering: mediawiki_api_request has no data for June 27 - https://phabricator.wikimedia.org/T398173 (Urbanecm_WMF) NEW
[09:24:15] Data-Engineering: mediawiki_api_request has no data for June 27 - https://phabricator.wikimedia.org/T398173#10957654 (JAllemandou) Hm, it looks like the data stopped being published on the 25th of June: https://grafana.wikimedia.org/goto/WF5B_myHR?orgId=1
[10:41:18] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to airflow-an and statboxes for htriedman - https://phabricator.wikimedia.org/T398075#10957961 (Clement_Goubert) Open→In progress p:Triage→Medium Hi, As far as I can tell, you have access to the `analytics-platform-eng-adm...
[10:43:46] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to airflow-an and statboxes for htriedman - https://phabricator.wikimedia.org/T398075#10957980 (Clement_Goubert)
[10:45:00] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to airflow-an and statboxes for htriedman - https://phabricator.wikimedia.org/T398075#10957983 (Clement_Goubert) An old version of the L3 document was signed, could you sign the updated version as well, please?
[10:45:09] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to airflow-an and statboxes for htriedman - https://phabricator.wikimedia.org/T398075#10957984 (Clement_Goubert) a:Clement_Goubert
[10:46:29] Data-Engineering, Data-Engineering-Radar, LDAP-Access-Requests, SRE, SRE-Access-Requests: Grant Access to analytics-privatedata-user (and LDAP nda, wmde) for Anton Kokh (WMDE) - https://phabricator.wikimedia.org/T395917#10957990 (Clement_Goubert) Open→Stalled Stalled waiting for SSH k...
[10:46:35] Data-Engineering, Data-Engineering-Radar, LDAP-Access-Requests, SRE, SRE-Access-Requests: Grant Access to analytics-privatedata-user (and LDAP nda, wmde) for Anton Kokh (WMDE) - https://phabricator.wikimedia.org/T395917#10957992 (Clement_Goubert) p:Triage→Medium
[10:49:47] Data-Engineering, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to airflow-an and statboxes for htriedman - https://phabricator.wikimedia.org/T398075#10958020 (Clement_Goubert)
[10:51:02] Data-Engineering, Data-Engineering-Radar, Epic, Wikimedia-production-error: eventgage-analytics has stopped producing events scine 2025-06-25 - https://phabricator.wikimedia.org/T398187 (gmodena) NEW
[10:51:35] Data-Engineering, Data-Engineering-Radar, Epic, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10958040 (BTullis)
[10:52:08] Data-Engineering, Data-Engineering-Radar, Epic, Tracking-Neverending: Data Platform Data Incidents - https://phabricator.wikimedia.org/T378559#10958042 (taavi)
[10:52:11] Data-Engineering, Data-Engineering-Radar, Epic, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10958043 (gmodena)
[10:56:37] Data-Engineering, Data-Engineering-Radar, Epic, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10958048 (gmodena)
[11:16:02] Data-Engineering, Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10958126 (A_smart_kitten)
[11:18:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[11:51:11] Data-Engineering, Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10958240 (gmodena) We traced back the issue to a config [[ https://gerrit.wikimedia.org/r/c/operation...
[12:40:46] Data-Engineering, Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10958412 (gmodena) FWIW: `eventgate-analytics` is [[ https://gerrit.wikimedia.org/g/operations/alerts...
[14:51:56] Data-Engineering: AlertLintProblem - https://phabricator.wikimedia.org/T395539#10958962 (phaultfinder)
[15:07:17] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10959029 (gmodena)
[15:07:19] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10959030 (gmodena) p:Triage→Unbreak!
[15:07:53] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgage-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10959036 (gmodena) a:gmodena
[15:08:15] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgate-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10959038 (gmodena)
[15:11:38] Data-Engineering (Q4 2025 April 1st - June 30th): Gobblin test jobs for event and eventlogging_legacy are misconfigured - https://phabricator.wikimedia.org/T393645#10959066 (amastilovic) Open→Resolved
[15:11:49] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Discovery-Search (2025.06.13 - 2025.07.04): SUP: Use flink 1.20.1 - https://phabricator.wikimedia.org/T398159#10959070 (pfischer)
[15:11:58] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Discovery-Search (2025.06.13 - 2025.07.04): Flink: Update k8s operator to 1.12.0 - https://phabricator.wikimedia.org/T398162#10959075 (pfischer)
[15:12:01] Data-Engineering-Roadmap, Discovery-Search, DPE-Mediawiki-Content: Flink: Update k8s operator to 1.12.0 - https://phabricator.wikimedia.org/T398162#10959077 (pfischer) p:Triage→High
[15:12:30] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Discovery-Search (2025.06.13 - 2025.07.04): Flink: Update k8s operator to 1.12.0 - https://phabricator.wikimedia.org/T398162#10959083 (pfischer)
[15:18:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[15:25:48] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.06.13 - 2025.07.04), Patch-For-Review: Grow airflow-main global parallelism - https://phabricator.wikimedia.org/T398164#10959178 (BTullis) Open→Resolved We increased the following config parameters: * `core.parallel...
[15:27:27] Data-Engineering, Patch-For-Review: Improve spider detection in the webrequest refinery pipeline - https://phabricator.wikimedia.org/T394794#10959193 (Joe)
[15:27:47] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Discovery-Search (2025.06.13 - 2025.07.04): SUP: Use flink 1.20.1 - https://phabricator.wikimedia.org/T398159#10959210 (pfischer)
[15:30:18] Data-Engineering, SRE: Include accept-language header in turnilo/superset - https://phabricator.wikimedia.org/T398213 (Joe) NEW
[15:30:26] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Discovery-Search (2025.06.13 - 2025.07.04): Flink: Update k8s operator to 1.12.0 - https://phabricator.wikimedia.org/T398162#10959259 (pfischer)
[15:32:05] Data-Engineering-Roadmap, Data-Platform-SRE, DPE-Mediawiki-Content, Discovery-Search (2025.06.13 - 2025.07.04): Flink: Update k8s operator to 1.12.0 - https://phabricator.wikimedia.org/T398162#10959281 (pfischer)
[15:43:40] (PS1) Giuseppe Lavagetto: Update druid to include the accept-language field [analytics/refinery] - https://gerrit.wikimedia.org/r/1165054 (https://phabricator.wikimedia.org/T398213)
[15:54:22] (CR) Joal: "The code looks good, but the datasource in druid is already very big (almost 2Tb replicated 2 times), taking up to 40% of the cluster size." [analytics/refinery] - https://gerrit.wikimedia.org/r/1165054 (https://phabricator.wikimedia.org/T398213) (owner: Giuseppe Lavagetto)
[15:55:34] Data-Engineering: mediawiki_api_request has no data for June 25 onwards - https://phabricator.wikimedia.org/T398173#10959439 (Urbanecm_WMF)
[15:59:01] Data-Engineering-Roadmap, Epic: [Epic] Instrument pageviews using events, instead of webrequests - https://phabricator.wikimedia.org/T371321#10959477 (Milimetric) Some latest thoughts on this from our [[ https://wikimedia.slack.com/archives/C01DFMX6QLB/p1750997158361279 | slack thread ]] * **Excited**....
[16:07:47] Data-Engineering-Roadmap, Epic: [Epic] Instrument pageviews using events, instead of webrequests - https://phabricator.wikimedia.org/T371321#10959566 (mpopov) Also documenting my reply from the same thread: **Regarding pageview instrumentation being on #experimentation_lab roadmap:** It is not. At most...
[16:19:14] (CR) Giuseppe Lavagetto: "I understand it's a lot of data, but it's used daily in incident response, and adding this info is very useful. As far as our purposes go," [analytics/refinery] - https://gerrit.wikimedia.org/r/1165054 (https://phabricator.wikimedia.org/T398213) (owner: Giuseppe Lavagetto)
[17:04:15] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Add metrics for monthly reconciles - https://phabricator.wikimedia.org/T388439#10959877 (tchin) Adjusted airflow variables to use the new conda artifact. Should be good to go now. Now the only question is how l...
[17:38:58] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.06.13 - 2025.07.04): Grow airflow-main global parallelism - https://phabricator.wikimedia.org/T398164#10960056 (JAllemandou) Thanks so much folks :)
[19:00:49] Data-Engineering (Q4 2025 April 1st - June 30th): Facilitate automatic artifact cache warming for airflow-dags artifacts - https://phabricator.wikimedia.org/T392244#10960339 (amastilovic) OK, so now that we have the necessary updates to the `workflow_utils` library that will enable cache warming on base leve...
[19:01:51] Data-Engineering (Q4 2025 April 1st - June 30th): Manage druid `webrequest_sampled_live` data size - https://phabricator.wikimedia.org/T398236 (JAllemandou) NEW
[19:01:57] (CR) Joal: "Any column we drop helps. We keep 90D of data, we could also reduce that, or recompact the data keeping only a subset of dimensions after " [analytics/refinery] - https://gerrit.wikimedia.org/r/1165054 (https://phabricator.wikimedia.org/T398213) (owner: Giuseppe Lavagetto)
[19:18:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[19:24:37] Data-Engineering: mediawiki_api_request has no data for June 25 onwards - https://phabricator.wikimedia.org/T398173#10960508 (gmodena) A mediawiki config change inadvertently disabled EventBus logging. This was restored, and `event.mediawiki_api_request` data should be available as of 2025-06-30. Relat...
[19:25:41] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Engineering-Radar, Event-Platform, Wikimedia-production-error: eventgate-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187#10960523 (gmodena) Unfortunately we won't be able to backfill lost ev...
[19:41:40] Data-Engineering-Roadmap, Epic: [Idea] Collect pageview data using client-side instrumentation - https://phabricator.wikimedia.org/T371321#10960619 (nshahquinn-wmf)
[19:59:24] Data-Engineering (Q4 2025 April 1st - June 30th): Manage druid `webrequest_sampled_live` data size - https://phabricator.wikimedia.org/T398236#10960703 (CDanis) I think we could drop `client_port`. Are there stats available for which columns are the heaviest on storage?
[20:16:07] Data-Engineering-Roadmap, Epic: [Idea] Collect pageview data using client-side instrumentation - https://phabricator.wikimedia.org/T371321#10960811 (nshahquinn-wmf)
[20:50:00] Data-Engineering (Q4 2025 April 1st - June 30th): Manage druid `webrequest_sampled_live` data size - https://phabricator.wikimedia.org/T398236#10960969 (JAllemandou) > Are there stats available for which columns are the heaviest on storage? I don't have those unfortunately. The storage size is highly depend...
[21:56:49] Data-Engineering, Datasets-General-or-Unknown, Pageviews-API: Pageviews / Mostread Data not available since June 28, 2025 - https://phabricator.wikimedia.org/T398150#10961132 (MusikAnimal)
[22:06:21] Data-Engineering, LDAP-Access-Requests, SRE, SRE-Access-Requests: Requesting access to airflow-an and statboxes for htriedman - https://phabricator.wikimedia.org/T398075#10961146 (Htriedman) Hi @Clement_Goubert! When I navigate to the L3 document page, there's no option to sign again — any way I...
[23:18:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem