[01:33:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[05:09:20] Data-Engineering, DBA, Schema-change-in-production: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130#10902603 (Marostegui)
[05:20:40] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10902652 (Marostegui) s6 codfw master has been switched: T396509
[05:33:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[08:04:12] (PS1) Gehel: feat(FileOutputCommitter): file output committer isolating jobs. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1155597
[08:15:02] Data-Engineering, SRE: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562 (Joe) NEW
[08:18:42] Data-Engineering, SRE: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562#10903050 (Joe) p:Triage→High
[08:20:45] Data-Engineering, EventStreams: SSE events from offline DC topics - https://phabricator.wikimedia.org/T396564 (pfischer) NEW
[09:33:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[09:46:42] !log restarting failed -drop- services on an-launcher1002 after applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1155621
[09:46:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:03:18] Data-Engineering, Data-Platform-SRE: Improve housekeeping of jar files in /tmp on Hadoop workers - https://phabricator.wikimedia.org/T396582 (BTullis) NEW
[10:07:38] Data-Engineering (Q4 2025 April 1st - June 30th), Movement-Insights, Traffic, Patch-For-Review: NEW BUG REPORT: Investigate rise in May 2025 Reader metrics - https://phabricator.wikimedia.org/T395934#10903556 (Joe) One potential reason for this surge is linked to the activities of an actor, worki...
[10:43:24] Data-Engineering, Data-Platform-SRE: Improve housekeeping of jar files in /tmp on Hadoop workers - https://phabricator.wikimedia.org/T396582#10903751 (BTullis) I obtained the full list of how much space is used on each worker by jars more than 30 days old. ` btullis@cumin1003:~$ sudo cumin --force --no-p...
[11:07:52] Data-Engineering, LDAP-Access-Requests, SRE: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10903827 (elukey) @SKivlehan-WMF Hi! I think you need to request access to the `wmf` LDAP group, please check https://wikitech.wikimedia.org/wiki/SRE...
[11:24:31] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE: deployment-charts: remove deprecated mediawiki-content-history config - https://phabricator.wikimedia.org/T396593 (gmodena) NEW
[11:31:04] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE, Patch-For-Review: deployment-charts: remove deprecated mediawiki-content-history config - https://phabricator.wikimedia.org/T396593#10903923 (gmodena)
[12:22:36] Data-Engineering, Data-Platform-SRE: Improve housekeeping of jar files in /tmp on Hadoop workers - https://phabricator.wikimedia.org/T396582#10904208 (BTullis) I have done the following. ` btullis@cumin1003:~$ sudo cumin A:hadoop-worker 'find /tmp -name *.jar -mtime +30 -delete' 135 hosts will be targete...
[12:47:47] Data-Engineering, Data-Persistence, Data-Platform, Data-Services, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10904329 (fnegri) @Alien333 thanks for reporting this! I don't think the assignments (`:=`) are the problem, I narrowed it down to M...
[13:33:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[13:37:43] Data-Engineering, SRE: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562#10904642 (Joe) There's a few open questions here: * In terms of pure traffic control, which is what SRE want, only running detection on cache mis...
[14:09:52] stevemunene: o/
[14:10:11] I'd need to re-run the provision cookbook (causing a reboot) for an-conf1004, and potentially also for 1005 and 1006
[14:10:23] of course the timing is perfect, I see that you are adding them to the cluster :(
[14:10:49] if one is still out of the cluster we can start with that one first :)
[14:11:47] volans: 1006 is still in insetup, confirmed via grafana
[14:11:53] if Steve isn't working on it we could test
[14:13:08] btullis: o/ ---^ Do you know by any chance?
[14:14:07] Yes, feel free to go ahead. stevemunene is out for a couple of hours, but 1006 hasn't been added to the cluster yet.
[14:14:18] thx
[14:14:49] thanks!
[14:15:36] I think that 1004 and 1005 can also be done. One by one for preference, although the cluster is at n=5 so it should be possible to lose them both if needs be.
[14:16:06] super yes
[14:31:23] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Investigate reasons for remaining inconsistencies - https://phabricator.wikimedia.org/T385112#10904970 (xcollazo) >>! In T385112#10901527, @xcollazo wrote: >>>! In T385112#10900753, @xcollazo wrote: >> [[ https...
[15:02:54] Data-Engineering, LDAP-Access-Requests, SRE: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10905110 (elukey) Resolved→Open
[17:33:29] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[19:07:21] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10906354 (srishakatux)
[19:09:10] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10906359 (srishakatux)
[19:19:28] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10906376 (srishakatux)
[21:26:22] Data-Engineering: NEW BUG REPORT wmf.interlanguage_navigation missing mobile data - https://phabricator.wikimedia.org/T396514#10906694 (CMyrick-WMF) Another bug we just came across, related to the referer path: Per https://gerrit.wikimedia.org/g/analytics/refinery/+/fee5f29f8f1955f292532e65478bc6eaddea9846...
[21:33:30] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[22:10:54] Data-Engineering, Data-Platform-SRE: Request for dedicated Airflow instance for WME - https://phabricator.wikimedia.org/T396672 (Ahoelzl) NEW
[22:50:31] Data-Engineering: [Iceberg Migration] Extend Iceberg table maintenance mechanism to support multiple Airflow instances - https://phabricator.wikimedia.org/T373693#10906965 (amastilovic) So, now that T383931 has been completed, what else is there that should be done with regards to this ticket? We now do have...