[00:00:25] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750539 (bd808) Resolved→Open
[00:00:38] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750541 (bd808)
[00:00:40] Quarry: [bug] Quarry queries don't run - https://phabricator.wikimedia.org/T392169#10750544 (bd808) →Duplicate dup:T392107
[00:01:15] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750549 (bd808) Down again per {T392169}. The magical restart fix did not hold unfortunately.
[00:12:14] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750571 (bd808) `lang=shell-session bd808@laptop$ ssh root@quarry-bastion.quarry.eqiad1.wikimedia.cloud root@quarry-bastion:~# export KUBECONFIG=/home/rook/quarry/tofu/kube.c...
[00:23:06] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750590 (bd808) `kubectl delete pod -n quarry --all` seems to be the magic temporary fix. I'm going to assume that there is something out of the normal that keeps getting sta...
[00:28:44] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750678 (bd808) @Liz Things should be working again. I just ran a query myself on the system. I am going to leave this task open for now since the last "victory" only lasted...
[00:36:08] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" - https://phabricator.wikimedia.org/T392107#10750695 (bd808) The redis pod's storage looks good following the pod restart: `lang=shell-session root@quarry-bastion:~# kubectl -n quarry exec -it pod/redis-676b955f95-gn4d9...
[00:36:45] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750696 (bd808)
[00:37:31] Quarry, cloud-services-team: No alerting for quarry - https://phabricator.wikimedia.org/T392138#10750697 (bd808)
[00:37:32] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750698 (bd808)
[00:37:37] Quarry, cloud-services-team: Update quarry redis deployment - https://phabricator.wikimedia.org/T392141#10750699 (bd808)
[00:37:43] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750700 (bd808)
[00:37:50] Quarry, cloud-services-team: Quarry: Why so many web pods? - https://phabricator.wikimedia.org/T392143#10750701 (bd808)
[00:37:57] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750702 (bd808)
[00:45:13] Quarry, cloud-services-team, Documentation: [[wikitech:Portal:Data Services/Admin/Quarry]] documents legacy Quarry setup - https://phabricator.wikimedia.org/T392181 (bd808) NEW
[01:05:07] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750725 (Liz) I think this may be resolved.
[01:32:47] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750743 (Liz) Thank you, much appreciated. This situation seems to happen every 6-9 months. But, luckily, not too often. Thanks again.
[05:10:58] Quarry, cloud-services-team: No alerting for quarry - https://phabricator.wikimedia.org/T392138#10750857 (taavi) a:taavi
[05:32:19] Quarry, cloud-services-team: No alerting for quarry - https://phabricator.wikimedia.org/T392138#10750862 (github-toolforge-bot) supertassu opened https://github.com/toolforge/quarry/pull/74
[05:54:16] Data-Engineering, Machine-Learning-Team, Research, Event-Platform: Emit revision revert risk scores as a stream and expose in EventStreams API - https://phabricator.wikimedia.org/T326179#10750896 (kevinbazira) Thanks everyone for your contributions! The RRLA model-server that produces revert-risk...
[06:06:44] Quarry, cloud-services-team: Update quarry redis deployment - https://phabricator.wikimedia.org/T392141#10750937 (github-toolforge-bot) supertassu opened https://github.com/toolforge/quarry/pull/75
[06:07:06] Quarry, cloud-services-team: Update quarry redis deployment - https://phabricator.wikimedia.org/T392141#10750938 (taavi) a:taavi
[07:11:05] Quarry, cloud-services-team: quarry.wmcloud.org: "This web service cannot be reached" due to redis pod running out of disk space - https://phabricator.wikimedia.org/T392107#10750989 (dcaro) Pods are failing to run on node1 again, it seems to have gotten out of inodes: ` dcaro@quarry-bastion:~$ kubectl ge...
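dcaro's 07:11 update points at inode exhaustion on node1 rather than running out of bytes. These are separate limits: a filesystem can still have free bytes yet fail to create files with "No space left on device" once it runs out of inodes (`df -h` shows one, `df -i` the other). A minimal Python sketch of checking both for one mount point via `os.statvfs`; the helper names `fs_usage` and `is_exhausted` and the 95% threshold are hypothetical, not part of Quarry's tooling.

```python
import os
from typing import NamedTuple


class FsUsage(NamedTuple):
    """Fraction of bytes and inodes in use on one filesystem (0.0 to 1.0)."""
    bytes_used: float
    inodes_used: float


def fs_usage(path: str) -> FsUsage:
    """Report byte and inode usage for the filesystem containing `path`."""
    st = os.statvfs(path)
    # f_blocks and f_bfree share the same unit (f_frsize), so the ratio is unitless.
    bytes_used = 1 - st.f_bfree / st.f_blocks if st.f_blocks else 0.0
    # Some filesystems report f_files == 0 (no fixed inode table); treat as 0% used.
    inodes_used = 1 - st.f_ffree / st.f_files if st.f_files else 0.0
    return FsUsage(bytes_used, inodes_used)


def is_exhausted(usage: FsUsage, threshold: float = 0.95) -> bool:
    """Hypothetical alert condition: either resource at or above `threshold`."""
    return usage.bytes_used >= threshold or usage.inodes_used >= threshold


if __name__ == "__main__":
    u = fs_usage("/")
    print(f"bytes: {u.bytes_used:.1%}  inodes: {u.inodes_used:.1%}  exhausted: {is_exhausted(u)}")
```

Checking both ratios matters because a node full of small files (logs, cache entries) can hit the inode limit while byte usage still looks healthy.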
[08:22:20] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10751151 (FCeratto-WMF)
[08:36:30] Data-Engineering, ContentTranslation, Metrics Platform, MW-1.44-notes (1.44.0-wmf.25; 2025-04-15): Update WMF-deployed extensions to use mw.config checks instead of manual m-dot URL hacks - https://phabricator.wikimedia.org/T390923#10751168 (Nikerabbit)
[09:50:08] (CR) Filippo Giunchedi: Add Prometheus stats push (2 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[10:21:02] Quarry, cloud-services-team: No alerting for quarry - https://phabricator.wikimedia.org/T392138#10751390 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/74
[10:22:58] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Epic, Event-Platform, and 2 others: EventBus: replace PageDeleteCompleteHook with PageDeletedListener - https://phabricator.wikimedia.org/T392205 (gmodena) NEW
[10:25:08] Quarry, cloud-services-team: Update quarry redis deployment - https://phabricator.wikimedia.org/T392141#10751415 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/75
[10:26:35] Quarry, cloud-services-team: Quarry: Why so many web pods? - https://phabricator.wikimedia.org/T392143#10751416 (github-toolforge-bot) supertassu opened https://github.com/toolforge/quarry/pull/76
[10:28:35] Quarry, cloud-services-team: Quarry: Why so many web pods? - https://phabricator.wikimedia.org/T392143#10751417 (taavi) a:taavi
[10:32:30] (CR) Hasan Akgün (WMDE): Add Prometheus stats push (2 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[10:53:02] Quarry, cloud-services-team: Quarry: Why so many web pods? - https://phabricator.wikimedia.org/T392143#10751491 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/76
[11:03:36] Quarry, cloud-services-team: Quarry: Why so many web pods? - https://phabricator.wikimedia.org/T392143#10751517 (taavi) Open→Resolved
[11:07:02] Quarry, cloud-services-team: No alerting for quarry - https://phabricator.wikimedia.org/T392138#10751526 (taavi) Open→Resolved ` MariaDB [prometheusconfig]> insert into alerts (project_id, name, expr, duration, severity, annotations) values (16, 'Down', 'up{project="quarry",job="app"} == 0',...
[12:35:33] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: EPIC: Trino/MinIO/Hive-Standalone-Metaserver/Dagster/Metabase/Superset Implementation - https://phabricator.wikimedia.org/T377362#10751712 (Jgreen)
[12:44:36] Quarry, cloud-services-team: Update quarry redis deployment - https://phabricator.wikimedia.org/T392141#10751742 (taavi) The merged patch migrated the deployment to a replicaset, and added an emptyDir volume for the data directory to use. If we care that data enough (job queue + sessions) I guess we coul...
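The alert expression in taavi's 11:07 comment, `up{project="quarry",job="app"} == 0`, relies on two Prometheus behaviours: `up` is 1 when a target was scraped successfully and 0 when the scrape failed, and a comparison against a scalar without the `bool` modifier drops every series that does not match rather than returning 0 for it. An alerting rule then raises one alert per series left in the result. A toy Python model of that selection logic; the `Sample` type, helper names, and instance labels are invented for illustration and do not reflect how Prometheus evaluates PromQL internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """One element of an instant vector: a label set and its current value."""
    labels: dict[str, str]
    value: float


def select(samples: list[Sample], metric: str, matchers: dict[str, str]) -> list[Sample]:
    """Emulate a selector with equality matchers, e.g. up{project="quarry",job="app"}."""
    return [
        s for s in samples
        if s.labels.get("__name__") == metric
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


def eq_filter(samples: list[Sample], scalar: float) -> list[Sample]:
    """Emulate `vector == scalar` without `bool`: non-matching series are dropped."""
    return [s for s in samples if s.value == scalar]


def down_targets(samples: list[Sample]) -> list[Sample]:
    """up{project="quarry",job="app"} == 0; each remaining series would be one alert."""
    return eq_filter(select(samples, "up", {"project": "quarry", "job": "app"}), 0)


# Hypothetical scrape results; instance names are made up.
EXAMPLE = [
    Sample({"__name__": "up", "project": "quarry", "job": "app", "instance": "web-a"}, 1),
    Sample({"__name__": "up", "project": "quarry", "job": "app", "instance": "web-b"}, 0),
    Sample({"__name__": "up", "project": "other", "job": "app", "instance": "x-1"}, 0),
]

if __name__ == "__main__":
    print([s.labels["instance"] for s in down_targets(EXAMPLE)])  # ['web-b']
```

Only `web-b` survives: `web-a` is up, and `x-1` belongs to a different project, so the label matchers exclude it before the comparison runs.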
[12:44:43] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: EPIC: Trino/MinIO/Hive-Standalone-Metaserver/Dagster/Metabase/Superset Implementation - https://phabricator.wikimedia.org/T377362#10751743 (Jgreen)
[13:11:27] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: EPIC: Trino/MinIO/Hive-Standalone-Metaserver/Dagster/Metabase/Superset Implementation - https://phabricator.wikimedia.org/T377362#10751837 (Jgreen)
[13:35:50] (CR) Lucas Werkmeister (WMDE): Add Prometheus stats push (2 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[14:23:36] (CR) Filippo Giunchedi: Add Prometheus stats push (2 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[14:30:16] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Spike: Figure how best to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T366544#10752223 (xcollazo) In progress→Resolved
[14:57:02] Data-Engineering, Data-Engineering-Radar, CirrusSearch, Structured Data Engineering, and 3 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10752303 (dcausse)
[15:03:56] Data-Engineering, Data Pipelines: Provide an easy way for MediaWiki to fetch aggregate statistics from the data lake - https://phabricator.wikimedia.org/T341649#10752334 (HNordeenWMF)
[15:12:23] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10752415 (FCeratto-WMF)
[15:34:22] Data-Engineering, Data-Engineering-Icebox, JsonConfig, Wikimedia-production-error: PHP Warning: The locally stored wiki page '[page]' has unsupported content model (from Dashiki) - https://phabricator.wikimedia.org/T293295#10752548 (thcipriani) Fresh stack trace. Happened 3 times in the past 90 d...
[15:47:53] (PS13) Lucas Werkmeister (WMDE): Add Prometheus stats push [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[15:48:13] (CR) Lucas Werkmeister (WMDE): Add Prometheus stats push (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[16:44:47] Data-Engineering: Facilitate automatic artifact cache warming for airflow-dags artifacts - https://phabricator.wikimedia.org/T392244 (amastilovic) NEW
[16:45:53] Data-Engineering (Q4 2025 April 1st - June 30th): Facilitate automatic artifact cache warming for airflow-dags artifacts - https://phabricator.wikimedia.org/T392244#10752922 (Ahoelzl)
[17:19:39] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: [MinIO] Investigate load-balancing approaches to eliminate SPOF - https://phabricator.wikimedia.org/T392249 (Jgreen) NEW
[17:26:02] Analytics-Radar, Data-Engineering, Data-Engineering-Radar, Growth-Team, and 4 others: Edits to Flow pages result in a page-links-change event with no performer - https://phabricator.wikimedia.org/T216726#10753128 (Scardenasmolinar) Open→Resolved a:Scardenasmolinar Looks like we've...
[21:02:44] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Create table and pyspark job to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391282#10753873 (xcollazo) From [[ https://gitlab.wikimedia.org/repos/data-engineering/dumps...
[21:11:23] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Create Airflow pipeline to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391283#10753902 (xcollazo) From [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/12...