[00:10:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:13:57] (PS1) Eugene233: Remove init.py file from application - Added venv to gitignore [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1125263 (https://phabricator.wikimedia.org/T386325)
[02:55:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:55:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:00:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:01:33] FIRING:
ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:05:18] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:06:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:11:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:26:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues -
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:56:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:11:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:08:53] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/121 (https://phabricator.wikimedia.org/T362868)
[12:09:16] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/19 (https://phabricator.wikimedia.org/T362868)
[12:10:06] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/52 (https://phabricator.wikimedia.org/T362868)
[12:58:21] Tool-inteGraality: Wikidata:WikiProject India/Schools not updated (since 5 jan 2025) - https://phabricator.wikimedia.org/T388226
(Kuldeepburjbhalaike) NEW
[13:18:28] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5
[13:23:48] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5
[13:36:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:37:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:37:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:48:17] (open) chuckonwumelu: Update packages [toolforge-repos/sample-rust-buildpack-app] - https://gitlab.wikimedia.org/toolforge-repos/sample-rust-buildpack-app/-/merge_requests/1
[13:48:46] Tools, All-and-every-Wikisource, Wikidata: Build a Scholia like website for Wikisource - https://phabricator.wikimedia.org/T344328#10613181 (Lydia_Pintscher) Update: https://sangkalak.toolforge.org
is work in this direction
[13:55:10] (close) chuckonwumelu: Update packages [toolforge-repos/sample-rust-buildpack-app] - https://gitlab.wikimedia.org/toolforge-repos/sample-rust-buildpack-app/-/merge_requests/1
[13:55:18] (reopen) chuckonwumelu: Update packages [toolforge-repos/sample-rust-buildpack-app] - https://gitlab.wikimedia.org/toolforge-repos/sample-rust-buildpack-app/-/merge_requests/1
[14:35:29] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/121 (https://phabricator.wikimedia.org/T362868)
[14:39:47] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/19 (https://phabricator.wikimedia.org/T362868)
[14:40:30] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/52 (https://phabricator.wikimedia.org/T362868)
[14:42:42] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/18 (https://phabricator.wikimedia.org/T362868)
[14:42:55] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/27 (https://phabricator.wikimedia.org/T362868)
[14:43:11] (update) fnegri: Update deps for K8s 1.29 [repos/cloud/toolforge/registry-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/21 (https://phabricator.wikimedia.org/T362868)
[14:50:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:00:23] (approved) andrew: Update packages [toolforge-repos/sample-rust-buildpack-app] - https://gitlab.wikimedia.org/toolforge-repos/sample-rust-buildpack-app/-/merge_requests/1 (owner: chuckonwumelu)
[15:00:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:05:05] (merge) andrew: Update packages [toolforge-repos/sample-rust-buildpack-app] - https://gitlab.wikimedia.org/toolforge-repos/sample-rust-buildpack-app/-/merge_requests/1 (owner: chuckonwumelu)
[15:43:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on testhost2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[16:30:51] (open) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[16:40:36] Striker: [toolsadmin] Striker cannot create Developer accounts or tools with names matching existing SUL accounts - https://phabricator.wikimedia.org/T380384#10613979 (Andrew) Sorry @Ladsgroup, this requires a fair bit of setup and learning before I can even start fixing this, and I'm out next week. You'll p...
[16:45:42] Striker: [toolsadmin] Striker cannot create Developer accounts or tools with names matching existing SUL accounts - https://phabricator.wikimedia.org/T380384#10614012 (Ladsgroup) I went with `scheherazade-temp` and will make it a redirect or remove it once this is fixed. No worries!
[16:47:44] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[18:36:30] cloud-services-team, Horizon: horizon: some users get 401 unauthorized - https://phabricator.wikimedia.org/T388137#10614344 (Andrew) At least one of these (kamila's) seems to be encoding confusion in the round trip from horizon to idp to keystone. I have worked around it by changing the oid<->keystone m...
[18:45:26] Cloud-Services, cloud-services-team, Discovery-Search, Elasticsearch: Update alerting to correspond with the new cloudsearch cluster - https://phabricator.wikimedia.org/T388270 (Andrew) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://pha...
[19:13:50] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[19:22:03] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[19:27:00] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/32 (owner: l10n-bot)
[19:27:03] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/32 (owner: l10n-bot)
[19:28:10] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net.
[toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/9 (owner: l10n-bot)
[19:28:12] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/9 (owner: l10n-bot)
[19:43:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on testhost2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[19:51:34] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[20:13:53] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[20:21:00] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[20:25:01] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[20:35:49] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148
[20:47:00] cloud-services-team, Toolforge: Very long response headers cause a 502 response from the Toolforge front proxy - https://phabricator.wikimedia.org/T356525#10614607 (So9q) I [[ https://github.com/dpriskorn/topic-creator-frontend/ | rewrote ]] my tool using React and it [[ https://topic-curator.toolforge.o...
[20:48:13] cloud-services-team, Toolforge: Very long response headers cause a 502 response from the Toolforge front proxy - https://phabricator.wikimedia.org/T356525#10614623 (So9q) Open→Invalid
[20:49:24] cloud-services-team, Toolforge: Very long response headers cause a 502 response from the Toolforge front proxy - https://phabricator.wikimedia.org/T356525#10614625 (So9q) Closed as invalid aka won't fix. the defaults should suffice for everyone.
[21:00:55] Toolforge (Toolforge iteration 18): [jobs-api] toolforge jobs logs -f should get the logs of all containers in all target pods - https://phabricator.wikimedia.org/T388274 (Raymond_Ndibe) NEW
[21:01:24] Toolforge (Toolforge iteration 18): [jobs-api] `toolforge jobs logs -f` should get the logs of all containers in all target pods - https://phabricator.wikimedia.org/T388274#10614671 (Raymond_Ndibe)
[21:02:02] Toolforge (Toolforge iteration 18): [jobs-api] "toolforge jobs logs -f" should get the logs of all containers in all target pods - https://phabricator.wikimedia.org/T388274#10614675 (Raymond_Ndibe)
[21:06:27] (update) raymond-ndibe: [jobs-api] test log streaming from multiple pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148 (https://phabricator.wikimedia.org/T388274)
[21:23:53] tool-wscontest, ISA, Wiki-Mentor-Africa, good first task: WSContest Tool admin cannot edit contest page if their name is multi-word separated by underscore (they can edit if it's a space, instead) - https://phabricator.wikimedia.org/T336157#10614717 (Peachey88)
[21:26:09] (update) raymond-ndibe: [jobs-api] stream logs from all containers in all pods [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148 (https://phabricator.wikimedia.org/T388274)
[21:26:32] (update) raymond-ndibe: [jobs-api] stream logs from all containers in all pods
[repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/148 (https://phabricator.wikimedia.org/T388274)
[21:27:13] (open) raymond-ndibe: [toolforge-weld] get all logs from all containers in all pods [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/77 (https://phabricator.wikimedia.org/T388274)
[21:27:39] (update) raymond-ndibe: [toolforge-weld] get all logs from all containers in all pods [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/77 (https://phabricator.wikimedia.org/T388274)
[21:34:26] tool-wscontest, Wiki-Mentor-Africa, good first task: WSContest Tool admin cannot edit contest page if their name is multi-word separated by underscore (they can edit if it's a space, instead) - https://phabricator.wikimedia.org/T336157#10614743 (Eugene233)
[21:35:54] (open) raymond-ndibe: [jobs-cli] include container info in log [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/87 (https://phabricator.wikimedia.org/T388274)
[21:37:40] Toolforge (Toolforge iteration 18), Patch-For-Review: [jobs-api] "toolforge jobs logs -f" should get the logs of all containers in all target pods - https://phabricator.wikimedia.org/T388274#10614759 (Raymond_Ndibe) Open→In progress
[21:38:34] (update) raymond-ndibe: [jobs-cli] include container info in log [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/87 (https://phabricator.wikimedia.org/T388274)
[22:37:31] Cloud-Services, cloud-services-team, Discovery-Search, Elasticsearch: Cloudelastic alerts should route to data platform alerts, not wmcs - https://phabricator.wikimedia.org/T388270#10614960 (RKemper)
[22:40:00] Cloud-Services, cloud-services-team,
Discovery-Search, Elasticsearch, SRE Observability: Cloudelastic alerts should route to data platform alerts, not wmcs - https://phabricator.wikimedia.org/T388270#10614963 (RKemper) The alerts I'm seeing (unassigned shard check) are defined here: https://...
[23:43:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on testhost2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources