[00:34:04] cloud-services-team, Toolforge: "toolforge build start" fails due to Heroku download error? - https://phabricator.wikimedia.org/T420547 (Yaron_Koren) NEW
[02:45:40] (update) raymond-ndibe: Draft: job: clean up orphaned s3 objects [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/63 (https://phabricator.wikimedia.org/T418528)
[03:15:32] (update) raymond-ndibe: models: make job_type always show as set [repos/cloud/toolforge/jobs-api] (fix_images) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/279 (owner: dcaro)
[03:16:27] (approved) raymond-ndibe: models: make job_type always show as set [repos/cloud/toolforge/jobs-api] (fix_images) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/279 (owner: dcaro)
[03:28:57] (approved) raymond-ndibe: images: don't set the digest if it was not passed [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/278 (owner: dcaro)
[03:28:58] (update) raymond-ndibe: images: don't set the digest if it was not passed [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/278 (owner: dcaro)
[03:34:57] (update) raymond-ndibe: start: add --use-deprecated-versions flag [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/129 (owner: dcaro)
[03:35:01] (approved) raymond-ndibe: start: add --use-deprecated-versions flag [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/129 (owner: dcaro)
[06:54:28] Toolforge (Toolforge iteration 26): [Toolforge Sustainability Framework] Create an inventory of Toolforge actions - https://phabricator.wikimedia.org/T420559 (komla) NEW
[09:16:46] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: Add new k8s toolforge workers to cater for memory requests - https://phabricator.wikimedia.org/T419824#11726659 (fgiunchedi) In progress→Resolved a:fgiunchedi This is done, however 32GB barely made a dent into the % requests vs available...
[09:17:54] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/23287809902 (https://github.com/cluebotng/component-configs/commits/0976850451c9fbb8c4afb773cc70b91cd7c6fdeb)
[09:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[09:21:16] (update) raymond-ndibe: [status] make job status an enum, with clearly defined states [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/208 (https://phabricator.wikimedia.org/T401172)
[10:05:09] (update) dcaro: core: update jobs in storage too [repos/cloud/toolforge/jobs-api] (add_job_type_as_set) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/277
[10:17:14] PROBLEM - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/dumps - 324 bytes in 60.006 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[10:20:17] RECOVERY - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 58.188 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[10:28:54] cloud-services-team, Toolforge: Audit tools memory requests vs actual usage - https://phabricator.wikimedia.org/T420565 (fgiunchedi) NEW
[10:39:43] (merge) dcaro: images: don't set the digest if it was not passed [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/278
[10:39:46] (update) dcaro: models: make job_type always show as set [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/279
[10:41:15] (update) dcaro: models: make job_type always show as set [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/279
[10:42:53] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.477-20260319103957-8197eb50 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1176
[10:43:01] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.477-20260319103957-8197eb50 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1176
[10:48:04] (open) taavi: infra-tracing: Remove configured Ingress object [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1177
[10:48:14] (update) taavi: infra-tracing: Remove configured Ingress object [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1177
[10:48:19] (update) taavi: infra-tracing: Remove configured Ingress object [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1177
[11:03:10] (approved) fnegri: infra-tracing: Remove configured Ingress object [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1177 (owner: taavi)
[11:04:13] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
[11:04:57] (approved) dcaro: infra-tracing: Remove configured Ingress object [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1177 (owner: taavi)
[11:05:24] Cloud-VPS (Quota-requests), Catalyst: Disk quota increase for catalyst - https://phabricator.wikimedia.org/T420544#11726934 (dcaro) +1
[11:10:08] Cloud-VPS (Quota-requests), Catalyst: Disk quota increase for catalyst - https://phabricator.wikimedia.org/T420544#11726952 (fnegri) Open→In progress p:Triage→Medium a:fnegri
[11:17:01] !log fnegri@cloudcumin1001 catalyst START - Cookbook wmcs.openstack.quota_increase by 320 gigabytes (T420544)
[11:17:05] T420544: Disk quota increase for catalyst - https://phabricator.wikimedia.org/T420544
[11:17:09] !log fnegri@cloudcumin1001 catalyst END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) by 320 gigabytes (T420544)
[11:18:18] Cloud-VPS (Quota-requests), Catalyst: Disk quota increase for catalyst - https://phabricator.wikimedia.org/T420544#11726968 (fnegri) In progress→Resolved
[11:19:00] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing
[11:19:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
[11:25:08] Cloud-VPS (Project-requests): Request creation of s3etherpad VPS project - https://phabricator.wikimedia.org/T420532#11726990 (fnegri) I see a missed opportunity of calling this `s3th3rpad` :P More seriously, can we call it "etherpad-s3" instead, or "etherpad-backup-s3"? I would like to avoid ending up with...
[11:26:06] cloud-services-team, Toolforge: Audit tools memory requests vs actual usage - https://phabricator.wikimedia.org/T420565#11727003 (fnegri) p:Triage→Medium
[11:33:24] cloud-services-team, Toolforge: "toolforge build start" fails due to Heroku download error? - https://phabricator.wikimedia.org/T420547#11727018 (dcaro) @Yaron_Koren can you show the `toolforge build list` output? that will give some info on what code you used and what tool is affected
[11:36:33] cloud-services-team, Toolforge: "toolforge build start" fails due to Heroku download error? - https://phabricator.wikimedia.org/T420547#11727025 (dcaro) I have been able to reproduce with the sample php app: ` tools.sample-php-buildpack-app@tools-bastion-15:~$ toolforge build start https://gitlab.wikimed...
[11:37:52] cloud-services-team, Toolforge: "toolforge build start" fails due to Heroku download error? - https://phabricator.wikimedia.org/T420547#11727041 (dcaro) Using `--latest-versions` also fails: ` tools.sample-php-buildpack-app@tools-bastion-15:~$ toolforge build start https://gitlab.wikimedia.org/toolforg...
[11:38:02] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing
[11:38:13] (merge) taavi: infra-tracing: Remove configured Ingress object [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1177
[11:39:05] cloud-services-team, Toolforge, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T420153#11727042 (fnegri) Open→Resolved p:Triage→Medium a:fnegri
[11:42:49] cloud-services-team, Toolforge: "toolforge build start" fails due to Heroku download error? - https://phabricator.wikimedia.org/T420547#11727058 (dcaro) Related upstream change in which they deprecated that url/repo https://github.com/heroku/buildpacks-php/commit/ba1d65dc691c1cb9bdbf9a8405bffff491b77bcc
[11:49:29] cloud-services-team, Toolforge: "toolforge build start" fails due to Heroku download error? - https://phabricator.wikimedia.org/T420547#11727110 (dcaro) Using the soon to be 'latest' buildpacks does work correctly: ` [step-build] 2026-03-19T11:48:02.353045996Z ## Heroku PHP Buildpack [step-build] 2026-03...
[11:54:51] cloud-services-team, Toolforge: [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11727134 (dcaro)
[11:55:13] cloud-services-team, Toolforge (Toolforge iteration 26): [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11727135 (dcaro) p:Triage→High a:dcaro
[11:57:26] cloud-services-team, Toolforge (Toolforge iteration 26): [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11727145 (dcaro) The current list of tools using the latest builders (or that used them r...
[12:00:48] cloud-services-team, Toolforge (Toolforge iteration 26): [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11727153 (fnegri) > Maybe we should update directly the latest builders with the newer on...
[12:01:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[12:26:03] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/63
[12:26:04] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/35
[12:28:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[12:33:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[12:51:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[12:57:45] Tool-campwiz-nxt, Google-Summer-of-Code (2026): GSoC 2026: CampWiz NxT Redesign - https://phabricator.wikimedia.org/T414269#11727304 (UnknownStrange) Hi @Nokib_Sarkar and @Tiven2240! when i try to send the email but it keep on failing and i keep on getting this message Hello princenyarkoedwin@gmail.com...
[13:01:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[13:02:25] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[13:15:16] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/63 (owner: l10n-bot)
[13:15:20] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/63 (owner: l10n-bot)
[13:16:25] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/35 (owner: l10n-bot)
[13:16:28] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/35 (owner: l10n-bot)
[13:31:20] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11727483 (Vishaldevops) Hello, can i work on this task. Is it available right now?, if yes can you point me to the file or repository where the issue needs to be fix...
[13:35:26] cloud-services-team, Data-Services, Datasets-General-or-Unknown, Traffic, Patch-For-Review: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11727489 (CDanis) >>! In T306550#11651360, @taavi wrote: > (FWIW, a major reason these connections are...
[13:38:07] cloud-services-team, PAWS, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T420152#11727510 (fnegri) Open→In progress p:Triage→Medium a:fnegri
[13:38:18] cloud-services-team, PAWS, tools-platform-team: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T418629#11727513 (fnegri) Open→In progress p:Triage→Medium a:fnegri
[13:42:10] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[13:45:33] cloud-services-team, Data-Services, Datasets-General-or-Unknown, Traffic, Patch-For-Review: Move dumps.wikimedia.org HTTP service behind CDN edge - https://phabricator.wikimedia.org/T306550#11727553 (BBlack) Trying to pull these threads together for a bit of context. The question that isn't...
[13:46:59] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11727557 (Gerges) Hi @SheetalPro and @Vishaldevops, Thanks for your interest in working on this task! The issue is still open and available. The repository is: htt...
[14:07:58] cloud-services-team, Toolforge: Audit tools memory requests vs actual usage - https://phabricator.wikimedia.org/T420565#11727731 (dcaro) > Of immediate note the fact that 50% of namespaces/tools use less than 20%, i.e. we could be reducing their requests by 4-5x Can we check how many of the ones that us...
[14:14:21] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26): [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11727764 (fnegri)
[14:21:24] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[14:23:10] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11727839 (SheetalPro) Hi, thanks for sharing the details. I would like to work on this issue. I will start by investigating the CodeMirror touch event handling on m...
[14:30:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[14:33:11] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[14:40:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[14:43:07] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[14:53:31] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.reboot for all nodes
[14:56:47] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[15:04:45] Cloud-VPS (Project-requests): Request creation of s3etherpad VPS project - https://phabricator.wikimedia.org/T420532#11728030 (Andrew) +1, seems good
[15:17:44] FIRING: IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[15:18:14] FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-ingress-12.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[15:19:29] (update) dcaro: jobs-api: bump to 0.0.477-20260319103957-8197eb50 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1176 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:19:30] (approved) dcaro: jobs-api: bump to 0.0.477-20260319103957-8197eb50 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1176 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:19:58] (merge) dcaro: jobs-api: bump to 0.0.477-20260319103957-8197eb50 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1176 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:22:44] RESOLVED: [2x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[15:23:14] RESOLVED: [3x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-control-12.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDo
[15:24:14] FIRING: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-control-11.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[15:25:41] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes
[15:28:17] (merge) dcaro: models: make job_type always show as set [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/279
[15:28:22] (update) dcaro: core: update jobs in storage too [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/277
[15:29:07] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for all nodes
[15:29:14] RESOLVED: [7x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-control-10.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDo
[15:29:58] cloud-services-team, Striker, Bitu, Infrastructure-Foundations: Confusion about password change location and account linking between toolsadmin and IDM - https://phabricator.wikimedia.org/T420598 (Dondersmooi) NEW
[15:31:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[15:31:59] cloud-services-team, Striker, Bitu, Infrastructure-Foundations: Confusion about password change location and account linking between toolsadmin and IDM - https://phabricator.wikimedia.org/T420598#11728222 (Dondersmooi)
[15:32:25] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.478-20260319152832-0ffd3702 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1178
[15:32:26] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.478-20260319152832-0ffd3702 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1178
[15:36:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[15:39:44] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11728287 (SheetalPro) i am started doing this project but it is not running could you suggest me to how to run this project
[15:41:21] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[15:41:58] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11728300 (Aklapper) @SheetalPro: Hi, please follow https://www.mediawiki.org/wiki/New_Developers#Communication_tips and provide much more information what exactly yo...
[15:43:15] cloud-services-team, Striker, Bitu, Infrastructure-Foundations, and 3 others: Inconsistent OAuth endpoint for Wikimedia Global Account (SUL) across services - https://phabricator.wikimedia.org/T420599 (Dondersmooi) NEW
[15:47:53] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11728343 (SheetalPro) i am try to run the project by the command mvn spring-boot:run but it is not running
[15:49:24] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11728353 (Gerges) @SheetalPro, Did you create a .env file?
[15:52:49] cloud-services-team, Striker, Bitu, Infrastructure-Foundations, Phabricator: Inconsistent OAuth endpoint for Wikimedia Global Account (SUL) across services - https://phabricator.wikimedia.org/T420599#11728389 (JJMC89)
[15:53:11] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[15:55:59] Tool-wikimonitor, good first task, patch-welcome: Unable to Edit SpEL Condition from Mobile - https://phabricator.wikimedia.org/T420409#11728431 (SheetalPro) no i did not created the .env file
[16:06:13] (open) dcaro: build: replace runtime.txt with .python-version and pin to 3.13 [toolforge-repos/pywikibot-buildservice] (toolforge) - https://gitlab.wikimedia.org/toolforge-repos/pywikibot-buildservice/-/merge_requests/1
[16:08:33] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[16:11:40] cloud-services-team (FY2025/2026-Q3-Q4), PAWS, tools-platform-team: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T418629#11728578 (fnegri) In progress→Resolved
[16:12:00] cloud-services-team (FY2025/2026-Q3-Q4), PAWS, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T420152#11728584 (fnegri) In progress→Resolved
[16:15:32] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-76 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:20:32] RESOLVED: InstanceDown: Project tools instance tools-k8s-worker-nfs-76 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:24:26] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[16:25:12] (open) dcaro: builds-api: update latest builder/runner [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1179
[16:27:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_osds (T419960)
[16:28:48] (approved) dcaro: jobs-api: bump to 0.0.478-20260319152832-0ffd3702 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1178 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:28:51] (merge) dcaro: jobs-api: bump to 0.0.478-20260319152832-0ffd3702 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1178 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:29:09] (update) dcaro: builds-api: update latest builder/runner [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1179
[16:42:08] Cloud-VPS (Quota-requests), Catalyst: Disk quota increase for catalyst-dev - https://phabricator.wikimedia.org/T420611 (thcipriani) NEW
[16:45:43] (PS1) Krinkle: build: Minor clean up after import to gerrit.wikimedia.org [labs/xtools] - https://gerrit.wikimedia.org/r/1255785 (https://phabricator.wikimedia.org/T402086)
[16:46:56] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence: clouddb1013 crashed after the upgrade to mariadb 10.11.16 - https://phabricator.wikimedia.org/T420177#11728804 (fnegri) It crashed again :/ ` fnegri@clouddb1013:~$ sudo journalctl -u mariadb@s3 -g "SEGV" --since -7d -- Boot 38...
[16:48:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.roll_reboot_osds (exit_code=0) (T419960)
[16:48:21] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[16:50:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_osds (T419960)
[16:53:05] PROBLEM - Host cloudcephosd1016 is DOWN: PING CRITICAL - Packet loss = 100%
[16:53:10] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[16:53:38] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[16:54:33] RECOVERY - Host cloudcephosd1016 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[16:57:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[17:00:25] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[17:12:05] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26): [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11728934 (dcaro) The fix has been deployed, for php, you have to use...
[17:13:39] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#11728943 (dcaro) Due to T420547, the release of the newer buildpacks ha...
[17:15:02] cloud-services-team, Tool-spacemedia, Toolforge: [Build service] latest builder has old Java - https://phabricator.wikimedia.org/T405415#11728954 (dcaro) >>! In T405415#11694405, @Don-vip wrote: > @dcaro is this update live for us? https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-depl...
[17:17:49] Tool-inteGraality: Datatype is not detected properly when using advanced PredicateGroupingConfiguration - https://phabricator.wikimedia.org/T371178#11728960 (JeanFred)
[17:21:29] (approved) dcaro: builds-api: update latest builder/runner [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1179
[17:21:32] (merge) dcaro: builds-api: update latest builder/runner [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1179
[17:31:13] (open) taavi: Clean up access to cloud channels for folks no longer around [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/34
[17:31:13] (update) taavi: Clean up access to cloud channels for folks no longer around [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/34
[17:31:14] (update) taavi: Add bliviero to cloud channels [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/35
[17:31:14] (open) taavi: Add bliviero to cloud channels [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/35
[17:31:23] (update) taavi: Clean up access to cloud channels for folks no longer around [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/34
[17:31:24] (update) taavi: Add bliviero to cloud channels [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/35
[17:31:25] (update) taavi: Add bliviero to cloud channels [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/35
[17:31:29] (update) taavi: Clean up access to cloud channels for folks no longer around [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/34
[17:56:38] cloud-services-team, Tool-spacemedia, Toolforge: [Build service] latest builder has old Java - https://phabricator.wikimedia.org/T405415#11729124 (Don-vip) Hi David, thank you so much! I was able to update to java 25 very smoothly: https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/commit/...
[18:27:15] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26): [builds-api,builds-builder] php buildpack is broken on current default buidlpacks and `--use-latest-versions` ones - https://phabricator.wikimedia.org/T420547#11729314 (Yaron_Koren) Thank you! It works with "--use-latest-versions".
[18:39:09] FIRING: CephSlowOps: Ceph cluster in eqiad has slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[18:44:32] FIRING: InstanceDown: Project metricsinfra instance metricsinfra-controller-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:47:50] FIRING: [4x] ProbeDown: Service tools-k8s-haproxy-7:443 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:49:32] RESOLVED: InstanceDown: Project metricsinfra instance metricsinfra-controller-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:49:44] cloud-services-team, Data-Services: [wikireplicas] Create views for new wiki abstractwiki - https://phabricator.wikimedia.org/T420637 (JJMC89) NEW
[18:51:56] FIRING: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:52:07] FIRING: ToolsDBWritableState: ToolsDB number of read/write instances is not 1 #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBWritableState
[18:52:23] PROBLEM - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 324 bytes in 60.014 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[18:52:51] FIRING: [7x] ProbeDown: Service tools-k8s-haproxy-7:443 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:53:16] VPS-project-Wikistats: Add abstractwiki to wikistats - https://phabricator.wikimedia.org/T420640 (JJMC89) NEW
[18:53:50] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.roll_reboot_osds (exit_code=97) (T419960)
[18:54:32] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-puppetdb-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:55:32] FIRING: InstanceDown: Project cloudinfra instance cloudinfra-idp-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:55:32] FIRING: [2x] InstanceDown: Project tools instance tools-k8s-worker-nfs-46 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:55:32] FIRING: InstanceDown: Project gitlab-runners instance runner-1034 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:56:32] FIRING: InstanceDown: Project metricsinfra instance metricsinfra-controller-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:57:32] FIRING: WidespreadInstanceDown: Widespread instances down in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[18:58:33] VPS-project-Codesearch: Codesearch is partially down - https://phabricator.wikimedia.org/T420644 (SomeRandomDeveloper) NEW
[18:58:34] FIRING: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https&var-backend=k8s-ingress-http&var-cluster=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[18:59:21] FIRING: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[18:59:32] FIRING: [3x] InstanceDown: Project toolsbeta instance toolsbeta-puppetdb-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:59:59] cloud-services-team, Data-Services: [wikireplicas] Create views for new wiki abstractwiki - https://phabricator.wikimedia.org/T420637#11729614 (Jdforrester-WMF)
[19:00:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-68 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:00:32] FIRING: [2x] InstanceDown: Project gitlab-runners instance runner-1034 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:00:32] FIRING: [8x] InstanceDown: Project tools instance tools-k8s-worker-nfs-12 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:00:33] !log tools.cluebotng-monitoring Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/23311687707 (https://github.com/cluebotng/component-configs/commits/1f6581053805672c606e64277292ab5b6b03ed68)
[19:00:34] RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 51.466 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[19:02:32] RESOLVED: WidespreadInstanceDown: Widespread instances down in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[19:02:51] RESOLVED: [7x] ProbeDown: Service tools-k8s-haproxy-7:443 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:03:13] VPS-project-Codesearch: Codesearch is partially down (2026-03-19) - https://phabricator.wikimedia.org/T420644#11729639 (A_smart_kitten)
[19:03:34] RESOLVED: ToolforgeWebHighErrorRate: High 5xx rate on Toolforge web services #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeWebHighErrorRate - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/infra-k8s-haproxy?var-frontend=k8s-ingress-https&var-backend=k8s-ingress-http&var-cluster=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeWebHighErrorRate
[19:04:21] RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[19:04:32] RESOLVED: [4x] InstanceDown: Project toolsbeta instance toolsbeta-nfs-5 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:05:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-68 has some processes stuck on NFS - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:05:32] RESOLVED: [2x] InstanceDown: Project gitlab-runners instance runner-1034 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:05:32] RESOLVED: [2x] InstanceDown: Project cloudinfra instance cloudinfra-idp-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:05:32] RESOLVED: [9x] InstanceDown: Project tools instance tools-k8s-worker-nfs-12 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:06:32] RESOLVED: InstanceDown: Project metricsinfra instance metricsinfra-controller-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:06:56] RESOLVED: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:07:29] FIRING: ToolforgeToolviewsFailed: Toolviews processing failed - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeToolviewsFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeToolviewsFailed
[19:10:39] RESOLVED: CephSlowOps: Ceph cluster in eqiad has slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[19:11:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:12:07] RESOLVED: ToolsDBWritableState: ToolsDB number of read/write instances is not 1 #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBWritableState
[19:22:10] VPS-project-Codesearch: Codesearch is partially down (2026-03-19) - https://phabricator.wikimedia.org/T420644#11729683 (SomeRandomDeveloper) Open→Resolved Seems to be resolved
[20:07:29] RESOLVED: ToolforgeToolviewsFailed: Toolviews processing failed - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeToolviewsFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeToolviewsFailed
[20:30:17] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/23315406679 (https://github.com/cluebotng/component-configs/commits/fd07020c08545c83ab35667616a26081966648df)
[20:30:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[20:46:14] FIRING: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:46:33] FIRING: [4x] ProbeDown: Service tools-k8s-haproxy-7:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:49:18] FIRING: IngressPodMisplaced: ingress-nginx-gen2 pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIngressPodMisplaced
[20:50:44] FIRING: IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[20:51:14] RESOLVED: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:51:33] RESOLVED: [5x] ProbeDown: Service tools-k8s-haproxy-7:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:52:46] FIRING: ProbeDown: Service tools-k8s-haproxy-8:443 has failed probes (http_api_svc_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:57:46] RESOLVED: [4x] ProbeDown: Service tools-k8s-haproxy-7:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:57:59] FIRING: [14x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-8.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:58:44] FIRING: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-7.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:58:51] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes
[21:02:59] RESOLVED: [16x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-7.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[21:11:32] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[21:26:32] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure