[00:09:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[01:21:45] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24221486227 (https://github.com/cluebotng/component-configs/commits/d96804861818d7786153d18d47be075a4dbbb6f2)
[01:21:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[04:34:32] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24226442413 (https://github.com/cluebotng/component-configs/commits/4f895f83dae3f356cae2a1bbcfea51dd9d18bd15)
[04:34:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[04:54:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[04:55:25] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[05:06:02] cloud-services-team, Data-Services, DBA, DC-Ops, and 3 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11807371 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host clouddb1019.eqiad.wmnet with OS trixie
[05:10:23] cloud-services-team, Data-Services, DBA, DC-Ops, and 3 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11807373 (Marostegui) p:Triage→Medium
[05:40:40] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11807424 (Marostegui) @Jclark-ctr I am not able to reimage the host, it is not rebooting, can you check onsite what's on the screen? I've tried several times to reboot it ma...
[05:52:30] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11807450 (Marostegui) >>! In T415165#11804490, @fnegri wrote: > Change of plans, @Marostegui will reimage clouddb10...
[06:27:04] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11807473 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host clouddb1019.eqiad.wmnet with OS trixie executed with errors: - cl...
[06:40:10] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[06:55:25] Tool-refill: delete 33,000 unnecessary front end files from production - https://phabricator.wikimedia.org/T422441#11807487 (Novem_Linguae)
[07:01:56] Tool-refill: delete 33,000 unnecessary front end files from production - https://phabricator.wikimedia.org/T422441#11807489 (Novem_Linguae) Just now I killed off the /ng/ directory, and also got rid of the /public_html/ symlink. The compiled Vue files now reside in ~/public_html/ on https://refill.toolforge....
[07:10:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[07:22:18] cloud-services-team, Data-Services, Data-Engineering, Data-Engineering-Radar, DBA: Re-run maintainviews on all clouddb* and an-redacteddb1001.eqiad.wmnet - https://phabricator.wikimedia.org/T422459#11807510 (Marostegui) @FCeratto-WMF has the parent's schema change progressed in any section th...
[07:29:50] cloud-services-team, Toolforge: Connection with `k8s.tools.eqiad1.wikimedia.cloud` hits SSL error - https://phabricator.wikimedia.org/T422538#11807520 (Nokib_Sarkar) hi, i have been seeing it for the past two days straight. I copied some useful debug information which might be beneficial. ` nokibsarkar@t...
[07:59:25] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linting - info.license - https://phabricator.wikimedia.org/T422908 (KBach) NEW
[08:26:59] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linter rules for info.license validation - https://phabricator.wikimedia.org/T422912 (KBach) NEW
[08:28:45] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linting - info.license - https://phabricator.wikimedia.org/T422908#11807674 (KBach)
[08:29:58] (update) taavi: istio-gateway: Trust X-Forwarded-Proto from previous proxy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 (https://phabricator.wikimedia.org/T422829)
[08:32:44] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[08:32:56] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[08:33:24] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[08:33:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[08:33:53] (merge) taavi: istio-gateway: Trust X-Forwarded-Proto from previous proxy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 (https://phabricator.wikimedia.org/T422829)
[08:34:44] FIRING: [2x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[08:38:11] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linting - info.license - https://phabricator.wikimedia.org/T422908#11807713 (KBach)
[08:39:44] RESOLVED: [2x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[08:40:18] (open) taavi: istio-gateway: Use maxSurge: 0 to improve pod replacement [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1219
[08:40:21] (update) taavi: istio-gateway: Use maxSurge: 0 to improve pod replacement [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1219
[08:40:44] cloud-services-team, Toolforge, Patch-For-Review: Toolforge HTML head links sometimes are issued as http://.toolforge:443 - https://phabricator.wikimedia.org/T422829#11807716 (taavi) Open→Resolved
[08:41:10] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[08:41:22] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[08:41:44] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[08:41:57] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[08:42:11] (merge) taavi: istio-gateway: Use maxSurge: 0 to improve pod replacement [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1219
[09:07:52] cloud-services-team, Data-Services, Data-Engineering, Data-Engineering-Radar, DBA: Re-run maintainviews on all clouddb* and an-redacteddb1001.eqiad.wmnet - https://phabricator.wikimedia.org/T422459#11807745 (FCeratto-WMF) Not yet, I'm looking at an error while updating s6 but in the meantime...
[09:09:58] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, Essential-Work: Spec caching: Cache validation results for identical OpenAPI specs - https://phabricator.wikimedia.org/T415916#11807777 (KBach)
[09:10:01] Tool-wmf-openapi-linter, [MWI] FY2025-26 Q4, Epic, OKR-Work: [5.2.5b Epic] Create a proof of concept of a CI workflow with the spec linter - https://phabricator.wikimedia.org/T422483#11807778 (KBach)
[09:10:12] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, Essential-Work: [SPIKE] Figure out best approach for spec result caching - https://phabricator.wikimedia.org/T415917#11807779 (KBach)
[09:10:13] Tool-wmf-openapi-linter, [MWI] FY2025-26 Q4, Epic, OKR-Work: [5.2.5b Epic] Create a proof of concept of a CI workflow with the spec linter - https://phabricator.wikimedia.org/T422483#11807780 (KBach)
[09:10:34] Tool-wmf-openapi-linter, [MWI] FY2025-26 Q4, Epic, OKR-Work: [5.2.5b Epic] Create a proof of concept of a CI workflow with the spec linter - https://phabricator.wikimedia.org/T422483#11807782 (KBach)
[09:10:35] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, Essential-Work: Lazy loading of ruleset: Implement caching layer for Spectral ruleset - https://phabricator.wikimedia.org/T415920#11807781 (KBach)
[09:15:48] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linter rules for info.license validation - https://phabricator.wikimedia.org/T422912#11807809 (KBach)
[09:23:59] Tools: sal csp violation not showing on csp-report - https://phabricator.wikimedia.org/T422916#11807854 (taavi)
[09:32:20] Tool-wmf-openapi-linter, OKR-Work: Prepare the linter for use in CI - https://phabricator.wikimedia.org/T422918 (KBach) NEW
[09:32:37] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Prepare the linter for use in CI - https://phabricator.wikimedia.org/T422918#11807902 (KBach)
[09:33:39] cloud-services-team, Toolforge, Patch-For-Review: [builds-api] expose supported versions - https://phabricator.wikimedia.org/T422046#11807906 (taavi) I think this would be useful. We already advertise `pack` at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_images#Testing_local...
[09:41:22] Tool-wikimedia-attribution: Document reuse guidelines per content type - https://phabricator.wikimedia.org/T422919 (Sarai-WMF) NEW
[09:41:31] Tool-wikimedia-attribution: Document reuse guidelines by content type - https://phabricator.wikimedia.org/T422919#11807930 (Sarai-WMF)
[09:48:01] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Deploy the linter in CI - https://phabricator.wikimedia.org/T422920 (KBach) NEW
[10:02:46] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, tools-platform-team, Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#11808029 (taavi) I've updated https://wikitech.wikimedia.org/wiki/Help:...
[10:06:34] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Finalize other OAD resources - https://phabricator.wikimedia.org/T422924 (KBach) NEW
[10:07:17] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Finalize other OAD resources - https://phabricator.wikimedia.org/T422924#11808053 (KBach)
[10:07:18] Tool-wmf-openapi-linter, [MWI] FY2025-26 Q4, OKR-Work: [Hypothesis] 5.2.5b: Productionalize API spec linting - https://phabricator.wikimedia.org/T422476#11808054 (KBach)
[10:07:57] cloud-services-team, Toolforge: Add basic alerting for Toolforge Elasticsearch service - https://phabricator.wikimedia.org/T422925 (taavi) NEW
[10:10:55] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Define a process for OAD resource synchronization - https://phabricator.wikimedia.org/T422927 (KBach) NEW
[10:13:42] cloud-services-team, Toolforge: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929 (MBH) NEW
[10:15:32] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Document linter rules - https://phabricator.wikimedia.org/T422930 (KBach) NEW
[10:21:12] (open) taavi: elastic: Add basic cluster health alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/59 (https://phabricator.wikimedia.org/T422925)
[10:21:13] (update) taavi: elastic: Add basic cluster health alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/59 (https://phabricator.wikimedia.org/T422925)
[10:26:36] cloud-services-team, Toolforge: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808170 (Raymond_Ndibe) It seems like your job is missing some `toolforge environment variable`. To see your available environment variable, run `toolforge envvars list`. ` tools.mbh@tools-b...
[10:27:30] cloud-services-team, Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808175 (Raymond_Ndibe) a:Raymond_Ndibe
[10:28:08] cloud-services-team, Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808181 (Raymond_Ndibe) can you verify that providing an environment variable fixes this problem for you @MBH ?
[10:31:07] cloud-services-team, Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808190 (MBH) I already deleted "connection string" envvar, but this program clearly doesn't use it, doesn't use database connection at all and never used it. See co...
[10:34:06] cloud-services-team, Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808204 (taavi) This seems to be a reoccurance of T365048 (and thus a real bug). Something's caused the pod to exit and restart (but without recreating the Pod API o...
[10:44:20] cloud-services-team, Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808225 (MBH) `toolforge jobs restart file-renaming` worked, thanks.
[10:50:17] (update) taavi: elastic: Add basic cluster health alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/59 (https://phabricator.wikimedia.org/T422925)
[10:50:42] cloud-services-team, Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808228 (Raymond_Ndibe) >>! In T422929#11808204, @taavi wrote: > This seems to be a reoccurance of T365048 (and thus a real bug). Something's caused the pod to exit...
[10:50:55] Cloud-VPS, tools-platform-team: Openstack uwsgi logging to '.log' - https://phabricator.wikimedia.org/T422830#11808229 (taavi)
[10:51:18] Cloud-VPS, tools-platform-team: Consider allowing cumin access to all Cloud VPS VMs - https://phabricator.wikimedia.org/T422801#11808230 (taavi)
[10:51:29] Toolforge, tools-platform-team: Can't run continuous job on Toolforge - https://phabricator.wikimedia.org/T422929#11808231 (taavi)
[10:52:08] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Designate API timing out - https://phabricator.wikimedia.org/T422646#11808233 (fgiunchedi)
[10:55:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[10:56:25] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[11:21:57] (approved) raymond-ndibe: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 (owner: dcaro)
[11:22:18] (approved) raymond-ndibe: openapi - fix health check discriminator [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) (owner: damian)
[11:22:45] (merge) raymond-ndibe: openapi - fix health check discriminator [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) (owner: damian)
[11:26:16] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.483-20260410112303-b93e57f2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1220 (https://phabricator.wikimedia.org/T422753)
[11:26:25] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.483-20260410112303-b93e57f2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1220 (https://phabricator.wikimedia.org/T422753)
[11:29:38] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[11:41:12] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[11:44:51] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[11:58:16] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[12:02:41] (update) raymond-ndibe: jobs-api: bump to 0.0.483-20260410112303-b93e57f2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1220 (https://phabricator.wikimedia.org/T422753) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:02:44] (approved) raymond-ndibe: jobs-api: bump to 0.0.483-20260410112303-b93e57f2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1220 (https://phabricator.wikimedia.org/T422753) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:02:56] (merge) raymond-ndibe: jobs-api: bump to 0.0.483-20260410112303-b93e57f2 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1220 (https://phabricator.wikimedia.org/T422753) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:20:12] (approved) filippo: elastic: Add basic cluster health alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/59 (https://phabricator.wikimedia.org/T422925) (owner: taavi)
[12:27:17] (update) damian: api: add supported versions endpoint [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/157 (https://phabricator.wikimedia.org/T422046)
[13:06:31] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Designate API timing out - https://phabricator.wikimedia.org/T422646#11808560 (fgiunchedi) I saved all openstack components logs from `/var/log` to `/root/filippo-T417393` on `cloudcontrol1007` and `cloudcontrol1011` to save them from rotation temporari...
[13:10:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[13:13:38] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Designate API timing out - https://phabricator.wikimedia.org/T422646#11808570 (fgiunchedi)
[13:15:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[13:27:03] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24244959273 (https://github.com/cluebotng/component-configs/commits/2a6605ee2d07c0ff0d690aaa8aabed0ca35bab72)
[13:27:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[13:31:56] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24245099622 (https://github.com/cluebotng/component-configs/commits/251c10040c01caf2ba9b855050c318d5d2fd8e81)
[13:31:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[13:33:36] Cloud Services Proposals, cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, tools-platform-team, and 4 others: [builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-deploy and build packs - https://phabricator.wikimedia.org/T194332#11808642 (a...
[13:36:38] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24245353001 (https://github.com/cluebotng/component-configs/commits/97eebf1bcdf5be901e0d3fd82c1b3ea6a8668163)
[13:36:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[13:44:35] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24245816710 (https://github.com/cluebotng/component-configs/commits/6181fdda40150d3535541f3084ac7ff245f19536)
[13:44:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[13:44:53] Tool-wmf-openapi-linter, Tech-Docs-Team, [MWI] FY2025-26 Q4, OKR-Work: [Hypothesis] 5.2.5b: Productionalize API spec linting - https://phabricator.wikimedia.org/T422476#11808686 (apaskulin)
[13:45:59] (update) raymond-ndibe: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 (owner: dcaro)
[13:48:30] (merge) raymond-ndibe: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 (owner: dcaro)
[13:51:16] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.192-20260410134843-833c1527 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1221
[13:52:31] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:57:09] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:59:11] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24246395413 (https://github.com/cluebotng/component-configs/commits/945fa198e64a0e63b777bb570d57d68ef0ce3f69)
[13:59:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[14:05:55] (merge) taavi: elastic: Add basic cluster health alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/59 (https://phabricator.wikimedia.org/T422925)
[14:06:14] cloud-services-team, Toolforge, Patch-For-Review: Add basic alerting for Toolforge Elasticsearch service - https://phabricator.wikimedia.org/T422925#11808750 (taavi) Open→Resolved a:taavi
[14:07:08] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[14:12:00] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[14:14:28] Tool-wikimedia-attribution: Document attribution guidelines by content type - https://phabricator.wikimedia.org/T422919#11808758 (Sarai-WMF)
[14:16:10] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[14:16:25] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[14:16:53] (update) raymond-ndibe: components-api: bump to 0.0.192-20260410134843-833c1527 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1221 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:16:54] (approved) raymond-ndibe: components-api: bump to 0.0.192-20260410134843-833c1527 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1221 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:16:57] (merge) raymond-ndibe: components-api: bump to 0.0.192-20260410134843-833c1527 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1221 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:19:02] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: oslo.messaging does not failover to the next rabbit host on traffic blackhole situations - https://phabricator.wikimedia.org/T422820#11808792 (fgiunchedi) I went back through the cloudcontrol1007 logs to see how extensive this problem is, P90364 contain...
[14:22:33] !log tools.cluebot3 Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24247576870 (https://github.com/cluebotng/component-configs/commits/09b7afbf5d13900afa4cbcc31de73b324f45b3a9) [14:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot3/SAL [14:23:21] !log tools.cluebotng Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24247570676 (https://github.com/cluebotng/component-configs/commits/93ced49392782bf65e34d13f10cbeaafa760f115) [14:23:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [14:23:47] !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247592403 (https://github.com/cluebotng/component-configs/commits/3a2e7dc89daf45ae2dc369cb6864577d5b18a74b) [14:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL [14:23:59] !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247582903 (https://github.com/cluebotng/component-configs/commits/4c3786d55025827d330be1505ae5c35b0beed2b5) [14:23:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL [14:24:12] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247582931 (https://github.com/cluebotng/component-configs/commits/4c3786d55025827d330be1505ae5c35b0beed2b5) [14:24:12] !log tools.cluebot-syncer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247617592 (https://github.com/cluebotng/component-configs/commits/d4e8e8f9cf7c6848df8726cc0392beac1a400b86) [14:24:12] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247572438 
(https://github.com/cluebotng/component-configs/commits/e5facd6bf8968a234c139ac18c8d2e72f8345d9e) [14:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [14:24:12] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247598064 (https://github.com/cluebotng/component-configs/commits/7ec9cb1bdb9d3218d0388086118e3609ccb68956) [14:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot-syncer/SAL [14:24:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [14:24:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL [14:25:14] !log tools.cluebotng Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24247614873 (https://github.com/cluebotng/component-configs/commits/30bda68a3ea7a1674d174e43cc8651d301c7485c) [14:25:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [14:25:58] !log tools.cluebotng-monitoring Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24247609205 (https://github.com/cluebotng/component-configs/commits/51257ea555ac174e0fe397b1923edbf76fada3dc) [14:25:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL [14:26:00] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247581365 (https://github.com/cluebotng/component-configs/commits/49becfde53d5f960c8e4df0484cebb2bb4d4c5aa) [14:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [14:27:29] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247614901 
(https://github.com/cluebotng/component-configs/commits/30bda68a3ea7a1674d174e43cc8651d301c7485c) [14:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [14:27:45] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24247621716 (https://github.com/cluebotng/component-configs/commits/2e25667a8eb92011273ca47b217f3546da48cae9) [14:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [14:27:52] !log tools.cluebotng-review Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24247620609 (https://github.com/cluebotng/component-configs/commits/e63a941f5b83d97a9751af731c869062ceef4519) [14:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [14:34:25] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1269437 (owner: L10n-bot) [14:38:52] Tool-wikimedia-attribution: Document attribution guidelines by content type - https://phabricator.wikimedia.org/T422919#11808854 (HCoplin-WMF) Just chiming in that we have a spreadsheet for the Attribution API that captures what fields are returned on which types of content, as informed by the framework: htt...
[14:49:00] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24248696057 (https://github.com/cluebotng/component-configs/commits/310efa6d212253bd549cddd16d85f4167b26f684) [14:49:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [14:50:30] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24248770650 (https://github.com/cluebotng/component-configs/commits/30b6b38a296396ea7c907cb624fb55006729e637) [14:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [14:52:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors [14:53:17] FIRING: AlertLintProblem: Linting problems found for ElasticsearchHealth - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [14:55:20] (open) taavi: elastic: Only deploy to tools project [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/60 (https://phabricator.wikimedia.org/T422925) [14:55:23] (update) taavi: elastic: Only deploy to tools project [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/60 (https://phabricator.wikimedia.org/T422925) [14:57:46] (merge) taavi: elastic: Only deploy to tools project [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/60 (https://phabricator.wikimedia.org/T422925) [15:03:46] !log tools.cluebotng-trainer
Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249371602 (https://github.com/cluebotng/component-configs/commits/acce8e3abd2da42aef50053d1c1126daec711da1) [15:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [15:12:50] cloud-services-team, Toolforge, tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11808959 (DamianZaremba) Open→Resolved a:DamianZaremba Confirming my tools are now working (with http health checking) as expected. T... [15:14:25] !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249897971 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL [15:14:30] !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249898009 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:14:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL [15:14:41] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249897956 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:14:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [15:14:53] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249897967 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:14:54] Logged the message at
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [15:15:00] !log tools.cluebotng Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24249897963 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:15:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [15:16:22] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249897960 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL [15:17:14] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24249898021 (https://github.com/cluebotng/component-configs/commits/68514222ba9a90ece524baf75b02c9835faf87d3) [15:17:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [15:19:07] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24250023341 (https://github.com/cluebotng/component-configs/commits/d9e72fa744a319bc8d37238dc1895ad5d11732ba) [15:19:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [15:23:17] RESOLVED: AlertLintProblem: Linting problems found for ElasticsearchHealth - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [15:26:31] !log tools.cluebotng-review Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24250442439 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:26:33] Logged the message at 
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [15:26:40] !log tools.cluebotng Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24250442412 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:26:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [15:26:42] !log tools.cluebotng-monitoring Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24250442419 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:26:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL [15:26:50] !log tools.cluebotng-trainer Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24250442397 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:26:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [15:27:17] !log tools.cluebotng-staging Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24250442472 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [15:27:19] !log tools.cluebot3 Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24250442392 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:27:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot3/SAL [15:27:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - 
https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors [15:27:50] !log tools.cluebot-syncer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24250442402 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:27:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot-syncer/SAL [15:27:59] !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24250442420 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:27:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL [15:28:06] !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24250442403 (https://github.com/cluebotng/component-configs/commits/bfa8b761a017e9b8bb69ae52c5cb731d17bd324f) [15:28:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL [15:34:38] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24250714955 (https://github.com/cluebotng/component-configs/commits/6c8100fde23d02e6b289d65ddd7fc06332eabee3) [15:34:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [15:37:50] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24250823601 (https://github.com/cluebotng/component-configs/commits/78526aec436ada2123387a5fc328bf8e3fe7d4a8) [15:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [15:53:47] !log tools.cluebotng-review Deployment failed: 
https://github.com/cluebotng/component-configs/actions/runs/24251572443 (https://github.com/cluebotng/component-configs/commits/2426c8db99c6d44c954ced07c9f41fcaa9e8e549) [15:53:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [16:05:23] cloud-services-team, Toolforge: [components-api] build quota failures leave state unclear - https://phabricator.wikimedia.org/T422951 (DamianZaremba) NEW [16:08:46] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24251572443 (https://github.com/cluebotng/component-configs/commits/2426c8db99c6d44c954ced07c9f41fcaa9e8e549) [16:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [16:31:01] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, OKR-Work: Validate the MediaWiki REST API OpenAPI description - https://phabricator.wikimedia.org/T422917#11809113 (BPirkle) [16:31:10] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [16:33:40] cloud-services-team, Cloud-VPS, Upstream: openstack flamingo: "'enabled' is a required property" for LDAP-managed users - https://phabricator.wikimedia.org/T416483#11809117 (Andrew) Open→Resolved Upstream has acknowledged this and think it's fixed.
[16:37:21] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: [SPIKE] Verify info.license in non-generated specs - https://phabricator.wikimedia.org/T422911#11809130 (BPirkle) [17:06:54] Tool-etherpad-backup: Archive etherpads mentioned in hackathon/techconf tasks - https://phabricator.wikimedia.org/T417207#11809217 (bd808) [17:07:55] Cloud-VPS (Project-requests), Tool-etherpad-backup: Request creation of etherpads3 VPS project - https://phabricator.wikimedia.org/T420532#11809221 (bd808) [17:08:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [17:13:09] Tool-etherpad-backup: Setup proof of concept storage and retrieval from WMCS object storage - https://phabricator.wikimedia.org/T422958 (bd808) NEW [17:13:37] Tool-etherpad-backup: Setup proof of concept storage and retrieval from WMCS object storage - https://phabricator.wikimedia.org/T422958#11809258 (bd808) Open→In progress p:Triage→Medium a:bd808 [17:24:45] Cloud-VPS, tools-platform-team: Consider allowing cumin access to all Cloud VPS VMs - https://phabricator.wikimedia.org/T422801#11809294 (Andrew) We can likely have cloudinit add a public cumin key to all hosts on creation, but I have a few thoughts: - I don't think it's enough to just replace our magnu...
[17:56:50] (update) danyya: Draft: Migrate to SQLite [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/3 [17:58:35] (update) danyya: Migrate to SQLite [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/3 [17:58:48] (merge) danyya: Migrate to SQLite [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/3 [18:02:45] Tool-curator: Curator should handle MediaWiki\Upload\Exception\UploadStashFileException Commons failure during uploads - https://phabricator.wikimedia.org/T420959#11809457 (DaxServer) In progress→Resolved [18:02:51] Tool-curator: Curator should handle "Nonce already used" errors in Commons API calls during uploads - https://phabricator.wikimedia.org/T420960#11809458 (DaxServer) In progress→Resolved [18:03:00] Tool-curator: Curator should handle Commons backend storage failure during uploads - https://phabricator.wikimedia.org/T420956#11809461 (DaxServer) In progress→Resolved [18:03:16] Tool-curator: Curator should handle invalid CSRF token errors during uploads - https://phabricator.wikimedia.org/T420961#11809465 (DaxServer) In progress→Resolved [18:03:40] Tool-curator: Save presets in Curator - https://phabricator.wikimedia.org/T417683#11809471 (DaxServer) In progress→Resolved [18:22:55] Tool-humaniki-2: Migrate to SQLite - https://phabricator.wikimedia.org/T422329#11809530 (Danya) In progress→Resolved [18:22:56] Tool-humaniki-2: Setup unit tests - https://phabricator.wikimedia.org/T422348#11809532 (Danya) In progress→Resolved [18:27:55] Tool-humaniki-2: Setup push to deploy - https://phabricator.wikimedia.org/T422347#11809540 (Danya) Open→In progress [18:29:01] (open) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [18:37:17] FIRING:
JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:38:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [18:42:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:50:45] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:16:21] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:24:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:29:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:30:10] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] -
https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:30:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [19:31:57] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:33:36] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:35:20] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:35:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [19:36:46] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:41:06] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:41:50] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:42:35] (update) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] -
https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [19:50:09] (close) danyya: Draft: Setup push to deploy [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/4 [20:28:39] Tool-humaniki-2: Setup push to deploy - https://phabricator.wikimedia.org/T422347#11809680 (Danya) In progress→Stalled Turns out this is not possible yet for webservices… [20:29:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:31:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [20:36:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [20:44:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:49:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable -
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:59:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [21:01:07] cloud-services-team, Cloud-VPS: Keystone logs no longer appearing in logstash - https://phabricator.wikimedia.org/T421911#11809759 (Andrew) The root of this seems to be that keystone has stopped logging with the name 'keystone,' instead using the name '' ` Apr 10 20:59:53 c... [21:14:33] (CR) Alien4444: [C:+2] Localisation updates from https://translatewiki.net. [labs/xtools] - https://gerrit.wikimedia.org/r/1269433 (owner: L10n-bot) [21:16:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [21:31:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [21:38:16] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11809821 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host clouddb1019.eqiad.wmnet with OS trixie [21:45:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity -
https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [21:46:28] Tool-etherpad-backup: Setup proof of concept storage and retrieval from WMCS object storage - https://phabricator.wikimedia.org/T422958#11809842 (bd808) [21:49:56] Tool-etherpad-backup: Setup proof of concept storage and retrieval from WMCS object storage - https://phabricator.wikimedia.org/T422958#11809844 (bd808) The new service user is https://ldap.toolforge.org/user/etherpadbackupbot / https://meta.wikimedia.org/wiki/User:EtherpadBackupBot. I have the passwords for... [21:51:24] Tool-etherpad-backup: Setup proof of concept storage and retrieval from WMCS object storage - https://phabricator.wikimedia.org/T422958#11809845 (bd808) [21:52:19] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11809846 (Jclark-ctr) I have not had any luck with getting it to power on. I will start Monday with pulling parts from decom servers to try to get it back up.
[21:53:08] Tool-humaniki-2: Setup pre-commit hooks - https://phabricator.wikimedia.org/T422408#11809852 (Danya) Open→Resolved [21:53:39] Tool-humaniki-2: Setup push to deploy - https://phabricator.wikimedia.org/T422347#11809853 (Danya) p:High→Low [21:55:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [22:07:25] (CR) Cwhite: [C:+2] add beta-logs pki key [labs/private] - https://gerrit.wikimedia.org/r/1268683 (https://phabricator.wikimedia.org/T350516) (owner: Cwhite) [22:07:34] (CR) Cwhite: [V:+2 C:+2] add beta-logs pki key [labs/private] - https://gerrit.wikimedia.org/r/1268683 (https://phabricator.wikimedia.org/T350516) (owner: Cwhite) [22:30:08] (PS1) Cwhite: logging: add dummy pki "secrets" [labs/private] - https://gerrit.wikimedia.org/r/1270089 (https://phabricator.wikimedia.org/T350516) [22:30:56] (CR) Cwhite: [V:+2 C:+2] logging: add dummy pki "secrets" [labs/private] - https://gerrit.wikimedia.org/r/1270089 (https://phabricator.wikimedia.org/T350516) (owner: Cwhite) [22:31:39] (CR) MusikAnimal: "L10n-bot was supposed to self-merge this… hmm."
[labs/xtools] - https://gerrit.wikimedia.org/r/1269433 (owner: L10n-bot) [22:50:01] !log tools.cluebotng-review Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24267603054 (https://github.com/cluebotng/component-configs/commits/ba252b54cec9387b47dd4ac4a347d4a9c5118c3e) [22:50:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [23:04:52] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24268031378 (https://github.com/cluebotng/component-configs/commits/31367659ada078f50022f1df4b16b6139db27c09) [23:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [23:19:03] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24268430819 (https://github.com/cluebotng/component-configs/commits/3652893dce02243971055a6ab740363f103ce104) [23:19:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [23:27:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [23:32:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity