[00:00:58] (update) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[00:03:55] (approved) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[00:04:04] (merge) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[00:08:02] (open) bd808: Delete .htaccess [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/7
[00:08:54] (merge) bd808: Delete .htaccess [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/7
[00:11:05] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[00:11:12] T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[00:21:09] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[00:21:16] T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[00:49:29] (open) bd808: Drop inbound Accept-Encoding header to avoid double gzip [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/8
[00:50:27] (merge) bd808: Drop inbound Accept-Encoding header to avoid double gzip [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/8
[01:08:41] (open) raymond-ndibe: [components.models.api_models] add config version check [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/74 (https://phabricator.wikimedia.org/T394273)
[02:07:13] (open) raymond-ndibe: [components-api] support health-checks and port [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072)
[02:18:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,cinder
[02:18:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,cinder
[02:22:05] Toolforge (Toolforge iteration 20): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10837733 (Raymond_Ndibe) @dcaro this seems like something that should be done on the `jobs-api` side. What's yo...
[02:23:07] Toolforge (Toolforge iteration 20): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10837734 (Raymond_Ndibe) a:Raymond_Ndibe
[03:01:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,cinder
[03:01:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,cinder
[03:09:41] (open) raymond-ndibe: [envvars-api.local] set image pullPolicy to Always [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/790 (https://phabricator.wikimedia.org/T391966)
[03:10:14] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[03:10:23] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[03:12:15] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[03:13:10] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[03:13:37] (update) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[03:14:17] (approved) raymond-ndibe: [envvars-api.local] set image pullPolicy to Always [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/790 (https://phabricator.wikimedia.org/T391966)
[03:39:08] (update) raymond-ndibe: [envvars-api] return custom message for invalid EnvvarName [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/53 (https://phabricator.wikimedia.org/T360147)
[03:48:45] (update) raymond-ndibe: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118)
[03:48:59] (update) raymond-ndibe: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118)
[03:49:07] (update) raymond-ndibe: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118)
[04:24:01] RESOLVED: ToolsNfsAlmostFull: Toolforge NFS is 0.8607519014138785/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[04:31:57] (open) raymond-ndibe: [components-api.openapi] add an openapi spec endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/76
[04:52:51] (update) raymond-ndibe: [components-api.openapi] add an openapi spec endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/76
[04:57:55] (update) raymond-ndibe: [components-api.openapi] add an openapi spec endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/76
[04:58:24] (approved) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[04:59:52] (merge) raymond-ndibe: [envvars-api.local] set image pullPolicy to Always [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/790 (https://phabricator.wikimedia.org/T391966)
[05:00:13] (merge) raymond-ndibe: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[05:02:43] (approved) raymond-ndibe: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168
[05:02:46] (merge) raymond-ndibe: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168
[05:02:49] (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[05:03:29] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: envvars-api: bump to 0.0.69-20250520050021-383f7616 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/791 (https://phabricator.wikimedia.org/T391966)
[05:05:18] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.375-20250520050258-e0edf661 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/792
[07:05:41] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:15:41] RESOLVED: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:14:34] cloud-services-team, Toolforge: Decouple Toolforge API gateway authentication from Kubernetes certificates - https://phabricator.wikimedia.org/T332478#10837981 (dcaro) The related task with some pre-work done is {T363983}
[08:31:46] (update) arthurtaylor: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892)
[08:47:54] Toolforge (Toolforge iteration 20): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10838051 (dcaro) >>! In T389044#10837733, @Raymond_Ndibe wrote: > @dcaro this seems like something that should...
[08:48:08] (open) taavi: tools: Add tools- prefix to Prometheus volume names [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/33
[08:49:14] (merge) taavi: tools: Add tools- prefix to Prometheus volume names [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/33
[08:55:06] (open) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[08:55:57] (update) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[08:56:37] (update) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[09:01:46] (update) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[09:11:51] (update) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[09:16:52] Toolforge (Toolforge iteration 20): [envvars] show the 'global' envvars when running `toolforge envvars list` - https://phabricator.wikimedia.org/T394408#10838123 (dcaro)
[09:17:57] Toolforge (Toolforge iteration 20): mypy x509 invalid syntax while running CI tests - https://phabricator.wikimedia.org/T394593#10838126 (dcaro) I did not see any issues with this yesterday or today, maybe it was a temporary error? @Raymond_Ndibe can you paste/link to the failed jobs if any? (we can resolve...
[09:18:18] Toolforge (Toolforge iteration 20): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337#10838127 (dcaro) a:dcaro
[09:18:20] Toolforge (Toolforge iteration 20): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337#10838129 (dcaro) Open→In progress
[09:18:33] cloud-services-team, Toolforge (Toolforge iteration 20), IPv6, Patch-For-Review: Rebuild Toolforge Prometheus nodes in v6-dualstack network - https://phabricator.wikimedia.org/T393697#10838132 (dcaro) Open→In progress
[09:18:52] Toolforge (Toolforge iteration 20): [components-cli] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10838139 (dcaro) Open→In progress
[09:19:25] (approved) aborrero: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697) (owner: taavi)
[09:20:59] (update) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[09:21:41] (approved) taavi: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645) (owner: aborrero)
[09:22:05] (merge) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[09:23:02] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase
[09:23:07] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] add tool config version check - https://phabricator.wikimedia.org/T394273#10838153 (dcaro) a:Raymond_Ndibe
[09:23:08] (update) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[09:23:09] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] add tool config version check - https://phabricator.wikimedia.org/T394273#10838155 (dcaro) Open→In progress
[09:23:10] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
[09:25:30] (open) aborrero: tofu-provisioning: cleanup unused files [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/35
[09:26:07] (merge) taavi: Create tools-prometheus-8 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/34 (https://phabricator.wikimedia.org/T393697)
[09:26:40] (update) aborrero: tofu-provisioning: cleanup unused files [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/35
[09:28:23] Toolforge (Toolforge iteration 20): [components-api] Add endpoint to get what would be the "current" config - https://phabricator.wikimedia.org/T394753 (dcaro) NEW
[09:32:45] (merge) aborrero: tofu-provisioning: cleanup unused files [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/35
[09:33:29] Toolforge (Toolforge iteration 20): [components-api] Add endpoint to get what would be the "current" config - https://phabricator.wikimedia.org/T394753#10838195 (dcaro) p:Triage→Medium
[09:34:18] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-8.tools.eqiad1.wikimedia.cloud
[09:35:29] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-8.tools.eqiad1.wikimedia.cloud
[09:35:57] cloud-services-team, Toolforge: toolforge: tofu-provisioning: reorganize DNS records in the state - https://phabricator.wikimedia.org/T394645#10838198 (aborrero) In progress→Resolved
[09:38:31] Toolforge (Toolforge iteration 20), good first task: [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10838205 (dcaro) Thank you for tagging this task with #good_first_task for Wikimedia newcomers! Newcomers often may not be aware of things that may seem obvious...
[09:38:58] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] Add support for port/helathcheck for continuous jobs in tool config/depolyment - https://phabricator.wikimedia.org/T362072#10838210 (dcaro) Open→In progress
[09:39:15] Toolforge (Toolforge iteration 20): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10838214 (dcaro) Open→In progress
[09:40:46] (open) taavi: Manage tools Prometheus web proxies [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/36 (https://phabricator.wikimedia.org/T393697)
[09:43:52] (update) taavi: Manage tools Prometheus web proxies [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/36 (https://phabricator.wikimedia.org/T393697)
[09:45:34] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Patch-For-Review: Remove the compatibility layer of block schema in wikireplicas - https://phabricator.wikimedia.org/T390767#10838229 (fnegri) This was also announced in [Tech News: 2025-18](https://meta.wikimedia.org/wiki/Tech/News/2025/18), I...
[10:03:22] (approved) dcaro: Manage tools Prometheus web proxies [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/36 (https://phabricator.wikimedia.org/T393697) (owner: taavi)
[10:05:09] (merge) taavi: Manage tools Prometheus web proxies [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/36 (https://phabricator.wikimedia.org/T393697)
[10:24:22] (CR) D3r1ck01: [C:+2] "Yes https://en.wikipedia.org/wiki/Metadata" [labs/tools/intuition] - https://gerrit.wikimedia.org/r/1147856 (owner: Amire80)
[10:25:08] (Merged) jenkins-bot: Consistent spelling of "metadata" in a message [labs/tools/intuition] - https://gerrit.wikimedia.org/r/1147856 (owner: Amire80)
[10:31:21] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 2 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10838343 (fnegri) > you first need to stop mariadb Sorry my message was not clear enough, I was implying t...
[10:32:36] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 2 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10838345 (fnegri) > run-puppet-agent (will this automatically install 1011? do I need a puppet patch?) Ok,...
[10:36:53] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10838374 (fnegri) > Where is that in Puppet? Found it: https://gerrit.wikimedia.org/r/c/operations/puppet/...
[11:20:21] cloud-services-team, Data-Services, DBA, Wikidata, User-notice: Set up x3 replication to wikireplicas - https://phabricator.wikimedia.org/T390954#10838542 (taavi)
[11:31:17] Tool-refill: Inserts space character before close brackets in url link - https://phabricator.wikimedia.org/T394766 (Rich_Farmbrough) NEW
[11:37:49] Tool-refill: massive error.log file spam due to LabsDB.php PHP fatal error - https://phabricator.wikimedia.org/T389917#10838622 (Rich_Farmbrough) Does this need additional tags?
[11:42:15] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10838639 (Marostegui) >>! In T394372#10838374, @fnegri wrote: >> Where is that in Puppet? > > Found it: ht...
[12:03:37] (open) taavi: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37
[12:04:24] (update) taavi: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37
[12:05:18] (update) taavi: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37
[12:07:26] (update) taavi: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37
[12:08:53] (update) taavi: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37
[12:11:20] (approved) aborrero: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37 (owner: taavi)
[12:11:50] (update) taavi: Manage tools Prometheus web proxies [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/36 (https://phabricator.wikimedia.org/T393697)
[12:12:02] (merge) taavi: service: Always specify the full web proxy host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/37
[12:13:13] Tool-campwiz-nxt: Implement Reverse proxy and Failover server into campwiz nxt - https://phabricator.wikimedia.org/T394730#10838799 (Nokib_Sarkar) p:Triage→Low
[12:18:02] cloud-services-team, Data-Services, Data-Engineering: Create existencelinks table in production - https://phabricator.wikimedia.org/T394617#10838837 (Ladsgroup) Open→Resolved a:Ladsgroup I just created the tables in all wikis. >>! In T394617#10833232, @Bugreporter wrote: > This tab...
[13:05:15] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7
[13:05:26] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7
[13:05:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:07:55] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7
[13:08:06] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7
[13:10:29] (open) taavi: Create tools-prometheus-9 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/38 (https://phabricator.wikimedia.org/T393697)
[13:15:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:33:12] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Patch-For-Review, Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10839267 (Wangombe) >>! In T393850#10825236, @Nokib_Sarkar wrote: > @Wangombe Would [t...
[13:33:22] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Patch-For-Review, Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10839269 (abi_) >>! In T393850#10825236, @Nokib_Sarkar wrote: > @Wangombe Would [this]...
[14:03:54] cloud-services-team, Cloud-VPS: wmcs-enc-cli: keystoneauth1.exceptions.http.Forbidden: You are not authorized to perform the requested action: identity:list_services. - https://phabricator.wikimedia.org/T394775#10839384 (taavi)
[14:09:16] (PS1) Jforrester: composer: Replace legoktm/clover-diff with wikimedia/clover-diff [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349
[14:14:29] cloud-services-team, Toolforge (Toolforge iteration 20): [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869#10839449 (dcaro)
[14:16:14] cloud-services-team, Toolforge (Toolforge iteration 20): [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787 (dcaro) NEW
[14:16:52] (PS1) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787)
[14:20:58] (CR) CI reject: [V:-1] kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[14:57:33] cloud-services-team, decommission-hardware: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790 (Andrew) NEW
[14:58:15] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[14:58:15] !log dcaro@acme tools Updating container image toolforge-kyverno-kyverno:v1.13.6
[14:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:58:32] !log dcaro@acme tools END (ERROR) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=97)
[14:58:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:59:06] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[14:59:06] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[14:59:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:59:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:59:50] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[14:59:51] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[14:59:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:00:18] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[15:00:18] !log dcaro@acme tools Updating container image toolforge-kyverno-kyverno:v1.13.6
[15:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:00:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:01:00] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
[15:01:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:05:49] FIRING: [57x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[15:05:56] FIRING: [2x] SystemdUnitDown: The service unit drain_rabbitmq_notification_error.service is in failed status on host cloudrabbit1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:07:18] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[15:10:41] (open) aborrero: cli: drop outadated comment [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/70
[15:11:05] (update) aborrero: cli: drop outadated comment [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/70
[15:22:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services
[15:29:41] cloud-services-team, Cloud-VPS: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790#10839794 (taavi)
[15:33:19] RESOLVED: [33x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[15:33:34] cloud-services-team, Cloud-VPS: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790#10839836 (Andrew) Rabbitmq rebuild did not seem to change anything.
[15:42:33] cloud-services-team, Cloud-VPS: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790#10839924 (Andrew) A newly-created VM with attached volume seems to migrate fine. So this issue is specific to these particular VMs or volumes. The volume...
[15:42:45] (open) aborrero: maintain-kubeusers: add logic to drop access to deleted admins [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/71 (https://phabricator.wikimedia.org/T394786)
[15:42:52] (update) aborrero: maintain-kubeusers: add logic to drop access to deleted admins [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/71 (https://phabricator.wikimedia.org/T394786)
[15:44:21] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[15:44:21] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
[15:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:44:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:44:51] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
[15:44:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:44:58] (PS2) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787)
[15:45:30] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
[15:45:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:45:57] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background:v1.13.6
[15:45:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:46:28] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup:v1.13.6
[15:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:46:58] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports:v1.13.6
[15:47:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:47:28] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
[15:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:48:26] RESOLVED: SystemdUnitDown: The service unit drain_rabbitmq_notification_error.service is in failed status on host cloudrabbit1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:48:27] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
[15:48:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:48:41] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[15:48:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:49:23] (update) aborrero: maintain-kubeusers: add logic to drop access to deleted admins [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/71 (https://phabricator.wikimedia.org/T394786)
[15:50:33] (CR) CI reject: [V:-1] kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[15:56:06] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787#10839966 (dcaro) p:Triage→Medium
[15:57:09] (update) aborrero: maintain-kubeusers: add logic to drop access to deleted admins [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/71 (https://phabricator.wikimedia.org/T394786)
[16:00:09] (update) aborrero: maintain-kubeusers: add logic to drop access to deleted admins [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/71 (https://phabricator.wikimedia.org/T394786)
[16:00:25] (PS3) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787)
[16:01:15] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787#10839984 (dcaro) Uploaded the images to docker-registry.tools.wmflabs.org: ` docker-registry.tools.wikimedia.clo...
[16:02:10] (update) raymond-ndibe: [components.models.api_models] add config version check [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/74 (https://phabricator.wikimedia.org/T394273)
[16:03:19] (update) raymond-ndibe: [components.models.api_models] add config version check [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/74 (https://phabricator.wikimedia.org/T394273)
[16:05:18] (update) raymond-ndibe: [components-api.openapi] add an openapi spec endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/76
[16:05:21] (close) raymond-ndibe: [components-api.openapi] add an openapi spec endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/76
[16:07:14] (update) dcaro: [components-api.openapi] add an openapi spec endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/76 (owner: raymond-ndibe)
[16:11:05] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[16:11:06] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6
[16:11:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:11:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:11:25] !log dcaro@acme tools END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
[16:11:27] (approved) raymond-ndibe: api.metrics: add deprecation metrics [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/166 (https://phabricator.wikimedia.org/T390137)
[16:11:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:11:29] (update) raymond-ndibe: api.metrics: add deprecation metrics [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/166 (https://phabricator.wikimedia.org/T390137)
[16:13:44] (merge) raymond-ndibe: api.metrics: add deprecation metrics [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/166 (https://phabricator.wikimedia.org/T390137)
[16:13:45] (update) raymond-ndibe: [do not merge] test deprecation metrics [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/167
[16:16:22] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.376-20250520161355-68ad8b03 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/792 (https://phabricator.wikimedia.org/T390137)
[16:16:24] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.376-20250520161355-68ad8b03 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/792 (https://phabricator.wikimedia.org/T390137)
[16:27:57] (open) chuckonwumelu: [cli] Adding warning message for beta [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/31 (https://phabricator.wikimedia.org/T394277)
[16:32:22] FIRING: [2x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:36:56] FIRING: [2x] SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:06:25] (open) dcaro: dns: use the floating ip for docker-registry.svc.t.o [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/39
[17:16:46] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[17:16:46] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
[17:16:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:16:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:17:02] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
[17:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:17:14] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
[17:17:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:17:26] RESOLVED: SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1011. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:17:27] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6
[17:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:17:39] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6
[17:17:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:17:51] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6
[17:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:18:04] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
[17:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:18:17] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
[17:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:18:28] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[17:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:20:20] (update) dcaro: dns: use the floating ip for docker-registry.svc.t.o [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/39
[17:23:31] (open) dcaro: k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869)
[17:23:56] FIRING: [2x] SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:14:35] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10840707 (Ahoelzl) @JAllemandou any implications for the Data Platform / sqooping?
[18:25:36] Striker, Toolsbeta-Tools: toolsbeta sudo rules for new projects being created on wrong project - https://phabricator.wikimedia.org/T394823 (Addshore) NEW
[18:28:23] cloud-services-team, Striker: toolsbeta sudo rules for new projects being created on wrong project - https://phabricator.wikimedia.org/T394823#10840803 (taavi)
[18:56:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[19:06:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[19:18:56] FIRING: [2x] SystemdUnitDown: The systemd unit cinder-api.service on node cloudcontrol1006 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:20:18] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T394837 (phaultfinder) NEW
[19:28:56] RESOLVED: [2x] SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:30:52] RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[19:35:26] RESOLVED: [2x] SystemdUnitDown: The systemd unit cinder-api.service on node cloudcontrol1006 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:37:07] (CR) Legoktm: [C:+1] "Thanks, I honestly haven't deployed this tool in years, so if Krinkle wants to take care of it that would be nice, otherwise I can figure " [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[19:39:03] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[19:39:03] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
[19:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:39:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:39:21] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
[19:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:39:33] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
[19:39:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:39:45] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6
[19:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:39:58] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6
[19:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:13] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6
[19:40:13] cloud-services-team, Cloud-VPS: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790#10841239 (Andrew) In the 'volumes' table in the cinder database I see quite a few volumes with ` host: cloudcontrol1005@rbd#RBD ` After ` update volu...
[19:40:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:24] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
[19:40:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:36] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
[19:40:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:43] cloud-services-team, Cloud-VPS: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790#10841241 (Andrew) ` mysql:root@localhost [cinder]> update volumes set host='cloudcontrol1007@rbd#RBD' where host='cloudcontrol1005@rbd#RBD'; Query OK, 1...
[19:40:48] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[19:40:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:41:02] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[19:41:08] T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[19:42:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[19:42:35] cloud-services-team, Striker: toolsbeta sudo rules for new projects being created on wrong project - https://phabricator.wikimedia.org/T394823#10841243 (bd808)
[19:43:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[19:45:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[19:48:06] cloud-services-team, Cloud-VPS: Failures when draining certain VMs with attached cinder volumes (coibot-2) - https://phabricator.wikimedia.org/T394790#10841274 (Andrew) Open→Resolved ...and just like that, everything is working and the remaining cloudvirts drained without trouble.
[20:23:48] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10841425 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1031.eqiad.wmnet` - cloudvirt1031.eqiad...
[20:23:49] FIRING: [8x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1032 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:35:35] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10841473 (Andrew)
[20:35:41] FIRING: CloudVPSDesignateLeaks: Detected 18 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:35:57] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10841474 (Andrew)
[20:40:44] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10841491 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1032.eqiad.wmnet` - cloudvirt1032.eqiad...
[20:53:11] RESOLVED: CloudVPSDesignateLeaks: Detected 18 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:05:27] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10841642 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1031.eqiad.wmnet` - cloudvirt1031.eqiad...
[21:59:33] cloud-services-team, Data-Services, Quarry: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10841931 (Reedy)
[22:22:49] FIRING: [7x] NeutronAgentDownForLong: Neutron neutron-openvswitch-agent on cloudvirt1033 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong
[22:22:56] cloud-services-team: NeutronAgentDownForLong - https://phabricator.wikimedia.org/T394861 (phaultfinder) NEW