[00:37:19] <jinxer-wm>	 FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:32:19] <jinxer-wm>	 RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:35:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:45:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:32:37] <wikibugs>	 (merge) aborrero: kubeconfig: support updating the file [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/68 (https://phabricator.wikimedia.org/T262562)
[08:34:59] <wikibugs>	 (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: maintain-kubeusers: bump to 0.0.175-20250519083249-6ee18335 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/789 (https://phabricator.wikimedia.org/T262562)
[08:45:39] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
[08:46:51] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
[08:50:39] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
[08:51:42] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
[09:03:32] <wikibugs>	 (merge) aborrero: maintain-kubeusers: bump to 0.0.175-20250519083249-6ee18335 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/789 (https://phabricator.wikimedia.org/T262562) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[09:09:53] <wikibugs>	 Toolforge (Toolforge iteration 20): mypy x509 invalid syntax while running CI tests - https://phabricator.wikimedia.org/T394593#10833808 (dcaro) We upgraded pre-commit deps too, we can try regenerating the ci image see if that solves the versioning issue
[09:52:42] <wikibugs>	 cloud-services-team, Toolforge: toolforge: tofu-provisioning: reorganize DNS records in the state - https://phabricator.wikimedia.org/T394645 (aborrero) NEW
[09:53:00] <wikibugs>	 cloud-services-team, Toolforge: toolforge: tofu-provisioning: reorganize DNS records in the state - https://phabricator.wikimedia.org/T394645#10834027 (aborrero) Open→In progress p:Triage→Medium work started here: https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merg...
[10:05:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:10:21] <wikibugs>	 (update) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[10:10:50] <wikibugs>	 (update) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[10:11:28] <wikibugs>	 (update) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[10:13:45] <wikibugs>	 (update) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[10:15:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:15:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:40:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:40:33] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:44:47] <wikibugs>	 (update) aborrero: dns: use zonename_recordname as opentofu state key [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/22 (https://phabricator.wikimedia.org/T394645)
[10:45:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-2 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:49:12] <wikibugs>	 (update) dcaro: runtime.k8s.image: periodically refresh image-config data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/160 (https://phabricator.wikimedia.org/T357112) (owner: raymond-ndibe)
[10:57:41] <wikibugs>	 Cloud Services Proposals, cloud-services-team, Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10834265 (taavi) >>! In T394035#10823194, @fnegri wrote: > Option Purple is my favourite so far, but I'm still a bit confused about how the new se...
[11:05:04] <wikibugs>	 cloud-services-team, Toolforge: Decouple Toolforge API gateway authentication from Kubernetes certificates - https://phabricator.wikimedia.org/T332478#10834273 (taavi) >>! In T332478#10747989, @dcaro wrote: > @taavi should I close this as duplicate? Or do you want to refresh/extend the oauth+dedicated au...
[11:18:01] <wikibugs>	 Cloud Services Proposals, cloud-services-team, Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10834319 (dcaro) >>! In T394035#10820913, @fnegri wrote: >>> Could it be a generic LDAP adapter, with some minimal logic to restrict the damage yo...
[12:13:20] <wikibugs>	 (update) dcaro: [envvars-api] return custom message for invalid EnvvarName [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/53 (https://phabricator.wikimedia.org/T360147) (owner: raymond-ndibe)
[12:16:29] <wikibugs>	 (open) aborrero: tofu-infra: introduce gitlab CI/CD workflow [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/236 (https://phabricator.wikimedia.org/T370652)
[12:18:51] <wikibugs>	 (update) aborrero: tofu-infra: introduce gitlab CI/CD workflow [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/236 (https://phabricator.wikimedia.org/T370652)
[12:33:51] <wikibugs>	 Toolforge (Toolforge iteration 20): [builds-api] Store the commit hash that was used for the build - https://phabricator.wikimedia.org/T389043#10834525 (dcaro) In progress→Resolved
[12:34:12] <wikibugs>	 Toolforge (Toolforge iteration 20), Epic: [cicd] create cicd flow for non repo owners - https://phabricator.wikimedia.org/T394594#10834529 (dcaro) Duplicate→Resolved
[12:34:17] <wikibugs>	 Toolforge (Toolforge iteration 20): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10834532 (dcaro) Duplicate→Resolved
[12:48:40] <wikibugs>	 (approved) dcaro: api.metrics: add deprecation metrics [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/166 (https://phabricator.wikimedia.org/T390137) (owner: raymond-ndibe)
[12:48:43] <wikibugs>	 (update) dcaro: api.metrics: add deprecation metrics [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/166 (https://phabricator.wikimedia.org/T390137) (owner: raymond-ndibe)
[12:57:16] <wikibugs>	 (update) dcaro: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717) (owner: raymond-ndibe)
[13:05:05] <wikibugs>	 cloud-services-team, Data-Services, Data-Engineering: Create existencelinks table in production - https://phabricator.wikimedia.org/T394617#10834617 (Ladsgroup) From the #DBA sign off, it had and has my sign off (I reviewed the schema, etc.). You can deploy it yourself. I added it to the table catalo...
[13:13:05] <wikibugs>	 cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#10834648 (Andrew) Hello @MoritzMuehlenhoff and @SLyngshede-WMF -- this re-allocation/bitu change needs to happen soon. We have a corresponding...
[13:17:50] <wikibugs>	 cloud-services-team, Toolforge: Decouple Toolforge API gateway authentication from Kubernetes certificates - https://phabricator.wikimedia.org/T332478#10834658 (taavi)
[13:17:54] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10834659 (Andrew) @aborrero, can you stage patches for this same change in eqiad1?  Also: I'm pretty sure that  everything you did was in tofu and/or puppet but want to be 10...
[13:23:56] <wikibugs>	 (update) raymond-ndibe: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[13:30:32] <wikibugs>	 cloud-services-team, Cloud-VPS: Service implementation for Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T394671 (Andrew) NEW
[13:41:42] <wikibugs>	 Toolforge (Toolforge iteration 20): [components-api]  Add support for port/helathcheck for continuous jobs in tool config/depolyment - https://phabricator.wikimedia.org/T362072#10834795 (dcaro)
[13:41:42] <wikibugs>	 cloud-services-team, Toolforge: [components-api] add order to the components deployment - https://phabricator.wikimedia.org/T362075#10834796 (dcaro)
[13:49:09] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10834835 (aborrero) I can confirm I didn't do anything "hidden". All the changes/commits were references with this ticket (puppet, gitlab), so it should be fairly simple to r...
[13:51:13] <wikibugs>	 Cloud Services Proposals, cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 20), Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10834837 (dcaro)
[13:54:10] <wikibugs>	 wikitech.wikimedia.org, serviceops-radar, SRE, Patch-For-Review, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10834856 (Andrew) >>! In T376400#10825452, @taavi wrote: > The site at http://ec2-54-81-201-239.compute-1.amazonaws.com/ seems to embed images fro...
[13:55:20] <wikibugs>	 Toolforge (Toolforge iteration 20): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10834867 (dcaro) Now that we have the `resolved_ref` property returned by the builds-api for each build, we can...
[13:57:32] <wikibugs>	 wikitech.wikimedia.org, serviceops-radar, SRE, Patch-For-Review, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10834875 (taavi) >>! In T376400#10834856, @Andrew wrote: > Can you point me to some specific examples? My half-baked spot checks (e.g. http://ec2-...
[14:00:30] <wikibugs>	 wikitech.wikimedia.org, serviceops-radar, SRE, Patch-For-Review, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10834892 (Andrew) yep, I see it now.
[14:04:04] <jinxer-wm>	 FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on clouddumps1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:27:32] <wikibugs>	 (update) dcaro: [jobs-api] check services diff [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/158 (https://phabricator.wikimedia.org/T392717) (owner: raymond-ndibe)
[14:27:58] <wikibugs>	 (update) dcaro: [envvars-cli] print error string and not dict [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T360147) (owner: raymond-ndibe)
[14:29:26] <wikibugs>	 (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[14:30:32] <wikibugs>	 (update) raymond-ndibe: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[14:34:52] <wikibugs>	 Toolforge (Toolforge iteration 20), Epic: [cicd] create cicd flow for non repo owners - https://phabricator.wikimedia.org/T394594#10835043 (JJMC89) →Duplicate dup:T394595
[14:34:53] <wikibugs>	 Toolforge (Toolforge iteration 20), Patch-For-Review: [cicd] create cicd flow for non repo owners - https://phabricator.wikimedia.org/T394595#10835044 (JJMC89)
[14:35:27] <wikibugs>	 Toolforge (Toolforge iteration 20): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10835050 (JJMC89) →Duplicate dup:T127367
[14:35:32] <wikibugs>	 cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10835051 (JJMC89)
[15:10:25] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10835170 (fnegri) @joanna_borun as we agreed, I sent an email to cloud-announce with the following text:  ` Starting next month, we are going t...
[15:18:32] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10835232 (fnegri) @Marostegui is there any page on wikitech with the procedure that you usually follow for major-version upgrades?  I was think...
[15:19:16] <wikibugs>	 (open) aborrero: eqiad1: introduce openstack octavia network support [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/237 (https://phabricator.wikimedia.org/T394099)
[15:21:01] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Data-Platform: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10835253 (dr0ptp4kt) Looping Data Platform with a tag addition here for tracking.
[15:21:58] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 2 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10835264 (dr0ptp4kt)
[15:23:13] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:23:46] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 2 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10835274 (fnegri) @dr0ptp4kt thanks! We might even start doing this for `an-redacteddb1001`, before clouddb...
[15:24:45] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[15:39:09] <wikibugs>	 cloud-services-team, Cloud-VPS: puppet-enc issue with Hiera values starting with a colon due to PyYAML and Ruby YAML parsing differences - https://phabricator.wikimedia.org/T394691 (taavi) NEW
[15:40:51] <wikibugs>	 cloud-services-team, Cloud-VPS: puppet-enc issue with Hiera values starting with a colon due to PyYAML and Ruby YAML parsing differences - https://phabricator.wikimedia.org/T394691#10835469 (taavi) p:Triage→High a:taavi
[15:44:32] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10835484 (aborrero) >>! In T394099#10834659, @Andrew wrote: > @aborrero, can you stage patches for this same change in eqiad1? >   Done, see: * https://gitlab.wikimedia.org/r...
[15:50:14] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: puppet-enc issue with Hiera values starting with a colon due to PyYAML and Ruby YAML parsing differences - https://phabricator.wikimedia.org/T394691#10835509 (taavi) Open→Resolved
[15:54:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit libvirtd-tls.socket is in failed status on host cloudvirt1076. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1076 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:03:01] <wikibugs>	 (open) raymond-ndibe: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168
[16:03:50] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1075 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:08:08] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1072 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:12:24] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:14:53] <wikibugs>	 (approved) dcaro: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168 (owner: raymond-ndibe)
[16:16:42] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1076 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:18:53] <wikibugs>	 (open) raymond-ndibe: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169
[16:21:00] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:22:35] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1065']
[16:22:41] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169
[16:22:46] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169
[16:23:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1065']
[16:24:24] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1075 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:25:16] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:25:24] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1075 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:27:26] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit libvirtd-tls.socket is in failed status on host cloudvirt1076. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1076 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:29:34] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:30:36] <wikibugs>	 (update) dcaro: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168 (owner: raymond-ndibe)
[16:35:03] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Engineering, Data-Persistence, and 2 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10835772 (Marostegui) >>! In T394372#10835232, @fnegri wrote: > @Marostegui is there any page on wikitech w...
[16:35:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:56:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1068', 'cloudvirt1069', 'cloudvirt1070', 'cloudvirt1071']
[16:57:25] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1069 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:57:34] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1068 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:58:02] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1068', 'cloudvirt1069', 'cloudvirt1070', 'cloudvirt1071']
[16:58:16] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1071 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[17:02:32] <icinga-wm>	 PROBLEM - Host cloudvirt1072 is DOWN: PING CRITICAL - Packet loss = 100%
[17:05:49] <jinxer-wm>	 FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1072 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:06:00] <icinga-wm>	 RECOVERY - Host cloudvirt1072 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms
[17:06:10] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1072 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[17:10:48] <jinxer-wm>	 FIRING: PuppetFailure: Puppet has failed on cloudvirt1076:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:10:59] <wikibugs>	 cloud-services-team: PuppetFailure Puppet has failed on cloudvirt1076:9100 - https://phabricator.wikimedia.org/T394706 (phaultfinder) NEW
[17:11:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit networking.service is in failed status on host cloudvirt1072. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1072 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:15:48] <jinxer-wm>	 RESOLVED: PuppetFailure: Puppet has failed on cloudvirt1076:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:26:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1074']
[17:26:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1074']
[17:26:56] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitDown: The service unit libvirtd-tls.socket is in failed status on host cloudvirt1074. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1074 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:27:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1073.eqiad.wmnet}'
[17:30:34] <icinga-wm>	 PROBLEM - Host cloudvirt1073 is DOWN: PING CRITICAL - Packet loss = 100%
[17:31:18] <icinga-wm>	 RECOVERY - Host cloudvirt1073 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[17:31:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1073.eqiad.wmnet}'
[17:32:00] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[17:32:11] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1076.eqiad.wmnet}'
[17:32:49] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10836173 (Dzahn) ` 17:32 <+icinga-wm> PROBLEM - ensure kvm processes are running on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex ar...
[17:34:18] <icinga-wm>	 PROBLEM - Host cloudvirt1076 is DOWN: PING CRITICAL - Packet loss = 100%
[17:35:29] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1076.eqiad.wmnet}'
[17:35:30] <wikibugs>	 cloud-services-team: PuppetFailure Puppet has failed on cloudvirt1076:9100 - https://phabricator.wikimedia.org/T394706#10836193 (Dzahn) alerts for host being completely down now:  https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=cloudvirt1076
[17:35:49] <jinxer-wm>	 FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1073 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:36:00] <icinga-wm>	 RECOVERY - Host cloudvirt1076 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[17:36:44] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1076 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[17:40:28] <wmcs-alerts>	 FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[17:40:39] <wmcs-alerts>	 FIRING: QuarryDown: Quarry application is unreachable   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[17:40:49] <jinxer-wm>	 FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1073 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:41:56] <jinxer-wm>	 FIRING: [4x] SystemdUnitDown: The service unit networking.service is in failed status on host cloudvirt1073. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:47:40] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1072.eqiad.wmnet}'
[17:50:18] <icinga-wm>	 PROBLEM - Host cloudvirt1072 is DOWN: PING CRITICAL - Packet loss = 100%
[17:52:38] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1072.eqiad.wmnet}'
[17:52:46] <icinga-wm>	 RECOVERY - Host cloudvirt1072 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms
[17:53:10] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1072 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[17:55:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1075.eqiad.wmnet}'
[17:55:49] <jinxer-wm>	 FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1072 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:57:58] <icinga-wm>	 PROBLEM - Host cloudvirt1075 is DOWN: PING CRITICAL - Packet loss = 100%
[17:58:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1075.eqiad.wmnet}'
[17:59:12] <icinga-wm>	 RECOVERY - Host cloudvirt1075 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms
[17:59:52] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1075 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:05:33] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1073']
[18:05:49] <jinxer-wm>	 RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1073 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[18:06:00] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1073 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:06:03] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1073']
[18:06:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1075']
[18:06:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit networking.service is in failed status on host cloudvirt1075. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1075 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:07:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1075']
[18:07:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072', 'cloudvirt1076']
[18:07:52] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1075 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:08:10] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1072 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:08:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072', 'cloudvirt1076']
[18:08:44] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1076 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[18:24:52] <wikibugs>	 cloud-services-team: Rename cloudcontrol200[789]-dev.codfw to cloudrabbit200[123]-dev.codfw - https://phabricator.wikimedia.org/T392539#10836545 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1072.eqiad.wmnet with OS bookworm
[18:33:09] <wmcs-alerts>	 RESOLVED: QuarryDown: Quarry application is unreachable   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[18:37:58] <wmcs-alerts>	 RESOLVED: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[18:40:39] <wikibugs>	 cloud-services-team, decommission-hardware: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727 (Andrew) NEW
[18:43:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1031.eqiad.wmnet' (T394727)
[18:44:05] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[18:44:29] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1031.eqiad.wmnet' (T394727)
[18:44:37] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1032.eqiad.wmnet' (T394727)
[18:45:06] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1032.eqiad.wmnet' (T394727)
[18:45:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1033.eqiad.wmnet' (T394727)
[18:46:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1033.eqiad.wmnet' (T394727)
[18:46:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1034.eqiad.wmnet' (T394727)
[18:47:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1034.eqiad.wmnet' (T394727)
[18:47:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1035.eqiad.wmnet' (T394727)
[18:47:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1035.eqiad.wmnet' (T394727)
[18:48:52] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[18:48:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=99)
[18:50:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[18:50:31] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=99)
[18:50:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[18:51:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[18:51:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[18:51:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[18:51:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet' (T394727)
[18:51:50] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[18:52:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1036.eqiad.wmnet' (T394727)
[18:54:33] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1068']
[18:54:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1068']
[18:56:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1069']
[18:56:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1069']
[18:57:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1070']
[18:57:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1070']
[18:58:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[18:58:14] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[19:00:28] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169
[19:00:31] <wikibugs>	 cloud-services-team: Rename cloudcontrol200[789]-dev.codfw to cloudrabbit200[123]-dev.codfw - https://phabricator.wikimedia.org/T392539#10836690 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1072.eqiad.wmnet with OS bookworm executed with errors:...
[19:09:21] <wikibugs>	 (PS1) Amire80: Consistent spelling of "metadata" in a message [labs/tools/intuition] - https://gerrit.wikimedia.org/r/1147856
[19:09:30] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1073', 'cloudvirt1076']
[19:09:34] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1073', 'cloudvirt1076']
[19:15:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1073', 'cloudvirt1076']
[19:15:32] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1073', 'cloudvirt1076']
[19:17:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1075']
[19:17:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1075']
[19:28:03] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[19:28:09] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[19:28:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[19:38:13] <wikibugs>	 Tool-campwiz-nxt: Implement Reverse proxy and Failover server into campwiz nxt - https://phabricator.wikimedia.org/T394730 (Nokib_Sarkar) NEW
[19:38:50] <wikibugs>	 Tool-campwiz-nxt: Implement Reverse proxy and Failover server into campwiz nxt - https://phabricator.wikimedia.org/T394730#10836795 (Nokib_Sarkar)
[19:38:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[19:38:52] <wikibugs>	 Tool-campwiz-nxt: Migration of CampWiz NXT to toolforge - https://phabricator.wikimedia.org/T394515#10836796 (Nokib_Sarkar)
[19:38:57] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[19:39:10] <wikibugs>	 Tool-campwiz-nxt: Implement Reverse proxy and Failover server into campwiz nxt - https://phabricator.wikimedia.org/T394730#10836803 (Nokib_Sarkar) Open→In progress
[19:40:51] <wikibugs>	 Tool-campwiz-nxt: Migration of CampWiz NXT to toolforge - https://phabricator.wikimedia.org/T394515#10836813 (Nokib_Sarkar)
[19:43:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1038.eqiad.wmnet' (T394727)
[19:59:12] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1038.eqiad.wmnet' (T394727)
[19:59:21] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[20:31:51] <wikibugs>	 (update) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[20:37:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[20:37:51] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[20:58:31] <wmcs-alerts>	 FIRING: ToolsNfsAlmostFull: Toolforge NFS is 0.8646319687934529/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[21:05:16] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[21:05:22] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[21:07:17] <wikibugs>	 Toolforge (Toolforge iteration 20): [jobs-api] bug in runtime diff_with_running_job function - https://phabricator.wikimedia.org/T394734 (Raymond_Ndibe) NEW
[21:07:27] <wikibugs>	 Toolforge (Toolforge iteration 20): [jobs-api] bug in runtime diff_with_running_job function - https://phabricator.wikimedia.org/T394734#10837225 (Raymond_Ndibe) a:Raymond_Ndibe
[21:08:42] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[21:14:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1071']
[21:14:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1071']
[21:16:28] <wikibugs>	 Toolforge (Toolforge iteration 20): [jobs-api] bug in runtime diff_with_running_job function - https://phabricator.wikimedia.org/T394734#10837250 (Raymond_Ndibe) Open→In progress
[21:16:42] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1076 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:16:44] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:16:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1073', 'cloudvirt1074', 'cloudvirt1075', 'cloudvirt1076']
[21:17:14] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1075 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:17:44] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1073 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:17:50] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[21:17:55] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[21:18:14] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1075 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:18:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1073', 'cloudvirt1074', 'cloudvirt1075', 'cloudvirt1076']
[21:18:42] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1076 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:28:00] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[21:28:07] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[21:28:33] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[21:34:12] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[21:37:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[21:37:43] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[21:45:27] <wmcs-alerts>	 FIRING: ToolsbetaNFSDown: No toolsbeta nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsbetaNFSDown
[21:47:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1037.eqiad.wmnet' (T394727)
[21:47:51] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[21:50:27] <wmcs-alerts>	 RESOLVED: ToolsbetaNFSDown: No toolsbeta nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsbetaNFSDown
[21:56:01] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[21:56:06] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[21:59:36] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168
[22:06:02] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[22:06:09] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[22:15:54] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1057 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:16:02] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:16:10] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:16:54] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1057 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:17:49] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.jobs] fix default resource bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/168
[22:18:48] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[22:19:46] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:20:34] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:21:10] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1068 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:21:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:22:10] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:22:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:23:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:23:54] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:24:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:25:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:25:46] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1069 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:26:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,nova
[22:26:23] <wikibugs>	 (update) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[22:26:40] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for service: project,nova
[22:27:10] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:27:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:28:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:29:10] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[22:29:16] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,nova
[22:30:45] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:30:46] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] testing diff bug fix [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[22:30:53] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:31:27] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1070 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:31:33] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:31:45] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1070 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:34:33] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1071 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:34:45] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1070 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:34:45] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1069 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:34:53] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1068 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:35:01] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:35:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,nova
[22:35:45] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1070 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:36:01] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:36:49] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[22:37:15] <wikibugs>	 (update) raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] (fix_default_resource_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734)
[22:39:33] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:39:45] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:39:53] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:42:53] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1068 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:43:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,nova
[22:43:09] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1068 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:43:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.restart_openstack (exit_code=97) on deployment eqiad1 for service: project,nova
[22:44:33] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1071 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:44:53] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1071 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:45:27] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1070 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:45:45] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1070 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:46:15] <wikibugs>	 (update) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[22:46:45] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1069 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:47:01] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:47:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[22:47:53] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[22:58:01] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1039.eqiad.wmnet' (T394727)
[22:58:08] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[23:26:11] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:36:22] <wikibugs>	 (update) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6
[23:52:29] <wikibugs>	 (update) bd808: Convert project to golang [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/6