[00:45:09] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-7 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[00:45:17] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[02:20:56] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[02:41:10] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[03:15:55] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[03:50:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:52:14] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[03:52:48] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate UPDATE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[03:53:05] <wikibugs>	 (update) raymond-ndibe: [maintain-kubeusers] kyverno do not validate UPDATE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[04:30:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:20:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:30:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:07:49] <wikibugs>	 (PS1) Elukey: requestctl: change comment for post_docroot.yaml [labs/private] - https://gerrit.wikimedia.org/r/1074849
[07:08:07] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] requestctl: change comment for post_docroot.yaml [labs/private] - https://gerrit.wikimedia.org/r/1074849 (owner: Elukey)
[08:11:27] <jinxer-wm>	 FIRING: ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:11:31] <wikibugs>	 cloud-services-team: ProbeDown  virt.cloudgw.eqiad1.wikimediacloud.org:0 failed when probed by icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4 from codfw. Availability is 50%. - https://phabricator.wikimedia.org/T375362 (phaultfinder) NEW
[08:15:06] <wikibugs>	 cloud-services-team: ProbeDown  virt.cloudgw.eqiad1.wikimediacloud.org:0 failed when probed by icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4 from codfw. Availability is 50%. - https://phabricator.wikimedia.org/T375362#10166579 (aborrero) Open→Resolved a:aborrero something happened with the...
[08:16:27] <jinxer-wm>	 RESOLVED: ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:42:15] <wikibugs>	 Toolforge: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366 (Wurgl) NEW
[08:53:34] <wikibugs>	 wikitech.wikimedia.org, Gerrit, LDAP: Rename account Zoranzoki21 to Kizule on Gerrit - https://phabricator.wikimedia.org/T260647#10166724 (Kizule) Declined→Open I'm reopening this task per https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/5NBCVPPOXB4O3KI7B4YJB...
[09:18:31] <wikibugs>	 Toolforge: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10166755 (aborrero)
[09:19:58] <wikibugs>	 Toolforge: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10166754 (aborrero) Kubernetes creates the replacement pod as soon as the first pod enters termination state, without waiting for the first pod to actually disappear....
[09:20:44] <wikibugs>	 Toolforge: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10166757 (aborrero) Open→In progress p:Triage→Low
[09:30:52] <wikibugs>	 Toolforge: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10166775 (aborrero) I think this is the patch I'm proposing:  ` diff --git a/tjf/runtimes/k8s/jobs.py b/tjf/runtimes/k8s/jobs.py index fd3e85c..22eff5d 100644 --- a/tj...
[09:32:53] <wikibugs>	 (open) aborrero: jobs: continuous: set strategy based on number of replicas [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/124 (https://phabricator.wikimedia.org/T375366)
[09:34:44] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10166785 (elukey) Hi @dcaro! I know that you have been battling with some issues on cloud nod...
[09:37:29] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10166793 (aborrero)
[09:43:25] <wikibugs>	 (approved) dcaro: toolforge_depoly_mr: set the latest MR as default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/191
[09:43:29] <wikibugs>	 (merge) dcaro: toolforge_depoly_mr: set the latest MR as default [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/191
[09:48:32] <icinga-wm>	 RECOVERY - Host cloudvirt1063 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms
[09:49:04] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1063 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:51:24] <wikibugs>	 cloud-services-team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Put cloudcephosd10[39-41] into service - https://phabricator.wikimedia.org/T372814#10166816 (dcaro) >>! In T372814#10165304, @Jclark-ctr wrote: > @Andrew  i see this ticket is in my name. is there something i need to do for this?...
[09:53:59] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10166820 (fnegri) I restarted the server from the mgmt interface, I could ssh to it and check the syslog at the time of the crash. It's not much helpful but it's similar to the log entry...
[09:54:29] <wikibugs>	 cloud-services-team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Put cloudcephosd10[39-41] into service - https://phabricator.wikimedia.org/T372814#10166817 (dcaro) a:Jclark-ctr→dcaro
[10:06:05] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10166885 (fnegri)
[10:50:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:00:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:21:44] <wikibugs>	 (merge) aborrero: secgroups: add optional default security group [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/50 (https://phabricator.wikimedia.org/T375111)
[12:22:00] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:23:35] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[12:24:14] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[12:26:30] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[12:27:10] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:28:41] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[12:28:52] <wikibugs>	 Tool-lexeme-forms: Lexeme-forms on Toolforge returns error - https://phabricator.wikimedia.org/T374344#10167261 (Fnielsen) Fine. I haven't seen the problem for a while.
[12:29:17] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:30:58] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[12:34:58] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:35:37] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[12:36:58] <wikibugs>	 (open) aborrero: secgroups: codfw1dev-r_default: fix protocol casing [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/52
[12:38:21] <wikibugs>	 (merge) aborrero: secgroups: codfw1dev-r_default: fix protocol casing [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/52
[12:38:28] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:39:07] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[12:41:55] <wikibugs>	 (update) aborrero: secgroups: enable delete_default_rules [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/51 (https://phabricator.wikimedia.org/T375111)
[12:47:17] <wikibugs>	 cloud-services-team, Cloud-VPS: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283#10167307 (aborrero) Open→In progress p:Triage→Medium
[12:57:36] <wikibugs>	 cloud-services-team, Cloud-VPS: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283#10167348 (aborrero) I like the idea of refactoring the repo to be per-project.  However, I'm not that sure about requiring a single tofu plan/apply per tenant. On the other hand, if it is the c...
[14:01:03] <wikibugs>	 (open) raymond-ndibe: [toolforge.kyverno] update kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/528 (https://phabricator.wikimedia.org/T359641)
[14:02:20] <wikibugs>	 (update) raymond-ndibe: [toolforge.kyverno] update kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/528 (https://phabricator.wikimedia.org/T359641)
[14:10:43] <wikibugs>	 (update) dcaro: [jobs-cli] remove _display_messages [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/62 (owner: raymond-ndibe)
[14:10:48] <wikibugs>	 (update) dcaro: [envvars-cli] remove display_messages [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/57 (owner: raymond-ndibe)
[14:10:51] <wikibugs>	 (update) dcaro: [builds-cli] remove _display_messages [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/69 (owner: raymond-ndibe)
[14:11:35] <wikibugs>	 Tool-Global-user-contributions, Special:GlobalContributions, Stewards-and-global-tools, Epic, and 2 others: [Epic] Implement global contributions feature - https://phabricator.wikimedia.org/T337089#10167579 (KColeman-WMF)
[14:33:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit wmf_auto_restart_virtlogd.service is in failed status on host cloudvirt1063. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:37:33] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_rack
[14:37:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:27:31] <wikibugs>	 (merge) aborrero: secgroups: enable delete_default_rules [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/51 (https://phabricator.wikimedia.org/T375111)
[15:27:40] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:36:42] <wikibugs>	 (PS1) David Caro: ceph.undrain_rack: undrain from different hosts in parallel [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075042
[15:36:43] <wikibugs>	 (PS1) David Caro: ceph.osd.undrain_rack: undrain osds from different hosts when able [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075043
[15:37:02] <wikibugs>	 (PS2) David Caro: ceph.undrain_rack: undrain from different hosts in parallel [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075042
[15:40:18] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[15:40:55] <wikibugs>	 (open) aborrero: Revert "secgroups: enable delete_default_rules" [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/53 (https://phabricator.wikimedia.org/T375111)
[15:41:41] <wikibugs>	 (merge) aborrero: Revert "secgroups: enable delete_default_rules" [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/53 (https://phabricator.wikimedia.org/T375111)
[15:42:07] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:43:00] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[15:43:08] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch
[15:43:28] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch
[15:47:50] <wikibugs>	 Toolforge (Toolforge iteration 14): [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#10168072 (Raymond_Ndibe) In progress→Resolved
[15:48:18] <wikibugs>	 (open) aborrero: secgroups: default: remove port ranges and fix protocol [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/54
[15:52:17] <wikibugs>	 (merge) aborrero: secgroups: default: remove port ranges and fix protocol [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/54
[15:52:30] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch
[15:52:54] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch
[15:53:04] <wikibugs>	 Striker: Concatenated URLs in toolinfo.json - https://phabricator.wikimedia.org/T345776#10168125 (TBurmeister) I think I just encountered this bug when looking at the record in Toolhub for https://toolhub.wikimedia.org/tools/toolforge-tool-watch and reading through https://phabricator.wikimedia.org/T341379#9...
[15:54:42] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:55:18] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[16:07:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:09:26] <wikibugs>	 (approved) dcaro: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46 (owner: raymond-ndibe)
[16:10:16] <wikibugs>	 (update) dcaro: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46 (owner: raymond-ndibe)
[16:10:44] <wikibugs>	 cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10168243 (Ladsgroup) >>! In T328289#10157496, @fnegri wrote: > @Ladsgroup I stumbled upon this old task, not sure if it's still relevant, if yes I need more guidance :)  It depends on what are the plans for...
[16:18:11] <wikibugs>	 cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10168271 (fnegri) > Are there still use cases for it after removal of ldap from wikitech?  From my understanding, labtestwiki should no longer be needed after we complete the removal of LDAP. /cc @bd808 who...
[16:23:14] <wikibugs>	 (approved) dcaro: [toolforge.kyverno] update kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/528 (https://phabricator.wikimedia.org/T359641) (owner: raymond-ndibe)
[16:23:15] <wikibugs>	 (update) dcaro: [toolforge.kyverno] update kubeVersion [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/528 (https://phabricator.wikimedia.org/T359641) (owner: raymond-ndibe)
[16:26:09] <wikibugs>	 cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10168330 (fnegri) > or just for testing LDAP.  To clarify, the only usage of labtestwikitech that I am aware of is to manage LDAP users in the [testing deployment for Cloud VPS](https://wikitech.wikimedia.or...
[16:28:25] <wikibugs>	 Tool-video-answer-tool, Future-Audiences: FA community call video demo - https://phabricator.wikimedia.org/T374878#10168335 (Maryana) Goal: have 3 videos to show (in order to demonstrate range of topics this tool could cover). Looking for well-performing DYKs well-suited to a young audience, with >3 imag...
[16:28:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The systemd unit wmf_auto_restart_virtlogd.service on node cloudvirt1063 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:29:02] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit wmf_auto_restart_virtlogd.service on node cloudvirt1063 has been down for long. - https://phabricator.wikimedia.org/T375403 (phaultfinder) NEW
[16:31:58] <wikibugs>	 (unapproved) dcaro: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46 (owner: raymond-ndibe)
[16:48:08] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Epic: Set up a bitu instance for codfw1dev - https://phabricator.wikimedia.org/T360795#10168408 (Ladsgroup) FWIW, getting this deployed simplifies a lot of database stack see {T328289} for example.
[16:50:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:58:43] <wikibugs>	 (approved) dcaro: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46 (owner: raymond-ndibe)
[17:00:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:11:28] <wikibugs>	 (Abandoned) David Caro: ceph.osd.undrain_rack: undrain osds from different hosts when able [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075043 (owner: David Caro)
[17:13:23] <wikibugs>	 (CR) David Caro: "@legoktm@debian.org is there anything I can help with to get this patch merged?" [labs/codesearch] - https://gerrit.wikimedia.org/r/1053538 (owner: David Caro)
[17:14:53] <wikibugs>	 Tool-video-answer-tool, Future-Audiences, Spike: Investigate different options for animation of images - https://phabricator.wikimedia.org/T374367#10168494 (Maryana) Make a dev endpoint to deploy these changes
[17:18:04] <wikibugs>	 Tool-video-answer-tool, Future-Audiences, Spike: Investigate different options for animation of images - https://phabricator.wikimedia.org/T374367#10168509 (Maryana) Ping @Maryana & Lucas on Slack when ready to get feedback
[17:28:51] <wikibugs>	 Tool-video-answer-tool, Future-Audiences, Spike: Investigate options for pulling more relevant images for video - https://phabricator.wikimedia.org/T374557#10168593 (Maryana) Open→Resolved
[17:32:39] <wikibugs>	 Tool-video-answer-tool, Future-Audiences: FA community call video demo - https://phabricator.wikimedia.org/T374878#10168639 (Maryana)
[17:32:45] <wikibugs>	 Tool-video-answer-tool, Future-Audiences: FA community call video demo - https://phabricator.wikimedia.org/T374878#10168637 (Maryana) a:Maryana→None
[17:32:50] <wikibugs>	 (PS1) Majavah: t5: Fix condition [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/1075057
[17:33:40] <wikibugs>	 Tool-video-answer-tool, Future-Audiences: Improvements to video server-side rendering - https://phabricator.wikimedia.org/T375408 (Maryana) NEW
[17:35:03] <wikibugs>	 (CR) Majavah: [C:+2] t5: Fix condition [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/1075057 (owner: Majavah)
[17:36:47] <wikibugs>	 (Merged) jenkins-bot: t5: Fix condition [labs/tools/majavah-bot] - https://gerrit.wikimedia.org/r/1075057 (owner: Majavah)
[18:20:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:30:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:38:50] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_rack (exit_code=99)
[19:38:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[22:00:34] <jinxer-wm>	 FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.948% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[22:06:25] <icinga-wm>	 RECOVERY - ensure kvm processes are running on cloudvirt1063 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:08:25] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1063 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[22:21:21] <wikibugs>	 Tool-Global-user-contributions, Special:GlobalContributions, Stewards-and-global-tools, Epic, and 2 others: [Epic] Implement global contributions feature - https://phabricator.wikimedia.org/T337089#10169562 (KColeman-WMF)
[22:23:43] <wikibugs>	 Toolforge (Quota-requests): Request increased quota for video-answer-tool-staging Toolforge tool - https://phabricator.wikimedia.org/T375446 (etz) NEW
[23:05:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:10:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:15:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:20:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess