[01:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:12:38] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[03:17:56] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[03:18:26] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[03:23:39] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[03:38:36] (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[03:40:38] (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[03:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:11:06] (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[04:50:34] (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[05:10:15] (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[05:19:10] (update) raymond-ndibe: Draft: [maintain-kubeusers] kyverno do not validate DELETE operations [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/62 (https://phabricator.wikimedia.org/T375157)
[09:01:09] PROBLEM - Host cloudvirt1063 is DOWN: PING CRITICAL - Packet loss = 100%
[09:04:56] VPS-project-Codesearch: Codesearch should index mw-node-qunit on Github - https://phabricator.wikimedia.org/T375079#10165561 (Ebrahim) Open→Resolved a:Ebrahim Now MwUri reaches to a result at least so this is fixed, thanks to @Ladsgroup for letting we know where this should be fixed from. {F...
[09:06:29] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-66 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:06:47] FIRING: NodeDown: Cloudvirt node cloudvirt1063 is down. #page - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[09:06:55] cloud-services-team: NodeDown - https://phabricator.wikimedia.org/T375223#10165565 (phaultfinder)
[09:12:23] FIRING: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-66 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[09:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:41:46] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1063.eqiad.wmnet'
[09:42:54] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1063.eqiad.wmnet'
[09:56:26] cloud-services-team, Cloud-VPS: NodeDown - https://phabricator.wikimedia.org/T375223#10165604 (fnegri) Open→In progress p:Triage→High a:fnegri The server stopped responding today at 9:00 UTC (according to Grafana). The same server failed a few months ago: {T368093} I'm setting it to...
[09:56:34] cloud-services-team, Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10165611 (fnegri)
[09:56:50] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10165612 (fnegri)
[10:02:46] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10165616 (fnegri) ` root@cloudcontrol1005:~# openstack server list --host cloudvirt1063 --all-projects +------------------------------+---------------------------+--------+--------------...
[10:19:37] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10165618 (fnegri) ` root@cloudcontrol1005:~# nova --os-username novaadmin --os-project-name admin --os-auth-url "https://openstack.eqiad1.wikimediacloud.org:25357/v3" --os-password XXXXX...
[10:21:29] RESOLVED: InstanceDown: Project tools instance tools-k8s-worker-nfs-66 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:32:23] RESOLVED: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-66 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[10:37:21] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10165619 (fnegri) `nova host-evacuate cloudvirt1063` did move all the above VMs to other cloudvirts, but they are now in `status=SHUTOFF`. I restarted them manually with `openstack serv...
[10:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:01:47] FIRING: NodeDownForLong: The node cloudvirt1063 has been unreachable for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1063 - https://alerts.wikimedia.org/?q=alertname%3DNodeDownForLong
[11:01:58] cloud-services-team: NodeDownForLong Node cloudvirt1063 has been down for long. - https://phabricator.wikimedia.org/T375323 (phaultfinder) NEW
[11:42:10] cloud-services-team: NodeDownForLong Node cloudvirt1063 has been down for long. - https://phabricator.wikimedia.org/T375323#10165631 (RhinosF1) →Duplicate dup:T375223
[11:43:53] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10165634 (RhinosF1)
[13:07:02] cloud-services-team, Toolforge, Pywikibot: unexpected root-owned files in /data/project/pywikibot/public_html - https://phabricator.wikimedia.org/T375279#10165653 (valhallasw) Some more thoughts: - Files were created on Tue 17 Sept 2024 at 15:30 (latest) and Wed 4 Sept 03:00 (.old.20240907 folder)....
[13:41:06] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10165664 (dcaro) Not yet, I'm still draining the C8 rack, early next week I'll have something
[14:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:33:20] Toolforge: Pywikibot tasks on Toolforge fail - https://phabricator.wikimedia.org/T375325 (Ashot1997) NEW
[14:35:52] Toolforge: Pywikibot tasks on Toolforge fail - https://phabricator.wikimedia.org/T375325#10165688 (Ashot1997)
[15:01:37] Toolforge: Pywikibot tasks on Toolforge fail - https://phabricator.wikimedia.org/T375325#10165691 (valhallasw) This may be related to T375279 somehow: it started around the same moment (17 Sept at 15:30 UTC) and also seems to have an odd issue with file ownership. Specifically, the out file for this job (and...
[15:31:15] Toolforge: Pywikibot tasks on Toolforge fail - https://phabricator.wikimedia.org/T375325#10165706 (Ashot1997) @valhallasw [[ https://hy.wikipedia.org/w/index.php?title=Reverso&diff=prev&oldid=10113541 | it worked ]]. Thanks a lot.
[16:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:24:25] cloud-services-team, Toolforge, Pywikibot: unexpected root-owned files in /data/project/pywikibot/public_html - https://phabricator.wikimedia.org/T375279#10165715 (JJMC89)
[16:24:38] Toolforge: Pywikibot tasks on Toolforge fail - https://phabricator.wikimedia.org/T375325#10165713 (JJMC89) →Duplicate dup:T375279
[17:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks