[00:00:29] Cloud-Services, Lingua-Libre: Migrate from WMFR-OVH server to WMF Toolforge or WMF Cloud VPS ? - https://phabricator.wikimedia.org/T385064#10506917 (Yug)
[00:02:25] cloud-services-team, Cloud-VPS (Project-requests), Toolforge, Lingua-Libre: Migrate from WMFR-OVH server to WMF Toolforge or WMF Cloud VPS ? - https://phabricator.wikimedia.org/T385064#10506920 (Yug)
[00:05:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[00:06:26] cloud-services-team, Cloud-VPS (Project-requests), Toolforge, Lingua-Libre: Migrate from WMFR-OVH server to WMF Toolforge or WMF Cloud VPS ? - https://phabricator.wikimedia.org/T385064#10506943 (Yug) Hello @Herald . I look for guidance for now, Toolforge or CloudVPS, those both teams can help....
[00:14:39] cloud-services-team, Cloud-VPS (Project-requests), Toolforge, Lingua-Libre: Migrate from WMFR-OVH server to WMF Toolforge or WMF Cloud VPS ? - https://phabricator.wikimedia.org/T385064#10506961 (Yug)
[00:17:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1067.eqiad.wmnet}' (T384946)
[00:21:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:55:49] cloud-services-team, Toolforge, Lingua-Libre: Migrate from WMFR-OVH server to WMF Toolforge or WMF Cloud VPS ? - https://phabricator.wikimedia.org/T385064#10507025 (bd808) This isn't a #cloud-vps-project-requests formatted ticket, so I'm removing that tag.
[01:01:25] cloud-services-team, Toolforge, Lingua-Libre: Migrate from WMFR-OVH server to WMF Toolforge or WMF Cloud VPS ? - https://phabricator.wikimedia.org/T385064#10507036 (bd808) >>! In T385064#10506859, @Yug wrote: >> Is docker available on toolforge ? > > No, **Docker is not available on Wikimedia Toolf...
[01:01:41] RESOLVED: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:23:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:27:07] FIRING: KernelErrors: Server cloudbackup1002-dev logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[01:28:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:50:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:51:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:08:30] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[02:11:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:17:42] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'D{cloudvirt1067.eqiad.wmnet}' (T384946)
[02:19:26] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:47:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1067.eqiad.wmnet}' (T384946)
[03:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:20:46] ToolforgeBundle: Upgrade ToolforgeBundle to Symfony 7 - https://phabricator.wikimedia.org/T361554#10507181 (Samwilson) Open→Resolved a:Samwilson I think this is all done now.
[03:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:32:40] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[04:48:03] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'D{cloudvirt1067.eqiad.wmnet}' (T384946)
[04:57:10] cloud-services-team, Toolforge, Community-Tech, WS Export: Add 'Content-Length' in ws-export HTTP Response - https://phabricator.wikimedia.org/T384803#10507232 (Samwilson) Yes, I was thinking something the same. I thought Content-Length would only be removed if we were sending e.g. `Transfer-Enco...
[05:27:07] FIRING: KernelErrors: Server cloudbackup1002-dev logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[07:21:31] FIRING: ToolsNFSDown: No tools nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNFSDown
[07:26:31] RESOLVED: ToolsNFSDown: No tools nfs services running found - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNFSDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNFSDown
[07:59:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:04:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:09:33] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[08:27:00] Tool-refill: Refill tool stuck "waiting for an available worker" - https://phabricator.wikimedia.org/T385100#10507360 (Danners430) Appears to have been resolved.
[08:32:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:39:58] Tool-refill: Refill tool stuck "waiting for an available worker" - https://phabricator.wikimedia.org/T385100#10507411 (TheresNoTime) Open→Resolved Good to hear :-)
[09:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:27:07] FIRING: KernelErrors: Server cloudbackup1002-dev logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[09:51:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:56:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:05:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:41:17] cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T385097#10507734 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:23] cloud-services-team: KernelErrors Server cloudservices1005 logged kernel errors - https://phabricator.wikimedia.org/T385095#10507738 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:29] cloud-services-team: KernelErrors Server cloudservices1006 logged kernel errors - https://phabricator.wikimedia.org/T385094#10507742 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:34] cloud-services-team: KernelErrors Server cloudvirtlocal1001 logged kernel errors - https://phabricator.wikimedia.org/T385091#10507746 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:40] cloud-services-team: KernelErrors Server cloudvirtlocal1002 logged kernel errors - https://phabricator.wikimedia.org/T385088#10507750 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:46] cloud-services-team: KernelErrors Server cloudvirtlocal1003 logged kernel errors - https://phabricator.wikimedia.org/T385087#10507754 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:52] cloud-services-team: KernelErrors Server cloudrabbit1002 logged kernel errors - https://phabricator.wikimedia.org/T385085#10507758 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:41:58] cloud-services-team: KernelErrors Server cloudrabbit1001 logged kernel errors - https://phabricator.wikimedia.org/T385083#10507762 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:42:04] cloud-services-team: KernelErrors Server cloudcontrol1007 logged kernel errors - https://phabricator.wikimedia.org/T385079#10507766 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:42:10] cloud-services-team: KernelErrors Server cloudcontrol1006 logged kernel errors - https://phabricator.wikimedia.org/T385075#10507770 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:42:16] cloud-services-team: KernelErrors Server cloudcontrol1005 logged kernel errors - https://phabricator.wikimedia.org/T385074#10507774 (fnegri) Bulk-resolving a set of "KernelErrors" alerts that were triggered by server reboots (T384946).
[10:43:35] cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T385097#10507776 (fnegri) Open→Resolved a:fnegri
[10:43:36] cloud-services-team: KernelErrors Server cloudservices1005 logged kernel errors - https://phabricator.wikimedia.org/T385095#10507778 (fnegri) Open→Resolved a:fnegri
[10:43:37] cloud-services-team: KernelErrors Server cloudservices1006 logged kernel errors - https://phabricator.wikimedia.org/T385094#10507780 (fnegri) Open→Resolved a:fnegri
[10:43:39] cloud-services-team: KernelErrors Server cloudvirtlocal1001 logged kernel errors - https://phabricator.wikimedia.org/T385091#10507782 (fnegri) Open→Resolved a:fnegri
[10:43:40] cloud-services-team: KernelErrors Server cloudvirtlocal1002 logged kernel errors - https://phabricator.wikimedia.org/T385088#10507784 (fnegri) Open→Resolved a:fnegri
[10:43:41] cloud-services-team: KernelErrors Server cloudrabbit1002 logged kernel errors - https://phabricator.wikimedia.org/T385085#10507788 (fnegri) Open→Resolved a:fnegri
[10:43:44] cloud-services-team: KernelErrors Server cloudvirtlocal1003 logged kernel errors - https://phabricator.wikimedia.org/T385087#10507786 (fnegri) Open→Resolved a:fnegri
[10:43:48] cloud-services-team: KernelErrors Server cloudrabbit1001 logged kernel errors - https://phabricator.wikimedia.org/T385083#10507790 (fnegri) Open→Resolved a:fnegri
[10:43:52] cloud-services-team: KernelErrors Server cloudcontrol1006 logged kernel errors - https://phabricator.wikimedia.org/T385075#10507794 (fnegri) Open→Resolved a:fnegri
[10:43:56] cloud-services-team: KernelErrors Server cloudcontrol1007 logged kernel errors - https://phabricator.wikimedia.org/T385079#10507792 (fnegri) Open→Resolved a:fnegri
[10:44:00] cloud-services-team: KernelErrors Server cloudcontrol1005 logged kernel errors - https://phabricator.wikimedia.org/T385074#10507796 (fnegri) Open→Resolved a:fnegri
[10:44:58] cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T385123#10507799 (fnegri) Open→Resolved a:fnegri This is not firing anymore, it was probably caused by the cloudcontrol reboots in T384946.
[11:18:44] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17): [lima-kilo] some containers are not restarting when restarting the VM - https://phabricator.wikimedia.org/T385082#10507944 (fnegri) Sometimes haproxy is able to connect to //some// nodes but not all: ` [WARNING] 029/110904 (8)...
[11:27:39] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17): [lima-kilo] some containers are not restarting when restarting the VM - https://phabricator.wikimedia.org/T385082#10507994 (aborrero) I was trying to reproduce this, but find it impossible to create deployment using the default...
[11:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:53:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:27:25] PROBLEM - Host cloudvirt1032 is DOWN: PING CRITICAL - Packet loss = 100%
[12:30:51] RECOVERY - Host cloudvirt1032 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[12:30:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1032.eqiad.wmnet}' (T384946)
[12:32:07] FIRING: [2x] KernelErrors: Server cloudvirt1031 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[12:32:18] cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T385165 (phaultfinder) NEW
[12:33:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1033.eqiad.wmnet}' (T384946)
[12:37:33] (update) raymond-ndibe: Draft: [jobs-api] use pydantic for core job model [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[12:37:37] (update) raymond-ndibe: Draft: [jobs-api] use pydantic for core job model [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[12:37:52] (update) raymond-ndibe: Draft: [jobs-api] use pydantic for core models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T359804)
[12:38:25] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1033.eqiad.wmnet}' (T384946)
[12:39:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1034.eqiad.wmnet}' (T384946)
[12:45:08] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1034.eqiad.wmnet}' (T384946)
[12:46:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1035.eqiad.wmnet}' (T384946)
[12:47:07] RESOLVED: KernelErrors: Server cloudcephosd1022 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1022 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[12:47:07] FIRING: KernelErrors: Server cloudvirt1034 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudvirt1034 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[12:47:17] cloud-services-team: KernelErrors Server cloudvirt1034 logged kernel errors - https://phabricator.wikimedia.org/T385166 (phaultfinder) NEW
[12:48:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-45 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:51:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1035.eqiad.wmnet}' (T384946)
[12:51:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1036.eqiad.wmnet}' (T384946)
[12:54:31] FIRING: [2x] KernelErrors: Server cloudvirt1034 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[12:54:36] cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T385165#10508179 (phaultfinder)
[13:34:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[13:35:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[13:35:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[13:36:06] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[13:36:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[13:37:05] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[13:37:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[13:37:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[13:38:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[13:38:40] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[14:25:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[14:26:56] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10508481 (gmodena) >>! In T376267#10501910, @Reedy wrote: >>>! In T376267#10501808, @gmodena wrote: >> |**Wikitech account/LDAP:**| gmodena| >> |**SUL account**| GModena (WMF)| >> |**Accoun...
[14:28:53] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 17): [lima-kilo] some containers are not restarting when restarting the VM - https://phabricator.wikimedia.org/T385082#10508506 (10fnegri) Related upstream issue: https://github.com/kubernetes-sigs/kind/issues/2045: > I don't recommend... [14:29:11] 06cloud-services-team, 10Cloud-VPS, 10Observability-Metrics, 10SRE Observability (FY2024/2025-Q3): Remove librenms -> graphite integration, replace with gnmi - https://phabricator.wikimedia.org/T372457#10508508 (10fgiunchedi) [14:30:43] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [14:32:22] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10508540 (10Ladsgroup) I renamed and force attached the user [14:33:05] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=97) on hosts matched by 'D{cloudvirt1036.eqiad.wmnet}' (T384946) [14:33:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1036.eqiad.wmnet}' (T384946) [14:45:06] 06cloud-services-team, 10Observability-Alerting, 10SRE Observability (FY2024/2025-Q3): cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10508674 (10fgiunchedi) [14:46:22] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=97) on hosts matched by 'D{cloudvirt1036.eqiad.wmnet}' (T384946) [15:00:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:05:28] FIRING: [2x] InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:13:15] (03open) 10fnegri: Create single-node 
clusters by default [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/223 (https://phabricator.wikimedia.org/T385082) [15:22:45] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services: [wikireplicas] Create views for new wiki kncwiki - https://phabricator.wikimedia.org/T385188#10509026 (10fnegri) p:05Triage→03Medium a:03fnegri [15:25:28] FIRING: [2x] InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:35:28] FIRING: [2x] InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:38:22] (03open) 10aminalhazwani: basic styling [toolforge-repos/stealthispage] - 10https://gitlab.wikimedia.org/toolforge-repos/stealthispage/-/merge_requests/1 [15:42:27] (03merge) 10dancy: basic styling [toolforge-repos/stealthispage] - 10https://gitlab.wikimedia.org/toolforge-repos/stealthispage/-/merge_requests/1 (owner: 10aminalhazwani) [15:51:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-static-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:56:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-proxy-5 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:01:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-proxy-5 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:06:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-prometheus-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:38:45] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services: [wikireplicas] Create views for new wiki 
kncwiki - https://phabricator.wikimedia.org/T385188#10509310 (10Pppery) [17:46:54] 06cloud-services-team, 10Toolforge, 10Lingua-Libre: Clean up Toolforge directory tools.lingua-libre ? - https://phabricator.wikimedia.org/T385124#10509742 (10Yug) [17:51:03] 06cloud-services-team, 10Toolforge, 10Lingua-Libre: Clean up Toolforge directory tools.lingua-libre ? - https://phabricator.wikimedia.org/T385124#10509761 (10Yug) >>! In T385124#10506783, @bd808 wrote: Hello Bob, I don't think our Django web app requires "Wiki Replica and ToolsDB database services" (which,... [17:55:28] RESOLVED: [2x] InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [17:56:22] PROBLEM - nova-compute proc minimum on cloudvirt1036 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [17:57:22] RECOVERY - nova-compute proc minimum on cloudvirt1036 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:05:23] 06cloud-services-team, 10Toolforge, 10Lingua-Libre: Clean up Toolforge directory tools.lingua-libre ? - https://phabricator.wikimedia.org/T385124#10509853 (10Yug) > What are Wikimedia Toolforge's Wiki Replica and ToolsDB database services ? Wikimedia Toolforge provides two main database services: ## **1.... [18:06:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-prometheus-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:11:06] 06cloud-services-team, 10Toolforge, 10Lingua-Libre: Clean up Toolforge directory tools.lingua-libre ? 
- https://phabricator.wikimedia.org/T385124#10509870 (10Yug) 05Open→03Resolved a:03Yug @bd808 hello, I think I understand better thank to you pointing the file purpose and to LLM's summary. I will... [18:15:46] 06cloud-services-team, 10Lingua-Libre: Clean up Toolforge directory tools.lingua-libre ? - https://phabricator.wikimedia.org/T385124#10509890 (10JJMC89) [18:16:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-prometheus-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:17:07] RESOLVED: [2x] KernelErrors: Server cloudcontrol1005 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors [18:17:20] PROBLEM - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 324 bytes in 60.007 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [18:17:50] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [18:19:18] RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 53.707 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [18:21:28] RESOLVED: [7x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-prometheus-1 on project toolsbeta - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:30:23] (open) bd808: Use $TOOL_DATA_DIR as root storage path if set [toolforge-repos/stealthispage] - https://gitlab.wikimedia.org/toolforge-repos/stealthispage/-/merge_requests/2 [18:34:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:39:28] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-acme-chief-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [18:57:50] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [18:58:31] (merge) dancy: Use $TOOL_DATA_DIR as root storage path if set [toolforge-repos/stealthispage] - https://gitlab.wikimedia.org/toolforge-repos/stealthispage/-/merge_requests/2 (owner: bd808) [19:02:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-test-k8s-control-12 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:19:29] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10510145 (VRiley-WMF) [19:40:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [19:41:01] (open) dancy: app.py: remove testing hacks [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/3 [19:41:05] (update) dancy: app.py: remove testing hacks [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/3 [19:41:18] (merge) dancy: 
app.py: remove testing hacks [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/3 [19:42:02] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10510197 (VRiley-WMF) [19:47:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-test-k8s-control-12 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [19:47:31] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [19:53:28] (open) dancy: app.py: Drop SERVER_URL, use Referer header [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/4 [19:53:30] (update) dancy: app.py: Drop SERVER_URL, use Referer header [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/4 [19:53:50] (merge) dancy: app.py: Drop SERVER_URL, use Referer header [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/4 [19:55:21] FIRING: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown [20:13:28] cloud-services-team, Lingua-Libre: Clean up Toolforge directory tools.lingua-libre ? 
- https://phabricator.wikimedia.org/T385124#10510271 (bd808) Resolved→Invalid [20:22:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:24:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-cumin-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:32:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:39:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-cumin-1 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [20:44:32] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [20:44:40] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [20:53:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [20:54:08] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [20:55:04] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/29 (owner: l10n-bot) [20:55:07] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. 
[toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/29 (owner: l10n-bot) [21:03:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [21:08:28] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [21:08:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-redis-4 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:13:28] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-redis-4 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:21:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1036.eqiad.wmnet}' (T384946) [21:23:28] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-redis-4 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:28:28] RESOLVED: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-redis-4 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:37:03] (PS1) Andrew Bogott: restart_openstack: restart cinder-api [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1115509 [21:39:43] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=97) on hosts matched by 'D{cloudvirt1036.eqiad.wmnet}' (T384946) [21:40:15] (open) dancy: app.py: Add a comment [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/5 [21:40:16] (update) dancy: app.py: Add a comment 
[toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/5 [21:40:16] (open) dancy: README.md: stealthispage -> sitesampler [toolforge-repos/sitesampler] (master-I91026c2134c5026c3622386d2bdc10bd5698e17e) - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/6 [21:40:16] (update) dancy: README.md: stealthispage -> sitesampler [toolforge-repos/sitesampler] (master-I91026c2134c5026c3622386d2bdc10bd5698e17e) - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/6 [21:40:17] (open) dancy: Pull in files from https://gitlab.wikimedia.org/gengh/sprinthackular-2025 [toolforge-repos/sitesampler] (master-I4bfe753ece35b7935560d69797c5d24be2f5736d) - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/7 [21:40:18] (update) dancy: Pull in files from https://gitlab.wikimedia.org/gengh/sprinthackular-2025 [toolforge-repos/sitesampler] (master-I4bfe753ece35b7935560d69797c5d24be2f5736d) - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/7 [21:40:22] (update) dancy: app.py: Add a comment [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/5 [21:40:26] (update) dancy: README.md: stealthispage -> sitesampler [toolforge-repos/sitesampler] (master-I91026c2134c5026c3622386d2bdc10bd5698e17e) - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/6 [21:40:30] (update) dancy: Pull in files from https://gitlab.wikimedia.org/gengh/sprinthackular-2025 [toolforge-repos/sitesampler] (master-I4bfe753ece35b7935560d69797c5d24be2f5736d) - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/7 [21:40:51] (merge) dancy: app.py: Add a comment [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/5 [21:40:56] (update) dancy: README.md: stealthispage -> sitesampler 
[toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/6 [21:41:35] (merge) dancy: README.md: stealthispage -> sitesampler [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/6 [21:41:36] (update) dancy: Pull in files from https://gitlab.wikimedia.org/gengh/sprinthackular-2025 [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/7 [21:42:14] (update) dancy: Pull in files from https://gitlab.wikimedia.org/gengh/sprinthackular-2025 [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/7 [21:48:49] (merge) dancy: Pull in files from https://gitlab.wikimedia.org/gengh/sprinthackular-2025 [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/7 [21:52:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-redis-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:56:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [21:57:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-74 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:01:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - 
https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [22:07:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-redis-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [22:11:54] (open) bd808: Enable Cross-Origin Resource Sharing (CORS) [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/8 [22:12:37] (update) bd808: Enable Cross-Origin Resource Sharing (CORS) [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/8 [22:13:33] (open) annet: Add "save" option [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/9 [22:16:10] (merge) dancy: Add "save" option [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/9 (owner: annet) [22:18:34] (merge) dancy: Enable Cross-Origin Resource Sharing (CORS) [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/8 (owner: bd808) [22:22:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-74 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:29:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - 
https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [22:34:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [22:36:57] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [22:58:41] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [23:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [23:27:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [23:30:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - 
https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [23:32:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [23:44:49] (open) bd808: Fixups for sideloading as a userscript [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/10 [23:50:55] (merge) bd808: Fixups for sideloading as a userscript [toolforge-repos/sitesampler] - https://gitlab.wikimedia.org/toolforge-repos/sitesampler/-/merge_requests/10