[00:38:22] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10404985
[00:43:14] FIRING: KernelPanic: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelPanic
[00:43:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[00:43:19] FIRING: KernelTaint: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelTaint
[00:43:24] FIRING: KernelWarning: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelWarning
[00:43:25] cloud-services-team: KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220 (phaultfinder) NEW
[00:43:26] cloud-services-team: KernelPanic Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382221 (phaultfinder) NEW
[00:43:27] cloud-services-team: ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222 (phaultfinder) NEW
[00:43:30] FIRING: ProbeDown: Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:48:15] RESOLVED: ProbeDown: Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:04:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:29:27] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:43:14] FIRING: KernelPanic: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelPanic
[04:43:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[04:43:19] FIRING: KernelTaint: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelTaint
[04:43:24] FIRING: KernelWarning: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelWarning
[05:28:46] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10405074
[07:09:27] Tool-wikiqanda, Future-Audiences: Add testing capabilities to Slack bot - https://phabricator.wikimedia.org/T379029#10405098 (DLin-WMF) Open→Resolved a:DLin-WMF
[08:43:14] FIRING: KernelPanic: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelPanic
[08:43:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[08:43:19] FIRING: KernelTaint: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelTaint
[08:43:24] FIRING: KernelWarning: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelWarning
[08:58:27] Tool-wikiqanda, Future-Audiences: Data collection for external release - https://phabricator.wikimedia.org/T380780#10405233 (DLin-WMF)
[08:59:46] Tool-wikiqanda, Future-Audiences: Data collection for external release - https://phabricator.wikimedia.org/T380780#10405236 (DLin-WMF)
[11:50:13] cloud-services-team, Toolforge, Epic: [Epic] Toolforge UI: Discovery - https://phabricator.wikimedia.org/T375914#10405786 (Sarai-WMF)
[12:20:14] VPS-Projects, fundraising-tech-ops, Puppet (Puppet 7.0): Update puppet civicrm-prototype puppetmaster - https://phabricator.wikimedia.org/T361595#10405852 (MoritzMuehlenhoff) What's the status, is the old puppet 5 puppet master already out of service?
[12:21:16] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/26
[12:22:36] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1104618 (owner: L10n-bot)
[12:34:19] (PS1) Btullis: cephosd: move the auth keydata into profile default [labs/private] - https://gerrit.wikimedia.org/r/1104620 (https://phabricator.wikimedia.org/T378735)
[12:35:00] (CR) Btullis: [V:+2 C:+2] cephosd: move the auth keydata into profile default [labs/private] - https://gerrit.wikimedia.org/r/1104620 (https://phabricator.wikimedia.org/T378735) (owner: Btullis)
[12:43:14] FIRING: KernelPanic: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelPanic
[12:43:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[12:43:19] FIRING: KernelTaint: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelTaint
[12:43:24] FIRING: KernelWarning: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelWarning
[12:52:45] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10405918
[15:19:44] cloud-services-team, Toolforge, Patch-For-Review, Puppet: Too many puppet facts on toolforge k8s workers - https://phabricator.wikimedia.org/T381293#10406389 (Andrew) Open→Resolved This warning is no longer displayed, and having lots of facts doesn't seem to actually break anything.
[15:26:50] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1104618 (owner: L10n-bot)
[15:43:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-legacy-redirector-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:49:30] cloud-services-team, DC-Ops, ops-codfw, SRE: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10406510 (Jhancock.wm) Hey Andrew, let me know when you are free this week.
[16:02:28] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge webservices and bots - https://phabricator.wikimedia.org/T127367#10406539 (rook) For anyone curious ` for namespace in $(kubectl get ns | tail -n +2 | awk '{print $1}') ; d...
[16:06:43] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10406554 (fnegri) @Andrew found the error in the logs: ` Dec 16 16:05:03 proxy-03 uwsgi-invisible-unicorn[4026189]: 2024-12-16 16:05:03.950 4026189 ERRO...
[16:07:38] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge webservices and bots - https://phabricator.wikimedia.org/T127367#10406555 (taavi) >>! In T127367#10406539, @rook wrote: > For anyone curious > ` > for namespace in $(kubect...
[16:17:06] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10406596 (bd808) Resolved→Open a:bd808→None Reopening. I imagine the first thing to triple check is the configuration I set in T381508#10381726.
[16:17:44] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10406599 (fnegri) Open→Resolved a:fnegri The error was in the zone id in the puppet prefix, it was incorrectly set to the same value of `cata...
[16:29:08] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10406639
[16:43:11] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10406679 (Andrew) a:fnegri→Andrew *now working correctly :D
[16:43:14] FIRING: KernelPanic: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelPanic
[16:43:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[16:43:14] FIRING: KernelTaint: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelTaint
[16:43:19] FIRING: KernelWarning: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelWarning
[16:44:51] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge webservices and bots - https://phabricator.wikimedia.org/T127367#10406694 (rook) Fair enough, do you have any estimate on how much those logs would account for in a day?
[16:50:45] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10406729 (fnegri) > *now working correctly :D Ha, yes, thanks for finding the typo! :D
[17:06:07] cloud-services-team, Cloud-VPS: 'backy2 cleanup' fails on cloudbackup1004 - https://phabricator.wikimedia.org/T381548#10406781 (Andrew)
[17:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:42:21] Tool-wikiqanda, Future-Audiences: Identify errors/issues from Slack release that may apply to Discord - https://phabricator.wikimedia.org/T381702#10406907 (derenrich) a:derenrich
[18:05:41] cloud-services-team: KernelPanic Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382221#10407030 (fnegri) →Duplicate dup:T382220
[18:05:42] cloud-services-team: KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10407032 (fnegri)
[18:15:11] cloud-services-team: KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10407068 (fnegri) A kernel error was logged at 00:40 UTC last night: https://phabricator.wikimedia.org/P71714 It did not cause a reboot, but a bunch of these were logged afterwards, betwee...
[18:20:53] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10407087 (fnegri) p:Triage→High
[18:30:06] cloud-services-team: ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407122 (fnegri)
[18:30:07] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10407123 (fnegri)
[18:30:16] cloud-services-team: ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407126 (fnegri) p:Triage→High
[18:33:44] cloud-services-team: ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407136 (fnegri) This was likely caused by {T382220} and fired a few times last night.
[18:33:53] cloud-services-team (FY2024/2025-Q1-Q2): ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407138 (fnegri)
[18:36:50] cloud-services-team (FY2024/2025-Q1-Q2): ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407145 (fnegri) The average success rate for that probe seems to be deteriorating: {F58024279}
[18:38:06] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10407150 (Andrew) This is no longer happening, but is concerning! I'm mentally filing it in the same box as T374830 because it's one more reason I don't totally trust our...
[18:48:54] cloud-services-team (FY2024/2025-Q1-Q2): ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407179 (fnegri) Same graph over 9 months, the success rate is still decent at around 99%, but the...
[19:23:06] cloud-services-team: SystemdUnitDown The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T381545#10407239 (Andrew) Open→Resolved this seems to be working now
[20:02:30] (update) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/26 (owner: l10n-bot)
[20:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:41:17] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/26 (owner: l10n-bot)
[20:41:24] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/26 (owner: l10n-bot)
[21:19:19] cloud-services-team (FY2024/2025-Q1-Q2): ProbeDown Service wan.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_wan_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T382222#10407511 (cmooney) I don't think the failure rate here is significant enough to warrant any concern...
[21:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks