[04:06:56] FIRING: SystemdUnitDown: The service unit purge_vm_rbd_images.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [06:01:56] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [06:02:03] 06cloud-services-team: SystemdUnitDown The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://phabricator.wikimedia.org/T382770 (10phaultfinder) 03NEW [08:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:01:56] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [10:07:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [10:08:22] FIRING: [12x] HAProxyBackendUnavailable: HAProxy service wikireplica-db-analytics-s4 backend clouddb1019.eqiad.wmnet is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [10:11:26] 06cloud-services-team, 10Data-Services, 06DBA: pagelinks table broken on s1 sanitarium - https://phabricator.wikimedia.org/T382771 (10Marostegui) 03NEW [10:13:22] RESOLVED: [12x] HAProxyBackendUnavailable: HAProxy service wikireplica-db-analytics-s4 backend clouddb1019.eqiad.wmnet is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [10:16:36] 06cloud-services-team, 10Data-Services, 06DBA: pagelinks table broken on s1 sanitarium - https://phabricator.wikimedia.org/T382771#10423358 (10Marostegui) I am rebuilding the table and see if it helps - it will take a while as it is almost 200GB [10:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:54:44] 06cloud-services-team, 10Data-Services, 06DBA: pagelinks table broken on s1 sanitarium - https://phabricator.wikimedia.org/T382771#10423376 (10Marostegui) p:05Triage→03Medium a:03Marostegui [11:35:58] 10Tool-globalcontribution: Modify frontend compatile with backend - https://phabricator.wikimedia.org/T382656#10423379 (10Anujagrawal) I'm working on this @Athulvis [14:01:56] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [14:07:01] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [14:51:32] 06cloud-services-team, 10Data-Services, 06DBA: pagelinks table broken on s1 sanitarium - https://phabricator.wikimedia.org/T382771#10423469 (10Marostegui) This has been fixed and the host is catching up. Running the same on an-redacttedb1001 [15:55:22] 06cloud-services-team, 10Data-Services, 06DBA: pagelinks table broken on s1 sanitarium - https://phabricator.wikimedia.org/T382771#10423497 (10Marostegui) 05Open→03Resolved clouddb* hosts caught up [16:14:13] 06cloud-services-team, 10Data-Services, 06DBA: pagelinks table broken on s1 sanitarium - https://phabricator.wikimedia.org/T382771#10423506 (10Ladsgroup) Thanks! [18:01:57] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:07:01] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [18:18:12] 10Striker, 10Tool-gitlab-account-approval, 10Tool-phab-ban, 10Bitu, and 2 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10423540 (10Pppery) Anything left to do here? [18:44:56] 10Tool-spacemedia, 06Commons, 10MediaWiki-Action-API, 10MediaWiki-Uploading, 07Documentation: ApiUpload allows duplicates when ignorewarnings is set. - https://phabricator.wikimedia.org/T254060#10423626 (10Pppery) [22:01:57] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [22:07:00] RESOLVED: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse