[00:16:29] <wikibugs>	 Toolforge-standards-committee: Adoption request for image-metadata-viewer - https://phabricator.wikimedia.org/T338558#10043918 (mdaniels5757) Open→Resolved a:mdaniels5757 In that case, I'm not interested. Thanks for looking!
[00:16:29] <wmcs-alerts>	 FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:17:40] <wikibugs>	 Toolforge-standards-committee: Adoption request for image-metadata-viewer - https://phabricator.wikimedia.org/T338558#10043921 (Pppery) Resolved→Declined
[00:18:05] <wikibugs>	 (update) raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650)
[00:21:29] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:53:02] <wikibugs>	 (update) raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650)
[01:12:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 82.15% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:17:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 81.52% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[02:06:54] <wikibugs>	 cloud-services-team, Cloud-VPS, Patch-For-Review: Update designate sink plugins to work with caracal - https://phabricator.wikimedia.org/T371707#10043968 (Andrew) The remaining case is in wmf_sink, the proxy cleanup code. That code relies on being able to look up the dns records (and, by extension, t...
[03:15:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:26:03] <wikibugs>	 Toolforge-standards-committee (Maintainer needed): New maintainer needed for Phetools OCR for Wikisource - https://phabricator.wikimedia.org/T239353#10044000 (Soda) Open→Declined Boldly closing this as declined, phetools OCR hasn't been up for a while (since 5 years?) and with the Grid engine shu...
[03:49:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:59:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:49:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:59:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:10:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:06:39] <wmcs-alerts>	 FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:11:39] <wmcs-alerts>	 RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:16:52] <wikibugs>	 (update) raymond-ndibe: Draft: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[06:18:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[06:19:14] <wikibugs>	 (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[06:29:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:29:38] <wikibugs>	 cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T371867 (phaultfinder) NEW
[06:34:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:54:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:59:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:07:45] <wikibugs>	 cloud-services-team, Cloud-VPS, collaboration-services, Patch-For-Review: puppet problems mounting cinder volumes (and suggested fixes) - https://phabricator.wikimedia.org/T371573#10044129 (LSobanski) p:Triage→Medium
[07:09:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:19:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:35:44] <wikibugs>	 cloud-services-team, Data-Persistence, observability, Grafana: Grafana MySQL charts can be inconsistent when zooming out - https://phabricator.wikimedia.org/T371485#10044163 (fgiunchedi) If going with `rate[$__rate_interval]` is acceptable I'd recommend going for that as the simplest solution, se...
[08:10:49] <jinxer-wm>	 FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1045 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[08:17:28] <wmcs-alerts>	 FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-52 on project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:24:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:29:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:30:49] <jinxer-wm>	 RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1045 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[08:32:28] <wmcs-alerts>	 RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-52 on project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:35:57] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:45:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:54:09] <jinxer-wm>	 FIRING: CephSlowOps: Ceph cluster in eqiad has 37 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[08:54:17] <wikibugs>	 cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T370752#10044353 (phaultfinder)
[08:55:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:02:28] <wmcs-alerts>	 FIRING: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance toolsbeta-puppetserver-1 in project toolsbeta   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:02:58] <wmcs-alerts>	 RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:03:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:05:26] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:09:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:19:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:19:33] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:19:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:19:41] <wm-bot2>	 !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[09:19:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:19:57] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:19:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:20:02] <wm-bot2>	 !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[09:20:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:20:08] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:20:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:20:34] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:20:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:28:17] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:28:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:29:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:31:54] <jinxer-wm>	 RESOLVED: CephSlowOps: Ceph cluster in eqiad has 20 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:33:58] <wmcs-alerts>	 RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance toolsbeta-puppetserver-1 in project toolsbeta   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:44:28] <wmcs-alerts>	 FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[09:45:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:50:07] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:50:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:50:29] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:50:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:50:38] <wm-bot2>	 !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1)
[09:50:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:59:28] <wmcs-alerts>	 RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[09:59:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[10:05:58] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:13:00] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[11:18:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:33:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:57:54] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10044740 (SLyngshede-WMF) @Htriedman I'm responsible for offboarding you from any systems you no longer requ...
[12:00:48] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10044744 (SLyngshede-WMF) @Reedy Given that @Htriedman won't be needing access to security security and priv...
[12:16:35] <wikibugs>	 cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T370752#10044786 (dcaro) Last round was caused by {T371879}
[12:18:40] <wikibugs>	 cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T371867#10044793 (dcaro) Open→Resolved a:dcaro Caused by {T371879}
[12:34:00] <wikibugs>	 (update) dcaro: show backend status [toolforge-repos/sample-complex-app-frontend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370324)
[12:41:36] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2): [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10044851 (dcaro)
[12:42:02] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2): [ceph,network] Intermittent network packets lost - https://phabricator.wikimedia.org/T371869#10044853 (dcaro)
[12:42:39] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph,network] Intermittent network packets lost - https://phabricator.wikimedia.org/T371869#10044854 (dcaro)
[12:43:10] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10044855 (dcaro)
[12:54:23] <wikibugs>	 (PS5) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[12:54:31] <wm-bot2>	 !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:54:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:55:34] <wm-bot2>	 !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[12:55:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:57:16] <wikibugs>	 (PS6) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:00:32] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[13:00:47] <wm-bot2>	 !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:00:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:01:41] <wm-bot2>	 !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:01:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:03:35] <wm-bot2>	 !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:03:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:04:06] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: [toolsdb] Migrate mixnmatch db to Trove - https://phabricator.wikimedia.org/T350862#10044926 (Magnus) Hi @fnegri if this will take a while, could you throw me a few more DB connections for mix-n-match on ToolsDB in the meantime? I have a l...
[13:04:23] <wm-bot2>	 !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:04:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:10:44] <wikibugs>	 (PS5) David Caro: WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920
[13:10:44] <wikibugs>	 (PS3) David Caro: openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925
[13:10:44] <wikibugs>	 (PS5) David Caro: toolforge.component.deploy: remove the k8s prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059890
[13:10:45] <wikibugs>	 (PS5) David Caro: toolforge.component.deploy: use bump_<component> as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905
[13:10:46] <wikibugs>	 (PS8) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:10:50] <wikibugs>	 (PS5) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:10:54] <wikibugs>	 (PS7) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:13:43] <wikibugs>	 (CR) David Caro: toolforge.component.deploy: use bump_<component> as default branch (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905 (owner: David Caro)
[13:14:22] <wikibugs>	 (CR) CI reject: [V:-1] openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925 (owner: David Caro)
[13:14:26] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.component.deploy: use bump_<component> as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905 (owner: David Caro)
[13:14:32] <wikibugs>	 (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[13:14:33] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[13:14:42] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907 (owner: David Caro)
[13:18:34] <wikibugs>	 (PS6) David Caro: toolforge.component.deploy: use bump_<component> as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905
[13:18:34] <wikibugs>	 (PS9) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:18:34] <wikibugs>	 (PS6) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:18:35] <wikibugs>	 (PS8) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:18:35] <wikibugs>	 (PS1) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060102
[13:20:14] <wikibugs>	 (PS8) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906
[13:20:14] <wikibugs>	 (PS10) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:20:14] <wikibugs>	 (PS7) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:20:14] <wikibugs>	 (PS9) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:20:42] <wikibugs>	 (Abandoned) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060102 (owner: David Caro)
[13:23:05] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10044971 (dcaro)
[13:24:09] <wikibugs>	 (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[13:24:17] <wikibugs>	 (CR) CI reject: [V:-1] wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906 (owner: David Caro)
[13:24:22] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907 (owner: David Caro)
[13:24:27] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[13:29:28] <wikibugs>	 (PS9) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906
[13:29:29] <wikibugs>	 (PS11) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:29:29] <wikibugs>	 (PS8) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:30:11] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10044979 (fnegri) So far the upgrade to Bookworm and MariaDB 10.6.18 seems to have helped. There is no replication lag and traffic shapes in clouddb10...
[13:32:56] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907 (owner: David Caro)
[13:33:12] <wikibugs>	 (CR) CI reject: [V:-1] wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906 (owner: David Caro)
[13:33:19] <wikibugs>	 (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[13:48:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:52:26] <wikibugs>	 (CR) FNegri: [C:+1] "LGTM!" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920 (owner: David Caro)
[14:03:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:04:46] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045162 (fnegri)
[14:15:47] <wikibugs>	 Tool-toolviews: Statistics from toolviews are erratic for Scholia - https://phabricator.wikimedia.org/T320533#10045202 (Fnielsen) Since 15 March 2024 the statistics for the Scholia Toolforge application have not been erratic.
[14:17:36] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045206 (BTullis) @fnegri, I forgot to mention something. I think there is a manual step that we need to carry out on these hosts after reimage. The reason is...
[14:20:47] <wikibugs>	 Data-Services, DBA: Prepare and check storage layer for bdrwiki - https://phabricator.wikimedia.org/T371759#10045222 (Zabe) Wiki has been created
[14:25:08] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045236 (fnegri) @BTullis thanks! That was actually in my checklist at https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host but I somehow managed to mi...
[15:13:15] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[15:14:04] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045456 (Andrew) Current plan:  [] Cathal gets new cloudcephosd nodes online (T363344) [] David drains as many affected OSD nodes as possible [] Andrew depools all affected clou...
[15:20:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:20:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[15:21:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:21:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[15:22:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet'
[15:22:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1036.eqiad.wmnet'
[15:25:15] <logmsgbot_cloud>	 !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:25:16] <logmsgbot_cloud>	 !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[15:28:24] <wikibugs>	 (PS1) David Caro: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130
[15:29:31] <wikibugs>	 (CR) Andrew Bogott: [C:+1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:29:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:29:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[15:30:35] <wikibugs>	 (PS2) David Caro: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130
[15:30:44] <wikibugs>	 (CR) David Caro: proxy: skip the proxy if there's no proxy settings (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:30:50] <wikibugs>	 (CR) David Caro: proxy: skip the proxy if there's no proxy settings (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:35:50] <wikibugs>	 (CR) David Caro: [C:+2] "Tested with test-cookbooks on cloudcumin1001" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920 (owner: David Caro)
[15:36:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:36:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[15:37:16] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045627 (dcaro)
[15:37:39] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045629 (dcaro) >>! In T371878#10045456, @Andrew wrote: > Current plan: >   Thanks! Moved it to the task description :)
[15:39:06] <wikibugs>	 (CR) CI reject: [V:-1] WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920 (owner: David Caro)
[15:39:17] <wikibugs>	 (PS1) David Caro: ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134
[15:41:34] <wikibugs>	 (PS1) David Caro: ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060135
[15:41:49] <wikibugs>	 (CR) David Caro: [C:+2] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:42:03] <wikibugs>	 (CR) David Caro: [C:+2] ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134 (owner: David Caro)
[15:42:25] <wikibugs>	 (CR) CI reject: [V:-1] ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134 (owner: David Caro)
[15:43:23] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[15:43:29] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[15:44:04] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[15:44:59] <wikibugs>	 (CR) CI reject: [V:-1] ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060135 (owner: David Caro)
[15:45:01] <wikibugs>	 (CR) CI reject: [V:-1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:45:52] <wikibugs>	 (CR) David Caro: [C:+2] "recheck" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134 (owner: David Caro)
[15:46:05] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10045661 (cmooney) Open→Resolved Thanks guys, the second ports are now configured on the switches.
[15:49:06] <wikibugs>	 (CR) CI reject: [V:-1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:49:06] <wikibugs>	 (CR) CI reject: [V:-1] ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134 (owner: David Caro)
[15:50:25] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10045673 (cmooney) I should say cloudcephosd1036 change I've not pushed to the switch - that will happen when we do a homer run after the planned reb...
[15:51:22] <wikibugs>	 (PS3) David Caro: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130
[15:51:22] <wikibugs>	 (PS2) David Caro: ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134
[15:51:22] <wikibugs>	 (CR) David Caro: ceph.osd.drain_node: fix the example (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134 (owner: David Caro)
[15:51:22] <wikibugs>	 (PS2) David Caro: ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060135
[15:51:23] <wikibugs>	 (PS1) David Caro: gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060138
[15:54:58] <wikibugs>	 (CR) Andrew Bogott: [C:+1] gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060138 (owner: David Caro)
[15:55:49] <wikibugs>	 (CR) David Caro: [C:+2] gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060138 (owner: David Caro)
[15:55:56] <wikibugs>	 (CR) David Caro: [C:+1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:55:59] <wikibugs>	 (CR) David Caro: [C:+2] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:56:33] <wikibugs>	 (CR) David Caro: [C:+2] "Tested in cloudcumin" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060135 (owner: David Caro)
[15:58:51] <wikibugs>	 (Merged) jenkins-bot: gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060138 (owner: David Caro)
[15:58:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:58:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[15:59:13] <wikibugs>	 (Merged) jenkins-bot: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060130 (owner: David Caro)
[15:59:52] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045719 (cmooney)
[16:01:03] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:01:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:02:23] <wikibugs>	 (Merged) jenkins-bot: ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060134 (owner: David Caro)
[16:02:24] <wikibugs>	 (Merged) jenkins-bot: ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060135 (owner: David Caro)
[16:02:27] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045741 (cmooney) It's probably best to manually flip the HA on the cloudgw/cloudnet nodes to the ones in rack C8 before we start.  I just checked and the two nodes in rack D5 (...
[16:02:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:03:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:03:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:03:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:03:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:04:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:04:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:04:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:04:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:05:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:05:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:05:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:05:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:06:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:06:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:06:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:06:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:07:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:07:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:07:34] <wikibugs>	 (PS1) David Caro: ceph.osd.drain_*: fix the help note for --no-wait [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060140
[16:07:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:07:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:08:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99)
[16:08:57] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045793 (cmooney) >>! In T371878#10045741, @cmooney wrote: > It's probably best to manually flip the HA on the cloudgw/cloudnet nodes to the ones in rack C8 before we start.  I...
[16:08:59] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045794 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1002 for host clouddb1020.eqiad.wmnet with OS bookworm
[16:09:56] <wikibugs>	 (CR) David Caro: [C:+2] ceph.osd.drain_*: fix the help note for --no-wait [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060140 (owner: David Caro)
[16:10:17] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[16:13:14] <wmcs-alerts>	 FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[16:13:51] <wmcs-alerts>	 FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:16:31] <wmcs-alerts>	 FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[16:17:58] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045845 (Andrew)
[16:18:14] <wmcs-alerts>	 RESOLVED: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[16:18:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:18:51] <wmcs-alerts>	 RESOLVED: [3x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:31:31] <wmcs-alerts>	 RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[16:33:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:37:23] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet'
[16:37:23] <wm-bot2>	 !log andrew@bullseye admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1036.eqiad.wmnet'
[16:37:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:37:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:39:11] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10046012 (Htriedman) @SLyngshede-WMF that sounds like a good plan! Can you link me to the volunteer NDA? I...
[16:39:29] <wikibugs>	 (PS1) David Caro: readme: fix the configuration command [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060143
[16:39:30] <wikibugs>	 (PS1) David Caro: alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060144
[16:41:39] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10046019 (Dzahn) Hi @Htriedman I can see that you are not on the doc that lists people who signed the NDA. T...
[16:42:29] <wikibugs>	 (CR) CI reject: [V:-1] alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060144 (owner: David Caro)
[16:44:32] <wikibugs>	 (PS2) David Caro: alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060144
[16:47:14] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet'
[16:47:14] <wm-bot2>	 !log andrew@bullseye admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1036.eqiad.wmnet'
[16:47:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:47:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:48:01] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet'
[16:48:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:48:44] <wikibugs>	 (CR) Andrew Bogott: [C:+1] "these worked for me!" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060143 (owner: David Caro)
[16:48:54] <wikibugs>	 (CR) David Caro: [C:+2] readme: fix the configuration command [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060143 (owner: David Caro)
[16:51:32] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10046045 (Htriedman) @Dzahn Got it — yeah, the one I recall signing a few years ago was through the phab UI....
[16:52:03] <wikibugs>	 (Merged) jenkins-bot: readme: fix the configuration command [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060143 (owner: David Caro)
[16:54:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:54:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:54:58] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:55:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:55:37] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:56:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:56:16] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:56:52] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10046064 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1002 for host clouddb1020.eqiad.wmnet with OS bookworm completed: - cl...
[16:56:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:56:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:57:35] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:57:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:58:05] <wikibugs>	 (CR) FNegri: [C:+1] "I think we can find nicer solutions but this one is good enough to unblock the current use case." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060144 (owner: David Caro)
[16:58:14] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:58:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:58:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:58:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[16:59:32] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[16:59:33] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[17:00:11] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[17:00:12] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1036.eqiad.wmnet'
[17:00:12] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[17:00:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:00:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[17:00:52] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[17:01:22] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet'
[17:01:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:01:31] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[17:03:02] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1047.eqiad.wmnet'
[17:03:16] <wikibugs>	 (CR) David Caro: [C:+2] alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060144 (owner: David Caro)
[17:04:03] <wikibugs>	 (PS6) David Caro: WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920
[17:04:03] <wikibugs>	 (PS4) David Caro: openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925
[17:04:03] <wikibugs>	 (PS6) David Caro: toolforge.component.deploy: remove the k8s prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059890
[17:04:03] <wikibugs>	 (PS7) David Caro: toolforge.component.deploy: use bump_<component> as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905
[17:04:04] <wikibugs>	 (PS10) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906
[17:04:06] <wikibugs>	 (PS12) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[17:04:10] <wikibugs>	 (PS9) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[17:04:14] <wikibugs>	 (PS10) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[17:06:49] <jinxer-wm>	 FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirtlocal1001 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:06:53] <wikibugs>	 (Merged) jenkins-bot: alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060144 (owner: David Caro)
[17:08:04] <wikibugs>	 VPS-project-Wikistats: Add bdrwiki to wikistats - https://phabricator.wikimedia.org/T371764#10046090 (Dzahn) Open→Resolved p:Triage→Medium a:Dzahn ` MariaDB [wikistats]> insert into wikipedias (prefix, lang, loclang, method) values ("bdr", "West Coast Bajau", "Ling Sama", 8);  `   ` d...
[17:08:15] <wikibugs>	 (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[17:08:32] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[17:09:22] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: [toolsdb] Migrate mixnmatch db to Trove - https://phabricator.wikimedia.org/T350862#10046099 (fnegri) @magnus the migration to Trove is still in my to-do list but I'm not planning to start it before next month at the earliest.  Re: connect...
[17:10:29] <wikibugs>	 (PS10) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[17:10:29] <wikibugs>	 (PS11) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[17:13:55] <wikibugs>	 (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[17:14:01] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1037.eqiad.wmnet'
[17:14:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:14:44] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1047.eqiad.wmnet'
[17:18:32] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10046125 (fnegri)
[17:20:59] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1038.eqiad.wmnet'
[17:21:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:21:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[17:27:05] <wikibugs>	 Tool-Global-user-contributions, Stewards-and-global-tools, Trust and Safety Product Team, Temporary accounts (Create/update essential tools/anti-abuse management): Use a time period field in Special:GlobalContributions instead of start/end date fields - https://phabricator.wikimedia.org/T371917 (T...
[17:29:21] <wikibugs>	 Tool-Global-user-contributions, Stewards-and-global-tools, Trust and Safety Product Team, Temporary accounts (Create/update essential tools/anti-abuse management): Use a time period field in Special:GlobalContributions instead of start/end date fie... - https://phabricator.wikimedia.org/T371917#10046220
[17:30:12] <wikibugs>	 Tool-Global-user-contributions, Stewards-and-global-tools, Epic, Temporary accounts (Create/update essential tools/anti-abuse management): [Epic] Implement global user contributions feature - https://phabricator.wikimedia.org/T337089#10046225 (Tchanders)
[17:31:04] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1046.eqiad.wmnet'
[17:31:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1045.eqiad.wmnet'
[17:32:52] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1038.eqiad.wmnet'
[17:32:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:34:22] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet'
[17:34:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:36:50] <jinxer-wm>	 RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirtlocal1001 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:40:12] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[17:41:38] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[17:43:51] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1039.eqiad.wmnet'
[17:43:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:45:03] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet'
[17:45:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:47:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1045.eqiad.wmnet'
[17:48:00] <jinxer-wm>	 RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[17:48:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1044.eqiad.wmnet'
[17:58:49] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet'
[17:58:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:59:04] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1041.eqiad.wmnet'
[17:59:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:02:26] <wikibugs>	 06Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#10046303 (10bd808) >>! In T361426#9757455, @Soda wrote: > @taavi Would it be possible to unprotect the yml files in the yapper bot directory?  Lets just publish the config here. The $HOME/pruner/config-pru...
[18:03:16] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1044.eqiad.wmnet'
[18:05:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1043.eqiad.wmnet'
[18:07:18] <wikibugs>	 06Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#10046351 (10bd808) @DavidTornheim I can turn the tool over to you, but only after breaking it by stopping the jobs and removing the bot password currently stored in $HOME/botpassword. Would you like me to...
[18:17:02] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1041.eqiad.wmnet'
[18:17:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:19:34] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1043.eqiad.wmnet'
[18:21:28] <wm-bot2>	 !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet'
[18:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:36:42] <wm-bot2>	 !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1042.eqiad.wmnet'
[18:36:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:41:37] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10046446 (10Andrew)
[18:49:46] <wikibugs>	 10Tool-handle-commons-on-osm: HandleCommonsOnOSM: Add SDC directly - https://phabricator.wikimedia.org/T371776#10046486 (10FlSchmitt) Fixed by #4ff1124d
[18:51:56] <wikibugs>	 10Tool-handle-commons-on-osm: HandleCommonsOnOSM: Add SDC directly - https://phabricator.wikimedia.org/T371776#10046488 (10FlSchmitt) 05Open→03Resolved a:03FlSchmitt fixed by 4ff1124d90e8b5d81db39a84b606f37901580243
[19:51:40] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[19:51:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:51:46] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[20:06:06] <wikibugs>	 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930 (10hashar) 03NEW
[20:06:28] <wikibugs>	 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046604 (10hashar)
[20:11:41] <wikibugs>	 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046610 (10bd808) The login for https://idm.wikimedia.org is via https://idp.wikimedia.org. The IdP service uses the...
[20:12:03] <wikibugs>	 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046614 (10hashar)
[20:17:12] <wikibugs>	 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Bitu, 10CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046623 (10bd808) Here is the developer account record: ` $ ldap uid=jenkins-deploy dn: uid=jenkins-deploy,ou=people,dc=wik...
[20:41:31] <wikibugs>	 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Bitu, 10CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046726 (10Dzahn) Given that users are always supposed to use different keys for prod vs cloud, should the system user also...
[20:49:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:59:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:20:59] <wikibugs>	 (03PS1) 10David Caro: ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060173
[21:22:20] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[21:22:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[21:22:25] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[22:03:44] <wikibugs>	 10Tool-extloc, 13Patch-For-Review, 10Release-Engineering-Team (Yak Shaving 🐃🪒): extloc: Move to Toolforge Build Service - https://phabricator.wikimedia.org/T365665#10046945 (10brennen) 05Open→03Resolved
[23:12:13] <wikibugs>	 (03PS5) 10Krinkle: frontend: Server-side rendering (take 2) [labs/codesearch] - 10https://gerrit.wikimedia.org/r/1056248