[00:10:28] <wmcs-alerts>	 FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:10:50] <wmcs-alerts>	 FIRING: TfInfraTestDestroyFailed: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:14:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[00:15:28] <wmcs-alerts>	 RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:19:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[00:22:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[00:22:37] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[00:22:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[00:22:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[00:23:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[00:23:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[00:23:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[00:23:40] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[00:29:05] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[00:51:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 84.14% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:52:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[00:56:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 88.41% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:38:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[01:39:06] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[01:39:37] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[01:39:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[01:52:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:52:18] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:53:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:57:18] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:00:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[02:00:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[02:07:18] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:13:55] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#10065728 (Raymond_Ndibe) This has been open for some time now so we should probably close it. To test it her...
[02:34:28] <wikibugs>	 Tools: Template transclusion count per page - https://phabricator.wikimedia.org/T372523 (Fgnievinski) NEW
[03:02:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[03:02:37] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[03:02:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[03:02:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[03:03:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[03:03:16] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[03:03:30] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[03:03:31] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[03:52:49] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[03:53:43] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29689 bytes in 3.496 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[04:18:22] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:18:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:18:38] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:18:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:18:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:18:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:19:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:19:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:19:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:21:16] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:21:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:22:04] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:22:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:25:03] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[04:25:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[04:26:50] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[06:23:40] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph] Metrics started not responding during the drain - https://phabricator.wikimedia.org/T372528 (dcaro) NEW
[06:24:33] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20
[06:24:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[06:30:24] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-20
[06:30:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[06:56:21] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph] Metrics started not responding during the drain - https://phabricator.wikimedia.org/T372528#10065958 (dcaro) This explains the issue: https://documentation.suse.com/ses/7.1/html/ses-all/monitoring-alerting.html#monitoring-stale-cache  Turns out t...
[07:07:22] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph] Metrics started not responding during the drain - https://phabricator.wikimedia.org/T372528#10065962 (dcaro) Open→Resolved a:dcaro I've also set the scrape_interval to the same we have on prometheus side (300), and restarted the m...
[07:09:23] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph] Metrics started not responding during the drain - https://phabricator.wikimedia.org/T372528#10065972 (dcaro) Actually, changed the scrape_interval to 60s as that's what we have configured: ` root@prometheus1005:~# cat /srv/prometheus/cloud/pr...
[07:10:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[07:51:15] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#10066033 (dcaro) >>! In T358203#10065728, @Raymond_Ndibe wrote: > Any other way to have less number of deplo...
[08:05:02] <wikibugs>	 (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321)
[08:26:40] <wikibugs>	 (approved) dcaro: [jobs-cli] remove _display_messages [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/62 (owner: raymond-ndibe)
[08:35:25] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp START - Cookbook wmcs.vps.create_project for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:35:27] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:35:27] <stashbot>	 T370317: [sct.backend] Create trove database - https://phabricator.wikimedia.org/T370317
[08:36:04] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:36:05] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:40:56] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp START - Cookbook wmcs.vps.create_project for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:40:58] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:40:58] <stashbot>	 T370317: [sct.backend] Create trove database - https://phabricator.wikimedia.org/T370317
[08:40:59] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:40:59] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:41:23] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp START - Cookbook wmcs.vps.create_project for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:41:24] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:41:42] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:41:43] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:41:45] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp START - Cookbook wmcs.vps.create_project for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:41:45] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:43:22] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:43:22] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:43:25] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp START - Cookbook wmcs.vps.create_project for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:43:25] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:43:56] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:43:56] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:45:09] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp START - Cookbook wmcs.vps.create_project for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:45:09] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:45:24] <wm-bot2>	 !log dcaro@urcuchillay samplecomplexapp END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for trove-only project samplecomplexapp in eqiad1 (T370317)
[08:45:24] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "samplecomplexapp"
[08:47:04] <wikibugs>	 (PS1) David Caro: create_project: fix the quota setting for the project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062964
[08:50:45] <wikibugs>	 (CR) CI reject: [V:-1] create_project: fix the quota setting for the project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062964 (owner: David Caro)
[08:50:58] <wikibugs>	 (approved) dcaro: toolforge_deploy_mr: make all apt actions non-interactive [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/182
[08:51:02] <wikibugs>	 (merge) dcaro: toolforge_deploy_mr: make all apt actions non-interactive [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/182
[08:54:47] <wikibugs>	 (PS2) David Caro: openstack: skip passing envvar for role commands [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062674
[08:54:47] <wikibugs>	 (PS2) David Caro: create_project: fix the quota setting for the project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062964
[08:56:12] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.wait_for_rebalance
[08:56:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:58:20] <wikibugs>	 (approved) dcaro: [envvars-cli] remove display_messages [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/57 (owner: raymond-ndibe)
[08:58:45] <wikibugs>	 (update) dcaro: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46 (owner: raymond-ndibe)
[09:42:32] <wikibugs>	 Tools: Flickr2 Commons is currently down - https://phabricator.wikimedia.org/T372451#10066263 (Jeff_G) See also https://commons.wikimedia.org/wiki/Commons:Village_pump#Flickr2Commons and unactioned issue 320 at https://bitbucket.org/magnusmanske/flickr2commons/issues/320/doesnt-respond .
[09:52:25] <wm-bot2>	 !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.wait_for_rebalance (exit_code=0)
[09:52:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:00:50] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.wait_for_rebalance
[10:00:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:03:54] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[10:10:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[10:22:02] <wikibugs>	 VPS-project-devtools, GitLab: https://gitlab.devtools.wmcloud.org is being indexed by google (and scoring pretty high) - https://phabricator.wikimedia.org/T372538#10066439 (Bugreporter) Also, we need a notice in that tool to indicate this is only a test instance, and provide a link to official Wikimedia...
[12:54:54] <wikibugs>	 (open) dcaro: db: add database check and boileplate [toolforge-repos/sample-complex-app-backend] (add_worker) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370317)
[13:02:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[13:35:53] <wikibugs>	 (update) dcaro: db: add database check and boileplate [toolforge-repos/sample-complex-app-backend] (add_worker) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370317)
[13:39:17] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10066871 (Andrew) Hello @tgr, were you able to make any progress with this? I'm going on sabbatical soon and (for arcane backend reasons, T364457) need to...
[13:40:36] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), linkwatcher: Cloud VPS "linkwatcher" project Buster deprecation - https://phabricator.wikimedia.org/T367536#10066881 (Andrew) Hi @Beetstra! Did you ever get your access issues sorted out? I need to rebuild the host that these VMs are on soon (see T364457 for probabl...
[13:40:48] <wikibugs>	 (update) dcaro: db: add database check and boileplate [toolforge-repos/sample-complex-app-backend] (add_worker) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370317)
[13:48:40] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikipathways" project Buster deprecation - https://phabricator.wikimedia.org/T367563#10066919 (Andrew) Emailed project admins today:   ` Hello!  You are receiving this email because you are listed as an admin of the 'wikipathways' project in cloud-vps.  None...
[13:53:09] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation): Cloud VPS "dumps" project Buster deprecation - https://phabricator.wikimedia.org/T367528#10066923 (Andrew) I'm going to delete these VMs next week. If you need to check them or rescue an data, now's the time :)
[13:59:00] <wikibugs>	 (update) dcaro: db: add database check and boileplate [toolforge-repos/sample-complex-app-backend] (add_worker) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370317)
[14:00:18] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Infrastructure-Foundations, Puppet CI: Cloud VPS "puppet-diffs" project Buster deprecation - https://phabricator.wikimedia.org/T367547#10066941 (Andrew) Hi folks! I haven't read all of this ticket but can I get an update about when/if you plan to remove the Bust...
[14:02:21] <wikibugs>	 (update) dcaro: db: add database check and boileplate [toolforge-repos/sample-complex-app-backend] (add_worker) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370317)
[14:08:21] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10066963 (Andrew) Quick summary:  Cathal upgraded and rebooted the switch on Tuesday the 13th. That did not solve the flapping. Vriley then suggested that we do a physical powerd...
[14:08:26] <wikibugs>	 (update) dcaro: db: add database check and boileplate [toolforge-repos/sample-complex-app-backend] (add_worker) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370317)
[14:13:41] <wikibugs>	 (PS1) Pwangai: Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565)
[14:13:58] <wikibugs>	 (CR) CI reject: [V:-1] Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[14:15:14] <wikibugs>	 (PS2) Pwangai: Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565)
[14:15:32] <wikibugs>	 (CR) CI reject: [V:-1] Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[14:16:43] <wikibugs>	 (open) dcaro: checks: add database check [toolforge-repos/sample-complex-app-frontend] (show_deployment_steps) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/4 (https://phabricator.wikimedia.org/T370317)
[14:19:48] <wikibugs>	 (update) dcaro: checks: add database check [toolforge-repos/sample-complex-app-frontend] (show_deployment_steps) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/4 (https://phabricator.wikimedia.org/T370317)
[14:19:51] <wikibugs>	 (update) dcaro: checks: add database check [toolforge-repos/sample-complex-app-frontend] (show_deployment_steps) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/4 (https://phabricator.wikimedia.org/T370317)
[14:19:55] <wikibugs>	 (update) dcaro: checks: add database check [toolforge-repos/sample-complex-app-frontend] (show_deployment_steps) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/4 (https://phabricator.wikimedia.org/T370317)
[14:19:58] <wikibugs>	 (update) dcaro: checks: add database check [toolforge-repos/sample-complex-app-frontend] (show_deployment_steps) - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/4 (https://phabricator.wikimedia.org/T370317)
[14:30:34] <wikibugs>	 (CR) Andrew Bogott: [C:+1] create_project: fix the quota setting for the project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062964 (owner: David Caro)
[14:32:15] <wikibugs>	 (PS3) Pwangai: Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565)
[14:44:55] <wikibugs>	 (CR) David Caro: [C:+2] create_project: fix the quota setting for the project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062964 (owner: David Caro)
[14:44:59] <wikibugs>	 (CR) David Caro: [C:+2] openstack: skip passing envvar for role commands [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062674 (owner: David Caro)
[14:45:39] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: [k8s] Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant - https://phabricator.wikimedia.org/T358203#10067120 (Raymond_Ndibe) >>! In T358203#10066033, @dcaro wrote: >>>! In T358203#10065728, @Raymond_Ndibe wro...
[14:48:58] <wikibugs>	 (Merged) jenkins-bot: openstack: skip passing envvar for role commands [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062674 (owner: David Caro)
[14:48:58] <wikibugs>	 (Merged) jenkins-bot: create_project: fix the quota setting for the project [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1062964 (owner: David Caro)
[14:54:24] <wikibugs>	 (update) raymond-ndibe: [builds-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/82 (https://phabricator.wikimedia.org/T358203)
[15:17:45] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:22:06] <wikibugs>	 VPS-project-devtools, collaboration-services, GitLab, Release-Engineering-Team: https://gitlab.devtools.wmcloud.org is being indexed by google (and scoring pretty high) - https://phabricator.wikimedia.org/T372538#10067210 (brennen) > Have a robots.txt that makes it non-indexable  Reasonable. I do...
[15:22:12] <wikibugs>	 VPS-project-devtools, collaboration-services, GitLab, Release-Engineering-Team: https://gitlab.devtools.wmcloud.org is being indexed by google (and scoring pretty high) - https://phabricator.wikimedia.org/T372538#10067212 (brennen)
[15:22:45] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:31:58] <wikibugs>	 (open) raymond-ndibe: [toolforge-deploy] DO_NOT_MERGE : increase builds-api replicas in local env [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/481 (https://phabricator.wikimedia.org/T358203)
[15:40:52] <wikibugs>	 (open) raymond-ndibe: Draft: [builds-api] DO_NOT_MERGE: schedule all pods on toolforge-worker [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/108 (https://phabricator.wikimedia.org/T358203)
[15:41:05] <wikibugs>	 (update) raymond-ndibe: Draft: [toolforge-deploy] DO_NOT_MERGE : increase builds-api replicas in local env [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/481 (https://phabricator.wikimedia.org/T358203)
[15:42:23] <wikibugs>	 (update) raymond-ndibe: [builds-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-api] (node-selector-to-test-topology-spread) - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/82 (https://phabricator.wikimedia.org/T358203)
[15:44:11] <wikibugs>	 (update) raymond-ndibe: [builds-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-api] (node-selector-to-test-topology-spread) - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/82 (https://phabricator.wikimedia.org/T358203)
[15:44:28] <wikibugs>	 (update) raymond-ndibe: [builds-api] add topologySpreadConstraints to deployment [repos/cloud/toolforge/builds-api] (node-selector-to-test-topology-spread) - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/82 (https://phabricator.wikimedia.org/T358203)
[15:53:09] <jinxer-wm>	 FIRING: CephClusterInUnknown: #page Ceph cluster in eqiad is in unknown status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInUnknown - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInUnknown
[15:53:14] <wikibugs>	 cloud-services-team: CephClusterInUnknown - https://phabricator.wikimedia.org/T372573 (phaultfinder) NEW
[15:54:42] <wikibugs>	 (update) raymond-ndibe: DO_NOT_MERGE: testing _display_messages move to toolforge-weld [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/90
[16:02:41] <wikibugs>	 VPS-project-devtools, collaboration-services, GitLab, Release-Engineering-Team: https://gitlab.devtools.wmcloud.org is being indexed by google (and scoring pretty high) - https://phabricator.wikimedia.org/T372538#10067315 (brennen) Here we go:  https://docs.gitlab.com/omnibus/settings/nginx.html#...
[16:05:34] <wikibugs>	 (CR) Kosta Harlan: Exempt Test Group Repositories (4 comments) [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[16:09:02] <wikibugs>	 (approved) dcaro: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066) (owner: raymond-ndibe)
[16:09:06] <wikibugs>	 (update) dcaro: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066) (owner: raymond-ndibe)
[16:25:06] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[16:26:00] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29687 bytes in 4.383 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[16:27:00] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:27:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97)
[16:27:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:27:52] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:28:29] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:28:35] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:28:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:29:01] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:29:06] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[16:29:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:29:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:30:00] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29687 bytes in 4.421 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[16:30:39] <jinxer-wm>	 RESOLVED: CephClusterInUnknown: #page Ceph cluster in eqiad is in unknown status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInUnknown - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInUnknown
[16:30:40] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:30:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:31:30] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:31:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:32:04] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:32:10] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:33:03] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.wait_for_rebalance (exit_code=99)
[16:33:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:34:10] <wikibugs>	 Tool-bridgebot: Have wm-bridgebot bridge #wikipedia-it-sysop on Libera like it does on freenode - https://phabricator.wikimedia.org/T283357#10067415 (bd808) >>! In T283357#7114024, @bd808 wrote: > This bridge is now shutdown.  :facepalm: This comment should have been applied to {T261954} rather than this...
[16:36:49] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10067421 (Andrew) @cmooney can we get cloudcephosd1036 set up now that the switch work is done?
[16:38:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:38:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:39:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:39:30] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:39:43] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:39:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:40:54] <wikibugs>	 (PS4) Pwangai: Exempt Test Group Repositories [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565)
[16:44:50] <wikibugs>	 (CR) Pwangai: Exempt Test Group Repositories (3 comments) [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[17:22:39] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Humaniki: Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#10067519 (Maximilianklein) update for 2024-08-14  [x] create cinder volume. [x] move project code [x] move mysql-db files [x] create a new debian bookworm inst...
[17:30:26] <wikibugs>	 Toolforge, Bitu, Infrastructure-Foundations: Can't activate my new key using the idm.wikimedia.org (bitu) interface - https://phabricator.wikimedia.org/T372581 (Meno25) NEW
[18:09:57] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Humaniki: Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#10067602 (Maximilianklein) update for 2024-08-15  [x] create cinder volume. [x] move project code [x] move mysql-db files [x] create a new debian bookworm inst...
[18:53:18] <wikibugs>	 (update) raymond-ndibe: DO_NOT_MERGE: testing _display_messages move to toolforge-weld [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/90
[19:15:38] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:15:44] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:15:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:15:55] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:16:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:16:19] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:16:24] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:16:31] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:16:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:16:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:18:00] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:18:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:18:43] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:18:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:19:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:19:43] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:20:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:20:52] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:21:35] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[19:21:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[19:38:05] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10067817 (Jhancock.wm)
[20:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:23:39] <wmcs-alerts>	 FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:23:46] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10067968 (Jhancock.wm) @aborrero when you have a moment, can you do this step for me please? thanks!  Update the operations/puppet repo
[20:28:39] <wmcs-alerts>	 RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:56:23] <wikibugs>	 Toolforge (Toolforge iteration 14): something is wrong with pre-commit on builds-api - https://phabricator.wikimedia.org/T372601 (Raymond_Ndibe) NEW
[21:08:18] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[21:10:10] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29698 bytes in 0.435 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[22:10:18] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10068225 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm
[23:30:29] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10068319 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm executed...