[00:17:45] <wikibugs>	 cloud-services-team, Cloud-VPS: Import Fedora CoreOS 42 image for use with Magnum - https://phabricator.wikimedia.org/T396912 (bd808) NEW
[00:17:58] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T309789)
[00:18:04] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[00:20:50] <wikibugs>	 cloud-services-team, Cloud-VPS: Import Fedora CoreOS 42 image for use with Magnum - https://phabricator.wikimedia.org/T396912#10914903 (bd808) @Andrew has been poking at some Magnum related things, so maybe he would be interested in picking this up? Fedora has a 6 month release cycle for its CoreOS vers...
[01:36:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97)
[01:36:30] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[01:36:33] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[01:37:06] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[01:37:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[01:38:44] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T309789)
[01:38:49] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[02:07:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/245
[02:08:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/245
[02:08:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/248
[02:09:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/248
[03:14:18] <jinxer-wm>	 RESOLVED: KernelErrors: Server cloudcephosd1017 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1017 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[03:51:35] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (T309789)
[03:51:42] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[03:54:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[03:54:42] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[03:55:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T309789)
[03:56:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T309789)
[03:56:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T309789)
[03:56:53] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[03:57:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T309789)
[03:58:34] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[03:59:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[04:02:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[04:02:58] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[04:06:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[04:20:30] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[04:21:10] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[04:23:01] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T309789)
[04:23:06] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[04:23:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T309789)
[04:25:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T309789)
[04:38:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[05:12:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[05:12:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[05:13:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[05:13:58] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[05:14:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[05:14:41] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[06:58:17] <wikibugs>	 Quarry: [bug] Query results do not appear due to JS error - https://phabricator.wikimedia.org/T396904#10914991 (Liz) Any progress with this problem?
[07:29:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T309789)
[07:29:55] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[07:52:33] <notefromgithub>	 supertassu opened https://github.com/toolforge/quarry/pull/88
[07:56:10] <notefromgithub>	 supertassu closed https://github.com/toolforge/quarry/pull/88
[07:57:42] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Quarry: [bug] Query results do not appear due to JS error - https://phabricator.wikimedia.org/T396904#10915001 (taavi) Open→Resolved p:Triage→High a:taavi At first I thought this was the same issue as T396893#10914253, i.e. someone trying t...
[08:42:33] <wikibugs>	 Toolforge (Toolforge iteration 21): [jobs-emailer] stops processing k8s events - https://phabricator.wikimedia.org/T396850#10915026 (dcaro) I got stuck today also, the last log of the event fetching task does not show any error, but there's a log for connection error and retrying right after (from the config...
[08:47:09] <wikibugs>	 Toolforge (Toolforge iteration 21): [jobs-emailer] stops processing k8s events - https://phabricator.wikimedia.org/T396850#10915027 (dcaro) This might be a good place to start to get example of setting timeouts and recovering from connection issues: https://github.com/kubernetes-client/python/tree/master/exa...
[08:57:02] <wikibugs>	 Toolforge (Toolforge iteration 21): [jobs-emailer] stops processing k8s events - https://phabricator.wikimedia.org/T396850#10915039 (dcaro) This seems very similar https://github.com/kubernetes-client/python/issues/1148
[09:05:38] <wikibugs>	 (close) dcaro: emailer: run webserver in a different thread [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/9 (https://phabricator.wikimedia.org/T379924) (owner: aborrero)
[09:59:18] <jinxer-wm>	 FIRING: KernelErrors: Server cloudcephosd1020 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1020 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[09:59:30] <wikibugs>	 cloud-services-team: KernelErrors Server cloudcephosd1020 logged kernel errors - https://phabricator.wikimedia.org/T396917 (phaultfinder) NEW
[11:49:00] <jinxer-wm>	 FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:59:00] <jinxer-wm>	 FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:59:10] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,heat
[11:59:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,heat
[12:00:09] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[12:01:07] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[12:03:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[12:06:31] <icinga-wm>	 PROBLEM - Host cloudcephosd1020 is DOWN: PING CRITICAL - Packet loss = 100%
[12:08:09] <icinga-wm>	 RECOVERY - Host cloudcephosd1020 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[12:10:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[12:11:42] <logmsgbot_cloud>	 andrew@cloudcumin1001 bootstrap_and_add (PID 2984097) is awaiting input
[12:15:41] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[12:29:18] <jinxer-wm>	 FIRING: KernelErrors: Server cloudcephosd1022 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1022 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[12:29:28] <wikibugs>	 cloud-services-team: KernelErrors Server cloudcephosd1022 logged kernel errors - https://phabricator.wikimedia.org/T396921 (phaultfinder) NEW
[12:30:54] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[12:36:35] <icinga-wm>	 PROBLEM - Host cloudcephosd1022 is DOWN: PING CRITICAL - Packet loss = 100%
[12:40:03] <icinga-wm>	 RECOVERY - Host cloudcephosd1022 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[12:44:18] <jinxer-wm>	 RESOLVED: KernelErrors: Server cloudcephosd1018 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1018 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[13:18:22] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[13:18:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[13:19:05] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[13:19:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[13:19:50] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[13:23:33] <icinga-wm>	 PROBLEM - Host cloudcephosd1022 is DOWN: PING CRITICAL - Packet loss = 100%
[13:24:09] <icinga-wm>	 RECOVERY - Host cloudcephosd1022 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[13:27:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[13:28:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[13:28:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[13:30:05] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1023.eqiad.wmnet' (T394727)
[13:30:11] <stashbot>	 T394727: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727
[13:30:31] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1023.eqiad.wmnet' (T394727)
[13:30:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T309789)
[13:31:02] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[13:32:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:26:43] <wikibugs>	 (PS1) Essa237: [Fix] added a landing page [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1157586
[15:36:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-37 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[16:08:06] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37
[16:11:30] <jinxer-wm>	 RESOLVED: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:12:22] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37
[16:25:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[16:26:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-37 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[17:16:26] <wikibugs>	 Quarry: [bug] Another problem with Quarry - https://phabricator.wikimedia.org/T396910#10915302 (Aklapper) For future reference, please summarize the actual problem in the task title - thanks!]
[18:44:34] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T309789)
[18:44:40] <stashbot>	 T309789: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789
[18:45:03] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[18:46:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[19:24:18] <jinxer-wm>	 FIRING: KernelErrors: Server cloudcephosd1023 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1023 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[19:24:28] <wikibugs>	 cloud-services-team: KernelErrors Server cloudcephosd1023 logged kernel errors - https://phabricator.wikimedia.org/T396929 (phaultfinder) NEW
[19:29:02] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[19:54:18] <jinxer-wm>	 RESOLVED: KernelErrors: Server cloudcephosd1019 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1019 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[20:27:32] <wikibugs>	 cloud-services-team, Cloud-VPS: ZuulDevOpsBot user can create but not delete a cluster template - https://phabricator.wikimedia.org/T396932 (bd808) NEW
[20:41:12] <wikibugs>	 PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T394614#10915437 (LibUp-bot) A new upstream version of Pywikibot is now available: 10.2.0. * https://gerrit.wikimedia.org/g/pywikibot/core/+/refs/tags/10.2.0 * https://doc.wikimedia.org/pywikibot/stable/changelog.html
[20:41:13] <wikibugs>	 cloud-services-team, Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T396933#10915438 (LibUp-bot)
[20:42:00] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[20:42:07] <wikibugs>	 cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T396934 (phaultfinder) NEW
[20:43:34] <wikibugs>	 cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure (Zuul upgrade): Magnum created instances failing to talk to OpenStack user_data service - https://phabricator.wikimedia.org/T396935 (bd808) NEW
[20:44:48] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[20:48:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[20:52:07] <icinga-wm>	 PROBLEM - Host cloudcephosd1023 is DOWN: PING CRITICAL - Packet loss = 100%
[20:53:37] <icinga-wm>	 RECOVERY - Host cloudcephosd1023 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[20:55:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:56:01] <wikibugs>	 cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure (Zuul upgrade): ZuulDevOpsBot user can create but not delete a cluster template - https://phabricator.wikimedia.org/T396932#10915482 (bd808)
[20:58:06] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[21:02:48] <wikibugs>	 (PS1) Bovimacoco: T390397 Enforce Strict Typing. Bug=T390397 [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1157764
[21:09:18] <jinxer-wm>	 FIRING: KernelErrors: Server cloudcephosd1024 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1024 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[21:09:27] <wikibugs>	 cloud-services-team: KernelErrors Server cloudcephosd1024 logged kernel errors - https://phabricator.wikimedia.org/T396937 (phaultfinder) NEW
[21:12:25] <icinga-wm>	 PROBLEM - Host cloudcephosd1024 is DOWN: PING CRITICAL - Packet loss = 100%
[21:13:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[21:15:53] <icinga-wm>	 RECOVERY - Host cloudcephosd1024 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[21:33:30] <jinxer-wm>	 RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[21:49:18] <jinxer-wm>	 FIRING: KernelErrors: Server cloudcephosd1023 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1023 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[22:09:38] <wikibugs>	 cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940 (Andrew) NEW
[22:10:33] <wikibugs>	 cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940#10915559 (Andrew)
[22:12:33] <wikibugs>	 cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940#10915560 (Andrew) Bonus question: Is there some reason why it is good, actually, for the reimage s...
[22:25:20] <wikibugs>	 cloud-services-team, Striker: Update StrikerBot Developer, SUL, and related accounts to email folks besides just bd808 - https://phabricator.wikimedia.org/T395697#10915567 (Aklapper)
[22:25:30] <wikibugs>	 cloud-services-team, Striker: Update StrikerBot Developer, SUL, and related accounts to email folks besides just bd808 - https://phabricator.wikimedia.org/T395697#10915569 (Aklapper) Stalled→Resolved I ran `UPDATE phabricator_user.user_email SET address = "admin.strikerbot@toolforge.org" WHER...
[22:28:04] <wikibugs>	 cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940#10915578 (Andrew) Here is an example of the cookbook yanking some entries:  https://netbox.wikimed...
[22:28:40] <wikibugs>	 (PS1) Bovimacoco: T390402 Secure: Migrate hardcoded URLs to .env with validation. Bug=T390402 [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1157840
[22:29:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[22:33:50] <icinga-wm>	 PROBLEM - Host cloudcephosd1024 is DOWN: PING CRITICAL - Packet loss = 100%
[22:34:18] <icinga-wm>	 RECOVERY - Host cloudcephosd1024 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[22:37:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[22:38:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[22:40:24] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10915587 (Andrew)
[22:56:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning