[00:04:31] <wikibugs>	 Toolforge-standards-committee: Adoption request for geograph2commons - https://phabricator.wikimedia.org/T345707#10053311 (bd808) >>! In T345707#10053246, @bjh21 wrote: > I made this request in response to [[ https://commons.wikimedia.org/wiki/Commons:Help_desk/Archive/2023/08#Transfer_from_Geograph | a thre...
[00:16:29] <wmcs-alerts>	 FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:21:29] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:46:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[00:47:04] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[00:47:10] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (T371878)
[00:47:21] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[00:47:42] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10053322 (Andrew)
[01:51:18] <wikibugs>	 cloud-services-team, DC-Ops, ops-codfw, SRE: Test new hardware candidate for cloudbackup replacement - https://phabricator.wikimedia.org/T353746#10053337 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host testhost2001.codfw.wmnet with OS bookworm execut...
[03:16:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[04:25:51] <wikibugs>	 (update) samwilson: Add hourly update-focus-areas command [toolforge-repos/wishlist] - https://gitlab.wikimedia.org/toolforge-repos/wishlist/-/merge_requests/1 (https://phabricator.wikimedia.org/T363240 https://phabricator.wikimedia.org/T364648)
[04:28:09] <wikibugs>	 (update) samwilson: Add hourly update-focus-areas command [toolforge-repos/wishlist] - https://gitlab.wikimedia.org/toolforge-repos/wishlist/-/merge_requests/1 (https://phabricator.wikimedia.org/T364648)
[04:50:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[05:06:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[05:29:03] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10053420 (Andrew)
[05:35:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T371878)
[05:35:14] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[05:36:23] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[06:59:10] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[07:01:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The systemd unit backup_vms.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:02:06] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit backup_vms.service on node cloudbackup1003 has been down for long. - https://phabricator.wikimedia.org/T372126 (phaultfinder) NEW
[07:23:31] <wikibugs>	 Toolforge: Java application redeploys several times until it starts - https://phabricator.wikimedia.org/T372092#10053514 (Benjavalero) Today I have seen that along the day there was another restart at 14.41 UTC, this time with a (maybe useful) trace: ` 2024-08-08 14:05:37,826 DEBUG [uler-2] e.b.r.f.l.load.Li...
[08:19:10] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[08:21:56] <wikibugs>	 Cloud-VPS (Quota-requests): [Quota increase]: globaleducation - https://phabricator.wikimedia.org/T372134 (Ragesoss) NEW
[08:25:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[08:32:22] <wikibugs>	 (CR) Thiemo Kreuz (WMDE): [C:-1] Fix the typo error from one to on (1 comment) [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1059930 (owner: GauriGuptaa)
[09:06:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:41:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:01:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The systemd unit backup_vms.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:02:06] <wikibugs>	 cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T370383#10053862 (phaultfinder)
[11:27:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[11:27:20] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[12:25:24] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[12:36:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:36:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The systemd unit purge_vm_backup.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:37:02] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit purge_vm_backup.service on node cloudbackup1004 has been down for long. - https://phabricator.wikimedia.org/T372143 (phaultfinder) NEW
[12:41:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The systemd unit purge_vm_backup.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:46:49] <wikibugs>	 PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T371944#10054012 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/444
[12:46:54] <wikibugs>	 PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T371944#10054013 (rook) Open→Resolved a:rook
[12:47:01] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/paws/pull/444
[12:55:35] <wikibugs>	 VPS-project-Codesearch: mwclient should be indexed by codesearch - https://phabricator.wikimedia.org/T372144 (Tgr) NEW
[12:58:14] <wikibugs>	 VPS-project-Codesearch: AWB should be indexed by codesearch - https://phabricator.wikimedia.org/T372145 (Tgr) NEW
[13:00:23] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10054051 (Tgr) >>! In T371977#10049882, @Krinkle wrote: > I spe...
[13:12:50] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10054081 (AdamWill) mwclient-side fix is merged and I intend to...
[13:14:25] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10054082 (Andrew)
[13:16:05] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10054089 (Tgr)
[13:16:16] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10054091 (Tgr)
[13:18:53] <wikibugs>	 VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993#10054103 (Tgr)
[13:18:54] <wikibugs>	 VPS-project-Codesearch: AWB should be indexed by codesearch - https://phabricator.wikimedia.org/T372145#10054101 (Tgr) →Duplicate dup:T371993
[13:19:48] <wikibugs>	 VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993#10054098 (Tgr)
[13:20:31] <wikibugs>	 VPS-project-Codesearch: mwclient should be indexed by codesearch - https://phabricator.wikimedia.org/T372144#10054096 (Tgr) →Duplicate dup:T371993
[13:22:06] <wikibugs>	 VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993#10054106 (Tgr) The other major fallout was {T372017}. AWB is still using SVN so that sounds like a challenge.
[13:27:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:34:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[13:34:20] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[13:35:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (T371878)
[13:36:19] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[13:38:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (T371878)
[13:38:32] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[14:07:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[14:15:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[14:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:12:55] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [gabina] - https://phabricator.wikimedia.org/T372153 (Gabinaluz) NEW
[15:15:20] <wikibugs>	 Cloud-VPS (Quota-requests): [Quota increase]: globaleducation - https://phabricator.wikimedia.org/T372134#10054411 (Slst2020) +1
[15:17:49] <wikibugs>	 Cloud-VPS (Quota-requests): [Quota increase]: globaleducation - https://phabricator.wikimedia.org/T372134#10054413 (Slst2020) a:Slst2020
[15:22:18] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 globaleducation START - Cookbook wmcs.openstack.quota_increase
[15:22:26] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 globaleducation END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
[15:25:56] <wikibugs>	 Cloud-VPS (Quota-requests): [Quota increase]: globaleducation - https://phabricator.wikimedia.org/T372134#10054438 (Slst2020) Open→Resolved Done; please reopen the ticket when you no longer need the extra quota. :)
[15:29:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:30:55] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation): Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#10054599 (Maximilianklein) @andrew , confirmed. That is my plan this next week. To get this done.   [ ] create cinder volume. [ ] move project code [ ] move mysql-db files [...
[17:00:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[17:26:27] <wikibugs>	 Tool-wikiloves: WLE in the Democratic Republic of the Congo - https://phabricator.wikimedia.org/T372166 (CapitainAfrika) NEW
[18:24:01] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10054698 (DavidBrooks) To the comment on breaking-or-not //site...
[18:35:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[18:35:33] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[18:39:36] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[18:39:41] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[18:40:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[18:41:33] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:16:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[19:16:45] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[19:41:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:20:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[23:08:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses