[00:04:00] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T348643)
[00:48:59] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (T348643)
[00:49:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T348643)
[02:58:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T348643)
[04:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:49:09] Tool-mwcli: Migrate from GitHub to GitLab and Phab - https://phabricator.wikimedia.org/T384894 (Samwilson) NEW
[04:49:17] (open) samwilson: Update project URLs [toolforge-repos/mwcli] - https://gitlab.wikimedia.org/toolforge-repos/mwcli/-/merge_requests/1 (https://phabricator.wikimedia.org/T384894)
[04:50:18] (merge) samwilson: Update project URLs [toolforge-repos/mwcli] - https://gitlab.wikimedia.org/toolforge-repos/mwcli/-/merge_requests/1 (https://phabricator.wikimedia.org/T384894)
[05:11:57] Tool-mwcli: Add extension:upgrade command - https://phabricator.wikimedia.org/T384895 (Samwilson) NEW
[05:13:08] Tool-mwcli: Add import:export command to move revisions between wikis - https://phabricator.wikimedia.org/T384896 (Samwilson) NEW
[05:18:10] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10499505 (XIDME) |**Wikitech account/LDAP:**| XIDME| |**SUL account**| XIDME| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |Y| |**I have visited [[ https://wikitech.wikimed...
[05:23:14] (open) samwilson: Set up GitLab CI [toolforge-repos/mwcli] - https://gitlab.wikimedia.org/toolforge-repos/mwcli/-/merge_requests/2 (https://phabricator.wikimedia.org/T384894)
[05:29:51] (update) samwilson: Set up GitLab CI [toolforge-repos/mwcli] - https://gitlab.wikimedia.org/toolforge-repos/mwcli/-/merge_requests/2 (https://phabricator.wikimedia.org/T384894)
[05:31:22] (merge) samwilson: Set up GitLab CI [toolforge-repos/mwcli] - https://gitlab.wikimedia.org/toolforge-repos/mwcli/-/merge_requests/2 (https://phabricator.wikimedia.org/T384894)
[05:45:09] Tool-mwcli: Add user:passwordreset command - https://phabricator.wikimedia.org/T384897 (Samwilson) NEW
[05:46:41] Tool-mwcli, Patch-For-Review: Migrate from GitHub to GitLab and Phab - https://phabricator.wikimedia.org/T384894#10499527 (Samwilson)
[05:46:57] Tool-mwcli, Patch-For-Review: Migrate from GitHub to GitLab and Phab - https://phabricator.wikimedia.org/T384894#10499528 (Samwilson) Open→Resolved a:Samwilson All done I think.
[06:30:21] Tool-mwcli: Plan command nomenclature - https://phabricator.wikimedia.org/T384898 (Samwilson) NEW
[06:33:09] (CR) Hridyesh_Gupta: Fixed the typo in Top contributor campaign page (1 comment) [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1053046 (https://phabricator.wikimedia.org/T358396) (owner: Hridyesh_Gupta)
[06:35:29] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10499563 (Arinaigu) |**Wikitech account/LDAP:**| arinaigum| |**SUL account**| AIgumenshcheva-WMF| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |N| |**I have visited [[ http...
[08:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:30:56] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:49:08] Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10499696 (Mr_Tortue) If needed, I can either help with the code or just operate it for frwiki. However, like most wikis, the community approval is needed before the bot can operat...
[09:08:32] (close) hashar: phorge: move datacenter tasks to DC-Ops channel [toolforge-repos/wikibugs2] - https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/49 (https://phabricator.wikimedia.org/T384804)
[11:20:54] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:30:54] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:38:13] (PS4) Hridyesh_Gupta: Fix the typo in Top contributor campaign page and update messages.pot [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1053046 (https://phabricator.wikimedia.org/T358396)
[11:39:29] (CR) Hridyesh_Gupta: "I have done the needful, please have a look!" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1053046 (https://phabricator.wikimedia.org/T358396) (owner: Hridyesh_Gupta)
[11:45:28] cloud-services-team, Toolforge, Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10500113 (Andrew) Got a response from Flickr support today: ` Hi 👋, Thank you for reaching out. My name is Tara - happy to help you today. I...
[11:46:47] FIRING: NodeDown: Node cloudcephosd1021 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[11:53:07] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10500134 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host clo...
[12:11:32] FIRING: ToolsNfsAlmostFull: Toolforge NFS is 0.8528143474493756/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[12:50:42] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10500383 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudgw...
[13:02:17] cloud-services-team: KernelErrors Server cloudcephosd1013 logged kernel errors - https://phabricator.wikimedia.org/T384850#10500438 (fnegri) Three reboots in the past 24h, which caused some `priority=crit` errors that seem innocuous: ` fnegri@cloudcephosd1013:~$ sudo journalctl _TRANSPORT=kernel --since -2...
[13:02:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T348643)
[13:02:59] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97) (T348643)
[13:03:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T348643)
[13:04:02] cloud-services-team: KernelErrors Server cloudcephosd1013 logged kernel errors - https://phabricator.wikimedia.org/T384850#10500443 (fnegri) Open→Resolved a:fnegri All reboots were manual reboots by @Andrew reimaging the host.
[13:04:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T348643)
[13:04:58] cloud-services-team: KernelError Server cloudcephosd1019 may have kernel errors - https://phabricator.wikimedia.org/T384054#10500447 (fnegri) Open→Resolved a:fnegri This has not reoccurred, resolving for now.
[13:22:42] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: replace cloudgw100[12] with spare 'second region' dev servers cloudnet100[78]-dev - https://phabricator.wikimedia.org/T382356#10500492 (aborrero)
[13:33:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:38:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:41:47] FIRING: NodeDown: Node cloudcephosd1021 has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[13:41:55] cloud-services-team: NodeDown Node cloudcephosd1021 has been down for long. - https://phabricator.wikimedia.org/T384930 (phaultfinder) NEW
[14:04:23] cloud-services-team: NodeDown Node cloudcephosd1021 has been down for long. - https://phabricator.wikimedia.org/T384930#10500635 (fnegri) Open→Resolved a:fnegri Node was drained in https://phabricator.wikimedia.org/T348643#10497103
[14:14:31] FIRING: [2x] KernelErrors: Server cloudgw1003 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudgw1003 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[14:14:41] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T384932 (phaultfinder) NEW
[14:16:22] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10500680 (cmooney) >>! In T380893#10396432, @Andrew wrote: > These hosts have a somewhat unusual vlan setup, so my guess is something i...
[14:17:17] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10500684 (cmooney) a:cmooney→None
[14:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:36:28] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10500778 (Andrew) Thanks @cmooney ! @VRiley-WMF, you can give this another try at your convenience.
[14:38:57] (PS1) FNegri: Add missing WMCS repos [labs/codesearch] - https://gerrit.wikimedia.org/r/1114742
[15:01:14] Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10500900 (M-J) >>! In T384842#10499696, @Mr_Tortue wrote: > If needed, I can either help with the code or just operate it for frwiki. However, like most wikis, the commu...
[15:55:57] FIRING: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[16:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:30:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:32:16] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: replace cloudgw100[12] with spare 'second region' dev servers cloudnet100[78]-dev - https://phabricator.wikimedia.org/T382356#10501284 (aborrero) reminder: verify VLAN trunk on the NIC of the cloudgw servers.
[16:35:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:40:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:43:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T348643)
[16:43:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0) (T348643)
[16:44:31] FIRING: [2x] NodeDown: Node cloudcephosd1022 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[17:19:08] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T384932#10501453 (fnegri) Open→Resolved a:fnegri Host was reimaged in {T382412}, errors are expected. ` Jan 28 12:07:30 cloudgw1003 kernel: x86/cpu: VMX (outside TXT) disabled by BIO...
[17:19:25] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10501461 (Jhancock.wm) Open→Resolved
[17:19:40] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10501464 (Jhancock.wm)
[17:47:07] RESOLVED: NodeDown: Node cloudcephosd1021 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[17:47:07] FIRING: KernelErrors: Server cloudcephosd1021 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[17:47:12] cloud-services-team: KernelErrors Server cloudcephosd1021 logged kernel errors - https://phabricator.wikimedia.org/T384949 (phaultfinder) NEW
[17:52:32] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10501556 (Papaul) replaced 1002 with {F58301781}
[17:54:31] RESOLVED: NodeDown: Node cloudcephosd1021 has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[17:57:17] cloud-services-team: KernelErrors Server cloudcephosd1021 logged kernel errors - https://phabricator.wikimedia.org/T384949#10501581 (fnegri) Open→Resolved a:fnegri Host was restarted in {T348643} ` fnegri@cloudcephosd1021:~$ sudo journalctl -k -perr --boot all -- Journal begins at Mon 2024-1...
[18:01:04] cloud-services-team, Cloud-VPS, SRE Observability (FY2024/2025-Q3): Remove librenms -> graphite integration, replace with gnmi - https://phabricator.wikimedia.org/T372457#10501597 (cmooney) @dcaro hey just a heads up. I've been doing some work on the gnmic config to try and add extra stats, and impr...
[18:08:22] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10501630 (Papaul) Open→Resolved a:Papaul This is complete
[18:17:07] FIRING: KernelErrors: Server cloudcephosd1022 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1022 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[18:17:16] cloud-services-team: KernelErrors Server cloudcephosd1022 logged kernel errors - https://phabricator.wikimedia.org/T384953 (phaultfinder) NEW
[18:25:47] FIRING: NodeDown: Node cloudcephosd1023 has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1023 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[18:25:52] cloud-services-team: NodeDown Node cloudcephosd1023 has been down for long. - https://phabricator.wikimedia.org/T384955 (phaultfinder) NEW
[18:39:27] RESOLVED: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[18:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:09:56] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10501808 (gmodena) |**Wikitech account/LDAP:**| gmodena| |**SUL account**| GModena (WMF)| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |Y| |**I have visited [[ https://wiki...
[19:13:33] Striker: [toolsadmin] Striker cannot create Developer accounts with names matching existing SUL accounts - https://phabricator.wikimedia.org/T380384#10501823 (bd808) `lang=irc [17:50] < dancy> bd808: I'm attempting to create a new toolforge tool named "prototyper". I don't see any hit for this name at h...
[19:15:42] Tool-wikiqanda, Future-Audiences, Design, Epic: [Epic] Non-Q&A use-cases - https://phabricator.wikimedia.org/T378125#10501836 (DLin-WMF) Open→Resolved
[19:25:38] Striker: [toolsadmin] Striker cannot create Developer accounts or tools with names matching existing SUL accounts - https://phabricator.wikimedia.org/T380384#10501907 (bd808)
[19:27:39] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10501910 (Reedy) >>! In T376267#10501808, @gmodena wrote: > |**Wikitech account/LDAP:**| gmodena| > |**SUL account**| GModena (WMF)| > |**Account linked on [[ https://idm.wikimedia.org/ | I...
[20:02:07] FIRING: [2x] KernelErrors: Server cloudcephosd1022 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[20:02:11] cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T384968 (phaultfinder) NEW
[20:05:47] RESOLVED: NodeDown: Node cloudcephosd1023 has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1023 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[20:31:47] FIRING: NodeDown: Node cloudcephosd1021 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[20:37:07] FIRING: KernelErrors: Server cloudcephosd1021 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1021 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[20:48:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:49:47] FIRING: NodeDown: Node cloudcephosd1023 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1023 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[20:50:05] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[20:51:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[20:54:47] FIRING: [2x] NodeDown: Node cloudcephosd1022 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[20:58:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:59:47] RESOLVED: NodeDown: Node cloudcephosd1023 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1023 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[21:03:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[21:04:01] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[21:04:14] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[21:04:49] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[21:14:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[22:29:31] RESOLVED: [2x] KernelErrors: Server cloudcephosd1013 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1013 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[23:05:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-harbor-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[23:24:57] FIRING: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[23:25:28] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-harbor-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[23:56:05] (approved) raymond-ndibe: [jobs-cli] support http healthcheck for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T362621)
[23:56:12] (merge) raymond-ndibe: [jobs-cli] support http healthcheck for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/81 (https://phabricator.wikimedia.org/T362621)