[00:02:09] <wikibugs>	 PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T371944#10047096 (LibUp-bot)
[00:02:11] <wikibugs>	 Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T370115#10047098 (LibUp-bot) A new upstream version of Pywikibot is now available: 9.3.1. * https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Pywikibot_image * https://gerrit.wikimedia.org/g/pywikibot/core/+/refs/tags/...
[00:16:29] <wmcs-alerts>	 FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:21:29] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:18:29] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878)
[01:18:35] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[01:20:57] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10047217 (Andrew)
[02:13:24] <wikibugs>	 (update) raymond-ndibe: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46
[02:19:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[02:35:09] <jinxer-wm>	 FIRING: CephSlowOps: Ceph cluster in eqiad has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[02:35:16] <wikibugs>	 cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T370752#10047266 (phaultfinder)
[02:45:31] <wmcs-alerts>	 FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[02:50:09] <jinxer-wm>	 RESOLVED: CephSlowOps: Ceph cluster in eqiad has 15 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[02:50:31] <wmcs-alerts>	 RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[02:53:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit clean_puppet_client_bucket.service is in failed status on host cloudcephosd1038. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1038 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:05:14] <wikibugs>	 (update) raymond-ndibe: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46
[03:08:55] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[03:09:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[03:09:01] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[03:10:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[03:20:53] <wikibugs>	 (update) raymond-ndibe: [builds-cli] remove _display_messages [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/69
[03:41:28] <wikibugs>	 (open) raymond-ndibe: [jobs-cli] remove _display_messages [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/62
[03:41:36] <wikibugs>	 (update) raymond-ndibe: [jobs-cli] remove _display_messages [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/62
[04:18:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The systemd unit clean_puppet_client_bucket.service on node cloudcephosd1035 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:19:07] <wikibugs>	 cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T370383#10047286 (phaultfinder)
[04:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:23:55] <wikibugs>	 (open) raymond-ndibe: [envvars-cli] remove display_messages [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/57
[04:26:31] <wikibugs>	 (update) raymond-ndibe: [envvars-cli] remove display_messages [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/57
[04:27:12] <wikibugs>	 (update) raymond-ndibe: [envvars-cli] remove display_messages [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/57
[04:30:33] <wikibugs>	 Toolforge (Toolforge iteration 14): [jobs-api] move jobs load feature to the backend - https://phabricator.wikimedia.org/T366209#10047294 (Raymond_Ndibe) In progress→Resolved
[04:30:38] <wikibugs>	 Toolforge (Toolforge iteration 14): [jobs-cli] enforce proper validation for load jobs before calculate_changes - https://phabricator.wikimedia.org/T366211#10047291 (Raymond_Ndibe) In progress→Resolved
[04:31:59] <wikibugs>	 Toolforge (Toolforge iteration 14): envvars-api 0.0.50 depends on unreleased envvars-cli changes - https://phabricator.wikimedia.org/T367961#10047299 (Raymond_Ndibe) In progress→Resolved
[04:32:09] <wikibugs>	 Toolforge (Toolforge iteration 14): [toolforge-weld] support back python 3.7 - https://phabricator.wikimedia.org/T370932#10047297 (Raymond_Ndibe) In progress→Resolved
[04:37:45] <wikibugs>	 Toolforge: toolforge jobs load flushes out all jobs - https://phabricator.wikimedia.org/T364204#10047300 (Raymond_Ndibe) @Multichill this issue has been fixed. closing now. you can re-open if you notice something similar again
[04:37:53] <wikibugs>	 Toolforge: toolforge jobs load flushes out all jobs - https://phabricator.wikimedia.org/T364204#10047301 (Raymond_Ndibe) Open→Resolved
[04:45:01] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: [jobs-api] Save business models in a DB - https://phabricator.wikimedia.org/T359650#10047302 (Raymond_Ndibe)
[04:48:56] <jinxer-wm>	 FIRING: [3x] SystemdUnitDown: The systemd unit clean_puppet_client_bucket.service on node cloudcephosd1035 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:49:08] <wikibugs>	 cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T370383#10047306 (phaultfinder)
[04:59:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:06:22] <wikibugs>	 Toolforge (Toolforge iteration 14): Support HTTP health checks in jobs framework - https://phabricator.wikimedia.org/T362621#10047309 (Raymond_Ndibe)
[05:06:25] <wikibugs>	 cloud-services-team, Toolforge: toolforge: integrate fourohfour as a custom component, rather than a normal tool - https://phabricator.wikimedia.org/T369364#10047311 (Raymond_Ndibe)
[05:12:59] <wikibugs>	 Toolforge (Toolforge iteration 14): [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#10047316 (Raymond_Ndibe)
[05:18:42] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Goal: [harbor] Create backups and/or replication - https://phabricator.wikimedia.org/T336668#10047317 (Raymond_Ndibe)
[05:18:49] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Toolforge, Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#10047318 (Raymond_Ndibe)
[05:20:24] <wikibugs>	 Toolforge (Toolforge iteration 14): [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#10047319 (Raymond_Ndibe) a:Raymond_Ndibe
[05:21:35] <wikibugs>	 Toolforge (Toolforge iteration 14): [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#10047321 (Raymond_Ndibe)
[05:25:06] <wikibugs>	 Toolforge: [toolforge,jobs] "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775#10047324 (Raymond_Ndibe) a:Raymond_Ndibe
[05:25:27] <wikibugs>	 Toolforge: [maintain-harbor] Move to become a toolforge component - https://phabricator.wikimedia.org/T358225#10047322 (Raymond_Ndibe) a:Raymond_Ndibe
[05:29:38] <wikibugs>	 Toolforge: [jobs-api] when running a command with wrong quoting, no logs nor useful feedback is given to the user - https://phabricator.wikimedia.org/T356267#10047325 (Raymond_Ndibe)
[05:31:22] <wikibugs>	 Toolforge: [jobs-cli] Add a new output format for toolforge jobs list command which returns the input command for scheduled jobs - https://phabricator.wikimedia.org/T356581#10047326 (Raymond_Ndibe)
[05:32:07] <wikibugs>	 Toolforge: [jobs-api] Periodically refresh image-config data - https://phabricator.wikimedia.org/T357112#10047327 (Raymond_Ndibe)
[05:33:43] <wikibugs>	 Toolforge: Expose Toolforge service names via environment variables - https://phabricator.wikimedia.org/T151002#10047328 (Raymond_Ndibe)
[06:39:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878)
[06:39:25] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[07:03:56] <wikibugs>	 cloud-services-team, Beta-Cluster-Infrastructure, Bitu, CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10047378 (SLyngshede-WMF) ` 2024-08-06 20:10:04,149 WARN [org.apereo.cas.util.function.FunctionUtils] - <Found 2 DNs for [...
[07:06:21] <wikibugs>	 cloud-services-team, Beta-Cluster-Infrastructure, Bitu, CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10047381 (SLyngshede-WMF) CAS uses the following to lookup the user:  ` cas.authn.ldap[0].basedn=dc=wikimedia,dc=org cas.a...
[07:15:50] <wikibugs>	 cloud-services-team, Beta-Cluster-Infrastructure, Bitu, CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10047401 (SLyngshede-WMF) While I don't have the password, I've tested authenticating as jenkin-deploy on idp-test2004, an...
[07:16:01] <icinga-wm>	 PROBLEM - Host cloudcephosd1035 is DOWN: PING CRITICAL - Packet loss = 100%
[07:18:29] <icinga-wm>	 RECOVERY - Host cloudcephosd1035 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[07:18:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit clean_puppet_client_bucket.service is in failed status on host cloudcephosd1035. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1035 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:23:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit clean_puppet_client_bucket.service is in failed status on host cloudcephosd1035. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1035 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:38:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit ifup@eno12409np1.service is in failed status on host cloudcephosd1035. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1035 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:40:01] <icinga-wm>	 PROBLEM - Host cloudcephosd1035 is DOWN: PING CRITICAL - Packet loss = 100%
[07:43:47] <jinxer-wm>	 FIRING: NodeDown: The node cloudcephosd1035 is unreachable. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1035 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[07:52:35] <icinga-wm>	 RECOVERY - Host cloudcephosd1035 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[07:53:47] <jinxer-wm>	 RESOLVED: NodeDown: The node cloudcephosd1035 is unreachable. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1035 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[08:11:10] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T371878)
[08:11:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:11:16] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[08:18:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[08:26:46] <wm-bot2>	 !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T371878)
[08:26:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:26:52] <stashbot>	 T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[08:27:17] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.wait_for_rebalance
[08:27:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:29:09] <jinxer-wm>	 FIRING: CephSlowOps: Ceph cluster in eqiad has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[08:30:31] <wikibugs>	 (CR) David Caro: [C:+2] WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920 (owner: David Caro)
[08:33:29] <wmcs-alerts>	 FIRING: InstanceDown: Project gitlab-runners instance runner-1025 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:33:50] <wmcs-alerts>	 FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:34:13] <icinga-wm>	 PROBLEM - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 324 bytes in 60.015 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[08:37:28] <wmcs-alerts>	 FIRING: InstanceDown: Project cloudinfra instance cloud-cumin-04 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:40:31] <wmcs-alerts>	 FIRING: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of -1 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[08:40:33] <wikibugs>	 Cloud-Services: PetScan not responding - https://phabricator.wikimedia.org/T371955 (Magnus) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task....
[08:41:20] <wikibugs>	 Cloud-Services: PetScan not responding - https://phabricator.wikimedia.org/T371955#10047546 (Magnus) p:Triage→Unbreak!
[08:43:29] <wmcs-alerts>	 RESOLVED: InstanceDown: Project gitlab-runners instance runner-1025 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:44:05] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10047553 (dcaro) cloudcephosd1035 has one drive that wrongly assigned as 'os raid':  ` sdb...
[08:44:39] <wikibugs>	 (Merged) jenkins-bot: WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920 (owner: David Caro)
[08:45:17] <wikibugs>	 cloud-services-team, Beta-Cluster-Infrastructure, Bitu, CAS-SSO, and 2 others: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#10047550 (SLyngshede-WMF) a:SLyngshede-WMF
[08:51:07] <icinga-wm>	 RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 21.072 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[08:51:11] <wikibugs>	 cloud-services-team, Beta-Cluster-Infrastructure, Bitu, CAS-SSO, and 2 others: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#10047573 (SLyngshede-WMF) p:Triage→Medium We've tested modifying the basedn on test and @hashar confirms that login is now working.
[08:51:55] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[08:53:16] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[08:55:39] <jinxer-wm>	 RESOLVED: CephSlowOps: Ceph cluster in eqiad has 14 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[08:56:46] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-22
[09:00:37] <wikibugs>	 Cloud-Services: PetScan not responding - https://phabricator.wikimedia.org/T371955#10047587 (Magnus) Open→Resolved a:Magnus Works again
[09:00:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[09:00:58] <wmcs-alerts>	 RESOLVED: InstanceDown: Project cloudinfra instance cloud-cumin-04 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:03:05] <wikibugs>	 (PS1) David Caro: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395
[09:05:20] <wmcs-alerts>	 RESOLVED: [5x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:06:26] <wikibugs>	 (CR) CI reject: [V:-1] bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[09:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:29:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:31:42] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-22
[09:34:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-19 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:36:28] <wmcs-alerts>	 FIRING: PuppetAgentFailure: Puppet agent failure detected on instance tools-sgebastion-10 in project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:36:47] <wikibugs>	 Tools: PetScan not responding - https://phabricator.wikimedia.org/T371955#10047630 (Aklapper)
[09:39:03] <wmcs-alerts>	 FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-19 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:49:03] <wmcs-alerts>	 FIRING: [5x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-11 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:59:03] <wmcs-alerts>	 RESOLVED: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-11 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:10:42] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-sgebastion-10
[10:11:00] <wm-bot2>	 !log dcaro@urcuchillay tools END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-sgebastion-10
[10:11:42] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[10:12:02] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[10:12:50] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9
[10:21:28] <wmcs-alerts>	 RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance tools-sgebastion-10 in project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[10:24:20] <wikibugs>	 toolforge_i18n, Tools, I18n, Wikimania-Hackathon-2024: Extract Python library for Wikimedia tool i18n from Wikidata Lexeme Forms tool - https://phabricator.wikimedia.org/T283376#10047729 (LucasWerkmeister) The library is making good progress; the biggest TODO left is better documentation, which I...
[10:30:27] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9
[11:19:49] <wm-bot2>	 !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.wait_for_rebalance (exit_code=0)
[12:06:29] <wikibugs>	 (CR) FNegri: [C:+1] "LGTM!" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925 (owner: David Caro)
[13:08:23] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10048135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dcaro@cumin1002 for host cloudcephosd1037.eqi...
[13:16:05] <wikibugs>	 PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T371944#10048149 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/444
[13:16:13] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/paws/pull/444
[13:17:24] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-43
[13:17:25] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[13:27:28] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16), Patch-For-Review: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster depre... - https://phabricator.wikimedia.org/T370465#10048205
[13:34:06] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16), Patch-For-Review: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster depr... - https://phabricator.wikimedia.org/T370465#10048248
[13:34:12] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#10048252 (BTullis)
[13:35:47] <wikibugs>	 (PS2) David Caro: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395
[13:36:45] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-eventlog08 with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369918#10048254 (BTullis) @Ottomata - Do you think we could delete this host, rather than replace it? Or do we sti...
[13:47:13] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10048325 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dcaro@cumin1002 for host cloudcephosd1037.eqiad.w...
[13:48:44] <wikibugs>	 (PS1) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[13:51:36] <wikibugs>	 (CR) CI reject: [V:-1] bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[13:55:10] <wikibugs>	 (CR) David Caro: "recheck" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[14:02:00] <wikibugs>	 cloud-services-team, DNS: Move some of wikimediacloud.org 185.15.56.0/23 to Netbox - https://phabricator.wikimedia.org/T268621#10048375 (ayounsi)
[14:04:19] <wikibugs>	 (PS1) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[14:04:35] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[14:05:15] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-43
[14:05:16] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[14:07:39] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T363344)
[14:07:40] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[14:07:41] <stashbot>	 T363344: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344
[14:13:30] <wikibugs>	 (PS3) David Caro: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395
[14:13:30] <wikibugs>	 (PS2) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[14:13:30] <wikibugs>	 (PS2) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[14:13:30] <wikibugs>	 (PS1) David Caro: tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453
[14:13:54] <wikibugs>	 (CR) CI reject: [V:-1] bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[14:13:56] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[14:14:00] <wikibugs>	 (CR) CI reject: [V:-1] ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[14:14:01] <wikibugs>	 (CR) CI reject: [V:-1] tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (owner: David Caro)
[14:14:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:15:36] <wikibugs>	 (PS2) David Caro: tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453
[14:15:36] <wikibugs>	 (PS4) David Caro: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395
[14:15:36] <wikibugs>	 (PS3) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[14:15:36] <wikibugs>	 (PS3) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[14:17:41] <wikibugs>	 (CR) CI reject: [V:-1] ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[14:18:19] <wikibugs>	 (PS3) David Caro: tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (https://phabricator.wikimedia.org/T354410)
[14:18:21] <wikibugs>	 (PS5) David Caro: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395
[14:18:21] <wikibugs>	 (PS4) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[14:18:21] <wikibugs>	 (PS4) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[14:18:30] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T363344)
[14:18:32] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[14:18:33] <stashbot>	 T363344: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344
[14:18:53] <wikibugs>	 (CR) Andrew Bogott: [C:+1] "lgtm" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (https://phabricator.wikimedia.org/T354410) (owner: David Caro)
[14:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:27:23] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16), Patch-For-Review: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster depre... - https://phabricator.wikimedia.org/T370465#10048432
[14:28:12] <wikibugs>	 (CR) CI reject: [V:-1] ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[14:28:49] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[14:29:02] <wikibugs>	 (CR) CI reject: [V:-1] tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (https://phabricator.wikimedia.org/T354410) (owner: David Caro)
[14:29:29] <wikibugs>	 (CR) CI reject: [V:-1] bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[14:29:49] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16), Patch-For-Review: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster depre... - https://phabricator.wikimedia.org/T370465#10048438
[14:36:01] <wikibugs>	 Toolforge-standards-committee: Adoption request for vrb (VimeoReviewBot) - https://phabricator.wikimedia.org/T338556#10048449 (Novem_Linguae)
[14:37:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:42:43] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-eventlog08 with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369918#10048472 (Ottomata) WE ARE SO CLOSE TO DELETING.  I had hoped to be done already but keep encountering anno...
[14:43:05] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-eventlog08 with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369918#10048473 (Ottomata) Oh! wait this is in beta!  Yes, we can delete this.  There should be no use for this in...
[14:45:40] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16), Patch-For-Review: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster depre... - https://phabricator.wikimedia.org/T370465#10048492
[14:48:19] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-eventlog08 with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369918#10048500 (BTullis) a:BTullis >>! In T369918#10048473, @Ottomata wrote: > Oh! wait this is in beta! >  >...
[14:51:02] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#10048518 (BTullis)
[14:52:29] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-eventlog08 with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369918#10048510 (BTullis) Open→Resolved Oh it looks like it was already shut down. Not sure when that...
[15:00:33] <wikibugs>	 (PS4) David Caro: tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (https://phabricator.wikimedia.org/T354410)
[15:00:33] <wikibugs>	 (PS6) David Caro: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395
[15:00:33] <wikibugs>	 (PS5) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[15:00:33] <wikibugs>	 (PS5) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[15:02:11] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:02:13] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[15:02:25] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:02:25] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[15:03:59] <wikibugs>	 (CR) CI reject: [V:-1] ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[15:04:17] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:04:17] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[15:06:44] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:06:45] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[15:11:10] <wikibugs>	 Cloud-VPS: PetScan not responding - https://phabricator.wikimedia.org/T371955#10048592 (JJMC89)
[15:12:26] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.wait_for_rebalance
[15:12:27] <stashbot>	 wmbot~dcaro@urcuchillay: Failed to log message to wiki. Somebody should check the error logs.
[15:13:52] <wikibugs>	 (CR) David Caro: [C:+2] tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (https://phabricator.wikimedia.org/T354410) (owner: David Caro)
[15:14:55] <wikibugs>	 (PS6) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[15:14:55] <wikibugs>	 (PS6) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[15:15:31] <wikibugs>	 (CR) David Caro: [C:+2] bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[15:18:01] <wikibugs>	 (Merged) jenkins-bot: tox: skip py312 as spicerack does not support it yet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060453 (https://phabricator.wikimedia.org/T354410) (owner: David Caro)
[15:18:11] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16), Patch-For-Review: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster depr... - https://phabricator.wikimedia.org/T370465#10048607
[15:19:05] <wikibugs>	 (CR) CI reject: [V:-1] ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[15:19:14] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[15:19:57] <wikibugs>	 (Merged) jenkins-bot: bootstrap_and_add: ask only once for device destroy ack [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060395 (owner: David Caro)
[15:21:30] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10048620 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudcephosd1038.eq...
[15:23:40] <wikibugs>	 (PS7) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[15:23:41] <wikibugs>	 (PS7) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[15:30:01] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[15:30:03] <stashbot>	 dcaro@cloudcumin1001: Failed to log message to wiki. Somebody should check the error logs.
[15:36:24] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10048674 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudcephosd1038.eqiad....
[15:37:19] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10048675 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudcephosd1038.eq...
[15:39:36] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=97)
[15:39:37] <stashbot>	 dcaro@cloudcumin1001: Failed to log message to wiki. Somebody should check the error logs.
[15:41:06] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[15:41:15] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[15:55:49] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 4 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10048714 (bd808)
[16:01:21] <wikibugs>	 (PS8) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[16:02:07] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:02:12] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:05:37] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[16:15:48] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10048836 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudcephosd1038.eqiad....
[16:23:17] <wikibugs>	 (PS9) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[16:23:17] <wikibugs>	 (PS1) David Caro: ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469
[16:23:17] <wikibugs>	 (PS1) David Caro: undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470
[16:24:57] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:25:02] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:27:43] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[16:27:43] <wikibugs>	 (CR) CI reject: [V:-1] undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470 (owner: David Caro)
[16:27:59] <wikibugs>	 (CR) CI reject: [V:-1] ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469 (owner: David Caro)
[16:28:27] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:28:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:28:46] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:28:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:29:24] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 4 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10048887 (brennen) > Change #1060468 had a related patch set up...
[16:30:43] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:30:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:31:02] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:31:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:31:54] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:31:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:32:18] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:32:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:33:44] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:33:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:34:01] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:34:25] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:34:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:35:33] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:35:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:36:39] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:36:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:38:12] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:38:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:38:15] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:38:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:38:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:39:45] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:39:48] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:40:46] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:40:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:43:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:45:12] <wikibugs>	 PAWS: Non-notebook files don't redirect to paws-public when URL is changed - https://phabricator.wikimedia.org/T143459#10048969 (Pppery)
[16:45:19] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_node
[16:45:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:46:17] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[16:46:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:49:27] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[16:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:50:17] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[16:50:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:50:24] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_node
[16:50:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:51:09] <wikibugs>	 (PS10) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[16:51:09] <wikibugs>	 (PS2) David Caro: ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469
[16:51:09] <wikibugs>	 (PS2) David Caro: undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470
[16:51:09] <wikibugs>	 (PS1) David Caro: ceph.drain/undrain_node: allow filtering by osd-id [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060473
[16:53:56] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Data-Services, DBA: Prepare and check storage layer for btmwiki - https://phabricator.wikimedia.org/T368066#10048980 (BTullis) We are still experiencing a failure relating to `btmwiki` at the beginning of each month. It is something to do with the grants o...
[16:55:02] <wikibugs>	 (CR) CI reject: [V:-1] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[16:55:13] <wikibugs>	 (CR) CI reject: [V:-1] ceph.drain/undrain_node: allow filtering by osd-id [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060473 (owner: David Caro)
[16:55:14] <wikibugs>	 (CR) CI reject: [V:-1] undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470 (owner: David Caro)
[16:55:23] <wikibugs>	 (CR) CI reject: [V:-1] ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469 (owner: David Caro)
[16:56:13] <wikibugs>	 (PS11) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[16:56:13] <wikibugs>	 (PS3) David Caro: ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469
[16:56:13] <wikibugs>	 (PS3) David Caro: undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470
[16:56:13] <wikibugs>	 (PS2) David Caro: ceph.drain/undrain_*: allow filtering by osd-id [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060473
[16:56:33] <wm-bot2>	 !log dcaro@urcuchillay admin END (ERROR) - Cookbook wmcs.ceph.wait_for_rebalance (exit_code=97)
[16:56:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:00:25] <wikibugs>	 VPS-project-Codesearch: Index https://gitlab.wikimedia.org/toolforge-repos/ repos - https://phabricator.wikimedia.org/T371992 (bd808) NEW
[17:02:17] <wikibugs>	 (PS2) David Caro: ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060173
[17:02:19] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[17:02:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:02:26] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:02:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:03:08] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[17:03:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:03:16] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:03:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:03:32] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[17:03:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:04:45] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:04:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:05:37] <wm-bot2>	 !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=99)
[17:05:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:05:41] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:06:29] <wm-bot2>	 !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[17:06:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:06:40] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:06:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:07:11] <wm-bot2>	 !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[17:07:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:07:19] <wikibugs>	 (PS3) David Caro: ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060173
[17:07:20] <wikibugs>	 VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993 (bd808) NEW
[17:10:44] <wikibugs>	 (CR) David Caro: [C:+2] openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925 (owner: David Caro)
[17:11:55] <wikibugs>	 VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993#10049054 (bd808) Determining what to index for this feels like an open question. There are lists of clients at https://www.mediawiki.org/wiki/API:Client_code and https://www.mediawiki.org/wiki/A...
[17:14:39] <wikibugs>	 (Merged) jenkins-bot: openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925 (owner: David Caro)
[17:19:33] <wm-bot2>	 !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.wait_for_rebalance
[17:19:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:53:03] <wm-bot2>	 !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.wait_for_rebalance (exit_code=0)
[17:53:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:54:34] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10049148 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudcephosd1038.eq...
[17:56:11] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:57:08] <wikibugs>	 VPS-project-Codesearch: Index https://gitlab.wikimedia.org/toolforge-repos/ repos - https://phabricator.wikimedia.org/T371992#10049153 (Bugreporter) Per {T268196} I think we should index all primary (non-fork) GitLab repos instead, since GitLab CE does not have any global search feature.
[18:01:54] <wikibugs>	 VPS-project-Codesearch: Index https://gitlab.wikimedia.org/toolforge-repos/ repos - https://phabricator.wikimedia.org/T371992#10049161 (Dzahn) This looks like an example change where a repo was added to codesearch in the past:  https://gerrit.wikimedia.org/r/c/labs/codesearch/+/414060
[18:03:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[18:13:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[18:17:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[18:17:40] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[18:19:30] <wikibugs>	 Tool-Pageviews: pageviews tool doesn't work in several newer wikis - https://phabricator.wikimedia.org/T371997 (Amire80) NEW
[18:19:56] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:24:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[18:32:16] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10049245 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudcephosd1038.eqiad....
[18:33:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[18:34:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:43:22] <wikibugs>	 VPS-project-Codesearch: Index https://gitlab.wikimedia.org/toolforge-repos/ repos - https://phabricator.wikimedia.org/T371992#10049412 (bd808) I found that https://gerrit.wikimedia.org/r/plugins/gitiles/labs/codesearch/+/fae2553e35f901c6f678cb5b696681a35df1cd50/write_config.py#469 is configuring codesearch t...
[19:54:05] <wikibugs>	 (PS1) BryanDavis: config: Index https://gitlab.wikimedia.org/toolforge-repos/* [labs/codesearch] - https://gerrit.wikimedia.org/r/1060493 (https://phabricator.wikimedia.org/T371992)
[19:56:18] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[19:59:06] <icinga-wm>	 PROBLEM - Host cloudcephosd1037 is DOWN: PING CRITICAL - Packet loss = 100%
[20:01:38] <icinga-wm>	 ACKNOWLEDGEMENT - SSH on cloudcephosd1037 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Andrew Bogott rebooting from a non-icinga-enabled cookbook https://wikitech.wikimedia.org/wiki/SSH/monitoring
[20:01:38] <icinga-wm>	 ACKNOWLEDGEMENT - Host cloudcephosd1037 is DOWN: PING CRITICAL - Packet loss = 100% Andrew Bogott rebooting from a non-icinga-enabled cookbook
[20:03:02] <icinga-wm>	 RECOVERY - Host cloudcephosd1037 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[20:03:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:06:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[20:07:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[20:07:57] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 4 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049480 (brennen)
[20:08:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:08:32] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 4 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049505 (brennen) Removing as train blocker for .17, leaving o...
[20:10:57] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[20:11:08] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=97)
[20:11:53] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[20:15:05] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049550 (bd808) >>! In T371977#10048706, @bd808 wrote: >  the...
[20:17:24] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049556 (LucasWerkmeister) >>! In T371977#10048887, @brennen w...
[20:18:10] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049558 (LucasWerkmeister) p:Unbreak!→Triage
[20:29:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:00:12] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049711 (LucasWerkmeister) >>! In T371977#10049550, @bd808 wro...
[21:05:18] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049722 (bd808) >>! In T371977#10049711, @LucasWerkmeister wro...
[21:05:56] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph,network] Intermittent network packets lost - https://phabricator.wikimedia.org/T371869#10049717 (Dzahn) Also see: T371879#10049699  Something created a large traffic spike between cloudsw1-d5 and cloudsw1-f4 today.
[21:49:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:59:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:39:32] <wikibugs>	 Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10049882 (Krinkle) "I told you so".   I specifically amended ht...
[23:23:22] <wikibugs>	 (open) raymond-ndibe: Draft: [maintain-kubeusers] increment default quota for pods, cpu, mem [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/58 (https://phabricator.wikimedia.org/T341066)
[23:24:18] <wikibugs>	 (open) raymond-ndibe: Draft: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[23:24:47] <wikibugs>	 (update) raymond-ndibe: Draft: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[23:25:09] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-kubeusers] increment default quota for pods, cpu, mem [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/58 (https://phabricator.wikimedia.org/T341066)