[00:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [00:06:13] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10381880 (10jeena) Thank you! [00:13:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [00:14:55] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10381891 (10Jclark-ctr) Followed up with Dell. 
can you confirm that i can power down server again tomorrow to inspect memory @aborrero [00:18:06] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [00:23:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [00:25:52] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10381908 (10Jclark-ctr) a:05Jhancock.wm→03Jclark-ctr I did notice it looks like memory is missing from inventory report looks like slots... [02:06:56] FIRING: SystemdUnitDown: The service unit remove_dangling_cinder_snapshots.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:57:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.535% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:01:56] FIRING: SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:06:56] FIRING: [2x] SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:12:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.986% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:15:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.693% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:20:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.844% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:25:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.781% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:30:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.781% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - 
https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:36:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.328% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:41:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.669% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:46:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.442% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:51:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.761% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [03:55:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.681% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:00:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.681% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:01:56] FIRING: 
SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:02:02] 06cloud-services-team: SystemdUnitDown The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T381545 (10phaultfinder) 03NEW [04:04:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.459% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:09:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.803% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:14:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.321% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:19:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.987% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:39:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.558% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - 
https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:44:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.771% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:49:52] 06cloud-services-team, 10Cloud-VPS: 'backy2 cleanup' fails on backy2 cleanup - https://phabricator.wikimedia.org/T381548 (10Andrew) 03NEW [04:51:56] RESOLVED: [2x] SystemdUnitDown: The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:56:56] RESOLVED: SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [07:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [07:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:57:31] (03close) 10abartov: Load Bootstrap from Toolforge CDNjs [toolforge-repos/luthor] - 10https://gitlab.wikimedia.org/toolforge-repos/luthor/-/merge_requests/1 (https://phabricator.wikimedia.org/T368833) (owner: 10lucaswerkmeister) [08:58:44] (03merge) 10abartov: Remove
tags [toolforge-repos/luthor] - 10https://gitlab.wikimedia.org/toolforge-repos/luthor/-/merge_requests/2 (owner: 10amire80) [09:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [09:04:35] 10Tool-lexeme-forms, 06translatewiki.net, 10LPL Essential (LPL Essential 2024 Nov-Dec), 13Patch-For-Review: translatewiki export for Wikidata Lexeme Forms tries to remove sh-latn translations - https://phabricator.wikimedia.org/T379188#10382533 (10Nikerabbit) [09:11:18] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10382542 [09:13:15] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS (Quota-requests): Subdomain for catalyst-dev project - https://phabricator.wikimedia.org/T381508#10382549 (10fnegri) Thanks @bd808! [09:29:42] 10Tool-schedule-deployment: Allow scheduling for current backport window - https://phabricator.wikimedia.org/T381237#10382626 (10kostajh) >>! In T381237#10372235, @bd808 wrote: > How many minutes of slack in adding content to a deployment window are reasonable? IMO if the submission happens at any time during t... [11:19:59] 10Tools: [ErinnerMichBot] Query current page title before posting reminder - https://phabricator.wikimedia.org/T381563 (10Tkarcher) 03NEW [13:24:02] 06cloud-services-team, 10Cloud-VPS: 'backy2 cleanup' fails on backy2 cleanup - https://phabricator.wikimedia.org/T381548#10383240 (10Andrew) It completed. 
I've re-enabled puppet and restored the settings for now, which probably means it will break again unless there was something unlucky about the specific bl... [13:40:57] (03update) 10sstefanova: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/55 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [13:44:11] (03update) 10sstefanova: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/55 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [13:44:14] (03approved) 10sstefanova: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/55 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [13:44:18] (03merge) 10sstefanova: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/55 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [13:44:48] (03update) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/54 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [13:46:54] (03open) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 [13:49:11] (03update) 10sstefanova: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [13:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [13:50:54] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer [13:57:48] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer [13:59:18] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer [14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:06:37] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer [14:08:42] (03update) 10sstefanova: jobs-emailer: bump to 0.0.45-20241204192251-ef3470d9 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/640 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:08:43] (03approved) 10sstefanova: jobs-emailer: bump to 0.0.45-20241204192251-ef3470d9 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/640 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:08:46] (03merge) 10sstefanova: jobs-emailer: bump to 0.0.45-20241204192251-ef3470d9 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/640 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:09:49] !log sstefanova@cloudcumin1001 toolsbeta START 
- Cookbook wmcs.toolforge.component.deploy for component api-gateway [14:13:00] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway [14:14:54] (03update) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/toolforge-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/35 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:14:55] (03approved) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/toolforge-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/35 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:14:58] (03merge) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/toolforge-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-cli/-/merge_requests/35 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:26:51] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway [14:33:33] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway [14:33:37] 06cloud-services-team, 10Toolforge (Quota-requests): Increase kurbernetes quota for tools.multichill - https://phabricator.wikimedia.org/T380902#10383469 (10Andrew) Hello again @Multichill -- this request is stalled pending your response. 
[14:34:55] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component api-gateway [14:36:11] (03update) 10sstefanova: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:38:39] (03update) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/54 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:38:41] (03approved) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/54 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:38:44] (03merge) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/54 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:41:47] (03update) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 [14:41:51] (03update) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 [14:42:11] (03update) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/45 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:42:52] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component 
api-gateway [14:43:05] (03update) 10sstefanova: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:43:08] (03approved) 10sstefanova: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:43:11] (03merge) 10sstefanova: api-gateway: bump to 0.0.57-20241205134431-4c896daa [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/641 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:45:07] 06cloud-services-team, 10Toolforge: Rescue file from Toolforge backup - https://phabricator.wikimedia.org/T381582 (10-jem-) 03NEW [14:46:32] 06cloud-services-team, 10Toolforge: Rescue file from Toolforge backup - https://phabricator.wikimedia.org/T381582#10383553 (10-jem-) p:05Triage→03High [14:48:08] (03approved) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/45 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:48:13] (03merge) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/45 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [14:50:17] 06cloud-services-team, 10Cloud-VPS: 'backy2 cleanup' fails on backy2 cleanup - https://phabricator.wikimedia.org/T381548#10383564 (10Andrew) p:05Triage→03Medium [14:51:35] (03open) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: components-api: bump to 0.0.73-20241205144825-1bfbda9c [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/642 [15:00:04] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [15:00:35] (03update) 10sstefanova: components-api: bump to 0.0.73-20241205144825-1bfbda9c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/642 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:02:07] (03update) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/131 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:06:14] 06cloud-services-team: SystemdUnitDown The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1002-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T381545#10383622 (10Andrew) p:05Triage→03Medium a:03Andrew [15:06:27] 06cloud-services-team, 10Cloud-VPS: Upgrade cloud-vps openstack to version 'Dalmation' - https://phabricator.wikimedia.org/T381499#10383624 (10Andrew) p:05Triage→03Medium [15:07:05] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [15:08:00] 10wikitech.wikimedia.org, 06Content-Transform-Team, 10Parsoid: Parsoid renders "Incident status" (wikitech) incorrectly - https://phabricator.wikimedia.org/T380899#10383632 (10MSantos) [15:08:09] (03approved) 10sstefanova: components-api: bump to 0.0.73-20241205144825-1bfbda9c [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/642 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:08:13] (03merge) 10sstefanova: components-api: bump to 0.0.73-20241205144825-1bfbda9c [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/642 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:09:02] 10wikitech.wikimedia.org, 10Parsoid: Parsoid renders "Incident status" (wikitech) incorrectly - https://phabricator.wikimedia.org/T380899#10383637 (10MSantos) [15:12:12] (03update) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/131 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:27:59] 06cloud-services-team, 10Cloud-VPS, 10Beta-Cluster-Infrastructure: Future growth of deployment-prep? - https://phabricator.wikimedia.org/T381420#10383732 (10Andrew) p:05Triage→03Medium [15:28:08] 06cloud-services-team, 10Cloud-VPS, 06collaboration-services, 10Continuous-Integration-Infrastructure, and 2 others: Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419#10383733 (10Andrew) p:05Triage→03Medium [15:28:16] 06cloud-services-team, 10Cloud-VPS, 10Catalyst: Future catalyst cloud-vps usage - https://phabricator.wikimedia.org/T381418#10383734 (10Andrew) p:05Triage→03Medium [15:28:46] 06cloud-services-team, 10Toolforge (Quota-requests): Increase kubernetes quota for tools.multichill - https://phabricator.wikimedia.org/T380902#10383735 (10Andrew) [15:29:16] 06cloud-services-team, 06DC-Ops, 10ops-codfw, 06SRE: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10383736 (10Andrew) p:05Triage→03Medium [15:29:30] 06cloud-services-team, 10Cloud-VPS, 10Ceph: puppet: partman comments in cephosd.cfg are misleading - https://phabricator.wikimedia.org/T380339#10383738 (10Andrew) p:05Triage→03Low [15:29:52] 06cloud-services-team, 10Cloud-VPS: Audit WMCS compute capacity - https://phabricator.wikimedia.org/T380099#10383739 (10Andrew) p:05Triage→03Medium [15:32:17] 06cloud-services-team, 10Data-Services: Bad credentials for tools - 
https://phabricator.wikimedia.org/T348259#10383749 (10Andrew) @taavi -- for future reference, regenerating the config file just consists of deleting replica.my.cnf on the nfs server and waiting for it to be dropped back in place by maintai... [15:46:29] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services: [wikireplicas] Gather usage stats - https://phabricator.wikimedia.org/T381587 (10fnegri) 03NEW [15:46:33] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services: [wikireplicas] Gather usage stats - https://phabricator.wikimedia.org/T381587#10383798 (10fnegri) p:05Triage→03High [15:49:57] 06cloud-services-team, 10Toolforge: toolforge jobs load errors with 404 repetatively - https://phabricator.wikimedia.org/T381273#10383804 (10taavi) Can you try running the command with `toolforge jobs --debug`? [15:50:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:54:57] (03approved) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/131 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:55:01] (03merge) 10sstefanova: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/131 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [15:55:39] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:56:28] (update) sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/8 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:56:29] (update) sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/8 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:57:56] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-api: bump to 0.0.342-20241205155513-fab73d57 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/643
[16:00:25] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence: [wikireplicas] Route alerts to WMCS team - https://phabricator.wikimedia.org/T381589 (fnegri) NEW
[16:00:39] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:00:42] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence: [wikireplicas] Route alerts to WMCS team - https://phabricator.wikimedia.org/T381589#10383862 (fnegri) p:Triage→High
[16:08:14] (approved) sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/8 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:08:17] (merge) sstefanova: poetry: Autoupdate [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/8 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:18:55] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[16:19:08] (update) sstefanova: jobs-api: bump to 0.0.342-20241205155513-fab73d57 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/643 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:20:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:25:07] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[16:25:40] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:26:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:26:50] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[16:31:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:32:02] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence: [wikireplicas] Route alerts to WMCS team - https://phabricator.wikimedia.org/T381589#10384009 (fnegri)
[16:34:43] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[16:47:26] (approved) sstefanova: jobs-api: bump to 0.0.342-20241205155513-fab73d57 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/643 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:47:28] (merge) sstefanova: jobs-api: bump to 0.0.342-20241205155513-fab73d57 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/643 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[16:49:15] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence: [wikireplicas] Route alerts to WMCS team - https://phabricator.wikimedia.org/T381589#10384113 (fnegri) > clouddbs are currently getting tagged with cluster: mysql There are actually 4 different Hiera tags we should check for c...
[17:17:50] cloud-services-team, Data-Services: Bad credentials for tools - https://phabricator.wikimedia.org/T348259#10384243 (taavi) >>! In T348259#10383749, @Andrew wrote: > @taavi -- for future reference, regenerating the config file just consists of deleting replica.my.cnf on the nfs server and waiting for...
[17:20:29] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10384256 (Jhancock.wm)
[17:24:27] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence: [wikireplicas] Route alerts to WMCS team - https://phabricator.wikimedia.org/T381589#10384264 (fnegri) I'm not entirely sure about Icinga alerts for clouddb* hosts. As far as I understand it: * emails and pages sent by Icinga a...
[17:24:48] FIRING: PuppetFailure: Puppet has failed on cloudbackup2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:24:53] cloud-services-team: PuppetFailure Puppet has failed on cloudbackup2003:9100 - https://phabricator.wikimedia.org/T381600 (phaultfinder) NEW
[17:28:24] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge: Rescue file from Toolforge backup - https://phabricator.wikimedia.org/T381582#10384305 (fnegri) a:fnegri
[17:29:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudbackup2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:29:59] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T381602 (phaultfinder) NEW
[17:44:31] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge: Rescue file from Toolforge backup - https://phabricator.wikimedia.org/T381582#10384386 (fnegri) Open→In progress I started looking at this and I identified that the file should be in the `tools-nfs` Cinder volume (id: `49bc0222-14cc-4bfb-92e4-66...
[18:17:45] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:22:45] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:27:09] PROBLEM - Host checker.tools.wmflabs.org is DOWN: PING CRITICAL - Packet loss = 100%
[18:27:09] RECOVERY - Host checker.tools.wmflabs.org is UP: PING WARNING - Packet loss = 50%, RTA = 18.19 ms
[18:34:48] FIRING: [2x] PuppetFailure: Puppet has failed on cloudbackup2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:39:48] RESOLVED: [2x] PuppetFailure: Puppet has failed on cloudbackup2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:55:56] (open) waldir: Fix placeholder message for username field [toolforge-repos/yearinreview] - https://gitlab.wikimedia.org/toolforge-repos/yearinreview/-/merge_requests/9
[18:58:05] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10384697 (NPRB) |**Wikitech account/LDAP:**|NPRB| |**SUL account**| NPRB| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |Y| |**I have visited [[ https://wikitech.wikimedia.o...
[19:06:56] FIRING: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:16:56] RESOLVED: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:18:56] VPS-project-Wikistats: Add idwikivoyage to wikistats - https://phabricator.wikimedia.org/T381084#10385107 (Dzahn) Open→Resolved ` MariaDB [wikistats]> insert into wikivoyage (lang,prefix,loclang,method) values ("Indonesian","id","Bahasa Indonesia",8); .. dzahn@wikistats-bookworm:~$ /usr/lib/wik...
[22:24:23] VPS-project-Wikistats: since all updates run with the "extinfo" parameter some tables did not get updated numbers - https://phabricator.wikimedia.org/T381623 (Dzahn) NEW
[22:25:03] VPS-project-Wikistats: since all updates run with the "extinfo" parameter some tables did not get updated numbers - https://phabricator.wikimedia.org/T381623#10385132 (Dzahn) p:Triage→High