[02:24:48] Toolforge-standards-committee: Adoption request for jawi - https://phabricator.wikimedia.org/T379340#10527535 (Hakimi97) Ping back @bd808 and @JJMC89 since the last response.
[05:35:23] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10527641 (HormigasAIS) cloudgw1003.eqiad.wmnet: C8 cloudgw1004.eqiad.wmnet: D5{F58366376}
[06:34:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:44:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:39:57] (approved) aborrero: only_image_publish.yaml: Remove KOKKURI_REGISTRY_INTERNAL variable [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/47 (owner: dancy)
[10:41:14] (merge) aborrero: only_image_publish.yaml: Remove KOKKURI_REGISTRY_INTERNAL variable [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/47 (owner: dancy)
[11:37:13] (open) lucaswerkmeister-wmde: Fix parsing results with empty times [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/1 (https://phabricator.wikimedia.org/T384925)
[11:37:27] (close) lucaswerkmeister-wmde: Fix parsing results with empty times [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/1 (https://phabricator.wikimedia.org/T384925)
[11:37:36] (open) lucaswerkmeister-wmde: Recommend Flask debug mode for development [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/2
[11:37:45] (open) lucaswerkmeister-wmde: Fix parsing results with empty times [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/3
[11:45:25] (approved) arthurtaylor: Recommend Flask debug mode for development [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/2 (owner: lucaswerkmeister-wmde)
[11:45:29] (merge) arthurtaylor: Recommend Flask debug mode for development [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/2 (owner: lucaswerkmeister-wmde)
[11:48:43] (approved) arthurtaylor: Fix parsing results with empty times [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/3 (owner: lucaswerkmeister-wmde)
[11:48:45] (merge) arthurtaylor: Fix parsing results with empty times [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/3 (owner: lucaswerkmeister-wmde)
[11:51:49] PROBLEM - Host clouddb1016 is DOWN: PING CRITICAL - Packet loss = 100%
[11:51:49] RECOVERY - Host clouddb1016 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[11:52:39] PROBLEM - mysqld processes on clouddb1016 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[11:54:39] RECOVERY - mysqld processes on clouddb1016 is OK: PROCS OK: 2 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[12:10:06] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[12:10:50] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)
[12:12:05] (update) raymond-ndibe: [toolforge-weld]: support query_params for K8sClient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/69 (https://phabricator.wikimedia.org/T359804)
[12:12:35] (update) raymond-ndibe: [toolforge-weld]: support query_params for K8sClient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/69 (https://phabricator.wikimedia.org/T359804)
[12:17:10] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
[12:20:02] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/3
[12:20:03] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/30
[12:25:44] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
[12:26:09] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
[12:26:34] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:33:50] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:35:33] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[12:35:48] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
[12:37:29] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[12:37:48] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:37:55] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
[12:43:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:43:59] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[12:44:54] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
[12:45:23] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[12:46:56] cloud-services-team, Pontoon: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is about to expire in 26d 1h 30m 40s - https://phabricator.wikimedia.org/T385797 (Andrew) NEW
[12:51:55] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
[12:52:58] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[12:53:50] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[12:55:03] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[13:02:20] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[13:02:57] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
[13:06:21] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
[13:07:49] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
[13:08:57] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[13:11:56] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:15:47] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
[13:18:51] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:19:13] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[13:19:48] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[13:26:07] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
[13:27:52] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[13:28:35] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
[13:34:50] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[13:36:06] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[13:36:20] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:39:02] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:43:46] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[13:54:34] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:55:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builer
[13:55:13] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builer
[13:55:41] cloud-services-team, Pontoon: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is about to expire in 26d 1h 30m 40s - https://phabricator.wikimedia.org/T385797#10528724 (fgiunchedi) The monitoring project doesn't exist anymore
[13:59:43] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builer
[13:59:46] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builer
[14:00:34] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builer
[14:00:38] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builer
[14:01:06] (approved) raymond-ndibe: ingress-admission: bump to 0.0.56-20250205000153-8f1e7076 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/667 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:01:09] (merge) raymond-ndibe: ingress-admission: bump to 0.0.56-20250205000153-8f1e7076 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/667 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:01:22] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[14:01:23] (update) raymond-ndibe: ingress-admission: bump to 0.0.56-20250205000153-8f1e7076 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/667 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:01:40] (update) raymond-ndibe: components-api: bump to 0.0.77-20250204235931-5247bf60 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/666 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:01:44] (update) raymond-ndibe: components-api: bump to 0.0.77-20250204235931-5247bf60 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/666 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:01:45] (approved) raymond-ndibe: components-api: bump to 0.0.77-20250204235931-5247bf60 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/666 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:02:08] (merge) raymond-ndibe: components-api: bump to 0.0.77-20250204235931-5247bf60 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/666 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:02:23] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[14:05:06] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission
[14:10:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-4 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[14:16:29] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is about to expire in 25d 23h 58m 40s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[14:17:31] FIRING: ToolsToolsDBWritableState: There should be exactly one
writable MariaDB instance instead of -1 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[14:17:56] FIRING: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:18:50] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[14:19:04] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builer
[14:19:07] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builer
[14:19:32] (update) raymond-ndibe: jobs-emailer: bump to 0.0.49-20250204235856-e9daf12d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/665 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:19:35] (update) raymond-ndibe: jobs-emailer: bump to 0.0.49-20250204235856-e9daf12d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/665 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:19:36] (approved) raymond-ndibe: jobs-emailer: bump to 0.0.49-20250204235856-e9daf12d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/665 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:19:54] (merge) raymond-ndibe: jobs-emailer: bump to 0.0.49-20250204235856-e9daf12d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/665 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:26:29] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
[14:28:42] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
[14:29:06] (update) raymond-ndibe: envvars-admission: bump to 0.0.24-20250204235727-1c4069a7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/664 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:29:09] (approved) raymond-ndibe: envvars-admission: bump to 0.0.24-20250204235727-1c4069a7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/664 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:29:12] (update) raymond-ndibe: envvars-admission: bump to 0.0.24-20250204235727-1c4069a7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/664 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:33:01] RESOLVED: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[14:34:26] RESOLVED: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:36:41] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
[14:37:21] (merge) raymond-ndibe: envvars-admission: bump to 0.0.24-20250204235727-1c4069a7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/664 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:37:37] (update) raymond-ndibe: builds-api: bump to 0.0.178-20250204235225-204a2a86 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/663 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:37:39] (approved) raymond-ndibe: builds-api: bump to 0.0.178-20250204235225-204a2a86 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/663 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:37:42] (update) raymond-ndibe: builds-api: bump to 0.0.178-20250204235225-204a2a86 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/663 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:37:59] (merge) raymond-ndibe: builds-api: bump to 0.0.178-20250204235225-204a2a86 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/663 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:38:37] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[14:43:58] (update) raymond-ndibe: jobs-api: bump to 0.0.348-20250204235743-8cc6991d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/662 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:43:59] (approved) raymond-ndibe: jobs-api: bump to 0.0.348-20250204235743-8cc6991d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/662 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:44:00] (update) raymond-ndibe: jobs-api: bump to 0.0.348-20250204235743-8cc6991d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/662 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:44:19] (merge) raymond-ndibe: jobs-api: bump to 0.0.348-20250204235743-8cc6991d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/662 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:46:11] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[14:46:33] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[14:46:48] (update) raymond-ndibe: volume-admission: bump to 0.0.61-20250204235343-f67d11a3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/661 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:46:50] (approved) raymond-ndibe: volume-admission: bump to 0.0.61-20250204235343-f67d11a3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/661 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:46:53] (update) raymond-ndibe: volume-admission: bump to 0.0.61-20250204235343-f67d11a3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/661 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:47:12] (merge) raymond-ndibe: volume-admission: bump to 0.0.61-20250204235343-f67d11a3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/661 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:50:13] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[14:50:30] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[14:56:45] (open) lucaswerkmeister-wmde: Fetch job data on demand [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/4 (https://phabricator.wikimedia.org/T384925)
[14:57:03] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[14:58:50] (update) raymond-ndibe: api-gateway: bump to 0.0.61-20250204235237-f1e33e1f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/660 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:58:51] (approved) raymond-ndibe: api-gateway: bump to 0.0.61-20250204235237-f1e33e1f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/660 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:58:54] (update) raymond-ndibe: api-gateway: bump to 0.0.61-20250204235237-f1e33e1f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/660 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:59:15] (merge) raymond-ndibe: api-gateway: bump to 0.0.61-20250204235237-f1e33e1f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/660 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:01:32] (update) raymond-ndibe: maintain-harbor: bump to 0.0.20-20250204235116-d595f2ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/659 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:01:35] (approved) raymond-ndibe: maintain-harbor: bump to 0.0.20-20250204235116-d595f2ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/659 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:01:38] (update) raymond-ndibe: maintain-harbor: bump to 0.0.20-20250204235116-d595f2ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/659 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:01:54] (merge) raymond-ndibe: maintain-harbor: bump to 0.0.20-20250204235116-d595f2ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/659 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:02:29] (approved) raymond-ndibe: builds-builder: bump to 0.0.124-20250204235137-18656c89 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/658 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:02:33] (update) raymond-ndibe: builds-builder: bump to 0.0.124-20250204235137-18656c89 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/658 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:05:19] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
[15:12:01] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
[15:15:49] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
[15:22:58] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
[15:23:25] (update) raymond-ndibe: builds-builder: bump to 0.0.124-20250204235137-18656c89 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/658 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:23:30] (merge) raymond-ndibe: builds-builder: bump to 0.0.124-20250204235137-18656c89 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/658 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:35:09] (update) raymond-ndibe: [toolforge-weld]: support query_params for K8sClient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/69 (https://phabricator.wikimedia.org/T359804)
[15:35:10] (approved) raymond-ndibe: [toolforge-weld]: support query_params for K8sClient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/69 (https://phabricator.wikimedia.org/T359804)
[15:35:19] (merge) raymond-ndibe: [toolforge-weld]: support query_params for K8sClient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/69 (https://phabricator.wikimedia.org/T359804)
[15:35:36] (update) raymond-ndibe: [toolforge-weld] support apply_object method [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/70 (https://phabricator.wikimedia.org/T359804)
[15:35:37] (approved) raymond-ndibe: [toolforge-weld] support apply_object method [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/70 (https://phabricator.wikimedia.org/T359804)
[15:35:44] (merge) raymond-ndibe: [toolforge-weld] support apply_object method [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/70 (https://phabricator.wikimedia.org/T359804)
[15:35:45] (update) raymond-ndibe: [toolforge-weld] make user_agent importable [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/71 (https://phabricator.wikimedia.org/T359804)
[15:35:54] (approved) raymond-ndibe: [toolforge-weld] make user_agent importable [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/71 (https://phabricator.wikimedia.org/T359804)
[15:35:59] (update) raymond-ndibe: [toolforge-weld] make user_agent importable [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/71 (https://phabricator.wikimedia.org/T359804)
[15:36:08] (merge) raymond-ndibe: [toolforge-weld] make user_agent importable [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/71 (https://phabricator.wikimedia.org/T359804)
[15:47:08] cloud-services-team, Pontoon: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is about to expire in 26d 1h 30m 40s - https://phabricator.wikimedia.org/T385797#10529199 (Andrew) I have the feeling we've had this conversation before :/
[16:02:35] cloud-services-team, DC-Ops, ops-eqiad, SRE: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10529293 (VRiley-WMF) For reference here is a picture to get a better understanding. The top unit is clouddumps1001, and right underneth it is ganeti1044 (w...
[16:22:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1067.eqiad.wmnet}'
[16:26:30] PROBLEM - Host cloudvirt1067 is DOWN: PING CRITICAL - Packet loss = 100%
[16:28:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1067.eqiad.wmnet}'
[16:28:47] RECOVERY - Host cloudvirt1067 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[16:29:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1066.eqiad.wmnet}'
[16:30:18] FIRING: [2x] KernelErrors: Server cloudvirt1067 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudvirt1067 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[16:33:28] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:38:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1066.eqiad.wmnet}'
[16:38:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1065.eqiad.wmnet}'
[16:38:28] RESOLVED: InstanceDown: Project tools instance tools-k8s-worker-nfs-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[16:40:18] FIRING: [2x] KernelErrors: Server cloudvirt1066 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudvirt1066 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors [16:47:35] cloud-services-team, DC-Ops, Ganeti, Infrastructure-Foundations, and 2 others: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10529535 (fnegri) Thanks @VRiley-WMF, I'm adding #ganeti and #infrastructure-foundations as I think they own ganeti1044. I... [16:55:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1065.eqiad.wmnet}' [16:55:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1064.eqiad.wmnet}' [16:59:56] (open) raymond-ndibe: d/changelog: bump to 1.6.6 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/72 (https://phabricator.wikimedia.org/T359804) [17:00:42] (update) raymond-ndibe: d/changelog: bump to 1.6.6 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/72 (https://phabricator.wikimedia.org/T359804) [17:02:10] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld [17:08:52] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld [17:09:42] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld [17:11:29] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is
about to expire in 25d 21h 7m 40s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire [17:13:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1064.eqiad.wmnet}' [17:13:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1063.eqiad.wmnet}' [17:15:37] cloud-services-team, Pontoon: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is about to expire in 26d 1h 30m 40s - https://phabricator.wikimedia.org/T385797#10529685 (Andrew) Open→Resolved a:Andrew New theory: this project may have been deleted in some nonstandard way tha... [17:16:29] RESOLVED: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate pontoon-conf-01.monitoring.eqiad.wmflabs is about to expire in 25d 21h 6m 40s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire [17:18:48] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld [17:19:32] (approved) raymond-ndibe: d/changelog: bump to 1.6.6 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/72 (https://phabricator.wikimedia.org/T359804) [17:19:38] (update) raymond-ndibe: d/changelog: bump to 1.6.6 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/72 (https://phabricator.wikimedia.org/T359804) [17:19:42] (merge) raymond-ndibe: d/changelog: bump to 1.6.6 [repos/cloud/toolforge/toolforge-weld] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/72 (https://phabricator.wikimedia.org/T359804) [17:35:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1063.eqiad.wmnet}' [17:35:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1062.eqiad.wmnet}' [17:52:02] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1062.eqiad.wmnet}' [17:52:05] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1061.eqiad.wmnet}' [17:52:05] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1059.eqiad.wmnet}' [17:56:46] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1061.eqiad.wmnet}' [17:56:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1060.eqiad.wmnet}' [18:00:13] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10529815 (bd808) I now have a script at `mwmaint2002.codfw.wmnet:/home/bd808/projects/wikitech/2025-02-04/rai.py` that can read a TSV data file of (...
[18:13:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1059.eqiad.wmnet}' [18:13:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1058.eqiad.wmnet}' [18:21:10] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1060.eqiad.wmnet}' [18:21:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1049.eqiad.wmnet}' [18:36:53] PROBLEM - Host cloudvirt1049 is DOWN: PING CRITICAL - Packet loss = 100% [18:40:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1049.eqiad.wmnet}' [18:40:34] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1048.eqiad.wmnet}' [18:40:53] RECOVERY - Host cloudvirt1049 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [18:41:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1058.eqiad.wmnet}' [18:41:55] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1057.eqiad.wmnet}' [18:50:32] cloud-services-team, DC-Ops, Ganeti, Infrastructure-Foundations, and 2 others: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10530006 (VRiley-WMF) Thanks! Yeah, we wouldn't need much downtime for this ganeti1044 device. No changes to IP or anythin...
[19:00:19] PROBLEM - Host cloudvirt1048 is DOWN: PING CRITICAL - Packet loss = 100% [19:02:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1057.eqiad.wmnet}' [19:02:34] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1056.eqiad.wmnet}' [19:02:49] RECOVERY - Host cloudvirt1048 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [19:02:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1048.eqiad.wmnet}' [19:02:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1047.eqiad.wmnet}' [19:03:10] cloud-services-team, Cloud-VPS, Patch-For-Review: VM live migration failing for many/most VMs - https://phabricator.wikimedia.org/T385264#10530026 (Andrew) Open→Resolved [19:19:29] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1056.eqiad.wmnet}' [19:19:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1055.eqiad.wmnet}' [19:19:52] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1047.eqiad.wmnet}' [19:19:53] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1046.eqiad.wmnet}' [19:24:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1055.eqiad.wmnet}' [19:24:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1054.eqiad.wmnet}' [19:42:49] !log andrew@cloudcumin1001 admin END
(PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1046.eqiad.wmnet}' [19:42:50] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1045.eqiad.wmnet}' [19:48:52] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1054.eqiad.wmnet}' [19:48:53] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1053.eqiad.wmnet}' [20:03:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:04:03] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1045.eqiad.wmnet}' [20:04:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1044.eqiad.wmnet}' [20:04:46] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1053.eqiad.wmnet}' [20:04:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1052.eqiad.wmnet}' [20:13:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:25:31] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 
'D{cloudvirt1052.eqiad.wmnet}' [20:25:32] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1051.eqiad.wmnet}' [20:26:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1044.eqiad.wmnet}' [20:26:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1043.eqiad.wmnet}' [20:46:14] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1043.eqiad.wmnet}' [20:46:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}' [20:48:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1051.eqiad.wmnet}' [20:48:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1050.eqiad.wmnet}' [21:09:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}' [21:09:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1041.eqiad.wmnet}' [21:10:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1050.eqiad.wmnet}' [21:25:38] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'D{cloudvirt1041.eqiad.wmnet}' [21:25:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1040.eqiad.wmnet}' [21:46:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook 
wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1040.eqiad.wmnet}' [22:06:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [22:18:37] VPS-project-Phabricator: phab.wmflabs.org should be more clearly marked as a testing instance - https://phabricator.wikimedia.org/T283088#10530401 (stjn) At least one user mistakenly went to phabricator.wmcloud.org from https://www.mediawiki.org/wiki/Phabricator and did not notice it is a separate website. I... [22:20:33] VPS-project-Phabricator, collaboration-services: phab.wmflabs.org should be more clearly marked as a testing instance - https://phabricator.wikimedia.org/T283088#10530417 (Nemoralis) [22:46:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [22:59:47] (update) raymond-ndibe: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) [23:00:49] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) [23:23:16] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) -
https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:28:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:46:12] (update) raymond-ndibe: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) [23:47:09] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) [23:51:59] (update) raymond-ndibe: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) [23:52:47] (update) raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804)