[00:08:10] cloud-services-team, Toolforge, SecTeam-Processed, Security, Vuln-Infoleak: 21 public Python tool configuration files - https://phabricator.wikimedia.org/T286416#10360188 (Pppery) Anything left to do here?
[00:13:56] cloud-services-team, Tool-bodh, Toolforge, Security, Vuln-Infoleak: Bodh Toolforge OAuth consumer was world-readable - https://phabricator.wikimedia.org/T318622#10360190 (Pppery) Anything left to do here?
[00:53:14] FIRING: Kernel error: Server cloudvirt1062 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DKernel+error
[00:53:14] FIRING: Kernel warning: Server cloudvirt1062 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning
[01:23:30] VPS-Projects, Documentation: Create docs on how to restart both varnish and the node process for wikistream - https://phabricator.wikimedia.org/T232547#10360272 (bd808) Open→Declined The wikistream Cloud VPS project was deleted after migrating wikistream to Toolforge. See also {T251555}.
[01:26:31] Tools: wikistream.toolforge.org not working properly with new irc.wikimedia.org implementation - https://phabricator.wikimedia.org/T380950 (bd808) NEW
[01:28:47] Tools: wikistream.toolforge.org not working properly with new irc.wikimedia.org implementation - https://phabricator.wikimedia.org/T380950#10360291 (bd808) a:bd808
[01:57:49] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10360310 (cscott) Ther...
[02:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:09:59] tool-wscontest, good first task: Make all interface messages translatable - https://phabricator.wikimedia.org/T346994#10360346 (Samwilson) Open→Resolved Done and released in [[https://github.com/wikisource/wscontest/releases/tag/2.5.3 | 2.5.3]]. Thanks @AS1100K!
[04:10:00] tool-wscontest, good first task: Add UTC in the WSContest contest page - https://phabricator.wikimedia.org/T331225#10360348 (Samwilson) Done and released in [[https://github.com/wikisource/wscontest/releases/tag/2.5.3 | 2.5.3]]. Thanks @AS1100K!
[04:53:14] FIRING: Kernel error: Server cloudvirt1062 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DKernel+error
[04:53:14] FIRING: Kernel warning: Server cloudvirt1062 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning
[05:57:29] ToolforgeBundle: Upgrade to Symfony 7 - https://phabricator.wikimedia.org/T361554#10360386 (Samwilson)
[05:57:30] tool-wscontest: The score command throws deprecated warning - https://phabricator.wikimedia.org/T348270#10360387 (Samwilson)
[05:58:16] ToolforgeBundle: Upgrade to Symfony 7 - https://phabricator.wikimedia.org/T361554#10360388 (Samwilson) Another PR, for fixing the above name error and switching to route attributes: https://github.com/wikimedia/ToolforgeBundle/pull/70
[06:21:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:01:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:36:44] cloud-services-team, Toolforge (Toolforge iteration 16): [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284#10360564 (dcaro)
[08:42:07] (open) dcaro: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284)
[08:43:50] cloud-services-team, Toolforge: jobs-api crashing - https://phabricator.wikimedia.org/T380832#10360568 (dcaro) > why no alerts? Patch for the alerts https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 For some reason I was expecting to get an alert for any scrape target being...
[08:53:14] FIRING: Kernel error: Server cloudvirt1062 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DKernel+error
[08:53:14] FIRING: Kernel warning: Server cloudvirt1062 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudvirt1062 - https://alerts.wikimedia.org/?q=alertname%3DKernel+warning
[09:04:52] cloud-services-team, Toolforge (Quota-requests): Increase kurbernetes quota for tools.multichill - https://phabricator.wikimedia.org/T380902#10360590 (Slst2020) Hi @Multichill ! Please edit the task to include all the info needed, see https://phabricator.wikimedia.org/project/manage/4834/, so that your r...
[09:21:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:31:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:09:27] cloud-services-team, Toolforge, Sustainability (Incident Followup): toolforge: create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959 (aborrero) NEW
[10:11:50] cloud-services-team: Kernel error Server cloudvirt1062 may have kernel errors - https://phabricator.wikimedia.org/T380923#10360707 (aborrero) Open→Resolved a:aborrero server was rebooted: `Nov 26 16:51:04 cloudvirt1062 kernel: x86/cpu: SGX disabled by BIOS.`
[10:27:59] cloud-services-team: kernel error detector: have a way to ignore certain messages - https://phabricator.wikimedia.org/T380960 (aborrero) NEW
[10:30:16] cloud-services-team: kernel error detector: have a way to ignore certain messages - https://phabricator.wikimedia.org/T380960#10360754 (aborrero) p:Triage→Medium
[10:31:03] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic: [builds-api,harbor,builds-builder] user-story 11: I want to know how to debug the service - https://phabricator.wikimedia.org/T325172#10360760 (dcaro) Open→Resolved a:dcaro
[10:33:28] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-builder,harbor,bulid-service,docs] user-story 11: Add section to admin docs on how to debug the service, how to pin-point the failing component a... - https://phabricator.wikimedia.org/T325174#10360756
[10:33:44] cloud-services-team, Toolforge, Sustainability (Incident Followup): toolforge: create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10360764 (dcaro)
[10:33:48] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic: [builds-api,harbor,builds-builder] user-story 11: I want to know how to debug the service - https://phabricator.wikimedia.org/T325172#10360765 (dcaro)
[10:33:53] cloud-services-team, Toolforge (Toolforge iteration 03), Toolforge Build Service, Cloud-Services-Origin-Team, and 2 others: tbs: user-story 10: I want to know how to manage the service - https://phabricator.wikimedia.org/T325166#10360766 (dcaro)
[10:34:04] cloud-services-team, Toolforge, Sustainability (Incident Followup): toolforge: create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10360768 (dcaro)
[10:34:12] cloud-services-team, Toolforge (Toolforge iteration 16), Patch-For-Review: [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284#10360769 (dcaro)
[10:36:28] cloud-services-team, Toolforge, Sustainability (Incident Followup): toolforge: create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10360785 (dcaro)
[10:39:52] (open) dcaro: builds-api: add up alert [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/21 (https://phabricator.wikimedia.org/T380959)
[11:28:51] (open) dcaro: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959)
[11:36:17] (approved) fnegri: builds-api: add up alert [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/21 (https://phabricator.wikimedia.org/T380959) (owner: dcaro)
[12:21:40] (update) dcaro: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959)
[12:22:26] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[12:22:32] (update) dcaro: builds-api: add up alert [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/21 (https://phabricator.wikimedia.org/T380959)
[12:22:36] (update) dcaro: builds-api: add up alert [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/21 (https://phabricator.wikimedia.org/T380959)
[12:23:43] (merge) dcaro: builds-api: add up alert [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/21 (https://phabricator.wikimedia.org/T380959)
[12:26:25] (update) dcaro: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284)
[12:35:51] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[12:42:22] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[12:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:52:36] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[12:55:25] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[12:56:03] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[13:22:17] (approved) sstefanova: config: load api url from env too [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/66 (https://phabricator.wikimedia.org/T379893) (owner: dcaro)
[13:23:32] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[13:23:38] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[13:24:21] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[13:25:27] (close) raymond-ndibe: [lima-kilo] add harbor to toolforge-common.yaml [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/214 (https://phabricator.wikimedia.org/T358225)
[13:26:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1061.eqiad.wmnet'
[13:26:31] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[13:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:36:32] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): openstack: introduce additional DNS monitoring and alerting - https://phabricator.wikimedia.org/T380980#10361467 (aborrero) we have at least __some__ prometheus metrics about pdns, but I don't think we have alerts based on them. See...
[13:40:07] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1061.eqiad.wmnet'
[13:40:12] (update) sstefanova: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959) (owner: dcaro)
[13:41:53] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10361494 (aborrero) the server has been drained and is ready for a reboot when you need it.
[13:45:30] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[13:45:43] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): openstack: increase virtual network observability - https://phabricator.wikimedia.org/T380886#10361507 (aborrero) >>! In T380886#10359947, @bd808 wrote: >> running on every toolforge kubernetes worker node, ping other workers on the...
[13:46:07] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[13:46:38] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): toolforge: introduce additional observability for calico and general networking - https://phabricator.wikimedia.org/T380892#10361502 (aborrero)
[13:47:34] (update) raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)
[13:48:00] (update) raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225)
[13:55:09] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): openstack: increase virtual network observability - https://phabricator.wikimedia.org/T380886#10361545 (aborrero)
[14:00:12] (update) dcaro: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284)
[14:00:14] (update) dcaro: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284)
[14:00:51] (update) dcaro: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959)
[14:01:36] (approved) sstefanova: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284) (owner: dcaro)
[14:02:36] (open) dcaro: builds-api: fix typo in the test [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/23
[14:03:28] (approved) sstefanova: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959) (owner: dcaro)
[14:03:54] (approved) sstefanova: builds-api: fix typo in the test [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/23 (owner: dcaro)
[14:11:37] (update) dcaro: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959)
[14:11:59] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10361606 (Jclark-ctr) Dell rejected parts request opening new ticket with them 201666996
[14:12:33] (merge) dcaro: envvars-service: add alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/22 (https://phabricator.wikimedia.org/T380959)
[14:15:54] PROBLEM - Host cloudvirt1061 is DOWN: PING CRITICAL - Packet loss = 100%
[14:20:00] cloud-services-team, Toolforge: [infra,k8s] Introduce worker checks - https://phabricator.wikimedia.org/T380985 (dcaro) NEW
[14:20:09] FIRING: CloudVirtDown: Cloudvirt node cloudvirt1061 is down. #page - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CloudVirtDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1061 - https://alerts.wikimedia.org/?q=alertname%3DCloudVirtDown
[14:20:13] cloud-services-team, Toolforge: [jobs-api] crashing - https://phabricator.wikimedia.org/T380832#10361659 (dcaro)
[14:20:15] cloud-services-team: CloudVirtDown Cloudvirt node cloudvirt1061 is down. # page - https://phabricator.wikimedia.org/T380986 (phaultfinder) NEW
[14:20:31] cloud-services-team, Toolforge (Toolforge iteration 16): [jobs-api] crashing - https://phabricator.wikimedia.org/T380832#10361666 (dcaro)
[14:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:21:30] cloud-services-team, Toolforge (Toolforge iteration 16), Patch-For-Review, Sustainability (Incident Followup): [docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10361674 (dcaro) p:Triage→High
[14:22:10] cloud-services-team, Toolforge (Toolforge iteration 16), Patch-For-Review, Sustainability (Incident Followup): [docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10361672 (dcaro)
[14:23:12] RECOVERY - Host cloudvirt1061 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[14:23:36] (update) dcaro: builds-api: fix typo in the test [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/23
[14:24:10] cloud-services-team, Toolforge, Sustainability (Incident Followup): [infra,k8s,o11y] introduce additional observability for calico and general networking - https://phabricator.wikimedia.org/T380892#10361677 (dcaro)
[14:24:36] (merge) dcaro: builds-api: fix typo in the test [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/23
[14:25:08] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10361683 (Jclark-ctr) Finished with bios update waiting on dell for response for new ticket
[14:26:01] (update) dcaro: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284)
[14:26:46] cloud-services-team, Toolforge: [infra,k8s,o11y] Introduce worker checks - https://phabricator.wikimedia.org/T380985#10361680 (dcaro) p:Triage→Medium
[14:27:01] (merge) dcaro: jobs-api: add alerts for it being down [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/20 (https://phabricator.wikimedia.org/T320284)
[14:28:32] cloud-services-team: CloudVirtDown Cloudvirt node cloudvirt1061 is down. # page - https://phabricator.wikimedia.org/T380986#10361698 (fnegri) Open→Resolved a:fnegri This server is currently undergoing maintenance (T380673) but it was not downtimed. I have now downtimed it for 14 days.
[14:29:02] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Kernel error Server cloudvirt1061 may have kernel errors - https://phabricator.wikimedia.org/T380673#10361703 (fnegri) Open→In progress
[14:30:32] cloud-services-team, Toolforge (Toolforge iteration 16), Patch-For-Review, Sustainability (Incident Followup): [docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10361726 (dcaro)
[14:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:48:46] cloud-services-team, Toolforge: [harbor] some artifacts and projects seems to have gone missing - https://phabricator.wikimedia.org/T380833#10361805 (Andrew) I'm told via IRC discussion that 'quite a few' is ~10
[14:56:38] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): openstack: prevent puppet from restarting neutron-openvswitch-agent - https://phabricator.wikimedia.org/T380972#10361862 (Andrew) Is there any theory about why restarting openvswitch-agent is more delicate than restarting the old...
[14:59:51] cloud-services-team, Toolforge: [harbor] some artifacts and projects seems to have gone missing - https://phabricator.wikimedia.org/T380833#10361878 (dcaro) a quick scan of the harbor images running on the cluster reveals 3 projects with missing harbor project: ` dcaro@urcuchillay$ grep '###' out #######...
[15:01:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:02:11] cloud-services-team, Cloud-VPS, Sustainability (Incident Followup): openstack: prevent puppet from restarting neutron-openvswitch-agent - https://phabricator.wikimedia.org/T380972#10361884 (aborrero) >>! In T380972#10361862, @Andrew wrote: > Is there any theory about why restarting openvswitch-ag...
[15:06:06] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:11:06] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:13:26] cloud-services-team, Tool-bodh, Toolforge, Security, Vuln-Infoleak: Bodh Toolforge OAuth consumer was world-readable - https://phabricator.wikimedia.org/T318622#10361926 (sbassett) Open→Resolved p:Triage→Medium a:bd808
[15:16:58] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991 (Lucas_Werkmeister_WM...
[15:19:26] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10361978 (Lucas_Werkmei...
[15:20:08] Tool-wikiqanda, Future-Audiences: Sensitive topic filtering/special-casing - https://phabricator.wikimedia.org/T378851#10362005 (Maryana)
[15:20:12] Tool-wikiqanda, Future-Audiences, Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10362007 (Maryana)
[15:23:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:28:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:29:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:30:44] Tool-wikiqanda, Future-Audiences, Design: First-time UX - https://phabricator.wikimedia.org/T380996 (Maryana) NEW
[15:31:30] Tool-wikiqanda, Future-Audiences: Fill in bot profile - https://phabricator.wikimedia.org/T380997 (Maryana) NEW
[15:34:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:37:28] Tool-wikiqanda, Future-Audiences: Review data logging practices for past experiments - https://phabricator.wikimedia.org/T380655#10362131 (Maryana)
[15:37:30] Tool-wikiqanda, Future-Audiences, Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10362132 (Maryana)
[16:02:54] Tool-wikiqanda, Future-Audiences: Data collection for external release - https://phabricator.wikimedia.org/T380780#10362240 (Maryana)
[16:12:52] Tool-wikiqanda, Future-Audiences, Design, Epic: [Epic] Non-Q&A use-cases - https://phabricator.wikimedia.org/T378125#10362275 (Maryana)
[16:16:29] Tool-wikiqanda, Future-Audiences, Design: Bot user journey mapping - https://phabricator.wikimedia.org/T381001 (Maryana) NEW
[16:16:50] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10362307 (bd808) My...
[16:24:07] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10362357 (Lucas_Werkm...
[16:29:14] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10362375 (10fnegri) +1... [16:31:18] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10362378 (10bd808) >>!... [16:32:18] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10362402 (10bd808)... [16:33:44] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10362413 (10bd808) >>! In T3... [16:34:04] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10362400 (10bd808) [16:34:23] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10362405 (10bd808) 05Resolv... 
[16:35:04] 06cloud-services-team, 10Cloud-VPS: 2024-11-26 openstack network problems - https://phabricator.wikimedia.org/T380882#10362429 (10fnegri) [16:35:07] 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 07Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10362431 (10fnegri) [16:35:16] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10362432 [16:35:30] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10362433 [16:39:07] 06cloud-services-team, 10Cloud-VPS: openstack network problems - https://phabricator.wikimedia.org/T380882#10362438 (10fnegri) [16:40:48] 06cloud-services-team, 10Toolforge: [harbor] some artifacts and projects seems to have gone missing - https://phabricator.wikimedia.org/T380833#10362466 (10dcaro) Full list of tools without artifact or with no project in harbor (total of 8, half of them are ours): ` dcaro@urcuchillay$ grep '###' out ##########... 
[16:42:06] 10Tool-wikiqanda, 06Future-Audiences: Add testing capabilities to Slack bot - https://phabricator.wikimedia.org/T379029#10362472 (10etz) [16:44:01] 10Tool-wikiqanda, 06Future-Audiences: Slack version of bad response logging - https://phabricator.wikimedia.org/T380216#10362487 (10etz) 05Open→03Resolved [16:44:03] 10Tool-wikiqanda, 06Future-Audiences: Flag incorrect answer (internal testing version) - https://phabricator.wikimedia.org/T378821#10362490 (10etz) 05Open→03Resolved [16:44:47] 10Tool-wikiqanda, 06Future-Audiences, 07Design: Bot user personas - https://phabricator.wikimedia.org/T381009 (10Maryana) 03NEW [16:45:15] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10362518 [16:45:21] 10Tool-wikiqanda, 06Future-Audiences, 07Security: Safeguard against "pretend you're X" jailbreak hack - https://phabricator.wikimedia.org/T378853#10362493 (10etz) 05Open→03Resolved [16:47:32] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... 
- https://phabricator.wikimedia.org/T374830#10362520 [16:47:42] 10Tool-wikiqanda, 06Future-Audiences: [Chatbot] Add performance instrumentation tests - https://phabricator.wikimedia.org/T379650#10362525 (10etz) Moving to done since this info is captured in this ticket, https://phabricator.wikimedia.org/T379029, which has been adjusted for Slack [16:47:47] 10Tool-wikiqanda, 06Future-Audiences, 07Design: Bot user personas - https://phabricator.wikimedia.org/T381009#10362535 (10Maryana) [16:48:09] 10Tool-wikiqanda, 06Future-Audiences, 07Design: Bot user personas - https://phabricator.wikimedia.org/T381009#10362522 (10Maryana) a:05SChekfa-WMF→03None [16:48:15] 10Tool-wikiqanda, 06Future-Audiences: [Chatbot] Add performance instrumentation tests - https://phabricator.wikimedia.org/T379650#10362530 (10etz) 05Open→03Resolved [16:48:58] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: [Epic] Discord Q&A bot Milestone 3 - https://phabricator.wikimedia.org/T378126#10362540 (10Maryana) [16:50:06] 10Tool-wikiqanda, 06Future-Audiences: Review data logging practices for past experiments - https://phabricator.wikimedia.org/T380655#10362542 (10Maryana) [16:50:08] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: [Epic] Discord Q&A bot Milestone 3 - https://phabricator.wikimedia.org/T378126#10362543 (10Maryana) [16:50:12] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10362544 (10Maryana) [16:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:51:17] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10362551 (10Maryana) [16:51:36] !log taavi@cloudcumin1001 deployment-prep START - 
Cookbook wmcs.vps.remove_instance for instance deployment-cumin [16:51:58] !log taavi@cloudcumin1001 deployment-prep END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-cumin [16:52:07] 10Tool-wikiqanda, 06Future-Audiences: Test out logging before publicly announcing internal Slack launch - https://phabricator.wikimedia.org/T381010 (10Maryana) 03NEW [16:52:36] 10Tool-wikiqanda, 06Future-Audiences: Test out logging before publicly announcing internal Slack launch - https://phabricator.wikimedia.org/T381010#10362566 (10Maryana) [16:52:37] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10362567 (10Maryana) [16:54:10] 06cloud-services-team, 10Toolforge: [functional-tests,deploy,cookbook] Run only selected tests when deploying a component - https://phabricator.wikimedia.org/T381011 (10dcaro) 03NEW [16:54:23] 06cloud-services-team, 10Toolforge: [functional-tests,deploy,cookbook] Run only selected tests when deploying a component - https://phabricator.wikimedia.org/T381011#10362593 (10dcaro) p:05Triage→03Medium [16:59:43] 10PAWS: update application cred for codfw1dev - https://phabricator.wikimedia.org/T380900#10362641 (10github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/464 [16:59:47] 10PAWS: update application cred for codfw1dev - https://phabricator.wikimedia.org/T380900#10362642 (10rook) 05Open→03Resolved [16:59:49] vivian-rook closed https://github.com/toolforge/paws/pull/464 [17:00:47] 10Tool-wikiqanda, 06Future-Audiences, 07Design: Bot user journey mapping - https://phabricator.wikimedia.org/T381001#10362643 (10Maryana) a:05SChekfa-WMF→03None [17:10:57] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - 
https://phabricator.wikimedia.org/T380991#10362670 (10Andrew... [17:18:01] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10362721 (10Maryana) [17:21:03] 06cloud-services-team, 10Cloud-VPS: openstack network problems (November 2024) - https://phabricator.wikimedia.org/T380882#10362738 (10Aklapper) [17:24:29] (03update) 10dcaro: cli: add config subcommands [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/2 (https://phabricator.wikimedia.org/T379091) (owner: 10sstefanova) [17:27:51] (03approved) 10dcaro: cli: add config subcommands [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/2 (https://phabricator.wikimedia.org/T379091) (owner: 10sstefanova) [17:31:33] (03approved) 10dcaro: cli: add deploy-token subcommands [repos/cloud/toolforge/components-cli] (slavina/add-config-subcommands) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/3 (https://phabricator.wikimedia.org/T379091) (owner: 10sstefanova) [17:36:34] (03approved) 10dcaro: cli: add deployment subcommands [repos/cloud/toolforge/components-cli] (slavina/add-deploy-token-commands) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/4 (https://phabricator.wikimedia.org/T379091) (owner: 10sstefanova) [17:51:11] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10362840 (10Andrew... 
[18:28:30] 06cloud-services-team, 10Cloud-VPS: [horizon] Floating IP pointing to Neutron VIP is not displayed - https://phabricator.wikimedia.org/T381021 (10fnegri) 03NEW [18:35:22] 06cloud-services-team: kernel error detector: have a way to ignore certain messages - https://phabricator.wikimedia.org/T380960#10362997 (10dcaro) Another possibility (maybe on top of) would be to be able to acknowledge the errors, for example read a timestamp from a file before which the errors will be ignored... [18:45:48] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363016 (10Andrew... [18:46:04] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363013 (10Andrew... [18:51:27] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10363043 [19:06:05] 06cloud-services-team, 10Toolforge, 07SecTeam-Processed, 07Security, 07Vuln-Infoleak: 21 public Python tool configuration files - https://phabricator.wikimedia.org/T286416#10363159 (10LucasWerkmeister) 05Open→03Resolved I don’t think so. 
[19:06:53] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363163 (10Andrew) 0... [19:08:37] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363177 (10Andrew) ht... [19:10:50] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363188 (10Andrew) A... [19:12:44] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363190 (10Andrew) p:... [19:13:55] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363191 (10Andrew) Ju... [19:18:14] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363203 (10Andrew) al... 
[19:21:47] 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 07Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10363222 (10Andrew) This has not recurred. Nevertheless we should figure out what's happening with the ruby functions that don't rai... [19:22:00] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363225 (10bd808) > A... [19:34:34] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10363330 [19:34:40] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10363336 (10Andrew) >>... [19:36:43] 06cloud-services-team, 10Cloud-VPS, 10Cumin, 06Infrastructure-Foundations: Revive the HostFile backend on cloudcuminXXXX - https://phabricator.wikimedia.org/T380789#10363343 (10Volans) @Andrew the host file backend is not a cumin official backend. Cumin allows to plug in externally developed backends and t... [20:00:44] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... 
- https://phabricator.wikimedia.org/T374830#10363600 [20:13:02] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10363633 [20:50:41] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:47:06] 06cloud-services-team, 10Toolforge: [harbor] some artifacts and projects seems to have gone missing - https://phabricator.wikimedia.org/T380833#10363916 (10Raymond_Ndibe) I looked into this a bit. In my opinion, there are 5 ways we know projects can be deleted [technically it's just 3, the others are just abst... [22:01:48] (03update) 10raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225) [22:02:49] (03update) 10raymond-ndibe: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225) [22:09:22] (03update) 10raymond-ndibe: [toolforge-deploy] allow for running both admin and tools tests [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/605 (https://phabricator.wikimedia.org/T358225)