[00:11:38] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10374252 (bd808) >>! In T376267#10373975, @amastilovic wrote: > Please note that I seem to have another account "Aleksandar Mastilovic" that is still active on Wikitech, but I haven't used...
[02:52:51] Toolforge-standards-committee, Tools, SecTeam-Processed, Security, Vuln-Infoleak: OAuth credentials of Cradle tool are world-readable on Toolforge - https://phabricator.wikimedia.org/T314135#10374425 (PKM) Thanks, all!
[03:45:38] cloud-services-team, Data-Services, DBA, Chinese-Sites: Prepare and check storage layer for arbcom_zhwiki - https://phabricator.wikimedia.org/T381086#10374464 (Shizhao)
[05:30:12] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1099680 (owner: L10n-bot)
[06:22:17] Tool-video-answer-tool, Future-Audiences: Make video narration sped up in preview - https://phabricator.wikimedia.org/T379665#10374565 (derenrich) PR is in review https://gitlab.wikimedia.org/repos/future-audiences/video-answer-tool/-/merge_requests/55
[07:41:29] (approved) sstefanova: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225) (owner: raymond-ndibe)
[07:53:04] (approved) sstefanova: [maintain-harbor] avoid graphql query complexity error [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/39 (https://phabricator.wikimedia.org/T358225) (owner: raymond-ndibe)
[08:19:42] (CR) Slavina Stefanova: "wouldn't the ToolforgeRunTestsRunner in cookbooks/wmcs/toolforge/component/deploy.py need the branch param when it's instantiated?" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[08:29:03] (CR) Slavina Stefanova: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[09:30:07] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 16): [components-cli] Create cli with subcommand - https://phabricator.wikimedia.org/T379091#10374860 (Slst2020) In progress→Resolved
[09:44:34] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10374924
[10:10:34] FIRING: DiskSpace: Disk space cloudvirt1047:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudvirt1047 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[10:11:57] FIRING: SystemdUnitDown: The service unit prometheus-ethtool-exporter.service is in failed status on host cloudvirt1047. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1047 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:24:45] PROBLEM - Disk space on cloudvirt1047 is CRITICAL: DISK CRITICAL - free space: / 0MiB (0% inode=97%): /tmp 0MiB (0% inode=97%): /var/tmp 0MiB (0% inode=97%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudvirt1047&var-datasource=eqiad+prometheus/ops
[10:28:14] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10375043 (cmooney) Right so I was able to capture some of the traffic this morning as I could see PAWS node 4 was sending ~2Gb/sec. It seems to be a steady stream of 512-byte UDP packets going...
[10:34:29] Tool-wikiqanda, Future-Audiences: Push the button for Slack launch - https://phabricator.wikimedia.org/T380786#10375061 (Aklapper)
[10:44:45] RECOVERY - Disk space on cloudvirt1047 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudvirt1047&var-datasource=eqiad+prometheus/ops
[10:45:34] RESOLVED: DiskSpace: Disk space cloudvirt1047:9100:/ 1.049% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudvirt1047 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[10:56:57] FIRING: [2x] SystemdUnitDown: The service unit prometheus-ethtool-exporter.service is in failed status on host cloudvirt1047. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1047 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:11:57] RESOLVED: [2x] SystemdUnitDown: The service unit prometheus-ethtool-exporter.service is in failed status on host cloudvirt1047. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudvirt1047 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:38:04] cloud-services-team, Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373 (fnegri) NEW
[11:46:51] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375228 (fnegri) Open→In progress p:Triage→High a:fnegri
[11:51:32] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375254 (cmooney) Separately I've merged these patches ([[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1100087 | one ]], [[ https://gerrit.wik...
[12:00:29] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375273 (cmooney) Seems to be working as expected: ` ip saddr @paws_workers ip protocol udp counter packets 24501532 bytes 3018694396 drop ` Looking a...
[12:07:02] dhinus opened https://github.com/toolforge/paws/pull/466
[12:09:18] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375325 (github-toolforge-bot) dhinus opened https://github.com/toolforge/paws/pull/466
[12:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:51:17] (open) sstefanova: functional tests: add components-api tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/631 (https://phabricator.wikimedia.org/T379092)
[12:51:22] (update) sstefanova: functional tests: add components-api tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/631 (https://phabricator.wikimedia.org/T379092)
[12:51:27] (update) sstefanova: metrics: add prometheus instrumentation [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/46 (https://phabricator.wikimedia.org/T381249)
[12:51:31] (update) sstefanova: cli: Improve deploy-token command UX and safety [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/6 (https://phabricator.wikimedia.org/T380706)
[12:54:17] cloud-services-team, Data-Services, DBA, Chinese-Sites: Prepare and check storage layer for arbcom_zhwiki - https://phabricator.wikimedia.org/T381086#10375512 (ABran-WMF) rebased with the [[ https://github.com/wikimedia/operations-software-spicerack/releases/tag/v9.0.0 | latest spicerack release...
[12:55:02] (open) na1307: Migrate to .NET [toolforge-repos/bluehillbotb] - https://gitlab.wikimedia.org/toolforge-repos/bluehillbotb/-/merge_requests/1 (https://phabricator.wikimedia.org/T381303)
[12:55:21] (approved) na1307: Migrate to .NET [toolforge-repos/bluehillbotb] - https://gitlab.wikimedia.org/toolforge-repos/bluehillbotb/-/merge_requests/1 (https://phabricator.wikimedia.org/T381303)
[12:55:38] (merge) na1307: Migrate to .NET [toolforge-repos/bluehillbotb] - https://gitlab.wikimedia.org/toolforge-repos/bluehillbotb/-/merge_requests/1 (https://phabricator.wikimedia.org/T381303)
[12:56:54] Tool-bluehillbotb: Migrate BluehillBot B from pywikibot to .NET - https://phabricator.wikimedia.org/T381303#10375523 (Bluehill395)
[12:56:56] Tool-bluehillbotb: Migrate BluehillBot B from pywikibot to .NET - https://phabricator.wikimedia.org/T381303#10375524 (Bluehill395) Open→Resolved
[13:34:36] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375682 (fnegri)
[13:53:12] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375730 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/466
[13:53:24] vivian-rook closed https://github.com/toolforge/paws/pull/466
[13:56:19] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375740 (aborrero) Thanks for working on this. You may be aware of this, but let me note for the record: PAWS virtual machines are dynamically created...
[14:02:32] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375763 (fnegri) @aborrero thanks! Yes the current filter in cloudgw is only meant as a temporary solution. My k8s patch was just merged by @rook, let...
[14:08:41] cloud-services-team, affects-Kiwix-and-openZIM: SystemdUnitDown kiwix-mirror-update.service - https://phabricator.wikimedia.org/T381212#10375802 (fnegri)
[14:08:52] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T381275#10375800 (fnegri) →Duplicate dup:T381212
[14:09:16] cloud-services-team, Toolforge: [toolsdb] Remove floating IP - https://phabricator.wikimedia.org/T381272#10375803 (fnegri) p:Triage→Medium
[14:11:57] (CR) Raymond Ndibe: "This is already being done @sstefanova@wikimedia.org . I'm also going to add it to `ToolforgeRunTests` though it doesn't seem like the cla" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[14:13:09] (CR) Raymond Ndibe: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[14:13:40] (PS5) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[14:19:55] (PS2) Raymond Ndibe: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225)
[14:20:58] (PS6) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[14:22:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:22:36] (CR) Raymond Ndibe: "Done" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[14:23:03] (CR) CI reject: [V:-1] [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[14:24:14] (CR) CI reject: [V:-1] [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[14:25:05] cloud-services-team: SystemdUnitDown The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://phabricator.wikimedia.org/T381143#10375847 (fnegri) Open→Resolved a:fnegri This alert self-resolved a few hours later. I cannot find t...
[14:27:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:30:31] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol2006-dev:9100 - https://phabricator.wikimedia.org/T380048#10375864 (fnegri) Open→Resolved a:fnegri Not sure what this was, but it's not firing anymore.
[14:32:58] (PS3) Raymond Ndibe: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225)
[14:33:27] cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T380067#10375880 (fnegri) Open→Resolved a:fnegri Novafullstack tests are working again,
[14:38:37] (CR) Raymond Ndibe: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[14:39:13] cloud-services-team, DC-Ops, ops-codfw: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10375904 (Andrew) This has been recurring for some time (e.g. T368211) so probably needs DC attention. @Jhancock.wm, it's OK to power down this...
[14:40:05] (PS7) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[14:41:07] cloud-services-team, DC-Ops, ops-codfw: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10375913 (fnegri) More previous occurrences: * {T368212} * {T370732}
[14:54:08] cloud-services-team, Cloud-VPS, WikiWho: Enable use of web proxy for wikiwho.net domain - https://phabricator.wikimedia.org/T376637#10375965 (Kkzo) I checked wikiwho.net domain today. It is now working and redirecting to https://wikiwho-api.wmcloud.org/. Could you confirm so that I can close the ti...
[15:08:01] cloud-services-team, Data-Services, DBA: Prepare and check storage layer for tigwiki - https://phabricator.wikimedia.org/T381378#10375986 (Marostegui) p:Triage→Medium Let us know when the wiki is created so we can sanitize it
[15:12:31] (CR) Slavina Stefanova: [C:+1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[15:13:36] (CR) Slavina Stefanova: [C:+1] "LGTM" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[15:13:51] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10375995 (fnegri) A new spike in packets was detected and successfully dropped by the new network policy! 🎉 {F57775288}
[15:23:13] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10376056 (amastilovic) >>! In T376267#10374252, @bd808 wrote: > Your "Aleksandar Mastilovic" account on Wikitech has been renamed to "AMastilovic-WMF". Please login to Wikitech with the old...
[15:26:17] cloud-services-team, DC-Ops, ops-codfw: PowerSupplyFailure Power Supply - Status - issue on cloudbackup2003:9290 - https://phabricator.wikimedia.org/T380479#10376063 (Jhancock.wm) Now that you mention it, I think this might be a PDU issue rather than a server issue. Looking back through the tickets w...
[15:54:24] Tool-wikiqanda, Future-Audiences: Add total bot responses to aggregate metrics logging - https://phabricator.wikimedia.org/T381402 (Maryana) NEW
[15:54:56] Tool-wikiqanda, Future-Audiences: Add total bot responses to aggregate metrics logging - https://phabricator.wikimedia.org/T381402#10376290 (Maryana)
[15:54:57] Tool-wikiqanda, Future-Audiences, Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10376291 (Maryana)
[15:55:13] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10376292 (fnegri) p:Triage→High
[15:55:35] cloud-services-team, Cloud-VPS: [horizon] Floating IP pointing to Neutron VIP is not displayed - https://phabricator.wikimedia.org/T381021#10376293 (fnegri) p:Triage→Medium
[15:57:02] cloud-services-team, Cloud-VPS: [tofu] [designate] [pdns] Swapping a CNAME and an A record can cause a loop - https://phabricator.wikimedia.org/T381180#10376284 (fnegri) p:Triage→Medium @Andrew suggested this might be expected because pdns will cache the old record for the duration of the TTL, wh...
[15:57:23] Tool-wikiqanda, Future-Audiences, Spike: [Spike] Track original queries reacted to by bot - https://phabricator.wikimedia.org/T381403 (Maryana) NEW
[15:57:49] Tool-wikiqanda, Future-Audiences, Spike: [Spike] Track original queries reacted to by bot - https://phabricator.wikimedia.org/T381403#10376311 (Maryana)
[15:57:53] Tool-wikiqanda, Future-Audiences, Epic: [Epic] Discord Q&A bot Milestone 2 - https://phabricator.wikimedia.org/T378121#10376312 (Maryana)
[15:58:58] Tool-wikiqanda, Future-Audiences: Add total bot responses to aggregate metrics logging - https://phabricator.wikimedia.org/T381402#10376327 (etz) a:etz
[15:59:08] Tool-wikiqanda, Future-Audiences, Spike: [Spike] Track original queries reacted to by bot - https://phabricator.wikimedia.org/T381403#10376328 (etz) a:etz
[16:02:37] cloud-services-team, Cloud-VPS: ProbeDown Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T380692#10376340 (fnegri) p:Triage→Medium This probe is showing a few blips in the past week. Maybe re...
[16:02:42] cloud-services-team, Cloud-VPS: ProbeDown Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T380692#10376345 (fnegri)
[16:03:26] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10376336
[16:21:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:26:39] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:31:19] cloud-services-team, Cloud-VPS: ProbeDown Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T380692#10376493 (fnegri)
[16:31:27] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10376494 (fnegri)
[16:31:39] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:32:33] cloud-services-team, Cloud-VPS: ProbeDown Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T380692#10376496 (cmooney) >>! In T380692#10376340, @fnegri wrote: > This probe is showing a few blips in the...
[16:33:45] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:38:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:38:48] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10376513 (taavi) AFAIK NTP traffic should be originating from the nodes (not user-controlled pods) and should stay within Cloud VPS so 123/udp could be dropped from both filters.
[16:39:37] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: Check number of thanks stats in year in review tool - https://phabricator.wikimedia.org/T381413 (Kuldeepburjbhalaike) NEW
[16:40:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:40:57] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, PAWS: Restrict outbound connectivity from PAWS hosts - https://phabricator.wikimedia.org/T381373#10376516 (taavi) AFAIK NTP traffic should be originating from the nodes (not user-controlled pods) and should stay within Cloud VPS so 123/udp could be...
[16:45:39] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:48:44] (CR) Raymond Ndibe: [C:+2] [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[16:50:39] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:53:41] (Merged) jenkins-bot: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[16:55:39] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:00:23] (CR) Raymond Ndibe: [C:+2] [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[17:00:36] cloud-services-team, Cloud-VPS: Audit WMCS compute capacity - https://phabricator.wikimedia.org/T380099#10376693 (Andrew)
[17:04:16] (Merged) jenkins-bot: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[17:08:21] cloud-services-team, Cloud-VPS, Catalyst: Future catalyst cloud-vps usage - https://phabricator.wikimedia.org/T381418 (Andrew) NEW
[17:12:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[17:13:32] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[17:15:04] cloud-services-team, Cloud-VPS, QTE-TestingOverview, GitLab (CI & Job Runners): Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419 (Andrew) NEW
[17:18:33] cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Future growth of deployment-prep? - https://phabricator.wikimedia.org/T381420 (Andrew) NEW
[17:20:43] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[17:22:06] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[17:22:58] cloud-services-team, Cloud-VPS, QTE-TestingOverview, GitLab (CI & Job Runners): Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419#10376765 (bd808) A couple of months ago I grabbed a larger quota for the Integration project with {T376847}. I have not yet built ou...
[17:28:01] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, QTE-TestingOverview, GitLab (CI & Job Runners): Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419#10376792 (hashar)
[17:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:37:12] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[17:38:08] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, QTE-TestingOverview, GitLab (CI & Job Runners): Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419#10376813 (hashar) Some time ago, @aborrero offered to have the Jenkins instances (`integr...
[17:38:33] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[17:47:09] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10376856 (Sarai-WMF)
[17:51:32] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, QTE-TestingOverview, GitLab (CI & Job Runners): Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419#10376872 (Andrew) >>! In T381419#10376812, @hashar wrote: > > We have 18 integration-age...
[17:52:25] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[17:54:14] cloud-services-team, Cloud-VPS, collaboration-services, Continuous-Integration-Infrastructure, and 2 others: Future testing-infra growth on cloud-vps - https://phabricator.wikimedia.org/T381419#10376883 (Dzahn)
[17:55:20] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[17:58:13] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10376906
[17:59:51] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:01:14] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:03:02] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:04:14] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:06:46] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:07:08] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:07:20] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:07:34] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:07:59] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:09:16] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:09:21] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:09:36] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:10:04] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:10:20] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:14:01] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:14:17] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:15:40] cloud-services-team, Cloud-VPS, Catalyst: Future catalyst cloud-vps usage - https://phabricator.wikimedia.org/T381418#10376993 (matmarex) I'm not involved in Catalyst / Patchdemo work these days. (btw you've fallen victim to Phabricator's poor autocompletion – see T353937)
[18:20:09] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[18:23:16] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:23:36] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:23:42] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:23:57] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:25:04] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[18:25:26] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[18:28:56] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[18:30:52] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI:
Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10377119 (Sarai-WMF) Central notice banner request: https://meta.wikimedia.org/wiki/CentralNotice/Request#Toolforge_Newcomer_Experi...
[18:55:13] wikitech.wikimedia.org, Content-Transform-Team, Parsoid: Parsoid renders "Incident status" (wikitech) incorrectly - https://phabricator.wikimedia.org/T380899#10377217 (andrea.denisse)
[19:11:11] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:11:30] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:19:45] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:20:03] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:25:14] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup...
- https://phabricator.wikimedia.org/T374830#10377366
[19:36:18] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[19:36:43] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:37:01] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:37:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:37:25] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:37:52] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10377432
[19:44:13] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup...
- https://phabricator.wikimedia.org/T374830#10377455
[19:47:56] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:48:12] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:52:14] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:52:32] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[19:55:20] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[19:57:43] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[19:58:01] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[20:21:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:29:09] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[20:31:16] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[20:31:41] RESOLVED: CloudVPSDesignateLeaks:
Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:33:12] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10377592
[20:40:19] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[20:41:42] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[20:44:06] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[20:45:21] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[20:56:40] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10377666 (Andrew) a:Andrew→ssingh I've checked all the resolv.confs and they all look fine. I'm passing this task over to...
[20:58:56] cloud-services-team, Cloud-VPS: openstack: fix missing prometheus metrics - https://phabricator.wikimedia.org/T373878#10377670 (Andrew) I've replaced a lot of these metrics, but maybe not all of them. @aborrero can you tell me how to reproduce the panel in the screenshot?
[21:00:34] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[21:01:27] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:02:00] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api
[21:02:03] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:02:13] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:02:36] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:02:46] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:05:32] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:06:56] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:09:12] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[21:09:29] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:10:02] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api
[21:10:06] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy
for component builds-api
[21:10:17] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:10:41] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:11:53] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:12:41] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install cloudcontrol1011 - https://phabricator.wikimedia.org/T380499#10377712 (Andrew) a:Andrew→None
[21:13:18] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[21:13:53] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup...
- https://phabricator.wikimedia.org/T374830#10377717
[21:14:06] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:14:33] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api
[21:14:36] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:14:48] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:14:57] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:15:08] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:15:21] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api
[21:15:52] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
[21:17:14] (approved) raymond-ndibe: [maintain-harbor] avoid graphql query complexity error [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/39 (https://phabricator.wikimedia.org/T358225)
[21:17:21] (merge) raymond-ndibe: [maintain-harbor] avoid graphql query complexity error [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/39 (https://phabricator.wikimedia.org/T358225)
[21:19:46] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: maintain-harbor: bump to 0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624
[21:21:11] (update)
raymond-ndibe: maintain-harbor: bump to 0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] (toolforge-deploy-bug-fixes) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:21:19] (update) raymond-ndibe: maintain-harbor: bump to 0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] (toolforge-deploy-bug-fixes) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:23:19] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[21:24:43] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor
[21:25:25] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[21:26:40] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor
[21:36:18] cloud-services-team, Cloud-VPS, WikiWho: Enable use of web proxy for wikiwho.net domain - https://phabricator.wikimedia.org/T376637#10377792 (MusikAnimal) >>! In T376637#10375965, @Kkzo wrote: > I checked wikiwho.net domain today. It is now working and redirecting to https://wikiwho-api.wmcloud.org/....
[21:36:57] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[21:38:07] (update) raymond-ndibe: maintain-harbor: bump to 0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] (toolforge-deploy-bug-fixes) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:38:32] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[21:45:27] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[21:46:23] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[21:51:59] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[21:53:13] (approved) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[21:53:18] (merge) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[21:53:19] (update) raymond-ndibe: maintain-harbor: bump to 0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:53:30] (approved) raymond-ndibe: maintain-harbor: bump to
0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:53:35] (merge) raymond-ndibe: maintain-harbor: bump to 0.0.20-20241129041008-d8e4a15f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/624 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:54:34] (update) raymond-ndibe: components-api: bump to 0.0.71-20241129083321-0a425581 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/629 (https://phabricator.wikimedia.org/T380706) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:55:30] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component main
[21:55:35] !log raymond-ndibe@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component main
[21:56:02] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[22:03:06] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[22:04:48] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
[22:11:42] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
[22:19:49] cloud-services-team, Cloud-VPS, WikiWho: Enable use of web proxy for wikiwho.net domain - https://phabricator.wikimedia.org/T376637#10377877 (Kkzo) @MusikAnimal, you are right; I got confused because last time, the browser was automatically assigning the HTTPS automatically. We no longer maintain th...
[22:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:35:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks