[01:10:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [03:12:10] FIRING: ProjectProxyMainProxyDown: Proxy on proxy-04 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown [04:00:44] 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 13Patch-For-Review, 07Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10351489 (10fnegri) 05Resolved→03Open p:05Triage→03High This has just caused a WMCS proxy outage, beca... [04:15:40] RESOLVED: ProjectProxyMainProxyDown: Proxy on proxy-04 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown [04:51:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [05:01:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [05:10:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [07:21:41] 
FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [07:31:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:59:00] (03merge) 10dcaro: deployment: cleanup old deployment on deployment creation [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/39 (https://phabricator.wikimedia.org/T380283) [09:02:54] (03open) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: components-api: bump to 0.0.69-20241125085913-80897cab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/619 (https://phabricator.wikimedia.org/T380283) [09:10:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [09:16:50] 10Tools: versions.toolforge.org is down - https://phabricator.wikimedia.org/T380703 (10Urbanecm_WMF) 03NEW [09:19:26] (03open) 10sstefanova: toolforge client: fix wrong user agent [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/69 [09:20:28] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api 
[09:21:24] (03update) 10sstefanova: cli: add deploy-token subcommands [repos/cloud/toolforge/components-cli] (slavina/add-config-subcommands) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/3 (https://phabricator.wikimedia.org/T379091) [09:21:37] (03update) 10sstefanova: cli: add config subcommands [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/2 (https://phabricator.wikimedia.org/T379091) [09:21:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [09:26:55] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [09:31:28] (03approved) 10dcaro: components-api: bump to 0.0.69-20241125085913-80897cab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/619 (https://phabricator.wikimedia.org/T380283) (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [09:31:32] (03merge) 10dcaro: components-api: bump to 0.0.69-20241125085913-80897cab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/619 (https://phabricator.wikimedia.org/T380283) (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [09:31:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [09:32:26] 10Toolforge (Toolforge iteration 16): [components-api] 
Limit the amount of deployments to (say) 25 - https://phabricator.wikimedia.org/T380283#10351796 (10dcaro) 05In progress→03Resolved [09:33:59] 10Toolforge (Toolforge iteration 16): [components-api] deploy-token: separate create from update - https://phabricator.wikimedia.org/T380706 (10Slst2020) 03NEW [09:43:40] (03update) 10dcaro: deployment list response: update format [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/42 (owner: 10sstefanova) [09:43:56] (03approved) 10dcaro: deployment list response: update format [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/42 (owner: 10sstefanova) [09:46:54] (03update) 10sstefanova: deployment list response: update format [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/42 [09:46:58] (03merge) 10sstefanova: deployment list response: update format [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/42 [09:47:40] 10Tools: versions.toolforge.org is down - https://phabricator.wikimedia.org/T380703#10351951 (10Lucas_Werkmeister_WMDE) The `error.log` shows various errors, but the “root” errors seem to be: ` 2024-11-25 09:17:21: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning: file_get_contents(https://noc.wikimedia.org/con... 
[09:48:58] (03open) 10group_203_bot_4866fc124f4b41659f667468a6115cf3: components-api: bump to 0.0.70-20241125094709-8e874f4a [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/620 [09:54:40] 10Toolforge (Toolforge iteration 16): [components-api] deploy-token: separate create from update - https://phabricator.wikimedia.org/T380706#10351961 (10Slst2020) a:03Slst2020 [10:01:35] (03approved) 10dcaro: components-api: bump to 0.0.70-20241125094709-8e874f4a [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/620 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [10:04:08] (03approved) 10dcaro: toolforge client: fix wrong user agent [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/69 (owner: 10sstefanova) [10:07:57] (03update) 10dcaro: [toolforge-deploy] deploy maintain-harbor [repos/cloud/toolforge/toolforge-deploy] (admin_and_tools_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/563 (https://phabricator.wikimedia.org/T358225) (owner: 10raymond-ndibe) [10:13:17] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10352028 (10Nikerabbit)... 
[10:14:10] (03update) 10sstefanova: toolforge client: fix wrong user agent [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/69 [10:14:13] (03merge) 10sstefanova: toolforge client: fix wrong user agent [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/69 [10:21:20] 10Cloud-Services, 06DC-Ops, 06Infrastructure-Foundations, 10netops, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10352059 (10cmooney) Link has been clean since the optic was replaced: {F57745141 width=600} I'll sug... [10:24:40] (03update) 10aborrero: eqiad1: add IPv6 support [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/132 (https://phabricator.wikimedia.org/T380174) [10:25:26] (03update) 10aborrero: eqiad1: add IPv6 support [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/132 (https://phabricator.wikimedia.org/T380174) [10:28:57] (03merge) 10aborrero: eqiad1: add IPv6 support [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/132 (https://phabricator.wikimedia.org/T380174) [10:28:58] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [10:29:12] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10352082 (10Nikerabbit) 05R... 
[10:29:18] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch [10:31:14] (03open) 10aborrero: eqiad1: network: use segmentation id 10 [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/137 [10:32:12] (03open) 10sstefanova: deploy-token: split create/refresh into POST/PUT endpoints [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/43 (https://phabricator.wikimedia.org/T380706) [10:32:20] (03merge) 10aborrero: eqiad1: network: use segmentation id 10 [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/137 [10:32:26] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [10:33:56] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch [10:35:16] (03update) 10sstefanova: components-api: bump to 0.0.70-20241125094709-8e874f4a [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/620 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [10:35:33] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [10:38:00] (03open) 10aborrero: eqiad1: subnets: fix allocation pools [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/138 [10:39:23] (03merge) 10aborrero: eqiad1: subnets: fix allocation pools [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/138 [10:39:26] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [10:39:39] 
10Tools: versions.toolforge.org is down - https://phabricator.wikimedia.org/T380703#10352130 (10Lucas_Werkmeister_WMDE) Doesn’t seem to have helped :/ [10:39:48] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10352133 (10hashar) Those fa... [10:40:12] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [10:40:14] 10Tools: versions.toolforge.org is down - https://phabricator.wikimedia.org/T380703#10352136 (10Reedy) I believe there's some DNS/resolver issues in cloud ongoing [10:40:16] !log sstefanova@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api [10:42:11] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [10:42:36] !log sstefanova@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api [10:43:22] 10Tools: versions.toolforge.org is down - https://phabricator.wikimedia.org/T380703#10352153 (10LucasWerkmeister) (Switching hats, sorry.) I can definitely reproduce the issue within the running webservice’s pod: `lang=shell-session tools.versions@tools-bastion-13:~$ kubectl get pods NAME... [10:45:17] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure), 10Release-Engineering-Team (Seen): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830#10352159 (10Lucas_Werkmeister... 
[10:51:12] FIRING: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown [10:52:11] FIRING: HarborComponentDown: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown [10:52:12] FIRING: HarborDown: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown [10:52:51] (03merge) 10aborrero: eqiad1: revert IPv6 network changes [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/139 (https://phabricator.wikimedia.org/T380174) [10:53:05] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [11:11:44] 10Toolforge (Toolforge iteration 16), 13Patch-For-Review: [components-api] deploy-token: separate create from update - https://phabricator.wikimedia.org/T380706#10352295 (10dcaro) p:05Triage→03Medium [11:16:22] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [11:23:37] 10Tools: versions.toolforge.org is down - https://phabricator.wikimedia.org/T380703#10352342 (10Urbanecm_WMF) Thanks! Seems to be working now. [11:31:24] 10Cloud-Services, 06DC-Ops, 06Infrastructure-Foundations, 10netops, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10352382 (10cmooney) Ok the BGP downpref policy has been reverted, and we have routed traffic back runn... 
[11:47:02] 06cloud-services-team, 10Cloud-VPS, 07IPv6: openstack: network problems when introducing new dualstack and ipv4-only networks - https://phabricator.wikimedia.org/T380728 (10aborrero) 03NEW [11:49:30] 06cloud-services-team, 10Cloud-VPS, 07IPv6: openstack: network problems when introducing new dualstack and ipv4-only networks - https://phabricator.wikimedia.org/T380728#10352435 (10aborrero) 05Open→03In progress p:05Triage→03Medium [11:50:17] 06cloud-services-team, 10Cloud-VPS, 07IPv6: openstack: network problems when introducing new networks - https://phabricator.wikimedia.org/T380728#10352453 (10aborrero) [12:03:59] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS, 13Patch-For-Review: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283#10352499 (10fnegri) a:03fnegri [12:09:40] (03open) 10aborrero: codfw1dev: tools-codfw1dev: manage default security group [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/140 (https://phabricator.wikimedia.org/T380728) [12:11:16] (03merge) 10aborrero: codfw1dev: tools-codfw1dev: manage default security group [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/140 (https://phabricator.wikimedia.org/T380728) [12:11:25] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [12:11:55] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [12:24:26] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [12:29:08] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [12:29:17] 10Toolforge (Toolforge iteration 16), 13Patch-For-Review: [components-api] deploy-token: separate create from 
update - https://phabricator.wikimedia.org/T380706#10352592 (10Slst2020) 05Open→03In progress [12:30:32] (03merge) 10sstefanova: components-api: bump to 0.0.70-20241125094709-8e874f4a [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/620 (owner: 10group_203_bot_4866fc124f4b41659f667468a6115cf3) [12:31:01] 10Toolforge (Toolforge iteration 16): [components-api] Add feature flag to disable user endpoints for deployment in tools - https://phabricator.wikimedia.org/T378500#10352627 (10Slst2020) Is this still needed? The 1.28 upgrade went ahead as planned. [12:31:04] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: Migrate "db.svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380491#10352630 (10fnegri) Successfully migrated with the following commands: ` fnegri@cloudcontrol1005:~$ sudo wmcs-openstack zone transfer request creat... [12:32:49] 10PAWS: openrefine in PAWS fails silently to upload new WD item - https://phabricator.wikimedia.org/T380737 (10So9q) 03NEW [12:34:22] 10PAWS: openrefine in PAWS fails silently to upload new WD item - https://phabricator.wikimedia.org/T380737#10352671 (10So9q) [12:37:08] (03open) 10fnegri: Import zone db.svc.eqiad.wmflabs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/141 (https://phabricator.wikimedia.org/T380491) [12:38:02] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS, 13Patch-For-Review: Migrate "db.svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380491#10352672 (10fnegri) 05In progress→03Resolved [12:39:21] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS, 13Patch-For-Review: Migrate "db.svc.eqiad.wmflabs." DNS zone to cloudinfra project - https://phabricator.wikimedia.org/T380491#10352677 (10fnegri) 05Resolved→03In progress Re-opening as I also want to import them to tofu-infra. 
[12:40:27] (03update) 10fnegri: Import zone db.svc.eqiad.wmflabs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/141 (https://phabricator.wikimedia.org/T380491) [12:41:26] (03approved) 10aborrero: Import zone db.svc.eqiad.wmflabs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/141 (https://phabricator.wikimedia.org/T380491) (owner: 10fnegri) [12:42:48] (03merge) 10fnegri: Import zone db.svc.eqiad.wmflabs [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/141 (https://phabricator.wikimedia.org/T380491) [12:46:11] (03open) 10sstefanova: d/changelog: bump to 0.0.12 [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/70 [12:52:09] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli [12:56:36] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [12:57:05] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli [12:57:07] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [12:59:17] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli [13:01:41] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Toolforge (Toolforge iteration 16), 05Goal: [toolsdb] Upgrade to MariaDB 10.6 - https://phabricator.wikimedia.org/T352206#10352778 (10fnegri) [13:04:25] (03open) 10fnegri: Move toolsdb primary host to tools-db-4 [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/142 [13:05:43] !log sstefanova@cloudcumin1001 tools 
END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli [13:06:35] (03update) 10sstefanova: d/changelog: bump to 0.0.12 [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/70 [13:10:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [13:10:33] (03approved) 10sstefanova: d/changelog: bump to 0.0.12 [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/70 [13:10:37] (03merge) 10sstefanova: d/changelog: bump to 0.0.12 [repos/cloud/toolforge/envvars-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/70 [13:17:07] 10Tools, 10Wikidata, 10Wikidata Integration in Wikimedia projects: Listeria no longer translates property numbers to labels in the local language for Wikidatalist headings - https://phabricator.wikimedia.org/T247047#10352832 (10Ifeatu_Nnaobi_WMDE) [13:28:53] (03merge) 10fnegri: Move toolsdb primary host to tools-db-4 [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/142 [13:29:02] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [13:30:48] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [13:32:33] 06cloud-services-team, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Kernel error Server an-redacteddb1001 may have kernel errors - https://phabricator.wikimedia.org/T379571#10352935 (10Gehel) [13:35:28] 06cloud-services-team, 10Data-Platform-SRE 
(2024.11.09 - 2024.11.29): Kernel error Server an-redacteddb1001 may have kernel errors - https://phabricator.wikimedia.org/T379571#10352982 (10BTullis) a:03BTullis [13:35:35] 06cloud-services-team, 10Toolforge: add on-wiki edits of toolforge tools to toolviews report - https://phabricator.wikimedia.org/T317953#10352987 (10Raymond_Ndibe) a:03Raymond_Ndibe [13:39:53] 06cloud-services-team, 10Data-Platform-SRE (2024.11.09 - 2024.11.29): Kernel error Server an-redacteddb1001 may have kernel errors - https://phabricator.wikimedia.org/T379571#10353007 (10BTullis) 05Open→03Resolved I have removed the remaining resources manually. Some had been removed anyway by `purge =... [13:40:31] FIRING: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState [13:41:29] 10PAWS: openrefine in PAWS fails silently to upload new WD item - https://phabricator.wikimedia.org/T380737#10353019 (10rook) @Spinster any thoughts on this? [13:41:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-3 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [13:41:56] FIRING: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1005. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [13:43:25] 06cloud-services-team, 10Cloud-VPS, 07IPv6: openstack: network problems when introducing new networks - https://phabricator.wikimedia.org/T380728#10353028 (10aborrero) I detected a few inconsistencies in the network testing scripts, I will fix them. Among others, I will use the `vlanX120.cloudgwYYYY. !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [13:48:36] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [13:55:13] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch [13:55:36] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch [14:00:00] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [14:01:25] 10PAWS: openrefine in PAWS fails silently to upload new WD item - https://phabricator.wikimedia.org/T380737#10353107 (10Spinster) >>! In T380737#10353018, @rook wrote: > @Spinster any thoughts on this? I just tested the exact above process with OpenRefine on PAWS and I can successfully create an item (i.e. I ca... 
[14:03:14] 06cloud-services-team, 10Cloud-VPS, 07IPv6: dns: add PTR support for 2a02:ec80:a000:: - https://phabricator.wikimedia.org/T380746 (10aborrero) 03NEW [14:06:31] FIRING: [2x] ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:06:56] RESOLVED: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [14:11:31] RESOLVED: [2x] ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:15:31] RESOLVED: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState [14:16:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-3 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:21:31] RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-3 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:33:05] (03open) 10fnegri: 
Update tools-db DNS import id [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/143 (https://phabricator.wikimedia.org/T352206) [14:40:36] (03merge) 10fnegri: Update tools-db DNS import id [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/143 (https://phabricator.wikimedia.org/T352206) [14:40:46] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [14:41:41] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [14:46:53] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch [14:47:15] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch [14:49:26] (03open) 10fnegri: tools-db DNS: update comments and descriptions [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/144 (https://phabricator.wikimedia.org/T352206) [14:50:28] FIRING: [2x] InstanceDown: Project tools instance tools-db-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:50:41] FIRING: CloudVPSDesignateLeaks: Detected 10 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:55:28] RESOLVED: [2x] InstanceDown: Project tools instance tools-db-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:58:57] (03update) 10fnegri: Move toolsdb primary host to tools-db-4 [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/142 
(https://phabricator.wikimedia.org/T352206) [15:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 10 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [17:04:01] 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 13Patch-For-Review, 07Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10354356 (10Andrew) From Gerrit, @dcaro writes: > > Did a quick test, there's three functions we use to res... [17:04:06] 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 13Patch-For-Review, 07Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10354358 (10fnegri) a:05fnegri→03Andrew Assigning this task to @Andrew as he's currently working on a patch. [17:13:06] 10Toolforge (Toolforge iteration 16): [components-api] Add feature flag to disable user endpoints for deployment in tools - https://phabricator.wikimedia.org/T378500#10354382 (10dcaro) 05Open→03Resolved I think it's not needed anymore yep [17:30:36] 10Tool-wikiqanda, 06Future-Audiences: Output UX for internal release - https://phabricator.wikimedia.org/T380098#10354452 (10Maryana) [18:01:12] 10Tool-wikiqanda, 06Future-Audiences: Output UX for internal release - https://phabricator.wikimedia.org/T380098#10354623 (10Maryana) 05Open→03Resolved a:03Maryana [18:02:14] 10Tool-video-answer-tool, 06Future-Audiences: Change default audio output to 1.25x - https://phabricator.wikimedia.org/T379307#10354628 (10Maryana) 05Open→03Resolved a:03Maryana [18:03:26] 10Tool-wikiqanda, 06Future-Audiences, 07Epic: Email ITS - https://phabricator.wikimedia.org/T379792#10354634 (10Maryana) 05Open→03Resolved [18:04:53] 10Tool-wikiqanda, 06Future-Audiences: Slack version of bad response logging - 
https://phabricator.wikimedia.org/T380216#10354660 (etz) a:etz
[18:17:46] Tool-wikiqanda, Future-Audiences: Add Slack Support to Bot - https://phabricator.wikimedia.org/T379786#10354707 (Maryana) Still needed: getting scope down to what we want in terms of what's being tracked Will be written to CSV file for Slack & Discord separately for reactions. To discuss later today: ho...
[18:18:04] Tool-wikiqanda, Future-Audiences: Add Slack Support to Bot - https://phabricator.wikimedia.org/T379786#10354732 (Maryana) Deprioritize Discord-related work for next couple of weeks
[18:22:42] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, Patch-For-Review, Puppet: Puppet removed "nameserver" line from /etc/resolv.conf - https://phabricator.wikimedia.org/T379927#10354753 (Andrew) Nameserver is missing from the following hosts: cn-staging-1.centralnotice-staging.eqiad1...
[19:17:51] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10355015 (Khantstop) |**Wikitech account/LDAP:**| khantstop| |**SUL account**| khantstop| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |Y| |**I have visited [[ https://wiki...
[19:19:45] cloud-services-team, Cloud-VPS, Cumin, Infrastructure-Foundations: Revive the HostFile backend on cloudcuminXXXX - https://phabricator.wikimedia.org/T380789 (Andrew) NEW
[19:20:09] cloud-services-team, Cloud-VPS, Cumin, Infrastructure-Foundations: Revive the HostFile backend on cloudcuminXXXX - https://phabricator.wikimedia.org/T380789#10355031 (Andrew) p:Triage→Medium
[19:36:06] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10355074 (Reedy) ` reedy@deploy2002:~$ mwscript extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=labswiki "Khantstop" DEPRECATION WARNING: Maintenance scripts are moving to...
[21:21:47] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10355325 (Khantstop) @Reedy it works, I was able to login and change my password. Many thanks!
[21:24:41] (open) rook: Add pawsdev to codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/145
[21:25:37] PAWS: pawsdev in codfw1dev - https://phabricator.wikimedia.org/T380794 (rook) NEW
[21:26:12] (update) rook: Add pawsdev to codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/145 (https://phabricator.wikimedia.org/T380794)
[21:36:25] PAWS: pawsdev in codfw1dev - https://phabricator.wikimedia.org/T380794#10355389 (rook) https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/145
[22:19:11] Tool-wikiqanda, Future-Audiences: [Bug] Investigate issues from internal testing - https://phabricator.wikimedia.org/T380799 (Maryana) NEW
[22:22:12] Tool-wikiqanda, Future-Audiences: Flag incorrect answer (internal testing version) - https://phabricator.wikimedia.org/T378821#10355519 (Maryana) @etz from testing today: * Display total number of responses sent to users (in order to be able to approximate % of different react responses)
[23:09:19] Tool-wikiqanda, Future-Audiences: [Bug] Investigate issues from internal testing - https://phabricator.wikimedia.org/T380799#10355641 (Maryana)
[23:14:37] (CR) Andrew Bogott: Get openstack project list from keystone (1 comment) [labs/tools/stashbot] - https://gerrit.wikimedia.org/r/1093997 (https://phabricator.wikimedia.org/T379030) (owner: Andrew Bogott)
[23:15:41] (CR) Andrew Bogott: openstack: ensure_canary: Use new g4 flavor for canary instances (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1043149 (owner: Andrew Bogott)
[23:22:58] cloud-services-team,
DC-Ops, ops-eqiad, SRE: Repurpose 5 config B servers - https://phabricator.wikimedia.org/T380805 (Andrew) NEW