[00:10:56] FIRING: SystemdUnitDown: The service unit logrotate.service is in failed status on host cloudgw1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudgw1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:23:30] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:27:54] (update) raymond-ndibe: Draft: [maintain-harbor] add tests and configurations for new maintain-harbor jobs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/881 (https://phabricator.wikimedia.org/T360509)
[00:32:46] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[00:37:38] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[00:50:59] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[01:01:11] RESOLVED: SystemdUnitDown: The service unit logrotate.service is in failed status on host cloudgw1004.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudgw1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:11:11] FIRING: [2x] SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:33:31] (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[04:05:57] FIRING: [3x] SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:06:02] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T400225#11029878 (phaultfinder)
[04:10:56] FIRING: [4x] SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:11:07] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T400225#11029879 (phaultfinder)
[08:11:12] FIRING: [4x] SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:06:21] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334 (dcaro) NEW
[10:09:00] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11030433 (dcaro)
[10:19:47] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T400225#11030439 (dcaro)
[10:19:52] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11030440 (dcaro)
[10:31:21] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11030444 (dcaro) Doing this upgrade, the mons crashed, the error they shown was about using an old mon...
[10:43:08] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11030459 (dcaro) p:Triage→High
[10:58:09] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11030499 (dcaro) The client on 2004 keeps getting connection refused: ` 148677 connect(12, {sa_family=...
[11:19:44] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Large quota increase for zuul Cloud VPS project - https://phabricator.wikimedia.org/T400305#11030524 (taavi) Is the plan to gradually unprovision resources from the integration project and spin up equivalent resources in zu...
[11:54:49] cloud-services-team, Cloud-VPS: cloudgw1004 has a massive /var/log/conntrackd.log - https://phabricator.wikimedia.org/T400343 (taavi) NEW p:Triage→High
[11:59:38] cloud-services-team, Cloud-VPS: cloudgw1004 has a massive /var/log/conntrackd.log - https://phabricator.wikimedia.org/T400343#11030722 (taavi) Something happened at the start of the log file that made it grow massive on certain days: `lang=shell-session taavi@cloudgw1004 ~ $ sudo awk '{print $3}' /var/lo...
[12:18:53] cloud-services-team, Cloud-VPS: cloudgw1004 has a massive /var/log/conntrackd.log - https://phabricator.wikimedia.org/T400343#11030834 (taavi) Open→Resolved Truncated the file after having a brief look at the contents.
[12:28:09] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/42
[12:28:25] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/6
[12:30:08] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1172307 (owner: L10n-bot)
[12:30:11] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1172304 (owner: L10n-bot)
[12:30:11] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net.
[labs/tools/map-of-monuments] - https://gerrit.wikimedia.org/r/1172305 (owner: L10n-bot)
[12:30:12] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1172303 (owner: L10n-bot)
[12:35:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:45:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:50:56] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:55:48] FIRING: PuppetDisabled: Puppet disabled on cloudcontrol2005-dev:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=wmcs&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[12:55:55] cloud-services-team: PuppetDisabled Puppet disabled on cloudcontrol2005-dev:9100 - https://phabricator.wikimedia.org/T400356 (phaultfinder) NEW
[13:00:48] FIRING: PuppetDisabled: Puppet disabled on cloudcontrol2010-dev:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=wmcs&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[13:00:57] cloud-services-team: PuppetDisabled Puppet disabled on cloudcontrol2010-dev:9100 -
https://phabricator.wikimedia.org/T400357 (phaultfinder) NEW
[13:15:58] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11031085 (dcaro) I was able to get the mon working by disabling cephx on the config, and only setting...
[13:34:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for service: project,nova
[13:35:24] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for service: project,nova
[13:36:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:41:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[13:41:57] cloud-services-team (FY2025/26-Q1), Cloud-VPS, SRE-OnFire, Sustainability (Incident Followup): [ceph,codfw1dev] upgrade the hosts from pacific->quincy - https://phabricator.wikimedia.org/T400334#11031179 (dcaro) with this, I added a few of the config values back: ` root@cloudcephmon2004-dev:~# ce...
[13:46:03] RESOLVED: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[13:51:59] cloud-services-team (FY2025/26-Q1), Cloud-VPS: [cinder] Clean up unused linkwatcher volumes in "trove" project - https://phabricator.wikimedia.org/T400285#11031197 (fnegri) Reading more carefully the terminal session, I realized I //did try to delete// the snapshot that later vanished: ` fnegri@cloudcon...
[13:56:45] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11031228 (taavi) I did check the obvious thing, and verified that we're not exhausting conntrack tables on the K8s workers or similar? Were are those timestamps fr...
[13:58:33] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[14:03:33] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-32 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[14:35:24] cloud-services-team (FY2025/26-Q1), Cloud-VPS, linkwatcher: [cinder] Clean up unused linkwatcher volumes in "trove" project - https://phabricator.wikimedia.org/T400285#11031522 (fnegri) Open→Resolved I verified that both the volume `linkwatcher-db` and the instance with the same name in t...
[14:37:14] cloud-services-team (FY2025/26-Q1), Cloud-VPS, linkwatcher: [cinder,trove] Clean up unused linkwatcher instances and volumes - https://phabricator.wikimedia.org/T400285#11031531 (fnegri)
[14:41:14] Cloud-VPS (Project-requests): Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418#11031563 (fnegri) Stalled→In progress
[14:43:24] !log fnegri@cloudcumin1001 voterlists START - Cookbook wmcs.vps.create_project for project voterlists in eqiad1 (T399418)
[14:43:26] fnegri@cloudcumin1001: Unknown project "voterlists"
[14:43:26] T399418: Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418
[14:43:58] !log fnegri@cloudcumin1001 voterlists END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project voterlists in eqiad1 (T399418)
[14:43:58] fnegri@cloudcumin1001: Unknown project "voterlists"
[14:45:49] !log fnegri@cloudcumin1001 voterlists START - Cookbook wmcs.vps.create_project for project voterlists in eqiad1 (T399418)
[14:45:49] fnegri@cloudcumin1001: Unknown project "voterlists"
[14:46:03] Cloud-VPS (Project-requests): Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418#11031591 (fnegri) Creating with 32 cores and 64G of memory, which should be enough for 2 instances of type `g4.cores16.ram32.disk20`. ` fnegri@cloudcumin1001:~$ sudo cookbook wmcs.vps.create_...
[14:46:28] (open) group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49: projects: added project voterlists [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258 (https://phabricator.wikimedia.org/T399418)
[14:46:58] !log fnegri@cloudcumin1001 voterlists END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project voterlists in eqiad1 (T399418)
[14:46:58] fnegri@cloudcumin1001: Unknown project "voterlists"
[14:54:17] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[14:56:58] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258
[14:57:18] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258
[14:57:32] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258
[14:57:51] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258
[15:06:34] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Large quota increase for zuul Cloud VPS project - https://phabricator.wikimedia.org/T400305#11031704 (Andrew) Ceph looks to have plenty of space so this is a +1 from me.
[15:11:04] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Large quota increase for zuul Cloud VPS project - https://phabricator.wikimedia.org/T400305#11031728 (fnegri) +1
[15:31:00] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Large quota increase for zuul Cloud VPS project - https://phabricator.wikimedia.org/T400305#11031833 (bd808) >>! In T400305#11030524, @taavi wrote: > Is the plan to gradually unprovision resources from the integration proje...
[15:35:48] FIRING: PuppetDisabled: Puppet disabled on cloudcontrol2006-dev:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=wmcs&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[15:35:59] cloud-services-team: PuppetDisabled Puppet disabled on cloudcontrol2006-dev:9100 - https://phabricator.wikimedia.org/T400381 (phaultfinder) NEW
[15:39:49] (approved) dcaro: projects: added project voterlists [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258 (https://phabricator.wikimedia.org/T399418) (owner: group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49)
[15:41:23] (merge) fnegri: projects: added project voterlists [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/258 (https://phabricator.wikimedia.org/T399418) (owner: group_199_bot_333a6c67971a471aeb1cf0b14ccf9f49)
[15:42:01] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[15:44:15] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[15:44:59] !log fnegri@cloudcumin1001 voterlists START - Cookbook wmcs.vps.create_project for project voterlists in eqiad1 (T399418)
[15:44:59]
fnegri@cloudcumin1001: Unknown project "voterlists"
[15:45:00] T399418: Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418
[15:46:07] !log fnegri@cloudcumin1001 voterlists END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project voterlists in eqiad1 (T399418)
[15:46:07] fnegri@cloudcumin1001: Unknown project "voterlists"
[15:52:06] (PS1) FNegri: create_project: add option to skip tofu apply [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1172350
[15:53:18] (PS2) FNegri: create_project: add option to skip tofu apply [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1172350
[15:53:21] !log fnegri@cloudcumin1001 voterlists START - Cookbook wmcs.vps.create_project for project voterlists in eqiad1 (T399418)
[15:53:26] T399418: Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418
[15:54:07] !log fnegri@cloudcumin1001 voterlists END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for project voterlists in eqiad1 (T399418)
[15:54:55] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11031942 (Danilo) Yes, the timestamps are from IRC side. wmopbot and stashbot disconnect daily always between 3h41min and 3h43min. The last times were: 2025-07-18...
[15:55:01] Cloud-VPS (Project-requests), Patch-For-Review: Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418#11031943 (fnegri) In progress→Resolved
[15:57:21] cloud-services-team, Toolforge, SRE-OnFire, Sustainability (Incident Followup): [k8s,infra,o11y] Add paging alert when many tools are unreachable - https://phabricator.wikimedia.org/T399870#11031974 (fnegri)
[16:11:37] Toolforge (Toolforge iteration 22): [toolforge-deploy] account for warning messages printed to stderr - https://phabricator.wikimedia.org/T400390 (Raymond_Ndibe) NEW
[16:11:46] Toolforge (Toolforge iteration 22): [toolforge-deploy.tests] account for warning messages printed to stderr - https://phabricator.wikimedia.org/T400390#11032060 (Raymond_Ndibe)
[16:13:50] (open) raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 (https://phabricator.wikimedia.org/T400390)
[16:13:55] (update) raymond-ndibe: [tests] account for warning messages printed to stderr [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/883 (https://phabricator.wikimedia.org/T400390)
[17:36:47] Tool-erinnermich: [ErinnerMichBot] Query current page title before posting reminder - https://phabricator.wikimedia.org/T381563#11032387 (Tkarcher) In progress→Resolved Fixed in production.
[17:39:02] Tool-wosretbot: [WosretBot] Gazette stops working when Kurier section has no date in signature - https://phabricator.wikimedia.org/T393116#11032395 (Tkarcher) Open→Resolved a:Tkarcher Fixed in production.
[17:41:12] (update) vriaa: Draft: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1
[17:47:36] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[18:05:41] cloud-services-team, Toolforge: [k8s,infra] k8s control plane freezing and other stability issues - https://phabricator.wikimedia.org/T333922#11032429 (taavi) Open→Resolved
[18:06:52] cloud-services-team, Toolforge: Toolforge: potential improvements for labs/toollabs.git - https://phabricator.wikimedia.org/T279308#11032431 (taavi) Open→Resolved
[18:08:45] cloud-services-team, Toolforge, Kubernetes: toolforge: new k8s: evalute and test firewalling via calico - https://phabricator.wikimedia.org/T239406#11032435 (taavi) Stalled→Resolved We have some calico network policies in use today and we've seen them work. This task is not very specific...
[18:09:57] cloud-services-team, Toolforge, Upstream: Debian Stretch lighttpd does not allow overriding existing mimetype.assign values - https://phabricator.wikimedia.org/T215683#11032439 (taavi) Stalled→Invalid I don't see anything actionable here.
[18:12:05] cloud-services-team, Toolforge: [toolforge.infra] Automate getting the maintainers from the tool accounts/uids - https://phabricator.wikimedia.org/T114560#11032446 (taavi) Open→Invalid This can be done with a single `id` or `getent group` call so I don't see how to automate this more than that.
[18:16:11] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11032454 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm
[18:20:23] cloud-services-team, Toolforge: [jobs-api] add -l|--last to toolforge jobs logs ... - https://phabricator.wikimedia.org/T388088#11032500 (taavi) Open→Invalid `lang=shell-session taavi@tools-bastion-12:~ $ toolforge jobs logs --help | grep last -l LAST, --last LAST number of recent log line...
[18:24:25] Data-Services, Data-Engineering: Create a view for existencelinks table - https://phabricator.wikimedia.org/T394898#11032509 (taavi) a:Ladsgroup https://gerrit.wikimedia.org/r/c/operations/puppet/+/1163846 seems to have done this.
[18:24:30] Data-Services, Data-Engineering: Create a view for existencelinks table - https://phabricator.wikimedia.org/T394898#11032511 (taavi) Open→Resolved
[18:25:57] cloud-services-team, DBA, Chinese-Sites, Patch-For-Review: Prepare and check storage layer for arbcom_zhwiki - https://phabricator.wikimedia.org/T381086#11032516 (taavi) Unatgging #data-services, as this will be a private wiki.
[18:26:44] cloud-services-team, Data-Services: [wikireplicas] Automatically check for missing tables - https://phabricator.wikimedia.org/T378470#11032519 (taavi) Is https://gerrit.wikimedia.org/r/c/operations/puppet/+/1163846 good enough to declare this done, or do we still want a view of things marked as "partiall...
[18:27:55] cloud-services-team, Data-Services, Abstract Wikipedia team, Wikifunctions: WikiLambda tables are not replicated to cloud - https://phabricator.wikimedia.org/T372058#11032522 (taavi) Open→Resolved a:Ladsgroup Done via https://gerrit.wikimedia.org/r/c/operations/puppet/+/1163846.
[18:28:18] cloud-services-team, Data-Services: Create views for DiscussionTools items tables - https://phabricator.wikimedia.org/T374584#11032527 (taavi) Open→Resolved a:Ladsgroup Done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1163846.
[18:28:57] cloud-services-team, Data-Services: expose entityschema_id_counter table to cloud replica - https://phabricator.wikimedia.org/T345089#11032531 (taavi) Open→Resolved a:Ladsgroup Done via https://gerrit.wikimedia.org/r/c/operations/puppet/+/1163846.
[18:52:10] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11032596 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm executed with...
[18:54:17] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11032607 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm
[19:13:48] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/6 (owner: l10n-bot)
[19:13:51] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/6 (owner: l10n-bot)
[19:42:48] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11032733 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm executed with...
[20:32:34] (open) raymond-ndibe: [T400024] Allow protocol to be specified for ports [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[20:36:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for service: project,nova
[20:37:43] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for service: project,nova
[20:38:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services
[20:40:01] (update) raymond-ndibe: [T400024] Allow protocol to be specified for ports [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[20:42:09] (update) raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[20:42:19] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for all services
[20:42:38] (update) raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[20:42:55] (update) raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024)
[21:03:37] Data-Services, Data-Engineering: Create a view for existencelinks table - https://phabricator.wikimedia.org/T394898#11032974 (Tacsipacsi)
Indeed, thanks for noticing this! I’m still eagerly waiting for the API, but at least there is now a workaround for WMF production wikis.
[21:05:24] Tool-wosretbot: [WosretBot] Gazette stops working when subscribers become inactive - https://phabricator.wikimedia.org/T393114#11032981 (Tkarcher) Open→Resolved a:Tkarcher Fixed.
[21:06:18] (close) damian: [T400024] Allow protocol to be specified with port [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/113
[21:25:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-harbor-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[22:15:28] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11033199 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm
[23:04:17] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11033279 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm executed with...
[23:11:57] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install clouddb102[2-5] - https://phabricator.wikimedia.org/T393733#11033292 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host clouddb1022.eqiad.wmnet with OS bookworm
[23:39:29] FIRING: NfsAlmostFull: The NFS drive is over 85% capacity (currently 88.58%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull