[03:11:26] FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:41:33] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[04:11:26] FIRING: SystemdUnitDown: The service unit purge_vm_rbd_images.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:41:32] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 161335 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[06:07:26] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:07:36] cloud-services-team: SystemdUnitDown The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://phabricator.wikimedia.org/T386601 (phaultfinder) NEW
[10:07:26] FIRING: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:21:07] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10556481 (fnegri) Replication worked from 19:00 UTC on Friday until 03:40 UTC on Saturday, then it got stuck again: {F58411591} I didn't touch it ov...
[10:28:35] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10556499 (gmodena) > wkitech account renaming to your WMF account and attaching? > > That is me. Auth for Gmodena fails with `Verification failed.` after I provide an MFA token. > @Reedy...
[10:29:54] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10556500 (taavi) >>! In T376267#10556499, @gmodena wrote: > My user is now unable to edit or create pages on wikitech (meta works). Could it be related to SUL changes? Is your email addres...
[10:41:44] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10556529 (gmodena) >>! In T376267#10556500, @taavi wrote: >>>! In T376267#10556499, @gmodena wrote: >> My user is now unable to edit or create pages on wikitech (meta works). Could it be re...
[11:00:07] Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10556598 (M-J) p:Triage→Medium a:M-J hello, after some conversation with tkarcher in the background, here a possible roadmap: * add 2 more accounts RapelleMoiBot @ fr...
[11:19:58] cloud-services-team, Data-Services: Enable binlog on Wiki Replicas - https://phabricator.wikimedia.org/T386618#10556684 (Bugreporter)
[13:25:27] (merge) andrew: projects_eqiad1: remove andrewhooktest1 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/152 (https://phabricator.wikimedia.org/T386543)
[13:26:54] (update) andrew: tofu-infra: delete test projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/92 (owner: aborrero)
[13:34:49] cloud-services-team, Data-Services: Enable binlog on Wiki Replicas - https://phabricator.wikimedia.org/T386618#10557066 (fnegri) a:fnegri→None After some thinking, this is quite complicated: Wiki Replicas DBs contain //less// personal data, but they still contain //some//. We ensure that Toolforg...
[13:36:28] (update) andrew: tofu-infra: delete test projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/92 (owner: aborrero)
[13:37:06] (merge) andrew: tofu-infra: delete test projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/92 (owner: aborrero)
[13:37:33] cloud-services-team, Data-Services: Enable binlog on Wiki Replicas - https://phabricator.wikimedia.org/T386618#10557078 (taavi) Ihmo this is quite complicated and risky, and would be a rather large maintenance burden. I'd also hope that https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams_HTTP...
[14:07:26] RESOLVED: SystemdUnitDown: The systemd unit purge_vm_rbd_images.service on node cloudcontrol1005 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:18:02] cloud-services-team, Data-Services: Enable binlog on Wiki Replicas - https://phabricator.wikimedia.org/T386618#10557150 (fnegri) Open→Declined > I'd also hope that https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams_HTTP_Service covers most of the use cases. I agree this seems an...
[14:59:13] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10557268 (fnegri) Running `gdb` caused the replication to resume: ` fnegri@tools-db-5:~$ sudo gdb --batch --eval-command="set print frame-arguments a...
[15:07:31] FIRING: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1927) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError
[15:10:46] cloud-services-team, Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress), OKR-Work: If necessary, bump down quota for wikitextexp now that we've migrated from parsing-qa-02 -> ctt-qa-03 - https://phabricator.wikimedia.org/T386030#10557281 (MSantos)
[15:12:31] RESOLVED: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1927) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError
[16:02:54] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10557397 (fnegri) tools-db-5 is now running MariaDB 10.6.20. I temporarily enabled additional logging (`SET GLOBAL log_warnings = 3;`), let's see if r...
[16:05:08] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10557401 (fnegri)
[16:15:38] (update) raymond-ndibe: [toolforge-weld] add custom resources version to k8sclient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/51 (https://phabricator.wikimedia.org/T359650)
[16:22:41] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[16:36:29] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge, Patch-For-Review: [toolsdb] Remove apt pinning and upgrade to latest version - https://phabricator.wikimedia.org/T385885#10557459 (fnegri) I merged the patch and the pinning is now working correctly: unattended-upgrades is not upgrading the package...
[16:36:31] cloud-services-team, Cloud-VPS: petscan5 unresponsive - https://phabricator.wikimedia.org/T384642#10557460 (Magnus) Working on it.
[17:06:35] (update) raymond-ndibe: [toolforge-weld] add custom resources version to k8sclient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/51 (https://phabricator.wikimedia.org/T359650)
[17:06:38] (approved) raymond-ndibe: [toolforge-weld] add custom resources version to k8sclient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/51 (https://phabricator.wikimedia.org/T359650)
[17:06:54] (merge) raymond-ndibe: [toolforge-weld] add custom resources version to k8sclient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/51 (https://phabricator.wikimedia.org/T359650)
[17:17:46] (open) raymond-ndibe: [toolforge-weld] remove apply_object [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/73 (https://phabricator.wikimedia.org/T359804)
[17:18:04] (approved) raymond-ndibe: [toolforge-weld] remove apply_object [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/73 (https://phabricator.wikimedia.org/T359804)
[17:18:44] (merge) raymond-ndibe: [toolforge-weld] remove apply_object [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/73 (https://phabricator.wikimedia.org/T359804)
[17:23:36] (open) raymond-ndibe: d/changelog: bump to 1.6.7 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/74 (https://phabricator.wikimedia.org/T359650 https://phabricator.wikimedia.org/T359804)
[17:25:22] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
[17:31:58] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
[17:32:24] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
[17:36:41] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:40:50] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
[17:56:55] (approved) raymond-ndibe: d/changelog: bump to 1.6.7 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/74 (https://phabricator.wikimedia.org/T359650 https://phabricator.wikimedia.org/T359804)
[17:57:46] (update) raymond-ndibe: d/changelog: bump to 1.6.7 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/74 (https://phabricator.wikimedia.org/T359650 https://phabricator.wikimedia.org/T359804)
[17:57:51] (merge) raymond-ndibe: d/changelog: bump to 1.6.7 [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/74 (https://phabricator.wikimedia.org/T359650 https://phabricator.wikimedia.org/T359804)
[18:03:08] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[18:15:39] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[18:16:39] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[18:28:38] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[18:31:15] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/7 (owner: l10n-bot)
[18:31:18] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/7 (owner: l10n-bot)
[18:45:13] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[18:46:10] Tool-techcontribs: Add reporting on SAL entries - https://phabricator.wikimedia.org/T384113#10557685 (taavi) In addition to the Phab field, you probably also want to search for the shell name, as that's what cookbooks, `scap` and other semi-automated logs end up using. (Also it'd be awesome if the tool would...
[20:25:08] wikitech.wikimedia.org: Temporarily suppress SUL migration banner from Help:Toolforge pages on Wikitech - https://phabricator.wikimedia.org/T384534#10557822 (Pppery) Open→Declined This didn't happen.
[20:51:01] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 4385 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[21:31:41] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[21:36:41] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:16:41] RESOLVED: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:20:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-78 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:25:03] FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[22:30:03] FIRING: [5x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-27 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[23:34:12] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[23:37:47] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10558107 (Gryllida) |**Wikitech account/LDAP:**| Svetlana Tkachenko| |**SUL account**| Gryllida| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |N| |**I have visited [[ https...
[23:42:01] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)