[01:21:27] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:07:18] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:50:32] (PS1) Krinkle: write_config: Add design/codex-php to Codesearch "Libraries" index [labs/codesearch] - https://gerrit.wikimedia.org/r/1079381 (https://phabricator.wikimedia.org/T373708)
[03:52:16] (CR) Krinkle: Add design/* and discovery/* repo groups (1 comment) [labs/codesearch] - https://gerrit.wikimedia.org/r/828055 (owner: Chad)
[03:58:07] (CR) Krinkle: [C:+2] write_config: Add design/codex-php to Codesearch "Libraries" index [labs/codesearch] - https://gerrit.wikimedia.org/r/1079381 (https://phabricator.wikimedia.org/T373708) (owner: Krinkle)
[03:58:35] (Abandoned) Krinkle: New topic: Add a top level Wikimedia SRE tools topic [labs/codesearch] - https://gerrit.wikimedia.org/r/773781 (https://phabricator.wikimedia.org/T303434) (owner: Jbond)
[03:58:59] (Merged) jenkins-bot: write_config: Add design/codex-php to Codesearch "Libraries" index [labs/codesearch] - https://gerrit.wikimedia.org/r/1079381 (https://phabricator.wikimedia.org/T373708) (owner: Krinkle)
[04:22:05] Tool-toolwatch: Implementing alert system to notify maintainers of downtime - https://phabricator.wikimedia.org/T368816#10220476 (Reputation22) >>! In T368816#10219107, @MahimaSinghal wrote: >>>! In T368816#10210400, @Tacsipacsi wrote: >> Who will receive the notification emails? The //Author// column contai...
[04:37:31] (update) raymond-ndibe: Draft: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[05:21:27] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:43:35] (open) raymond-ndibe: [lima-kilo] refactor the project to suit a multi VM configuration [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/198
[05:45:04] (update) raymond-ndibe: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[06:00:41] (update) raymond-ndibe: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[06:07:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:08:54] (update) raymond-ndibe: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[06:09:41] (update) raymond-ndibe: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[06:12:21] (update) raymond-ndibe: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[06:12:53] (update) raymond-ndibe: [lima-kilo] cache container images [repos/cloud/toolforge/lima-kilo] (refactor_in_preparation_for_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/196
[07:05:04] Tool-toolwatch: Implementing alert system to notify maintainers of downtime - https://phabricator.wikimedia.org/T368816#10220532 (Tacsipacsi) >>! In T368816#10219466, @bd808 wrote: >>>! In T368816#10201603, @Gopavasanth wrote: >> The idea is to automatically send a reminder to the tool's maintainers at the a...
[08:11:18] Tool-toolwatch: Implementing alert system to notify maintainers of downtime - https://phabricator.wikimedia.org/T368816#10220619 (Reputation22) >>! In T368816#10220531, @Tacsipacsi wrote: >>>! In T368816#10219466, @bd808 wrote: >>>>! In T368816#10201603, @Gopavasanth wrote: >>> The idea is to automatically s...
[08:31:25] Tool-toolwatch: Implementing alert system to notify maintainers of downtime - https://phabricator.wikimedia.org/T368816#10220652 (Tacsipacsi) I see. I don’t feel that. Sorry for accusing you of arguing with a slippery slope; however, next time please be more explicit about your thinking to avoid such misunde...
[09:17:26] FIRING: SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephosd1020. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1020 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:21:27] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:22:26] FIRING: [2x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:23:41] FIRING: [3x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:27:26] FIRING: [5x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:28:41] FIRING: [6x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:33:41] FIRING: [8x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:37:26] FIRING: [11x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1002. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:42:26] FIRING: [15x] SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephmon1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:02:03] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure: Quota increase for Integration project (Jenkins CI runners) - https://phabricator.wikimedia.org/T376847#10220850 (aborrero) Open→In progress p:Triage→Medium a:Slst2020 @dcaro checked our resources. We can accommodate t...
[10:07:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:09:42] FIRING: SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1020 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1020 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:09:47] cloud-services-team: SystemdUnitDown Unit prometheus-node-kernel-panic.service on node cloudcephosd1020 has been down for long. - https://phabricator.wikimedia.org/T376989 (phaultfinder) NEW
[11:16:27] FIRING: [2x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:16:31] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990 (phaultfinder) NEW
[11:19:42] FIRING: [3x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:19:50] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221003 (phaultfinder)
[11:21:27] FIRING: [5x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:21:39] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221005 (phaultfinder)
[11:22:54] Grid-Engine-to-K8s-Migration, Tools, All-and-every-Wikisource: Migrate phetools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319965#10221007 (Xover) a:Xover→None Remove myself as assignee on this task since I won't have time to work on it any time soo...
[11:24:42] FIRING: [6x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:24:54] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221030 (phaultfinder)
[11:29:19] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, netops: keepalived: it doesn't support mixing IPv4 and IPv6 VIPs on the same VRRP instance - https://phabricator.wikimedia.org/T376879#10221052 (aborrero) In progress→Resolved >>! In T376879#10219564, @Multichill wrote: > Ipv...
[11:29:41] FIRING: [9x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:29:46] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221066 (phaultfinder)
[11:31:27] FIRING: [10x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:31:31] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221074 (phaultfinder)
[11:34:41] FIRING: [11x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:34:44] (approved) sstefanova: kubernetes_config: use the default namespace if non in kubeconfig [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/62 (owner: dcaro)
[11:34:48] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221080 (phaultfinder)
[11:34:49] (update) sstefanova: kubernetes_config: use the default namespace if non in kubeconfig [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/62 (owner: dcaro)
[11:36:27] FIRING: [15x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:36:36] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T376990#10221082 (phaultfinder)
[11:39:41] (open) sstefanova: deployment: add k8s storage support [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/24 (https://phabricator.wikimedia.org/T362069)
[11:44:05] (approved) sstefanova: api_client: when loading cert data from kubeconfig try base64 too [repos/cloud/toolforge/toolforge-weld] (add_default_namespace) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/63 (owner: dcaro)
[11:44:07] (update) sstefanova: api_client: when loading cert data from kubeconfig try base64 too [repos/cloud/toolforge/toolforge-weld] (add_default_namespace) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/63 (owner: dcaro)
[11:51:51] Tool-spacemedia: Detect duplicate images via perceptual hashes before upload - https://phabricator.wikimedia.org/T251026#10221134 (Don-vip) Open→Resolved
[12:11:34] VPS-project-Codesearch, VPS-project-Extdist, collaboration-services, Gerrit, Patch-For-Review: Move clients off of gerrit-replica.wikimedia.org back to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T336710#10221188 (hashar) Open→Resolved a:hashar From the Apache acc...
[12:28:47] Cloud-VPS (Quota-requests): Request floating IP for wikiwho project - https://phabricator.wikimedia.org/T376637#10221236 (aborrero) I don't think it makes sense to allocate a floating IP for this. The FQDN `wikiwho.wmcloud.org` is proxied from the IP address `185.15.56.49`, see also https://wikitech.wikimed...
[12:35:45] (update) sstefanova: deployment: add k8s storage support [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/24 (https://phabricator.wikimedia.org/T362069)
[12:53:46] (update) sstefanova: deployment: add k8s storage support [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/24 (https://phabricator.wikimedia.org/T362069)
[13:09:57] (update) sstefanova: deployment: add k8s storage support [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/24 (https://phabricator.wikimedia.org/T362069)
[13:18:06] (update) sstefanova: deployment: add k8s storage support [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/24 (https://phabricator.wikimedia.org/T362069)
[13:21:27] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:27:26] RESOLVED: SystemdUnitDown: The service unit prometheus-node-kernel-panic.service is in failed status on host cloudcephosd1016. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1016 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:39:09] !log sstefanova@cloudcumin1001 integration START - Cookbook wmcs.openstack.quota_increase (T376847)
[13:39:13] T376847: Quota increase for Integration project (Jenkins CI runners) - https://phabricator.wikimedia.org/T376847
[13:39:17] !log sstefanova@cloudcumin1001 integration END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T376847)
[13:42:35] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure: Quota increase for Integration project (Jenkins CI runners) - https://phabricator.wikimedia.org/T376847#10221426 (Slst2020) In progress→Resolved Done!
[14:06:27] RESOLVED: [4x] SystemdUnitDown: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:07:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:07:36] (open) kbach: Add #wikimedia-techdocs [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/8 (https://phabricator.wikimedia.org/T375854)
[14:31:27] (merge) bd808: Add #wikimedia-techdocs [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/8 (https://phabricator.wikimedia.org/T375854) (owner: kbach)
[15:32:39] Tools, Wikimedia-Medicine: Bring back `mdwiki` tool from the demise of GridEngine (transition to Toolforge Kubernetes) - https://phabricator.wikimedia.org/T319887#10221756 (Harej) a:Harej→None
[15:34:20] Quarry: [bug] Quarry queries are stopped - https://phabricator.wikimedia.org/T377010 (Prototyperspective) NEW
[16:16:42] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10222008 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm
[16:17:06] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Tech-Docs-Team, Documentation: WMCS: Document different types of root and admin privileges - https://phabricator.wikimedia.org/T375113#10222009 (fnegri) > Maybe we could expand it to also cover Toolforge admin permissions, or maybe we should hav...
[16:57:11] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10222171 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudlb2004-dev.codfw.wmnet with OS bookworm complete...
[17:12:24] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10222232 (Jhancock.wm) Open→Resolved @aborrero this is finally ready. turned into a learning opprotunity
[17:12:25] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10222236 (Jhancock.wm)
[17:13:23] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge: [openstack object storage] deleted files still occupying space - https://phabricator.wikimedia.org/T376673#10222238 (taavi)
[17:21:27] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:27:11] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/20 (owner: l10n-bot)
[17:27:14] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/20 (owner: l10n-bot)
[17:36:39] wikitech.wikimedia.org, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Wikitech showing Wikipedia CentralNotice banners - https://phabricator.wikimedia.org/T377030 (Pcoombe) NEW
[18:01:12] Quarry: [bug] Quarry queries are stopped - https://phabricator.wikimedia.org/T377010#10222394 (rook) It's been awhile since I've looked at that code. When I worked on it it was to have the stopped status appear when someone manually presses the "stop" button, I thought I added it just for that, but maybe it...
[18:03:46] Quarry: [bug] Quarry queries are stopped - https://phabricator.wikimedia.org/T377010#10222415 (Prototyperspective) @rook Yes, I did not press the stop button for any of the queries that were stopped and it only displays the above two lines and not any further info like some error code. Other example: https:/...
[18:07:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[19:04:31] Openstack-Magnum: magnum: kubectl fails to connect after time - https://phabricator.wikimedia.org/T336586#10222543 (SD0001) @rook I still can't get kubectl to connect to the cluster with any of the config files in `/opt`. Which one is the current config that's supposed to be used?
[19:36:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.991% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[19:40:12] Toolforge-standards-committee: Facilitate Volunteer NDA application process for 2024 Toolforge standards committee appointees - https://phabricator.wikimedia.org/T374993#10222631 (KFrancis) Hi all, I am confirming @JJMC89 has an NDA on file. Thanks!
[19:49:06] Openstack-Magnum: magnum: kubectl fails to connect after time - https://phabricator.wikimedia.org/T336586#10222636 (rook) It's probably better to generate it using the deploy.sh script from the bastion. The only tofu thing that it should attempt doing is creating the kube.config file for the current clus...
[21:21:27] FIRING: CloudVPSDesignateLeaks: Detected 5 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:07:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[22:47:33] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate mwv-builder-03.mediawiki-vagrant.eqiad.wmflabs is about to expire in 13d 23h 58m 34s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[23:31:50] (CR) Krinkle: [C:+1] "LGTM. happy to merge, but waiting until I'm sure someone can also deploy it. Can you?" [labs/tools/github-pr-closer] - https://gerrit.wikimedia.org/r/1079048 (https://phabricator.wikimedia.org/T374157) (owner: Pppery)
[23:36:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.41% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[23:40:31] (CR) Pppery: "No. I have no access to anything. Just figured I would try to help out by looking at the code, and observed a trivial one-line patch shoul" [labs/tools/github-pr-closer] - https://gerrit.wikimedia.org/r/1079048 (https://phabricator.wikimedia.org/T374157) (owner: Pppery)
[23:47:48] (CR) Pppery: "The way the process works is that" [labs/tools/github-pr-closer] - https://gerrit.wikimedia.org/r/1079048 (https://phabricator.wikimedia.org/T374157) (owner: Pppery)
[23:56:19] Tool-ldap, Phabricator, Patch-For-Review: https://ldap.toolforge.org/ integration assumes that `cn` and `uid` are equivalent - https://phabricator.wikimedia.org/T376769#10223031 (Pppery) This still isn't quite right. Taking matmarex as an example, Phabricator links to https://ldap.toolforge.org/user/...