[00:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:30:11] Tool-Global-user-contributions, CheckUser-GlobalContributions, Trust and Safety Product Team: Add discovery link in GUC for temp user edits (to Special:GlobalContribs) - https://phabricator.wikimedia.org/T382390 (Krinkle) NEW
[01:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:45:25] Tool-Global-user-contributions, CheckUser-GlobalContributions, Trust and Safety Product Team: Add discovery link in GUC for temp user edits (to Special:GlobalContribs) - https://phabricator.wikimedia.org/T382390#10411494 (Krinkle) I'm struggling to come up with a clear and concise wording for a banne...
[01:45:47] Tool-Global-user-contributions, CheckUser-GlobalContributions, Trust and Safety Product Team: Add discovery link in GUC for temp user edits (to Special:GlobalContribs) - https://phabricator.wikimedia.org/T382390#10411495 (Krinkle) p:Triage→High a:Krinkle
[01:53:38] Tool-Global-user-contributions, CheckUser-GlobalContributions, Trust and Safety Product Team: Add discovery link in GUC for temp user edits (to Special:GlobalContribs) - https://phabricator.wikimedia.org/T382390#10411510 (Krinkle)
[04:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:18:39] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: replace cloudgw100[12] with spare 'second region' dev servers cloudnet100[78]-dev - https://phabricator.wikimedia.org/T382356#10411934 (fnegri) p:Triage→High
[11:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:23:21] cloud-services-team, Cloud-VPS: ProbeDown Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T380692#10412107 (fnegri) Open→Resolved a:fnegri Probably related to {T382220}
[11:23:25] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10412114 (fnegri)
[11:23:25] cloud-services-team, Cloud-VPS: ProbeDown Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://phabricator.wikimedia.org/T380692#10412113 (fnegri)
[11:24:51] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10412118 (fnegri)
[11:24:52] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10412119 (fnegri)
[11:26:11] cloud-services-team, Toolforge (Toolforge iteration 16), Sustainability (Incident Followup): [docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10412121 (fnegri)
[11:26:12] cloud-services-team, Toolforge (Toolforge iteration 16): [jobs-api] crashing - https://phabricator.wikimedia.org/T380832#10412122 (fnegri)
[11:26:33] cloud-services-team, Toolforge (Toolforge iteration 16), Sustainability (Incident Followup): [docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10412125 (fnegri) ^ Removed the (Resolved) parent task to clean...
[11:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:49:18] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: replace cloudgw100[12] with spare 'second region' dev servers cloudnet100[78]-dev - https://phabricator.wikimedia.org/T382356#10412143 (cmooney) > Does it matter that the replacement servers are in different racks? Pinging @aborrero and @cmooney for an...
[12:04:31] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Patch-For-Review: [wmcs-cookbooks] wmcs.openstack.cloudvirt.vm_console cookbook is not working from cloudcumin hosts - https://phabricator.wikimedia.org/T379570#10412187 (fnegri) Open→In progress
[12:06:37] cloud-services-team (FY2024/2025-Q1-Q2): KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382220#10412189 (fnegri) Open→In progress
[13:04:58] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks - https://phabricator.wikimedia.org/T382412 (Andrew) NEW
[13:07:08] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: replace cloudgw100[12] with spare 'second region' dev servers cloudnet100[78]-dev - https://phabricator.wikimedia.org/T382356#10412324 (Andrew) Created T382412 about relocating the cloudnet-dev servers.
[13:19:02] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: replace cloudgw100[12] with spare 'second region' dev servers cloudnet100[78]-dev - https://phabricator.wikimedia.org/T382356#10412349 (RobH) Servers can be renamed, but there are a few places that may cause issues if they are not updated. * Update the...
[14:16:58] cloud-services-team, Toolforge: Unable to Connect to Database for New Toolforge Project "wiki-talents" - https://phabricator.wikimedia.org/T381457#10412530 (fnegri) p:Triage→Medium
[14:17:09] cloud-services-team, Toolforge: Unable to Connect to Database for New Toolforge Project "wiki-talents" - https://phabricator.wikimedia.org/T381457#10412532 (fnegri) a:fnegri
[14:21:34] (CR) Jforrester: "recheck" [labs/tools/meetingtimes] - https://gerrit.wikimedia.org/r/971743 (owner: VolkerE)
[14:46:08] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10412589 (Andrew)
[14:47:08] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10412591 (Andrew)
[14:48:27] cloud-services-team, Toolforge: Unable to Connect to Database for New Toolforge Project "wiki-talents" - https://phabricator.wikimedia.org/T381457#10412592 (fnegri) Open→Resolved I regenerated `replica.my.cnf` (following the [docs](https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Re...
[14:49:01] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10412597 (fnegri) a:fnegri→None
[14:49:39] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10412609 (fnegri)
[14:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:51:20] Tool-schedule-deployment: Show upcoming backport windows in local time - https://phabricator.wikimedia.org/T382417 (SBisson) NEW
[14:53:29] Tool-schedule-deployment: Show upcoming backport windows in local time - https://phabricator.wikimedia.org/T382417#10412630 (SBisson)
[14:58:04] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: Relocate cloudnet1007-dev and cloudnet1008-dev to new racks and rename - https://phabricator.wikimedia.org/T382412#10412648 (aborrero)
[15:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:04:19] cloud-services-team, Toolforge: ToolsDB: setup pt-heartbeat replication monitor - https://phabricator.wikimedia.org/T334925#10412660 (fnegri) In progress→Resolved a:fnegri I'll mark this as Resolved, as ToolsDB is using wmt-pt-heartbeat, and Quarry is using it to display the lag in ToolsD...
[15:06:32] cloud-services-team, Toolforge: toolsdb: review alerting - https://phabricator.wikimedia.org/T306453#10412677 (fnegri) Open→Resolved a:fnegri I'll mark this as resolved as the current alerts are working well. With the simple primary-replica setup of ToolsDB, I don't think using pt-heartbe...
[15:07:05] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10412682 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudcephosd2004-dev.codfw.wmnet with OS bul...
[15:31:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[15:31:19] cloud-services-team: KernelError Server cloudgw1002 may have kernel errors - https://phabricator.wikimedia.org/T382421 (phaultfinder) NEW
[15:50:40] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T382189#10412911 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/470
[15:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:50:45] vivian-rook opened https://github.com/toolforge/paws/pull/470
[16:00:07] cloud-services-team, Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T381453#10412945 (fnegri) Open→Resolved I updated the Toolforge image following the procedure at [Portal:Toolforge/Admin/Pywikibot_image#Updating_the_image](https://wikitech.wikimedia.org/wi...
[16:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:00:59] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10412950 (Jhancock.wm) @Andrew what kind of partition should this server have? I keep getting an error in that part of the installer. my first thought was...
[16:21:41] cloud-services-team, Cloud-VPS: CloudVPS: codfw1dev: database backup for clouddb2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T229559#10412999 (Andrew) Open→Invalid As passed labtestwikitech, so passed these clouddb hosts.
[16:25:16] wikitech.wikimedia.org, Wiki-Setup (Delete / Redirect): Retire labtestwiki - https://phabricator.wikimedia.org/T378260#10413008 (taavi) Open→Resolved a:taavi
[16:40:14] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T382189#10413043 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/470
[16:40:25] PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T382189#10413044 (rook) Open→Resolved a:rook
[16:40:30] vivian-rook closed https://github.com/toolforge/paws/pull/470
[16:41:14] PAWS: update jupyterlab - https://phabricator.wikimedia.org/T382427 (rook) NEW
[16:42:25] PAWS: update jupyterlab - https://phabricator.wikimedia.org/T382427#10413064 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/471
[16:42:28] vivian-rook opened https://github.com/toolforge/paws/pull/471
[17:22:38] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q2:rack/setup/install cloudcephosd2004-dev - https://phabricator.wikimedia.org/T378825#10413175 (Andrew) The two small drives should be mirrored (raid 1) and used for the OS, the larger drives left unformatted for Ceph to manage. I believe...
[17:34:19] PAWS: update jupyterlab - https://phabricator.wikimedia.org/T382427#10413258 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/471
[17:34:23] vivian-rook closed https://github.com/toolforge/paws/pull/471
[17:34:30] PAWS: update jupyterlab - https://phabricator.wikimedia.org/T382427#10413259 (rook) Open→Resolved
[17:44:29] (PS1) Btullis: Add dummy tokens for new temporary Hadoop workers [labs/private] - https://gerrit.wikimedia.org/r/1105404 (https://phabricator.wikimedia.org/T382410)
[17:53:10] (CR) Btullis: [V:+2 C:+2] Add dummy tokens for new temporary Hadoop workers [labs/private] - https://gerrit.wikimedia.org/r/1105404 (https://phabricator.wikimedia.org/T382410) (owner: Btullis)
[18:07:04] Toolforge-standards-committee: Adoption request for bullseye - https://phabricator.wikimedia.org/T380537#10413396 (LucasWerkmeister) Open→Resolved a:LucasWerkmeister Going to call this done. @TheresNoTime please reopen if there are any issues :)
[18:28:23] cloud-services-team, Cloud-VPS: openstack: consider removing labs-ip-aliaser - https://phabricator.wikimedia.org/T374129#10413541 (bd808) Some things I did when Andrew asked me to double check that things seemed to work without split horizon DNS remapping: `lang=shell-session root@abogott-T374129:~# host...
[19:31:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError
[19:35:42] Tool-wikiqanda, Future-Audiences: Data collection for external release - https://phabricator.wikimedia.org/T380780#10413849 (DLin-WMF)
[20:41:11] PAWS: New upstream release for Wikimedia Commons Extension for OpenRefine - https://phabricator.wikimedia.org/T382444#10414084 (LibUp-bot)
[20:44:32] PAWS: New upstream release for Wikimedia Commons Extension for OpenRefine - https://phabricator.wikimedia.org/T382444#10414097 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/472
[20:44:46] vivian-rook opened https://github.com/toolforge/paws/pull/472
[21:39:14] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, SRE: decommission cloudcephmon100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T380893#10414252 (Andrew) a:Andrew→cmooney
[23:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:31:14] FIRING: KernelError: Server cloudgw1002 may have kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Kernel_panic - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-panic-detector?orgId=1&var-instance=cloudgw1002 - https://alerts.wikimedia.org/?q=alertname%3DKernelError