[07:30:34] Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Operations, Patch-For-Review: TY pages in a subdomain of wikipedia and set hide banner cookie - https://phabricator.wikimedia.org/T251780 (AndyRussG) Thanks so much @Krinkle for the information on this! Please see {T259002}.
[08:54:58] Traffic, DC-Ops, Operations, ops-esams: cp3053 nvme0 issues - https://phabricator.wikimedia.org/T256632 (Vgutierrez) Stalled→Resolved a:Vgutierrez Thanks for pinging me @wiki_willy, we can close this task, everything seems good in cp3053 so far. I'll reopen the task if needed
[09:06:45] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi)
[09:07:03] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi)
[09:09:18] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi) Failover happened and the active netmon host is now netmon2001. Issues identified: 1. Polling time for eqiad devices increased signif...
[09:11:14] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi)
[10:04:49] netops, Operations, observability, Patch-For-Review, User-fgiunchedi: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (fgiunchedi)
[10:37:21] Traffic, Operations, Patch-For-Review: ats-tls is having issues when varnish-fe goes away - https://phabricator.wikimedia.org/T242620 (Vgutierrez) Open→Resolved a:Vgutierrez
[10:38:54] Traffic, Operations: ATS logs aren't being rotated - https://phabricator.wikimedia.org/T238724 (Vgutierrez) Open→Resolved a:Vgutierrez
[12:45:56] Traffic, Operations: varnishmtail silently stops working if varnishncsa crashes - https://phabricator.wikimedia.org/T259020 (ema)
[13:06:56] Traffic, Operations: varnishmtail silently stops working if varnishncsa crashes - https://phabricator.wikimedia.org/T259020 (ema) p:Triage→Medium
[13:09:54] Traffic, Operations, Patch-For-Review: Varnish and ATS health-check improvements - https://phabricator.wikimedia.org/T255015 (ema)
[13:10:07] Traffic, Operations, Patch-For-Review: Varnish and ATS health-check improvements - https://phabricator.wikimedia.org/T255015 (ema) Open→Resolved a:ema All done!
[13:10:14] netops, Operations: ripe-atlas-eqiad IPv6 unreachable - https://phabricator.wikimedia.org/T258018 (CDanis) Stalled→Resolved With the serial console now attached, I found myself in a rescue shell. I poked around some, got `/` and `/boot` mounted under the empty `/sysroot`, looked at the failed ke...
[16:11:51] Traffic, DC-Ops, Operations, ops-esams: cp3053 nvme0 issues - https://phabricator.wikimedia.org/T256632 (wiki_willy) Thanks @Vgutierrez
[17:25:35] Wikimedia-Apache-configuration, Diff-blog: Create redirect for legacy blog.wikimedia.org/blog/ links - https://phabricator.wikimedia.org/T85076 (CKoerner_WMF) Open→Resolved a:CKoerner_WMF After a bit of testing to verify, these posts are all redirecting appropriately to their home on Diff. Re...
[17:42:11] Traffic, Operations, Phabricator, serviceops, and 2 others: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) Stalled→Open
[18:01:49] Hello all! I recently removed a cluster from lvs (a thing that Jason set up and we no longer need). I believe I've cleaned up everything that needs cleaning in puppet, but pybal is still upset about the absence:
[18:01:52] PYBAL CRITICAL - CRITICAL - cloudceph_9283: Servers cloudcephmon1002.eqiad.wmnet are marked down but pooled
[18:02:09] Can I get advice about what other steps I need to take to make e.g. lvs1015 truly forget about all that?
[18:05:42] hm, docs suggest 'restart pybal'
[18:05:49] which I can do but is scary :)
[18:06:21] hm, that surely won't tell icinga anything
[18:08:29] andrewbogott: I can do a pybal restart
[18:08:36] thanks
[18:08:47] is it just eqiad that is complaining?
[18:09:27] should be yeah
[18:10:50] I don't see the alert in icinga?
[18:11:30] it's downtimed
[18:11:36] if you look at lvs1015 directly you can see it
[18:33:43] in addition to the pybal restart to clear the backends health check, I also needed to manually remove the service from the kernel tables using ipvsadm
[18:35:58] ok. Is that normal, or a result of me doing things out of order?
[18:36:56] normal
[18:38:08] icinga seems happy now, thank you!
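The manual cleanup mentioned at [18:33:43] (removing the leftover service from the kernel IPVS table with ipvsadm) could look roughly like the sketch below. It is not the command that was actually run: the function name, the sample addresses, and the idea of matching on port 9283 (guessed from the `cloudceph_9283` alert name) are all assumptions. It only parses the text printed by `ipvsadm -Ln` and prints candidate `ipvsadm -D` commands for a human to review; it does not execute anything.

```python
import re
import shlex

# Virtual-service lines in `ipvsadm -Ln` output look like "TCP  192.0.2.20:9283 wrr";
# real-server lines start with "  ->" and are skipped by this pattern.
SERVICE_LINE = re.compile(r"^(TCP|UDP)\s+(\S+):(\d+)\s+\S+")
PROTO_FLAG = {"TCP": "-t", "UDP": "-u"}

# Hypothetical listing; the addresses are documentation ranges, not real hosts.
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.0.2.10:443 sh
  -> 198.51.100.5:443             Route   10     0          0
TCP  192.0.2.20:9283 wrr
  -> 198.51.100.7:9283            Route   10     0          0
"""


def stale_service_commands(listing: str, port: int) -> list[str]:
    """Return one `ipvsadm -D` command per virtual service listening on `port`."""
    commands = []
    for line in listing.splitlines():
        match = SERVICE_LINE.match(line)
        if match and int(match.group(3)) == port:
            proto, addr, svc_port = match.groups()
            argv = ["ipvsadm", "-D", PROTO_FLAG[proto], f"{addr}:{svc_port}"]
            commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Print the commands for review rather than running them.
    for cmd in stale_service_commands(SAMPLE, 9283):
        print(cmd)
```

Deleting a virtual service with `-D` also drops its real-server entries, so printing the command first and checking it against the live `ipvsadm -Ln` output is the safer order.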
[21:13:42] Traffic, Operations, Phabricator, serviceops, and 2 others: Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (Dzahn) @20after4 Current status is now - `aphlict1001.eqiad.wmnet` up and running - phabricator-roots admin gro...