[02:30:33] (FYI) cross-posting here, I've added a new 18h silence specifically for GatewayBackendErrorsHigh, covering reference-risk and reference-need, referencing T387019
[02:30:34] T387019: Increased latencies in reference-quality models (ref-need) - https://phabricator.wikimedia.org/T387019
[08:02:39] ack, thx
[08:24:40] dcaro arturo I need to switch m5 master, I don't think it'd take more than 30 seconds of RO time, does this need any preparation from your side? https://phabricator.wikimedia.org/T388500
[08:24:46] I wanted to do it tomorrow
[08:34:28] marostegui: I don't think we need anything special
[08:34:41] you can do anytime
[08:34:47] arturo: thank you!
[08:34:55] thanks!
[08:59:47] There are lots of icinga checks like NRPE: Command 'check_conntrack_table_size' not defined showing unknown
[08:59:57] Yeah, I'm removing that check
[09:00:11] Apparently Icinga doesn't agree with my solution
[09:03:03] I'll just revert it ... AGAIN, and ping someone in o11y for the correct approach to remove the check.
[10:44:42] hnowlan: To confirm, after the MW switch on Wed, we are leaving codfw depooled from reads until the 26th, correct?
[10:45:30] marostegui: correct
[10:45:38] hnowlan: thanks!
[10:46:15] if there's anything we can help with let us know!
[10:46:23] wilco thank you!
[12:23:07] hey on-callers, Kartotherian codfw is running only on k8s, and eqiad is running 70% on k8s (still two bare metals pooled)
[12:23:23] it looks good, but if you get some weird error/report/etc.. please ping me
[12:25:17] \o/
[12:40:46] Kartotherian now fully served by k8s :)
[12:43:38] Woo
[12:55:11] \o/
[14:46:48] FYI I'm seeing puppet errors like the following e.g. on prometheus2008 https://phabricator.wikimedia.org/P74195
[15:10:46] Heads up: Ilias and I are pushing a small API GW change in a few minutes (should only affect Liftwing services): https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1126523