[07:32:29] serviceops, collaboration-services, envoy, SRE, Patch-For-Review: Upgrade Envoy to v1.29.12 - https://phabricator.wikimedia.org/T403663#11223257 (MoritzMuehlenhoff)
[07:41:37] serviceops, Traffic, WE4.2 Bot detection (WE4.2 hCaptcha account creation trial): Investigate options for per-wiki, percentage-based rollout of hCaptcha - https://phabricator.wikimedia.org/T404184#11223287 (jijiki)
[08:36:16] serviceops, Infrastructure-Foundations, Spicerack, SRE-tools: Spicerack's `Discovery.resolve_with_client_ip` should set a timeout on `udp_with_fallback` - https://phabricator.wikimedia.org/T405397#11223585 (elukey) Spicerack 11.9.0 deployed on all cumin nodes :)
[11:23:48] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224189 (JMeybohm)
[11:25:25] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224191 (JMeybohm)
[11:26:08] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224204 (JMeybohm)
[12:05:13] jelto: claime: when would be a good time for us to have a quick sync regarding the wikikube eqiad upgrade?
[12:05:39] jayme: Give me 90 minutes? I need to finish up something then grab a bite
[12:06:00] today I'm available until 15:00 UTC, so in 90 minutes works
[12:06:15] ok, cool. I'll schedule something
[13:10:29] jayme: I have come back from the dead, do you want me around or are you good?
[13:10:40] (braaaaaaaaains)
[13:23:31] serviceops, MW-Interfaces-Team (MWI-Sprint-19 (2025-09-23 to 2025-10-07)), OKR-Work, Patch-For-Review: Execute test plan for rest gateway rerouting for rest.php requests and report findings - https://phabricator.wikimedia.org/T405368#11224527 (Clement_Goubert) >>! In T405368#11211609, @aaron wrot...
[13:28:02] Raine: right. I've added you to the meet if you'd like to join
[13:41:08] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224617 (Jelto)
[13:48:16] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224632 (Jelto)
[13:54:50] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224666 (Clement_Goubert) a:Clement_Goubert I will be running the upgrade, @Jelto is backup, an...
[14:04:05] jayme: toolhub is once again being served only from eqiad, so it will be down. Last time this happened we prioritized redeploying it to minimize downtime, I think we should do the same
[14:06:36] argh...that thing
[14:06:45] yeah, ack!
[14:06:55] jayme: we can just helmfile apply deploy it then run charlie right?
[14:07:37] I think so, yes. Never used charlie tbh, but AIUI r.zl had used it to deploy the envoy changes
[14:09:50] claime: mind adding that (toolhub) to the phab task so we can curse it right away next time? :D
[14:19:24] jayme: sure
[14:23:39] charlie?
[14:23:53] taavi: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188456
[14:25:40] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224866 (Clement_Goubert)
[14:26:10] ^ excellent name, r.zl is a horrible nerd
[14:27:14] https://upload.wikimedia.org/wikipedia/commons/8/82/Charlie_Work.png
[14:52:07] serviceops, collaboration-services, Prod-Kubernetes, Data-Platform-SRE (2025.09.26 - 2025.10.17), and 2 others: Update wikikube eqiad to kubernetes 1.31 - https://phabricator.wikimedia.org/T405703#11224975 (Clement_Goubert)
[15:04:34] serviceops, SRE, Traffic: Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800#11225029 (ssingh) This is being moved on the Traffic workboard to "Radar/Not for service" as I don't think there is anything on our end to do here. Please let me know if you...
[15:04:54] serviceops, SRE, Traffic: Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800#11225030 (ssingh) And to be clear, by that I mean that this change is better suited for MW and not the CDN.
[15:17:02] serviceops, Infrastructure-Foundations, Spicerack, SRE-tools: Spicerack's `Discovery.resolve_with_client_ip` should set a timeout on `udp_with_fallback` - https://phabricator.wikimedia.org/T405397#11225075 (Scott_French) Open→Resolved a:Scott_French Amazing - thank you very much,...
[15:24:59] serviceops, Page Content Service, Traffic: Block traffic to RESTBase /page/talk endpoint and sunset it - https://phabricator.wikimedia.org/T401895#11225095 (MSantos)
[15:42:32] serviceops, Page Content Service, Traffic: Block traffic to RESTBase /page/talk endpoint and sunset it - https://phabricator.wikimedia.org/T401895#11225201 (MSantos) p:Triage→Medium @akosiaris per T392491#11167986 and message sent to Wikitech-l earlier today, this is ready to go.
[16:38:14] elukey: we can discuss it tomorrow, but the k8s upgrade means kartotherian will either need to move to codfw temporarily, or be depooled completely
[16:45:14] claime: o/ ahhhh okok makes sense, do you have a timeline? We are testing the new codfw stack, hopefully ready by end of week
[16:45:33] elukey: Wednesday
[16:46:02] And of course I announced it before I knew that was single-homed
[16:46:11] claime: I hate to ask, but is it possible to postpone it by a week?
[16:46:13] (it's not fun otherwise)
[16:47:06] elukey: It's *possible* as in we'd need to postpone eqiad's repool by as much
[16:47:39] wonderful
[16:48:16] so in theory we may be able to fall back to the old stack for that time, it should be lagging a bit but better than nothing
[16:48:22] I can try to set it up tomorrow
[16:48:37] (of course Moritz is out during the next couple of days, perfect timing)
[16:48:50] claime: let's resync tomorrow ok? I'll try to come up with some plan
[16:48:52] elukey: Tell me tomorrow if you need a hand with it, I'll help
[16:49:10] ack thanks!
[16:49:10] Yeah, we don't need to make a decision right now.
[16:49:13] if that's not an option or does not work we can also postpone the k8s update even more and depool eqiad again. That's fine as well
[17:03:25] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950 (RobH) NEW
[17:07:41] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11225895 (RobH) @Kappakayala, I'm not exactly sure who in your team would be the best point of contact for the above migration list, as it covers multiple service groups. Th...
[17:07:51] serviceops, DC-Ops, ops-eqiad, SRE: eqiad row C/D Service Ops host migrations - https://phabricator.wikimedia.org/T405950#11225896 (RobH) a:Kappakayala
[17:07:59] serviceops, MW-Interfaces-Team (MWI-Sprint-19 (2025-09-23 to 2025-10-07)), OKR-Work, Patch-For-Review: Execute test plan for rest gateway rerouting for rest.php requests and report findings - https://phabricator.wikimedia.org/T405368#11225897 (aaron) I'll add some more test for the endpoints I di...
[18:09:09] serviceops, MW-Interfaces-Team (MWI-Sprint-19 (2025-09-23 to 2025-10-07)), OKR-Work, Patch-For-Review: Execute test plan for rest gateway rerouting for rest.php requests and report findings - https://phabricator.wikimedia.org/T405368#11226062 (aaron) >>! In T405368#11225897, @aaron wrote: > I'll...
[18:23:15] serviceops, DC-Ops, ops-codfw, SRE: Q1:rack/setup/install wikikube-ctrl2006 - https://phabricator.wikimedia.org/T400661#11226087 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1002 for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
[18:41:15] serviceops, MW-Interfaces-Team (MWI-Sprint-19 (2025-09-23 to 2025-10-07)), OKR-Work, Patch-For-Review: Execute test plan for rest gateway rerouting for rest.php requests and report findings - https://phabricator.wikimedia.org/T405368#11226157 (aaron) >>! In T405368#11224527, @Clement_Goubert wrot...
[18:42:13] serviceops: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955 (Scott_French) NEW
[18:42:29] serviceops: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955#11226178 (Scott_French) p:Triage→Medium
[18:42:49] serviceops: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955#11226179 (Scott_French)
[18:42:53] serviceops, MediaWiki-Platform-Team, Epic: Migrate Wikimedia production from PHP 8.1 to PHP 8.3 - https://phabricator.wikimedia.org/T360995#11226180 (Scott_French)
[18:48:58] serviceops, MoveComms-Support, Datacenter-Switchover: MoveComms support for Southward DC Switchover (September 2025) - https://phabricator.wikimedia.org/T399894#11226199 (EBlackorby-WMF) debrief from MoveComms for next time: - Please avoid changing the hour of the switchover, as it impacts our Mo...
[18:49:25] serviceops, MoveComms-Support, Datacenter-Switchover: MoveComms support for Southward DC Switchover (September 2025) - https://phabricator.wikimedia.org/T399894#11226201 (EBlackorby-WMF) Open→Resolved
[19:44:04] serviceops, DC-Ops, ops-codfw, SRE: Q1:rack/setup/install wikikube-ctrl2006 - https://phabricator.wikimedia.org/T400661#11226602 (Jhancock.wm) got the pxe issue fixed. but found a new one. @Clement_Goubert this server has to be uefi and it looks like the preseed is set up for bios. if i'm reading...
[20:04:34] serviceops, DC-Ops, ops-codfw, SRE: Q1:rack/setup/install wikikube-ctrl2006 - https://phabricator.wikimedia.org/T400661#11226748 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1002 for host wikikube-ctrl2006.codfw.wmnet with OS bookworm executed with errors: -...
[20:30:40] serviceops, Patch-For-Review: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955#11226822 (Scott_French)