[10:44:22] netops, Infrastructure-Foundations, SRE: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#10520327 (ayounsi) For (1) we can have the `sre.ganeti.addnode` cookbook call the PuppetDBImport script towards the end. What do you and @MoritzMu...
[11:01:17] netops, Infrastructure-Foundations, ops-eqsin, SRE: WMF RIPE Atlas probe in Eqsin offline - https://phabricator.wikimedia.org/T382519#10520381 (ayounsi) Let's decom it and focus our efforts on spinning up VMs instead (T385560). It needs to be removed from the list on https://github.com/wikimedia/...
[11:01:56] netops, Infrastructure-Foundations, SRE: WMF RIPE Atlas probe in Eqiad offline - https://phabricator.wikimedia.org/T382518#10520385 (ayounsi) Let's decom it and focus our efforts on spinning up VMs instead (T385560). It needs to be removed from the list on https://github.com/wikimedia/operations-pupp...
[11:04:38] Traffic, Citoid, Editing QA, Editing-team, and 3 others: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10520404 (Mvolz) >>! In T361576#10519300, @Ryasmeen wrote: >>>! In T361576#10515182, @Mvolz wrote: >> This is now available to test on test...
[11:51:01] Traffic, Data-Persistence: migrate swift/swift-https LB VIPs to IPIP encapsulation - https://phabricator.wikimedia.org/T385564 (Vgutierrez) NEW
[11:51:30] Traffic, Patch-For-Review: Remove katran blockers for low-traffic non-k8s based services - https://phabricator.wikimedia.org/T373020#10520550 (Vgutierrez)
[12:29:33] Traffic, Patch-For-Review: Add support for mh-port scheduler flag on pybal - https://phabricator.wikimedia.org/T373027#10520748 (Vgutierrez) Open→Resolved
[12:44:41] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10520811 (Ahonc) more errors: Request served via cp6014 cp6014, Varnish XID 739548500 Error: 503, Backend fetch failed at Tue, 04 Feb 2025 12:38:16 GMT Request served via cp3069...
[14:50:22] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10521269 (Vgutierrez) thanks for your report @Ahonc, I'm seeing your requests hitting a timeout after 125 seconds, this matches with our `idle_send_timeout` configuration (https:...
[14:58:37] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10521325 (Ahonc) > tracert uk.wikipedia.org Tracing route to dyna.wikimedia.org [185.15.59.224] over a maximum of 30 hops: 1 4 ms 2 ms 2 ms router.lan [192.168.8...
[15:06:02] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10521377 (Ahonc) from other network: > tracert uk.wikipedia.org Tracing route to dyna.wikimedia.org [185.15.59.224] over a maximum of 30 hops: 1 3 ms 2 ms 1 ms...
[15:35:27] Traffic, Patch-For-Review: Replace pybal with liberica on the PoPs - https://phabricator.wikimedia.org/T384477#10521469 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs4008.ulsfo.wmnet with OS bookworm
[16:17:03] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10521665 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a50b2671-d855-40a0-8790-c502280b9115) set by cmooney@cumin100...
[16:47:37] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: Dec 2024: cr3-ulsfo errors on et-0/0/0 link from cr4 - https://phabricator.wikimedia.org/T384288#10521778 (cmooney) >>! In T384288#10505646, @RobH wrote: > Remote hands 01020815 scheduled for 2025-02-04 @ 0800 Pacific (1600 GMT). Ha...
[16:56:31] Traffic: Replace pybal with liberica on the PoPs - https://phabricator.wikimedia.org/T384477#10521811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs4008.ulsfo.wmnet with OS bookworm executed with errors: - lvs4008 (**FAIL**) - Downtimed on Icinga/Aler...
[16:59:01] Traffic: Replace pybal with liberica on the PoPs - https://phabricator.wikimedia.org/T384477#10521832 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs4008.ulsfo.wmnet with OS bookworm
[17:40:39] Traffic: Replace pybal with liberica on the PoPs - https://phabricator.wikimedia.org/T384477#10522054 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs4008.ulsfo.wmnet with OS bookworm completed: - lvs4008 (**PASS**) - Removed from Puppet and PuppetDB if...
[17:55:47] Traffic, Patch-For-Review: Replace pybal with liberica on the PoPs - https://phabricator.wikimedia.org/T384477#10522131 (Vgutierrez)
[18:11:20] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10522170 (Ahonc) I send same request in Postman, and got such timeline: It waits something 125 sec {F58356964}
[18:34:51] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10522286 (ssingh) Hi @Ahonc: Can you also share a traceroute to two other domains for comparison? Please share the output for traceroute to `google.com` and `text-lb.drmrs.wikime...
[18:40:48] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10522294 (Ahonc) > tracert text-lb.drmrs.wikimedia.org. Tracing route to text-lb.drmrs.wikimedia.org [185.15.58.224] over a maximum of 30 hops: 1 2 ms 1 ms 3 ms...
[18:47:26] hello traffic friends, back once again with an ATS Lua change [0] (this time a fairly minor one that fixes an edge case involving config loading).
[18:47:27] is there any time today that would be convenient for me to deploy this?
[18:47:27] [0] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1084247
[18:47:54] swfrench-wmf: whenever you want, we are not doing any cp work today
[18:49:36] sukhe: ah, great! alright, let me take a look at what all else is happening today and get back to you :)
[18:51:39] :)
[18:51:43] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10522330 (ssingh) Thanks for sharing. It is interesting that your connection to drmrs and google is fairly what you would expect but to dyna (text-lb.esams), the second hop laten...
[19:33:38] Traffic, SRE, Wikimedia-production-error: 503 error when edit large size pages - https://phabricator.wikimedia.org/T385395#10522506 (Ahonc) These pages I cannot edit: https://w.wiki/CvKT , https://w.wiki/BNFa , but this https://w.wiki/CwnG I can edit (I tried 3 times and all three it was saved) and t...
[19:53:27] sukhe: I'm free to do this now if that's not inconvenient for you all
[19:53:43] swfrench-wmf: not at all, go for it, you take care of everything anyway :)
[19:54:37] swfrench-wmf: poke me when you're done please, you just reminded me I have cp work to do
[19:54:46] :)
[19:54:54] ack, I'll get started
[20:11:05] cdanis: I should be out of your way in 15-20m (progressively running puppet on the rest of A:cp-text now)
[20:11:13] npnp!
[20:11:24] I'm having fun with alpinejs 😎
[20:12:33] you can't share yet-another-js-framework and not share why!
[20:13:04] this one is alright
[20:36:55] sukhe: cdanis: all done
[20:37:01] thank!
[20:37:15] puppet reenabled everywhere?
[20:37:24] thanks scott!
[20:37:34] np!
[20:37:49] cdanis: yup, reenabled / run on all of A:cp-text
[21:02:49] ok, I've disabled puppet on A:cp
[21:18:51] all done; puppet reënabled everywhere
[21:32:21] Thanks!