[00:04:40] FIRING: [6x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[00:09:40] FIRING: [10x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[00:19:40] FIRING: [18x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[00:39:40] RESOLVED: [8x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:18:40] Traffic, Commons: HTTP 429 error on original image requests on Commons since a few days (iOS app) - https://phabricator.wikimedia.org/T413570#11494643 (SuperHamster) Ah! I resolved the 429 issue on my end. My webapp uses Helmet.js, which defaults to hiding the Referrer header when loading external files....
[09:08:36] netops, Infrastructure-Foundations: access request - read-only access to pfw's for Avishua Stein (astein) - https://phabricator.wikimedia.org/T413826#11494856 (ayounsi) @Dwisehaupt @Jgreen do you approve that request ? @AStein-WMF can you send me your SSH public key (ideally ed25519, SK is not supported...
[10:48:45] Traffic, Commons: HTTP 429 error on original image requests on Commons since a few days (iOS app) - https://phabricator.wikimedia.org/T413570#11495109 (TheDJ) >>! In T413570#11494643, @SuperHamster wrote: > Ah! I resolved the 429 issue on my end. My webapp uses Helmet.js, which defaults to hiding the Ref...
[11:01:30] Traffic, Commons: HTTP 429 error on original image requests on Commons (iOS app by default hiding the Referrer header) - https://phabricator.wikimedia.org/T413570#11495171 (Aklapper)
[12:01:05] netops, Infrastructure-Foundations: access request - read-only access to pfw's for Avishua Stein (astein) - https://phabricator.wikimedia.org/T413826#11495408 (Jgreen) >>! In T413826#11494855, @ayounsi wrote: > @Dwisehaupt @Jgreen do you approve that request ? Yup! > @AStein-WMF can you send me your S...
[12:50:32] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 4 others: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576#11495527 (Mvolz)
[12:51:10] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 4 others: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576#11495528 (Mvolz)
[12:53:24] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 4 others: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576#11495543 (Mvolz)
[13:17:13] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 5 others: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576#11495724 (Mvolz)
[13:17:49] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 5 others: Switch from restbase to rest-gateway for Citoid - https://phabricator.wikimedia.org/T361576#11495730 (Mvolz)
[13:23:32] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971#11495789 (Ladsgroup) I've added those sizes to https://www.mediawiki.org/w/index.php?title=Common_thumbnail_sizes&diff=prev&oldid=8130399
[14:11:20] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971#11496076 (Ladsgroup) I'd say we should move the discussion about pre-generation to another ticket since it's a bit offtopic but in the...
[14:23:56] netops, Infrastructure-Foundations: access request - read-only access to pfw's for Avishua Stein (astein) - https://phabricator.wikimedia.org/T413826#11496117 (AStein-WMF) Thanks! Here's the info! ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMjoabk/8RwY48ExL+TuZHUz466v8yluAuxZhSCO9pmp astein@wikimedia.org astein
[14:34:06] Traffic, Commons: HTTP 429 error on original image requests on Commons (iOS app by default hiding the Referrer header) - https://phabricator.wikimedia.org/T413570#11496174 (Nylki) >>! In T413570#11494643, @SuperHamster wrote: > Ah! I resolved the 429 issue on my end. My webapp uses Helmet.js, which defau...
[14:35:52] sukhe: good morning and happy new year! let me know when you're free to chat about Bird 2.18
[14:38:05] XioNoX: happy new year! happy to chat right now, you know how much I love Bird
[14:38:27] :)
[14:38:52] sukhe: https://phabricator.wikimedia.org/T413740 santa claus brought us Bird 2.18 with all the routed ganeti patches
[14:40:02] wow
[14:40:28] moritzm backported it, I tested the basic features on ganeti2033/34, next we need to test BGP to the VM
[14:40:39] in theory it should work out of the box
[14:40:48] XioNoX: feel free to pick the usual doh/durum ones
[14:40:56] any is fine, magru is the best
[14:41:04] durum7004?
[14:41:10] moritzm: sure
[14:41:15] ganeti7001 only has testvm7001, durum7004, hcaptcha-proxy7002, doh7004
[14:41:26] and once we have done it there, we can upgrade a DNS host as well
[14:41:26] sounds good
[14:41:54] let Traffic know if you need to offset anything ot us
[14:41:55] *to
[14:41:58] and then ganeti7001 to make sure it works on both sides (vm/hypervisor)
[14:42:26] initially we plan to replace the custom 2.17 branch deb with 2.18, but as a followup ideally all of Bird uses (and then copy 2.18 into "main", so that all of Wikimedia defaults to it independent of whether Bookworm or Trixie
[14:42:58] sukhe: just getting green lights for durum7004, doh7004 and probably depool hcaptcha-proxy7002 when we do ganeti7001
[14:43:17] and yeah, this too
[14:43:53] XioNoX: sounds good. (hcaptcha-proxy depool is simply stopping bird as well)
[14:44:12] cool
[14:44:15] moritzm: ok. once we finish the VM testing, I will do a DNS host before we roll it out everywhere
[14:45:51] sure thing, we'll not rush this out :-)
[14:47:11] I'll start upgrading Bird on ganeti7001 now
[14:47:18] thanks!
[14:47:56] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Cleaning up Puppet and Netbox VLAN sub-ints on edge sites - https://phabricator.wikimedia.org/T410411#11496209 (ssingh) a:ssingh
[14:49:57] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Cleaning up Puppet and Netbox VLAN sub-ints on edge sites - https://phabricator.wikimedia.org/T410411#11496214 (ssingh) Is there anything to be aware of about the order of this? Should we just merge the above patch and then s...
[14:52:36] ganeti7001 is on 2.18, I'll upgrade durum7004 now (the upgrade incurs a restart anyway)
[14:54:03] and durum7004 is updated
[14:55:43] thanks
[14:56:47] * sukhe checks BGP
[14:57:00] looking
[14:57:23] thanks :)
[15:00:12] sukhe: v4 and v6 lgtm
[15:02:58] I'll proceed with doh7004 then
[15:06:12] and doh7004 is updated
[15:09:07] netops, Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Cleaning up Puppet and Netbox VLAN sub-ints on edge sites - https://phabricator.wikimedia.org/T410411#11496261 (cmooney) >>! In T410411#11496209, @ssingh wrote: > Is there anything to be aware of about the order of this? Shou...
[15:14:27] lgtm!
[15:14:48] v4 and v6 prefixes are still learned over BGP on ganeti7001
[15:15:01] XioNoX: silly question, but how did you verify the BGP sessions? I am on asw1-b13 but I am really not sure which commands to run; I can do a blank routing for the durum IPs but of course that's not specific.
[15:15:20] *asw1-b3
[15:15:32] on ganeti7001, I just did `ip route | grep tap1`
[15:15:44] for doh7004
[15:15:58] tap0 is durum7004
[15:16:46] aaah makes sense
[15:17:31] on the switch you can do that
[15:17:35] I'll do hcaptcha-proxy7002 next, but what did you mean with depool earlier? it's not an LVSed service. I'd do the same upgrade step as for durum and wikidough otherwise?
[15:17:52] moritzm: yeah, simply stopping bird is enough as it is anycasted
[15:17:56] ok
[15:17:59] (hcaptcha-proxy7002)
[15:18:10] that reminds me, we should clean up the old-LVS service. so thanks
[15:18:33] ganeti7001 is 10.140.0.11, so you can do asw1-b3-magru> show route receive-protocol bgp 10.140.0.11
[15:18:56] and you can see that 185.71.138.138/32 is there too
[15:20:20] surprising, I tried that and it didn't show me anything but now it does. I can't see which command I ran before though, so I think I got something wrong
[15:20:51] sukhe: dunno if it's a typo but you said "I am on asw1-b13" so maybe you were on drmrs ?
[15:21:21] yeah that was a typo but I was most certainly on asw1-b3
[15:21:22] 10:15:20 < sukhe> *asw1-b3
[15:21:25] this bit
[15:21:33] anyway, thanks, I can see it now, so helpful
[15:30:09] hcaptcha-proxy7002 is now also updated
[15:34:46] thanks
[15:34:47] XioNoX: sukhe@asw1-b3-magru> show route receive-protocol bgp 195.200.68.103
[15:34:50] {master:0}
[15:34:52] no output again, what am I getting wrong?
[15:35:11] checking
[15:35:14] dig -x 195.200.68.103 +short
[15:35:15] hcaptcha-proxy7002.wikimedia.org.
[15:35:21] so that's fine at least
[15:35:37] sukhe: you need to do the show route on the ganeti host
[15:35:47] so it's still `show route receive-protocol bgp 10.140.0.11`
[15:35:59] aaah on ganeti7001, got it
[15:36:04] I mixed that up
[15:36:08] checking again
[15:36:12] and in the output you should see the IP advertised by the VM (through ganeti7001)
[15:36:20] in addition to the VM main IP
[15:36:21] * 195.200.68.103/32 10.140.0.11 64612 I
[15:36:29] looks good, thanks, sorry for the confusion!
[15:36:31] so that's the main IP
[15:36:48] and if hcaptcha-proxy7002 advertises an IP there should be another one
[15:37:04] 10.3.0.10
[15:37:08] yep that's the one
[15:37:14] perfect
[15:37:26] * 10.3.0.10/32 10.140.0.11 64612 64605 I
[15:37:40] also looks good on the host (not that that is the authoritative one but yeah)
[15:38:02] if we're all happy, then I would move forward with ganeti7002 and then hcaptcha-proxy7001, ok?
[15:38:38] ganeti7001 first
[15:38:55] sure, sure
[15:39:09] ganeti7001 is already updated
[15:40:18] 48m ago? :)
[15:40:24] that's cool
[15:40:29] everything is fine
[15:43:18] :)
[15:43:37] yeah
[15:43:42] I started with ganeti7001
[15:43:53] I'll upgrade ganeti7002 now
[15:44:25] ganeti7002 is updated
[15:44:33] hcaptcha-proxy7001 next
[15:45:20] and hcaptcha-proxy7001 is on 2.18
[15:47:36] * 10.3.0.10/32 10.140.1.12 64612 64605 I
[15:47:41] * 195.200.68.102/32 10.140.1.12 64612 I
[15:48:17] looks good!
[15:49:19] ok! I'm upgrading ganeti7003 next and then durum7003
[15:52:14] actually durum7003 seems to be inaccessible, maybe it had Puppet disabled and got evicted from puppetdb over the holiday period
[15:52:45] upgrading doh7003 in the meantime
[15:52:53] wow, interesting
[15:53:35] that's pretty weird
[15:53:38] I can see the VM itself running in Ganeti, I'll check later what happened there
[15:53:54] there isn't any alert for that either, as far as I can see
[15:54:50] ah now I recall it
[15:54:53] https://sal.toolforge.org/log/dCaCgZkBffdvpiTrowC6
[15:55:12] ah
[15:55:24] doh7003 is upgraded
[15:55:26] I need to refresh my memory on what happened
[15:55:48] it was possibly the "same node issue"
[15:56:09] but on the bright side we'll fix that next, I'm building dnsmasq 2.92rc3 tomorrow
[15:56:15] yeah, possibly but I need to check my logs on why I left it, or I simply got busy and moved on with the other work
[15:57:46] I'm upgrading ganeti7004 now, then magru is done
[15:57:51] ok thank you
[15:57:56] you can leave durum7003 to me please
[15:59:00] ok!
[15:59:25] ganeti7004 is done
[15:59:37] unless any issues pop up with magru, I'd upgrade esams tomorrow
[16:00:04] ok. I think one day baking period especially with sessions clearly established is enough as well
[16:00:12] ack
[16:00:21] so at least my +1
[16:01:48] great :-)
[16:03:28] and when routed Ganeti is complete, we can upgrade wikidough/durum/hcaptcha on the classic Ganeti VMs (the ones currently on Bird 2.12) and then later on test upgrading dns* nodes
[16:57:28] very cool :)
[17:16:20] hi traffic, not sure how many of you are in today, any objections to me deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1223692 ?
[17:51:24] cdanis: no objections given a single host
[17:51:32] thanks :D
[17:51:39] (just me and bblac.k are around today)
[17:52:12] the ramshackle-yet-repeatable local perf testing I did seemed to say, not a huge impact, something like 20 µs
[18:02:52] the benefits of this should make up for it :>
[18:04:03] yes
[18:04:20] always optimize for SRE toil reduction over system perf anyways :)
[18:06:18] yes :)
[18:07:34] so right now it breaks haproxy on hosts that don't have it enabled; I'm fixing that the easiest way
[18:08:17] I saw that in the PCC output but I figured since it wasn't updating the haproxy cfg, it was ok
[18:08:20] sorry about that
[18:09:39] yeah, I should have realized
[18:09:49] the lua file loads both unconditionally
[18:12:09] forgot how to write puppet over break :>
[18:12:11] cdanis: I think "or" is probably better here
[18:12:18] lol yes just uploaded that
[18:12:36] now I wonder if CI will dislike some unnecessary parentheses
[18:12:46] lol
[20:47:55] sukhe: do you mind if I also enable it on one text host somewhere, right now?
[20:50:53] cdanis: no issues from me if you are comfortable
[23:29:17] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971#11498230 (AntiCompositeNumber) Special:NewFiles doesn't appear to be as bad as it was a few years ago, but I do think it would still be...
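
Editor's note on the route check discussed between 15:18 and 15:37: the verification amounts to running `show route receive-protocol bgp <hypervisor IP>` on the switch and confirming that both the VM's main IP and its anycast IP appear as /32 routes. The sketch below is not something used in the channel; it only illustrates that comparison. The function names are hypothetical, and the sample text is the two route lines pasted in the chat, so the column layout of real switch output is an assumption here.

```python
import ipaddress


def advertised_prefixes(output):
    """Collect prefixes from lines whose first field is '*' (as pasted in the chat)."""
    prefixes = set()
    for line in output.splitlines():
        fields = line.split()
        # Assumed line shape: "* 10.3.0.10/32 10.140.0.11 64612 64605 I"
        if len(fields) >= 2 and fields[0] == "*":
            try:
                prefixes.add(str(ipaddress.ip_network(fields[1], strict=False)))
            except ValueError:
                # Second field is not a prefix; skip headers or other text.
                continue
    return prefixes


def missing_prefixes(output, expected):
    """Return the expected prefixes that do not appear in the output, sorted."""
    return sorted(set(expected) - advertised_prefixes(output))


# The two lines quoted in the chat for hcaptcha-proxy7002 behind ganeti7001.
SAMPLE = """\
* 195.200.68.103/32 10.140.0.11 64612 I
* 10.3.0.10/32 10.140.0.11 64612 64605 I
"""

if __name__ == "__main__":
    print(missing_prefixes(SAMPLE, ["195.200.68.103/32", "10.3.0.10/32"]))
```

An empty list means every expected prefix was seen. Querying the VM's own address instead of the hypervisor's, as happened at 15:34, yields no route lines at all, so every expected prefix would be reported missing.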