[01:26:59] Traffic, Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#9802684 (BBlack) >>! In T364126#9785855, @akosiaris wrote: > Media (images, video, etc) are served from `upload.wikimedia.org` and are requested without Cookies. While we don't set cookies from...
[08:20:44] Traffic, Patch-For-Review: Use IPIP encapsulation on lvs<-->upload cluster - https://phabricator.wikimedia.org/T357257#9803279 (Vgutierrez)
[08:35:25] Traffic: MSS clamper clamping check false positives - https://phabricator.wikimedia.org/T365101 (Vgutierrez) NEW
[08:44:41] Traffic: MSS clamper check triggers false positives - https://phabricator.wikimedia.org/T365101#9803417 (Vgutierrez)
[09:50:30] Traffic, Data Products, Data-Engineering, Observability-Logging: HAProxy log format doesn't support "invalid" request path - https://phabricator.wikimedia.org/T365117 (Fabfur) NEW
[09:51:07] XioNoX, topranks: context being: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1032400/2/modules/prometheus/files/usr/local/bin/prometheus-lvs-realserver-mss.py#32 any idea if I could determine that 12 bytes difference programmatically rather than hardcoding it?
[09:59:44] vgutierrez: not off the top of my head no
[10:01:05] RFC6691 says that the MSS should be IP MTU - (fixed IP + TCP headers). So 1460 for IPv4 when there is a 1500 byte MTU for example. Any space used by IP or TCP options needs to be accounted for, but shouldn't affect MSS?
[10:01:57] Why the kernel sets TCP_MAXSEG to 12 bytes lower I don't know, perhaps just how it internally deals with the options it knows it needs to add and how to get the tcp chunking right.
[10:02:15] topranks: TCP_MAXSEG here reports 12 bytes less than MSS as seen on wireshark
[10:02:27] I'll try to see if I can find anything, did you just discover this yourself?
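The arithmetic behind the 1460 figure and the 12-byte gap can be sketched in Python, the language of the prometheus-lvs-realserver-mss.py check. This is a minimal illustration, assuming fixed 20-byte IPv4 / 40-byte IPv6 headers, a 20-byte TCP header, and the Linux convention of padding the 10-byte timestamp option to 12 bytes with two NOPs (`TCPOLEN_TSTAMP_ALIGNED` in the kernel). The helper names are hypothetical.

```python
# Hypothetical helpers for illustration; header sizes assume no IP options.
# Fixed header sizes used by RFC 6691: options are NOT part of the advertised MSS.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20

# TCP timestamp option: kind (1) + length (1) + TSval (4) + TSecr (4) = 10 bytes,
# which Linux pads to 12 with two NOPs to keep the header 32-bit aligned.
TIMESTAMP_OPTION_ALIGNED = 12


def advertised_mss(mtu: int, ipv6: bool = False) -> int:
    """MSS to advertise in the SYN for a given link MTU (RFC 6691)."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def payload_per_segment(mss: int, timestamps: bool) -> int:
    """Data bytes that fit in one segment once the timestamp option is added.

    Roughly what Linux caches in mss_cache (and TCP_MAXSEG reports) when
    timestamps are the only per-segment option in use.
    """
    options = TIMESTAMP_OPTION_ALIGNED if timestamps else 0
    return mss - options


if __name__ == "__main__":
    mss = advertised_mss(1500)
    print(mss)                                        # 1460, the MSS seen in the SYN
    print(payload_per_segment(mss, timestamps=True))  # 1448, i.e. 12 bytes lower
    print(advertised_mss(1500, ipv6=True))            # 1440
```

The 12 only holds when timestamps are negotiated; SACK blocks or TCP-MD5 would add further per-segment options, which is why hardcoding the constant is fragile.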
[10:02:37] ok yeah
[10:02:45] b.black has seen the same 12 bytes gap yesterday on his tests as well
[10:03:01] yeah it rings a bell with me too from somewhere
[10:03:46] and checking with wireshark the TCP options of the next packet on a regular connection it matches the length of the TCP options
[10:06:47] other thing I'd wonder is if segmentation offload might be playing a role in why it looks different on wire to what kernel reports
[10:11:24] OSX behaves in the same way here as the linux kernel btw
[10:13:52] https://github.com/torvalds/linux/blob/3c999d1ae3c75991902a1a7dad0cb62c2a3008b4/net/ipv4/tcp_output.c#L1879
[10:13:55] that explains it
[10:15:23] my problem with the hardcoded 12 bytes is that it's assuming a certain return value from tcp_established_options: static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb,
[10:15:31] uh.... wrong copy paste
[10:15:38] https://github.com/torvalds/linux/blob/3c999d1ae3c75991902a1a7dad0cb62c2a3008b4/net/ipv4/tcp_output.c#L973
[10:15:40] that's better :)
[10:16:12] given how we run the test, locally on the realserver hosts we shouldn't have issues but still :)
[10:23:40] I was looking at the kernel stuff, it seems you are being returned the value of "mss_cache" (output.c L1813 has an explainer of user_mss, mss_clamp and mss_cache variables)
[10:24:43] interesting comment there
[10:24:47] what's returned by getsockopt seems like it can be one of these depending on the exact state the socket is in when called
[10:24:47] https://github.com/torvalds/linux/blob/master/net/ipv4/tcp.c#L4043
[10:24:49] "NOTE1. rfc1122 clearly states that advertised MSS DOES NOT include either tcp or ip options."
[10:25:14] Yeah. RFC6691 clarifies and confirms that.
[10:26:49] https://www.irccloud.com/pastebin/OyHA5XeV/
[10:26:55] and that's the whole picture I guess
[10:30:47] no exact answer here either but some related discussion:
[10:30:48] https://lore.kernel.org/netdev/34BAAED6-5CD0-42D0-A9FB-82A01962A2D7@linux.alibaba.com/T/
[10:33:09] I guess what you want is "mss_clamp" but not sure if you can fetch it, getsockopt is giving you mss_cache for TCP_MAXSEG instead.
[10:34:06] is the socket is in "repair" mode the code seems to return mss_clamp, for normal established socket it returns mss_cache
[10:34:13] *if
[10:38:43] you can maybe set TCP_REPAIR on the socket after the handshake, before you call getsockopt for TCP_MAXSEG?
[10:40:23] TCP_REPAIR is missing on python3.9
[10:40:29] (added in 3.12)
[10:40:53] dammit
[10:41:07] it looks like I could hardcode the constant though
[10:41:15] yeah
[10:41:39] I guess the question is if the timestamp option is always going to be added?
[10:42:08] such that the returned value you get is always going to account for it and be 12 bytes lower
[10:42:26] or if sometimes it won't be adding that option, and the returned value won't be 12 bytes lower
[10:42:45] yeah
[10:42:47] https://github.com/python/cpython/pull/98031/commits/adcd727389dcd6553b0a92fbb27e87539f0c859e
[10:42:58] that's the commit that adds TCP_REPAIR support on python 3.12
[10:43:08] so it's a matter of setting the constant value
[10:44:28] I'll test that later, thanks for your feedback topranks :)
[10:45:23] Wikimedia-Apache-configuration, serviceops, Wikimedia-Site-requests, Patch-For-Review: Temporarily redirect sgs.wikipedia.org to bat-smg.wikipedia.org until bat-smg->sgs move can be done - https://phabricator.wikimedia.org/T204830#9803912 (Clement_Goubert) The redirect from sgs.wikipedia.org to b...
[10:46:04] interesting rabbithole!
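The getsockopt branch being discussed in tcp.c can be modelled as a small pure function. This is a simplified Python paraphrase, assuming the `TCP_MAXSEG` case in `do_tcp_getsockopt()` reads roughly as: start from `mss_cache`, prefer `user_mss` if one was set and the socket is CLOSE or LISTEN, and return `mss_clamp` whenever repair mode is on. The `TcpSock` dataclass and its fields are hypothetical stand-ins for the kernel's `struct tcp_sock`, and the exact code may differ between kernel versions.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Only the socket states that matter for this branch.
    ESTABLISHED = "established"
    CLOSE = "close"
    LISTEN = "listen"


@dataclass
class TcpSock:
    # Hypothetical stand-in for the relevant struct tcp_sock fields.
    state: State
    mss_cache: int      # current per-segment payload, options already subtracted
    mss_clamp: int      # peer's advertised MSS, capped by user_mss if one is set
    user_mss: int = 0   # value set via setsockopt(TCP_MAXSEG), 0 if unset
    repair: bool = False


def tcp_maxseg(sk: TcpSock) -> int:
    """Value getsockopt(TCP_MAXSEG) would return under this simplified model."""
    val = sk.mss_cache
    if sk.user_mss and sk.state in (State.CLOSE, State.LISTEN):
        val = sk.user_mss
    if sk.repair:
        val = sk.mss_clamp
    return val


if __name__ == "__main__":
    established = TcpSock(State.ESTABLISHED, mss_cache=1448, mss_clamp=1460)
    print(tcp_maxseg(established))  # 1448: mss_cache, timestamps already subtracted
    established.repair = True
    print(tcp_maxseg(established))  # 1460: mss_clamp, the MSS advertised in the SYN
```

Note the ordering: the repair check comes last, so in repair mode `mss_clamp` wins regardless of state or `user_mss`.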
[10:46:22] yep :D
[10:50:39] I see TCP_TIMESTAMP was also added in Python 3.12, so you can't test for that being there in 3.9 either :(
[11:24:58] yep https://docs.python.org/3/library/socket.html#socket.SOMAXCONN (see the last "Changed in version 3.12")
[11:46:45] Traffic, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team, Security, SUL3: Create a Wikimedia login domain that can be served by any wiki - https://phabricator.wikimedia.org/T363695#9804124 (Tgr)
[12:00:12] topranks: it works as expected
[12:00:20] https://www.irccloud.com/pastebin/9hL2yTdK/
[12:00:34] bblack: ^^
[12:01:04] vgutierrez: nice!
[12:01:08] what does this line do?
[12:01:13] s.setsockopt(socket.IPPROTO_TCP, 19, 1)
[12:01:52] topranks: enable TCP_REPAIR
[12:02:12] ah ok yeah I thought so, so it can be done in 3.9, we just don't have the pretty name?
[12:02:25] indeed
[12:02:35] awesome!
[12:02:48] probably adding the 12 bytes would have been fine, but that does seem safer :)
[12:03:45] ah nice
[12:03:53] and yeah, this is safer, options things could vary for $reasons
[12:04:08] indeed
[13:06:57] Wikimedia-Apache-configuration, serviceops, Wikimedia-Site-requests, Patch-For-Review: Temporarily redirect sgs.wikipedia.org to bat-smg.wikipedia.org until bat-smg->sgs move can be done - https://phabricator.wikimedia.org/T204830#9804353 (Fomafix) Open→Resolved a: Fomafix
[13:16:04] Traffic, Patch-For-Review: MSS clamper check triggers false positives - https://phabricator.wikimedia.org/T365101#9804476 (Vgutierrez) Open→Resolved removed raw sockets usage. We are now fetching MSS data via getsockopts()
[13:41:46] https://phabricator.wikimedia.org/T320563 cdanis
[13:43:02] absolutely the wrong channel, apologies
[13:43:40] godog: no backsies!
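The approach that worked (setting TCP_REPAIR by its raw value before reading TCP_MAXSEG) can be sketched as follows. This is a Linux-only sketch, not the actual code in prometheus-lvs-realserver-mss.py: it assumes CAP_NET_ADMIN (the kernel refuses TCP_REPAIR without it, so the helper falls back to `None`), uses 19 and 11 as the Linux values of `TCP_REPAIR` and `TCP_INFO` from `<linux/tcp.h>`, and, as an addition the channel did not try, reads the `tcpi_options` byte of `struct tcp_info` to report whether timestamps were negotiated, which is one way to tell whether the 12-byte gap applies. The helper names are hypothetical.

```python
import socket

# Linux values from <linux/tcp.h>; Python only exposes socket.TCP_REPAIR from 3.12.
TCP_REPAIR = getattr(socket, "TCP_REPAIR", 19)
TCP_INFO = getattr(socket, "TCP_INFO", 11)
TCP_REPAIR_ON, TCP_REPAIR_OFF = 1, 0
TCPI_OPT_TIMESTAMPS = 0x1  # bit in struct tcp_info.tcpi_options


def timestamps_negotiated(sock: socket.socket) -> bool:
    """Hypothetical helper: True if the connection negotiated TCP timestamps."""
    # tcpi_options is the 6th byte of struct tcp_info, so 8 bytes is plenty.
    info = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 8)
    return bool(info[5] & TCPI_OPT_TIMESTAMPS)


def read_mss(sock: socket.socket):
    """Hypothetical helper: return (mss_cache, mss_clamp) for an established socket.

    mss_clamp is None if repair mode cannot be enabled (e.g. no CAP_NET_ADMIN).
    """
    mss_cache = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, TCP_REPAIR_ON)
    except OSError:
        return mss_cache, None
    try:
        # In repair mode the kernel answers TCP_MAXSEG with mss_clamp instead.
        mss_clamp = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    finally:
        # Leave repair mode so the socket behaves normally again (e.g. on close).
        sock.setsockopt(socket.IPPROTO_TCP, TCP_REPAIR, TCP_REPAIR_OFF)
    return mss_cache, mss_clamp


def loopback_pair():
    """Hypothetical helper: a connected client/server socket pair on 127.0.0.1."""
    server = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    server.close()
    return client, conn


if __name__ == "__main__":
    client, conn = loopback_pair()
    with client, conn:
        cache, clamp = read_mss(client)
        print("TCP_MAXSEG (mss_cache):", cache)
        print("TCP_MAXSEG in repair mode (mss_clamp):", clamp)
        print("timestamps negotiated:", timestamps_negotiated(client))
```

Turning repair mode off straight after the read keeps the window in which the socket is in a special state as short as possible.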
[13:43:56] lolz
[14:56:34] Traffic, MW-Interfaces-Team, serviceops: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#9805184 (daniel) a: daniel
[14:59:20] netops, Infrastructure-Foundations, SRE: Switch BGP (EVPN) topology between rows/spines at core sites - https://phabricator.wikimedia.org/T365169 (cmooney) NEW p: Triage→Low
[15:04:02] netops, Infrastructure-Foundations, ops-codfw, SRE: Codfw row C/D switch installation & configuration - https://phabricator.wikimedia.org/T364095#9805245 (cmooney)
[15:04:06] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Switch BGP (EVPN) topology between rows/spines at core sites - https://phabricator.wikimedia.org/T365169#9805246 (cmooney)
[15:08:31] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Switch BGP (EVPN) topology between rows/spines at core sites - https://phabricator.wikimedia.org/T365169#9805265 (cmooney)
[15:13:27] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Switch BGP (EVPN) topology between rows/spines at core sites - https://phabricator.wikimedia.org/T365169#9805297 (cmooney)
[15:15:52] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Switch BGP (EVPN) topology between rows/spines at core sites - https://phabricator.wikimedia.org/T365169#9805309 (cmooney) p: Low→Medium
[16:12:57] Traffic, Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#9805638 (akosiaris) >>! In T364126#9802684, @BBlack wrote: >>>! In T364126#9785855, @akosiaris wrote: >> Media (images, video, etc) are served from `upload.wikimedia.org` and are requested withou...
[16:33:48] netops, Infrastructure-Foundations, SRE: magru network setup - https://phabricator.wikimedia.org/T362421#9805776 (ssingh) Thanks to @cmooney for rolling the above out. For further context, we (Traffic and netops) decided to try out the anycast range in magru for the Wikidough service before doing it...
[16:51:45] netops, Infrastructure-Foundations, SRE: magru network setup - https://phabricator.wikimedia.org/T362421#9806067 (cmooney) And fwiw announcement looks good, all 3 of our transits are learning it ok, and I see it on other carriers from those sources as well. We also see live requests on the doh servers.
[16:54:57] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Switch BGP (EVPN) topology between rows/spines at core sites - https://phabricator.wikimedia.org/T365169#9806104 (cmooney)
[17:08:12] netops, Infrastructure-Foundations, Sustainability (Incident Followup): Optimise WMF WAN Network Configuration - https://phabricator.wikimedia.org/T297355#9806267 (Aklapper) a: cmooney→None @cmooney: Removing task assignee as this open task has been assigned for more than two years - see the e...
[17:09:26] HTTPS, SRE, Traffic-Icebox, Patch-Needs-Improvement: Provide acme-chief/TLS SNI list support in compile_redirects() - https://phabricator.wikimedia.org/T225096#9806284 (Aklapper) a: Vgutierrez→None @Vgutierrez: Removing task assignee as this open task has been assigned for more than two ye...
[18:10:42] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: HAProxy log format doesn't support "invalid" request path - https://phabricator.wikimedia.org/T365117#9806504 (Fabfur) Some other information about this: * In HAProxy replacing `%HPO` with `%HP` logs the whole...
[19:05:11] netops, Infrastructure-Foundations, Sustainability (Incident Followup): Optimise WMF WAN Network Configuration - https://phabricator.wikimedia.org/T297355#9806679 (cmooney) p: Medium→Low Thanks. It is very much something we wish to do but unfortunately other priorities have always trumped it...
[19:05:30] netops, Infrastructure-Foundations, Sustainability (Incident Followup): Optimise WMF WAN Network Configuration - https://phabricator.wikimedia.org/T297355#9806681 (cmooney)
[19:54:22] netops, Cloud Services Proposals, Infrastructure-Foundations, SRE: Separate WMCS control and management plane traffic - https://phabricator.wikimedia.org/T314847#9806824 (cmooney) Open→Resolved This has been implemented and the new vlan setup is recorded [[ https://wikitech.wikimedia....
[20:19:50] Traffic, Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#9806926 (BBlack) >>! In T364126#9805638, @akosiaris wrote: > * They do have a lot of presence all over the world. Presence we don't have currently and would take us decades to obtain. And via CP...
[21:24:00] netops, Infrastructure-Foundations, ops-codfw, SRE: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204 (cmooney) NEW p: Triage→High
[21:31:15] netops, Infrastructure-Foundations, ops-codfw, SRE: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9807207 (cmooney)
[21:31:39] netops, Infrastructure-Foundations, ops-codfw, SRE: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9807214 (cmooney)
[21:45:24] netops, Infrastructure-Foundations, ops-codfw, SRE: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9807226 (cmooney)
[22:14:13] https://i.imgur.com/CWSFsHq.png
[22:14:15] :)
[22:17:03] Traffic: Craft geo-maps file to create lowest-latency routes from south america - https://phabricator.wikimedia.org/T363722#9807297 (CDanis) @GreenReaper thanks so much for the helpful contribution :) I'll see if I can reproduce your results.
[22:17:22] nice!
[22:22:34] Traffic, Infrastructure-Foundations, SRE: Slowly ramping up traffic to the Brazil data center (magru) and related geo-maps - https://phabricator.wikimedia.org/T359054#9807307 (CDanis) Adding the 3rd transit link in magru **greatly** improved the latency for many users in Argentina. The transit link...
[22:22:37] Traffic: Craft geo-maps file to create lowest-latency routes from south america - https://phabricator.wikimedia.org/T363722#9807309 (CDanis) Adding the 3rd transit link in magru **greatly** improved the latency for many users in Argentina. The transit link went live midway through Monday the 13th. Here's a...