[06:19:36] 10Traffic, 10Operations, 10media-storage, 10Patch-For-Review: Remove unnecessary response headers - https://phabricator.wikimedia.org/T194814#4209672 (10TheDJ) ``` X-Varnish 521726689 533337780, 225083667 220092282, 525815818 515121340 Server mw1238.eqiad.wmnet ``` Especially these two can be handy. In the...
[07:30:03] https://hacks.mozilla.org/2018/05/a-cartoon-intro-to-dns-over-https/
[07:30:40] with mentions of Wikipedia as an example
[07:32:30] > Cloudflare is providing a recursive resolution service with a pro-user privacy policy
[07:35:21] skeptical ema is skeptical
[07:50:17] ema: "privacy"
[07:50:51] or just giving themselves a better position in the CDN market
[07:56:05] BTW, I cannot use the Cloudflare DNS service from my home connection
[07:56:10] my router hijacks the traffic
[07:56:20] willikins:~ vgutierrez$ traceroute 1.1.1.1
[07:56:20] traceroute to 1.1.1.1 (1.1.1.1), 64 hops max, 52 byte packets
[07:56:21]  1  1dot1dot1dot1.cloudflare-dns.com (1.1.1.1)  5.127 ms  3.225 ms  3.121 ms
[07:56:23] right. And really, what's the difference between 1.1.1.1 and 8.8.8.8? At the end of the day it's about trusting them to do what they say, and they both say they don't spy on you, so there you go!
[08:00:02] right now... 8.8.8.8 supports EDNS Client Subnet and 1.1.1.1 doesn't
[08:00:40] of course they don't support EDNS Client Subnet, to protect the user :)
[08:01:00] obviously
[08:02:53] but of course, DNS over HTTPS support in mainstream browsers is pretty useful when you are on an untrusted network
[08:03:33] the only good solution is to have your own local DNS server, but ofc that doesn't scale ;)
[08:06:33] I recommend 9.9.9.9 if I have to pick one
[08:06:59] (for the trust part ofc, that is)
[08:07:00] :)
[08:07:04] XioNoX: IBM?
[08:07:17] quad9
[08:08:33] DNS over HTTPS is convenient for Cloudflare, the same servers host DNS and HTTP
[08:08:55] and recursive DNS
[08:09:19] BTW, same as Cloudflare, quad9 (9.9.9.9) blocks EDNS Client Subnet
[08:09:56] oh really :(
[08:10:00] vgutierrez: privacy and personalization don't really go well together ;)
[08:10:08] https://www.quad9.net/faq/#What_is_EDNS_Client-Subnet
[08:11:10] a middle ground would be to remove the last byte of the IP
[08:11:24] or send the subnet
[08:11:31] XioNoX: hmm, I think that EDNS Client Subnet already does that
[08:11:40] usually yes, IIRC
[08:13:32] it does
[08:13:50] the request packet has a mask, and the client is free to pick whatever mask they think is suitable
[08:14:25] the response also has a mask, and it can be different -- more specific, or less specific/wider
[08:14:35] for the caching, yes
[08:14:38] now I remember :)
[08:15:28] still ETOOEARLY
[08:16:15] paravoid: go back on vacation!
[08:16:17] ;)
[08:16:45] volans: EMISSINGCOFFEE
[08:16:54] * volans doesn't drink coffee
[08:16:57] WUT?
[08:17:07] what a disgrace for an Italian
[08:17:20] I'm fake
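(A minimal sketch of the EDNS Client Subnet mechanics discussed above, using the dnspython library. The query name, the 203.0.113.0/24 documentation prefix, and the choice of 8.8.8.8 as resolver are illustrative assumptions, not anything from the log.)

```python
# Minimal ECS sketch (assumes dnspython is installed and 8.8.8.8 is
# reachable). The client picks the request mask (/24 here); the resolver
# replies with its own scope mask, which may be narrower or wider, and
# which governs how widely the answer may be cached.
import dns.edns
import dns.message
import dns.query

ecs = dns.edns.ECSOption("203.0.113.0", srclen=24)  # client-chosen mask
query = dns.message.make_query("upload.wikimedia.org", "A",
                               use_edns=0, options=[ecs])

response = dns.query.udp(query, "8.8.8.8", timeout=5)

for opt in response.options:
    if isinstance(opt, dns.edns.ECSOption):
        print("resolver-returned scope:", opt)
print(response.answer)
```

(Per RFC 7871, resolvers that decline ECS, such as 1.1.1.1 and 9.9.9.9 per the discussion above, typically answer with a scope of 0, meaning the answer is not client-subnet-specific.)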
[10:40:16] anyone available for a quick review? https://gerrit.wikimedia.org/r/#/c/436504/
[10:40:37] mostly double-checking I didn't miss anything
[10:40:44] https://gerrit.wikimedia.org/r/#/c/436505/ will follow on the DNS side once all looks good
[10:45:01] :P
[10:46:04] lol, thanks mark! Do you know if it can just be merged, waiting for puppet to run everywhere, or is there a specific deployment procedure?
[10:46:17] i have no idea
[10:46:24] sorry :)
[10:46:29] IIRC it should be ok, given that it's a new service with no traffic and no public DNS yet
[10:46:35] but better safe than sorry on those things
[10:46:56] i strongly doubt it would cause issues
[10:47:01] but i've never worked with this setup yet
[10:47:34] WHEN I WAS A BOI
[10:47:38] we edited VCL directly...
[10:50:02] :D
[10:50:16] well ok, erb in puppet :P
[10:51:57] I hope with at least a middle step in perl :-P
[10:53:57] i've tried not to hire people who like perl
[10:54:00] i failed somewhat
[10:54:12] lol
[11:03:35] anyway, i'm pretty sure it just generates a new VCL file from the hieradata and then automatically reloads that in varnish
[11:03:43] so you can just merge that
[11:03:54] ...that was already working 5 years ago, i highly doubt they broke that :P
[11:05:49] :)
[11:07:47] FWIW, when we dropped the imagescaler service recently, it was just a matter of merging the puppet patch
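(A rough illustration of the "generate a new VCL file and reload it" flow described above. In production this is driven by puppet/erb templating; the file path and label below are invented, and only the two varnishadm calls reflect the standard Varnish reload step.)

```python
# Hypothetical sketch: compile a freshly generated VCL file and switch
# traffic to it. vcl.load and vcl.use are standard varnishadm commands;
# the path and label are assumptions for illustration.
import subprocess
import time

vcl_file = "/etc/varnish/wikimedia.vcl"  # assumed output of the template
label = "vcl-%d" % int(time.time())      # each loaded VCL needs a unique name

subprocess.run(["varnishadm", "vcl.load", label, vcl_file], check=True)
subprocess.run(["varnishadm", "vcl.use", label], check=True)
```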
[11:10:24] for whoever is interested: https://turnilo.wikimedia.org/#webrequest_sampled_128 - this is the new datasource (more straightforward name) and contains 4 new fields: isp, as-number, country, continent
[11:11:22] elukey: nice! how can I filter by country? I don't see any values there
[11:11:54] volans: if you want a breakdown you need to split
[11:12:23] elukey: no, what I mean is that the dropdown has no values to filter on
[11:12:28] and if you split, the only value is null ;)
[11:13:13] indeed it is, something is wrong in turnilo or in the druid indexing, checking it now
[11:16:39] nevermind, I was too eager, the indexing seems to be still ongoing
[11:16:43] will report back when it's finished
[11:16:48] :) np
[11:16:55] thanks for sharing, very nice
[11:21:12] volans: fixed! It was "country_code", not "country" :)
[11:21:19] so turnilo was right
[11:21:24] now it should work
[11:21:37] indeed!
[11:22:14] let me know if you find other weird things
[11:23:24] I didn't try all the other fields, I was lucky on the first one :-P
[11:25:32] I tried the others and they seem to work (of course I didn't try the only one that was wrong :P)
[11:26:09] eheheh, you need to smell the bugs ;)
[11:38:27] the demo effect sucks :)
[12:24:58] I'm about to merge https://gerrit.wikimedia.org/r/#/c/436509/ FYI
[12:30:43] volans: the A record for debmonitor.wikimedia.org is missing though
[12:31:02] ema: see above
[12:31:03] Fri 12:40:44 volans| https://gerrit.wikimedia.org/r/#/c/436505/ will follow on the DNS side once all looks good
[12:32:40] ema: is the order wrong?
[12:37:20] volans: all good, I just missed your "will follow on the DNS side once all looks good" comment and thought you had forgotten the DNS part :)
[12:37:45] np :)
[12:45:31] volans: looks good, I see debmonitor's login page :)
[12:46:24] through cp1051 - cp3008 - cp3010; you might get a 404 if you end up on other cache hosts where puppet hasn't run yet
[12:46:39] yeah, I was just waiting for it to run everywhere
[12:46:40] :D
[12:46:47] thanks for checking
[12:56:15] ok, all looks good, merging the DNS part
[13:00:40] all good
[13:00:45] nice
[13:01:35] I'll leave the client side for next week :D
[13:02:47] 10Traffic, 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4248605 (10Vgutierrez) it would be nice to be able to use the X25519 curve here, OpenSSL provides support for X25519 since version 1.1.0. Regarding...
[13:44:32] 10Traffic, 10Operations, 10Patch-For-Review: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#4248691 (10ema)
[13:46:11] 10Traffic, 10Operations, 10Patch-For-Review: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#3239728 (10ema)
[13:46:39] ema: the re2 hostmatch idea looks pretty awesome :)
[13:52:24] bblack: seems promising, yes! :)
[13:53:51] 10Traffic, 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4248709 (10Ottomata) Hm, ya, sounds like a way off before we get that in Debian then, ya? Is that something that would block removal of IPSec?
[14:22:31] * mark needs reviewwwsssssss
[14:39:38] bblack: https://gerrit.wikimedia.org/r/#/c/436526/ probably doesn't work as intended because of the vhtcpd regex filter (-r [um][pa][lp][os])
[14:41:35] it's a question more than a statement, really
[14:43:26] 10Traffic, 10Operations, 10Patch-For-Review: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#4248902 (10ema)
[14:51:35] ema: it works, the URLs being purged are e.g. https://upload.wikimedia.org/wikipedia/labs/5/5f/Wikimedia_network_overview.png
[14:52:26] the uplo|maps filter is just a perf hack to reject most of the non-upload/maps purges quickly without hitting varnish.
[14:52:53] but if it happens to match some obscure wiki like duplo.wikipedia.org in the future or whatever, meh.
[14:55:06] bblack: oh right, those are purges sent by wikitech w/ Host: upload.wikimedia.org
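(A small illustration of why that -r filter works as a cheap pre-filter: the character classes in [um][pa][lp][os] match both "uplo" and "maps", and incidentally strings like "duplo", as noted above. This assumes Python's re semantics are close enough to vhtcpd's regex engine for the point being made; all URLs besides the upload one are made up.)

```python
import re

# vhtcpd's -r argument: matches "uplo" (as in upload.wikimedia.org)
# and "maps" (as in maps.wikimedia.org) with a single character-class test.
host_filter = re.compile(r"[um][pa][lp][os]")

for url in (
    "https://upload.wikimedia.org/wikipedia/labs/5/5f/Wikimedia_network_overview.png",
    "https://maps.wikimedia.org/osm-intl/0/0/0.png",  # hypothetical tile URL
    "https://duplo.wikipedia.org/wiki/Main_Page",     # hypothetical future wiki
    "https://en.wikipedia.org/wiki/Main_Page",
):
    print("PASS" if host_filter.search(url) else "DROP", url)
```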
[15:00:12] it's a good topic to rewind on a bit though (well, more continuing from yesterday about multicast IPs than this)
[15:00:29] historically, we had a single multicast IP that the cache clusters listened to
[15:00:55] I defined the split into 4x addresses that we see today in e.g. https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging
[15:01:07] err... this one: https://wikitech.wikimedia.org/wiki/Multicast_IP_Addresses
[15:01:36] when the .113 for upload was initially split off, we had the upload caches listening to both .112 + .113, because MW was still sending everything to .112
[15:02:25] but the idea was that the purge rate for actual upload is much, much lower than the purge rate for text, so we should reconfigure MW to split its purges and use these separate multicast IPs (which moves the perf hack back up a layer from the uplo|maps regex, basically)
[15:05:06] we tried that once. I figured out (at least I think!) what the syntax was for mediawiki-config to get the split, although I'm not 100% sure I had the syntax right, or that the (probably-untested!) splitting really works right for all situations.
[15:05:49] we tried this back in the era when we had lots of complaints about failed upload purges (I think most of those causes turned out to be relatively-unrelated / confusing symptoms and are probably now in the past mostly)
[15:06:11] and it seemed, if anything, to exacerbate user reports of upload purge loss, so we reverted back to just sending everything on .112 again
[15:06:59] (the half-assed guess of a theory being that maybe the split worked, but maybe upload's relatively-rare purges had more success in beating cache-layer purge races when they were mixed into text's heavier purge stream than when they were isolated?)
[15:08:07] but we have other mitigations for layer races now, and splitting off upload's lower purge rate to a separate socket at least reduces the chance of upload purge loss during one of the (routine these days) awful spikes of text purge rates.
[15:09:41] at this point the .114 (maps) and .115 (misc) are easy to deal with and/or get rid of. We can fold them up into upload and text respectively and get the apps to stop using the pointless extra IPs.
[15:10:00] (the latter bit after your misc-into-text work is done)
[15:10:45] in the long run, we're likely to have at least some cache layers support upload+text simultaneously, and/or we're likely to move to a kafka-based purging solution, so the whole text/upload .112/.113 split matters little in the long view, too.
[15:10:58] right
[15:11:06] currently nothing is using .113 though, right?
[15:11:46] but perhaps in the short term, we should look again at splitting mediawiki's upload purges to .113 and then turning off the .112 listener on upload, to reduce the odds of upload purge loss during text purge spikes and make the multicasting over all our switches a little more efficient.
[15:11:53] yeah, afaik nothing's actively hitting .113 now
[15:12:43] basically, retry the experiment from before, under better conditions, with more of the related issues already solved.
[15:13:47] ok, so the short-term plan would be to make maps go .114 -> .113 and mediawiki .112 -> .113 (the latter for upload URLs only)
[15:14:09] this was the mediawiki-config patch for the experiment last time (which was later reverted): https://gerrit.wikimedia.org/r/#/c/249121/6/wmf-config/squid.php
[15:15:19] right, short term we can ask the maps team to switch to using .113, and once they're done/confirmed we can drop that IP.
[15:15:50] and as we work on your misc+text plan, we can add .115 to text initially to make it easy, then later work on asking whichever (probably few!) misc services send purges to use .112 instead of .115, and eventually kill it.
[15:16:25] ideally, setups like wikitech should do their upload purges on a .113 split as well (.115/.113 instead of .112/.113, for now)
[15:16:54] I think (?) wikitech may be the only real mediawiki install that happens to be in cache_misc today
[15:20:26] for sure I haven't seen any purge traffic on misc other than wikitech's :)
[15:27:45] right, most services probably don't have purge support and/or are pass-mode anyways :)
[15:31:44] looks like there's labtestwikitech.wikimedia.org too
[15:32:51] interesting, I've looked for 200 responses to HEAD /wiki/Main_Page on all misc hostnames
[15:33:01] both phabricator and graphite-labs say 200 :)
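(To make the short-term plan above concrete, a hypothetical sketch of the per-URL split. The 239.128.0.x addresses are the HTCP purge groups documented on the wikitech Multicast IP Addresses page linked earlier, referred to as .112/.113 above; the helper name and the non-upload example URL are invented, and the real change would live in mediawiki-config rather than a helper like this.)

```python
# Route each HTCP purge to the right multicast group: upload (and maps,
# once migrated off .114) to .113, everything else to .112 for now.
from urllib.parse import urlparse

MCAST_TEXT = "239.128.0.112"    # text, and everything else for now
MCAST_UPLOAD = "239.128.0.113"  # upload, plus maps after .114 is dropped

def htcp_group(url: str) -> str:
    """Pick the multicast group an HTCP purge for this URL should go to."""
    host = urlparse(url).hostname or ""
    if host in ("upload.wikimedia.org", "maps.wikimedia.org"):
        return MCAST_UPLOAD
    return MCAST_TEXT

assert htcp_group("https://upload.wikimedia.org/wikipedia/labs/5/5f/Wikimedia_network_overview.png") == MCAST_UPLOAD
assert htcp_group("https://en.wikipedia.org/wiki/Main_Page") == MCAST_TEXT
```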