[08:19:53] Hi all! I want to deploy a couple of changes to the rest gateway in a bit, https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1268520 and also https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1267122 if i can get that one ready in time. Should I use the slot at 10:00 UTC? Will any of you be around to give a quick +1?
[09:48:16] duesen: give me a minute to take a look at those
[09:55:15] <_joe_> -1 from me on https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1268520
[09:55:43] <_joe_> the logic is that if you don't change the UA from the one given by the bot creator, to personalize it, you're not complying
[09:55:58] <_joe_> we should keep logic between the edge and the rest gateway consistent
[10:05:55] _joe_: i'm confused - that patch changes the *key* used for compliant bots to be the IP address instead of the contact address. It doesn't change the rate limit class. If there is no x-ua-contact, you get treated as anon...
[10:07:02] you mean that people who don't customize should share a key, to "punish" the fact that they are not changing the UA?
[10:07:09] <_joe_> yes
[10:07:26] <_joe_> it's a way to induce them to change it with their own contact info
[10:07:34] The flip side of that is that you can easily bypass the rate limit by sending a different contact string for every request. That seems extremely unsafe.
[10:08:19] <_joe_> that's why we limit the size of the contact we accept at the edge
[10:08:27] <_joe_> which is in x-ua-contact
[10:09:11] <_joe_> but also, people who want to abuse don't tend to use email contacts
[10:09:44] I don't understand - I can just send foo1@test.com, foo2@test.com, foo3@test.com... I can do that a million times and not hit a size limit...
[10:10:01] <_joe_> and we'll catch you and ban you
[10:10:21] <_joe_> but you can also use a residential proxy for 1 dollar per million requests and hop IPs
[10:11:07] <_joe_> this has been extensively debated in SRE. And I would prefer not to discuss these details on publicly logged channels tbh
[10:11:25] <_joe_> I am convinced our model is correct, but even if not - we need to keep them consistent
[10:11:37] <_joe_> that's why my vote was -1 and not -2
[10:11:39] <_joe_> :)
[10:12:24] Hm. Can you comment on the ticket, please? I was hoping to fix two issues with this change - a long standing TODO in the config to switch to IPs, because we don't want to key on something fully under the user's control. And secondly, the issue we are hitting with Openrefine, see https://github.com/OpenRefine/OpenRefine/issues/7731
[10:13:39] If you want us to continue to use x-ua-contact as a key, I have to go back on what I just told them about switching to IP based limits.
[10:13:47] <_joe_> so lemme be open: the IP is also under user control
[10:14:35] <_joe_> sure I can comment on the task
[10:15:03] <_joe_> will do as soon as I can, I didn't plan to have this discussion tbh and I have deadlines today.
[10:15:50] Yea, switching IPs is not quite as cheap and easy as putting a counter in the user agent string. But true, to anyone who really *wants* to bypass the rate limits, IPs are cheap enough.
[10:16:29] <_joe_> and you're starting from the idea that most rogue users special-case their evasion techniques to our policies
[10:16:31] <_joe_> which they don't
[10:17:36] <_joe_> again, not here, we should set up a meeting with jon, you, claime, cdanis and I (I can be optional). what we definitely can't have is a split brain situation between CDN and the rest gateway
[10:20:16] I'm sending a DM to jon to set that up
[10:22:12] may I join too? I promise I'll be good :P
[10:33:31] _joe_: I just DMed you to ask for a meeting :)
[10:33:47] duesen: I asked Jon to set a meeting up actually
[10:34:18] claime: thanks. i wonder whether it's useful to have Jonathan there as well.
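(Editor's note on the rate-limit key discussion above, 10:05 to 10:16.) The trade-off between keying on the contact string and keying on the IP can be sketched with a toy counter. This is only an illustration under stated assumptions: `FixedWindowLimiter`, `key_by_contact`, `key_by_ip`, the request shape, and the limit of 10 are all hypothetical and are not taken from the rest gateway or the edge configuration. The fallback to the IP when no contact is sent is also an assumption of the sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical request shape: {"ip": ..., "x-ua-contact": ...}
Request = Dict[str, str]


class FixedWindowLimiter:
    """Counts requests per key inside one window (no expiry, for brevity)."""

    def __init__(self, limit: int, key_fn: Callable[[Request], str]) -> None:
        self.limit = limit
        self.key_fn = key_fn
        self.counts: Dict[str, int] = defaultdict(int)

    def allow(self, request: Request) -> bool:
        key = self.key_fn(request)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


def key_by_contact(request: Request) -> str:
    # Assumption of this sketch: fall back to the IP when no contact is sent.
    return request.get("x-ua-contact") or request["ip"]


def key_by_ip(request: Request) -> str:
    return request["ip"]


def count_allowed(limiter: FixedWindowLimiter, requests: List[Request]) -> int:
    return sum(1 for request in requests if limiter.allow(request))


# One client on one IP, sending a fresh contact string with every request.
rotating = [
    {"ip": "192.0.2.1", "x-ua-contact": f"foo{i}@test.com"} for i in range(100)
]
# Two unrelated clients that both kept the bot author's default contact string.
shared = [
    {"ip": f"198.51.100.{i % 2}", "x-ua-contact": "bot-author@example.org"}
    for i in range(100)
]

contact_rotating = count_allowed(FixedWindowLimiter(10, key_by_contact), rotating)
ip_rotating = count_allowed(FixedWindowLimiter(10, key_by_ip), rotating)
contact_shared = count_allowed(FixedWindowLimiter(10, key_by_contact), shared)
ip_shared = count_allowed(FixedWindowLimiter(10, key_by_ip), shared)

print(contact_rotating, ip_rotating, contact_shared, ip_shared)  # 100 10 10 20
```

In this toy setup, rotating the contact string defeats a contact-keyed limit entirely, while uncustomized clients share one budget under a contact key and get separate budgets under an IP key. It does not model IP hopping through proxies, which the discussion notes is also cheap.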
[10:34:35] It is IMO
[10:34:52] This is an alignment issue, not a technical one
[10:52:58] duesen: re: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1267122 I don't see what we discussed yesterday, did you maybe not push the changes?
[10:53:32] (back in a few minutes, I need a coffee)
[12:47:03] claime: I didn't finish it. The change is simple, but I can't get the tests to pass. It's driving me nuts, I must be missing something obvious.
[15:00:39] claime: I finally got it to work. There was a whole pile of oddness in the way.
[15:01:20] One thing I'd really like to change is that we re-define all the routes for staging. That makes it really easy to miss an issue in the real route definitions. I'll look into a better solution.
[15:02:05] Yeah... I think the issue is for some endpoints we use staging backends, but for some we don't have them
[15:03:09] otherwise we could just inherit the whole block of routes, I don't think that would be an issue
[15:19:10] the actual endpoints are configured in the discovery_endpoints stanza. but I still need a way to *add* extra endpoints for testing new features before we apply them to production endpoints.
[16:44:25] Hiii, I think we need to set up envoy for swift in mediawiki (See https://phabricator.wikimedia.org/T328872#11783346 ) so it would cache TLS handshakes. Any pointers on how to move forward? I can do the mw side of things
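(Editor's note on the staging routes discussion above, 15:01 to 15:19.) One way to avoid re-defining every route for staging is to derive the staging set from the production definitions, override backends only where a staging backend exists, and add test-only routes on top. The sketch below is a hedged illustration in Python, not the actual deployment-charts values layout: the route names, paths, backend names, and `build_staging_routes` are all hypothetical.

```python
from typing import Dict

Route = Dict[str, str]

# Hypothetical production route definitions (single source of truth).
production_routes: Dict[str, Route] = {
    "page_html": {"path": "/v1/page/{title}/html", "backend": "mw-api"},
    "search": {"path": "/v1/search", "backend": "mw-api"},
}

# Only some endpoints have a staging backend; the rest keep production's.
staging_backend_overrides: Dict[str, str] = {"page_html": "mw-api-staging"}

# Extra routes for testing new features before they reach production.
staging_extra_routes: Dict[str, Route] = {
    "experimental": {"path": "/v1/experimental", "backend": "mw-api-staging"},
}


def build_staging_routes(
    production: Dict[str, Route],
    backend_overrides: Dict[str, str],
    extra_routes: Dict[str, Route],
) -> Dict[str, Route]:
    """Inherit production routes, swap backends where given, add extras."""
    # Copy each route so the production definitions are never mutated.
    staging = {name: dict(route) for name, route in production.items()}

    for name, backend in backend_overrides.items():
        if name not in staging:
            raise KeyError(f"override for unknown route: {name}")
        staging[name]["backend"] = backend

    clashes = set(extra_routes) & set(staging)
    if clashes:
        raise ValueError(f"extra routes shadow production routes: {sorted(clashes)}")
    for name, route in extra_routes.items():
        staging[name] = dict(route)

    return staging


staging_routes = build_staging_routes(
    production_routes, staging_backend_overrides, staging_extra_routes
)
```

Because staging inherits the production definitions, a mistake in a real route shows up in staging too. The explicit clash check keeps test-only routes from silently replacing a production one.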