[04:52:20] (CR) Gergő Tisza: [C: 2] Make rcshow=oresreview bypass query optimizer failure [extensions/ORES] - https://gerrit.wikimedia.org/r/337263 (https://phabricator.wikimedia.org/T152585) (owner: Ladsgroup)
[04:53:30] (Merged) jenkins-bot: Make rcshow=oresreview bypass query optimizer failure [extensions/ORES] - https://gerrit.wikimedia.org/r/337263 (https://phabricator.wikimedia.org/T152585) (owner: Ladsgroup)
[14:11:17] o/
[17:29:35] Amir1: dr0ptp4kt: API usage errors are not logged but the ApiAction table in hadoop gets flagged on errors so that's one way to check
[17:31:21] it looks like it's the same python ua, so i think the continuation rate limiting option needs to be used. Amir1 tgr coreyfloyd
[17:31:51] dr0ptp4kt: Okay, I make a patch for that
[17:32:13] thx Amir1
[17:32:49] hm, actually the API doesn't even error out, it just sends a warning about one of the params being wrong, so there won't be much trace in logs
[17:32:51] I'm trying to find a phab card, Is there something for that?
[17:34:04] Revision-Scoring-As-A-Service, MediaWiki-extensions-ORES, User-Ladsgroup: Reduce the number of revisions that can be requested in one batch - https://phabricator.wikimedia.org/T157983#3022340 (Ladsgroup)
[17:34:54] I just made one :D
[17:35:48] if we want to issue a "communications block", we could put an IP range ban in an ApiCheckCanExecute hook
[17:36:23] (PS1) Ladsgroup: Reduce number of revisions that can be requested [extensions/ORES] - https://gerrit.wikimedia.org/r/337424 (https://phabricator.wikimedia.org/T157983)
[17:37:50] tgr: https://gerrit.wikimedia.org/r/337424 I reduced it to 30, not sure that would be enough
[17:40:13] dr0ptp4kt: What is the rate it's sending the request?
[17:41:22] Amir1: about 320,000 api.php requests per day
[17:42:00] oh boy
[17:43:38] I think this tool should be banned from api.php in total. Maybe we need to talk about that with Ops, given that their UA doesn't satisfy the UA policy (no contact)
[17:43:43] so assuming a constant rate its 3 rps:
[17:43:46] ?
[17:43:52] that does not sound so bad
[17:44:19] but it's unleashing 3 * 50 revisions to check to ores endpoint
[17:44:56] but yeah, block first, ask later
[17:45:33] should we do that in MW config or ask ops to ban the IP:
[17:46:14] let me try to convince Ops
[17:47:32] Amir1: do we have any numbers to base the API rate limits on? with the current setting it's still possible to query 1500 revisions per request if I read the code correctly
[17:48:06] I'd expect OresAPIMaxBatchJobs to be much lower
[17:50:45] tgr: Isn't it 30?
[17:51:04] maybe I'm reading the wrong code
[17:51:30] yes, but that means that the API will fire 30 jobs which ask for 50 revisions each
[17:53:32] tgr: oh, my bad. So 3 would be okay ( 3 * 30 = 90)?
[17:53:55] yeah, something like that
[17:54:23] In matter of ores itself, we have 45 workers per node (with four nodes) and each worker can handle around 18 revisions per minute
[17:54:55] so 3240 per min
[17:55:43] (PS2) Ladsgroup: Reduce number of revisions that can be requested [extensions/ORES] - https://gerrit.wikimedia.org/r/337424 (https://phabricator.wikimedia.org/T157983)
[17:56:29] OresAPIMaxBatchJobs is really just a kind of precaching, if the client handles API continuation correctly they will ask for those same batches in the next requests
[17:57:31] so I guess it should be set to the number of workers you think a single client should be reasonably able to hog
[17:59:38] tgr: I think 90 is a good number, we have tons of other clients too and our infra is fragile. I amended the patch
[18:00:01] OK, do you want to SWAT it now or no rush?
[18:00:52] (CR) Gergő Tisza: [C: 2] Reduce number of revisions that can be requested [extensions/ORES] - https://gerrit.wikimedia.org/r/337424 (https://phabricator.wikimedia.org/T157983) (owner: Ladsgroup)
[18:00:57] First we need to enable the functionalities. When it's not there, deploying them doesn't make sense
[18:01:49] I'm waiting for Ops to tell me if they are going to ban that person or not. If they banned it. I make a patch to bring back and get it pushed through SWAT
[18:02:00] sounds good tgr dr0ptp4kt ?
[18:02:25] (Merged) jenkins-bot: Reduce number of revisions that can be requested [extensions/ORES] - https://gerrit.wikimedia.org/r/337424 (https://phabricator.wikimedia.org/T157983) (owner: Ladsgroup)
[18:02:33] If they didn't agree on banning, we need to figure out something else before deploying :(
[18:03:16] afk for dinner, be back soon
[18:08:45] we can do it inside the API, it's not a problem
[18:09:13] gtg, will be back in two hours
[18:32:38] Amir1 tgr|away as a general rule, i'd rather be gentle with api consumers, providing a warning instead of booting them. we're here for them! so i prefer rate limiting to banning. but whatever you guys think is best this time.
[19:20:29] dr0ptp4kt: I'm back, https://meta.wikimedia.org/wiki/User-Agent_policy is important. I would have contacted that person instead of banning if they provided a meaningful UA
[19:22:45] Amir1: right. i think sometimes people don't even realize they should look. to be fair, the top of the api.php action=help does point to the documentation, but the ua part is sort of buried. but rules are rules, too
[19:25:00] hmm, I agree that we should make this more visible but I have no idea how. Definitely tgr|away and anomie have some ideas
[19:48:50] IMO just ban with a friendly message asking to contact us
[19:49:18] probably easier for them to figure out than things randomly failing due to rate limits
[20:24:05] Revision-Scoring-As-A-Service-Backlog, ORES: [Spec] Tracking and blocking specific IP/user-agent combinations - https://phabricator.wikimedia.org/T137962#2385178 (Tgr) >>! In T137962#2447823, @schana wrote: > This is so one user-agent/IP doesn't hog all the resources and other users still are able to use...
[21:09:25] Revision-Scoring-As-A-Service, Deployment-Systems, ORES, Release-Engineering-Team, Scap: Error after "Finished deploy": xrange() arg 3 must not be zero - https://phabricator.wikimedia.org/T157136#3023174 (dduvall)
[21:49:16] Revision-Scoring-As-A-Service-Backlog, ORES: Implement parallel connection limit for querying ORES - https://phabricator.wikimedia.org/T148997#3023277 (Tgr) Per T137962#2447946, "//Generic ratelimiting (e.g. per client IP) and other similar protection measures for these clusters has been pushed off for p...
[21:52:00] Revision-Scoring-As-A-Service-Backlog, ORES: [Spec] Tracking and blocking specific IP/user-agent combinations - https://phabricator.wikimedia.org/T137962#3023287 (Tgr) >>! In T137962#2447946, @BBlack wrote: > 4. One of the best defenses you can have is to be sure that unauthenticated URLs are reasonably-...
[23:11:49] Revision-Scoring-As-A-Service-Backlog, ORES: [Spec] Tracking and blocking specific IP/user-agent combinations - https://phabricator.wikimedia.org/T137962#3023643 (BBlack) >>! In T137962#3023287, @Tgr wrote: >>>! In T137962#2447946, @BBlack wrote: >> 4. One of the best defenses you can have is to be sure...
[23:40:50] Revision-Scoring-As-A-Service-Backlog, ORES: [Spec] Tracking and blocking specific IP/user-agent combinations - https://phabricator.wikimedia.org/T137962#3023659 (Tgr) >>! In T137962#3023643, @BBlack wrote: > Static home page at https://ores.wikimedia.org/ is not cacheable at all. > Static logo at https:...
[23:48:23] Revision-Scoring-As-A-Service, ORES, Wikimedia-Logstash, Patch-For-Review, User-Ladsgroup: Send ORES logs to logstash - https://phabricator.wikimedia.org/T149010#3023690 (Tgr)
[23:48:31] Revision-Scoring-As-A-Service, ORES, Wikimedia-Logstash, Patch-For-Review, User-Ladsgroup: Send ORES logs to logstash - https://phabricator.wikimedia.org/T149010#2739588 (Tgr)
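
Note on the numbers discussed between 17:41 and 17:59: the arithmetic can be checked with a small sketch. All figures are the ones quoted in the log; the function names are hypothetical and exist only for this illustration, and the calculation assumes a constant request rate and ignores any caching of already-scored revisions.

```python
# Back-of-the-envelope check of the figures quoted in the log.
# Function names are illustrative only; they are not part of ORES or MediaWiki.

SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_second(requests_per_day: float) -> float:
    """Average request rate, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def cluster_capacity_per_min(workers_per_node: int, nodes: int,
                             revisions_per_worker_per_min: int) -> int:
    """Revisions the whole cluster can score per minute."""
    return workers_per_node * nodes * revisions_per_worker_per_min


def revisions_per_api_request(batch_jobs: int, revisions_per_job: int) -> int:
    """Worst-case number of revisions a single API request can trigger."""
    return batch_jobs * revisions_per_job


if __name__ == "__main__":
    rps = requests_per_second(320_000)            # "about 320,000 requests per day"
    capacity = cluster_capacity_per_min(45, 4, 18)  # 45 workers, four nodes, 18/min
    before = revisions_per_api_request(30, 50)    # 30 jobs of 50 revisions each
    after = revisions_per_api_request(3, 30)      # the amended patch: 3 * 30

    print(f"average rate: {rps:.2f} requests/s")
    print(f"cluster capacity: {capacity} revisions/min")
    print(f"worst case per request before the patch: {before} revisions")
    print(f"worst case per request after the patch: {after} revisions")
```

Under these assumptions the average rate comes out slightly above the "3 rps" estimate in the log, and the other figures (3240 per minute, 1500 and 90 revisions per request) match what was said in the channel.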