[00:09:31] i suppose in a way.. now we have an idea of how much excess capacity we have :P 3-4M commonswiki_file queries per day with pagination
[00:09:44] (but only when running both clusters)
[00:10:26] anyways, i'm undeploying the more_like -> eqiad patch, so all traffic will be going to dnsdisc (==codfw) as is typical soon
[00:15:13] hmm, also the docker daemon on cindy blew up again... second time this month that it needs a hard restart of the service
[00:15:51] i suppose we could go nuclear and restart it before each test, but that feels excessive :P
[00:23:26] traffic switched back, everything looks happy (although this is the slow time of day)
[07:00:23] errand
[08:46:09] wondering if we could silence wdqs2009 (legacy full graph), a single server doesn't support a surge of traffic and there's nothing we can do about it
[10:11:48] ryankemper: ^
[10:11:50] lunch
[11:59:25] break
[13:08:45] o/
[13:10:02] o/
[13:14:49] Thanks for everyone's help on the search latency stuff. ryankemper and I were talking about treating it as possible bot abuse on Tuesday but never followed up, sorry about that ;(
[13:20:17] np!
[14:08:01] \o
[14:21:57] ryankemper we're also getting some alerts from the wdqs hosts that may or may not have been reimaged after the LVS teardown? Probably worth a look at pairing today or whenever
[14:25:31] Can we cancel the retro? I'd like to go to the Making Space meeting, which will not be recorded
[14:26:01] +1 to cancelling retro
[14:26:29] no opinion either way, we can
[14:30:46] o/
[14:30:57] no opinion either way
[14:58:12] sounds like a skip then?
[14:59:37] yep
[15:05:42] I'm seeing the messages only now. I had a few topics for the retro. But we can discuss them at another time if needed.
[16:02:47] workout, back in ~40
[16:34:42] heading out
[16:45:51] hmm, looking at cormac's suggestion it seems the request would be to increase the weight of the near_match in fulltext by perhaps an order of magnitude. I suspect we'd need to break out relforge (does it still work?) to understand if that has unintended effects
[16:47:01] re: T401590
[16:47:01] T401590: Adjust CirrusSearchNamespaceWeights for Commons - https://phabricator.wikimedia.org/T401590
[17:00:36] back
[17:09:08] Taking dog out
[17:14:17] meh, phpcs doesn't like a bare return with all the conditions on following lines. Perhaps accidentally. It requires return to be followed by a space, but it requires no whitespace at the end of a line
[17:14:49] not a big deal, tbh we need a black for php so i don't think about it :P
[18:11:53] * ebernhardson just realized we can also calculate natural language query precision/recall of the contains_question_word classifier from the manual classifications
[18:11:57] should have been more obvious :P
[18:28:00] ouch. precision is good, but as expected recall for the strict definition is perhaps 0.54, and recall on the broader definition drops to 0.32
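A minimal sketch of the precision/recall calculation against the manual classifications mentioned at 18:11; the row layout and the 'manual'/'predicted' field names are assumptions for illustration, not the actual dataset schema.

```php
<?php
// Hypothetical sketch only: compute precision/recall of the
// contains_question_word classifier from manually labelled rows.
// Field names and data shape are made up for this example.

/**
 * @param array[] $rows each row: [ 'manual' => bool, 'predicted' => bool ]
 * @return float[] [ precision, recall ]
 */
function precisionRecall( array $rows ): array {
	$tp = $fp = $fn = 0;
	foreach ( $rows as $row ) {
		if ( $row['predicted'] && $row['manual'] ) {
			$tp++; // classifier and human both say "question"
		} elseif ( $row['predicted'] && !$row['manual'] ) {
			$fp++; // classifier fired, human disagreed
		} elseif ( !$row['predicted'] && $row['manual'] ) {
			$fn++; // classifier missed a human-labelled question
		}
	}
	$precision = ( $tp + $fp ) > 0 ? $tp / ( $tp + $fp ) : 0.0;
	$recall = ( $tp + $fn ) > 0 ? $tp / ( $tp + $fn ) : 0.0;
	return [ $precision, $recall ];
}
```

Running the same rows against the strict vs. broader manual definitions would give the two recall numbers quoted above.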
[18:33:21] ebernhardson: looks like you also forgot to remove the param for isDeepWebScrapingRequest(). Did you mean to remove the getExtraIndexesForNamespaces() clause from isDeepWebScrapingRequest() when you fixed the "space after return" problem?
[18:34:40] Trey314159: yea, i posted an earlier comment on the patch about how the extra indexes are a reasonable metric for whether a query might be expensive, but in terms of classifying traffic as automated, how expensive the query is seems tangential
[18:34:49] david posted a response agreeing, so i dropped it
[18:36:07] although i can think of alternatives, it could be reasonable to be more liberal about considering things automated if they are also expensive
[18:36:39] although then i might want a better definition of expensive, maybe check the expected shard counts and consider anything querying more than X shards expensive
[18:36:46] but then everything to commons is expensive :P
[18:37:40] oh, actually we don't know the commonswiki shard counts on other wikis :(
[18:37:59] i guess we could get it from the cross-wiki config, but that's fragile and probably not ideal to use on every request
[18:38:35] * ebernhardson should actually look at the patch to make sure i'm talking about the same thing you are :P
[18:39:23] oh, i see what you mean. yes, i'm totally failing at being thorough lately :S
[18:39:30] * ebernhardson also still can't spell :P
[18:41:35] the problem is partly that i need to fix my local stuff.. when i run phan it complains about a bunch of things that CI doesn't because i have a version mismatch somewhere
[18:41:44] I saw the discussion, but just wanted to double check it was intentional.
[18:41:46] Re automated vs expensive... Naming is hard.. "expensive_and_or_automated", "should_be_rate_limited".. the question is whether they should be in the same pool or in different pools, and I have no strong opinion there.
[18:45:52] i guess to me, the high level goal is that automated traffic gets throttled and interactive human traffic doesn't drop queries... but that's not a heuristic, that's a wish :P
[21:03:24] inflatador: 10’
[21:06:34] ryankemper cool, I'm here
[22:17:08] inflatador: here's that patch to shift the wdqs hosts over to wdqs-main: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1191525 (in accordance with the plan in https://docs.google.com/spreadsheets/d/1Jh14bWaQhKiTWKaDJXD6n68kYbJ9ecIu4q8L0keX_wk/edit?gid=0#gid=0)
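A rough sketch of the "more than X shards is expensive" heuristic floated at 18:36. The function, the threshold, and the idea of passing in per-index shard counts are all hypothetical, not CirrusSearch code; and as noted at 18:37, the commonswiki shard count isn't known from other wikis, so the extra-index case would need the cross-wiki config or some other source.

```php
<?php
// Hypothetical sketch only: classify a query as expensive when it
// fans out to more than a fixed number of shards. The threshold and
// the way shard counts are obtained are assumptions.
const EXPENSIVE_SHARD_THRESHOLD = 8;

/**
 * @param int[] $shardCountsByIndex map of index name => shard count for
 *  every index the query will hit (local wiki plus any extra indexes)
 */
function isExpensiveQuery( array $shardCountsByIndex ): bool {
	$totalShards = array_sum( $shardCountsByIndex );
	return $totalShards > EXPENSIVE_SHARD_THRESHOLD;
}

// A local-wiki-only query stays under the threshold:
//   isExpensiveQuery( [ 'enwiki_content' => 6 ] )  => false
// One that also hits commonswiki_file trips it (the commonswiki count
// shown here is made up, since it isn't known from other wikis):
//   isExpensiveQuery( [ 'enwiki_content' => 6, 'commonswiki_file' => 30 ] )  => true
```

Under this kind of rule essentially every query that includes commonswiki_file comes out expensive, which is the objection raised at 18:36:46.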