[00:22:10] Hi ebernhardson. Are you around for a question?
[00:28:51] Shilad: sure
[00:29:35] Based on your comments about caching yesterday I was planning on using webrequest table instead of the cirrus log table.
[00:29:58] I'm trying to track down the uri patterns (path + query) that constitute a search. So far I have:
[00:30:25] https://en.wikipedia.org/w/index.php?search=...
[00:30:37] https://en.wikipedia.org/w/Special:Search?search=
[00:30:46] https://www.wikipedia.org/search-redirect.php?family=wikipedia&language=en&search=
[00:31:14] I'm sure I'm missing some... Do you know any obvious ones?
[00:32:21] Shilad: so, there is this funny feature in mediawiki where *any* request not to api.php that has the query string containing 'search=' will return search results, regardless of what page you ask for. so https://en.wikipedia.org/wiki/Foo?search=bar&fulltext=1&searchToken=6n885j8acupq1bcfju2dgn1kx shows search results, even though it purports to be the 'Foo' page
[00:32:44] HAH!
[00:32:48] That cracks me up.
[00:33:45] Thanks. I'll try to catch that as well.
[00:34:34] Shilad: also fulltext search is not cached, only autocomplete. Autocomplete goes to /w/api.php. For desktop that uses action=opensearch, mobile web uses generator=prefixsearch. One problem you might run into though is that those requests can also be from anything else. No guarantee they are related to page views or whatnot
[00:35:59] Shilad: if you are looking at multiple language wikis the query parameters are really your best bet, because each language will localize the Special:Search page which makes it hard to deal with
[00:37:50] ebernhardson: By query parameters do you mean the entries in the webrequest table
[00:38:11] (the uri_query column)
[00:38:35] Shilad: query parameters are everything after the ? in a URL.
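Note: the rules ebernhardson describes above (any non-api.php request carrying a `search` parameter is a full-text search; autocomplete goes to /w/api.php with action=opensearch on desktop or generator=prefixsearch on mobile web) can be sketched as a small classifier. This is only an illustration, not code from the discussion: the function name `classify_request`, the label strings, and the `uri_path` argument name are hypothetical, and stripping a leading `?` is a defensive assumption about how the query string is stored.

```python
from urllib.parse import parse_qs

API_PATH = "/w/api.php"


def classify_request(uri_path, uri_query):
    """Return a coarse search label for one request, or None if it is not a search."""
    # Assumption: the stored query string may or may not keep its leading '?'.
    # keep_blank_values=True so that a bare 'search=' still counts as present.
    params = parse_qs(uri_query.lstrip("?"), keep_blank_values=True)

    if uri_path != API_PATH:
        # Any page other than api.php returns search results when 'search' is set.
        return "fulltext" if "search" in params else None

    # Autocomplete requests go through api.php.
    if "opensearch" in params.get("action", []):
        return "autocomplete_desktop"
    if "prefixsearch" in params.get("generator", []):
        return "autocomplete_mobile"
    return None


if __name__ == "__main__":
    print(classify_request("/wiki/Foo", "?search=bar&fulltext=1"))
    print(classify_request("/w/api.php", "action=opensearch&search=ba"))
```

As noted in the conversation, the api.php matches only say that an autocomplete-style request was made; they do not guarantee it came from a reader typing into the search box.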
[00:38:46] Shilad: in the webrequest table those end up in uri_query
[00:39:37] Shilad: so something like `uri_path <> '/w/api.php' AND str_to_map(uri_query, '&', '=')['search'] is not null` is probably a reasonable way to detect a full text search request
[00:40:07] Shilad: `uri_path = '/w/api.php' AND str_to_map(uri_query, '&', '=')['action'] = 'opensearch'` should detect autocomplete api requests on desktop web
[00:40:38] ebernhardson: Thanks. I just wasn't sure whether you were talking about the cirrus table or the webrequest table.
[00:41:55] generally you only need the cirrus tables if you care about things like what/how many results were returned, what internal requests we made, etc.
[00:42:28] ebernhardson: Great. Thanks for your help!
[00:43:08] so like a match:foo with score 0.25 and a boost:2 drops the boost and converts match:foo to 0.5, continuing all the way until the thing is a flat sum of scoresnp
[00:43:14] wrong room on that one :P
[00:43:15] but np :)
[00:46:44] perhaps some day i can bother you about figuring out what i can use from wikibrain to improve search ;)
[00:47:28] ebernhardson: Sure! I think this project (which is closely related) will yield some useful search models.
[09:45:17] kafka1023 fully bootstrapped!
[09:45:21] \o/
[09:45:41] all metrics recovered, we have again a 6 node kafka cluster
[12:58:42] * joal thanks a lot elukey and ottomata for taking super good care of our infra :)
[19:14:05] (PS1) GoranSMilovanovic: EngineGeo Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398652
[19:14:11] (CR) jerkins-bot: [V: -1] EngineGeo Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398652 (owner: GoranSMilovanovic)
[19:14:52] (PS2) GoranSMilovanovic: EngineGeo Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398652
[19:15:42] (CR) GoranSMilovanovic: [V: 2 C: 2] EngineGeo Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398652 (owner: GoranSMilovanovic)
[19:23:17] (PS1) GoranSMilovanovic: Production Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398653
[19:23:24] (CR) jerkins-bot: [V: -1] Production Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398653 (owner: GoranSMilovanovic)
[19:31:41] (Abandoned) GoranSMilovanovic: Production Dec 16 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398653 (owner: GoranSMilovanovic)
[19:54:06] (PS1) GoranSMilovanovic: Production v2 16 Dec 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398655
[19:54:51] (CR) GoranSMilovanovic: [V: 2 C: 2] Production v2 16 Dec 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398655 (owner: GoranSMilovanovic)
[19:59:50] (PS1) GoranSMilovanovic: .csv deletions 16 Dec 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398656
[20:00:40] (CR) GoranSMilovanovic: [V: 2 C: 2] .csv deletions 16 Dec 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398656 (owner: GoranSMilovanovic)
[20:16:31] (PS1) GoranSMilovanovic: README.md change [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398657
[20:17:03] (CR) GoranSMilovanovic: [V: 2 C: 2] README.md change [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398657 (owner: GoranSMilovanovic)
[22:57:11] (PS1) GoranSMilovanovic: EngineGeo 16 Dec 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398666
[22:57:53] (CR) GoranSMilovanovic: [C: 2] EngineGeo 16 Dec 2017 [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/398666 (owner: GoranSMilovanovic)