[08:24:01] Is Vital Signs broken? I can only see reader metrics now, not the user metrics that used to be there
[10:22:13] Morning all, would I be able to borrow one of you very clever people (cc milimetric) to have a go at a pretty complex database query?
[10:23:00] Essentially summed up in https://phabricator.wikimedia.org/T158545 but the current English Wikipedia new page patrol coordinator would like some patrol statistics queried
[10:26:24] Hi samtar, not sure I'm clever enough for your stuff, but we can still discuss :)
[10:27:44] Well, first of all, does the task describe what we're looking for clearly? It was copy/pasted from a message on Wikipedia
[10:34:00] samtar: The task is detailed enough for me in terms of problem understanding
[10:34:24] samtar: However, I lack some technical/data specification details to (possibly) help
[10:34:48] samtar: Should I document the questions I have in the task, so that anyone can benefit from your answers?
[10:35:09] joal: that'd be great, thank you :)
[10:35:23] Ok, doing that samtar :)
[14:11:54] 06Analytics-Kanban: Hive code to count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3040738 (10JAllemandou) Checked with daily requests, using a modified version of `offset` (instead of `uri_host` splitting, use `normalised_host.project_class`, as per glo...
[14:26:43] Hi, I'm using the pageviews API to get the monthly view stats for a bunch of pages.
[14:27:00] I'm very frequently getting 404s, even though I'm doing fewer than 100 req/s.
[14:27:26] One in 5-6 queries returns a 404.
[15:23:02] hi Niharika :] can you explain what those requests are and how you are making them? Like sequentially or in parallel?
[15:23:49] mforns: Sequentially. I'm getting more 400 Bad Request responses than 404s, but I'm getting both.
[15:24:28] Niharika, and how are you generating the URLs for the requests?
[15:25:27] I've checked and rechecked to make sure the URLs are correct. I'm generating them by calculating the month start and end I need data for and concatenating strings with project, title etc.
[15:25:55] The URLs that return 400/404 with the app give me proper data when I load them in the browser.
[15:26:01] I see
[15:26:20] I'm wondering if throttling is doing something weird. How many requests do you recommend I make per second?
[15:27:10] Like I fetched ~20 pages just now:
[15:27:14] https://www.irccloud.com/pastebin/MwrrcLSH/
[15:27:16] Niharika, theoretically, 100 req/s should be fine, but if you can, a good workaround could be making fewer requests per second
[15:27:56] I made 20 in one second and got 4 400s. See snippet.
[15:28:07] Niharika, it seems that all the page titles have spaces in them
[15:28:27] maybe your browser is URL-encoding them, and this would work
[15:28:36] but not a direct hit to the API
[15:28:49] have you tried replacing the spaces with underscores?
[15:28:53] Ah, but it works for most titles with spaces?
[15:29:10] I haven't, I'll give it a try now.
[15:29:27] mmm
[15:35:03] mforns: It's also incredibly slow for me. Like loading 50 pages takes ~5 minutes. I have a reasonably fast internet connection, ~8 Mbps, so I'm not sure what's causing it to be this slow.
[15:35:34] Niharika, oh.. wow, it shouldn't be so slow.
[15:35:59] I will mention that in our stand-up meeting, which happens in 25 minutes, and let you know.
[15:36:45] mforns: Thanks. I loaded the same 20 pages with underscores instead of spaces and it didn't give me any 400 or 404, yay! I'll do a more intensive test afterwards. Thanks for your help and for the awesome API!
[15:37:15] Niharika, OK, makes sense, thank you!
[15:46:34] (03PS1) 10Mforns: Add script to generate WSC abbrevs to domain map [analytics/refinery] - 10https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330)
[16:01:54] Hi A-Team, anybody for standup?
[16:02:13] yes
[16:03:16] only three of us :) joal
[16:04:14] Am I in an alternate batcave?
[16:05:22] joal I think so?
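[Editor's note: the underscore fix discussed above can be sketched in Python. This is an illustrative sketch, not code from the log; the helper name `pageviews_url` and its parameter defaults are assumptions. It builds a per-article pageviews URL the way mforns suggests: spaces become underscores (as in canonical wiki page titles), and the title is percent-encoded so it survives as a single path segment.]

```python
# Sketch (assumption, not code from the log) of building per-article
# pageviews URLs with titles normalized the way mforns suggests.
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(project, title, start, end,
                  access="all-access", agent="user", granularity="monthly"):
    """Build a per-article pageviews URL for [start, end] (YYYYMMDDHH)."""
    # Underscores instead of spaces, as in canonical wiki page titles.
    normalized = title.replace(" ", "_")
    # Percent-encode the rest (e.g. non-ASCII characters); safe="" also
    # escapes any "/" inside the title so it stays one path segment.
    encoded = quote(normalized, safe="")
    return f"{BASE}/{project}/{access}/{agent}/{encoded}/{granularity}/{start}/{end}"

print(pageviews_url("en.wikipedia", "Albert Einstein",
                    "2017010100", "2017013100"))
```

[A title like "Albert Einstein" is thus requested as `Albert_Einstein`, which matches what the browser would silently do for you and avoids the 400/404 responses seen in the log.]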
[16:05:28] marcel and I are here
[16:05:32] guys, I can't manage to get into the proper batcave :(
[16:06:14] aw, that sucks
[16:06:23] we can skip it if you want joal mforns
[16:06:33] no, I'll try to make it work
[16:07:03] rebooting now
[16:27:30] https://whiteboardfox.com/87265-4242-9850
[16:55:16] fdans: I think we can say that overall project info (hourly, daily, monthly etc.) is ~5 GB
[16:56:06] joal: yeah, but I'm not worried about that anymore, since we're not going to dump from Cassandra - insert into Cassandra :)
[16:59:51] k fdans :)
[17:01:00] Hi nuria, I had a quick look at global uniques daily (if you wanna see: https://phabricator.wikimedia.org/T143928)
[17:01:06] 06Analytics-Kanban: Hive code to count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3041187 (10Nuria) I think we probably need to take a second look at this calculation, compare the wikidata numbers with the ones we are already calculating, data below fo...
[17:01:32] joal: Hola! Just updated that ticket, see numbers for wikidata for February (the ones we are already calculating)
[17:01:37] joal: they do not match
[17:07:53] nuria: just commented as well
[17:09:20] 06Analytics-Kanban: Hive code to count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3041189 (10JAllemandou) Actually the numbers for the uniques underestimate are close enough (my computation was for yesterday, 2017-02-19, forgot to mention): - 3305 using by hos...
[17:09:31] 06Analytics-Kanban: Hive code to count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3041190 (10Nuria) Right, offset is "off" (jaja!) but the underestimate matches.
[17:10:08] mforns: If it helps, the following URL has given me a 404 thrice now: Error - Warning: file_get_contents(https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Kordhocë/monthly/2017010100/2017013100): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
[17:11:36] mforns: If it helps, the following URL has given me a 404 thrice now. Error - Warning: file_get_contents(https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Kordhocë/monthly/2017010100/2017013100): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
[17:12:42] Niharika, have you tried requesting it URL-encoded like this? https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Kordhoc%C3%AB/monthly/2017010100/2017013100
[17:12:44] ?
[17:13:02] Ugh, sorry.
[17:13:07] Irccloud being stupid.
[17:13:14] np :]
[17:13:47] mforns: Ah, I haven't, thanks for the tip!
[17:14:08] I guess it will work like that, let me know otherwise!
[17:34:25] 10Analytics, 06Discovery: grant access to hue.wikimedia.org to Guillaume - https://phabricator.wikimedia.org/T158589#3041257 (10Gehel)
[19:14:59] Hey a-team, flying back home tomorrow morning, I'll be available early afternoon if everything goes as planned
[19:15:13] ok joal
[19:15:17] :]
[19:17:30] I'll see you all tomorrow :) Good night a-team !
[19:17:35] byeee!
[23:05:08] (03PS1) 10Fdans: Add secondary sys endpoint to populate Cassandra with correct timestamps [analytics/aqs] - 10https://gerrit.wikimedia.org/r/338898 (https://phabricator.wikimedia.org/T156312)
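[Editor's note: the non-ASCII encoding fix mforns suggests at [17:12:42] can be reproduced with the Python standard library. This is an illustrative sketch, not code from the log; it shows how "Kordhocë" becomes "Kordhoc%C3%AB" via UTF-8 percent-encoding, which is why the encoded URL works where the raw one returns 404.]

```python
# Sketch (assumption, not code from the log): percent-encode a non-ASCII
# page title the way mforns' working URL does.
from urllib.parse import quote

title = "Kordhocë"
encoded = quote(title, safe="")  # UTF-8 percent-encoding of non-ASCII bytes
print(encoded)  # ë (U+00EB) becomes its UTF-8 bytes 0xC3 0xAB -> %C3%AB

url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       f"en.wikipedia/all-access/user/{encoded}/monthly/2017010100/2017013100")
```

[Browsers apply this encoding transparently, which explains why the same URL loaded fine in a browser but failed when sent raw from the app.]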