[09:22:58] (CR) DCausse: "I ran some tests with camus against our topic in production:" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[09:51:15] (PS1) Ori.livneh: statsv: migrate to pykafka [analytics/statsv] - https://gerrit.wikimedia.org/r/252657
[09:52:21] (CR) Ori.livneh: [C: 2 V: 2] statsv: migrate to pykafka [analytics/statsv] - https://gerrit.wikimedia.org/r/252657 (owner: Ori.livneh)
[10:35:31] Analytics-Tech-community-metrics, Developer-Relations, DevRel-November-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1800559 (Dicortazar) Data were updated now. The issue with Git is that this is still n...
[10:39:26] Analytics-Tech-community-metrics, DevRel-November-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1800561 (Dicortazar) @Aklapper, my only concern is about the git repositories that are not part of Gerrit but still are required in the list of repositor...
[10:42:54] Analytics-Kanban: Update Cassandra loading job - per-project [5 pts] {slug} - https://phabricator.wikimedia.org/T118447#1800562 (JAllemandou) NEW a:JAllemandou
[10:44:00] Analytics-Kanban: Update cassandra monthly top job [3 pts] {slug} - https://phabricator.wikimedia.org/T118448#1800571 (JAllemandou) NEW a:JAllemandou
[10:44:52] Analytics-Kanban: Correct cassandra daily top job [5 pts] {slug} - https://phabricator.wikimedia.org/T118449#1800578 (JAllemandou) NEW a:JAllemandou
[10:46:53] (PS22) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[11:09:23] Analytics-Kanban: Backfill cassandra pageview data - September - https://phabricator.wikimedia.org/T118450#1800594 (JAllemandou) NEW a:JAllemandou
[11:09:45] Analytics-Kanban: Backfill cassandra pageview data - September [5 pts] {slug} - https://phabricator.wikimedia.org/T118450#1800602 (JAllemandou)
[12:12:54] * mforns tests
[12:15:45] Hi mforns :)
[12:15:51] seems to work :)
[12:15:51] hello joal :]
[12:15:55] xD
[12:16:24] I'm back to work, the little one is ok :)
[12:16:31] oh! good
[12:16:50] my daughter is also better, maybe tomorrow she goes to school again
[12:17:01] Cool :)
[12:29:23] mforns: I have thought of you as well, reading about the political stuff going on in Catalonia :)
[13:13:35] sorry joal, having a snack
[13:13:44] np mforns :)
[13:14:09] aha, yes... my family is all about that, but I'm kinda disconnected
[13:14:20] :)
[13:14:32] Maaaaan, I've caught Lino's gastro :(
[13:14:37] pffff
[13:14:49] what?
[13:15:21] uncomfortable diaper change?
[13:15:23] :]
[13:15:33] Lino had gastroenteritis, and I'm pretty sure, having taken care of him in the last few days, I've caught it :(
[13:15:40] oooh...
[13:15:50] mwarf
[13:16:22] you should rest then
[13:16:36] I mean, as long as the bathroom is not too far, I'm ok :)
[13:16:55] This kinda situation is a good example of when everybody benefits from remote working :)
[13:17:01] xD
[13:18:58] Analytics-Kanban, Patch-For-Review: Exclude MobileMenu from Pageviews - https://phabricator.wikimedia.org/T117345#1800857 (Ironholds) Done.
[13:20:49] milimetric, it's for you: https://twitter.com/duggan/status/664541126469226497
[13:20:53] :D
[13:25:50] hehe delicate theme :]
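For context on the statsv change merged above: a minimal sketch, assuming placeholder broker and topic names, of what consuming a topic with pykafka looks like (pykafka is the library the migration swaps in; this is not the production configuration).

    # Minimal pykafka consumer sketch; broker address and topic name are placeholders.
    from pykafka import KafkaClient

    client = KafkaClient(hosts="kafka-broker.example.org:9092")  # hypothetical broker
    topic = client.topics[b"statsv"]                             # topic name assumed
    consumer = topic.get_simple_consumer(consumer_timeout_ms=5000)

    for message in consumer:
        if message is not None:
            # Each message carries one raw statsv payload; here we just print it.
            print(message.offset, message.value)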
[13:33:11] joal, yt?
[13:33:15] yup
[13:33:34] do we have monthly granularity available for by-project?
[13:33:38] in aqs?
[13:33:42] We should
[13:34:11] how should I query it? giving the first day of the month and last day of the month?
[13:34:16] I think aqs displays an error message when requesting monthly, but that's not the expected behavior
[13:34:23] aha
[13:34:38] yes, I'm seeing that also
[13:34:44] What I'd expect would be to request using first day of the month
[13:34:52] and either last day or next month first day
[13:35:15] joal, to be consistent I think we should make last date always inclusive, or always exclusive
[13:35:55] the per-article endpoint has an inclusive end-date
[13:36:29] makes sense mforns
[13:36:32] ok
[13:55:58] Analytics-Tech-community-metrics, Developer-Relations: Mark BayesanFilter repository as inactive - https://phabricator.wikimedia.org/T118460#1800890 (Qgil) NEW
[14:05:51] Analytics-Tech-community-metrics: Many 404s and graphs not displayed on gerrit_review_queue.html - https://phabricator.wikimedia.org/T118461#1800903 (Aklapper) NEW a:Dicortazar
[14:08:48] Analytics-Tech-community-metrics, DevRel-November-2015: Many 404s and graphs not displayed on gerrit_review_queue.html - https://phabricator.wikimedia.org/T118461#1800910 (Aklapper)
[15:11:36] hi joal.
[15:11:42] Hi leila
[15:11:52] wondering, joal, is the checkpoint happening today?
[15:12:20] Nothing says it's cancelled :)
[15:12:31] leila: --^
[15:12:48] leila: ottomata usually organises them, but I have not seen him yet
[15:12:55] yeah, okay. I don't see attendance from anyone except Dan. Wasn't sure if it's a thing. okay! then see you in 4 hours. :-)
[15:14:45] leila: depends also on you, if you think it's not worthwhile, let's forget it )
[15:15:10] I'd like to touch base and see what's up on your end, joal.
[15:15:35] k cool :)
[15:15:48] see you then leila !
[15:15:54] see ya!
[15:15:57] :-)
[15:19:45] *scratches head*
[15:48:06] Analytics-General-or-Unknown, Wikidata, Story: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#1801041 (Lydia_Pintscher) Adam: Poke? This is getting rather important and urgent.
[15:49:11] a-team: my google account says my password on nuria@wikimedia was changed 3 hours ago, have you guys had problems with this too?
[15:49:19] mmm
[15:50:14] nuria: everything ok for me
[15:50:20] nuria, no, I didn't receive any notification
[15:50:20] welll
[15:50:39] O.o
[15:51:21] sorry nuria :(
[15:53:44] nuria, how did google notify you?
[15:53:56] when trying to login
[15:54:05] wow
[15:54:17] but looks like it is going to get fixed, so, well, no need to read e-mail today!
[15:54:28] :]
[15:54:36] nuria: you managed to login ?
[15:54:47] that would be weird, with the password changed
[15:59:54] joal: i notified so i should be ok, they will take care of it later on
[16:00:01] cool
[16:11:24] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1801111 (daniel) >>! In T116247#1799843, @Ottomata wrote: > Is it time to consider creating a standalone repo for these schemas?...
[16:14:39] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1801114 (Ottomata) @daniel This schema repo will be used by many codebases. EventLogging, Mediawiki, analytics refinery, etc....
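Referring back to the monthly-granularity exchange at the top of this block: a sketch, using the per-article endpoint documented later in this log, of requesting one calendar month of daily data with the end timestamp treated as inclusive (as noted above for the per-article endpoint). The items-list response shape is assumed from the AQS endpoints.

    # Request a full month of daily per-article pageviews; the end date is
    # inclusive, so the last day of the month is passed rather than the first
    # day of the next month. Timestamps are YYYYMMDDHH.
    import calendar
    import requests

    BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

    def month_of_views(project, article, year, month):
        last_day = calendar.monthrange(year, month)[1]
        start = "{:04d}{:02d}0100".format(year, month)
        end = "{:04d}{:02d}{:02d}00".format(year, month, last_day)
        url = "/".join([BASE, project, "all-access", "user", article,
                        "daily", start, end])
        return requests.get(url).json().get("items", [])

    # e.g. month_of_views("en.wikipedia", "Selfie", 2015, 10)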
[16:16:01] Analytics-Backlog: Write pageview API blogpost - https://phabricator.wikimedia.org/T118471#1801116 (Nuria) NEW
[16:21:08] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1801135 (Dbrant) @JKatzWMF I think the best practice should be to always specify the userAgent when writing queries, e.g. LIKE '%-r-%' for the production app, LIKE '%-beta-%' for the beta app, etc. (Th...
[16:23:47] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1801151 (daniel) @Ottomata If we have good versioned dependencies between the modules, that should work too. My concern is makin...
[17:07:17] Analytics-Kanban, Patch-For-Review: Exclude MobileMenu from Pageviews - https://phabricator.wikimedia.org/T117345#1801241 (kevinator) Thanks @Ironholds
[17:21:19] a-team: back on irc
[17:25:00] nuria, https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day
[17:36:36] mforns: can you paste the etherpad doc?
[17:37:09] nuria: https://etherpad.wikimedia.org/p/analytics-retrospective
[17:57:45] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1801337 (EBernhardson) We have already run into many annoyances with trying to keep schemas in line across repositories. I'd be...
[18:07:56] Analytics-Kanban: Define a first set of metrics to be worked for wikistats 2.0 {lama} [8 pts] - https://phabricator.wikimedia.org/T112911#1801375 (Milimetric)
[18:11:24] Analytics-Backlog, Analytics-Dashiki, Editing-Analysis, Research-and-Data, VisualEditor: Start generating a visual editor adoption metric - https://phabricator.wikimedia.org/T109158#1801395 (Milimetric) as a gentle bump, we're going to remove the Analytics tags from this until there's more progr...
[18:12:49] Analytics-Backlog: Write pageview API blogpost - https://phabricator.wikimedia.org/T118471#1801401 (Milimetric) p:Triage>High
[18:14:01] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1801406 (Milimetric) @Qgil: could you comment about this and whether or not you feel it's now on track? I continue to think it's a very important discussion we...
[18:14:30] Analytics-Kanban: AQS should expect article names uriencoded just once {slug} - https://phabricator.wikimedia.org/T118403#1801407 (JAllemandou) p:High>Unbreak!
[18:15:29] Analytics-Kanban: AQS should expect article names uriencoded just once {slug} - https://phabricator.wikimedia.org/T118403#1799289 (Nuria) Holler to @gwickie and @mobroac: Does this issue sound familiar?
[18:29:31] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1801454 (ezachte) Dan, I'm still working on loose ends for Monthly Page View Reports. Also, this task was about Traffic Brea...
[18:31:50] Analytics-Backlog: Make AQS return 0 instead of no values {slug} - https://phabricator.wikimedia.org/T118402#1801460 (Milimetric) p:High>Normal
[18:34:26] Analytics-Backlog: Make AQS return 0 instead of no values {slug} - https://phabricator.wikimedia.org/T118402#1801473 (Nuria) Use cases: - a project with pageviews every other day. Do "empty" days return zero? - querying for a non-existing project? - querying for a data range we do not have because we are 2 d...
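On the MobileWikiAppDailyStats/Googlebot point above, a sketch of the kind of user-agent filter Dbrant describes, assembled as a Hive query from Python; the database, table and column names are placeholders, not the real schema.

    # Sketch of the user-agent filtering idea: keep production-app UAs and
    # exclude crawler UAs. Database/table/column names are placeholders.
    import subprocess

    query = """
    SELECT COUNT(DISTINCT uuid) AS daily_users
    FROM some_db.mobile_app_events              -- placeholder table
    WHERE year = 2015 AND month = 11 AND day = 10
      AND user_agent LIKE '%-r-%'               -- production app builds
      AND user_agent NOT LIKE '%Googlebot%'     -- explicit crawler exclusion
    """

    # The Hive CLI runs an inline query string with -e.
    subprocess.check_call(["hive", "-e", query])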
[18:34:45] Analytics-Backlog: Make AQS return 0 instead of no values {slug} - https://phabricator.wikimedia.org/T118402#1801475 (Milimetric) tricky! must find a way to differentiate between true 404 results and true 0 count results
[18:42:41] Analytics-Kanban: Bring wikimetrics staging uptodate - https://phabricator.wikimedia.org/T118484#1801523 (Nuria) NEW
[18:44:20] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Send email out to community notifying of change - https://phabricator.wikimedia.org/T115922#1801531 (Nuria) @JAllemandou millimetric: is this still blocked?
[18:49:35] milimetric: batcave ?
[18:54:30] I was reading this yesterday - https://www.reddit.com/r/IAmA/comments/3sf8xx/im_bill_binney_former_nsa_tech_director_worked/ my head is spinning with all that's being said
[18:56:51] joal: sorry, omw!
[18:59:44] joal: I was trying to launch the last access jobs and it fails with this error:
[18:59:47] https://www.irccloud.com/pastebin/t9XC7TQu/
[19:00:18] which is confusing because the jar exists
[19:29:44] madhuvishy: No idea really :(
[19:30:00] :(
[19:31:27] have you stopped/relaunched the oozie job, or paused/resumed it?
[19:31:33] madhuvishy: --^
[19:31:41] i haven't stopped anything
[19:31:51] this one is writing to a different place
[19:32:00] so i didn't think I had to do that first
[19:32:18] i'm just trying to launch a new job
[19:32:29] joal: ^
[19:32:52] k madhuvishy
[19:33:01] As I said: sounds bizarre
[19:33:06] yeahhh
[19:33:12] as always
[19:33:16] :)
[19:33:26] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Send email out to community notifying of change - https://phabricator.wikimedia.org/T115922#1801644 (ezachte) I'm still working on https://phabricator.wikimedia.org/T114379 (see status report there) Can we postpone this till everything is in place?
[19:35:41] milimetric: wikimetrics staging is up to date right? you always deploy there before prod?
[19:36:17] madhuvishy: not necessarily, people could have experimented there since the last prod deploy
[19:36:29] -staging is now more like -dev and -staging since we killed -dev
[19:36:43] but the repo is up to date, if that helps - master is deployed
[19:36:52] milimetric: aah okay
[19:37:13] i'll check
[20:02:54] madhuvishy: just so that you know: currently 4 coordinators running for last access
[20:03:13] joal: ah, a couple of them are probably failing - i'll kill
[20:03:23] thks madhuvishy :)
[20:06:39] joal: killed them
[20:06:50] awesome thanks :)
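Since the exchange above is about launching and killing Oozie coordinators by hand, a rough sketch of the CLI calls involved, wrapped in Python; the Oozie server URL and properties file are invented for illustration and the refinery's actual conventions may differ.

    # Rough sketch of driving the Oozie CLI from a script.
    # OOZIE_URL and the properties path are illustrative placeholders.
    import subprocess

    OOZIE_URL = "http://oozie-server.example.org:11000/oozie"  # hypothetical

    def launch_coordinator(properties_file, overrides=None):
        """Submit and start a coordinator from a .properties file."""
        cmd = ["oozie", "job", "-oozie", OOZIE_URL, "-config", properties_file, "-run"]
        for key, value in (overrides or {}).items():
            cmd.append("-D{}={}".format(key, value))
        subprocess.check_call(cmd)

    def kill_job(job_id):
        """Kill a running coordinator or workflow by id."""
        subprocess.check_call(["oozie", "job", "-oozie", OOZIE_URL, "-kill", job_id])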
[20:10:10] joal: are you re-loading the top table right now?
[20:10:15] yes I am
[20:10:19] why milimetric ?
[20:10:23] Analytics-Backlog, Language-Engineering: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#1801774 (kevinator)
[20:10:23] Testing something ?
[20:10:32] :) I was looking at the recommend tool
[20:10:36] Arffffff
[20:10:39] and it stopped working all of a sudden, so I was confused
[20:10:39] Sorry :(
[20:10:43] sok, np
[20:10:44] My bad
[20:10:57] I think now that we have official users we'll have to be more careful
[20:11:04] is Ellery around so that I tell him ?
[20:11:09] no, it's ok
[20:11:14] when do you think it'll be done?
[20:11:27] hm, not so soon unfortunately
[20:11:38] ok, I'll talk to Leila
[20:11:47] top is a long computation
[20:11:50] how long does it usually take?
[20:12:00] ~1 hour per day
[20:12:08] and we have all of October to backfill
[20:13:00] is there any way we can do November 12, 11, 10, 09, etc.?
[20:13:04] that way the tool will be back online
[20:13:31] milimetric: can you confirm it needs only yesterday's top ?
[20:13:39] if so, I'll ensure it works asap
[20:14:03] joal: the request was /analytics.wikimedia.org/v1/pageviews/top/ca.wikipedia/all-access/2015/11/10
[20:14:08] but let me look at the code
[20:14:15] sure milimetric
[20:14:24] is it today -2 ?
[20:14:38] yes, confirmed, today - relativedelta(days=2)
[20:14:56] ok, doing the necessary now
[20:15:07] sweet, thank you
[20:16:53] milimetric: jobs launched
[20:17:15] going to dinner, will come back to check after
[20:17:21] thx joal, the tool works with the seed article, since that doesn't use top
[20:17:23] and I let them know
[20:17:27] bon appétit
[20:52:48] nuria, yt?
[20:53:14] I'm going to leave in a couple minutes, do you want to talk about aqs docs?
[20:55:13] mforns: sorry, still trying to fix e-mail
[20:57:07] nuria, np, is there any key point you wanted to express?
[20:57:37] I guess I'll rewrite the docs in a simpler way and have a look at the restbase spec
[21:00:29] mforns: let's work together tomorrow to fix the spec based on your docs
[21:00:50] milimetric, cool
[21:01:19] see you tomorrow a-team!
[21:01:27] good night mforns :)
[21:01:29] nite
[21:01:34] Analytics-Kanban: Bring wikimetrics staging uptodate - https://phabricator.wikimedia.org/T118484#1801958 (madhuvishy) I checked and staging looks uptodate.
[21:01:34] :]
[21:01:48] laters!
[21:02:56] ottomata: Hi! how's the mvp going? I've been out of sync since vacation - anything i can help with?
[21:03:06] bye mforns :)
[21:03:44] review madhuvishy? it's mostly ready. i'm going to walk through it with ori tomorrow i think
[21:04:13] ottomata: sure, i can join your review with ori too, just to stare if nothing else
[21:05:08] where is the code?
[21:07:33] https://gerrit.wikimedia.org/r/#/c/235671/?
[21:09:36] madhuvishy: sorry, got my gmail situation sorted out now
[21:09:49] ah great!
[21:09:58] ja madhuvishy https://gerrit.wikimedia.org/r/#/c/235671
[21:16:28] milimetric: I started poking around with the new API, and I can't get a per-article query to work. is that not enabled yet?
[21:16:39] https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Selfie/daily/2015010101/2015100201
[21:16:50] gimme about 30 min. ragesoss, I'm in a meeting
[21:17:10] thanks much.
[21:22:45] ragesoss: your query seems to work for me
[21:28:21] madhuvishy: odd; it's working for me now as well.
[21:28:34] I tried it multiple times and kept getting 404 previously.
[21:29:28] ragesoss: maybe some tables were being updated and it just finished, i'll let Dan confirm that when he's back
[21:31:53] it's only returning two datapoints now though.
[21:33:00] milimetric: recommender back in the game !
[21:33:16] awesome, thx joal
[21:33:49] I'll send an email to Leila and Ellery to say sorry for the downtime
[21:33:55] madhuvishy, joal, milimetric : need to catch up on e-mail and recruiting after not having e-mail all day but by 3pm madhuvishy i can join you in wikimetrics work
[21:35:18] ah, I guess there's only data since October 1?
[21:35:47] and it gives a 404 if the whole query range is outside of what is available?
[21:36:52] see you tomorrow a-team !
[21:37:09] laters!
[21:37:51] Are there plans to let users query for multiple articles at once?
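The recommend tool pins its request to today - relativedelta(days=2), as confirmed above; a small sketch of building that top-endpoint URL, using the public rest_v1 path rather than the internal one quoted in the conversation.

    # Build the "top" request for two days ago, mirroring the
    # today - relativedelta(days=2) logic quoted above.
    from datetime import date
    from dateutil.relativedelta import relativedelta

    def top_url(project="ca.wikipedia", access="all-access"):
        day = date.today() - relativedelta(days=2)
        return ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
                "{p}/{a}/{y}/{m:02d}/{d:02d}".format(
                    p=project, a=access, y=day.year, m=day.month, d=day.day))

    # e.g. requests.get(top_url()).json() with the requests library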
[21:46:04] ragesoss: back, all yours
[21:46:16] first - very weird you're only getting 2 data points for that... investigating
[21:46:28] we are still loading, but I thought we were good on the per-article daat
[21:46:29] *data
[21:46:31] milimetric: it makes sense if there is only data from October.
[21:46:43] because my query ended October 2.
[21:46:53] and October 1 and 2 were the datapoints.
[21:46:56] oh! I didn't notice. Yes, data starts Oct 1st
[21:47:10] that solved it for me, then.
[21:47:19] so a little current gotcha is that it just doesn't return anything for days where it has no data
[21:47:34] milimetric: the other thing I wanted to ask about is above: plans for handling multiple articles at once?
[21:47:35] so it's up to the client to assume 0 if it's a sensical date or "null" if it's in the future or something
[21:47:43] ragesoss: you're the second user I know of
[21:47:50] and the second person to ask for multiple articles at once :)
[21:48:05] so - filing a phab ticket right now. But we weren't planning on it, no
[21:48:12] my current use case is that I want to get a quick and dirty estimate of average daily views for *lots* of articles, as quickly as possible.
[21:48:20] and it would change the way we did the rest URL
[21:48:26] but that's not horrible
[21:48:46] ragesoss: how many is lots?
[21:49:02] 5 million-ish.
[21:49:02] like entire wikiprojects at a time, is that what you're doing?
[21:49:04] ;)
[21:49:10] oh......k
[21:49:21] so why not just get the project level aggregates at that point?
[21:49:44] I want to find out which articles are highly viewed but not highly developed.
[21:49:59] basically the intersection of pageviews and halfak's work on estimating quality.
[21:50:09] i see
[21:50:20] ^ yeah. That
[21:50:27] I want to prload the data for this tool: https://outreachdashboard.wmflabs.org/article_finder
[21:50:35] I dunno what you are talking about, but what you just said sounds good.
[21:50:37] *preload
[21:50:42] you might want to just query Hive directly for that, the API is not really meant for like large-scale analysis like this
[21:51:12] okay. I'll look into that.
[21:51:28] ragesoss: i forget, you have access to stat1002 to query Hive right?
[21:51:50] milimetric: I think so, although I'm not certain.
[21:53:21] milimetric: I'll probably also try to rewrite my views-fetching system soon to use the api instead of stats.grok.se.
[21:53:22] hm, I suppose we could do something like https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Selfie|||Photography/daily/2015100100/2015111200
[21:53:35] How far back will the data filling go?
[21:53:58] ragesoss: right, that's one of the main use cases - a replacement for stats.grok.se
[21:54:06] we have data since May 2015
[21:54:12] and we'll backfill to that
[21:54:30] past that we have the old data but it doesn't have the same dimensions and is of varied quality, so we're kind of hesitant to add it
[21:54:47] okay. I'll need to investigate the consequences of that for my system.
[21:55:16] Analytics-Backlog: AQS: query multiple articles at the same time - https://phabricator.wikimedia.org/T118508#1802162 (Milimetric)
[21:55:25] I think we wouldn't need to run queries that go back farther than that, at least after the start of 2016.
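A quick-and-dirty sketch of the "average daily views for lots of articles" estimate discussed above, looping over the per-article endpoint one title at a time; the titles, project and date range are placeholders, and as noted in the conversation, truly bulk analysis is better done in Hive.

    # Quick-and-dirty average daily views per title via the per-article endpoint.
    # Spaces become underscores and titles are URL-encoded exactly once
    # (see the uriencoding task mentioned earlier in the log).
    from urllib.parse import quote
    import requests

    BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

    def average_daily_views(titles, project="en.wikipedia",
                            start="2015100100", end="2015111200"):
        averages = {}
        for title in titles:
            encoded = quote(title.replace(" ", "_"), safe="")
            url = "/".join([BASE, project, "all-access", "user",
                            encoded, "daily", start, end])
            resp = requests.get(url)
            items = resp.json().get("items", []) if resp.status_code == 200 else []
            views = [item["views"] for item in items]
            averages[title] = sum(views) / len(views) if views else 0
        return averages

    # average_daily_views(["Selfie", "Photography"])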
[21:55:29] ragesoss: ^^ phab ticket requesting multiple article query
[21:55:44] ok, cool
[21:55:57] I was wondering how many use cases there really are for older data
[21:56:16] and ragesoss: thanks very much for playing with it :)
[21:56:54] milimetric: what dashboard.wikiedu.org does is provide data on how many views an article (worked on by a student in one of our classroom projects) has had since that user worked on it.
[21:57:29] very cool, yeah
[21:57:40] that includes after the course ends (to some extent)
[21:57:41] we would've loved to get hourly granularity to help more with that kind of query, but it's just too much data
[21:57:57] so it might drastically change the numbers for older courses that are still being updated.
[21:58:16] milimetric: I really don't care about hourly granularity for my purposes.
[21:58:50] that's good to know too
[21:58:50] basically, all the data I get from stats.grok.se gets translated into 'how many views has this article gotten since this revision'
[21:59:09] oh so not between revision X and revision X+1
[21:59:13] which, I guess there'd be a tiny improvement in accuracy with hourly.
[21:59:13] just > revision X
[21:59:56] milimetric: yeah. because our system only pulls in revisions made by users in the system. so it isn't aware of revisions by others that happened in between.
[22:00:07] and the point is to highlight the audience for a user's work.
[22:00:28] yeah, makes perfect sense to me, it's never going to be accurate even if you threw a huge amount of computation at it
[22:00:31] on the assumption that it didn't get reverted, then all the views since that revision are views that were affected by that user's contribution.
[22:01:32] yeah. and the numbers have no real reason to be super precise, as long as they give the right sense of how much (or little) audience the contributions have.
[22:01:50] very very cool. Yeah, I'm writing the public announcement for the API now, do you mind if I include a link to http://dashboard.wikiedu.org/ with an explanation of how you plan to use it?
[22:01:57] Analytics-Backlog, Analytics-Cluster, operations: Audit Hadoop worker memory usage. - https://phabricator.wikimedia.org/T118501#1802174 (Ottomata) a:Ottomata
[22:02:05] sure, go for it.
[22:02:09] cool, thx
[22:02:51] one of the things in the cards (the project related to halfak's work) is an 'article finder': a tool for identifying topics that are ripe for improvement because they are underdeveloped but highly viewed.
[22:03:44] I linked a proof-of-concept that pulls in articles based on a category (and optionally, subcategories).
[22:03:57] I think the API would be useful to query one-at-a-time - as you got the score from halfak's tool you could get the pageviews
[22:04:00] But stats.grok.se is far too slow for it to work for large categories.
[22:04:35] but to find in bulk all the articles that hit certain thresholds for both metrics, it might make sense to grab all the data in bulk and analyze it all at once
[22:04:55] yeah. I'll look into that soon.
[22:07:16] ragesoss, how is ORES for your bulk needs?
[22:07:22] hm... though I kind of think this makes a lot of sense as part of the next phase of features for the API. ragesoss maybe comment about what you're doing here: https://phabricator.wikimedia.org/T112956
[22:07:45] halfak: okay so far. it's not the bottleneck, even without threading requests, since it handles 50 at a time.
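Given the dashboard.wikiedu.org pattern described above ("how many views has this article gotten since this revision") and the earlier note that the API simply omits days with no data, a sketch of summing daily views from a given date onward while treating absent days as zero; the timestamp/views field names are assumed from the per-article responses.

    # Sum views for an article from a revision's date onward, treating days the
    # API omits as zero (the client-side convention suggested earlier).
    from datetime import date, timedelta
    import requests

    BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

    def views_since(project, article, since, until=None):
        until = until or date.today() - timedelta(days=2)   # data lags ~2 days
        url = "/".join([BASE, project, "all-access", "user", article, "daily",
                        since.strftime("%Y%m%d00"), until.strftime("%Y%m%d00")])
        resp = requests.get(url)
        if resp.status_code == 404:                # nothing at all in the range
            return 0
        by_day = {item["timestamp"][:8]: item["views"]
                  for item in resp.json().get("items", [])}
        total, day = 0, since
        while day <= until:
            total += by_day.get(day.strftime("%Y%m%d"), 0)   # missing day -> 0
            day += timedelta(days=1)
        return total

    # views_since("en.wikipedia", "Selfie", date(2015, 10, 1))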
[22:08:12] halfak: it makes sense to me to stick revscores, average monthly pageviews, and other key stats into a Druid db
[22:08:20] +1
[22:08:26] halfak: and if I hit it with like 5 or 10 threads (okay since you've got 16 workers or whatever, right?) then I expect I can get even more from it.
[22:08:26] then it would be trivial to say: "get me top 10 most viewed with worst 10 scores"
[22:08:37] We've been experimenting with scoring entire time slices.
[22:08:54] halfak: interesting!
[22:08:55] It's not blazing fast, for sure, but we can score the entire wiki in ~24 hours.
[22:09:05] enwiki that is.
[22:09:20] Updating those scores is much cheaper because most pages don't get edited every day.
[22:09:30] milimetric: the other thing I would find handy is larger aggregates, e.g., 1 month.
[22:09:44] right, we did have a monthly granularity job but we haven't enabled it yet
[22:09:51] that's why "daily" is the only choice for per-article
[22:09:59] but definitely mention that in the task ^^
[22:10:06] will do.
[22:10:15] now is when we're trying to figure out what our next steps should be
[22:11:56] nuria: hmMm! https://github.com/linkedin/camus/pull/249
[22:19:58] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1802250 (Ragesoss) I've started poking at this, feeling out the path for switching from stats.grok.se to this for dashboard.wikiedu.org. There are a couple of...
[22:26:17] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1802286 (JKatzWMF) Thanks, @dbrant. Makes sense!
[22:29:56] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1802298 (JKatzWMF) actually @dbrant, does the 'like' clause you suggested actually block googlebot?
[22:31:07] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1802306 (Dbrant) It should... unless Googlebot's useragent string contains "-r-"...
[22:34:26] milimetric: this url is considered a bug, right? https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
[22:34:50] the fact that it's so nasty?
[22:35:12] milimetric: yeah. and https://www.wikimedia.org/api/rest_v1 and https://www.wikimedia.org/api just give 404s.
[22:35:45] www.wikimedia.org isn't the same as wikimedia.org, oddly and confusingly, but I don't know the details of that
[22:36:27] but as far as the nastiness of the URL, that's auto-generated and we have no real solution except to change the RESTful URL to query parameters
[22:36:31] milimetric: I actually put in the one without www and got redirected to the www 404
[22:36:35] which we'd rather not do for cache / performance reasons
[22:37:00] oh! I see what you mean, ragesoss
[22:37:14] how about a redirect from https://wikimedia.org/api to the docs?
[22:37:24] and likewise for api/rest_v1
[22:37:25] yeah, what's happening is if you leave out the trailing / it fires the 404 which is handled by the host wiki I guess, and that is on the different domain
[22:37:44] that's a good idea. I'll file a ticket and cc the Services folks that took care of setting that up
[22:38:15] long term the idea is to make the same data accessible via en.wikipedia.org/api/rest_v1/ and that wouldn't have the same problem
[22:39:39] it'd still be useful to have /api and api/rest_v1 redirect to docs, on whatever wiki you're on.
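Since monthly aggregates are not exposed yet (daily is the only per-article granularity, as noted above), a sketch of rolling the daily response up into calendar months on the client; it assumes the same items/timestamp/views shape as the other examples in this log.

    # Roll daily per-article items up into monthly totals on the client side.
    from collections import defaultdict

    def monthly_totals(items):
        """items: the 'items' list of a per-article daily response."""
        totals = defaultdict(int)
        for item in items:
            month = item["timestamp"][:6]        # YYYYMM prefix of YYYYMMDDHH
            totals[month] += item["views"]
        return dict(totals)

    # monthly_totals(resp.json()["items"]) -> {"201510": <total>, "201511": <total>}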
[22:39:52] Analytics-Backlog, Services: wikimedia.org/api and wikimedia.org/api/rest_v1 should redirect to the docs - https://phabricator.wikimedia.org/T118519#1802349 (Milimetric) NEW
[22:40:04] especially for anyone used to how api.php works.
[22:40:26] yep, agreed, that makes sense, it's just not in our control, since we're behind the main RESTBase interface
[22:40:42] but I figure those guys are probably going to agree
[22:40:53] milimetric: I think we just forgot about wikimedia.org
[22:40:59] https://en.wikipedia.org/api/
[22:41:20] right, https://en.wikipedia.org/api redirects to that too, that's great
[22:41:31] gwicke: would it be hard to do the same for wikimedia.org?
[22:41:36] no, not at all
[22:41:41] awesome, thx very much
[22:41:45] it's in mediawiki-config, IIRC
[22:42:50] milimetric: I'm trying to start with the wikimetrics changes - i'm a bit confused on where to start
[22:43:16] cool madhuvishy, wanna chat here / batcave?
[22:43:31] we can batcave
[22:44:31] milimetric: i'm there
[22:45:17] mrt
[22:59:05] Analytics-Tech-community-metrics: What is contributors.html for in korma? - https://phabricator.wikimedia.org/T118522#1802427 (Aklapper) NEW
[22:59:29] Analytics-Tech-community-metrics: What is contributors.html for in korma? - https://phabricator.wikimedia.org/T118522#1802434 (Aklapper) p:Triage>Low
[23:07:49] milimetric: https://gerrit.wikimedia.org/r/#/c/252863/
[23:13:17] oh wow. milimetric, I can pull in 500 pages worth of data in about 10 seconds.
[23:13:32] this is soooo much better than stats.grok.se in that respect.
[23:14:10] :)
[23:14:47] Analytics-Backlog, Research-and-Data: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#1802486 (Halfak) New [Token stats] generation complete and sample extracted for analysis. Fun story, Token stats extraction is much faster with reasonable diffs :)
[23:14:54] I'm glad you're happy, but do keep in mind that a dedicated database might be better for certain use cases. This is not a great fit for bulk data
[23:15:49] just logically because it's breaking up that bulk data into tiny chunks, and then combining it again to serve a big request. So it might make sense to skip the middle man
[23:15:57] milimetric: yeah, totes.
[23:16:13] (which we may want to figure out a way to do in a future version of this, that shapes the endpoints differently)
[23:17:07] but this will really cut down the update time for the stuff we're currently doing through stats.grok.se, the less-bulky article-by-article view data stuff.
[23:17:29] now halfak's ORES is the bottleneck, all of a sudden.
[23:17:42] nooo!
[23:17:44] lol
[23:18:27] halfak: is it okay to hit ORES with like 10 threads?
[23:18:36] Yes.
[23:18:42] Hit it with 50 if you need to
[23:18:50] We'll respond with a 503 if we are overloaded.
[23:18:57] If we are regularly overloaded, we'll add hardware.
[23:18:58] 50 threads @ 50 articles per request? cool beans.
[23:22:18] ragesoss: have you tried to use the "top" endpoint to get the top 1000 articles and see what the scores for those are?
[23:22:28] maybe you can find enough suggestions among those, but maybe 1000 is not enough
[23:22:52] milimetric: I have not. but that's definitely not enough.
[23:23:04] k :) well, definitely on enwiki
[23:23:08] especially because we want to create tailored lists for specific categories.
[23:23:11] but maybe for other projects
[23:23:17] ah, right
[23:23:19] then yea
[23:23:44] I mean, ideally we can move to something better than literal categories.
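Tying together the two suggestions above (pulling the top list for a project and hitting the scoring service with up to 50 threads at 50 articles per request), a sketch of fetching a day's top articles and handing them to a scorer in batches; score_batch is a hypothetical stand-in, not a real client call, and the items[0]["articles"] response shape of the top endpoint is assumed from its documentation.

    # Fetch a day's top articles and score them in batches of 50 via a thread pool.
    # score_batch is a hypothetical placeholder for an ORES client call.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    def top_articles(project="en.wikipedia", year=2015, month=11, day=10):
        url = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
               "{}/all-access/{}/{:02d}/{:02d}".format(project, year, month, day))
        data = requests.get(url).json()
        return [a["article"] for a in data["items"][0]["articles"]]

    def score_batch(titles):
        # Placeholder: call the quality-scoring service here, backing off and
        # retrying when it answers 503 (it signals overload that way, see above).
        raise NotImplementedError

    def score_top(project="en.wikipedia"):
        titles = top_articles(project)
        batches = [titles[i:i + 50] for i in range(0, len(titles), 50)]
        with ThreadPoolExecutor(max_workers=10) as pool:
            return list(pool.map(score_batch, batches))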
[23:23:52] hm, interesting use case. It's useful to spell out that basically what you'd need is "top 100 low ORES-score articles per category"
[23:23:56] since those don't really capture 'topic area' well.
[23:24:04] wikiprojects?
[23:24:20] I know nothing of wikis, but I've been hearing more and more work done on wikiprojects
[23:25:02] closer, but ideally it'd be more like based on network analysis and text analysis.
[23:25:41] part of the fundamental problem is that these are topics that the community has not done well in building in the first place.
[23:25:50] so we can expect the metadata about them to be similarly lacking.
[23:26:01] they won't necessarily be tagged with the relevant categories or WikiProjects
[23:26:26] even if the scopes of those categories and wikiprojects were perfect.
[23:26:38] (which they are not, by a long shot)
[23:31:30] milimetric: many popular articles are off from stats.grok.se by a factor of 2 or 3.
[23:31:34] is that known/expected?
[23:32:01] to clarify, rest_v1 is showing 2-3 times the page views.
[23:32:12] this is all-access/user
[23:43:21] gwicke: do we have throttling on rest based apis?
[23:51:14] nuria: yeah, since wednesday or so
[23:51:37] I'm trying to convince bblack to raise the limit for the rest apis at https://phabricator.wikimedia.org/T118365
[23:53:18] nuria: is your question actually about access to the metrics, or the value of the metrics?
[23:53:55] gwicke: I see, we have users & systems accessing the same api, which is an issue
[23:55:27] the rate limiting only applies if you go through varnish
[23:56:07] if you are internal & don't need Varnish caching, then hitting restbase.svc.eqiad.wmnet:7231 directly avoids the limits & Varnish overheads
[23:57:16] nuria: if your concern is about things that should go through varnish (like external consumers), then please add a note on https://phabricator.wikimedia.org/T118365
[23:58:10] gwicke: our throttling is per IP then?
[23:58:24] yeah
[23:58:54] / TBF: "1, 0.02s, 250" == "50/s, burst of 250"
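The TBF parameters on the last line ("1, 0.02s, 250", that is one token every 0.02 s, or 50 requests per second, with a bucket of 250 for bursts) describe a standard token-bucket limiter; a minimal sketch of that arithmetic, not the actual Varnish implementation:

    # Minimal token-bucket sketch matching "1 token per 0.02s, bucket of 250",
    # i.e. a sustained 50 requests/second with bursts of up to 250.
    import time

    class TokenBucket:
        def __init__(self, tokens_per_interval=1, interval=0.02, burst=250):
            self.rate = tokens_per_interval / interval   # 50 tokens per second
            self.capacity = burst
            self.tokens = burst
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            elapsed = now - self.last
            self.last = now
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False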