[00:48:28] Analytics / Wikimetrics: Story: VSUser has corrected historical edits/pages data - https://bugzilla.wikimedia.org/72114 (Kevin Leduc) NEW p:Unprio s:enhanc a:None historical data for #edits/#pages_created in Vital Signs was generated using an old/incorrect definition (it didn't include dele...
[00:49:07] Analytics / Wikimetrics: Story: VSUser has corrected historical edits/pages data - https://bugzilla.wikimedia.org/72114 (Kevin Leduc) p:Unprio>Normal
[00:53:07] Analytics / Wikimetrics: Story: VSUser has corrected historical edits/pages data - https://bugzilla.wikimedia.org/72114#c1 (Kevin Leduc) collaborative tasking here: http://etherpad.wikimedia.org/p/analytics-72114
[01:10:39] Analytics / Wikimetrics: Accept more timezones as input - https://bugzilla.wikimedia.org/72116 (James Hare) NEW p:Unprio s:enhanc a:None Currently, Wikimetrics accepts time-and-date input in the UTC timezone only. This means that I need to find the time ranges for each event I run and conve...
[01:18:54] Analytics / Wikimetrics: Give the option of using the same time range for all reports for a given cohort - https://bugzilla.wikimedia.org/72117 (James Hare) NEW p:Unprio s:enhanc a:None The current workflow when generating a new report: (1) Pick Cohorts (2) Pick Metrics (3) Configure Outpu...
[01:31:26] Analytics / Wikimetrics: Scheduling reports for the future - https://bugzilla.wikimedia.org/72119 (James Hare) NEW p:Unprio s:enhanc a:None It would be useful to be able to compute editor retention metrics by scheduling a report in advance. For example, say I have an edit-a-thon on 9/21/201...
[12:28:52] milimetric: you around?
[13:48:09] Analytics / Wikimetrics: Story: VSUser has data from 2008 to now - https://bugzilla.wikimedia.org/72133 (Kevin Leduc) NEW p:Unprio s:enhanc a:None Backfill metrics data all the way back to 2008 for implemented metrics. Assuming labsdb has the necessary data going back that far (some metrics...
[13:49:08] Analytics / Wikimetrics: Story: VSUser has data from 2008 to now - https://bugzilla.wikimedia.org/72133#c1 (Kevin Leduc) p:Unprio>Normal Collaborative tasking here: http://etherpad.wikimedia.org/p/analytics-72133
[13:59:24] Analytics / Wikimetrics: Story: VSUser has bots filtered out of all metrics - https://bugzilla.wikimedia.org/72134 (Kevin Leduc) NEW p:Unprio s:enhanc a:None - Retrofit bot filtering using research's definition on all implemented VS metrics. - delete existing (and invalidated) metric data -...
[14:00:08] Analytics / Wikimetrics: Story: VSUser has bots filtered out of all metrics - https://bugzilla.wikimedia.org/72134#c1 (Kevin Leduc) p:Unprio>Normal Collaborative tasking: http://etherpad.wikimedia.org/p/analytics-72134
[14:01:17] milimetric: cannot join call, says not allowed
[14:04:29] can someone invite me to standup?
[14:06:08] tnegrin: invited
[14:06:15] you gotta ping us or we don't notice :)
[14:07:21] no shit
[14:22:54] Analytics / Dashiki: Epic: VSUser breaks down metric by target site - https://bugzilla.wikimedia.org/72135 (Kevin Leduc) NEW p:Unprio s:normal a:None This epic will cover many stories like: - move necessary data into labs db - implement metrics - backfill data - implement UI according to P...
[14:23:54] Analytics / Dashiki: Epic: VSUser breaks down metric by target site - https://bugzilla.wikimedia.org/72135#c1 (Kevin Leduc) p:Unprio>Normal s:normal>enhanc collaborative tasking: http://etherpad.wikimedia.org/p/analytics-72135
[14:40:58] qchris one question when sampling like TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand())
[14:41:15] who is creating the buckets upon which we are sampling?
[14:41:41] You mean the original, unsampled data?
[14:41:47] Camus is writing them into HDFS
[14:41:53] Oozie is creating the Hive partitions.
[14:42:30] But TABLESAMPLE itself is a pure Hive thing.
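A rough sketch of the sampling being discussed, assuming Hive's documented behavior for TABLESAMPLE(BUCKET x OUT OF y ON expr): each row is assigned to one of y buckets by the ON expression at query time, and only rows in bucket x are returned. With ON rand() the assignment is random per row, so it does not depend on how Camus or Oozie laid out the data. The function name, the row layout, and the row counts below are made up for illustration; this is a Python simulation, not Hive code, and it simplifies how Hive actually computes the bucket.

```python
import random
from collections import Counter


def tablesample_bucket(rows, x, y, rng):
    """Simulate TABLESAMPLE(BUCKET x OUT OF y ON rand()).

    Every row draws a random bucket in 1..y and is kept only if it
    lands in bucket x. (Hypothetical helper, not part of Hive.)
    """
    return [row for row in rows if rng.randint(1, y) == x]


# Hypothetical input: 240,000 mobile rows spread evenly over 24 hourly partitions.
rng = random.Random(42)
rows = [{"hour": i % 24, "webrequest_source": "mobile"} for i in range(240_000)]

sample = tablesample_bucket(rows, x=1, y=1000, rng=rng)

# Roughly 1/1000 of the rows should survive, spread across the hours.
per_hour = Counter(row["hour"] for row in sample)
print(len(sample), "rows sampled across", len(per_hour), "hours")
```

Because the bucket is drawn per row, the sample size is only approximately 1/1000 of the input, and no hour or partition is favored by construction.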
[14:42:52] qchris: so the buckets upon which we are sampling with tablesample
[14:43:07] are not related to our partitions
[14:43:38] I guess I do not understand the question.
[14:44:10] Camus is consuming the data from kafka and is writing data for each hour and each webrequest_source (i.e. mobile, text, bits, upload)
[14:44:16] into a separate directory in HDFS.
[14:44:28] ok, let me rephrase
[14:44:36] Then from time to time, Oozie comes around, and hooks this data up with Hive.
[14:44:43] you said that with this sampling " TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand())"
[14:44:51] qchris: "sampling is fair"
[14:45:33] qchris: how do we know it is fair?
[14:46:09] I tested it.
[14:46:45] I'm having problems connecting to the batcave :(
[14:46:54] qchris: i see, there is nothing we did to make it so
[14:47:32] mforns: dan will reinvite you
[14:47:54] I think it has to do with hangouts
[14:48:02] i just invited you mforns
[14:48:13] my hangouts in gmail isn't working either
[14:48:18] thanks
[14:48:41] nuria_: Right. We did not tune it or anything (at least not that I am aware of).
[14:48:57] qchris: ok, then a select like : "select ua(user_agent) from webrequest TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()) where year=2014 and webrequest_source="mobile";"
[14:49:12] qchris: samples 1/1000 randomly in our whole dataset correct?
[14:49:44] With "whole" being "only the data from the mobile cluster" ... Yes.
[14:50:04] qchris: ok, great, it only took 100 mins to run, which seems really good
[14:52:09] In terms of data ... mobile is the one with fewest requests ... somewhere <10K of the total 120K.
[14:52:17] (per second)
[15:35:07] Analytics / Wikimetrics: Story: WikimetricsUser downloads large CSV - https://bugzilla.wikimedia.org/71255#c4 (Kevin Leduc) Collaborative tasking done on etherpad: http://etherpad.wikimedia.org/p/analytics-71255
[15:40:37] Analytics / Wikimetrics: Story: WikimetricsUser monitors health of Wikimetrics in graphite - https://bugzilla.wikimedia.org/72138#c1 (Kevin Leduc) s:normal>enhanc use graphite to monitor health of the system.
[15:41:07] Analytics / Wikimetrics: Story: WikimetricsUser monitors health of Wikimetrics in graphite - https://bugzilla.wikimedia.org/72138#c2 (Kevin Leduc) Collaborative tasking: http://etherpad.wikimedia.org/p/analytics-72138
[16:22:22] Analytics / Wikimetrics: Story: VSUser has data from 2008 to now - https://bugzilla.wikimedia.org/72133 (Kevin Leduc)
[17:21:53] Analytics / Wikimetrics: Story: VSUser has corrected historical edits/pages data - https://bugzilla.wikimedia.org/72114 (Dan Andreescu)
[17:21:53] Analytics / Wikimetrics: Story: WikimetricsUser downloads large CSV - https://bugzilla.wikimedia.org/71255 (Dan Andreescu)
[17:22:07] Analytics / Wikimetrics: Story: VSUser has bots filtered out of all metrics - https://bugzilla.wikimedia.org/72134 (Dan Andreescu)
[17:22:08] Analytics / Wikimetrics: Story: User creates cohort with CentralAuth insertions - https://bugzilla.wikimedia.org/66843 (Dan Andreescu)
[17:31:08] Analytics / Dashiki: Story: EEVSUser loads dashboard from URI that specifies state / EEVSUser copies URI that recreates current dashboard state - https://bugzilla.wikimedia.org/70887 (Dan Andreescu) PATC>RESO/FIX
[18:59:55] nuria_: isn't it the case that for using the app you have to open an account?
[19:00:43] leila: do you? seems that it would be strange we ask users to sign in prior to be able to read wikipedia
[19:00:52] lemme check
[19:01:02] (if it's not the case, the dependency in your email is serious)
[19:01:09] leila: or you mean "google play/apple store"?
[19:01:25] no, actual wp account
[19:01:55] lelila: not in prior ones, lemme download the latest iOS
[19:02:03] sorry leila
[19:02:33] so, in Android, there is no way for me to sign out, which I think means that you have to have a wp account
[19:02:38] checking again
[19:04:59] you're right! I re-installed it
[19:05:03] you can skip opening an account
[19:05:10] nuria_, ^
[19:05:18] leila: product wise that makes a lot of sense
[19:05:42] yes, agreed, and the dependency you mentioned is serious
[19:06:03] leila: i will re-install ios and let you know but that would be something to talk to mobile about
[19:08:19] yes!
[19:10:09] (CR) Nuria: "Tested whether we could actually sample all mobile data we keep with 1/1000 rate with the default cache size we have (~1M) in ua parser an" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[19:12:26] leila: you can use the app w/o account just fine on iOS (which makes total sense)
[19:17:11] good. thanks for checking. same with Android.
[19:18:13] milimetric: hi! can you explain to me what the centralauth user expansion is expected to do?
[19:18:28] sure
[19:18:36] so currently someone can put in a cohort like this:
[19:18:43] name1,en
[19:18:46] name2,de
[19:18:47] aha
[19:18:50] name1,de
[19:19:09] or, to just select a default project (say de)
[19:19:13] and the cohort would be:
[19:19:14] name1
[19:19:15] name2
[19:19:37] so the checkbox would make it so you can put in:
[19:19:38] name
[19:19:40] sorry
[19:19:41] name1
[19:19:43] name2
[19:19:46] check the box
[19:19:50] and you would end up with basically
[19:19:52] name1,en
[19:19:55] name1,de
[19:19:57] name2,en
[19:19:58] name2,de
[19:20:07] ok
[19:20:09] so,
[19:20:11] (assuming here that {en, de} is the set of all wikis)
[19:20:21] aha
[19:20:45] so, if I do not set any default project, all users must specify the project in the cohort
[19:20:58] the default project is required i think
[19:21:13] but if the checkbox is checked, i guess it would override that requirement
[19:21:25] the interface to upload cohorts needs a lot of love
[19:21:29] right now it's kind of crazy
[19:21:43] we were at some point working with design on this - kevinator might have some updates but I don't see him online
[19:21:47] what happens if I type a default project say 'enwiki'
[19:21:59] and then
[19:22:08] upload the following cohort
[19:22:15] then any lines in the uploaded file or pasted lines that do *not* have a project get 'enwiki' as the default project
[19:22:16] name1,de
[19:22:24] name2,fr
[19:22:30] name3,ca
[19:22:37] those would look in the wikis specified on each line
[19:22:44] so the lines override the "default"
[19:22:51] but if it was:
[19:22:54] name1,de
[19:22:55] name2
[19:22:57] name3,ca
[19:23:04] then name2 would be looked up in "enwiki"
[19:23:10] ok
[19:24:11] this is why the upload interface needs work. before we started adding features to it, it was somewhat intuitive
[19:24:14] now it's illegible
[19:25:04] ok, and what is the relation between the centralauth users and local project users?
[19:25:11] they have the same name? and id?
[19:26:19] they should have the same name, but ids aren't in those tables
[19:26:31] hang on, let me find the email thread where teresa asked this question
[19:26:37] it's somewhat weird from what i remember
[19:26:42] ok
[19:31:52] :( mforns unfortunately I couldn't find the thread
[19:32:00] I think this is the best I can offer: http://www.mediawiki.org/wiki/Extension:CentralAuth
[19:32:11] that's the extension that provides this "globalization" of usernames
[19:32:37] You can read that and then we can talk if there's no clear path forward
[19:38:37] that's perfect, thanks!
[19:38:49] milimetric: ^
[19:38:58] np
[20:34:22] (CR) Nuria: "Main take i think is that we should take advantage of GenericUDF in order to initialize the Geocoding db." (8 comments) [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 (owner: Ottomata)
[20:41:11] (CR) Ottomata: "My intention is to keep the Geocoding db logic out of the UDF logic altogether. We should be able to use whatever logic we need with any " (1 comment) [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 (owner: Ottomata)
[20:42:36] (PS1) Milimetric: [WIP] Filter out bots from relevant metrics [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/167064 (https://bugzilla.wikimedia.org/72134)
[21:01:52] (CR) Ottomata: Add IpUtil, Geocode, and GeocodeCountryUDF classes (3 comments) [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 (owner: Ottomata)
[21:16:12] (CR) Nuria: ">My intention is to keep the Geocoding db logic out of the UDF logic >altogether. We should be able to use whatever logic we need with an" [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 (owner: Ottomata)
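The cohort upload rules described in the 19:18 to 19:23 exchange could be sketched roughly as follows. This is not the Wikimetrics implementation: the function name resolve_cohort and its parameters are hypothetical, it assumes usernames contain no commas, and it assumes that with the checkbox checked any project written on a line is ignored (the conversation does not say what happens in that case).

```python
def resolve_cohort(lines, default_project=None, expand_to=None):
    """Turn raw cohort lines into (name, project) pairs.

    lines           -- strings of the form "name" or "name,project"
    default_project -- used for lines that do not name a project
    expand_to       -- list of all wikis; when given (checkbox checked),
                       every distinct name is paired with every wiki
    (Hypothetical helper for illustration only.)
    """
    if expand_to is not None:
        names = []
        for line in lines:
            name = line.split(",")[0].strip()
            if name and name not in names:
                names.append(name)
        return [(name, project) for name in names for project in expand_to]

    resolved = []
    for line in lines:
        parts = [part.strip() for part in line.split(",")]
        name = parts[0]
        if not name:
            continue  # skip blank lines
        # A project on the line overrides the default project.
        project = parts[1] if len(parts) > 1 and parts[1] else default_project
        if project is None:
            raise ValueError("no project for %r and no default project set" % name)
        resolved.append((name, project))
    return resolved


# name2 has no project, so it falls back to the default.
print(resolve_cohort(["name1,de", "name2", "name3,ca"], default_project="enwiki"))

# Checkbox checked, assuming {en, de} is the set of all wikis.
print(resolve_cohort(["name1", "name2"], expand_to=["en", "de"]))
```

Keeping the expansion as a separate branch mirrors the remark that checking the box would override the default-project requirement.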