[03:12:11] Analytics, Phabricator: Get rid of Pageview-API -> Analytics auto-tagging - https://phabricator.wikimedia.org/T146042#2889020 (Aklapper) (For future reference: If nobody adds a #Phabricator tag to issues with [[ https://www.mediawiki.org/wiki/Phabricator/Help/Herald_Rules | Phabricator Herald ]] (=auto-t...
[08:11:02] * elukey checks the oozie emails..
[08:13:21] ah snap text and upload are blocked
[08:14:12] or maybe just lagging a lot
[08:15:53] ah yes
[08:15:56] goooood
[09:16:25] Analytics-Kanban, Operations, Ops-Access-Requests, Patch-For-Review, User-Elukey: Requesting access to Analytics production shell for Francisco Dans - https://phabricator.wikimedia.org/T153303#2875954 (elukey) Open>Resolved
[09:21:44] (CR) DCausse: [C: 1] "lgtm, as the only one who never touched a line of code in this patch let me know if I should +2?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[09:28:15] Analytics-Cluster, Operations: stat1004 - sync snakebite version with repo - https://phabricator.wikimedia.org/T153493#2889361 (elukey) snakebite changelog on stat1004 ``` snakebite (2.11.0) unstable; urgency=low Support dfs.client.use.datanode.hostname config property -- Wouter de Bie
Analytics-Cluster, Operations: stat1004 - sync snakebite version with repo - https://phabricator.wikimedia.org/T153493#2889365 (elukey) Ah there you go: https://phabricator.wikimedia.org/T152771 :)
[09:29:45] elukey: checked as well, seems indeed just some big lag
[09:30:05] elukey: o/, by the way ;)
[09:30:19] Analytics-Cluster, Operations: stat1004 - sync snakebite version with repo - https://phabricator.wikimedia.org/T153493#2889371 (elukey)
[09:31:51] joal: o/
[10:18:04] Analytics-Kanban, MediaWiki-Vagrant: Enable 'analytics_cluster' role on Labs instance - https://phabricator.wikimedia.org/T151861#2889440 (mschwarzer) Thanks for setting it up. Yes, we would like to test the whole integration of our Citolytics project ( T143197 ) into the Oozie work flow, i.e. processi...
[10:42:16] Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2889475 (JAllemandou) {meme, src=votecat}
[10:43:31] Analytics, Analytics-Cluster: Refactor webrequest_source partitions and oozie jobs - https://phabricator.wikimedia.org/T116387#2889477 (elukey)
[10:43:33] Analytics-Cluster: Make varnishkafka produce using dynamic topics - https://phabricator.wikimedia.org/T108379#2889476 (elukey) Open>declined
[11:25:45] * elukey lunch!
[12:44:08] gone for some time a-team
[13:03:29] Analytics-Kanban, Patch-For-Review, User-Elukey: Puppetize clickhouse - https://phabricator.wikimedia.org/T150343#2889727 (elukey) It has been decided within the team to wait for upstream changes before proceeding further.
[13:26:43] Hi everyone. Is somebody using the Vagrant analytics/eventlogging role with Wikipedia schemas? https://phabricator.wikimedia.org/T153641
[13:50:01] hey team :]
[15:20:35] Analytics-Tech-community-metrics, Developer-Relations: Measuring Time To First Code Change (TTFCC) - https://phabricator.wikimedia.org/T137201#2890024 (Qgil) @Peter, if you are participating in the Summit you can still propose an Unconference session there. In any case, maybe is it worth discussing this...
[15:37:00] Analytics-Kanban, Pageviews-API: Pageview API: Better filtering of bot traffic on top enpoints - https://phabricator.wikimedia.org/T123442#2890066 (Milimetric)
[15:37:30] Analytics, Pageviews-API: Pageview API: Better filtering of bot traffic on top enpoints - https://phabricator.wikimedia.org/T123442#2890069 (Milimetric)
[15:52:10] Analytics-Wikistats, Editing-Analysis: Update active editor metrics to use consensus definition - https://phabricator.wikimedia.org/T153702#2890123 (Milimetric) @Nuria "active editors" does show up on the Research:Standard_metrics page but not in the way meant here. The "active editors" there are all in...
[15:57:40] Analytics-Kanban, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2890139 (mforns) Here are some observations and conclusions: * The problem doesn't just affect Mobile Safari, **a lot of browsers and operating systems are affected...
[16:00:38] a-team: standddupppp
[16:00:52] cominggg
[16:09:46] Analytics-Kanban, Analytics-Wikistats: Visual Language for http://stats.wikimedia.org replacement - https://phabricator.wikimedia.org/T152033#2890181 (Nuria)
[16:09:47] Analytics-Kanban, Analytics-Wikistats: Navigation model concept development - https://phabricator.wikimedia.org/T152438#2890180 (Nuria) Open>Resolved
[16:10:01] Analytics-Kanban: event_user_is_anonymous is never true - https://phabricator.wikimedia.org/T153492#2890182 (Nuria) Open>Resolved
[16:10:16] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Remove dependency on available-projects.json file hosted in labs - https://phabricator.wikimedia.org/T136120#2890183 (Nuria) Open>Resolved
[16:10:29] Analytics-Kanban, GitHub-Mirrors, Documentation, Easy, Patch-For-Review: Mark documentation about limn as deprecated - https://phabricator.wikimedia.org/T148058#2890184 (Nuria) Open>Resolved
[16:12:15] Analytics-Kanban, Patch-For-Review: Stop and remove legacy TSV generation jobs - https://phabricator.wikimedia.org/T153082#2890188 (Nuria) Open>Resolved
[16:12:23] Analytics-Kanban: Attach error file to webrequest load email - https://phabricator.wikimedia.org/T153212#2890189 (Nuria) Open>Resolved
[16:12:25] Analytics-Kanban, Patch-For-Review: Change requests drop alarms to be more precise regarding data loss - https://phabricator.wikimedia.org/T148980#2890190 (Nuria)
[16:50:30] Analytics, Phabricator: Get rid of Pageview-API -> Analytics auto-tagging - https://phabricator.wikimedia.org/T146042#2890281 (Milimetric) Open>Resolved resolved by T153723
[16:56:45] fdans: you wanted to chat
[16:57:13] milimetric in about ~1 hour?
[16:57:42] we have staff meeting and then I have another meeting and then lunch
[16:58:17] oh the staff meeting isn't in my calendar
[16:58:29] I'll send you an email with what you need, best guess
[16:58:32] I'll add you to staff
[16:58:59] (done)
[16:59:21] thank you!
[17:00:01] milimetric: do we want to put these on vital signs dashboard so they are more findable? https://analytics.wikimedia.org/dashboards/standard-metrics/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics=(Beta)%20Monthly%20New%20Editors
[17:00:12] nuria: I don't think so
[17:00:16] they're not updated
[17:00:20] (automatically)
[17:00:26] and they're not yet vetted by research
[17:00:38] these aren't Erik's metrics, they're the standard metrics
[17:00:56] milimetric: right, those were the ones we had on vital signs before right?
[17:01:00] so I think the next step is a) we can show this off and be proud, b) we can follow up with research
[17:01:39] we had some of them on vital signs, yes, but based on the standard definitions in Research:Standard_metrics and these are interpreted from those and still different from them enough that we should vet them
[17:01:50] (we did the same initially for vital signs)
[17:02:07] nuria: we can talk to dario about this, you ok with the meeting I set up?
[17:02:16] milimetric: yes,
[17:02:28] ok, yeah, let's mention it there then
[17:04:30] nuria: can you please review this real quick? https://phabricator.wikimedia.org/T153763
[17:04:42] milimetric: on meeting, will look after
[17:04:47] it's a request to add herald rules for all our projects
[17:04:59] a-team: in general, let me know if you're not ok with these auto-tagging rules: https://phabricator.wikimedia.org/T153763
[17:05:22] Hi, I'm not sure if this is the right place, but how would I request my own scratch database on analytics-store.eqiad.wmnet?
[17:06:32] awight: the staging database there is meant for scratch work, the research user we all have has write access there, do you need more than tables, do you need your own db?
[17:07:03] milimetric: Thanks! That should be perfect--I just came across the documentation here, as well: https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_slaves
[17:07:08] ok, cool, then just "use staging;" and commence having fun
[17:07:19] ;)
[17:30:19] team, we found something about the referrer problem:
[17:30:20] https://bugs.webkit.org/show_bug.cgi?id=154588
[17:30:25] https://lists.w3.org/Archives/Public/public-webappsec/2015May/0064.html
[17:31:00] https://moz.com/blog/meta-referrer-tag
[17:31:14] Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2890389 (Nuria) Let's please talk about putting these metrics in a more "findable" location
[17:32:21] it looks like Feb 23 we (WMF) changed the referrer meta tag to correct a legacy typo, from origin-when-crossorigin to origin-when-cross-origin (add dash)
[17:33:04] older browsers seem not to support the new corrected version
[17:33:39] :O
[17:33:58] and are sending no referrer with internal links
[17:36:14] good finding mforns!
[17:36:23] elukey, we're confirming it... :]
[17:47:37] Analytics-Kanban: decommission outdated instances on labs - https://phabricator.wikimedia.org/T153193#2872345 (Milimetric) p:Triage>Normal
[17:47:53] Analytics-Kanban: Clean up datasets.wikimedia.org - https://phabricator.wikimedia.org/T125854#2890449 (Milimetric) a:Milimetric
[17:49:08] wow
[17:49:13] nice marcel
[17:50:24] milimetric, do you know how 'they' configure meta tags in mediawiki, because I could not find any mediawiki deployment
[17:50:48] if it's not with core my guess is some type of extension
[17:50:52] let's see
[17:51:42] searching https://github.com/wikimedia/mediawiki-extensions for "meta" gives a couple results
[17:51:52] aha
[17:53:28] I searched for origin-when-cross-origin in the three I found:
[17:53:31] milimetric, joal found it, it's mediawiki-config repo
[17:53:39] oh right, that makes sense
[17:54:07] mforns niiiice :D
[17:54:13] (none of the extensions had origin-when-cross-origin in it)
[17:54:37] Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2890480 (Milimetric) I'm all for it, but they should be vetted first and then we have a little name collision problem with Dashiki:CategorizedMetrics to solve.
[17:55:04] Hi! How could I install plotly to use with ipython?
[17:55:41] AndyRussG: you mean on the stat boxes?
[17:55:53] milimetric: yeah stat1002
[17:56:39] pip install plotly apparently doesn't get a network connection or something
[17:56:40] AndyRussG: file a request, it has to be installed via a debian package
[17:57:10] milimetric: K gotcha :
[17:57:12] :)
[17:57:14] thanks
[17:57:16] !
[17:57:17] np
[18:01:27] (PS2) Joal: Add oozie job loading MW history in druid [analytics/refinery] - https://gerrit.wikimedia.org/r/328154 (https://phabricator.wikimedia.org/T141473)
[18:26:45] urandom: let me know how you are doing with access and getting the dataset
[18:41:39] (PS3) Joal: Add oozie job loading MW history in druid [analytics/refinery] - https://gerrit.wikimedia.org/r/328154 (https://phabricator.wikimedia.org/T141473)
[18:42:11] (CR) Nuria: [C: 2] Lucene Stemmer UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[18:50:00] ebernhardson: merged udf but we will probably not deploy this until after xmas (let us know if you need it sooner)
[18:50:49] Analytics-Kanban, Patch-For-Review: Productionize edit history extraction for all wikis using Sqoop - https://phabricator.wikimedia.org/T141476#2890622 (Nuria)
[18:50:55] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#2890624 (Nuria)
[18:50:57] Analytics-Kanban, Patch-For-Review: Productionize edit history extraction for all wikis using Sqoop - https://phabricator.wikimedia.org/T141476#2499954 (Nuria) Open>Resolved
[18:51:08] Analytics-Kanban, Patch-For-Review: Create SLA alarms for pageview_hourly jobs - https://phabricator.wikimedia.org/T152109#2890625 (Nuria) Open>Resolved
[19:40:13] see you tomorrow a-team!
[19:40:31] bye fdans, see you!
[19:46:55] nuria: i think we're still in the waiting period: https://gerrit.wikimedia.org/r/#/c/328181
[19:47:06] nuria: merge date tomorrow
[20:03:44] urandom: is waiting period ok?
[20:04:06] nuria: yeah, i've got other things i can work on
[20:05:47] mforns: are we sure enough about our meta tag theory to tell reading what the problem might be?
[20:07:10] nuria, I'm looking right now into confirming the hypothesis by checking which major versions of each browser fail in passing the referrer
[20:07:19] it looks like it's true
[20:07:23] mforns: k
[20:07:57] for example, in the case of Chrome Mobile, the major version 34 is the only one affected
[20:08:37] curiously enough versions 28 and 30 are not affected...
[20:10:32] mforns: ok, will do a brief update on ticket so as to ping brandon? boy, it is strange that only 1 version would be affected...
[20:10:56] nuria, I'm writing the update, don't worry
[20:11:00] k
[20:11:13] yes, will look a bit more into it
[20:15:18] Analytics-Kanban, CirrusSearch, Discovery, Discovery-Search, Patch-For-Review: Evaluate using SERP click throughs to build a search feedback loop - https://phabricator.wikimedia.org/T148811#2890937 (Nuria)
[20:17:00] I think it's because: 1) Versions from 2013 or older do not implement the referrer meta tag, so they ignore it and pass the referrer normally, 2) Versions from 2014 or newer implement the referrer meta tag with the typo, so they do not recognize the correct version (no typo) from mediawiki code and return no referrer, 3) Newest versions implement the referrer meta tag with the correct (no typo) version, so the referrer is passed normally.
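[Editor's note, not part of the log: the three-group hypothesis in the 20:17:00 message can be sketched as a small Python model. The two policy tokens are the ones quoted earlier in the discussion; the enum, the function name, and the rule that an unrecognized token results in no referrer are hypothetical and only restate the hypothesis. Whether the newest browsers also still accept the legacy token is not stated in the log, so the sketch leaves it out.]

```python
from enum import Enum

# Policy tokens quoted in the discussion: the legacy spelling (typo)
# and the corrected spelling that replaced it.
LEGACY_TOKEN = "origin-when-crossorigin"
CORRECTED_TOKEN = "origin-when-cross-origin"


class MetaReferrerSupport(Enum):
    """Hypothetical labels for the three browser groups in the hypothesis."""
    NONE = "does not implement the referrer meta tag"
    LEGACY_ONLY = "implements the tag, knows only the legacy token"
    CORRECTED = "implements the tag, knows the corrected token"


# Tokens each group is assumed to recognize (an assumption, not measured data).
RECOGNIZED_TOKENS = {
    MetaReferrerSupport.NONE: frozenset(),
    MetaReferrerSupport.LEGACY_ONLY: frozenset({LEGACY_TOKEN}),
    MetaReferrerSupport.CORRECTED: frozenset({CORRECTED_TOKEN}),
}


def sends_internal_referrer(support: MetaReferrerSupport, page_token: str) -> bool:
    """Return True if, under the hypothesis, a referrer is sent on internal links."""
    if support is MetaReferrerSupport.NONE:
        # The tag is ignored entirely, so the default behavior applies.
        return True
    # Browsers that implement the tag send a referrer only for tokens they know.
    return page_token in RECOGNIZED_TOKENS[support]


if __name__ == "__main__":
    for group in MetaReferrerSupport:
        print(group.name, sends_internal_referrer(group, CORRECTED_TOKEN))
```

[Under these assumptions only the middle group loses the referrer once the page serves the corrected token, which would match the observation that a single major version of Chrome Mobile is affected.]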
[20:18:25] mforns: nice piece of knowledge, there is also a well known typo on the refferer spec on http so one more to add to that mix
[20:19:30] mforns: doesn't sound like any action is needed on our part or traffic's though, just documenting ticket here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly#Changes_and_known_problems_since_2015-06-16
[20:19:43] nuria, aha
[20:33:47] Analytics: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#2890969 (Nuria) >I think we should restrict ourselves (for starters) to the *.wikipedia.org domain Agreed
[20:35:59] Analytics: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#2890972 (BBlack) Yes, we can do Q3. The actual work on our end is fairly minimal, just need to pencil it in and remember to get it done!
[20:39:54] milimetric: i have not connected to EL in foreverr
[20:40:11] milimetric: this no longer works?
[20:40:13] mysql --defaults-file=/etc/mysql/conf.d/analytics-research-client.cnf
[20:40:21] cc mforns
[20:40:37] nuria, looking
[20:41:24] nuria, are you in staging or production?
[20:41:51] i was connecting through 1002
[20:42:06] mforns: to log db which is a copy of EL database but not production
[20:42:31] oh
[20:43:01] nuria, I do: mysql --defaults-extra-file="/a/.my.cnf.research" -h analytics-store.eqiad.wmnet
[20:43:44] although usually I use stat1003
[20:44:22] mforns: ah, ok, it works , thank you
[20:44:23] is that what you mean? I use the same replica for mediawiki dbs and log db
[20:44:46] mforns: ya, before i did not need to specify host but it is needed now, thanks !
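[Editor's note, not part of the log: a minimal Python sketch of the working invocation from the 20:43:01 message, with the host passed explicitly. The function name is hypothetical; the defaults file path and host are the ones quoted in the log and may only be valid on the stat machines discussed here.]

```python
import shlex
from typing import List


def research_mysql_argv(
    defaults_file: str = "/a/.my.cnf.research",
    host: str = "analytics-store.eqiad.wmnet",
) -> List[str]:
    """Build the argument list for the mysql client with an explicit host.

    --defaults-extra-file and -h are standard mysql client options; the
    default values are copied from the chat message and are not verified.
    """
    return ["mysql", f"--defaults-extra-file={defaults_file}", "-h", host]


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(research_mysql_argv()))
```

[The list could be handed to subprocess.run on a host that has the client and the credentials file; the sketch only prints it so that it stays runnable anywhere.]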
[20:44:57] np
[20:46:58] Analytics, Pageviews-API, Reading-analysis: Skewed pageviews for Azerbaijani and Bulgarian Wikipedias, September, October and November 2016 - https://phabricator.wikimedia.org/T153699#2887617 (Nuria) Amire80: from report it is likely a bot looking for an exploit
[20:54:25] Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#2890996 (Nuria) Running this again number of so-called duplicates is small. Results of blog schema: 201508 108015 12.14 201509 132388 10.76 201510 87961 13.84 201511 71734 14.96 201512 106813 14.35 201601...
[21:17:39] Analytics-Kanban, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2891117 (mforns) We found a potential cause of the issue: copying @BBlack On Feb 22, 2016 the WMF enabled the html referrer meta tag via this change[1] in wmf-confi...
[21:47:35] Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#2891207 (Nuria) I think there are several issues here, the main one is likely a browser issue, almost all duplicated events on pop up schema are coming from FF (which unlike chrome doesn't cancel outgoing...
[21:50:11] Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#2891209 (Nuria) I really do not see any action items here for analytics. The issues afflicting schemas are of different nature and can be tracked back to the client for these two examples.
[22:54:02] (PS3) Bearloga: [WIP] POC of loading tile data into pivot [analytics/refinery] - https://gerrit.wikimedia.org/r/327845 (https://phabricator.wikimedia.org/T151832) (owner: Nuria)
[22:59:03] Analytics-Kanban, Discovery, Discovery-Analysis (Current work), Interactive-Sprint, Patch-For-Review: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2891483 (mpopov) >>! In T151832#2882399, @Nuria wrote: > Are the columns on this table the dimen...
[23:12:00] Analytics-Kanban, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2891520 (JKatzWMF) Thank you for solving this mystery, @mforns. Is there anything we can do on our end to remedy this?