[00:10:47] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1780250 (kevinator)
[00:18:08] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1780280 (Tbayer) NEW
[00:23:13] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1780300 (Nuria) >Since we have already collected quite a bit of historical data at this point for the aggregated (iOS & Android) metric, we should keep generating it as before, and...
[00:28:45] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1780342 (JKatzWMF) NEW
[00:35:41] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1780378 (Nuria) I believe this is a ticket for reader teamso analytics doesn't need to take an action here.
[00:41:54] Analytics-Backlog, Analytics-Wikimetrics, Community-Wikimetrics, Easy, and 2 others: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1780383 (Nuria)
[00:42:46] Analytics-Backlog, Analytics-Cluster, Easy: Add better detection of wikipediaApp to user agent UDF {hawk} - https://phabricator.wikimedia.org/T96376#1780385 (Nuria)
[00:45:37] Analytics-Backlog, Analytics-Wikimetrics, Community-Wikimetrics, Easy, and 2 others: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1780394 (Nuria) I think this should be an appropiate task for google code in. Please jump on IRC on #wi...
[00:46:30] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1780399 (Tbayer) (to record our in-person discussion about this from some days ago here, and expand a bit on it:) It makes sense to me that a 7-day window might make the information a bit more actionable. (Pe...
[00:46:38] Analytics-Wikimetrics, Easy, Google-Code-In-2015: can't remove users from cohort in Iceweasel (aka Firefox, works fine in Chromium) - https://phabricator.wikimedia.org/T115160#1780402 (Nuria)
[00:47:09] Analytics-Wikimetrics, Easy, Google-Code-In-2015: can't remove users from cohort in Iceweasel (aka Firefox, works fine in Chromium) - https://phabricator.wikimedia.org/T115160#1780406 (Nuria) Please jump in #wikimedia-analytics on irc for help
[00:50:41] Analytics-Engineering, Analytics-Wikimetrics, Community-Wikimetrics, Easy, Google-Code-In-2015: User reads result of validation after creating a cohort - https://phabricator.wikimedia.org/T76914#1780413 (Nuria)
[00:51:00] Analytics-Engineering, Analytics-Wikimetrics, Community-Wikimetrics, Easy, Google-Code-In-2015: User reads result of validation after creating a cohort - https://phabricator.wikimedia.org/T76914#822708 (Nuria) Please jump in #wikimedia-analytics on IRC for help
[00:53:08] Analytics-EventLogging: Update EventLogging documentation {stag} [3 pts] - https://phabricator.wikimedia.org/T112313#1780428 (Nuria) Closing, this is done already.
[00:53:17] Analytics-EventLogging: Update EventLogging documentation {stag} [3 pts] - https://phabricator.wikimedia.org/T112313#1780429 (Nuria) Open>Resolved
[01:04:43] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1780490 (Nuria) @JKatzWMF this requires testing, calculating sessions requires quite a bit of data and until we test we will not know if doing it every 7 days renders meaningful data. My recommendation: get o...
[01:32:57] Quarry: Add 'download in HTML format' option (Quarry) - https://phabricator.wikimedia.org/T117644#1780562 (XXN) NEW
[01:35:17] Quarry: Add 'download as wikitable' option (Quarry) - https://phabricator.wikimedia.org/T117645#1780573 (XXN) NEW
[01:39:09] Quarry: Add chat or forum (Quarry) - https://phabricator.wikimedia.org/T117647#1780591 (XXN) NEW
[01:42:19] Quarry: Add chat or forum (Quarry) - https://phabricator.wikimedia.org/T117647#1780601 (Legoktm) You can use IRC.
[01:43:18] Quarry: Add chat or forum (Quarry) - https://phabricator.wikimedia.org/T117647#1780604 (yuvipanda) There's no Quarry specific IRC channel, and I think there should be a link to whatever it is from Quarry itself. I'll also try to get a Flow board created for this. That seems useful.
[02:14:37] Analytics-EventLogging, operations, Graphite: Statsv down since 2015-09-20 07:53 - https://phabricator.wikimedia.org/T113315#1780649 (ori) Open>Resolved a:ori
[02:15:05] Analytics-EventLogging, operations, Patch-For-Review: Create a package for python-pykafka for ubuntu precise and debian sid - https://phabricator.wikimedia.org/T109567#1780652 (ori) Open>Resolved hafnium is now on jessie.
[02:38:31] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1780692 (Tbayer) >>! In T117615#1780300, @Nuria wrote: >>Since we have already collected quite a bit of historical data at this point for the aggregated (iOS & Android) metric, we s...
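T117645 asks for a 'download as wikitable' option alongside Quarry's existing export formats. Here is a minimal sketch of what that conversion could look like. It assumes the results arrive as a list of column names plus row tuples. `rows_to_wikitable` is a hypothetical helper, not Quarry's actual code.

```python
def rows_to_wikitable(headers, rows):
    """Render column names and result rows as MediaWiki table markup.

    Hypothetical helper for illustration; Quarry's real export code may differ.
    """
    def cell(value):
        # A bare "|" inside a cell would be read as a cell separator,
        # so emit it as the HTML entity, which renders as a literal pipe.
        return str(value).replace("|", "&#124;")

    lines = ['{| class="wikitable"']
    lines.append("! " + " !! ".join(cell(h) for h in headers))
    for row in rows:
        lines.append("|-")  # new table row
        lines.append("| " + " || ".join(cell(v) for v in row))
    lines.append("|}")
    return "\n".join(lines)
```

For example, `rows_to_wikitable(["page", "views"], [("Main_Page", 10)])` produces a header row followed by one data row. The sketch does not handle newlines inside values, which would also break the row layout.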
[05:23:23] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, and 2 others: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1780820 (bd808)
[05:23:26] Analytics-Backlog, Developer-Relations, MediaWiki-API, Reading-Admin, and 6 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1780819 (bd808)
[09:04:27] Quarry: Add 'download in HTML format' option (Quarry) - https://phabricator.wikimedia.org/T117644#1781007 (Aklapper) Please provide a specific usecase example describing what you would like to accomplish.
[09:04:31] Quarry: Add 'download as wikitable' option (Quarry) - https://phabricator.wikimedia.org/T117645#1781008 (Aklapper) Please provide a specific usecase example describing what you would like to accomplish.
[09:39:57] Analytics-Backlog, Datasets-General-or-Unknown, Wikidata, operations: Requests to dumps.wikimedia.org should end up in hadoop wmf.webrequest via kafka! - https://phabricator.wikimedia.org/T116430#1781063 (Addshore)
[09:40:24] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, and 2 others: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1781065 (Addshore)
[10:32:23] Analytics-EventLogging, Editing-Department, Improving access, QuickSurveys, and 5 others: QuickSurveys: Schema changes - https://phabricator.wikimedia.org/T114164#1781208 (phuedx) >>! In T114164#1777856, @leila wrote: > @Jdlrobson: are we capturing > > unhashed IP > userAgent > x_forwarded_for >...
[12:13:59] (CR) Addshore: [C: 2] "merged, although I may not deploy this as it is!" [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/248033 (owner: Christopher Johnson (WMDE))
[12:14:06] (CR) Addshore: [V: 2] adds bulk sparql query and output scripts [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/248033 (owner: Christopher Johnson (WMDE))
[12:58:40] Analytics-Tech-community-metrics, Developer-Relations, DevRel-December-2015: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#1781531 (Aklapper)
[13:09:41] Analytics-Tech-community-metrics, DevRel-November-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#1781538 (Aklapper) > what are their areas of activity @Qgil: How to gather "areas of activity" and how to summarize them? I do not see us listing dozens of code r...
[13:41:08] Analytics-Tech-community-metrics, DevRel-November-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#1781654 (Qgil) These two empty sections at the end of http://korma.wmflabs.org/browser/top-contributors.html can be removed. Areas of activity refers to Git/Gerri...
[14:27:22] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1781745 (mobrovac) >>! In T114443#1777558, @Ottomata wrote: > @gwicke, I think this may be a problem. From my perspective, the goal of this project is a generalized event servic...
[15:33:59] ottomata: thoughts on a graphite instance for the analytics cluster? ;) haha!
[15:34:15] ha, not a bad idea actually
[15:34:42] i don't have a lot of experience with graphite, but it could be cool to use it for realtime type metric graphing
[15:34:56] well, for a little discussion I just had in -operations
[15:35:06] 3:20 PM <_joe_> addshore: whatever we want to keep "forever", should be somewhere else than graphite
[15:37:09] and
[15:37:11] 3:23 PM <_joe_> addshore: my point being that any opsen if given the choice between a 10 minutes downtime of a monitoring tool and dropping old data will choose the latter
[15:37:46] ottomata: IMO it would be great and solve all of my problems ;)
[15:39:35] addshore: http://opentsdb.net/overview.html
[15:39:54] wouldn't be excited about setting up hbase
[15:39:56] but ja know :)
[15:40:16] works with grafana too
[15:40:28] I'll read the description and docs in a bit and see how it looks! (in a meeting right now) ;)
[15:40:33] check out https://influxdb.com/
[15:40:38] been watching it for a while now
[15:45:54] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1781919 (Nuria) > absolutely agree, but that's hypothetical as we don't have platform-specific data for these months since May. Or are you saying that it could be generated retroact...
[15:46:25] chasemp: I like the no external dependencies there :)
[15:46:43] it's been getting some steam and it seems pretty nice and well thought out
[15:47:02] iirc they just went 1.0 and had a lot of flesh put on the bone for resiliency of storage
[15:49:40] also http://blueflood.io/ , bah, too many..
[15:51:54] ottomata: as a random stab in the dark, what timescale would you say you would put on getting something (maybe one of the above) on the analytics cluster?
[15:52:15] ja influxdb sounds really cool
[15:52:48] if it is a simple single graphite instance, we could probably do it pretty quickly
[15:52:58] within a week if it gets prioritized by team
[15:53:15] since we already have puppetization and infrastructure around that
[15:53:16] maybe less
[15:54:01] okay, I mainly ask as I have a deadline for the stuff I am working on (basically of the developer summit) ;) And again wonder if I should try and press forward with graphite or something similar or keep hacking away the way it has been going so far
[15:55:26] addshore: If you have existing graphite stuff, maybe cyanite as graphite backend? ottomata you'll hate me for this one ;)
[15:56:04] *goes to look*
[15:56:48] ahh, well I am yet to build anything around graphite, as I was unsure if everything would work with it, but simply calling statsd etc is the level of simplicity I was going for ;)
[15:57:21] k addshore :)
[15:58:35] which is why I started writing some crappy code which would basically mean I could call something with a metric name, timestamp and value and store it in mysql.. :P
[15:58:42] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, operations, vm-requests, iOS-5-app-production: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1781982 (Joe) Hi, sorry to jump in when a discussion is already underway (which seems my speciality nowada...
[15:58:50] but having an actual solution is a better solution :)
[15:59:09] joal: i ain't got no probs with cassandra :)
[15:59:13] addshore: but your data is very small, right?
[15:59:28] true ottomata, I have :)
[16:00:23] addshore: the idea of adding an instance of graphite to analytics cluster is a good one, we might be able to get to that soon with our current projects but .. would you please file a ticket in that regard?
[16:01:36] Analytics-Kanban: Load Wikimedia JSON data into Altiscale "Research Cluster" HIVE [5 pts] {paon} - https://phabricator.wikimedia.org/T114489#1782001 (JAllemandou)
[16:01:59] nuria: sure! ottomata: indeed, it is very small, but of course it will grow indefinitely :P and get wider as well.
[16:03:05] addshore: maybe graphite for now is fine, since it will give you something soon and be useful for us
[16:03:13] we can do a whole choose a big thing later when we need to
[16:03:14] ja?
[16:03:25] addshore: when you file the ticket, link to others that this is for, cause I ain't got no idea :)
[16:14:11] Analytics-Tech-community-metrics, DevRel-November-2015: "Age of open changesets by Affiliation" has some "NaN" values - https://phabricator.wikimedia.org/T110875#1782079 (Aklapper)
[16:20:37] there's no .gitreview in analytics/refinery; how do you set up gerrit for this project?
[16:20:46] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, operations, vm-requests, iOS-5-app-production: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1782099 (akosiaris) Hello, I am having a hard time grasping what we are talking about here to be honest....
[16:20:54] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1782101 (Nuria) >I don't see these two as being mutually-exclusive. In order to meet the end goal of a generalised event service we are starting with the Services' use case. The...
[16:26:47] ottomata: can you merge this one: https://gerrit.wikimedia.org/r/250989
[16:28:15] sho done
[16:40:32] ottomata: let's talk about cache instantiation
[16:40:39] ottomata: when you have 2 mins
[16:40:56] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1782199 (mobrovac) >>! In T114443#1782101, @Nuria wrote: > I sure hope we are not thinking of having a node rest endpoint and another one based on eventlogging at the same time,...
[16:41:20] nuria: can talk
[16:42:19] nuria: in batcave
[16:43:19] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1782215 (Joe) @mobrovac so let me get this straight, we discussed something that was already overridden by an existing implementation? As far as deploying the python app, who is...
[16:44:12] ottomata: omw
[16:46:06] Analytics-Cluster, Analytics-Kanban: Add automata value in agent_type field of the refined table {hawk} - https://phabricator.wikimedia.org/T95693#1782231 (ggellerman) Nov 4, 2015 update: per @kevinator, no need to re-open this task
[16:47:41] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1782237 (Niedzielski) It's my understanding that _all_ MobileWikiApp* event logging schemas are used _only_ by the Android app. It doesn't make sense to me how Googlebot could be counted. It's alarming i...
[16:49:04] Analytics-Kanban: Understand the Perl code for this report {lama} - https://phabricator.wikimedia.org/T117247#1782245 (JAllemandou) a:JAllemandou
[16:53:56] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1782267 (mobrovac) >>! In T114443#1782215, @Joe wrote: > @mobrovac so let me get this straight, we discussed something that was already overridden by an existing implementation?...
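Earlier, addshore described wanting to "call something with a metric name, timestamp and value". Graphite's carbon daemon accepts data in exactly that shape through its plaintext protocol. Each datapoint is one `<metric.path> <value> <unix_timestamp>` line, sent by default to TCP port 2003. Here is a minimal sketch, assuming a reachable carbon listener. The host and metric names below are placeholders, not real WMF endpoints.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint as a Graphite plaintext line: "<path> <value> <timestamp>"."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    if timestamp is None:
        timestamp = time.time()
    # Carbon expects an integer Unix timestamp and a newline-terminated line.
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host, port=2003, timeout=5.0):
    """Send pre-formatted plaintext lines to a carbon listener over one TCP connection."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

Usage, with placeholder names: `send_to_graphite([graphite_line("wikidata.example.items_total", 42)], "graphite.example.org")`. Batching several lines into one connection keeps this cheap for small daily metrics. statsd, which addshore also mentions, works differently: it aggregates UDP events and flushes the results to Graphite periodically.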
[16:55:22] hi a-team, I will be missing standup again for an infrastructure manager's bi-weekly meeting
[16:55:32] thx kevinator
[16:55:35] ok
[16:59:22] mobrovac: I forgot to ask if you needed help today :S
[16:59:37] heh joal
[16:59:45] yeah i still need to get that list :)
[17:00:17] Ok, obviously I'll talk to you at standup time, but I'll definitely get back to you just after :)
[17:00:22] mobrovac: --^
[17:01:02] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1782299 (chasemp) If we know what we want from bugzilla it should be fairly trivial to update the row for each task in the bugzilla...
[17:01:47] heh
[17:01:55] joal: let's do it tomorrow morning?
[17:02:01] sure mobrovac
[17:02:04] i'm guessing that won't take long?
[17:02:29] except for maybe the hive waiting, the query writing itself should be fast indeed
[17:02:32] mobrovac: --^
[17:03:10] Analytics-Wikistats: Monthly page view stats for wikibooks, wikinews, wikiquote, wikisource, wikiversity for July 2015 are extremely anomalous - https://phabricator.wikimedia.org/T116531#1782326 (Nuria)
[17:03:11] Analytics, Analytics-Kanban, Patch-For-Review: View counts in squid logs, webstatscollector 2.0 and hive are very dissimilar for several projects. [5 pts] - https://phabricator.wikimedia.org/T116609#1782325 (Nuria) Open>Resolved
[17:03:32] kk cool
[17:03:36] thnx joal!
[17:03:45] np mobrovac
[17:05:50] joal: will you have some time tomorrow to help me with oozie?
[17:05:51] ottomata: nuria: will go make that ticket now!
[17:06:10] Hi dcausse: I'll find some for sure !
[17:06:20] great, thanks!
[17:06:34] dcausse: what timezone and time ?
[17:06:47] I'm CET, when you want :)
[17:07:23] early afternoon, like 14:00 CET would be fine :)
[17:07:30] perfect
[17:07:33] can you set up a meeting (so that I don't forget)
[17:07:35] :D
[17:07:39] ok :)
[17:08:48] Analytics-Backlog, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Turn off sqstat udp2log instance - https://phabricator.wikimedia.org/T117727#1782387 (Ottomata) NEW a:Ottomata
[17:09:13] Analytics-Backlog, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Turn off sqstat udp2log instance - https://phabricator.wikimedia.org/T117727#1782397 (Ottomata)
[17:09:15] Analytics-Kanban, operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1782396 (Ottomata)
[17:09:22] Analytics-Kanban, operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#915957 (Ottomata)
[17:09:23] Analytics-Backlog, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Turn off sqstat udp2log instance - https://phabricator.wikimedia.org/T117727#1782387 (Ottomata)
[17:12:14] Analytics-Kanban: Load Wikimedia JSON data into Altiscale "Research Cluster" HIVE [5 pts] {paon} - https://phabricator.wikimedia.org/T114489#1782428 (Nuria) Open>Resolved
[17:13:10] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782440 (Addshore) NEW
[17:13:48] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782449 (Addshore)
[17:15:28] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782440 (Addshore)
[17:16:46] ottomata: nuria ^^ and others :)
[17:17:50] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782488 (Addshore)
[17:17:52] addshore: in a meeting, thanks
[17:25:17] And good news, grafana apparently supports multiple graphite backends, and shiny (which is what we are currently using) also does.
[17:25:46] Analytics-Kanban: Understand the Perl code for "Visiting Country per Wiki" report {lama} - https://phabricator.wikimedia.org/T117247#1782572 (JAllemandou)
[17:26:13] What do these task codes mean? lama? paon? I once saw a corgi?
[17:27:22] https://phabricator.wikimedia.org/search/query/9fiK6pZUl56o/#R
[17:27:44] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite, Patch-For-Review: Enable retention of daily metrics for longer periods of time in Graphite - https://phabricator.wikimedia.org/T117402#1782602 (Nuria)
[17:29:07] harej: we (Analytics) tag our tasks with 4-letter animals to reference project level
[17:29:18] "corgi" is five letters
[17:29:27] not ours harej :)
[17:32:14] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782646 (Christopher) Where is the proof of concept for Graphite exactly? I am not sure why exactly Graphite is assumed to be the best solution. Can...
[17:34:36] Analytics-Kanban: Understand the Perl code for "Visiting Country per Wiki" report {lama} - https://phabricator.wikimedia.org/T117247#1782656 (ezachte) Once I got https://phabricator.wikimedia.org/T114379 done (hopefully tomorrow) I hope to get geo reports [1] back online using new hive feed Would it make se...
[17:37:08] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782678 (Christopher) http://db-engines.com/en/system/Graphite%3BHBase
[17:39:06] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1782700 (faidon) >>! In T114443#1776521, @GWicke wrote: > We have a [simple node service](https://github.com/wikimedia/restevent) that does what we need & integrates with our nod...
[17:40:30] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1782707 (Addshore) Graphite is already puppetized on the cluster, hence creating another instance should be /trivial/. I am not necessarily saying it i...
[17:53:53] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1782782 (ezachte) Status: updates have been tested, see stat1002:/a/dammit.lt/projectviews/projectviews_csv.zip Getting the syn...
[18:07:19] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, operations, vm-requests, iOS-5-app-production: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1782842 (JMinor) Hello I'll try to address the questions of motivation and potential scale. As for maintai...
[18:12:23] halfak: got a minute ?
[18:12:56] Hmm... Not exactly, but if you say things at me, I can read the backlog in 5-10 minutes.
[18:13:14] halfak: can wait :)
[18:33:40] Analytics-Kanban: Understand the Perl code for "Visiting Country per Wiki" report {lama} - https://phabricator.wikimedia.org/T117247#1783017 (ezachte) By all means let's talk. I moved the meeting to Monday (I'm away Fri-Sun). For starters my current thinking is this: reports like breakdown by browser, opera...
[18:36:34] o/ joal
[18:36:42] hey halfak :)
[18:36:52] I am monitoring the current run over enwiki
[18:37:32] map phase passed - Needed to change timeout (don't really understand why though :(
[18:37:37] halfak: --^
[18:37:50] Gotcha. Could be the one super-big file?
[18:37:59] Are you filtering all but mainspace?
[18:38:04] halfak: interesting findings: 2700 mappers, 1% over 1h
[18:38:04] halfak: interesting findings: 2700 mappers, 1% over 1h [18:38:15] 2700 parallel mappers?! [18:38:17] halfak: generating for everything [18:38:24] no 2700 total mappers :) [18:38:29] Great. We can't filter out ns0 for this work [18:38:34] roughtly 300 in parallel when resource free [18:38:58] halfak: not sure what work you are talking about, but I trust you :) [18:39:35] And, finally, 1 mapper taking 14h, while the second longest is 3h30 [18:39:36] The work on the research cluster [18:39:41] k makes sense :) [18:39:48] Wow. [18:39:59] Must be the dump file with WP:ANI in it. [18:40:03] I'll try a ns0 filtered job, just to see [18:40:09] That's the one that gets a bot edit every 3 secs [18:40:31] And even looking further in the power law: 23 mappers over 1h [18:40:48] Anyway :) [18:41:17] Looks like we are refining to a working json-sorted script :) [18:43:20] halfak: just wanted to let you know :) [18:49:32] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1783065 (Nuria) If the user agent is anything other than what you send on the app (and sounds like it is) it is likely some code that is not your app code is accessing your pages. Are there any webviews... [18:50:04] halfak: forgot to tell about something as well [18:50:50] I think you should have a dedicated user to run the scripts (sorted-json generation and parquet-metadata extraction) in order for the data not to be deleted by mistake by one of the users [18:50:54] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1783069 (Dbrant) The explanation seems to be that Google periodically runs our app in an automated fashion, in order to evaluate the similarity of how the content is presented in the app versus mobile we... [18:50:59] halfak: --^ [18:51:54] How will having a dedicated user help? 
[18:52:14] Giving write access on hdfs to that user only (read for others)
[18:52:16] halfak: --^
[18:52:19] Ahh.. I see.
[18:53:07] Can you set up permissions in Hive?
[18:53:13] Or do you do it in HDFS?
[18:53:36] Output files are written by the user launching the job / the hive client
[18:53:50] In our cluster we use the hdfs user
[18:54:00] So I go for sudo -u hdfs hive
[18:54:19] Oozie handles that for us most of the time, but it can be done manually
[18:54:30] Changing ownership / permissions on HDFS can also be done
[18:54:33] halfak: --^
[18:55:16] milimetric: is dashiki ok on IE11?
[18:55:30] milimetric: if not, those will be easy bugs to file for google code in
[18:56:17] joal, gotcha. I'll find out if they already have something set up for permissions.
[18:56:55] halfak: for data used by many people, a bit of protection is usually nice :)
[18:57:19] +1
[18:58:04] ottomata: coming to cassandra meeting
[18:58:24] nuria: I have problems joining, will keep trying
[18:58:25] joal, milimetric : FYI that Erik z. moved the wikistats meeting to Monday
[18:58:31] nuria: I have seen
[18:58:34] k
[18:58:40] thanks nuria :)
[18:59:34] nuria: yes
[19:01:03] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1783135 (mobrovac) >>! In T114443#1782700, @faidon wrote: > So either someone else should make it for you (//soon//) or you'll just use your own thing? No, it doesn't work like t...
[19:24:36] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1783354 (Christopher) If time is the deciding issue, then the question should be how much work is it to puppetize HBase? As indicated [[ http://serv...
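joal's suggestion is a dedicated user that owns the generated data, with everyone else limited to read access. In practice that comes down to HDFS ownership and permission bits. Below is a minimal sketch that builds the corresponding `hdfs dfs -chown` and `-chmod` commands. The `research-jobs` account, the `hadoop` group, and the output path are hypothetical placeholders. HDFS only lets the superuser change ownership, so the sketch accepts an optional `sudo -u hdfs` prefix, mirroring joal's `sudo -u hdfs hive`.

```python
import shlex
import subprocess


def protect_dataset_cmds(path, owner, group, mode="755", as_user=None):
    """Build commands that make `owner` the only writer under `path`, recursively.

    Mode 755 gives the owner rwx and everyone else read/traverse only.
    `path`, `owner` and `group` are placeholders for illustration.
    """
    prefix = ["sudo", "-u", as_user] if as_user else []
    return [
        prefix + ["hdfs", "dfs", "-chown", "-R", f"{owner}:{group}", path],
        prefix + ["hdfs", "dfs", "-chmod", "-R", mode, path],
    ]


def run_cmds(cmds, dry_run=True):
    """Print the commands (the default) or run them, stopping at the first failure."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_cmds(protect_dataset_cmds("/user/research-jobs/sorted-json", "research-jobs", "hadoop", as_user="hdfs"))` only prints the commands for review. Pass `dry_run=False` to actually apply them.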
[19:51:41] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1783530 (Nuria) I am not sure we are worried about scale here, data size is small so many tech solutions would work.
[19:51:55] hi milimetric.
[19:52:27] one question: do we have hourly pageview data per project in some machine-readable format for Discovery?
[19:53:30] leila: i think we have it on the cluster, right? on:
[19:53:44] as hdfs, nuria?
[19:53:56] on hive: projectview_hourly
[19:54:05] it's a table
[19:55:02] milimetric, ottomata : 11/11 is a holiday, let me know if i should change pageview API retro
[19:55:08] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1783568 (Christopher) Puppet recipes for Hadoop, HBase and Zookeeper: https://github.com/hstack/puppet
[19:55:17] nuria: I'll be off (didn't notice, sorry)
[19:55:27] joal: ok, let's reschedule
[19:55:50] super interesting talk milimetric, thanks for having set that up !
[19:56:13] I need to go and have dinner with my wife before she kills me, but will love to debrief :)
[19:57:02] Hey all, I am on ipad
[19:57:11] :D
[19:57:47] ottomata: ok, i get the sighup code now, thank you
[19:58:14] a-team, gone for dinner, back after for a minute (but really not long)
[19:58:18] yup
[19:58:21] See you tomorrow :)
[20:01:56] thanks nuria. this is helpful, Max is happily back to his desk. :-)
[20:02:16] milimetric: enjoy iPad
[20:02:20] ;-)
[20:03:42] Analytics-EventLogging, Continuous-Integration-Config: Set up jsduck test job for EventLogging - https://phabricator.wikimedia.org/T88343#1783607 (Krinkle) Open>Resolved p:Lowest>Low a:Paladox
[20:03:46] Analytics-EventLogging, Continuous-Integration-Config: Set up jsduck test job for EventLogging - https://phabricator.wikimedia.org/T88343#1009758 (Krinkle) Now live at .
[20:04:11] milimetric: are there any IE11 bugs on dashiki?
[20:04:19] milimetric: that .. ahem... we know of
[20:04:36] Not that I've seen
[20:04:46] Nor on Edge
[20:04:51] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1783613 (JanZerebecki) It would be nice to be able to also use our metrics in the same graphs as those we already use on grafana.w.o . I don't know wha...
[20:05:09] But there's one small one in FF
[20:06:36] milimetric: i added that to google code in
[20:06:41] milimetric: the ff one
[20:06:50] Cool, thx
[20:08:00] leila: yes, sorry, the pageview dumps are what we have at hourly resolution. I assume you mean the new definition because the old one has been around for a while
[20:08:15] milimetric: yes, new definition is good. :-)
[20:08:30] K, one sec, I'll link. Hardddd on ipad ;)
[20:09:40] (My laptop network card died)
[20:09:49] Ok, leila: http://dumps.wikimedia.org/other/pageviews/2015/2015-05/
[20:10:31] That's the earliest we have projectviews hourly based on the new definition. If you go up folders from there you'll find the other months
[20:15:22] YES!!! I woke up my busted wifi card :)
[20:15:36] I had to shut down instead of just restart. duh, Dan, duh
[20:17:13] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1783646 (Christopher) Graphana and OpenTSDB work together. http://docs.grafana.org/datasources/opentsdb/ Yes, Graphana is a great front end. But usin...
[20:43:28] Analytics-Kanban, Patch-For-Review: Analytics support for echo dashboard task {frog} [8 pts] - https://phabricator.wikimedia.org/T117220#1783729 (Milimetric) @matthiasmullie: the dashboards look like they're getting data now and the graphs are rendering fine: http://ee-dashboard.wmflabs.org/dashboards/enw...
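leila's question about hourly per-project pageviews got two answers above: the `projectview_hourly` Hive table and the hourly files under dumps.wikimedia.org/other/pageviews/. Below is a minimal sketch of totalling one hourly file per project. It assumes each line has four space-separated fields, `<domain_code> <title> <view_count> <response_size>`, with `-` as the title in project-level files. That column layout is an assumption; check the README published alongside the dumps before relying on it.

```python
from collections import Counter


def views_per_project(lines):
    """Sum view counts per domain code from lines of an hourly projectviews file.

    Assumed line layout: "<domain_code> <title> <view_count> <response_size>".
    Blank or malformed lines are skipped.
    """
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or not parts[2].isdigit():
            continue
        totals[parts[0]] += int(parts[2])
    return totals
```

The function accepts any iterable of text lines, such as an open file. `views_per_project(f).most_common(10)` then lists the ten busiest projects for that hour.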
[20:50:14] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1783752 (Milimetric) :( sorry for the format problems, Erik; I understand you fought through it but you should've just pushed ba...
[20:54:32] Analytics-Kanban: Understand the Perl code for this report {lama} - https://phabricator.wikimedia.org/T117245#1783766 (Nuria) a:Nuria
[20:56:14] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, operations, vm-requests, iOS-5-app-production: Request one server to suport piwik analytics - https://phabricator.wikimedia.org/T116312#1783771 (Milimetric) So assuming 1000 bytes for each event, and like 3 million events per day, that means:...
[21:29:16] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1783854 (ezachte) Hey Dan, no worries. I should have been more clear. This has nothing to do with your upgrade to webstatscollect...
[21:30:05] nuria, milimetric, there's this layout detail in dashiki vital signs that maybe could be a task for google code in?: the chart overflows the white space and adds unnecessary scroll, and also breaks the background...
[21:30:19] at least on my screen
[21:31:24] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1783856 (ezachte) encore My scripts were processing webstatscollector 1.0 output so far. That's why I encountered it only now.
[21:43:40] mforns: excellent! can you file a phab task?
[21:44:17] mforns: if you tag it with "easy" and "google code in" (gci?) it will be added to the ones available for students
[22:11:11] ottomata, https://phabricator.wikimedia.org/T117805 ??
[22:11:19] Hive claims it doesn't have any Maps data on the 31st
[22:11:30] has anything changed?
[22:12:53] hm looking
[22:12:55] not that i know of
[22:12:57] it's just the 31st?
[22:13:13] ottomata, i am about to check for the 2nd
[22:15:16] ottomata, hmm, i just ran it for 2015-11-01 1st hr, and it gave me some data
[22:15:18] looking further
[22:17:25] ottomata, hm, seems to be working, need to figure out if something has died in the Ironholds's dept :)
[22:17:34] oook
[22:38:22] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1784032 (Addshore) @JanZerebecki what was the use case you had in mind? Grafana can support multiple graphite backends. Shiney can also draw data from...
[22:38:39] Analytics-Backlog, WMDE-Analytics-Engineering, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1784037 (Addshore) @JanZerebecki what was the use case you had in mind? Grafana can support multiple graphite backends. Shiney can also draw data from...
[23:26:00] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1784126 (Tbayer) @Niedzielski , as to the question how significant this issue is: These Googlebot entries make up 14% of the entire database, although only 0.5% for yesterday. ``` mysql:research@analyti...
[23:39:59] Quarry: Add 'download in HTML format' option (Quarry) - https://phabricator.wikimedia.org/T117644#1784169 (yuvipanda) I suppose in addition to the CSV, TSV, JSON and JSON-Lines format the reporter would like 'HTML' as a download format as the results as well.
[23:40:36] Quarry: Add 'download as wikitable' option (Quarry) - https://phabricator.wikimedia.org/T117645#1784170 (yuvipanda) I suppose in addition to the CSV, TSV, JSON and JSON-Lines format the reporter would like 'wikitable' as a download format as the results as well.
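On T117631, Nuria and Dbrant point to the user agent as the way to tell the Googlebot rows apart, and Tbayer puts their share at 14% of the whole table. Here is a minimal sketch of excluding such rows before computing daily stats. It assumes each event is a dict with a `userAgent` field, and that a case-insensitive substring match on "googlebot" is good enough. Both are assumptions; the cluster's user agent UDF mentioned in T96376 would be a more robust classifier.

```python
def is_googlebot(event, ua_field="userAgent"):
    """Heuristic: treat any user agent mentioning Googlebot as the crawler."""
    return "googlebot" in str(event.get(ua_field) or "").lower()


def drop_googlebot(events, ua_field="userAgent"):
    """Keep only events that do not look like Googlebot."""
    return [e for e in events if not is_googlebot(e, ua_field)]


def googlebot_share(events, ua_field="userAgent"):
    """Fraction of events attributed to Googlebot (0.0 for an empty list)."""
    if not events:
        return 0.0
    return sum(is_googlebot(e, ua_field) for e in events) / len(events)
```

Computing `googlebot_share` per day, before and after filtering, is one way to reproduce the kind of comparison Tbayer describes (14% overall vs. 0.5% for one day).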
[23:43:12] Analytics-Backlog: Traffic Breakdown Report - Browser Major Minor Version {lama} - https://phabricator.wikimedia.org/T115590#1784174 (Nuria) I think these two are so similar that is worth studing them together: https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2#2_Breakdown_...
[23:43:29] Analytics-Kanban: Traffic Breakdown Report - Browser Major Minor Version {lama} - https://phabricator.wikimedia.org/T115590#1784175 (Nuria)
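Milimetric's sizing comment on T116312 (the piwik server request) is truncated in the log. Its stated assumptions, about 1000 bytes per event and about 3 million events per day, are still enough for a back-of-the-envelope raw-storage figure. A minimal sketch of that arithmetic, ignoring indexes, replication and compression:

```python
def raw_storage_bytes(bytes_per_event, events_per_day, days=1):
    """Raw event storage in bytes, ignoring indexes, replication and compression."""
    return bytes_per_event * events_per_day * days


per_day = raw_storage_bytes(1000, 3_000_000)             # 3,000,000,000 bytes, about 3 GB
per_year = raw_storage_bytes(1000, 3_000_000, days=365)  # 1,095,000,000,000 bytes, about 1.1 TB
```

These are decimal units. In binary units the daily figure is about 2.8 GiB.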