[06:18:48] (PS2) QChris: Drop unneeded auxpaths for hive snippets [analytics/refinery] - https://gerrit.wikimedia.org/r/144910 [06:19:03] (PS2) QChris: Drop unneeded partition dropping part of oozie import [analytics/refinery] - https://gerrit.wikimedia.org/r/144909 (https://bugzilla.wikimedia.org/67128) [06:36:02] Analytics / Wikimetrics: Story: Community has documentation on chosen dashboard architecture and alternatives - https://bugzilla.wikimedia.org/67125 (Kevin Leduc) [09:56:58] (CR) Nuria: Add autcomplete to tags (2 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl) [10:19:51] hi nuria [10:20:00] to gergo's question [10:20:16] the data he'd be querying is private, so wikimetrics can't get to it [10:20:36] ahhhhh [10:20:50] the other problem is the metric he'd need would be like an ad-hoc metric because we shouldn't add a lot of custom single purpose metrics [10:20:51] tgr: mi master plan no work [10:21:10] I was thinking about an ad-hoc metric but the private data is a bit of a blocker for now [10:21:16] milimetric: but it could be added like a "preferences" metric [10:21:19] tgr: can the opt-out data be made public? I think not right? [10:21:39] mmm, "preferences" is like a million things though :) [10:21:50] not right now [10:21:50] check out user_properties, it stores the kitchen sink [10:21:57] ok tgr, cool [10:22:10] ya, i did check it out [10:22:15] there is an ongoing discussion to sanitize and publish it, but there are unsolved issues [10:22:40] that is why it would be generic : "query that table daily for any given preference id" [10:22:40] so then, what we need is to get sean to give you access to create a table on analytics-store [10:22:53] https://bugzilla.wikimedia.org/show_bug.cgi?id=58196 [10:23:07] right, but if I understand correctly, tgr, this is not in user_properties? [10:23:11] I think that would be the simplest, yes [10:23:31] opt-out is probably in a custom extension table? 
[10:23:48] no, it's a user preference [10:23:52] oh?! [10:23:55] hm.... [10:24:03] milimetric: do you have access to : analytics-store.eqiad.wmnet? i do not [10:24:22] yeah, i think everyone that has the research .my.cnf does [10:24:34] but can you ssh into it? [10:24:37] (if you have access to s{1-7}, it's the same) [10:24:43] ah ok [10:24:46] no, you wouldn't ssh, just connect via mysql [10:25:14] ok, that makes wikimetrics a possibility then [10:25:21] ah ok, then i must, cause it's just like the EL db [10:25:43] milimetric: data is on user_properties on enwiki db [10:25:47] I think [10:25:52] but I think that would slow them down right now, let's see if we can get him access to make temp tables there for now [10:25:58] so that data must be on labs .. right? [10:26:48] yeah, it is, user_properties is exposed... but it might still be filtered out, we'd have to check - tgr, what would it look like in user_properties? I'll query for it [10:28:15] https://gerrit.wikimedia.org/r/#/c/143501/2/optout/template.sql <- this is the query [10:28:46] it has a lot of cruft to group users by activity levels [10:29:34] right, no it's cool, one sec before I bother sean [10:30:12] https://gerrit.wikimedia.org/r/#/c/143501/2/optout/init.sql <- this is the temp table I was planning to use [10:30:58] I figured the easiest is to put it somewhere on the analytics-store DB, but storing it on the labs instance which runs limn would work as well [10:34:20] i don't think we'd want it on limn1 because then you'd have to crunch the numbers there and that box is running people's dashboards so probably shouldn't also do analysis [10:35:51] so tgr, do you guys know about the mobile web team's solution for this?
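[editor's note] The user_properties lookup discussed above could be sketched roughly like this. The real MediaWiki table columns (`up_user`, `up_property`, `up_value`) are accurate, but the property name in the test and the exact opt-out value are assumptions that would need to be confirmed against the extension:

```python
# Sketch of checking user_properties for an opt-out preference on
# analytics-store. The property name passed in is hypothetical; the
# actual preference key would have to be confirmed.
PREF_QUERY = """
SELECT COUNT(*) AS opted_out
FROM user_properties
WHERE up_property = %s
  AND up_value = '1'
"""

def count_opted_out(cursor, property_name):
    """Run the count query with a parameterized property name
    (avoids string interpolation into SQL)."""
    cursor.execute(PREF_QUERY, (property_name,))
    return cursor.fetchone()[0]
```

A DB-API cursor from any MySQL driver would work here; the query itself is the only part tied to MediaWiki's schema.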
[10:36:03] it's possible you might want to start doing the same thing for now [10:36:05] no [10:36:11] k, one sec [10:36:26] https://git.wikimedia.org/tree/analytics%2Flimn-mobile-data [10:36:33] ok tgr, so quick description [10:36:48] mobile/config.yaml is how they configure their jobs [10:36:53] there is nothing in the init.sql table that would need further crunching though [10:37:22] it would store exactly the same data as the final tsv, it's just easier to add/update rows [10:37:33] well, tgr, the main problem here would be going from analytics-store to labs, crossing that private/public border is the hard part [10:37:50] right, so that's what the mobile web team's scripts do [10:37:55] they basically run a cron on stat1 [10:38:20] look at a history file of when the last successful run of each query they want finished, and run everything that is out of date [10:39:01] it's evolved over time, so one of their queries looks like this: https://git.wikimedia.org/blob/analytics%2Flimn-mobile-data/df701bdaf8a0531b03ee13e80bee2cafef9b4802/mobile%2Fedits-monthly-5plus-editors.sql [10:39:12] notice the template approach where they can pass in "from" and "to" [10:39:48] this runs in a cron on stat1, then the results get rsynced over to stat1001 and are served publicly where limn can see them: http://stat1001.wikimedia.org/limn-public-data/mobile/datafiles/ [10:40:18] this is probably the fastest. Because if you had a table on analytics-store, wouldn't you still have to do all this tgr? 
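[editor's note] The "template approach where they can pass in 'from' and 'to'" can be sketched as below. The placeholder names and the rendering helper are assumptions for illustration, not the exact limn-mobile-data convention:

```python
# Minimal sketch of a templated SQL query with "from"/"to" parameters,
# as in the limn-mobile-data cron jobs described above.
from string import Template

EDITS_TEMPLATE = Template("""
SELECT COUNT(*) AS edits
FROM revision
WHERE rev_timestamp >= '$from_ts'
  AND rev_timestamp <  '$to_ts'
""")

def render_query(from_ts, to_ts):
    """Fill the date-range placeholders; timestamps are MediaWiki-style
    14-digit strings."""
    return EDITS_TEMPLATE.substitute(from_ts=from_ts, to_ts=to_ts)
```

The cron runner would render this for each out-of-date window recorded in the history file, then execute the result against analytics-store.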
[10:40:31] and instead of doing it manually, you can share that codebase [10:41:25] we have some scripts already for running a query against analytics-store, creating a tsv from the result and loading it into limn [10:41:30] I could just reuse that [10:41:51] they can also process the results through python if they need, like: https://git.wikimedia.org/blob/analytics%2Flimn-mobile-data/df701bdaf8a0531b03ee13e80bee2cafef9b4802/mobile%2Fdeleted-uploads.py [10:42:11] how do you move the tsv from analytics-store to a public place? [10:42:57] not sure [10:43:13] gilles magic or something? :) [10:43:20] ok, i'll shoot that email to sean, I have it written [10:43:22] I think a cronjob on limn1 connects via mysql to analytics store and dumps the results in a file [10:44:09] and the credentials are saved in some private my.cnf file [10:45:09] uh... [10:45:12] that would be bad [10:45:26] everyone has sudo on limn1 so nothing can be private [10:45:50] let's make sure that's not the case, and if it is, let's think more carefully about the mobile web team's solution [10:48:55] milimetric: I'll check tomorrow [10:49:08] (well, today) [10:49:23] it's too late now for me to think coherently :) [10:50:13] tgr where are you located? [10:50:25] SF [10:50:39] what? [10:50:45] it is LATE! [10:51:26] tgr: get some sleep! [10:52:15] lol [10:52:19] ok springle, thanks for joining [10:52:31] so tgr, question: what user do you use to connect to analytics-store? [10:53:00] research, I think? [10:53:14] as I said, I'm not really familiar with the code [10:53:23] but I'll check tomorrow [10:54:23] research user can create stuff in staging db. is that appropriate for limn [10:54:34] milimetric: ^ i guess, if tgr should be zzz :) [10:54:48] yeah, springle, that should be fine I think [10:54:59] they don't care what db it's going to. 
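[editor's note] The "query result to TSV" step mentioned above is small enough to sketch; the column names are illustrative:

```python
# Sketch of dumping query rows to a TSV file, the intermediate step in
# the analytics-store -> rsync -> stat1001 pipeline described above.
import csv

def write_tsv(path, header, rows):
    """Write a header row plus data rows, tab-separated."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

The resulting file is what a cron job would rsync to the public web host for limn to pick up.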
Oh, crap, sorry, I remember now you and Aaron having this discussion [10:55:00] my bad [10:55:39] it's more a question for analytics i think, depending on how you intend to manage staging [10:56:16] right, I don't think we want to enforce anything there yet, we'll see [10:56:24] so springle, what's the staging db again? [10:56:32] but it would be great to keep the number of writable locations on -store to a minimum, since that box has a *lot* of schemas on there already :) many file handles, etc [10:56:33] I was looking through the list on analytics-store... [10:56:46] yeah, agreed, staging works great [10:56:58] well.. don't know. dario or aaron asked for 'staging' [10:57:25] oh, duh, that's there [10:57:42] the name implied there might eventually be a request for 'production' too [10:57:46] but nobody has asked yet [10:58:18] cool, ok tgr: use the user you already have to write to analytics-store, database "staging" and make a table name that is sensible there [10:58:28] I'll reply to your email on the list with the same [10:58:34] thanks springle! 
[10:58:56] yw [12:38:16] (PS3) QChris: Add basic deployment script [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) [12:40:33] (CR) QChris: Add basic deployment script (6 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) (owner: QChris) [12:42:12] (CR) QChris: Add basic deployment script (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) (owner: QChris) [13:05:10] (CR) Ottomata: [C: 2 V: 2] Drop unneeded auxpaths for hive snippets [analytics/refinery] - https://gerrit.wikimedia.org/r/144910 (owner: QChris) [13:05:25] (CR) Ottomata: [C: 2 V: 2] Drop unneeded partition dropping part of oozie import [analytics/refinery] - https://gerrit.wikimedia.org/r/144909 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [13:29:21] (CR) Ottomata: Add basic deployment script (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) (owner: QChris) [13:30:39] (CR) Ottomata: Add basic deployment script (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) (owner: QChris) [13:48:35] Analytics / Refinery: Story: AnalyticsEng has UDF in Hadoop for UA parsing - https://bugzilla.wikimedia.org/67803 (Kevin Leduc) NEW p:Unprio s:enhanc a:None port a UA parsing function for the Hadoop environment so that it can be used to support Page View counting [13:50:02] Analytics / Refinery: Story: AnalyticsEng has UDF in Hadoop for UA parsing - https://bugzilla.wikimedia.org/67803 (Kevin Leduc) p:Unprio>Highes [13:58:04] Analytics / Refinery: Story: AnalyticsEng has daily PageView counts - https://bugzilla.wikimedia.org/67804 (Kevin Leduc) NEW p:Unprio s:enhanc a:None Implement the definition of PageViews Count total PageViews for everything Run it daily & store it somewhere [13:59:17] Analytics / Refinery: Story: 
AnalyticsEng has daily PageView counts - https://bugzilla.wikimedia.org/67804 (Kevin Leduc) p:Unprio>Normal [14:31:34] Analytics / Visualization: Story: EEVSUser has a portal to navigate Vital Signs - https://bugzilla.wikimedia.org/67806 (Kevin Leduc) NEW p:Unprio s:enhanc a:None - implement Pau's design (add/remove metrics & projects) - do not include displaying of graphs for now [14:32:03] Analytics / Visualization: Story: EEVSUser has a portal to navigate Vital Signs - https://bugzilla.wikimedia.org/67806 (Kevin Leduc) p:Unprio>High [14:54:22] [travis-ci] wikimedia/mediawiki-extensions-EventLogging#228 (wmf/1.24wmf13 - ff189b1 : Reedy): The build passed. [14:54:22] [travis-ci] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/ff189b10ce96 [14:54:22] [travis-ci] Build details : http://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/29607850 [17:03:32] Analytics / Refinery: Story: Admin has duplicate monitoring in Icinga - https://bugzilla.wikimedia.org/67128 (Dan Andreescu) [17:04:31] Analytics / Visualization: Spike: AnalyticsEng decide on stack for EEVS dashboard - https://bugzilla.wikimedia.org/67172 (Kevin Leduc) [17:07:47] Analytics / Wikimetrics: Story:a WikimetricsUser runs 'Rolling Monthly Active Editors' report - https://bugzilla.wikimedia.org/67458 (Dan Andreescu) [17:09:03] Analytics / General/Unknown: Packetloss issues on oxygen (and analytics1003) - https://bugzilla.wikimedia.org/67694 (Dan Andreescu) [17:09:16] Analytics / EventLogging: UniversalLanguageSelector-tofu logging too much data - https://bugzilla.wikimedia.org/67463 (Dan Andreescu) [17:11:01] Analytics / Visualization: Spike: AnalyticsEng decide on stack for EEVS dashboard - https://bugzilla.wikimedia.org/67172 (Dan Andreescu) [17:19:32] so the espeak thing that tells me when my meetings are [17:19:38] queues up if I'm using hangouts [17:19:40] and at the end [17:19:50] it just yells all the meetings that I missed reminders for [17:20:17] 
so since we've been in one long hangout for like 4 hours, it just yelled like 4 overlapping reminders at me [17:20:20] #veryfunny [18:03:26] (PS4) Terrrydactyl: [WIP] Add autcomplete to tags [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 [18:03:37] nuria ^ [18:04:57] (CR) jenkins-bot: [V: -1] [WIP] Add autcomplete to tags [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl) [19:13:36] (PS5) Nuria: [WIP] Add autcomplete to tags [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl) [19:13:56] (CR) jenkins-bot: [V: -1] [WIP] Add autcomplete to tags [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl) [19:43:42] oh, ottomata, can you review / merge a couple of tiny changes for wikimetrics puppet? [19:43:49] sure [19:43:57] https://gerrit.wikimedia.org/r/#/c/144154/ [19:44:00] https://gerrit.wikimedia.org/r/#/c/144761/ [19:48:47] milimetric: so what's the CORS thing do? [19:48:57] allow any origin to request stuff in that dir? [19:49:40] yes ottomata [19:49:55] if I tried to access it from the dashboard running in the client's browser for example [19:49:59] it wouldn't be allowed [19:50:03] this is for remote rendering of datasets? [19:50:06] Analytics / Wikimetrics: Optimize JSON format of recurrent report output - https://bugzilla.wikimedia.org/67822 (Dan Andreescu) NEW p:Unprio s:normal a:None Currently, recurrent reports are outputting: "date": { "SUM": { "submetric": value } }, "date": { "SUM": { "submetric": value } }, "d... [19:50:09] yes [19:50:24] kevinator: 3 things [19:50:35] 1. I have to wait for the puppet stuff to be merged before deploying wikimetrics [19:50:40] ok [19:50:50] 2. This bug is very interesting and we should schedule it soon: https://bugzilla.wikimedia.org/show_bug.cgi?id=67822 [19:51:13] 3. 
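[editor's note] The CORS change under review ("allow any origin to request stuff in that dir") typically boils down to a mod_headers stanza along these lines; the directory path and directive placement here are assumptions, not the actual puppet change:

```apache
# Requires mod_headers (a2enmod headers, as noted later in the log).
<Directory /var/lib/wikimetrics/public>
    Header set Access-Control-Allow-Origin "*"
</Directory>
```

With this in place, a dashboard running in a client's browser can fetch the report files cross-origin.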
There's a problem with performance so before we ramp up the real reports on production, we need to fix (I'm about to file it, hang on) [19:53:11] kevinator: https://bugzilla.wikimedia.org/show_bug.cgi?id=67823 [19:53:34] Analytics / Wikimetrics: Max recursion limit hit - https://bugzilla.wikimedia.org/67823 (Dan Andreescu) NEW p:Unprio s:normal a:None When running recurrent reports, if more than a certain number of reports at the same time, we hit a bug in pickle which causes python to throw an error about... [19:54:58] milimetric: I agree on #2… it is interesting [19:55:26] Seems like a simple change that could pay off huge when backfilling data [19:55:48] would it impact other wikimetrics users who consume JSON data? [19:56:46] Analytics / Wikimetrics: Optimize JSON format of recurrent report output - https://bugzilla.wikimedia.org/67822 (Kevin Leduc) p:Unprio>High [19:57:27] kevinator: no, shouldn't hurt anyone [19:57:34] would help people using recurrent reports [19:57:43] unless someone built some crazy pipeline around it (highly doubtful) [19:57:43] :) [19:58:40] I’d love to pull it into this sprint… but I don’t want this to sidetrack our commitments [20:01:25] regarding #3… I have difficulty conceiving why pickle uses a recursive algorithm [20:02:30] kevinator, because pickles are delicious [20:02:38] you should always eat them recursively. [20:02:55] An infinite fork-bomb of pickles, with an emphasis on fork (nobody wants saline on their hands) [20:03:05] (okay, I'll let you guys actually do work now) [20:03:45] :-P [20:03:47] Analytics / Wikimetrics: Max recursion limit hit - https://bugzilla.wikimedia.org/67823#c1 (Kevin Leduc) Let's do the quick fix now (setrecursionlimit) and log a new bug to move away from pickle serialization so we can do that later. 
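[editor's note] The setrecursionlimit quick fix proposed in the bug can be sketched like this; the limit value is illustrative, and this is the stopgap, not a replacement for moving away from pickle serialization:

```python
# Pickling recurses once per level of a nested structure, so a deeply
# nested object graph (like the celery canvas described below in the log)
# can hit the interpreter's recursion limit. Quick fix: raise the limit
# around the pickling call, then restore it.
import pickle
import sys

def deep_list(depth):
    """Build a list nested `depth` levels, iteratively (no recursion)."""
    node = []
    for _ in range(depth):
        node = [node]
    return node

def pickle_with_limit(obj, limit):
    """Temporarily raise the recursion limit while pickling."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return pickle.dumps(obj)
    finally:
        sys.setrecursionlimit(old)
```

Restoring the old limit in `finally` matters: a process-wide recursion limit left raised hides other runaway-recursion bugs.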
[20:05:25] milimetric: wouldn’t fixing bug #2 and generating timeseries to backfill data avoid bug in #3 [20:07:07] (PS1) QChris: Document naming conventions [analytics/refinery] - https://gerrit.wikimedia.org/r/145421 [20:10:48] (CR) QChris: Add basic deployment script (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) (owner: QChris) [20:39:02] Analytics / Wikimetrics: Max recursion limit hit - https://bugzilla.wikimedia.org/67823#c2 (Kevin Leduc) What are we trying to pickle anyway? I read the documentation on Pickling ( https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled ) and it left me wondering if we were try... [21:03:48] kevinator: think about objects as trees of properties with complex properties as nodes and simple properties as leaves [21:03:56] (sorry was in interview until now) [21:04:11] so, #2 would totally help with #3, but would not eliminate the problem [21:04:50] #3 arises because we use a high level structure that sits on top of our recurrent report runs and allows us fine grained control over how many tasks can run in parallel but still guarantees all of them get queued [21:05:08] and that structure, when celery tries to pickle it, makes pickle choke [21:06:02] so we could do the dirty hack for #3, then #2, then the clean way for #3, but we have to solve #3 at some point [21:06:27] kevinator: https://github.com/celery/celery/issues/1078 [21:06:36] that describes the issue better probably, if you're curious :) [21:06:46] I have no problems conceptualizing the tree… but I imagine it flat, rather than deep [21:07:08] i will read the article [21:07:47] Analytics / Wikimetrics: Max recursion limit hit - https://bugzilla.wikimedia.org/67823#c3 (Dan Andreescu) This is the link to the pickle issue. 
It's because we're using a chain of group of tasks (to allow us to throttle how many run in parallel): https://github.com/celery/celery/issues/1078 [21:08:27] BTW I wrote a web browser back in the day and had plenty of experience with trees and recursion when running stress test with nested tables. [21:08:45] it was quite trippy trying to optimize rendering [21:16:16] Analytics / Wikimetrics: Unique constraint on Tag table causes errors in tests - https://bugzilla.wikimedia.org/66671 (Dan Andreescu) PATC>RESO/FIX [21:17:30] kevinator: in this case I don't think it's flat or deep, I think there's an actual loop that pickle doesn't handle properly or something. [21:17:50] that's cool about the browser though [21:18:11] does it still work? :) [21:18:12] milimetric: after reading the article… it looks like chaining tasks implicitly creates a tree [21:18:17] yea [21:18:41] so we make groups of X first, where X is configurable, and then chain those groups [21:18:52] that way only X tasks ever run in parallel [21:19:23] (well, at least kicked off by our recurrent scheduler) [21:19:32] the company was acquired by sun Micro, but our product was EOL’ed within a couple of years. WAP browsers became the thing [21:19:53] aw, that would've been fun to load up things on it [21:20:11] we could've been like - look eevs dashboard runs on KevinatorKit [21:20:17] (you called it KevinatorKit right?) [21:21:42] nope… lawyers got a hold of it and described it using legalease [21:21:45] here: http://patft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=6639611.PN.&OS=PN/6639611&RS=PN/6639611 [21:22:51] although I like “KevinatorKit” We’ll have to save that one for something special :-) [21:23:51] what?! you hold patents!!! [21:23:52] that's crazy [21:23:54] Kitenator [21:24:00] milimetric, amusingly so does our ED [21:24:07] meow? 
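[editor's note] The "chain of group of tasks" throttling pattern can't run outside a celery worker, but the idea itself is simple enough to sketch in plain Python: split the tasks into groups of at most X, run each group concurrently, and only start the next group when the previous one finishes. This stands in for `chain(group(...), group(...), ...)`; it is not the wikimetrics code:

```python
# Pure-Python sketch of the throttling pattern described above: at most
# `batch_size` tasks run in parallel, and every task is still guaranteed
# to run. Results come back in the original task order.
from concurrent.futures import ThreadPoolExecutor

def run_throttled(tasks, batch_size):
    """Run callables in sequential batches of at most `batch_size`."""
    results = []
    for start in range(0, len(tasks), batch_size):
        batch = tasks[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            # pool.map preserves input order within the batch
            results.extend(pool.map(lambda task: task(), batch))
    return results
```

In celery the same structure becomes a canvas object, and it is that nested canvas that the pickle serializer chokes on.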
[21:24:21] people could like, legitimately call us patent trolls now [21:24:42] well, ok, fairly to Kevin, Sun holds this one [21:24:48] https://en.wikipedia.org/wiki/Lila_Tretikov#Patents [21:24:56] woa [21:24:58] Sun also holds a lot of those I think [21:25:02] but I dunno [21:26:10] Multi-instance "shadow" system and method for automated resource redundancy reduction across dynamic language applications utilizing application of dynamically generated templates. [21:26:34] our new ED knows code ;p [21:26:48] yeah... [21:26:50] our ED aka The Dynamic Shadow [21:27:00] oh my god that should totally be her nickname [21:27:14] when I worked at sprint, one director suggested we invent new words to make the ideas more patentable [21:27:21] lol [21:27:39] kevinator, "multi-instance bullshitterising for the purpose of obfuscating and confusing the USPTO" [21:27:44] we can submit it tomorrow have it confirmed Monday. [21:28:43] :-) [21:30:13] brrr, the breeze coming into this cafe is chilly. time to head to the next one [21:58:03] Analytics / Wikimetrics: Add mod_headers to puppet - https://bugzilla.wikimedia.org/67825 (Dan Andreescu) NEW p:Unprio s:normal a:None I forgot to include this in the puppet change that added CORS headers to wikimetrics: sudo a2enmod headers I'm not sure how that translates to puppet. [21:58:04] milimetric: btw, https://trello.com/c/nOJDn3zC/7-as-a-mobile-team-member-i-d-like-our-most-popular-metrics-to-be-shown-on-an-analytics-dashboard is coming up for next sprint, so I'll be doing a lot of that :) [22:04:31] Analytics / Wikimetrics: Performance of Recurrent Reports - https://bugzilla.wikimedia.org/67543 (Dan Andreescu) [22:06:00] YuviPanda: very cool [22:06:10] we are doing our last round of discussions and design this sprint [22:06:23] and will have some prototypes soon [22:06:24] milimetric: cool, do keep me updated. 
[22:06:31] yeah, so far, thinking is basically: [22:06:34] storage - mediawiki [22:06:47] visualization - whatever is super fast to implement and will run on mobile [22:07:00] glue in between - needs research, will pick something next week [22:07:25] so for viz we have like rickshaw and vega we're going to look at first, dygraphs maybe, other stuff like that [22:07:43] can you explain by what you mean by 'storage - mediawiki'? [22:07:43] but it doesn't matter because unlike limn we're not going to weld the viz into the dashboard tool [22:07:45] storage of what? [22:07:57] dashboard definition, like the name of the dashboard, what graphs are on it, etc. [22:08:19] also, for browsing in our case - what projects are available and what metrics are available [22:08:32] ah, so would that mean code that'll live on metawiki? [22:08:44] not code, but json maybe [22:08:46] http://pauginer.github.io/prototypes/analytics-dashboard/index.html [22:09:02] it would inform our dashboard of everything it needs to render that design ^ [22:11:26] YuviPanda: one interesting thing would be what your dashboard layout would be like [22:11:39] (what can people browse and what kinds of graphs it should render) [22:12:19] milimetric: right. [22:12:29] milimetric: I did love your cool demo at the hackathon tho :) [22:12:51] well, that's one motivation to keep this server-less and to stay close to mediawiki as we develop it [22:13:00] because that's one of the end goals - viz on mediawiki [22:13:17] however, that doesn't bring home the velocity-points-bacon right now :) [22:14:14] milimetric: pffft, that's 'but what about this quarter\'s profits?' talk :) [22:15:07] yea, no, i mean, we're severely under-powered as an organization, and I always push for ideas that work now, get the job done, but look far ahead [22:19:21] i think knockout components are key here, so everything we do can be re-used in mediawiki [22:19:32] milimetric: indeed. 
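[editor's note] The "not code, but json maybe" dashboard definition described above might look something like this; every key name here is hypothetical:

```json
{
    "name": "Vital Signs",
    "projects": ["enwiki", "dewiki"],
    "metrics": ["RollingActiveEditor", "NewlyRegistered"],
    "graphs": [
        {"metric": "RollingActiveEditor", "type": "timeseries"}
    ]
}
```

Stored on a wiki page, this is all the server-less dashboard would need to fetch in order to render the design linked above.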
[22:19:42] milimetric: ah, assuming knockout gets used in MW :) [22:20:39] well, when web components come out we'd migrate to that, and use knockout as just a shim until then. Or keep knockout if we want reactivity [22:20:50] milimetric: btw, https://trello.com/c/nOJDn3zC/7-as-a-mobile-team-member-i-d-like-our-most-popular-metrics-to-be-shown-on-an-analytics-dashboard is more complete now [22:21:04] i imagine on-wiki this would have a reactive / static mode and the static is just an image served by a vega service -> varnish cache [22:21:13] sweet [22:22:16] if I were you, I wouldn't worry about building a dashboarding system right now [22:22:25] if our stuff is too slow for you, just build only what you need and we'll integrate later [22:22:36] ooh, gotta run to the train, nite everyone [22:22:39] milimetric: night!