[00:00:16] Ooh... Definitely looks better. [00:00:40] halfak: and is more full featured. supports multiple queries (and resultsets), etc [00:03:37] * halfak looks for the API docs. [00:05:14] * halfak plays the guess-whats-in-the-module game with dir() [00:05:28] Woo! Found DictCursor where I thought it would be. [00:07:47] YuviPanda, do you know where the docs are? [00:07:52] I'm seriously at a loss. [00:08:03] halfak: I don't think there are generated docs, which is weird considering there are inline docs [00:08:14] halfak: but it follows the python db api standard... [00:08:36] Arg. [00:15:23] (PS1) Yuvipanda: Serve fonts locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/150996 [00:19:24] Analytics / EventLogging: Scrub logs of sensitive experimental data - https://bugzilla.wikimedia.org/68978 (Kevin Leduc) NEW p:Unprio s:normal a:None logs on Vanadium need to be scrubbed of sensitive data produced by a growth team test. see thread: http://lists.wikimedia.org/pipermail/anal... [00:19:44] YuviPanda, say I wanted python 3.4 on an analytics machine, would an RT ticket be the right way to request that? 
[00:20:02] halfak: yup, an RT ticket and a lot of waiting :) [00:20:08] halfak: upgrading to trusty is the easiest way to get python 3.4 [00:20:10] * halfak grumbles [00:20:17] You've got to be joking [00:21:15] halfak: nope [00:21:48] halfak: you could get someone to upload python3.4 precise backport to apt.wikimedia.org, but I don't think that's very easy [00:22:08] halfak: if you want to be subversive, you can use pyenv and get whatever version of python you want wherever :) [00:26:32] (PS2) Yuvipanda: Serve fonts locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/150996 [00:30:48] YuviPanda: I don't, colours are terrible on IRC and there's an mlock so you'd need somebody with +s flag [00:30:57] hmm, ok [00:34:34] (CR) Prtksxna: [C: 1] Serve fonts locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/150996 (owner: Yuvipanda) [00:35:14] Bah! Looks like pymysql doesn't support connecting to the analytics slaves. [00:35:27] Something is up between the password hashing function and the server. [00:36:13] Oh wait. It's python 3.2 [00:36:16] curses! [00:36:45] sigh [00:37:31] * halfak logs into RT [00:39:26] (CR) Legoktm: [C: 2] "Per Prtksxna" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/150996 (owner: Yuvipanda) [00:39:32] (Merged) jenkins-bot: Serve fonts locally [analytics/quarry/web] - https://gerrit.wikimedia.org/r/150996 (owner: Yuvipanda) [00:39:53] halfak: highly reccomend asking for a trusty upgrade [00:40:07] halfak: ottomate mentioned he is willing to do that, I think [00:40:11] You think that'll be faster? [00:40:14] Ahh.. OK then. [00:40:20] halfak: IMO, yes. 
[00:40:51] halfak: to get python3.4 into apt.wikimedia.org, first someone has to build it (much harder than the building I'm doing), then build all its dependencies, and make sure they don't conflict with current versions of said dependencies, and then upload [00:41:09] halfak: or find someone who's already done all the leg work *whom ops trust* (big part), and then upload. [00:43:26] (PS1) Yuvipanda: Mark quarry as beta. Other minor design improvements. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151000 [00:43:41] prtksxna: ^ minor design fixes? [00:50:57] * halfak gives up on a long thread of DB nonsense. [00:51:27] * YuviPanda pats halfak [00:51:38] halfak: I think python3 without trusty is painful [01:06:59] (PS2) Yuvipanda: Mark quarry as beta. Other minor design improvements. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151000 [01:12:32] (CR) Prtksxna: [C: 1] Mark quarry as beta. Other minor design improvements. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151000 (owner: Yuvipanda) [01:14:19] (CR) Yuvipanda: [C: 2] Mark quarry as beta. Other minor design improvements. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151000 (owner: Yuvipanda) [01:14:24] (Merged) jenkins-bot: Mark quarry as beta. Other minor design improvements. [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151000 (owner: Yuvipanda) [01:35:32] (PS3) Gergő Tisza: Query UploadWizard funnel data [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/150750 [01:38:19] (PS3) Gergő Tisza: Query UploadWizard funnel data [analytics/multimedia] - https://gerrit.wikimedia.org/r/150749 [01:40:47] (CR) Gergő Tisza: "I was missing a brain, apparently: I deployed on limn1 without merging, and local changes are overwritten by the deployscript." 
[analytics/multimedia/config] - https://gerrit.wikimedia.org/r/150750 (owner: Gergő Tisza) [02:40:52] (PS1) Yuvipanda: [WIP] Add DataTables [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151006 [03:02:17] (PS1) Yuvipanda: Add minimal logging to the celery runner [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151007 [03:06:23] (CR) Yuvipanda: [C: 2] Add minimal logging to the celery runner [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151007 (owner: Yuvipanda) [03:06:28] (Merged) jenkins-bot: Add minimal logging to the celery runner [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151007 (owner: Yuvipanda) [03:17:17] (PS1) Yuvipanda: Add super minimal query checking [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151008 [03:19:58] (PS2) Yuvipanda: Add super minimal query checking [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151008 [03:20:23] (CR) Yuvipanda: [C: 2] Add super minimal query checking [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151008 (owner: Yuvipanda) [03:20:29] (Merged) jenkins-bot: Add super minimal query checking [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151008 (owner: Yuvipanda) [03:34:40] (PS1) Yuvipanda: Slightly more robuse unauthorized db checker [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151009 [03:34:56] (CR) Yuvipanda: [C: 2] Slightly more robuse unauthorized db checker [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151009 (owner: Yuvipanda) [03:35:01] (Merged) jenkins-bot: Slightly more robuse unauthorized db checker [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151009 (owner: Yuvipanda) [04:29:51] (PS1) Yuvipanda: Minor styling fixes [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151017 [06:25:22] Analytics / EventLogging: Cleaning up of some (?) EventLogging schemata for Growth - https://bugzilla.wikimedia.org/68931#c1 (christian) *** Bug 68978 has been marked as a duplicate of this bug. 
*** [06:25:23] Analytics / EventLogging: Scrub logs of sensitive experimental data - https://bugzilla.wikimedia.org/68978#c1 (christian) NEW>RESO/DUP *** This bug has been marked as a duplicate of bug 68931 *** [06:52:37] Analytics / EventLogging: Cleaning up of some (?) EventLogging schemata for Growth - https://bugzilla.wikimedia.org/68931#c2 (christian) > I pushed back on cleanup of raw logs. Steven clarified on-list that they have an agreement with legal to remove the data. So we should do it. [07:53:52] Analytics / EventLogging: Cleaning up of some (?) EventLogging schemata for Growth - https://bugzilla.wikimedia.org/68931#c3 (christian) On-list [1] Kevin said > Christian: before I prioritize it, can you scope out how much work > would be required? The items that immediatedly come mind are: * Clarif... [08:35:08] Analytics / General/Unknown: Replication lag on analytics-store.eqiad.wmnet >12 hours for s1 replicas - https://bugzilla.wikimedia.org/68993 (christian) NEW p:Unprio s:normal a:None Replication lag on analytics-store.eqiad.wmnet was >12 hours since 2014-07-30 Affected databases: enwiki... [08:39:08] Analytics / General/Unknown: Replication lag on analytics-store.eqiad.wmnet >12 hours for s1 replicas - https://bugzilla.wikimedia.org/68993#c1 (christian) NEW>RESO/FIX a:christian (In reply to christian from comment #0) > Corresponding RT ticket: > https://rt.wikimedia.org/Ticket/Display.ht... [09:13:56] qchris: hi there. Do you have any clue whether libcidr and libanon are still being used? [09:14:10] there is some libdclass floating around as well [09:14:18] Aaaaaaahm. No idea. [09:14:30] We stopped using libdclass. [09:14:47] Not sure if there were other users. [09:14:52] I guess I should list the package I am wondering about and fill a bug about it [09:15:01] That'd be great. [09:15:38] Ottomata might know better about uses. [09:15:48] have to dig in :] [09:15:55] thank you! 
[09:20:14] hashar: libcidr and libanon are still used in udp-filters [09:20:24] IIRC, this is still in active production use. [09:20:56] But I could not find a production use of libdclass. [09:23:37] reat [09:25:10] qchris: the reason I am asking is that the Debian packages have not been build for Ubuntu Trusty [09:25:26] and I have them installed on Jenkins slaves which cause puppet to complain on a Trusty instance :D [09:25:31] :-D [09:25:56] libdclass can probably get nuked (but do not quote me on that) [09:26:11] We've switched to ua-parser, so no libdclass use from our side. [09:26:18] And I doubt others are using it. [09:26:56] For libcidr + libanon, I guess we need to build those packages for Trusty before we switch our analytics machines to trusty. [09:27:23] indeed [09:27:42] any bugzilla component I can fill that against ? [09:28:00] ah Analytics > General/Unknown [09:28:02] No. I'd use "Analytics" / "General/Unknown" [09:28:05] Yes. [09:31:31] and done https://bugzilla.wikimedia.org/show_bug.cgi?id=68997 [09:31:33] thank you! [09:31:40] Analytics / General/Unknown: Package libcidr + libanon + libdclass for Ubuntu Trusty - https://bugzilla.wikimedia.org/68997 (Antoine "hashar" Musso) NEW p:Unprio s:normal a:None We have some Jenkins jobs building libanon, libdclass and libcdr whenever a patch is proposed in Gerrit. Hence t... [09:32:33] hashar: Thanks for the bug! [09:40:52] Analytics / General/Unknown: Package libcidr + libanon + libdclass for Ubuntu Trusty - https://bugzilla.wikimedia.org/68997#c1 (christian) (In reply to Antoine "hashar" Musso from comment #0) > But libanon and libcdr could use a Trusty version. Yes. We need them to build the udp-filters repo. Its exec... [10:26:23] Analytics / EventLogging: Cleaning up of some (?) EventLogging schemata for Growth - https://bugzilla.wikimedia.org/68931#c4 (christian) ahalfak said in private communication that he has finished the things he needed to do, so we're good to get things moving from their end. 
[11:19:01] (PS2) Yuvipanda: Minor styling fixes & add footer [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151017 [11:42:35] qchris: :) thanks for adding otto, I forgot to do that [11:42:49] yw :-) [11:43:06] That will give us PHP glory sooner. [11:43:53] Analytics / General/Unknown: Turn on PHP on stat servers - https://bugzilla.wikimedia.org/68937#c6 (christian) (In reply to Toby Negrin from comment #1) > Otto -- how difficult to do is this? [...] He is not CCed :-) Not sure he received this ping. Hence, adding him to CC. (And also adding him as rev... [11:44:32] qchris: heh :) although, in hindsight, ugh PHP :) [11:44:39] qchris: also, have you seen quarry.wmflabs.org? [11:44:52] Hahaha. I was so waiting for the PHP bash :-) [11:45:09] I only heard/read about quarry ... but have not tried it. [11:45:30] qchris: do give it a shot :) [11:45:40] Just doing that :-) [11:46:16] :D [11:46:33] I especially liked when a few days ago I saw the commit message fly by that switched to CC0 :-) [11:51:43] is http://dumps.wikimedia.org/other/ down or is it just me? [11:52:35] YuviPanda: quarry looks great! [11:52:52] jorn: Does not work for me either. [11:52:56] qchris: :D suggestions for improvements? :) [11:53:31] * YuviPanda considers switching code of Quarry itself to GPL, is unsure [11:53:34] YuviPanda: Not really. It just looks great. [11:53:42] * qchris likes GPL a lot. [11:53:54] * YuviPanda is ambivalent to GPL [11:54:11] but this is the first time I've written a proper 'application' that's not just a library [11:54:32] qchris: btw, it's also fully puppetized :) and distributed as well (currently two machines, can add more trivially) [11:54:37] Since it is a webservice ... AGPL? [11:54:52] ah, no :) [11:54:55] too viral for my tastes :) [11:54:58] Mhmm... puppetized. I like that. 
[11:55:26] qchris: toby was excited by the prospect of putting one of these in prod, for PMs, etc [11:55:30] with access to EL data [11:55:35] :-D [11:55:53] We cannot even support the systems we currently have. [11:56:08] But a have no say there. [11:57:21] qchris: heh, hopefully in a lot of months, etc. and yeah, it's in a weird state since there's only me building out all parts [11:57:30] very much a volunteer project [12:59:07] Analytics / General/Unknown: Package libcidr + libanon + libdclass for Ubuntu Trusty - https://bugzilla.wikimedia.org/68997 (Antoine "hashar" Musso) [13:23:46] (CR) QChris: [C: -1] Use tsv format when outputting webrequest faulty hosts files (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/150963 (owner: Ottomata) [13:33:43] (CR) QChris: Use tsv format when outputting webrequest faulty hosts files (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/150963 (owner: Ottomata) [14:23:52] Analytics / General/Unknown: Package libcidr + libanon + libdclass for Ubuntu Trusty - https://bugzilla.wikimedia.org/68997#c2 (Andrew Otto) (In reply to christian from comment #1) > (In reply to Antoine "hashar" Musso from comment #0) > > But libanon and libcdr could use a Trusty version. > > Yes. We... [14:24:24] Analytics / General/Unknown: Package libcidr + libanon + libdclass for Ubuntu Trusty - https://bugzilla.wikimedia.org/68997#c3 (Andrew Otto) Oops, I was just trying to add Gage to the CC list, didn't mean to add a comment. [15:00:11] Analytics / General/Unknown: Turn on PHP on stat servers - https://bugzilla.wikimedia.org/68937 (christian) PATC>RESO/FIX [15:32:31] qchris_away: https://gerrit.wikimedia.org/r/#/c/151095/ whenver :) [15:32:41] * qchris_away looks [15:32:48] Darn. I am still marked away. [15:33:19] "#!/bin/bash" <-- Music to my ears :-) [15:36:09] ottomata: Does the script run for you? [15:36:52] yes [15:36:55] haha, it doesn't fo you? [15:37:36] I didn't try. But a few things look backwards at first ... 
So wanted to check. [15:37:49] Ok. I'll give it a shot then. [15:38:03] (CR) Yuvipanda: [C: 2] Minor styling fixes & add footer [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151017 (owner: Yuvipanda) [15:38:10] (Merged) jenkins-bot: Minor styling fixes & add footer [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151017 (owner: Yuvipanda) [15:39:03] ok ja qchris, my shell scripting skills are good, but very inconsistent, since I always have to re-lookup the syntax for things [15:39:07] happy for any comments [15:39:34] ok. [16:20:53] Analytics / EventLogging: database consumer could batch inserts (sometimes) - https://bugzilla.wikimedia.org/67450 (Kevin Leduc) p:Unprio>Lowest s:normal>enhanc [16:35:07] Analytics / EventLogging: Cleaning up of some (?) EventLogging schemata for Growth - https://bugzilla.wikimedia.org/68931 (Kevin Leduc) p:Unprio>Highes [16:36:45] qchris, cool, thanks [16:36:51] i like everything you said, will fix up [16:36:53] one 1 [16:36:53] q [16:36:55] Nit: I see that ${ensure} is only set to constants, but I'd quote it nonetheless. [16:36:57] not sure what you mean there [16:36:59] comment on line 92 [16:37:12] Using "${ensure}" instead of ${ensure} [16:38:00] Whoops. [16:38:07] s/ensure/exists/ [16:38:52] hm, i'm doing integer comaprison though...i know bash might not care [16:39:02] but i didn't quote it because I'm using -eq and comparing ints [16:39:54] It's fine to leave unquoted. [16:40:01] Both ways work. [16:40:21] But if ${exists} changes at some point in the future, the quoted version is safe. [16:40:34] In the sense, that worst case, the test fails versus [16:40:49] Arbitrary might maybe get run. 
[16:40:57] But it's fine to ignore [18:26:00] (PS7) Terrrydactyl: [WIP] Add ability to global query a user's wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/129858 [18:26:22] (CR) jenkins-bot: [V: -1] [WIP] Add ability to global query a user's wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/129858 (owner: Terrrydactyl) [19:15:32] qchris, ping [19:16:08] pong. [19:16:13] hey! [19:16:30] just got home from an extended ru/il/uk visit [19:16:39] Whoa. Sounds crazy :-D [19:16:44] oh yeah :) [19:17:05] i was wondering how the api analytics is working out for you? [19:17:11] any questions / concerns? [19:17:29] The WAP thing did not go away. [19:17:36] So there are still carriers with letters appended. [19:17:41] I guess that's gonna stay? [19:17:50] i thought we got rid of them? [19:17:57] or i'm not sure i understand [19:18:18] Let me get to the files. 1sec. [19:18:29] qchris, vchat? [19:18:46] Sure. Let me boot the google machine. [19:18:56] same here [19:20:38] I'm in the analytics batcave at http://goo.gl/1pm5JI [19:20:45] yurik: ^ [19:21:55] qchris_meeting, not connecting [19:22:00] "you are not allowed ... [19:22:10] oops [19:22:22] Let me invite you. [19:23:19] trying direct video chat... [19:23:35] yuri: https://plus.google.com/hangouts/_/gwmtf4z7meoanq72nb6o5cwp5ua [20:14:37] Analytics / Refinery: Epic: AnalyticsEng has kafkatee running in lieu of varnishcsa and udp2log - https://bugzilla.wikimedia.org/68139 (Kevin Leduc) [20:21:32] DarTar: is your latest presentation on Commons by any chance? [20:21:40] yes [20:22:03] https://commons.wikimedia.org/wiki/File:Wikimedia_Mobile_Trends.pdf [20:22:47] thanks [20:23:20] leila will also put up the other plots (including the ones that weren't included in the deck) [20:23:44] np [20:24:47] just to be sure, I put the other plots on the meta page, too, right? DarTar.
[20:25:19] Nemo_bis, I'll let you know when I have it ready [20:26:00] leila: yes that’d be grand [20:26:07] sure [20:33:17] hey [20:34:06] DarTar: https://meta.wikimedia.org/wiki/Research_talk:Mobile_editor_engagement/Editor_activation [20:35:00] thanks leila [20:35:16] Nemo_bis: cool, will read it in a sec [20:35:28] yo tnegrin [20:46:20] hey Nemo_bis [20:46:28] (PS1) Yuvipanda: Add Labs ToU link to under the Login button [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151210 [20:46:30] (PS1) Yuvipanda: Switch to metawiki for login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151211 [20:46:44] read your comments, let me respond quickly (I’m about to switch off for tonight) [20:47:03] Thanks! [20:47:20] > The graph in the next page doesn't clarify whether the increase in registered editors, caused by this restriction, was able to compensate. [20:47:20] (PS1) Yuvipanda: Untitled queries aren't awesome [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151213 [20:47:41] it clearly wasn’t, in terms of total unique editing users on tablets [20:47:47] (PS1) Yuvipanda: Bump up number of concurrent tasks [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151214 [20:48:37] the plot on the right indicates that we’re seeing a wash in unique editing *registered* users on tablet [20:48:45] after the switchover [20:49:40] so we see a boost in activation for newly registered users, but an overall drop in total editing users [20:49:52] Nemo_bis: I’m not 100% sure if that’s what you’re asking [20:50:12] err, why is the header "Anonymous editors" then [20:50:48] Ironholds: ! [20:50:53] i just made an hdfs homedire for you [20:50:58] because that was the focus of the slide, did you see the actual presentation? [20:50:58] not sure if that was your problem, but it probably was [20:50:59] ry now [20:51:00] try now [20:51:11] thanks! [20:51:16] so, /home/ironholds/? 
[20:51:18] Ironholds: also, i do not 100% vouch for this data yet, need to do some stuff [20:51:21] /user/ironholds [20:51:23] but [20:51:27] you can check out the data for yourself! [20:51:27] kk [20:51:30] and see the quality! [20:51:33] this will also resolve some issues with RHive, so I may be able to kill two birds w/ one stone. [20:51:34] DarTar: no I didn't yet; now I read the legend more closely, thanks [20:51:36] wmf_raw.webrequest_sequence_stats [20:51:40] Nemo_bis: sorry, a wash in total editing users on tablet [20:51:48] yep, now clear [20:51:48] you can see per hour per hostname percent_different breakdowns [20:51:54] yep [20:51:56] percent_different == 0.0, all is well [20:52:00] < 0.0, missing data [20:52:03] > 0.0 duplicate data [20:52:04] the tl;dr is that the forced signup is really a double edge sword [20:53:27] ottomata: percent_different is no good for data quality. If 1000 lines are missing and 1000 lines are duplicate, percent_different will be 0. [20:53:46] (PS1) Yuvipanda: Add report bugs link [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151216 [20:53:47] DarTar: I only see one edge, because the increase in activation didn't produce any increase in productivity :P [20:54:04] yes yes esy [20:54:07] but that is unlikely [20:54:13] ok, let me rephrase [20:54:29] a non 100% reliable indicator of data quality :p [20:54:35] :-P [20:56:27] Nemo_bis: Maryana has a pretty articulated rationale around this choice, not one that I fully endorse, but an interesting one [20:57:06] for newly registered users on tablets it definitely produced a boost in productivity [20:57:16] a significant and very large one [20:57:37] well, rhive is giving the same error but that may just be rhive [20:57:40] let's see what the terminal does. 
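qchris's objection to percent_different above can be made concrete with a small sketch. The formula used here is an assumption and is not taken from generate_sequence_statistics.hql: it compares the number of rows actually seen for a host with the number expected from the span of sequence numbers (max - min + 1). Under that assumption, an equal number of lost and duplicated rows cancel out, so the metric reads 0.0 even though the data is damaged. The function name and the sample numbers are hypothetical.

```python
from collections import Counter


def sequence_stats(seqs):
    """Summarise one host's sequence numbers for one hour (hypothetical helper).

    Assumed formula (not confirmed by the log):
        percent_different = (actual - expected) / expected * 100
    where expected = max(seq) - min(seq) + 1.
    """
    counts = Counter(seqs)
    expected = max(counts) - min(counts) + 1
    actual = len(seqs)
    missing = expected - len(counts)    # sequence numbers never seen
    duplicates = actual - len(counts)   # extra copies of numbers already seen
    percent_different = (actual - expected) / expected * 100
    return {
        "missing": missing,
        "duplicates": duplicates,
        "percent_different": percent_different,
    }


# Made-up hour: 10 rows expected, sequence 4 lost, sequence 7 delivered twice.
stats = sequence_stats([1, 2, 3, 5, 6, 7, 7, 8, 9, 10])
print(stats)
```

Reporting the missing and duplicate counts separately avoids the cancellation, which fits ottomata's rephrasing of the metric as "a non 100% reliable indicator of data quality".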
[20:57:49] but for the population of tablet editors as a whole it’s definitely not a win [20:58:24] Nemo_bis: I’ll post a follow up over the weekend, I’m wrapping up a few things and disconnecting for today [20:58:43] I’m on CEST, for a change :) [20:59:25] Ironholds: I want to understand if the increased cluster size helps the larger queries [20:59:33] it sounds like it doesn't [20:59:38] tnegrin, well, I don't know yet [20:59:43] I haven't run a large query against it; p [20:59:59] the only changes I've noticed so far are a lot of deprecation warnings on load and slower setting-up-map-reduce time. [21:00:15] but we'll see what happens. I've not got a use case for the cluster at the mo so it's not like I've been doing anything interesting on it. [21:01:21] DarTar: thanks! [21:02:01] this isn't good [21:04:02] ottomata, org.apache.hadoop.security.AccessControlException: Permission denied: user=ironholds, access=WRITE, inode="/user/ironholds":ironhods:ironholds:drwxr-xr-x [21:04:17] same pain. [21:05:02] hm [21:05:05] k... [21:05:38] ahp [21:05:39] * Ironholds goes to to some tidying while waiting for the 1000000th latex submodule to install. Vignette generation in R makes me want to punch things. [21:05:39] typo! [21:05:40] ironhods [21:05:43] baha! [21:05:58] ok try again [21:06:08] that's a wonderful typo [21:06:25] https://en.wikipedia.org/wiki/File:Kolenkit.jpg <- ironhods. [21:06:40] okay, let's see [21:06:57] tnegrin: Ironholds, while you experiment, please keep in mind that there has not been a qchris+ottomata announcement email saying "hive is good to go!" :p [21:07:10] yep [21:07:11] :-D [21:07:20] mostly I just want to see if I can kill that "integrate R and Hive" card on trello [21:07:25] which has been assigned to me for 3 months. 
[21:07:28] haha k [21:07:30] cool [21:07:30] I like it when ottomata is defensive ;-) [21:07:37] hahah [21:07:39] defensive programm(ing|er) [21:07:48] (CR) Yuvipanda: [C: 2] Add Labs ToU link to under the Login button [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151210 (owner: Yuvipanda) [21:07:55] (Merged) jenkins-bot: Add Labs ToU link to under the Login button [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151210 (owner: Yuvipanda) [21:08:16] qchris: btw, thanks for the better way of getting that hdfs_path arg in that script, i kept rearranging that part looking for a DRY way [21:08:18] yours is good :) [21:08:29] (CR) Yuvipanda: [C: 2] Switch to metawiki for login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151211 (owner: Yuvipanda) [21:08:35] (Merged) jenkins-bot: Switch to metawiki for login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151211 (owner: Yuvipanda) [21:08:39] Yay. \o/ [21:08:48] (CR) Yuvipanda: [C: 2] Untitled queries aren't awesome [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151213 (owner: Yuvipanda) [21:08:53] (Merged) jenkins-bot: Untitled queries aren't awesome [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151213 (owner: Yuvipanda) [21:09:39] (CR) Yuvipanda: [C: 2] Add report bugs link [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151216 (owner: Yuvipanda) [21:10:08] (CR) Yuvipanda: [V: 2] Add report bugs link [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151216 (owner: Yuvipanda) [21:10:22] (CR) Yuvipanda: [C: 2] Bump up number of concurrent tasks [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151214 (owner: Yuvipanda) [21:10:27] (Merged) jenkins-bot: Bump up number of concurrent tasks [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151214 (owner: Yuvipanda) [21:10:32] qchris, why you expect test -f instead of test -e? [21:10:34] ottomata, victory! [21:10:44] with R? [21:10:45] Ironholds: ? 
[21:10:51] naw, just at getting it to work in the terminal [21:10:54] ah ha [21:10:54] k [21:11:02] I'll test R now, but given that the errors were the same, I tohught terminal debugging would be more useful [21:11:17] DarTar, if you're going to do the uniques stuff do you need hadoopish data? [21:12:54] Ironholds: lucid or proper uniques? [21:13:35] the long term plan is definitely going to need hadoop [21:13:47] and unsampled data [21:13:49] neither [21:13:51] the apps stuff ;p [21:13:56] ah [21:14:00] ottomata, what hive version are we using? sending a big report to the RHive team [21:14:33] is there a request for app uniques coming up? [21:15:03] sql gods -- what's the sql for counting unique items and sorting by that count? [21:15:13] Ironholds: [21:15:14] Version: 0.12.0+cdh5.0.2+319-1.cdh5.0.2.p0.16~precise-cdh5.0.2 [21:15:17] ottomata, ta [21:15:31] tnegrin, SELECT X, COUNT(*), GROUP BY X, ORDER BY COUNT(*) DESC [21:15:35] Ironholds: what apps stuff? [21:15:48] YuviPanda, dan sent a long-ass request for unique data around the apps. [21:15:49] Ironholds: I haven’t fully thought through the implications of UVs for apps tbh [21:15:52] if you do order by in hive [21:15:54] its going to ask for a limit [21:15:55] so [21:15:55] DarTar, yeah :( [21:15:55] ah [21:15:57] limit 1000000 if you want [21:15:57] ah [21:16:04] ottomata, yeah, unless you set nonstrict [21:16:10] which I have had to do on occasion [21:16:15] thanks [21:16:17] it also objects on ORDERs without GROUPs. Stupid hive. [21:16:30] * Ironholds adds "sql god" to his resume [21:16:44] you can't order by in hive wihtout a group by? [21:16:49] in strict? 
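The query Ironholds sketches above, with a FROM clause added and the stray commas dropped, can be tried out locally. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive; the table name webrequest_sample, the column uri, and the rows are made up for illustration. The LIMIT mirrors ottomata's remark that Hive asks for one when ORDER BY is used in strict mode; SQLite itself does not require it.

```python
import sqlite3

# In-memory database standing in for the real cluster (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webrequest_sample (uri TEXT)")
conn.executemany(
    "INSERT INTO webrequest_sample (uri) VALUES (?)",
    [("/wiki/A",), ("/wiki/B",), ("/wiki/A",), ("/wiki/C",), ("/wiki/A",), ("/wiki/B",)],
)

# Count rows per distinct value, then sort by that count, largest first.
top_uris = conn.execute(
    """
    SELECT uri, COUNT(*) AS hits
    FROM webrequest_sample
    GROUP BY uri
    ORDER BY hits DESC
    LIMIT 100
    """
).fetchall()

for uri, hits in top_uris:
    print(uri, hits)

conn.close()
```

GROUP BY collapses the rows to one per distinct value, COUNT(*) gives the size of each group, and ORDER BY on the aliased count sorts the groups from most to least frequent.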
[21:16:51] Ironholds: querying an hours worth of data is fine [21:16:57] saw it, yes, we’ll definitely want to do this via hadoop [21:17:06] ottomata, at least in the old system it objected strongly [21:17:07] I get sane #s of mappers and reducers [21:17:16] aye, i'm sure its the same here [21:17:20] guess i never tried it [21:17:50] this was pre-TABLESAMPLE [21:18:05] guys, I’m calling it a day – I have to pack for a little 1-day family trip [21:18:06] but now I have no reason to generate from rand() which means I have no reason to give a crap about order when I'm not clustering. woohoo! [21:18:14] happy hiving :) [21:18:25] happy hiking! [21:24:38] so it takes about 3 minutes to generate the top 100 URIs from an hours worth of data [21:25:30] Ironholds: this is how you kill a job -- this is in the output [21:25:31] Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1406229821917_15545 [21:32:11] tnegrin, yeah, I know :) [21:32:23] but ta again [21:32:36] we have more data in a minute than my old job had in a hour [21:35:47] ottomata: about "test -f" vs "test -e" ... I was expecting "test -f" since we write the done flag as file. [21:35:47] so, Ironholds, I need to understand this: why does select * from webrequest where year=2014 limit 1 take ~80 sec? [21:36:14] ottomata: but I do not care. -e is fine too. [21:36:53] leila, should it be faster or slower? [21:37:27] thought faster for 1 row [21:37:55] k, take it back, Ironholds [21:38:45] qchris, yeah, but i was writing this script as a generic check [21:38:48] coudl be used for directories or files [21:39:28] leila: i doubt it'll get much faster than that [21:39:32] but the script is called '..._file' not '..._path' [21:39:37] its pretty dumb, but ja, hive is high latency :/ [21:40:00] to get you that data, its got to translate your hive query, laucnh maps and reducers across the cluster [21:40:03] parse the json [21:40:06] ottomata, I took it back. 
it's a huge chunk of data, when I look at a day, it's what I expect. [21:40:07] etc. et.c [21:40:13] aye cool [21:40:22] qchris, then let's change it! [21:40:22] :) [21:40:31] Hahaha. [21:40:55] Sure. For 'check_hdfs_path', 'test -e' looks like the right thing. [21:41:40] qchris: ^https://gerrit.wikimedia.org/r/151095  [21:41:43] i'm not looking for a +1 yet [21:41:49] still gotta add the icinga bits [21:41:59] but, lemme know if you ahve more comments [21:42:06] or if I forgot to respond to one [21:42:08] don't think I did [21:42:40] Whoa. With short parameters! [21:43:08] And exit codes!!!!!1111!!!11 [21:43:14] Wooohoo :-D [21:43:33] something seems exciting! [21:43:41] * terrrydactyl celebrates too! [21:43:46] terrrydactyl: https://gerrit.wikimedia.org/r/#/c/151095 [21:43:46] i think qchris likes pretty bash [21:43:51] which may or may not be an oxymoron [21:44:11] it does look pretty [21:44:17] I have a fetish for bash... [21:44:27] * qchris looks up oxymoro-thingie-thing [21:44:31] qchris, i can't say I mind it [21:44:48] its syntax is cumbersome, but somehow I still enjoy it [21:44:55] i haven't seen much bash, but i guess it can look terrible? [21:44:56] :-D [21:45:30] terrrydactyl: if you ever design a language, would you choose to close blocks of code by spelling their block opening backwards? [21:45:43] if ... fi [21:45:43] case ...esac [21:45:46] hahaha, that is true. [21:46:30] * qchris tries not hard to point the finger at python for getting code blocks wrong [21:46:33] my go to "what were they thinking?" langugage is lisp [21:46:40] those parentheses [21:46:45] * Ironholds grins [21:46:57] in R, there are 3 class systems, 4 methods of assignation, and 2 simultaneously is and is not true. 
[21:47:15] In VHDL they have 9-valued logic :-P [21:47:17] we also have both NULL and NA for ternary logic, except only NA can actually be used for ternary logic, on account of if you do a check against a NULL it produces /no output/ [21:47:26] there are worse languages than lisp. Like perl! [21:47:27] * Ironholds runs [21:47:44] but: every language sucks in some way. [21:48:12] That's a nice ending line for a working week! [21:48:13] ACK, qchris, i left a bad idea in the usage info [21:48:14] fixing. [21:48:31] ottomata: I'll look at it on monday. Is that ok? [21:48:33] i've never used perl so i can't comment [21:48:35] yeah [21:48:37] no prob at all [21:48:49] Enjoy your weekend! (Or see you tomorrow :-D) [21:48:55] laters! [21:48:57] for a while i didn't like php, but then i started writing code for an extension so it grew on me [21:49:42] i took a class with R and i literally don't remember any of it. i may have blocked it from memory [21:56:31] ok so, tnegrin, some quick unreliable kinda maybe good news: [21:56:48] there was a hiccup with amsterdam hosts on the 29th [21:56:56] which caused a little bit of duplicate data [21:57:12] the link went down or soemthing for a bit [21:57:36] but, aside from that, i think data since then has been 100% good [21:58:25] but we detected that right? [21:58:31] oh yes, its all in the table [21:58:34] so easy to see now [21:58:38] after every partition is added [21:58:43] the sequence stat query runs [21:58:53] so you can see the breakdown per hour [21:59:23] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/partition/add/generate_sequence_statistics.hql [21:59:58] qchris: did a really awesome job with all this oozie and hive stuff [22:00:00] works so well [22:00:06] once oozie is good, it just goes! [22:02:06] yes -- I saw that [22:02:09] very cool [22:02:38] so I just ran a query over an hour and a day. the day took about 35 minutes, but everything seemed fine. 
[22:03:00] I think I'll try a month now -- should I use screen to keep it going? [22:03:00] (PS1) Yuvipanda: Re-run tasks if the worker crashes midway [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151226 [22:03:20] (CR) Yuvipanda: [C: 2] Re-run tasks if the worker crashes midway [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151226 (owner: Yuvipanda) [22:03:24] (Merged) jenkins-bot: Re-run tasks if the worker crashes midway [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151226 (owner: Yuvipanda) [22:08:39] Analytics / EventLogging: Get back to 90 days of logs on vanadium - https://bugzilla.wikimedia.org/69029 (christian) NEW p:Unprio s:normal a:None Vanadium (the main machine for collecting EventLogging data) used to have logrotation of EventLogging data set to 90 days of EventLogging data. D... [22:10:37] Analytics / EventLogging: Get back to 90 days of logs on vanadium - https://bugzilla.wikimedia.org/69029#c1 (christian) The RT ticket for moving the mount is RT 8063. [22:11:54] Analytics / Quarry: Add paren matching to SQL writer text field - https://bugzilla.wikimedia.org/69030 (Aaron Halfaker) NEW p:Unprio s:enhanc a:None It would be very helpful to have an indication of matching parens while writing a query. [22:16:38] (PS1) Yuvipanda: Add bracket matching to the editor [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151228 (https://bugzilla.wikimedia.org/69030) [22:16:52] (CR) Yuvipanda: [C: 2] Add bracket matching to the editor [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151228 (https://bugzilla.wikimedia.org/69030) (owner: Yuvipanda) [22:16:57] (Merged) jenkins-bot: Add bracket matching to the editor [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151228 (https://bugzilla.wikimedia.org/69030) (owner: Yuvipanda) [22:17:52] Analytics / Quarry: Add paren matching to SQL writer text field - https://bugzilla.wikimedia.org/69030#c3 (Yuvi Panda) PATC>RESO/FIX Deployed! 
[22:29:22] (PS1) Yuvipanda: Add proper content id to lists html [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151231 [22:29:41] (CR) Yuvipanda: [C: 2] Add proper content id to lists html [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151231 (owner: Yuvipanda) [22:29:45] (Merged) jenkins-bot: Add proper content id to lists html [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151231 (owner: Yuvipanda) [22:32:20] (PS1) Yuvipanda: Remove debugging alert [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151232 [22:32:54] Analytics / Quarry: When I change the title of a query I get an alert {"id": 2} - https://bugzilla.wikimedia.org/69033 (Aaron Halfaker) NEW p:Unprio s:normal a:None Subject says it all. [22:33:07] (CR) Yuvipanda: [C: 2] Remove debugging alert [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151232 (owner: Yuvipanda) [22:33:12] (Merged) jenkins-bot: Remove debugging alert [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151232 (owner: Yuvipanda) [22:34:07] Analytics / Quarry: When I change the title of a query I get an alert {"id": 2} - https://bugzilla.wikimedia.org/69033#c1 (Yuvi Panda) NEW>RESO/FIX Fixed in https://gerrit.wikimedia.org/r/151232, deployed as well [22:45:55] (CR) QChris: "> I have no idea how this works!" 
(3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/150844 (owner: QChris) [22:47:54] (PS1) Yuvipanda: Don't crash when encountering DateTimes on output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151236 [22:48:31] (CR) Yuvipanda: [C: 2] Don't crash when encountering DateTimes on output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151236 (owner: Yuvipanda) [22:48:38] (Merged) jenkins-bot: Don't crash when encountering DateTimes on output [analytics/quarry/web] - https://gerrit.wikimedia.org/r/151236 (owner: Yuvipanda) [23:07:39] Analytics / Quarry: Add a stop button to halt the query - https://bugzilla.wikimedia.org/69037 (Aaron Halfaker) NEW p:Unprio s:enhanc a:None Sometimes I start a query and I realize I've made a mistake that is going to cause it to run for a long time. I'd like to be able to kill a query wit... [23:08:22] Analytics / Quarry: Add a stop button to halt the query - https://bugzilla.wikimedia.org/69037#c1 (Yuvi Panda) Yup, need to slightly redo the way I'm managing the queries to make this happen. On it now. [23:17:40] (PS8) Terrrydactyl: [WIP] Add ability to global query a user's wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/129858