[11:30:25] Empty channel logs since 2013-11-28? Dummy message to see if channel logs are still working :-) [11:31:27] Channel logs seem to work \o/ [12:49:26] (PS1) QChris: Ignore generated files [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98503 [12:50:02] (PS1) QChris: Add check target to run tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98504 [12:50:13] (PS1) QChris: No longer count requests to Special:CentralAutoLogin/ [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98505 [12:51:11] (CR) QChris: [C: -1] "Works as expected (for me), but needs sign-off from Toby that" [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98505 (owner: QChris) [12:52:55] hey qchris [12:53:02] hi average [12:53:04] good afternoon :) [12:53:24] who is the official maintainer of webstatscollector? [12:53:35] domaz, you, drdee, ...? [12:53:53] Looking at the most recent commits, I am not sure. [12:54:12] I just added you. Please just decline, if you do not care about it :-D [12:56:05] so, webstatscollector was created by domasz [12:56:06] but [12:56:24] (CR) Erik Zachte: [C: 2 V: 2] No longer count requests to Special:CentralAutoLogin/ [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98505 (owner: QChris) [12:56:48] there have been some recent improvements on in filter.c for webstatscollector made by Munagala Ramanath (aka Ram) [12:57:02] Noooo ezachte .... [12:57:06] improvements in terms of less packet loss [12:57:12] qchris: what happened ? [12:57:34] qchris: he merged without letting you finish the gerrit change righ t? [12:57:34] average: https://gerrit.wikimedia.org/r/#/c/98505/ [12:58:04] that happened to me before with Erik. I told him to not merge until I tell him my gerrit changes are ready. [12:58:28] Meh. No problem. [12:58:40] The change is not deployed automatically. [12:58:47] fortunately : [12:58:48] :) [12:58:50] ok so going back to webstatscollector [12:59:03] Yes. 
[12:59:09] Whom should I add as reviewer? [12:59:10] last time, Ram ^^ and drdee made some changes to improve on the packet loss problem it had [12:59:25] qchris: add drdee, and me. I will review your patchset [12:59:31] Ok. [12:59:33] Thanks. [13:02:39] qchris: wait, I will point you to some docs of webstatscollector [13:02:57] qchris: please update them with the changes you make to it [13:03:12] Sure. [13:06:39] (PS1) QChris: Whitespace cleanup in filter.c [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98508 [13:24:00] (PS2) QChris: Add check target to run tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98504 [13:28:54] hey milimetric [13:29:05] hi average [13:29:07] morning [13:33:22] when I run hive queries on lots of data it starts throwing out many empty lines [13:33:29] really annoying [13:34:02] I mean before writing the result of the query, it throws lots of empty lines on my console and messes with my console buffer so I can't scroll up to see stuff anymore :( [13:35:37] milimetric: still didn't manage to get hue working. I think there was some miscommunication on my side about why I need hue. I'm open to explaining why I need it in more detail if you have time [13:36:31] also the overhead I was mentioning in more detail. [13:36:50] last time this was discussed in the standup and I don't think there's enough time to discuss that there [13:38:24] milimetric , qchris IOW I'm in the batcave if you want to know more about this [13:39:05] average: I gotta finish running my morning errands [13:39:10] average: Booting google machine. 
[13:39:17] but I can talk around standup time [13:39:22] before I mean [13:39:47] milimetric , alright, I'll talk to qchris and talk again to you closer to standup time [13:39:59] The thing is, I've got a few other high priority things and I think what you're talking about is mostly convenience with hive [13:40:09] I just use the command line, and that works fine for me [13:40:39] milimetric: yeah, that's exactly why I need to explain it in more detail, so when you have time, lemme know [13:40:51] sure [14:54:12] hi allllllll i think I can make stand up after all, but I have to run to get car inspected shortly after that [14:54:15] this is complicated! [14:55:07] (PS1) QChris: Turn path to filter into variable for tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98521 [14:55:08] (PS1) QChris: Move tests into tests subdirectory [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98522 [14:55:09] (PS1) QChris: Add rudimentary test accounting to current tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98523 [14:55:10] (PS1) QChris: Add tests for Special:CentralAutoLogin requests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98524 [16:50:18] hey ok milimetric, there? [16:50:20] or average? [16:50:22] yep [16:50:24] ok so [16:50:30] ja, the old ganglia-logtailer did this [16:50:57] sys.path.append("/usr/share/ganglia-logtailer") [16:51:03] right [16:51:06] and then custom classes were installed there [16:51:09] but, i don't really want to do that [16:51:12] (PS6) Stefan.petrea: [DO NOT SUBMIT] kraken-hive stub [analytics/kraken] - https://gerrit.wikimedia.org/r/96738 (owner: QChris) [16:51:14] i could modify logster to do this [16:51:15] but [16:51:21] i'd rather not modify since it is etsy upstream [16:51:33] so, i can pass any arbitrary module/class name for this [16:51:41] note: that might mean other things can shadow it. 
If you definitely want it to win name conflicts, you can insert it into sys.path [16:51:42] if there is no '.' in the class name i pass in [16:52:13] then logster will assume that it is a parser that comes with the logster module [16:52:20] but i'm trying to use a module that does not come with the logster module [16:52:28] so, there is PYTHONPATH, or PYTHON_PATH, not sure which [16:52:36] i don't mind settting that in the shell env before I start python [16:52:40] but i haven't gotten it to work yet [16:53:54] ok, lemme look that up [16:55:00] OR [16:55:13] i don't mind installing directly into /usr/local/lib [16:55:23] which is on sys.path by default [16:56:26] it looks like PYTHONPATH needs to be set to $PYTHONPATH:/usr/share/ganglia-logtailer [16:56:27] http://www.stereoplex.com/blog/understanding-imports-and-pythonpath [16:56:54] but, did you try that and it failed? [16:57:17] hmm, i may not have understood what the name needed to be [16:57:23] relative to the path [16:57:23] uhhh [16:57:31] but, actually, it might make more sense for me to install into /usr/local/lib [16:57:40] /usr/local/lib/python2.7/dist-packages [16:57:42] maybe? [16:57:43] hmm [16:57:43] maybe not [16:57:47] ok let me try pythonpath first [16:59:41] ok so milimetric [16:59:44] yes [17:00:00] try pythonpath first [17:00:04] if I have /tmp/python/vklog/VarnishkafkaLogster.py [17:00:17] and I want to import vklog.VarnishkafkaLogster [17:00:25] i should set PYTHONPATH=/tmp/python [17:00:26] right? [17:00:39] hm, I don't think so [17:00:49] I don't know [17:01:11] I think class VarnishKafkaLogster should live inside file vklog [17:01:20] hmmmmm [17:01:36] so that is then [17:01:46] vklog.VarnishkafkaLogster.VarnishkafkaLogster? 
[17:02:21] yeah, but I think that casing for the middle VarnishkafkaLogster is not "cool" with python people [17:02:33] i'm just trying to make it work for now [17:02:35] that doesn't seem to work though [17:02:36] k [17:02:42] lemme simulate [17:03:01] i can see sys.path being set properly when I set PYTHONPATH [17:04:05] i'm trying it out too [17:04:15] so yeah, it'll just be a matter of importing the right thing [17:06:49] ah! I think you need __init__.py files along your path [17:06:59] just empty? [17:07:18] hang on, checking [17:07:38] yesss i think this works [17:08:06] yes! [17:08:07] awesome [17:08:17] did they have to be empty? [17:08:21] yeah empty is fine [17:08:25] k [17:08:25] i just touched __init__.py [17:08:38] i think theoretically they should describe what your module is offering though [17:19:56] (PS1) QChris: Update changelog [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98544 [17:31:05] (PS2) Ottomata: Ignore generated files [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98503 (owner: QChris) [17:31:09] (CR) Ottomata: [C: 2 V: 2] Ignore generated files [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98503 (owner: QChris) [17:35:41] (PS2) Ottomata: Update changelog [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98544 (owner: QChris) [17:35:46] (CR) Ottomata: [C: 2 V: 2] Update changelog [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98544 (owner: QChris) [18:42:30] psshhhhh [18:42:32] python [18:42:33] >>> isinstance(False, int) [18:42:33] True [18:42:35] nuh uh! [18:45:20] lol [18:46:03] >>> issubclass(bool, int) [18:46:03] True [18:46:04] Hmmm [18:46:22] Actually, that makes a bit of sense. [18:46:32] ottomata: ^ [18:46:50] In [13]: x = False; y=1; print type(x); print type(y); [18:46:53] <type 'bool'> [18:46:55] <type 'int'> [18:47:39] Subclasses are instances of parent classes. [18:48:26] bool is subclass of int? [18:48:30] or vice versa?
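A note on the import exchange above (16:52 to 17:08): the layout that ended up working can be reproduced with a small sketch. This is not logster's actual loader; load_parser_class is a hypothetical helper, a temporary directory stands in for /tmp/python, and the code is written for Python 3, whereas the log is from a Python 2.7 setup. Under Python 2.7 the empty __init__.py is required for vklog to be importable as a package; Python 3.3 and later can also import it as a namespace package without one. Inserting the root at the front of sys.path mimics setting PYTHONPATH and lets it win name conflicts.

```python
import importlib
import os
import sys
import tempfile


def load_parser_class(root, dotted_name):
    """Import `package.module.ClassName` after putting `root` on sys.path.

    Hypothetical helper for illustration only.
    """
    module_name, class_name = dotted_name.rsplit(".", 1)
    if root not in sys.path:
        # Front of the list, so this directory wins name conflicts.
        sys.path.insert(0, root)
    importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Build a throwaway layout mirroring /tmp/python/vklog/VarnishkafkaLogster.py
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "vklog")
os.mkdir(package_dir)

# Equivalent of `touch __init__.py`: an empty file marks the package.
open(os.path.join(package_dir, "__init__.py"), "w").close()

with open(os.path.join(package_dir, "VarnishkafkaLogster.py"), "w") as f:
    f.write("class VarnishkafkaLogster(object):\n    pass\n")

parser_class = load_parser_class(
    root, "vklog.VarnishkafkaLogster.VarnishkafkaLogster"
)
print(parser_class.__module__, parser_class.__name__)
```

The dotted name has three parts because the package (vklog), the module file (VarnishkafkaLogster.py) and the class inside it each contribute one.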
[18:48:33] bool of int [18:48:34] buuuut [18:48:47] >>> class Foo: pass [18:48:47] ... [18:48:47] >>> class Bar(Foo): pass [18:48:47] ... [18:48:47] >>> f = Bar() [18:48:47] >>> isinstance(f, Foo) [18:48:50] True [18:48:50] hm [18:48:55] 0 == False -> True [18:49:00] 1 == True -> True [18:49:02] 2 == True -> False [18:49:03] hm [18:49:44] i guess bool is just an int that is 0 or 1 [18:49:45] hm [18:49:47] ooook [18:49:54] hm [18:57:32] hey ottomata [18:57:44] hiya [18:58:03] http://i.imgur.com/HpkJhFm.png [18:59:25] hey, so we want to publish a dataset as part of some research halfak is doing [18:59:51] the open data repo can take up to a few gigs but they advised that it's best to use it as a registry and host the files ourselves if possible [18:59:56] especially if they are large [19:00:33] this dataset should be 10Gb uncompressed, according to halfak: any chance we could upload it to stat1001 and share it from there? [19:00:43] yeah that's fine [19:01:04] 4Tb of space avail there :) [19:01:13] and what's the best way to upload these large datasets? [19:01:25] I don't think we have stat1001 access [19:01:27] is this a one time thing, or somethign regular? [19:01:29] (or do we?) [19:01:39] I expect we may have more in the future [19:01:46] where are they generated? stat1? [19:01:51] not just the tiny CSV dumps rsync'ed from stat1 [19:01:57] halfak: ^^ [19:02:47] stat1 can write directly to an rsync module on stat1001 [19:02:56] The data is half-generated in db1047 right now. [19:03:00] we might need to get some directories set up for you [19:03:02] More like 1/3rd generated. [19:03:17] in files? or in mysql? [19:03:21] Once I finish this dataset, I'll move on and not plan to return to it soon. [19:03:24] MySQL [19:03:28] ok, so we can directly create a directory under /a/public-datasets [19:03:30] But I can dump a TSV easily. 
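A note on the isinstance exchange above (18:42 to 18:49): the behaviour follows from bool being a subclass of int, with False equal to 0 and True equal to 1. The sketch below, written for Python 3, restates those checks and adds one hypothetical helper, is_strict_int, for code that wants to accept integers but reject True and False.

```python
# bool is a subclass of int, so every bool is also an int instance.
assert issubclass(bool, int)
assert isinstance(False, int)

# False and True compare equal to 0 and 1, and to nothing else.
assert 0 == False
assert 1 == True
assert not (2 == True)

# Because they are ints, bools take part in arithmetic.
true_count = sum([True, False, True])


def is_strict_int(value):
    """Return True for ints, but not for True/False.

    Hypothetical helper for illustration only.
    """
    return isinstance(value, int) and not isinstance(value, bool)


print(true_count, is_strict_int(3), is_strict_int(True))
```

The bool check has to come in addition to the int check, since isinstance(True, int) alone is always True.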
[19:03:41] on stat1 [19:03:59] yeah that's fine, hmm, but i'm noticing on stat1001 [19:04:07] that public-datasets seems to be on / partition [19:04:08] instead of /a [19:04:17] / is only 30G [19:04:23] ouch [19:04:30] we just have to move it, no biggie [19:04:32] it should be on /a anyway [19:04:43] yup [19:05:07] actually, i'll do that now [19:05:44] DarTar: 10GB ? that's a lot of data. what data is it ? [19:06:16] A record of every page creation on all the major wikis [19:07:06] hmm, we should really take this out of the default vhost, hm [19:07:27] DarTar, i should puppetize this a bit better, when do you want to start syncing? [19:07:44] agreed, it's not urgent, halfak's queries are still running [19:10:11] halfak: it's best to announce this in the December report and newsletter since both the data and report are in progress, what do you think? [19:10:54] Hmm... I already feel like we're way ahead of ourselves. However, December sounds fine to me. [19:13:48] ok cool, i'll clean this up a bit in prep for 10G sync to public-datasets soon [19:13:56] will try to get to it this week [19:14:03] keep bugging me if you haven't heard from me [19:14:05] excellent, thank you! [19:17:46] Thanks guys! 
[19:17:47] :) [21:03:52] (PS2) Ottomata: Whitespace cleanup in filter.c [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98508 (owner: QChris) [21:03:59] (CR) Ottomata: [C: 2 V: 2] Whitespace cleanup in filter.c [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98508 (owner: QChris) [21:04:47] (PS3) Ottomata: Add check target to run tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98504 (owner: QChris) [21:04:52] (CR) Ottomata: [C: 2 V: 2] Add check target to run tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98504 (owner: QChris) [21:05:08] (PS2) Ottomata: Turn path to filter into variable for tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98521 (owner: QChris) [21:05:14] (CR) Ottomata: [C: 2 V: 2] Turn path to filter into variable for tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98521 (owner: QChris) [21:05:29] (PS2) Ottomata: Move tests into tests subdirectory [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98522 (owner: QChris) [21:05:45] (CR) Ottomata: [C: 2 V: 2] Move tests into tests subdirectory [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98522 (owner: QChris) [21:06:13] (PS2) Ottomata: Add rudimentary test accounting to current tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98523 (owner: QChris) [21:06:18] (CR) Ottomata: [C: 2 V: 2] Add rudimentary test accounting to current tests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98523 (owner: QChris) [21:06:48] (PS2) Ottomata: Add tests for Special:CentralAutoLogin requests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98524 (owner: QChris) [21:07:25] (CR) Ottomata: [C: 2 V: 2] Add tests for Special:CentralAutoLogin requests [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98524 (owner: QChris) [21:11:29] (PS1) Ottomata: Fixing changelog line [analytics/webstatscollector] - 
https://gerrit.wikimedia.org/r/98689 [21:11:41] (CR) Ottomata: [C: 2 V: 2] Fixing changelog line [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/98689 (owner: Ottomata) [22:00:15] I finally regained admin access to the Wikimedia group and org on the DataHub after they upgraded so we can resume activity as usual [23:51:55] yo halfak, are you still shooting for this afternoon for the results section? [23:52:35] Yeah. I just got back to hacking. I'll have the results there, but I don't expect to finish the prose tonight :\ [23:53:24] 'sokay, results are what i need to unblock deployment [23:53:39] i owe you prose-y things myself anyhow [23:56:05] Can you remind me when the timer starts and stops for event_moduleLoadTime? [23:56:40] ori-l: ^^ [23:57:26] halfak: it starts on ResourceLoader initialization; that's not a time that corresponds to a W3C standard, but the crucial thing about it is that it's the point at which the code paths for experiment and control diverge [23:57:46] Excellent. [23:58:00] the stop is the window load event [23:58:27] Do you want to discuss this timing when you go through the bucketing logic? [23:58:34] In the "Methods" section?