[12:52:43] Good morning science people :)
[12:53:02] (or timezone appropriate greeting)
[12:53:16] good morning from new yoik
[12:54:07] Ironholds, you're at the R conf?
[12:54:24] yep
[12:54:25] * halfak scopes out agenda
[12:54:26] http://www.rstats.nyc/#agenda
[12:54:29] it's full of ENTERPRISE (TM) people
[12:54:31] why?
[12:54:41] enterprise is terrible for code
[12:54:51] hell, enterprise code isn't even the best enterprise!
[12:54:55] I thought I'd get to hang with Picard!
[12:55:06] WTF is "enterprise"? I thought it wasn't a real thing.
[12:55:34] The USS Enterprise
[12:55:50] And her sister vessel, the USS Software as a Service
[12:55:56] Just overheard "hey bro, you wanna wear a Data Mafia t-shirt or you wanna be professional?"
[12:56:01] just to give scope to the sort of person here
[12:56:07] harej, I like it. Permission to tweet?
[12:56:14] Sure
[12:56:33] TIL https://en.wikipedia.org/wiki/Enterprise_software
[12:56:41] == Software for organizational needs.
[12:56:43] So, CSCW
[12:56:55] ta
[12:56:55] Except more buzzwords and less Theory
[12:56:59] o/
[12:57:01] halfak, ENTERPRISE
[12:57:55] "Enterprise software is an integral part of a (computer based) Information System, and as such includes web site software production." wut
[12:58:37] http://thedailywtf.com/articles/The-Enterprisey-Null-Test
[12:58:57] http://thedailywtf.com/articles/Enterprise_SQL
[13:00:55] Comment from the talk page makes me think this whole article is a big practical joke. "This article reads like it was written by someone whose first language is not English. Almost every sentence is overly long and awkward with bizarre grammatical errors. It should probably be re-written from scratch"
[13:07:47] businesspeople don't really know how to write coherent english
[13:08:02] our solutions firm leverages saas to maximize revenue potential
[13:08:07] what ENTERPRISE means is that your code will be written in ENTERPRISE java which is like normal java but slow and expensive and ISO-compliant and expensive
[13:08:27] lip service as a service
[13:10:54] harej, I already built that
[13:11:02] yes
[13:11:02] http://ironholds.org/lsaas/
[13:11:50] "like normal java but slow and expensive," so, like normal java.
[13:12:24] hey!
[13:12:32] Java can be pretty fast
[13:12:35] not C++ fast, but ;p
[13:14:21] Ironholds, C++ pales in comparison to machine code.
[13:14:23] :P
[13:14:39] halfak, do you typically write machine code? ;p
[13:14:40] Running all those unnecessary instructions
[13:14:57] Ironholds, I don't think that's relevant.
[13:15:42] I'll respond as soon as I manually encode this IRC socket in machine code.
[13:16:04] halfak, heh
[13:23:34] hahaha
[13:23:37] they're talking reading sessions
[13:25:34] actually it's fascinating; halfak I should get this dude to talk to you
[13:25:48] Cool :)
[13:25:53] he's treating sessions as markov models and calculating the distances between them to identify different patterns of behaviour
[13:26:17] Oh yeah! I was working on that for a while too. More D. Kluver's modeling work.
[13:26:45] We struggled to get enough observations with the datasets we had to do anything interesting.
[13:27:01] and the code is all in python
[13:27:04] We were using data from movielens.org
[13:27:11] Ironholds, R conf?
[13:27:16] yep
[13:27:21] lol
[13:27:30] but it's the director of the NYT research lab, and he's not an R person
[13:27:49] Good to know that there's not a religious battle over the different environments.
[13:28:41] nope!
[13:29:00] I should bug him about the session approach
[13:29:02] * Ironholds schemes
[13:30:13] hmn
[13:30:23] he had an interesting point on reading recommendations
[13:31:16] this guy is GREAT. I need to put yinz lot in touch
[13:31:23] no pittsburgh
[13:31:36] he actively dislikes the idea of segmenting by demography
[13:39:35] and this bit is just a sales pitch
[13:39:45] Buy my python.
[13:39:48] Really.
[13:43:22] oh god
[13:43:37] "industry leaders believe in us" and "highly enterprise-oriented" in subsequent sentences
[13:44:25] Ironholds, a big reason why I never got a foothold in the DC tech community is because of people like them. I use highly vulgar vocabulary to describe these kinds of people.
[13:45:04] hah
[13:45:48] I'm trying to write my report to the government describing the work my organization did. The first thing we did was a ten minute phone call that wasn't really about anything.
[13:46:42] * Ironholds giggles
[18:03:31] halfak: Dario is not in the meeting but if you join maybe we can get the ball rolling?
[18:06:48] halfak: are you around?
[21:26:58] halfak: hey; thanks again for generating that archive of refs. Do you have any tips or good practices about processing large TSVs in Python?
[21:27:26] I'm guessing putting the 15GB file in memory is going to be a challenge :)
[21:27:34] 25GB, even.
[21:27:39] guillom, I'd recommend the 'csv' package, but I generally find that package to be a pain in the butt.
[21:27:47] So I usually process TSVs manually.
[21:27:52] Python's generators make this easy.
[21:27:59] Allow me to create a snippet. :)
[21:28:15] Yeah, I used csv for another project and it worked mostly fine; I had to work around its lack of UTF8 support but it worked.
[21:28:26] guillom: unicodecsv is a nice module :)
[21:28:36] it is exactly the same as csv but with unicode support!
[21:29:37] YuviPanda: hah! Well, I got away with just adding a ut8reader; it was just a few lines of code. I'll keep your suggestion in mind for the next time though :)
[21:29:40] Thanks
[21:29:45] :D
[21:29:48] yw
[21:30:56] Except for MrMetadata, I mostly use Python as a utility knife for small projects
[21:31:38] Although for refs, I might eventually create a more complex dashboard; dunno yet the scope of this project.
[21:32:42] https://gist.github.com/halfak/9224de257db0b7b0403c
[21:32:58] Are you going to be running python2.7 or python3?
[21:33:21] halfak: awesome!
[21:33:38] halfak: I think I've been using 2.7 so far..? I'm not sure. /checks
[21:33:47] It hasn't made a big difference so far
[21:33:57] Gotcha. python3 will read the file in as unicode by default.
[21:34:01] yes, 2.7.8
[21:34:24] In python2.7, you'll want to open stdin with a UTF8 codec
[21:34:34] * guillom looks at the available packages.
[21:35:08] halfak: not as cool, but I just set up tools.wmflabs.org/cdnjs. Makes it super easy to include arbit 3rd party libraries in a wikimedia privacy policy way for JS / CSS.
[21:35:54] YuviPanda, cool.
[21:36:00] I will make use of this.
[21:36:11] Looks like I already have python3 installed; yup; cool
[21:36:15] halfak: thanks again :)
[21:36:26] no problem guillom. :)
[21:36:31] Happy to help
[21:36:54] halfak: I may have questions for you next week about generating the same kind of data for a couple of other wikis. Nothing urgent, and once you tell me how, I can do it myself and leave you alone :)
[21:37:07] For now I have my hands full with enwp
[21:37:09] FYI: My frustrations with the CSV model are around how it can misbehave if you find you have a surprisingly long row.
[21:37:15] guillom, I have a present for you :D
[21:37:24] halfak: again?!
[21:37:30] https://github.com/halfak/mwrefs
[21:37:39] It's a simple utility that you can just run on the stats servers.
[21:37:55] Or your local machine if you hate being able to use it for anything else.
[21:38:13] Ah, yes. But I don't know how to do that. Maybe I'll need to request access or something. But it can wait :)
[21:38:14] It'll process the dump multicore -- assuming there's more than one dump file.
[21:39:01] guillom, we should get you on the stat machines.
[21:39:07] * halfak looks into how to request access.
[21:39:15] halfak: don't worry, I'll look into it
[21:39:25] I've never needed the access before so I haven't asked for it
[21:39:52] I adhere to the "if I don't need it, I don't want it" access policy. Great for peace of mind :)
[21:39:54] When you do, check out this request and model yours after it: https://phabricator.wikimedia.org/T94390
[21:40:01] Thanks!
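
Note on the [21:27:47] to [21:27:59] exchange: the idea of processing a large TSV "manually" with a generator can be sketched roughly as below. This is not the contents of the linked gist, only a minimal illustration under some assumptions: Python 3, a header row on the first line, no quoting or escaped tabs inside fields, and made-up column names (`rev_id`, `ref`) in the demo data.

```python
import io


def read_tsv(lines):
    """Lazily yield one dict per data row from an iterable of text lines.

    Only one line is held at a time, so a multi-gigabyte file never
    has to fit in memory. Assumes the first line is a header row and
    that fields contain no embedded tabs or newlines.
    """
    lines = iter(lines)
    try:
        header = next(lines).rstrip("\n").split("\t")
    except StopIteration:
        return  # empty input: yield nothing
    for line in lines:
        values = line.rstrip("\n").split("\t")
        yield dict(zip(header, values))


# Tiny in-memory stand-in for a real file; the column names are hypothetical.
sample = io.StringIO("rev_id\tref\n1\tfoo\n2\tbar\n")
for row in read_tsv(sample):
    print(row["rev_id"], row["ref"])
```

For a real run, the same generator could be fed `sys.stdin` (for example `python3 script.py < some_file.tsv`, file name hypothetical) or an open file handle. As mentioned at [21:34:24], Python 2.7 would additionally need stdin opened with a UTF-8 codec.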