[14:50:24] o/ all
[14:53:25] hey Nettrom :)
[14:53:52] hi Nettrom
[16:15:04] so is anyone coming, orrrr
[16:21:05] I can neither see nor hear you guys
[16:21:28] well, you get another monkey that can't talk, and then...
[16:22:11] * Ironholds slaps YuviPanda
[16:22:21] I'm writing C++ and benchmarking. You think I have a sense of humour?! ;p
[16:22:37] at least you're not writing fortran
[16:22:49] my language is a wrapper around fortran
[16:22:50] I don't need to
[16:22:59] heh
[16:58:56] Ironholds: re-join the mini-hack hangout?
[16:59:08] DarTar, sure. Give me 2 minutes?
[16:59:30] sure
[17:14:02] halfak: hi!
[17:14:08] Hey Helder :)
[17:14:36] I see that we have a couple of new volunteers in the revscores proposal.
[17:14:46] :-)
[17:15:19] Do you know if it is possible to use mw.xml_dump with a compressed xml file? (e.g. a pages-meta-history.xml.7z)
[17:16:15] halfak: i.e. extracting the file while processing it
[17:16:17] Yup
[17:16:25] That's what it is designed for.
[17:16:31] It reads the file extension.
[17:16:36] And decompresses for you.
[17:16:47] hm... maybe I missed something
[17:16:50] 7z is a little weird. You'll have to install the utility on the machine you are processing with.
[17:17:11] ah, let me check if I have that installed
[17:18:52] halfak: take a look into this:
[17:18:52] https://gist.github.com/he7d3r/f99482f4f54f97895ccb
[17:19:06] do you see anything which could obviously be optimized?
[17:19:36] I would like to run it with a ptwiki dump, but last time I tried with a smaller one (from ptwikiversity) it took ~1 hour
[17:19:45] from mw.xml_dump import open_file
[17:20:37] Iterator.from_file(open_file("xmldump.xml.7z"))
[17:40:01] halfak: I posted the list of profanity lists on meta so we don't forget them:
[17:40:01] https://meta.wikimedia.org/wiki/Research_talk:Revision_scoring_as_a_service#Badword_lists
[17:41:05] Awesome. :) Sorry to be a little absent. I'm in meetings all day today.
[17:43:25] no problem :-)
[18:42:48] Helder, just saw lists of badword lists. That's freaking awesome to see all together.
[18:43:05] * halfak looks forward to expanding it.
[18:44:53] :)
[18:45:24] Please expand it if you know about other lists
[20:26:12] alright, given that we're now in minute 25 of "I've been sat in a google hangout waiting for everyone else" I'm going to go out to the shops or something
[20:26:27] if we've decided to change it to "get out at 1:00, come back at 2:30", someone needs to update the calendar entries.
[20:47:27] Sorry Oliver. We forgot to start up the call.
[20:47:30] :(
[20:47:45] That's super lame. I feel the appropriate amount of bad.
[20:47:51] We're in the call now.
[21:43:19] halfak, hey dude. I'm around if people are talking about stuff I can help with.
[21:43:57] We're in small groups working. I'm working with ottomatta on hadoop streaming.
[21:44:07] Dario is off working on mobile.
[21:44:28] Devs & toby are in an interview.
[21:44:30] yes, I mailed Ironholds, I should have copied the rest of you
[21:45:17] kk
[21:45:28] * Ironholds headscratches
[21:45:41] I mean, unless you guys fancy debugging RJDBC's interactions with our hive instance..
[21:48:22] in the absence of that I'll pop off for a while; coming up to 6pm and I've been working since 9
[21:48:24] * Ironholds waves