[01:13:01] Hey folks! We have a mailing list: https://lists.wikimedia.org/mailman/listinfo/ai
[01:13:03] Sign up!
[01:13:11] lzia, ^
[01:13:27] ToAruShiroiNeko, ^
[05:01:59] Refactored features and datasources complete and tested.
[05:03:40] Now I just need to finish the languages refactor (mixins) and re-implement the basic datasources in APIExtractor.
[05:03:46] g'night!
[13:52:13] o/ halfak
[13:52:15] http://pastebin.com/1GFAMSzk
[14:00:44] halfak: https://github.com/wiki-ai/editquality/pull/4
[14:00:46] Also this
[14:03:10] lzia: hey, if you're around. I built this based on your recommender system :) http://tools.wmflabs.org/dexbot/tools/recom.php?user=Ladsgroup
[14:22:40] very nice, Amir! :-)
[14:22:49] happy to see that it's already being used.
[14:23:55] Amir1: are you pulling the randomly generated articles from en -> fa?
[14:24:41] (brb, need to eat something and get ready to catch the train)
[14:25:47] no, I get the last 20 articles of the user and recommend based on that
[14:25:55] I need to go too
[14:25:57] ah! got it.
[14:26:04] o/
[14:26:07] we'll chat later, Amir. ciao ciao
[14:27:39] ciao :)
[15:56:51] o/ halfak
[15:56:57] Hey Amir1
[15:57:05] Saw your pull but haven't reviewed yet
[15:57:17] great
[15:57:19] no rush
[15:57:25] I tested some parts
[15:57:54] Looks like you changed the call signature. It turns out that mwxml can process many XML files in parallel with the map() function.
[15:58:11] So you can have ... and then pass the list of paths to mwxml.map()
[16:00:52] hmm, let me take a look at it
[16:01:22] Amir1, http://pythonhosted.org/mwxml/map.html
[16:01:54] Note that the paths you pass to the mapper can be compressed.
[16:02:10] The mapper will handle raw xml, gz, bz2 and 7z
[16:02:27] It will call out to p7zip on the system for 7z though so that'll need to be installed.
[16:03:02] https://github.com/mediawiki-utilities/python-mwcli/blob/master/mwcli/files/functions.py
[16:03:57] okay :)
[16:04:03] I will make it work :)
[16:04:20] in several hours but right now I need some rest
[16:04:33] once it's merged I need to start writing the paper
[16:05:06] halfak: overall, do you like it?
[16:05:35] Yes. I'm kind of thinking of just taking the next pass on it myself.
[16:05:42] Amir1, what do you think?
[16:06:27] good :)
[16:06:36] I like it
[16:06:58] I'll finish this
[16:07:01] then you start
[16:09:07] OK sounds good.
[16:25:23] halfak: I'm trying to write that part but it seems I can't send arguments to process and in callback
[16:25:25] https://github.com/mediawiki-utilities/python-mwxml/blob/master/mwxml/map/map.py
[16:25:40] should I change it works or change this file ^
[16:25:48] *the way
[16:25:55] Amir1, what is the problem? I'm confused.
[16:25:59] that function isn't a callback.
[16:26:03] It yields out values.
[16:26:18] def page_info(dump, path):
[16:26:31] I want to send arguments to this function
[16:26:46] Oh. Just use a clojure.
[16:26:54] *closure
[16:27:18] Variables that are in scope when you define the function continue to be available when it is called.
[16:28:29] https://en.wikipedia.org/wiki/Closure_(computer_programming)
[16:29:16] I used closures in javascript before
[16:29:20] but never in python
[16:29:26] I can understand how it works
[16:53:08] halfak: I'm done now :) check the new amend on my commit
[17:25:19] Will do
[17:26:13] Amir1, did you test? It looks like that won't work.
[17:28:21] Yeah, I tested it
[17:28:38] halfak: how did you test it?
[17:28:47] testing all aspects of the code is hard
[17:28:57] Did you run it on multiple dump files?
[17:29:27] So, the dump_processor function is what gets parallelized.
[17:30:15] See an example dump processor here: https://gist.github.com/halfak/c7a6bb267fcefb3aa14c
[17:30:30] You want to yield a minimal amount of data from the processor function.
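(Editor's note: the closure suggestion above can be sketched roughly as follows. This is only an illustration, not the code from the pull request or the gist. The names make_page_info, min_revisions and FakePage are hypothetical, and a plain list stands in for a real dump so the snippet runs without mwxml installed. The only parts taken from the log are the page_info(dump, path) signature and the advice to yield a minimal amount of data.)

```python
from collections import namedtuple

# Hypothetical stand-in for a page in a dump: a title plus its revisions.
# A real dump library would supply its own page objects.
FakePage = namedtuple("FakePage", ["title", "revisions"])


def make_page_info(min_revisions):
    """Return a processor with the fixed (dump, path) signature.

    min_revisions is an illustrative extra argument. The inner function
    can still read it when it is called later, because the variable was
    in scope when the inner function was defined (a closure).
    """
    def page_info(dump, path):
        for page in dump:
            count = len(page.revisions)
            if count >= min_revisions:
                # Yield only a small tuple per page, not the page itself.
                yield path, page.title, count
    return page_info


# Usage with stand-in data (a list of FakePage objects plays the dump).
fake_dump = [FakePage("Foo", [1, 2, 3]), FakePage("Bar", [1])]
processor = make_page_info(min_revisions=2)
results = list(processor(fake_dump, "dump1.xml"))
```

(A function built this way keeps the two-argument signature, so it could presumably be handed to mwxml.map() together with the list of paths, as described earlier in the log.)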
[17:30:33] brb
[17:52:34] hmm, when I gave it two dumps it went crazy
[17:53:15] I know I should yield processed data but I couldn't make it happen
[17:53:24] can you give it a try halfak?
[17:53:38] Will do :)
[17:54:51] thanks
[18:42:12] SpaceX lands a rocket!
[18:42:14] http://www.nytimes.com/2015/12/22/science/spacex-rocket-landing.html?_r=0
[18:42:17] huzzah!
[18:48:01] Pretty cool, I agree :)
[18:51:14] * aetilley arrived in PA. Should be on later.
[19:37:06] halfak is this the future of commuting?
[19:38:00] Hmm. Probe-ably not.
[19:46:54] Also, I've learned that mixins are stupid and confusing.
[19:46:57] So none of that
[19:47:00] More namespaces!
[19:52:33] Yes! More namespaces! Project! File! Module! Schema! Topic!
[19:53:57] heh.
[19:54:03] So I'm making namespaces for classes of features.
[19:54:25] So we have bytes, temporal, wikibase, and wikitext
[19:54:43] wikitext has some sub-namespaces: edit, parsed and tokenized
[19:55:08] These are not wiki namespaces, mind you