[14:39:56] o/
[14:40:04] Just in for about an hour this morning.
[14:51:27] o/ Amir1
[14:51:36] Can we close this: https://github.com/wiki-ai/revscoring/pull/163
[14:51:44] It seems like we might not need it anymore.
[14:51:56] But there might be something to salvage. :)
[14:52:12] o/ halfak I started harvesting features
[14:52:17] Woot!
[14:52:21] How's the speed?
[14:52:38] 794K already done
[14:52:51] 13 secs per 1K revs
[14:53:11] That's better than expected. Seems like we should just pursue this strategy.
[14:53:21] BTW, did you also detect reverts?
[14:53:30] It would be nice to do both in one pass.
[14:53:38] But not 100% necessary
[14:53:54] I knew the biggest bottleneck is writing to files, so I tried writing to the file in batches of 1K revs; it got 10 secs faster per 1K revs
[14:54:24] Really? Python should handle batching.
[14:54:34] What file writer are you using?
[14:54:50] halfak: I added rev_id in the last part of features, so we can keep track of all of them
[14:54:57] codecs
[14:55:04] python2! :P
[14:55:19] no, python 3
[14:55:26] I use codecs so much that I forgot to use other libraries
[14:55:31] Oh. Huh.
[14:55:46] Yeah python3 has TextIOWrapper thingie.
[14:56:21] Also, I'd just use the std. file writer with an encoding
[14:56:35] e.g. with open('utf8.txt', 'w', encoding='utf8') as f: ...
[14:57:50] anyway, can you look at that PR and tell me if we still need it?
[14:57:53] Amir1, ^
[14:58:12] hmm, okay
[14:58:17] I think you'd ask me to review it
[15:01:23] I did.
[15:02:09] halfak: we still need this change but definitely it needs modifications
[15:02:13] 1- rebase
[15:02:49] A rebase might not even be possible. This change was pushed before the great refactoring of language into feature sets.
[15:02:57] The Great Refactoring(TM)
[15:03:29] Then again, I suppose it's some additional stopwords, so there's still a file for that and I think it still has history.
[15:04:14] :D
[15:05:45] Thanks!
[15:05:56] * halfak continues mopping up.
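A minimal sketch of the file-writing approach discussed in the log above: rows written in batches of 1K, using the built-in `open` with an `encoding` argument instead of `codecs`. The function name `write_rows_in_batches`, the tab-separated layout, and the sample values are assumptions for illustration; the log does not show the actual harvesting script. As noted in the conversation, Python's own file buffering may already make explicit batching unnecessary.

```python
import itertools
import os
import tempfile


def write_rows_in_batches(rows, path, batch_size=1000):
    """Write tab-separated rows to `path`, one joined write per batch.

    Hypothetical helper; returns the number of rows written.
    """
    written = 0
    rows = iter(rows)
    # In Python 3 the built-in open() handles encoding; codecs is not needed.
    with open(path, "w", encoding="utf8") as f:
        while True:
            # Pull up to batch_size rows from the iterator.
            batch = list(itertools.islice(rows, batch_size))
            if not batch:
                break
            # Build the whole batch as one string and write it in one call.
            f.write("".join("\t".join(map(str, row)) + "\n" for row in batch))
            written += len(batch)
    return written


def demo():
    # Assumed row layout: feature values first, rev_id in the last column.
    rows = [(0.5, 12, 1000 + i) for i in range(2500)]
    path = os.path.join(tempfile.mkdtemp(), "features.tsv")
    count = write_rows_in_batches(rows, path, batch_size=1000)
    with open(path, encoding="utf8") as f:
        lines = f.read().splitlines()
    return count, lines[0], lines[-1]
```

Keeping rev_id as the last column means each output line can be traced back to its revision, which matches the layout described in the log.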
[15:12:16] we need those informal words
[15:12:22] I try to mae another PR
[15:12:27] *make
[15:13:59] Thanks
[15:14:52] If you could clean up the formatting a bit too, I'd be very grateful. I'm very worried about breaking something because of my inability to remember a sequence of non-latin symbols.
[15:15:07] I spent a lot of time looking at it all very carefully when I got it to pass flake8
[15:33:31] halfak: Hi! I think we can talk about deploying mw-ext-ORES, once the outstanding patches are merged. I'm over the cache hump, and the RC view is mostly correct now.
[15:33:50] Hrm, two remaining concerns though, I'll comment on the task.
[23:27:21] Gonna point "grrrit" at this channel, for parity with the github bot. Objections?