[02:24:05] <halAFK>	 codezee, how are you reading the file in? 
[02:24:10] <halAFK>	 Can I see your code? 
[03:50:55] <codezee>	 halAFK: sorry for the late reply, was away, here's the paste - https://dpaste.org/T34j
[03:50:57] <codezee>	 it's just 4 lines
[12:12:27] <wikibugs>	 Scoring-platform-team, Bad-Words-Detection-System, editquality-modeling, User-Ladsgroup, and 2 others: migrate bad words detection to editquality repo - https://phabricator.wikimedia.org/T131861 (jeropbrenda) a:jeropbrenda
[14:21:22] <wikibugs>	 Scoring-platform-team (Current), revscoring, Chinese-Sites, artificial-intelligence: Tokenization of "word" things for CJK - https://phabricator.wikimedia.org/T111179 (Pavol86) @Halfak  I need your feedback on the following. Per our call last week, I did the following: 1. make the CJK, jap, kor tokenizer...
[14:24:56] <chtnnh>	 hey halAFK 
[14:25:14] <halfak>	 Hey chtnnh!
[14:25:29] <halfak>	 aha!  codezee, you'll either need to open that file with gzip.open() or you can use our utility that looks for filename extensions. 
[14:26:19] <halfak>	 mwtypes.files.reader(filename)
[14:26:49] <codezee>	 halfak: oh, i see, thanks!  will use the utility :)
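Why plain open() is not enough here: a .gz dump contains compressed bytes, and the XML only appears after decompression. Below is a minimal stdlib-only sketch of the "look at the filename extension" idea halfak mentions. `open_by_extension()` is a hypothetical helper, not the code behind mwtypes.files.reader(). It only handles .gz and .bz2 and assumes anything else is plain UTF-8 text.

```python
import bz2
import gzip


def open_by_extension(path, encoding="utf-8"):
    """Open `path` as text, picking a decompressor from its extension.

    Hypothetical helper for illustration only; it is not
    mwtypes.files.reader() and supports just .gz and .bz2.
    """
    if path.endswith(".gz"):
        # gzip.open in "rt" mode decompresses and decodes to str.
        return gzip.open(path, "rt", encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding=encoding)
    # Any other extension is assumed to be uncompressed text.
    return open(path, "rt", encoding=encoding)
```

Usage: `with open_by_extension("dump.xml.gz") as f:` then iterate over `f` line by line, exactly as with an uncompressed file.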
[14:27:46] <halfak>	 codezee, when you use mwxml.map(), it uses mwtypes.files.reader() internally.  
[14:27:50] <halfak>	 That's what I figured you were doing. 
[14:28:12] <halfak>	 mwxml.map() is way more useful than working with Dump() directly in most cases. 
[14:28:16] <codezee>	 i see, yeah, before using map, i wanted to explore the data in the dump in a simple loop
[14:28:34] <codezee>	 i'll use map
[14:32:12] <halfak>	 That's a good use of Dump() then!
[14:32:17] <halfak>	 Exploration is a good idea :) 
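A stdlib-only sketch of the map-over-files pattern discussed above: a hypothetical `map_files()` opens each path by extension, runs a per-file generator `process(f, path)` in a thread pool, and yields everything it produces. It only mimics the general shape of mwxml.map(); it is not that function. With mwxml itself, the per-file function would receive a parsed dump rather than a raw file object.

```python
import bz2
import gzip
from concurrent.futures import ThreadPoolExecutor


def open_by_extension(path, encoding="utf-8"):
    """Open `path` as text, choosing a decompressor from its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding=encoding)
    return open(path, "rt", encoding=encoding)


def map_files(process, paths, threads=2):
    """Run the generator `process(f, path)` over every file, yielding its items.

    Hypothetical helper; results come back grouped per file, in the
    same order as `paths`.
    """
    def run(path):
        # Each worker opens its own file and materialises the generator,
        # so only plain lists are handed back to the calling thread.
        with open_by_extension(path) as f:
            return list(process(f, path))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order even though files run concurrently.
        for items in pool.map(run, paths):
            yield from items
```

Threads keep the sketch simple. For large XML dumps, parsing is CPU-bound, so a real implementation would likely prefer worker processes to sidestep the GIL.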
[14:40:52] <wikibugs>	 Scoring-platform-team (Current), drafttopic-modeling: Compress Gensim models - https://phabricator.wikimedia.org/T247523 (Halfak)
[14:41:37] <wikibugs>	 Scoring-platform-team, drafttopic-modeling: Fit more topic models into ORES - https://phabricator.wikimedia.org/T249520 (Halfak)
[14:41:39] <wikibugs>	 Scoring-platform-team (Current), drafttopic-modeling: Compress Gensim models - https://phabricator.wikimedia.org/T247523 (Halfak) Open→Resolved a:Pavol86 We now have models that are built using the compressed vectors.  They seem to give us good fitness.
[14:42:59] <wikibugs>	 Scoring-platform-team (Current), Growth-Scaling, Growth-Team, Serbian-Sites: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available) - https://phabricator.wikimedia.org/T249382 (Halfak) a:Halfak→HAKSOAT
[14:43:38] <wikibugs>	 Scoring-platform-team (Current), Growth-Scaling, Growth-Team, Serbian-Sites: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available) - https://phabricator.wikimedia.org/T249382 (Halfak) We've managed to compress our vectors and reduce the memory footprint of ORES.  That mean...
[16:18:07] <wikibugs>	 Scoring-platform-team, articlequality-modeling, artificial-intelligence: Identify articles that should be de-prod'ed. - https://phabricator.wikimedia.org/T258082 (Halfak)
[16:18:49] <wikibugs>	 Scoring-platform-team, articlequality-modeling, artificial-intelligence: Identify articles that should be de-prod'ed. - https://phabricator.wikimedia.org/T258082 (Halfak) How would we get some good labeled data for this?  Is there a log event when an article is Prod'ed that we can look for?
[20:36:07] <wikibugs>	 Scoring-platform-team, Research ideas, Research-Backlog, Wiki-Loves-Monuments, artificial-intelligence: General image classifier for commons - https://phabricator.wikimedia.org/T155538 (Aklapper)