[15:48:44] * halfak is looking into spaCy tokenization. https://stackoverflow.com/questions/51012476/spacy-custom-tokenizer-to-include-only-hyphen-words-as-tokens-using-infix-regex
[15:48:46] Interesting stuff.
[15:49:07] Looks like I can address our issues re. diacritics with an infix regex.
[16:49:03] \o/ kevinbazira finally was granted an IRC cloak! Woo
[16:54:38] \o/ woohoo thanks for helping with this halfak :)
[16:55:19] That shouldn't have taken so long. Oh well.
[17:01:34] Running late for standup. Stuck in a manager meeting.
[17:01:40] accraze, kevinbazira: ^
[17:02:41] no worries halfak, we'll fill out the current work etherpad while waiting
[17:22:27] wikimedia/editquality#703 (add_eswiki_badwords+informals_to_cawiki - 1e70b62 : Aaron Halfaker): The build passed. https://travis-ci.org/wikimedia/editquality/builds/613606203
[18:47:23] Ooof. So many meetings.
[18:47:45] Wooo! Looks like the cawiki stuff got merged. I'll be looking into a deployment window next :)
[18:47:55] ... after lunch
[19:23:16] wow, just figured out the MW api sandbox seems to break if you have any uncommented print statements in your code...TIL :)
[22:44:24] So many emails.
[23:20:19] OK. That's enough email for today. Saving some for tomorrow morning.
[23:20:22] Have a good one, folks.
[23:26:47] later halfak