[17:15:28] <bmansurov>	 leila: o/ here are some results from yesterday's discussion: https://github.com/wikimedia/research-translation-recommendation-models/blob/master/wikidata_item_similarity.ipynb
[17:16:08] <bmansurov>	 leila: they look promising, but we need language-specific stop words for this to work well.
[17:20:01] * leila makes a note to check the page bmansurov gave in 2 hours. she will report back.
[17:20:56] <leila>	 In the meantime, I just learned that our ex-colleague Kaity Hammerstein is working on http://publiceditor.io/ which is a really neat project that can change the future of fact-checking and verification on the web.
[17:33:17] <bmansurov>	 leila, dsaez: o/ In the synonyms spreadsheet, I see we have 3 sheets per language. What's the difference between them?
[17:33:32] <bmansurov>	 Which ones should I ask users to fill out?
[17:38:42] <halfak>	 bmansurov, we have lots of stopwords assets in ORES. 
[17:39:10] <bmansurov>	 halfak: great. Do you have the list of languages?
[17:39:20] <bmansurov>	 leila: dsaez: Also, I'm looking at the list of synonyms, and it's obvious that some of the items are not synonyms at all. Do we need experienced editors to label them too? I could go ahead and do the easy ones myself.
[17:39:31] <halfak>	 https://github.com/wikimedia/revscoring/tree/master/revscoring/languages
[17:39:58] <halfak>	 Most of those languages have stopwords.  We have a process for acquiring stopwords for new languages (semi-automated -- requires human review)
[17:40:24] <bmansurov>	 halfak: I see. So over time the list will increase?
[17:40:27] <halfak>	 I'd be interested in splitting these language assets from revscoring too.  Seems like they would be generally useful. 
[17:40:47] <halfak>	 Yes.  As we get interest from new wiki communities, the first thing we ask for is help generating these language assets. 
[17:40:53] <bmansurov>	 halfak: agree. We could also use a plain text format.
[17:41:51] <halfak>	 bmansurov, some of the assets are in the form of regex.  Others use curated external resources (e.g. enchant dicts) so it is nice to have a python library. 
[17:42:05] <bmansurov>	 I see
[17:43:38] <halfak>	 We can certainly split the plaintext bits from the complex bits if that would be worthwhile. 
[17:43:53] <halfak>	 I'd love to make it easier for Wikipedians to curate and expand the lists. 
[17:44:50] <bmansurov>	 halfak: yes something like this would be useful: https://github.com/apache/spark/tree/master/mllib/src/main/resources/org/apache/spark/ml/feature/stopwords
[17:46:34] <halfak>	 Makes sense.  I wonder if we could even automate pulling such data from a wiki page. 
[17:47:11] <halfak>	 Either way, I agree re. text files.  In the mid-term, I'd be happy to review some changes to revscoring to bring this to life.  
[17:47:23] <halfak>	 In the long-term, I'll try to find time to do it myself. 
[17:50:02] <bmansurov>	 halfak: ok, I'll help out with the task
[17:54:05] <halAFK>	 Thank you!
[17:54:07] * halAFK --> Away
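The plain-text stopword format discussed above (one word per line, as in the Spark resources bmansurov linked) could be consumed roughly as follows. This is a minimal sketch under stated assumptions, not revscoring's API: the function names `load_stopwords` and `remove_stopwords`, the `#` comment convention, and the sample word list are all hypothetical.

```python
import io


def load_stopwords(lines):
    """Parse a one-word-per-line stopword list.

    Blank lines are skipped. Treating lines that start with '#' as
    comments is an assumption of this sketch, not an existing convention.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word.lower())
    return frozenset(words)


def remove_stopwords(tokens, stopwords):
    """Drop tokens whose lowercase form appears in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Stand-in for a hypothetical per-language file such as english.txt.
sample = io.StringIO("# hypothetical english.txt\nthe\nof\n\nand\n")
stopwords = load_stopwords(sample)
print(remove_stopwords(["The", "history", "of", "Wikidata"], stopwords))
```

Because the loader accepts any iterable of lines, the same function would work on an open file or on text fetched from a wiki page, which is the automation halfak wonders about. The regex-based and dictionary-based assets he mentions would still need a Python library.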
[19:06:32] <leila>	 bmansurov: I'd recommend not labeling it yourself. It's best if experienced editors do it, to avoid potential issues with labels and language norms. :)
[19:07:14] <isaacj>	 is there a wikitech or related page for the microsurveys that are run on wikipedia? for instance describing what parameters can be considered in the randomization of who receives a link etc.
[19:08:20] <leila>	 isaacj: bmansurov is your first friend there ;) but in the meantime: https://www.mediawiki.org/wiki/Extension:QuickSurveys
[19:09:47] <isaacj>	 oh awesome, thanks!
[19:21:20] <isaacj>	 bmansurov: is there any way to target microsurveys to specific articles or is it only project-level random sampling at this point?
[20:08:21] <bmansurov>	 isaacj: yeah, pretty much random, if I remember correctly
[20:15:21] <isaacj>	 bmansurov: thanks but drat. so if we wanted to survey a sample of people who all read the same article, would you say that's a simple change or a much more involved change in quicksurveys?
[20:18:37] <bmansurov>	 isaacj: I don't think it will be a simple change.
[20:19:08] <bmansurov>	 Quicksurveys needs a major change, it was not designed with the requirements you mention in mind.
[20:20:06] <isaacj>	 bmansurov: mmkay, thanks. i will revisit the idea if it seems especially important but in the meantime, i'll think of alternative strategies then
[23:10:21] <leila>	 HaeB: can you add me back to the weekly research meeting? :D
[23:10:45] <HaeB>	 oh, did you drop off the event?
[23:10:49] <leila>	 HaeB: I just realized that I don't see it in my calendar anymore (unless it was intentional and all those who didn't attend for some time were kicked out, which is fair;)
[23:11:14] <HaeB>	 yeah you have to commit to a 50min presentation to get back in again ;)
[23:12:16] <leila>	 if anyone is willing to listen, sure! count me in!
[23:12:20] <leila>	 HaeB: ^
[23:12:42] <HaeB>	 added you in (from next week on) 
[23:12:49] <leila>	 \o/ Thank you!
[23:12:53] <HaeB>	 ...but now i'm wondering who else got dropped, and when