[17:16:49] halfak: o/ What Python library do you recommend for denormalizing MW titles? Say I have "Early_Life" and I want to get "Early Life" from it.
[17:17:24] I found out that mediawiki-utilities does normalization, but not the other way around.
[17:17:48] http://pythonhosted.org/mwtypes/
[17:17:58] Maybe that's where such a thing should live.
[17:18:14] Relevant: https://www.mediawiki.org/wiki/Mediawiki-utilities
[17:20:16] halfak: thanks!
[17:42:34] tizianop: o/ you there?
[17:42:48] bmansurov: yes
[17:43:16] tizianop: which XML dumps are you using for generating the category network?
[17:43:55] Here's a list, for example: https://dumps.wikimedia.org/uzwiki/20180201/
[17:45:13] tizianop: also, are you using https://github.com/attardi/wikiextractor to extract data from the dumps?
[17:46:35] bmansurov: recomputing the recommendations now; you will have them in around an hour and a half
[17:46:55] dsaez: cool
[17:48:49] bmansurov: For the provided taxonomy, I'm not sure how the data is extracted. I sent an email to ask for more information.
[17:49:35] tizianop: great, thanks! I've updated the etherpad with a JS library for denormalizing titles; no luck with Python yet.
[17:49:55] tizianop: let me know when you hear back. Maybe the dumps already contain denormalized titles.
[17:50:56] bmansurov: I think it's better to use the SQL version. I found that some categories' parents are generated by a template that is not expanded in the XML version.
[17:51:27] tizianop: makes sense, and in SQL the titles are already normalized, IMO
[18:06:22] bmansurov: If you have to write the Python code for denormalization, maybe have a look at pywikibot? It does handle titles, but I'm not sure what the code looks like.
[18:07:03] Nettrom: OK, thanks for the pointer!
[18:07:17] yw :)
[18:07:21] gotta run, back a bit later
[22:50:45] hey bmansurov. sooo, any update I should be aware of from the meeting with Tiziano et al.? (sorry, I just realized that you're probably out by now)
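[Editor's note: for the denormalization question discussed above, a minimal Python sketch. MediaWiki's title normalization replaces spaces with underscores, so the display form can be recovered by reversing that one step; the helper name `denormalize_title` is hypothetical, not an API from mwtypes or pywikibot.]

```python
def denormalize_title(title: str) -> str:
    """Turn a normalized MediaWiki title ("Early_Life") into its
    display form ("Early Life") by replacing underscores with spaces.

    Hypothetical helper: this only reverses the space/underscore
    substitution; other normalization steps (e.g. first-letter
    capitalization) are not invertible from the title alone.
    """
    return title.replace("_", " ").strip()


print(denormalize_title("Early_Life"))  # -> Early Life
```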