[13:59:57] hi to all
[14:09:57] bachounda: hello. Nice to see you here :)
[14:10:25] lol
[14:25:02] hasharAway, Krinkle|detached: no CI meeting today?
[14:26:27] legoktm: we have the Language Engineering here in a few minutes. Was the CI meeting supposed to be on this channel?
[14:26:36] office hour*
[14:27:17] arrbee: apparently we're not meeting today.
[14:27:29] legoktm: okay
[14:30:18] #startmeeting Language Engineering monthly office hour - May 2015
[14:30:36] hmm.. no meetbot
[14:30:48] anyways
[14:31:07] Hello and welcome to the monthly office hour of the Wikimedia Language Engineering team
[14:31:13] * arrbee is Runa
[14:31:44] Hi arrbee.
[14:31:50] with me today are my team mates aharoni santhosh pginer Nikerabbit
[14:31:56] kart_ would be joining soon
[14:31:59] hey Niharika
[14:32:23] Congratulations to the team for the awesome work on CX project. \o/
[14:32:40] Before we begin, please note that this chat will be publicly logged
[14:32:48] Niharika: Thank you :D
[14:32:58] Haumьhьƣьđ
[14:33:05] * arrbee waits to see if aharoni has a special greeting for today
[14:33:15] Of course I do.
[14:33:19] :D
[14:33:51] So our last office hour was on February 18th
[14:33:52] That's "hello" in Bashkir (the old https://en.wikipedia.org/wiki/Ya%C3%B1alif orthography).
[14:34:06] logs are at: https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2015-02-18
[14:34:29] (in case you missed them)
[14:34:41] hello everybody!
[14:34:54] hello froskos
[14:35:39] * arrbee waves around to people who I am assuming are here for the office hour today - bachounda TarLocesilion :)
[14:35:48] hey
[14:36:09] I hope this time is more convenient than what we were using before
[14:36:46] So we could not follow up with the March and April meetings due to too many things
[14:36:58] We also had our quarterly review
[14:37:07] http://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarterly_reviews/Editing,_Collaboration_and_Language_Engineering,_April_2015
[14:37:24] You can find the slides and minutes on the page linked above
[14:38:09] Also, if you haven't seen the announcement already, the Language Engineering team is now part of the bigger Editing team in WMF
[14:38:36] More details here on the FAQ: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Engineering_reorganization_FAQ
[14:39:04] At the moment there are no major changes in the way we are working
[14:39:31] We are still available on the same IRC channels, mailing lists, talk pages, phabricator etc.
[14:39:57] Our main focus for this month and the next is still Content Translation
[14:40:02] hello.
[14:40:08] hi kart_
[14:40:27] In our last published blog post, we had posted some figures in terms of how Content Translation was being adopted
[14:41:04] here is the link http://blog.wikimedia.org/2015/04/08/the-new-content-translation-tool/
[14:41:47] Since then we have observed an exponential growth in terms of users adopting the tool to publish articles
[14:42:07] For instance, between January to March around 700 articles were published
[14:42:26] and from April 1 to today it's more than 1500
[14:42:39] which is double of what we had seen in the first 3 months
[14:43:37] It's quite overwhelming for us and we are constantly exploring new ideas on how this can be made more efficient and reach more people at the same time
[14:44:19] Content Translation aka CX is now deployed on 44 Wikipedias... as an opt-in beta-feature
[14:44:52] And like with most things, we are also getting new bugs or usage issues being reported
[14:45:26] for instance, recently we have seen quite a few problems related to publishing or saving of articles
[14:46:00] The reasons have been varied but the reports have been very helpful in finding the causes
[14:46:23] Some of these errors are still being investigated
[14:46:49] santhosh: would you like to add more here?
[14:47:13] nope
[14:47:33] okay
[14:47:47] I shall add that we improved our logging, so we can now investigate the failures better.
[14:47:47] legoktm: no ci meeting indeed sorry :
[14:48:01] and we are thankful for each report from our users.
[14:48:12] Generally we get to know about these errors through comments on the talk page and phabricator
[14:48:18] legoktm: I had an appointment. I screwed up and should have sent the ci meeting minutes ages ago as well as a reminder message that today's meeting was cancelled
[14:48:30] legoktm: sorry you potentially had to wake up earlier :/
[14:48:34] And yes, it's really very helpful like aharoni said
[14:49:40] * arrbee wonders if anyone is here who faced some of these publishing errors
[14:49:42] hashar: welcome to the Language Engineering Office hour! :)
[14:51:14] kart_: sorry :/
[14:51:17] okay, so earlier during the day there were some queries coming in from the Polish Wikipedia community
[14:51:24] hashar: lol.. no worries
[14:51:48] yep, we have some ideas of improvement
[14:52:09] TarLocesilion: yay.. please go ahead
[14:53:46] TarLocesilion: you may be happy to hear that we fixed the <Ś> letter issue earlier today ;)
[14:53:46] for context
[14:53:46] aharoni: yes, I know :D
[14:53:46] The Alt+s shortcut currently gets in the way of typing the character ś
[14:53:46] https://phabricator.wikimedia.org/T98126
[14:53:46] and
[14:53:46] https://phabricator.wikimedia.org/T98153
[14:53:46] * arrbee clicks
[14:53:58] the first is: Add "Translated page" template
[14:54:16] first, many wikis have templates which inform users: "this page contains translation"
[14:55:00] it'd be really, really nice if CX added such templates automatically.
[14:56:01] santhosh: aharoni ^^
[14:56:01] such templates help power users to control all pages translated from a specific language
[14:57:04] and ofc, are important for those readers who dare to read talkpages. #copyright etc.
[14:57:27] TarLocesilion: it's probably similar to https://phabricator.wikimedia.org/T96935
[14:57:51] copyright is handled through the edit summary (it was actually approved by the legal team ;) )
[14:57:58] but I guess that it's OK to add the template, too
[14:58:21] When will we have any dictionary or corpus support for the Content Translation tool?
[14:58:56] A much better solution would be to have true page metadata, something that a lot of projects have been requesting for a long time, but till then we can probably add templates.
[14:59:09] aharoni: yep, but you know, info only in edit summary is kinda minimalism ;)
[14:59:12] true.
[14:59:27] Content Translation has dictionary support. But for every language pair, it is difficult to find a free licensed good quality dictionary. If we find one, we would be happy to add.
[14:59:44] Another issue is that such templates work differently in different languages, but we can try to cope with that, too. (I'd love to see true cross-project collaboration between the editors in different languages here :) )
[15:00:09] Pavanaja: as santhosh_ says, the technical side is already implemented, but we need the data for every language.
[15:00:16] We have it for es-ca, ca-es, ca-en pairs
[15:00:32] I mean we have dictionary support for above pairs
[15:00:32] aharoni: okay.. i am kind of trolling here about this template so please feel free to shoo me away... what happens if an article with that template is translated through CX?
[15:00:53] arrbee: it's intended for the talk page, IIUC
[15:00:58] ahh ok
[15:01:22] yes, now I recall TarLocesilion saying so
[15:01:26] If I'm not mistaken, we have a dictionary for Spanish-Portuguese, too.
[15:01:34] In any case, we need the data.
[15:02:08] Pavanaja: if you have any connections with academic institutions or other projects that provide dictionary files, we'd love to get in touch with them and integrate them.
[15:02:17] We can probably create a provision to add such template, but must be opt-in and configurable per wiki. We have a provision to add a category if we see too much machine translation.
[15:02:24] Currently not used in any wiki
[15:03:10] Pavanaja: We saw the updates about Konkani Wikipedia. Do you think having CX in there would be helpful? Are they translating the articles from any other languages?
[15:03:47] TarLocesilion: We still have one more phab ticket of yours
[15:04:02] yes, https://phabricator.wikimedia.org/T98153 -- a hidden category
[15:05:13] TarLocesilion: contenttranslation tag may be useful too :)
[15:05:25] which will show all articles created by CX
[15:05:29] kart_ gotya. no.
[15:05:35] not all. only recent.
[15:06:05] TarLocesilion: oh that's true. thanks for correcting.
[15:06:15] and that's very important.
[15:07:20] because when you have a lot of translations (and apparently that's the upcoming fact on many wikis -- see stats) you can find only few articles translated with CX.
[15:07:23] santhosh_: thoughts?
[15:08:15] There are two ways: The tag filter page for example https://ca.wikipedia.org/w/index.php?title=Especial:Canvis_recents&tagfilter=contenttranslation allows you to choose how many to see and days
[15:08:32] and here comes the traditional question asked by established communities who forgot the rocket period -- "what about quality"?
[15:08:42] Second way is, we provide an API to list all published articles across any languages with more details
[15:09:02] That is documented at https://www.mediawiki.org/wiki/Content_translation/Published_translations
[15:09:08] TarLocesilion: Tricky question, isn't it? :)
[15:09:36] that's why we care about ALL input by newbies.
[15:10:27] and it's not hard to imagine how easily CX may be used to autotranslate random pieces of content.
[15:10:33] https://www.mediawiki.org/wiki/Content_translation/Abuse_prevention has notes about approaches we do about quality
[15:11:38] and that's also great.
[15:11:39] From our analytics so far, such abuse or low quality articles created and then deleted are relatively too small
[15:12:30] TarLocesilion: To be more precise: I checked three days ago, and of 2000 articles that were created using ContentTranslation, only about 60 were deleted as "bad translation" or "vandalism".
[15:12:43] (In all languages.)
[15:13:40] that is a very good rate
[15:14:09] it highly depends on inclusionism/deletionism. my community doesn't only aim to control "new articles to be speedy deleted", but all of them.
[15:14:32] 60 - yep, it's great.
[15:15:12] TarLocesilion: you mean delete anything - old or new, that looks sub-standard?
[15:15:27] delete. move to sandbox. etc.
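The first of the two ways santhosh_ describes at 15:08:15 (the `contenttranslation` tag filter on recent changes) can also be queried programmatically. Below is a minimal sketch, assuming the standard MediaWiki Action API module `list=recentchanges` with its `rctag` filter. It is not the Content Translation published-translations API linked at 15:09:02, and the helper names `cx_recent_changes_url` and `fetch_cx_titles` are hypothetical. Like the tag filter page, it only sees edits still inside the wiki's recent-changes retention window, which is TarLocesilion's point that the tag shows "only recent" translations.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def cx_recent_changes_url(wiki_host: str, limit: int = 50) -> str:
    """Build an Action API URL for recent edits tagged 'contenttranslation'.

    Hypothetical helper; wiki_host is e.g. "ca.wikipedia.org".
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": "contenttranslation",  # same tag as the Special:RecentChanges tagfilter
        "rcprop": "title|timestamp|user",
        "rclimit": str(limit),
        "format": "json",
    }
    # urlencode percent-encodes the '|' separators as %7C
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


def fetch_cx_titles(wiki_host: str, limit: int = 50) -> list:
    """Fetch titles of tagged edits (needs network access).

    Only covers the recent-changes window, not every CX article ever published.
    """
    req = Request(
        cx_recent_changes_url(wiki_host, limit),
        headers={"User-Agent": "cx-tag-example/0.1 (illustrative sketch)"},
    )
    with urlopen(req) as resp:
        data = json.load(resp)
    return [rc["title"] for rc in data["query"]["recentchanges"]]
```

For example, `cx_recent_changes_url("ca.wikipedia.org", limit=10)` builds the query equivalent to the Catalan tag filter page above. For a complete history across languages, the published-translations API is the better fit.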
[15:15:33] okay
[15:17:03] Essentially, in terms of vandalism CX doesn't add any capability that is different or more than how a new article can be created
[15:17:33] but yeah, these are valid concerns about how individual communities operate
[15:17:41] TarLocesilion: well, that makes sense - it can happen in any language that a bad article hides in the corner for years :)
[15:18:02] we are quite similar to dewiki in that point. for instance, eswiki allows users to create articles without sources, or only with a list of sources.
[15:18:49] @arrbee: I am not aware of Konkani people translating from other language Wikipedia. I guess their articles are based on the Konkani Encyclopaedia released under CC by Goa University
[15:18:50] * arrbee recently heard about a fictitious article on enwiki about some event that never happened and it went on to become a featured article
[15:18:58] I need to find out about that
[15:19:43] Pavanaja: oh ok,
[15:20:18] @aharoni: I am a member of the Kannada Software Committee of Govt of Karnataka. Creating a corpus is one of our agendas. We can definitely work together
[15:20:29] hello again
[15:20:32] aharoni: I'm in a difficult position right now, because I'm here to show the state of a restrictive community with very restrictive barriers for article creation.
[15:20:35] arrbee: War happen in Goa that never happened :)
[15:21:16] Pavanaja: CX is currently available on Kannada, Gujarati and Punjabi Wikipedia. In case you come across information related to resources for these languages we would really appreciate it.
[15:21:45] @arrbee: I am aware of the availability of CX for KN WP
[15:22:08] Pavanaja: Any feedback for us? :)
[15:22:24] Pavanaja: also, try CX :)
[15:22:32] Pavanaja: awesome
[15:23:05] CX currently works only for totally new article creation by translation. There are many articles in KN WP which are more like stubs, having the same article in EN WP. But we can't use CX to improve those articles
[15:23:16] lol
[15:23:26] yes, that's a feature we all want to see soon
[15:23:46] but it's not planned yet
[15:23:51] Pavanaja: technically you can. Currently you can overwrite those one-line articles with big articles
[15:24:07] TarLocesilion: I imagine - I am very much a Wikipedian myself, with a lot of experience in English, Hebrew, and Russian, and I know about the hard control of new articles. You know, you can create bad new articles without ContentTranslation :)
[15:24:16] CX gives a warning, but that does not stop
[15:24:32] yep, it suggests changing the name of the article
[15:24:33] If anything, I think that creating good new articles is easier and more likely with ContentTranslation.
[15:24:53] That's at least what we see from the data from other languages.
[15:25:15] Quick timecheck.. we have about 5 mins left of the hour
[15:26:45] TarLocesilion: We will keep updating you through the phab tickets. Thanks a lot for filing them. It really helps us!!
[15:27:12] aharoni: hi i search farmer
[15:27:39] let's wrap up from here now, I am not sure if there is any other meeting scheduled on the channel :)
[15:27:51] bachounda: I remember you! :)
[15:28:15] Thanks a lot everyone for coming today.
[15:28:26] aharoni: bachounda from algeria wikimania london
[15:28:47] good night!
[15:29:00] Our next office hour is planned for June 10th, same time and same place
[15:29:12] #endmeeting
[15:29:13] good morning/afternoon/evening/night!
[15:29:21] froskos_: :)
[15:29:41] * arrbee will post logs in a few minutes
[15:35:04] https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2015-05-05
[15:35:24] +1 :)