[21:56:04] some fuss over on Twitter today about how Hungarian is a gender-neutral language and how Google Translate automatically chooses the gender for you. Unsurprisingly that leads to "he works, he is insensitive" and "she cooks, she is beautiful" etc. How will we tackle that problem?
[21:57:05] I saw some similar stuff about Turkish and Finnish a few days ago
[21:57:50] yeah, Finnish is mentioned in the comment thread here: https://twitter.com/DoraVargha/status/1373211762108076034
[21:58:24] I expect the only time it'll be a problem for Abstract Wikipedia is when we're generating prose about a human and the only available source information is written in genderless languages
[21:58:26] we won't be generating text based on how common a pair of words is though
[21:58:44] For AW the frames won't be generated from free text. So, I guess, the data on which entity should be used with which lexeme and which gender will hopefully be more predictable. (re @moebeus: some fuss over on Twitter today about how Hungarian is a gender neutral language and how Google Translate automatically chooses the gender for you. Unsurprisingly that leads to "he works, he is insensitive" and "she cooks, she is be
[21:59:27] e.g. source info: Joe Bloggs is an artist, born in year 1996, famous works include A, B, and C
[21:59:40] And we need to represent this in Abstract Wikipedia
[22:01:08] Probably best to handle these with "warning: missing info" when generating the required prose in a gendered language for the first time
[22:02:06] we'll need to have a fallback for unknown/unsupported gender anyway
[22:02:07] I sometimes see companies and musical groups get gendered when articles are machine translated on English Wiki. Is that related?
[22:03:56] "SomeRockBand started in 1998.
She released first record in 2000"
[22:06:23] like if you have a person who is an author but the gender is unknown, and you want to generate a description of the item, in German you don't know whether to use "Autor" or "Autorin", so the function would need to decide what to output in that case, which might be "Autor/Autorin" or "Autor(in)" or "Autor*in" or just "Autor" or any of the other strategies people have used
[22:07:29] and the same goes for pronouns, in English the obvious choice is singular they, in German you run away screaming
[22:08:33] I'm a little surprised Google doesn't offer both actually, seems like a cheap and easy solution
[22:09:14] probably because they're not looking at grammar and going "oh this isn't gender specific"
[22:11:37] but if the whole language is gender neutral, like with Hungarian or Finnish, etc.? It wouldn't require any deep analysis, just a switch to show the same sentence in two versions. Probably some obstacle I haven't thought about though, there usually is 😊
[22:15:23] as I understand it, it learns different translations and which ones are most common in what context, and then when it finds a word that translates as "strong" and a pronoun that translates as "he", "she", "he/she", it goes "ah this word usually means "he" here"... but it doesn't know that it's a pronoun or that gender is a thing, it just knows that out of the three options, option 1 is most common with this other word
[22:21:11] I think you're right, that's how they do it, most of the time. Don't know about Google, but Bing (Microsoft Translator) will cheat a little too, probably to cover more ground.
That can lead to some weird ass translations: https://twitter.com/exmusica/status/1322307143459086338?s=20
[22:21:34] https://twitter.com/exmusica/status/1322307801797042176?s=20
[22:23:29] oh dear
[22:24:27] (I don't normally write angry letters to corporate accounts on Twitter, must have been tired that day😂)
[22:29:15] what (re @moebeus: https://twitter.com/exmusica/status/1322307801797042176?s=20)
[22:32:23] always double and triple translate, if you want to know what it really says 🙃 https://twitter.com/exmusica/status/1322310204185288704 (re @Sannita: what)
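
A rough sketch of the unknown-gender fallback described in the 22:06:23 message, using its German "Autor"/"Autorin" example. Everything named here is made up for illustration: `occupation_de`, the one-entry `LEXICON`, and the `Strategy` enum are hypothetical and are not part of Abstract Wikipedia or Wikifunctions. It only shows the decision the function has to make when gender is missing or unsupported; which strategy should be the default is left open, as it was in the chat.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    """Fallback spellings mentioned in the chat (names are hypothetical)."""
    SLASH = "slash"          # Autor/Autorin
    PARENS = "parens"        # Autor(in)
    STAR = "star"            # Autor*in
    MASCULINE = "masculine"  # Autor


# Hypothetical mini-lexicon: concept -> (masculine form, feminine suffix).
LEXICON = {"author": ("Autor", "in")}


def occupation_de(concept: str,
                  gender: Optional[str],
                  strategy: Strategy = Strategy.SLASH) -> str:
    """Return a German occupation noun, falling back when gender is unknown."""
    stem, suffix = LEXICON[concept]
    if gender == "male":
        return stem
    if gender == "female":
        return stem + suffix
    # Unknown or unsupported gender: apply the chosen fallback strategy.
    if strategy is Strategy.SLASH:
        return f"{stem}/{stem}{suffix}"
    if strategy is Strategy.PARENS:
        return f"{stem}({suffix})"
    if strategy is Strategy.STAR:
        return f"{stem}*{suffix}"
    return stem  # Strategy.MASCULINE


if __name__ == "__main__":
    for s in Strategy:
        print(s.name, occupation_de("author", None, s))
```

The same shape would apply to pronouns, except that the fallback table differs per language (singular "they" in English, no obvious equivalent in German).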