[13:47:43] Point to consider:
[13:48:01] Rex ate a hot dog
[13:48:01] Rex is a hot dog
[14:54:37] That should be easy: there's a difference between a dog with the adjective hot and the compound noun phrase hot dog, which in most languages besides English will render very differently.
[15:08:07] What is the problem, understanding this correctly or generating this? Generating this doesn't sound difficult at all. I never studied NLP properly, didn't even get close to that. But my simple intuition tells me that automated generation of language from well-defined data and rules is far easier than automated understanding of language written by humans.
[15:08:07] Also, if we are talking about Wikipedia, it's not so likely that these particular phrases would have to be generated, but there could be other examples of ambiguity. (re @Csisc1994: Rex ate a hot dog / Rex is a hot dog)
[15:09:30] +1 (re @amire80: What is the problem, understanding this correctly or generating this? …)
[15:09:50] The understanding is the matter I have raised.
[15:10:37] AFAIK, Abstract Wikipedia is about generating and not understanding :)
[15:11:09] If it was about understanding, I wouldn't be so interested in it. Maybe I would even be actively opposed, depending on the details.
[15:13:56] As I said, I never studied NLP properly. Also, even though I worked as a software engineer for many years, I never studied Computer Science, Machine Learning, or Artificial Intelligence properly. But I nevertheless think I have a pretty good feeling for which things should be given to computers and which should be kept under human control.
[15:15:33] The understanding part is also important; the stress issue raised weeks ago explains why it can be useful.
[15:15:38] I am super-cynical almost every time I hear an engineer, an entrepreneur, or a politician saying that "such and such thing will be done by AI in a few years and humans won't be needed". Driving—maybe in some cases. Image recognition—maybe in some cases. Art—never, absolutely never. Understanding language—mostly not; at most as a far-from-perfect translation aid. Language and art are for humans.
[15:16:00] +1 (re @amire80: I am super-cynical almost every time I hear an engineer, an entrepreneur, or a politician saying that "such and such thing will be done by AI in a few years and humans won't be needed". …)
[15:16:06] I agree. This is another fact.
[15:17:35] What I mean is that automated understanding, even partial, is needed to express intonation in text-to-speech systems.
[15:17:44] (Art includes painting, sculpting, movies, theater, literature, music, games.) (re @amire80: I am super-cynical almost every time I hear an engineer, an entrepreneur, or a politician saying …)
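A concrete illustration of why generation sidesteps the "Rex is a hot dog" ambiguity raised above: in an abstract representation the two readings are distinct structures before any text exists, so a renderer cannot confuse them. The following is a minimal sketch; the constructor names, lexeme labels, and the French renderings are illustrative assumptions, not actual Abstract Wikipedia notation.

```python
# Two distinct abstract structures for the two readings.
# All constructor names and lexeme choices here are hypothetical.

# Reading 1: "hot dog" as a compound noun (the food item), one lexeme.
eat_hot_dog = ("Eat", "Rex", ("Lexeme", "hot dog (food)"))

# Reading 2: "hot" as an adjective modifying the noun "dog", two lexemes.
is_hot_dog = ("Be", "Rex", ("Modified", ("Lexeme", "dog"), ("Lexeme", "hot")))

# Per-language renderings of the object noun phrase (French chosen as an
# example language where the two readings diverge on the surface).
RENDER = {
    "en": {"hot dog (food)": "a hot dog", ("dog", "hot"): "a hot dog"},
    "fr": {"hot dog (food)": "un hot-dog", ("dog", "hot"): "un chien chaud"},
}

def render_np(node, lang):
    """Render the object noun phrase of one of the sketch structures."""
    if node[0] == "Lexeme":            # compound: a single lexeme
        return RENDER[lang][node[1]]
    if node[0] == "Modified":          # adjective + noun
        noun, adj = node[1][1], node[2][1]
        return RENDER[lang][(noun, adj)]
    raise ValueError(node)

# Because the ambiguity is resolved *before* generation, the renderer
# never has to guess, unlike a parser reading human-written text:
print(render_np(eat_hot_dog[2], "fr"))  # un hot-dog (the food)
print(render_np(is_hot_dog[2], "fr"))   # un chien chaud (a dog that is hot)
```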
[15:18:57] In fact, if we can automatically determine intonation, we can automatically have a vocal edition of Abstract Wikipedia.
[15:19:28] Yeah... I worked for some time in a company that did text-to-speech for Hebrew, which is relatively more difficult for Hebrew than for many other languages, because of the very weird spelling. I remember making a lot of fixes for stress in the dictionaries. It only worked well when I did it manually, and it's just impossible to do it perfectly for all the texts in the… (re @Csisc1994: What I mean is that automated understanding …)
[15:19:34] Art also includes architecture... (re @amire80: (Art includes painting, sculpting, movies, theater, literature, music, games.))
[15:19:42] Yes. (re @Csisc1994: Art also includes architecture...)
[15:20:25] Same for all the Semitic languages. (re @amire80: Yeah... I worked for some time in a company that did text-to-speech for Hebrew …)
[15:21:52] Ah, that's actually quite possible, and much easier to do than for human-written text! As long as the link between each lexeme and its meaning is preserved, of course. Will require some improvements in Wikidata Lexeme, but totally doable. (re @Csisc1994: In fact, if we can automatically determine intonation, we can automatically have a vocal edition of Abstract Wikipedia.)
[15:22:09] +1 (re @amire80: Ah, that's actually quite possible, and much easier to do than for human-written text! …)
[15:22:55] That's one of the many good things in AW, as opposed to Lsjbot.
[15:23:19] What improvements do you have in mind? (re @amire80: Ah, that's actually quite possible, and much easier to do than for human-written text! …)
[15:23:22] Absolutely agree. (re @amire80: That's one of the many good things in AW, as opposed to Lsjbot.)
[15:27:05] The number 1 most important improvement for Lexeme: a built-in common framework for generating forms. Last time I checked, forms were uploaded by bots, and there was no convenient way inside Wikidata to display them as a nice table. (There is an external tool that can display such things, although I forgot its name. Surely there's someone here who can remind me?.. And in any case, it really should be built-in.) (re @mahir256: What improvements do you have in mind?)
[15:27:58] (I guess I had meant it in the context of determining intonations, in response to which you brought up the idea of improvements.) (re @amire80: The number 1 most important improvement for Lexeme: a built-in common framework for generating forms. …)
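The "much easier than for human-written text" point can be made concrete: generated text still knows which lexeme form each word came from, so stress can be looked up instead of guessed, which is exactly what the Hebrew TTS work above had to do by hand. A minimal sketch follows; the form IDs, IPA values, and the idea of a renderer emitting (word, form ID) pairs are all assumptions for illustration, not an existing Wikidata or Abstract Wikipedia mechanism.

```python
# Sketch: a vocal edition as a form-to-pronunciation lookup.
# Hypothetical pronunciation data attached to lexeme forms (in reality
# this could live on the form as a property, or in a unified framework).
PRONUNCIATION = {
    "L100-F1": "ˈrɛks",      # "Rex"
    "L200-F3": "ˈhɒt.dɒɡ",   # "hot dog", compound noun, single stress
}

# Assumed renderer output: surface words still linked to the lexeme
# forms they were generated from.
generated = [("Rex", "L100-F1"), ("ate", None), ("a", None),
             ("hot dog", "L200-F3")]

def vocalize(tokens):
    """Replace each linked word with its stressed pronunciation."""
    out = []
    for word, form_id in tokens:
        ipa = PRONUNCIATION.get(form_id)
        out.append(ipa if ipa else word)
    return " ".join(out)

print(vocalize(generated))  # ˈrɛks ate a ˈhɒt.dɒɡ
```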
[15:28:14] But yes, what you described is also necessary.
[15:29:12] Are you thinking of https://lexeme-forms.toolforge.org/? (re @amire80: The number 1 most important improvement for Lexeme: a built-in common framework for generating forms. …)
[15:30:27] And that's what I meant, too. If you have more structured support for forms, you can associate forms with intonations, stress, etc. It's probably possible now by sticking together some properties, but my impression is that without a more unified framework, this approach will be too difficult to implement for a lot of languages. (re @mahir256: (I guess I had meant it in the context of determining intonations …))
[15:31:23] Excellent idea (re @amire80: And that's what I meant, too. If you have more structured support for forms, you can associate forms with intonations, stress, etc. …)
[15:32:24] Oh, I wasn't familiar with this one. This looks like creation, not display. I'm afraid of pressing "Create" :)
[15:32:25] I was talking about something else, but I forgot its name. I'll recall it if I see it... (re @Nikki: Are you thinking of https://lexeme-forms.toolforge.org/?)
[15:33:13] It has an edit mode which can display the existing data. I'm on my phone right now, though, so I can't easily find a good example.
[15:33:23] There was an effort by some people a while back to convert the inflection templates used on the English Wiktionary to a form that could be used for adding information to lexemes (a need which the Lexeme Forms tool also fills). (re @amire80: And that's what I meant, too. If you have more structured support for forms, you can associate forms with intonations, stress, etc. …)
[15:38:22] Yes. A lot of Wiktionaries have templates or Lua modules that do this quite well.
[15:38:22] It's one of the most brilliant and underappreciated achievements of the Wikimedia volunteer community!!!
[15:38:23] I use it a lot for understanding verbs when I read or write French, Spanish, Catalan, or Belarusian, none of which I know perfectly, and I even use it to verify the spelling of rare declined forms in my native Russian.
[15:38:25] It would be nice not just to rewrite this for use on Wikidata Lexeme, but to do it in a way that allows code reuse between languages and easy addition of new languages. That's exactly what I mean when I say "a unified framework". (re @mahir256: There was an effort by some people a while back to convert the inflection templates used on the English Wiktionary …)
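One way to picture that "unified framework": the generation engine is shared across languages, and each language contributes only declarative paradigm data. The sketch below is an assumption about what such a design could look like, not how Wiktionary modules or any planned Wikidata feature actually work, and the paradigms are deliberately tiny (real noun inflection in both languages is much richer).

```python
# Sketch of a unified form-generation framework: shared engine, per-language data.

# Language-specific data: grammatical features mapped to affix patterns.
# Paradigm IDs and patterns are made up for illustration.
PARADIGMS = {
    "en-noun":   {"singular": "{stem}", "plural": "{stem}s"},
    "ru-noun-a": {"nominative-singular": "{stem}а", "nominative-plural": "{stem}ы"},
}

def generate_forms(stem, paradigm_id):
    """Shared engine: apply a declarative paradigm to a stem."""
    paradigm = PARADIGMS[paradigm_id]
    return {features: pattern.format(stem=stem)
            for features, pattern in paradigm.items()}

print(generate_forms("cat", "en-noun"))
# {'singular': 'cat', 'plural': 'cats'}
print(generate_forms("мам", "ru-noun-a"))
# {'nominative-singular': 'мама', 'nominative-plural': 'мамы'}

# Adding a new language means adding a PARADIGMS entry (data), not
# copying and editing another language's code.
```

Under a design like this, the copy-the-Bengali-template workflow described in the next message reduces to contributing one new data table.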
[15:43:30] To clarify, what is done now is more or less this: "To add Assamese support, create a blank template, copy the template from Bengali, and change the endings so they would be correct in Assamese". And this is definitely not my definition of "easy addition of new languages". (re @amire80: Yes. A lot of Wiktionaries have templates or Lua modules that do this quite well. …)
[16:09:33] We have a task for that (T2.6 here: https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia/Tasks ), to support the creation of regular forms.
[16:10:58] Regarding viewing them, I'd rather Wikidata not be too pretty, and instead make it easier for the Wiktionaries to provide pretty views, if that makes sense.
[16:26:02] As long as this works equally well for all languages without having to manually copy and fork a lot of code for each of them, showing it in Wiktionary is fine.
[16:31:31] Hauki? (Probably hauki.toolforge.org, but I'm also on my phone, so don't take my word for it ^^) (re @amire80: Oh, I wasn't familiar with this one. This looks like creation, not display. I'm afraid of pressing "Create" :) / I was talking about something else, but I forgot its name. I'll recall it if I see it...)
[16:31:56] (Meh, picked the wrong message to reply to, but I hope you know what I mean.)
[16:33:50] Yes, agreed, that should be the goal. But this requires intensive discussions with the Wiktionary communities about how exactly to integrate that.
[16:33:50] In the end, the Wiktionaries are each complex in a unique way, and without some unification it will be hard to avoid manual work.
[16:33:52] But we'll see how things develop. (re @amire80: As long as this works equally well for all languages without having to manually copy and fork a lot of code for each of them, showing it in Wiktionary is fine.)
[16:36:59] [Meta-comment: I often think that I'm the last sucker using wikidata.org and wikipedia.org to read and edit Wikipedia and Wikidata, while everyone else uses other tools.]
[16:37:47] No one is above manual edits on-wiki. (re @amire80: [Meta-comment: I often think that I'm the last sucker using wikidata.org and wikipedia.org …])
[16:38:47] It would be nice if Wikidata were at least nice *enough* that I could see errors without needing to load a lexeme using a separate tool. A long list of forms with no guaranteed order is pretty much unusable :/ (re @vrandecic: Regarding viewing them, I'd rather Wikidata not be too pretty, and instead make it easier for the Wiktionaries to provide pretty views …)
[16:39:21] No, I almost always use Wikidata and Wikipedia directly. Only recently I started using Lucas' Lexeme Forms tool, but only for creating the first version of the lexeme forms, and then I clean up manually afterwards.
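For the "long list of forms with no guaranteed order" complaint, a gadget-like view does not need much: the public wbgetentities API already returns a lexeme's forms with their representations and grammatical features. A rough sketch follows; the lexeme ID is a placeholder, and a real gadget would additionally map the feature Q-ids to labels and sort by paradigm order rather than alphabetically.

```python
# Sketch: fetch a lexeme's forms from the Wikidata API and print them
# in a stable, deterministic order (unlike the raw form list).
import requests

API = "https://www.wikidata.org/w/api.php"

def form_table(lexeme_id, lang):
    params = {"action": "wbgetentities", "ids": lexeme_id, "format": "json"}
    entity = requests.get(API, params=params).json()["entities"][lexeme_id]
    rows = []
    for form in entity.get("forms", []):
        rep = form["representations"].get(lang, {}).get("value", "")
        # Grammatical features come back as raw Q-ids; resolving them to
        # labels would take one more API call.
        features = ", ".join(form["grammaticalFeatures"])
        rows.append((features, rep))
    for features, rep in sorted(rows):
        print(f"{features:<40} {rep}")

form_table("L42", "en")  # "L42" is a placeholder lexeme ID
```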
[16:40:37] Agreed. This was in the original plan, but turned out to be too much of a hassle to develop.
[16:40:37] I'm surprised there's no gadget yet to provide such a view. (re @Nikki: It would be nice if Wikidata were at least nice *enough* that I could see errors without needing to load a lexeme using a separate tool. …)
[16:47:02] Another point to consider: not all Wikidata items can be included in Abstract Wikipedia.
[16:47:17] Only encyclopedic information can be involved.
[16:48:32] I propose to put ZObjects in Wikilambda and use them to create Wikipedia articles in Abstract Wikipedia, and to put the non-encyclopedic information in Wikispore.
[17:01:08] Are you referring to Tabernacle? (re @amire80: I was talking about something else, but I forgot its name. I'll recall it if I see it...)
[17:01:47] Probably not. Oof :( I should have bookmarked it.
[17:26:48] +1 (re @amire80: Probably not. Oof :( I should have bookmarked it.)