[14:58:31] I'm not sure I understand the name discussion.
[14:58:50] Is there a particular target date for the decision?
[14:58:56] And who will make it?
[15:31:12] hi @amire80 - we'll send out a draft of the call for the naming decision quite soon, and then it should be clearer.
[15:48:46] :) (re @wmtelegram_bot: hi @amire80 - we'll send out a draft of the call for the naming decision quite soon, and then it should be clearer.)
[16:24:34] Re this one discussion (re @amire80: You may have read the robot-written article in The Guardian. In case you haven't, here it is: https://amp.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3)
[16:26:35] I have read the articles and the discussion, and I think there is one major difference from the examples discussed there, which is that in Abstract Wikipedia we won't let a neural network come up with anything; instead, we will have a symbolic, high-fidelity natural language generation system.
[16:26:38] I am not sure that marking the text visually as generated is necessary. But it sure will be possible, and I think it should be in the hands of the individual communities to decide whether they want it to be marked up or not.
[16:27:01] In the end, we are also not marking up text generated through templates right now.
[16:27:49] So, yes, if the communities want to do so, they can, as visibly as they like, but I don't think it should be a product decision to force the communities to make it super visible. (Beyond what is needed to invite contribution.)
[16:27:55] Does this make sense?
[18:10:55] We don't mark content generated by templates as such, but this content is usually not quite text.
[18:10:56] It can be very short text ("citation needed"), or labels in an infobox, or slightly longer text in hatnotes ("This biography of a living person needs additional citations for verification."), and so on.
[18:10:57] It is not 100% integrated into the prose. It looks more like part of the user interface.
[18:10:58] In fact, I heard that the English Wikipedia, and possibly some others, have a policy that article content is not supposed to be written in the template namespace. (I'm not entirely sure whether it's a written policy or an implicit community agreement. If anyone can clarify that, I'll be very thankful.) (re @wmtelegram_bot: In the end, we are also not marking up text generated through templates right now.)
[18:11:40] AW plans to go beyond that: to generate prose. (Although I do suspect that some people will use it for generating more technical and UI things.)
[18:13:28] We absolutely have to mark it in both a human-readable and a machine-readable way. Even though it's predictable and not based on artificial "intelligence". That's the bare minimum that has to be done. We should probably do even more to show that this was not written by humans. (re @wmtelegram_bot: I am not sure that marking the text visually as generated is necessary. But it sure will be possible, and I think it shou…)
[18:14:46] Mass-creating thousands of long texts in multiple languages is a whole other league, different from anything we do now.
[18:15:18] People will think that it's the same as the rest of Wikipedia, but it isn't.
[18:16:05] Why do we restrict the work of Wikilambda to supporting Abstract Wikipedia?
[18:16:19] I am thinking of another application in medicine.
[18:17:10] For example, a measure such as body mass index in Wikidata can be linked to its Z-Object or formula.
[18:18:04] The input variables of the Z-Object can be Wikidata items.
[18:18:29] e.g. for body mass index, the inputs are mass and height.
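As a concrete illustration of the body mass index example: something like the following Scribunto-style Lua module could compute it already today. This is a minimal sketch; the module name, argument names, and output format are assumptions, and the actual Z-Object calling convention may well differ.

    -- Module:BMI (hypothetical name)
    -- BMI = mass (kg) / height (m)^2. In practice the inputs could come
    -- from Wikidata statements (mass is P2067, height is P2048) rather
    -- than from template arguments.
    local p = {}

    function p.bmi(frame)
        local mass = tonumber(frame.args.mass)
        local height = tonumber(frame.args.height)
        if not mass or not height or height <= 0 then
            return 'invalid input'
        end
        return string.format('%.1f', mass / (height * height))
    end

    return p

Invoked as {{#invoke:BMI|bmi|mass=70|height=1.75}}, this would render 22.9.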
[18:23:31] Another application can be the characterization of medical classifications in the Z-Objects.
[18:25:11] Here's just one reason to mark it as "not written by humans": let's say that a function that generates texts about famous scientists has a bug that produces incorrect spelling in Persian. It affects thousands of articles. It can be fixed, but:
[18:25:11] 1. To be fixed, it has to be noticed and reported first. If it's marked as "generated by an algorithm, not generated by humans", it can also have a "report a mistake" button that would be different from usual editing.
[18:25:12] 2. People who read it may think that this is the correct spelling ("it appears in an encyclopedia, so it must be good"), or they may think that the people who write this encyclopedia are dumb and don't know the language. A mistake made by an algorithm is perceived differently from a mistake made by a human. (And yes, an algorithm is made by a human, but it's not the same kind of mistake.)
[18:25:14] 3. Texts are often used for feeding machine translation engines. A machine-generated text must be treated differently. When the bug is fixed, the machine translation engine should be updated.
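To make "both human-readable and machine-readable" concrete, the marking could take roughly this shape, in the same hypothetical Lua style. The CSS class, the data attribute, and the notice text are all invented for illustration, not an agreed format.

    -- Wrap generated prose so that readers see a label and machines
    -- (machine translation engines, bots) can detect and filter it.
    local p = {}

    function p.wrapGenerated(frame)
        local text = frame.args.text or ''
        local source = frame.args.source or 'unknown-function'
        -- The data attribute is the machine-readable part; the visible
        -- notice, which could carry a "report a mistake" link that
        -- bypasses normal editing, is the human-readable part.
        return string.format(
            '<div class="generated-text" data-generated-by="%s">%s' ..
            '<small class="generated-text-notice">machine-generated text' ..
            ' - report a mistake</small></div>',
            source, text)
    end

    return p

A machine translation pipeline could then skip or re-fetch anything carrying data-generated-by, which addresses point 3 above: when the generating function is fixed, the consumers know which texts to refresh.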
[18:25:57] Is your second point directed towards anyone working with Scots? :-) (re @amire80: Here's just one reason to mark it as "not written by humans"…)
[18:29:06] That's not what I was thinking about when I wrote it. But yes, it's a consideration, too. It is quite certain that many text-generating functions will be written by people who don't actually know the language. Sometimes they will have good intentions, but the results will still be bad. And sometimes the result may turn out to be good sooner or later. But we have to take precautions. (re @mahir256: Is your second point directed…)
[19:21:10] It can be done with a Lua module already now :)
[19:21:10] Except that modules today can't be easily shared across wikis; a cross-wiki repository to share templates and modules between the WMF projects is planned: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Tasks :) (re @Csisc1994: For example, a measure such as body mass index in Wikidata can be linked to its Z-Object or formula)
[19:32:14] Developing this thought even further: mistakes can be made by people who know the language well and by people who don't know the language well. In fact, in absolute numbers, people who know the language well will probably make more mistakes, because they will probably write more code. And to be even more precise and general, you don't even have to call them "mistakes", but "code that produces output that for whatever reason is not wanted".
[19:32:14] But the code doesn't care. Code just produces output. People need a constant reminder that they should care. (re @mahir256: Is your second point directed towards anyone working with Scots? :-))
[19:36:02] That's not a thing I'd leave to the communities, because they'll forget to do it. It's probably okay to let communities customize everything, but by default it should be clearly marked. It won't be controversial. I can't imagine any community hating this labeling so much that they want to hide it. It's much more likely that they'll love it, because it clearly distinguishes their human manual efforts from machine-generated text.
[20:07:12] There will hopefully be so many ways that functions from the Wiki of functions are used that having a hardcoded, single frame they all have to go through just seems neither feasible nor attractive. There might be complex editorial decisions, which I really would leave to the local communities to decide upon.
[20:08:22] As you said, we will need some UX element that allows getting into a specialized editing mode. So they will be marked that way.
But in the end it is up to the communities to decide how much visual difference they want to put on the generated content. I wouldn't want to make this decision for all Wikipedias and other projects, in advance, and for all different cases.
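One way to square "clearly marked by default" with community control, again as a hypothetical Lua sketch: always emit the machine-readable wrapper, but resolve the visible label through a local message page, so each community can reword it, restyle it through its local CSS, or tone it down, without the marking itself ever going away. The message name 'generated-text-notice' is an assumption.

    local p = {}

    function p.mark(frame)
        -- The wrapper and its class are always present; only the label
        -- text and its styling are community-editable.
        local notice = mw.message.new('generated-text-notice'):plain()
        return string.format(
            '<div class="generated-text">%s<small>%s</small></div>',
            frame.args.text or '', notice)
    end

    return p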