[04:11:54] Hi Al - I'm generally on board with the idea of creating non-Wikidata lexemes; in particular, because Denny recently pointed out to me that Wikidata can never provide lexemes for all named entities. He also suggested there could be some smallish new types for named entity lexemes, and a few simple functions that can be called to create instances of those types. (As one example, there could be a function that takes the QID of a named entity and a language, retrieves relevant info from Wikidata, and puts the entity name into one of these simple new types, along with the relevant grammatical features that it figures out for the given language, such as gender.) We didn't have time to explore the details; not saying it would all be trivial. Anyway, if you have a moment, I'm interested to hear more about what new built-in function would be needed for the language-focused type that you mentioned above. (re @Al: Fair enough. Denny and I had this discussion on the original type proposal and our conclusion was: It does make sense, yes. But...)
[05:14:05] mhh, what are "non-Wikidata lexemes" here?
[08:40:10] Could do… but it becomes self-referential. I don’t see it co-existing with Z89; it would just re-define it with additional Keys for the required tag and attributes. The innovation would be in the converters to code, where the ill-defined type-union would be resolved by reducing an HTML fragment to a string. They might also formalise the conversion of a Z11 to a language span (as a string). Other types of object should probably be rejected, but (somewhere upstream) the string from a display function should be tagged with the requested language, by wrapping it as either a Z11 or a language span. (re @u99of9: That sounds like a whole new Type. Do you want to write it up?)
[08:59:21] I also don't think it should coexist in the long run.
But IMO we are so far down the road with the old one that changing it will break a lot, and we may be better off deprecating it and out-competing it. I also think a proposal will help: it would explain the details and demonstrate support. Otherwise the devs will have to interpret from Telegram.
[09:05:37] But I do agree that adding extra keys is not terrible. (re @Al: Could do… but it becomes self-referential. I don’t see it co-existing with Z89; it would just re-define it with additional Keys ...)
[09:10:36] Can the text node be an html Type object too? So kind-of recursive bracketing? (re @Al: Now you mention it, there was never a Type proposal for Z89, was there? 🤔 A proper HTML type would support a tag and an attribut...)
[09:12:26] (bear in mind, I last learned anything new about html 30 years ago)
[09:12:57] Sounds about right. I’m happy to go into further detail but it’s difficult to know where to draw the line. The most basic version is just a fusion of existing built-ins: something like Z6820 (or Z30120) + Z6830 + Z6005, for a context where the language is known. But even here, we could usefully incorporate Z24144, so that the initial fetch is limited to a few languages, including “mul”, and remains identical through the fallback chain. As Denny suggests, there will typically be no associated lexeme in any language, so even including the empty list from Z6830 (P5137, per language) is a high-value signal. In the case where there is exactly one linked lexeme (per language or for the first language), the lexeme itself could be included. This could be controlled by a Z6030, I suppose. (re @David: Hi Al - I'm generally on board with the idea of creating non-Wikidata lexemes; in particular, because Denny recently pointed ou...)
[09:15:23] They can be. Maybe I’ll propose a variation to the current type. (re @u99of9: But I do agree that adding extra keys is not terrible, especially if they can be blank.)
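A minimal sketch of the converter idea from the 08:40:10 message, assuming a "content" value may be a plain string, a monolingual text, or a nested HTML fragment, and that anything else is rejected. The class names `MonolingualText` and `HtmlFragment` and the function `content_to_string` are hypothetical stand-ins for Z11, a redefined Z89, and a converter; this is not the actual Wikifunctions converter code.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Union


@dataclass
class MonolingualText:
    """Hypothetical stand-in for a Z11: a language code plus a string."""
    language: str
    text: str


@dataclass
class HtmlFragment:
    """Hypothetical stand-in for a Z89 redefined with tag and attribute Keys."""
    tag: str
    content: "Content"
    attributes: dict = field(default_factory=dict)


# The "type union" that the type definition itself cannot enforce today.
Content = Union[str, MonolingualText, HtmlFragment]


def content_to_string(content: Content) -> str:
    """Reduce a content object to a native string; fail for other types."""
    if isinstance(content, str):
        return escape(content)
    if isinstance(content, MonolingualText):
        # A monolingual text becomes a language span, serialised as a string.
        return f'<span lang="{escape(content.language)}">{escape(content.text)}</span>'
    if isinstance(content, HtmlFragment):
        attrs = "".join(
            f' {name}="{escape(value)}"' for name, value in content.attributes.items()
        )
        inner = content_to_string(content.content)  # the recursive bracketing
        return f"<{content.tag}{attrs}>{inner}</{content.tag}>"
    raise TypeError(f"unsupported content type: {type(content).__name__}")
```

For example, `content_to_string(HtmlFragment("b", MonolingualText("fr", "Château")))` yields a bold element wrapping a French language span, while passing an integer raises `TypeError`. Whether a list of strings should also be accepted in the content position is left open here.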
[09:22:02] Haha, I just saw Z33745. I can't believe we didn't have that already!!
[09:31:57] Oh, I unknowingly reverted @vrandecic who disconnected Z34047. My simplistic view is that it's the only implementation that passes both tests. So connect? (re @u99of9: Haha, I just saw Z33745. I can't believe we didn't have that already!!)
[09:33:31] In effect, yes… but, strictly speaking, no. That’s the recursiveness. In the type definition, the position of the “text node” would be occupied by a “content” object rather than a string, but only Z6, Z11 or Z89 would be valid there. We can’t currently enforce that in the type definition, but the converters would be built to fail for invalid types and would always pass only a native string to code (for the “content” Key value). Unless we want to support a list of strings in that position, that is. (re @u99of9: Can the text node be an html Type object too? So kind-of recursive bracketing?)
[09:37:25] Does that mean we would be better off with "union types" first? (re @Al: In effect, yes… but, strictly speaking, no. That’s the recursiveness. In the type definition, the position of the “text node” wo...)
[09:39:50] That’s fine. Once *T423853* is fixed, Z33747 should be preferred. (re @u99of9: Oh, I unknowingly reverted @vrandecic who disconnected Z34047. My simplistic view is that it's the only implementation that pass...)
[09:41:48] Yes. But I wouldn’t delay on that account. (re @u99of9: Does that mean we would be better off with "union types" first?)
[15:56:05] Hi @NicolasVIGNERON - As I understand it (I didn't start the idea), these would be instances of some new types that could be created on Wikifunctions, which would be similar to Wikidata lexemes but probably simpler and lighter weight, and they would exist to provide lexicographic info for words/phrases that don't show up in Wikidata lexemes. I believe named entities are a "poster child" motivating this idea.
They were mentioned and discussed a bit here (https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Wikidata_based_types#Discussion). (re @NicolasVIGNERON: mhh, what are "non-Wikidata lexemes" here?)
[16:05:31] [more specifically, here https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Wikidata_based_types#c-DVrandecic_(WMF)-20240716191000-GrounderUK-20240716105100 @NicolasVIGNERON] (re @David: Hi @NicolasVIGNERON - As I understand it (I didn't start the idea), these would be instances of some new types that could be cre...)
[17:26:27] mhh, I'm not sure I really understand the idea...
[17:26:27] if there is lexical data to store, then it's probably eligible for Lexemes, and if not, just taking the item label might be enough? or am I missing some special case? (re @David: Hi @NicolasVIGNERON - As I understand it (I didn't start the idea), these would be instances of some new types that could be cre...)
[17:42:55] Does this mean that the Wikifunctions community has to build up entirely new Wikifunctions-based lexemes from scratch? (re @Al: [more specifically, here https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Wikidata_based_types#c-DVrandecic_(WMF)...)
[17:44:22] that's a bit what I fear, it would be counter-productive... (re @acanthamoeba_castellanii: Does this mean that the Wikifunctions community has to build up entirely new Wikifunctions-based lexemes from scratch?)
[17:44:51] I am not so sure if people would like to do redundant work though... (re @acanthamoeba_castellanii: Does this mean that the Wikifunctions community has to build up entirely new Wikifunctions-based lexemes from scratch?)
[17:46:38] there is maybe room for having pseudo-lexemes for things that can't be lexemes (even though I can't think of an example), so redundancy of _content_ would be avoided but there would still be some _structural_ redundancy (re @acanthamoeba_castellanii: I am not so sure if people (perhaps language specific community) would like to do redundant work though...)
[17:51:11] I think the aim would be to avoid exactly that. Presumably we do not want every person and place to be its own lexeme, but some languages will require grammatical features to be available, so there appears to be a middle ground where an item label is not enough (in some language) and a full lexeme is neither available nor desirable. (re @acanthamoeba_castellanii: Does this mean that the Wikifunctions community has to build up entirely new Wikifunctions-based lexemes from scratch?)
[17:52:17] The special case is when some grammatical features of the pseudo-lexeme (that cannot be stored as a real lexeme on Wikidata, like for example names of people) should be directly taken from the Wikidata item (for example, the grammatical gender) (re @NicolasVIGNERON: mhh, I'm not sure I really understand the idea... if there is lexical data to store, then it's probably eligible for Lexemes and...)
[17:55:04] people, maybe not (but it might depend)
[17:55:05] but places can absolutely be in Lexemes (some languages already did some big imports) (re @Al: I think the aim would be to avoid exactly that. Presumably we do not want every person and place to be its own lexeme, but some ...)
[17:55:53] But it’s not clear that items can or should have a grammatical gender, or any other feature that can vary by language. (re @dvd_ccc27919: The special case is when some grammatical features of the pseudo-lexeme (that cannot be stored as a real lexeme on Wikidata, lik...)
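One way to picture the special case described at 17:52:17 is a small language-specific function that turns already-fetched item data into a lightweight lexeme-like object. This is only a sketch: the `PseudoLexeme` type, the shape of the `item` dictionary, and the rule that French grammatical gender for a person's name follows the item's sex-or-gender value are all assumptions made for illustration, and no call to Wikidata is made.

```python
from dataclasses import dataclass, field


@dataclass
class PseudoLexeme:
    """Hypothetical lightweight type: a lemma plus grammatical features."""
    lemma: str
    language: str
    features: dict = field(default_factory=dict)
    source_item: str = ""  # QID the label came from; not a Wikidata lexeme ID


# Assumed rule for one language only; other languages would need their own.
FRENCH_PERSON_GENDER = {"male": "masculine", "female": "feminine"}


def french_person_pseudo_lexeme(item: dict) -> PseudoLexeme:
    """Build a pseudo-lexeme from pre-fetched item data (hypothetical shape)."""
    labels = item["labels"]
    # Prefer the French label, then fall back to the "mul" label.
    lemma = labels.get("fr") or labels["mul"]
    features = {}
    gender = FRENCH_PERSON_GENDER.get(item.get("sex_or_gender", ""))
    if gender is not None:
        # The feature is computed by the language function, not stored on the item.
        features["gender"] = gender
    return PseudoLexeme(lemma=lemma, language="fr", features=features, source_item=item["qid"])
```

Given `{"qid": "Q937", "labels": {"mul": "Albert Einstein"}, "sex_or_gender": "male"}`, the function returns a pseudo-lexeme with lemma "Albert Einstein" and a masculine gender feature. When the item has no usable value, the feature is simply left out rather than guessed.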
[17:57:44] In fact, pseudo lexemes would very likely be created by language-specific functions (re @Al: But it’s not clear that items can or should have a grammatical gender, or any other feature that can vary by language.)
[17:58:14] so things like "Albert Einstein"? hmm, I see... thanks
[17:58:15] isn't all the data needed already on the corresponding item? (label and sex-gender for instance) what value would a "pseudo-lexeme" add? (like, are there languages where the grammatical and/or the semantic gender would be the sex-gender) (re @dvd_ccc27919: The special case is when some grammatical features of the pseudo-lexeme (that cannot be stored as a real lexeme on Wikidata, lik...)
[17:59:36] A pseudo-lexeme would be a useful tool to handle this information internally between function calls (re @NicolasVIGNERON: so things like "Albert Einstein"? hmm, I see... thanks isn't all the data needed already on the corresponding item? (label and ...)
[18:01:03] oh, like a first function would retrieve "label : Albert Einstein, gender : male" and pass it to some other function, interesting
[18:01:05] now I think I get it, and indeed it sounds useful and interesting (re @dvd_ccc27919: A pseudo-lexeme would be a useful tool to handle this information internally between function calls)
[18:01:54] When a place (or person… entity) has a lexeme, there is no problem, other than a deficiency in the explicit forms, especially the regular ones (like periphrastic tenses). (re @NicolasVIGNERON: people, maybe not (but it might depend) but places can absolutely be in Lexemes (some languages already did some big imports))
[18:02:40] I support the idea that the current Lexeme type should be enough, as long as we allow them to be created ex novo by Wikifunctions functions
[18:04:43] Yes, so long as the provenance is explicitly not Wikidata, for those lexemes and forms that are not fetched from Wikidata.
(re @dvd_ccc27919: I support the idea that the current Lexeme type should be enough, as long as we allow them to be created ex novo by Wikifunctions ...)
[18:09:45] Truthfully, I still think that the current lexicographical data on Wikidata might need to be improved further, because the lexical category is being put directly under a particular lexeme. A lot of languages are root/stem based, and repeatedly creating new lexemes that are spelled exactly the same but have a different lexical category could be redundant, so if Wikifunctions is going to have a new form of lexicographical data then perhaps this aspect could be considered too
[18:12:50] Yes, there was a lot to be said for the Wiktionary organisation around homographs. (re @acanthamoeba_castellanii: Truthfully, I still think that the current lexicographical data on Wikidata might need to be improved further, because the lexic...)
[18:12:57] This would mean, for example, that Z6095 should be allowed to store an ID that is not a valid Wikidata LID (in order to uniquely identify pseudo lexemes) (re @dvd_ccc27919: I support the idea that the current Lexeme type should be enough, as long as we allow them to be created ex novo by Wikifunctions ...)
[19:10:29] Hello.
[19:11:20] Still working for my exams. But I have done some research on punctuation and I want to share the following links.
[19:11:56] https://arxiv.org/abs/cmp-lg/9506012
[19:11:57] https://www.sciencedirect.com/science/article/abs/pii/S0378216602001893
[19:11:59] https://philarchive.org/archive/SAYCAT
[19:12:00] https://www.researchgate.net/profile/Ted-Briscoe/publication/2284341_The_Syntax_and_Semantics_of_Punctuation_and_its_Use_in_Interpretation/links/00463519e130fba7bd000000/The-Syntax-and-Semantics-of-Punctuation-and-its-Use-in-Interpretation.pdf
[19:12:02] https://eric.ed.gov/?id=ED208404
[19:12:27] These works present a point of view about how to represent punctuation in a universal context
[19:13:39] Each work proposes a different approach with multiple points in common. I think that NLG SIG may be interested in having these works for developing punctuation for the Universal Language it works on.
[19:15:10] I think that it will probably be useful to use these works for implementing punctuation.
[20:06:36] Another concern has been that realistically, will there ever be lexemes for all place names, in all languages? If so, that's great and I guess that part of the discussion is done. If not, maybe there could be a non-Wikidata lexeme type for this purpose, and functions to create them, which cleverly figure out needed grammatical features like gender, at least when those features conform to some rules. (But I don't know if that's even feasible, linguistically.) I presume these functions would be used on demand when articles are generated. Also, if the needed info is already available in a Wikidata lexeme, these functions would not be used at all. (re @NicolasVIGNERON: people, maybe not (but it might depend) but places can absolutely be in Lexemes (some languages did already some big import))
[20:12:21] An idea is to use RegEx to identify gender. For example, "Château de ****" is masculine because Château meaning palace is masculine.
(re @David: Another concern has been that realistically, will there ever be lexemes for all place names, in all languages? If so, that's gre...)
[20:13:08] Doing this for palaces, streets, hotels, buildings, and so on can be useful.
[20:13:41] Might work for some languages. Others, not so much. (re @Csisc1994: An idea is to use RegEx to identify gender. For example, "Château de ****" is masculine because Château meaning palace is mascul...)
[21:15:21] Even if every place had a lexeme (which I think is impossible, since I guess that there are at least millions of places in the world, especially if you give a wide definition of "place"), there would still be the problem of people's names (re @David: Another concern has been that realistically, will there ever be lexemes for all place names, in all languages? If so, that's gre...)
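The RegEx suggestion from 20:12:21 could look roughly like the sketch below, assuming French place names whose first word is a common noun of known gender. The rule table and the function name `guess_french_gender` are made up for illustration; only the "Château" rule comes from the discussion, and the function returns `None` whenever no rule applies, which is the expected outcome for people's names and for languages where such rules do not hold.

```python
import re
from typing import Optional

# Illustrative French rules only: the head noun of the name decides the gender.
FRENCH_HEAD_NOUN_RULES = [
    (re.compile(r"^Château\b", re.IGNORECASE), "masculine"),
    (re.compile(r"^Hôtel\b", re.IGNORECASE), "masculine"),
    (re.compile(r"^Rue\b", re.IGNORECASE), "feminine"),
]


def guess_french_gender(name: str) -> Optional[str]:
    """Return a grammatical gender for a place name, or None if no rule matches."""
    cleaned = name.strip()
    for pattern, gender in FRENCH_HEAD_NOUN_RULES:
        if pattern.search(cleaned):
            return gender
    return None  # no rule applies; a caller needs another source for the feature
```

With these rules, "Château de Versailles" is classified as masculine and "Rue de Rivoli" as feminine, while "Albert Einstein" gets no answer at all.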