[01:29:12] A user brought up on the project chat that CJK languages don't have spaces between sentences. How do we plan to handle this? [01:34:39] Maybe have a function "List of sentences into HTML fragment" with a language parameter, so that it can have no spaces for CJK languages. [01:35:40] This is what I was thinking too. We ought to wrap that content in a paragraph and accept all text types (string, fragment, monolingual text) [01:37:58] I've created Z33068 [04:42:55] ko and vi have spaces [04:44:33] (Well, actually vi is vi-Latn and ko is ko-Kore, so they don't count) [04:48:10] Also need to handle Latin script for some Sinitic languages, e.g., cdo-Latn, hak-Latn, nan-Latn-(pehoeji|tailo). [04:51:34] It's me (re @Feeglgeef: A user brought up on the project chat that CJK languages don't have spaces between sentences. How do we plan to handle this?) [04:52:04] Wonder why the Natural language Objects don't have "script" keys [04:52:11] Will this make it harder to edit? (re @wmtelegram_bot: Maybe have a function "List of sentences into HTML fragment" with a language parameter, so that it can have no spaces...) [04:52:27] In what way? (re @OverflowCat: Will this make it harder to edit?) [04:53:06] like, you have to unfold the function to edit nested ones [04:53:43] My plan was to replace the paragraph function with a function that can handle the differences between languages (re @OverflowCat: like, you have to unfold the function to edit nested ones) [04:53:49] So I don't think so [04:55:45] Another thing is whether words of dialects should have separate lexemes [04:56:40] yes for 汉语方言 (Chinese dialects) [05:00:15] I guess you mean something like "different Sinitic languages" ( `zh-*` vs. `hak-*` ) instead of something more like "dialects" based on mutual intelligibility ( `hak-*-CN` vs. `hak-*-TW` )?
(re @OverflowCat: Another thing is whether words of dialects should have separate lexemes) [05:00:57] Yes, but I mean Mandarin dialects as well (re @Winston_Sung: I guess you mean something like "different Sinitic languages" ( zh-* vs. hak-* ) instead of something more like "dialects" based on ...) [05:01:32] Wikidata supports lexemes without an ISO code [05:02:28] If a word's usage is the same across Sinitic languages, should it still be created multiple times? [05:03:10] (I guess you mean BCP 47 language tag, because something without a BCP 47 variant tag but with an ISO 639 language code/BCP 47 language subtag would still be supported on Wikidata.) (re @OverflowCat: Wikidata supports lexemes without an ISO code) [05:06:26] Lexemes don't necessarily need a code; when you create one, the lexeme's language field uses a Q item (re @Winston_Sung: (I guess you mean BCP 47 language tag because something without a BCP 47 variant tag but with an ISO 639 language code/BCP 47 langu...) [05:06:50] I remember there are discussions about this⁽¹⁾ and this⁽²⁾. [05:06:52] Some users said we should use the same Wikidata Lexeme for zh-Hans + zh-Hant, while some users said we should separate them into different Lexemes. [05:06:53] For zh-* vs. hak-*, I remember they are always in different Lexemes. [05:06:57] this⁽¹⁾ (re @OverflowCat: Yes, but I mean Mandarin dialects as well) [05:07:02] this⁽²⁾ (re @OverflowCat: If a word's usage is the same across Sinitic languages, should it still be created multiple times?) [05:09:18] I mean the thing shown there: [05:09:19] https://www.wikidata.org/wiki/Lexeme:L501727 [05:09:20] https://www.wikidata.org/wiki/Lexeme:L988684 (re @OverflowCat: Lexemes don't necessarily need a code; when you create one, the lexeme's language field uses a Q item) [05:20:19] (1) I think simplified and traditional should still be merged. Otherwise, wouldn't POJ also need its own separate lexeme?
[05:20:20] (2) So the criterion for this boundary is the language tag, and zh maps to Modern Standard Chinese? (re @Winston_Sung: I remember there are discussions about this⁽¹⁾ and this⁽²⁾. [05:20:22] Some users said we should use the same Wikidata Lexeme for zh-Hans + zh-...) [05:54:51] (1) Pe̍h-ōe-jī now uses the same Lexeme as Traditional Han and Tâi-lô in Hokkien. [05:54:52] (2) zh maps to Modern Standard Mandarin at the moment. (re @OverflowCat: (1) I think simplified and traditional should still be merged. Otherwise, wouldn't POJ also need its own separate lexeme? [05:54:53] (2) s...) [05:59:47] Some considerations: [05:59:49] * How to handle zh-Hans and zh-Hant (and possibly zh-Hant-HK) when they use different words/terms/glossaries? zh-Hans-CN + zh-Hant-CN? [05:59:50] * Considering the possibility of adding Hanyu Pinyin to Lexemes, there are different pronunciations. Add the different pronunciations in the same Lexeme or split them? + zh-Latn-CN-pinyin + zh-Latn-TW-pinyin? (re @OverflowCat: (1) I think simplified and traditional should still be merged. Otherwise, wouldn't POJ also need its own separate lexeme? [05:59:52] (2) s...) [06:16:19] Language Variant Conversion through Wikifunctions sounds interesting. I would like to try it for ms-Latn and ms-Arab too in the (not so near) future. [06:21:39] zh-TW-t-zh-Hant-CN-x-gujibiao (re @Winston_Sung: Some considerations: [06:21:40] * How to handle zh-Hans and zh-Hant (and possibly zh-Hant-HK) when using different words/terms/glossaries?...) [06:22:36] Maybe it's time to split the conversion table in MediaWiki core. [06:59:34] I tried to add German. It works sometimes, but times out frequently. (re @u99of9: Does anyone want to translate Z32919 into another language in the next week so I can show off creation of an article like https:...)
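The "List of sentences into HTML fragment" idea from the start of the discussion can be sketched roughly as below. This is a minimal illustration, not the actual Z33068 implementation; the function name, the script table, and the per-language defaults are all assumptions (ko defaults to ko-Kore and vi to vi-Latn, as noted above, so both keep spaces, while Latin-script Sinitic tags like nan-Latn-tailo also keep them).

```python
# Sketch: join a list of sentences with or without spaces depending on
# the script of the language tag. Table and defaults are illustrative
# assumptions, not the real Wikifunctions (Z33068) logic.

# Scripts conventionally written without inter-sentence spaces.
NO_SPACE_SCRIPTS = {"Hani", "Hans", "Hant", "Jpan", "Hira", "Kana"}

# Default script when the tag carries no explicit script subtag.
DEFAULT_SCRIPT = {"zh": "Hant", "ja": "Jpan", "ko": "Kore", "vi": "Latn"}

def sentences_to_html(sentences, lang):
    """Join sentences into a <p> fragment for the given BCP 47 tag."""
    subtags = lang.split("-")
    # A script subtag is exactly four letters, e.g. "Latn" in nan-Latn-tailo.
    script = next(
        (s for s in subtags[1:] if len(s) == 4 and s.isalpha()),
        DEFAULT_SCRIPT.get(subtags[0], "Latn"),
    )
    separator = "" if script in NO_SPACE_SCRIPTS else " "
    return "<p>" + separator.join(sentences) + "</p>"
```

With this sketch, `sentences_to_html(["你好。", "再见。"], "zh-Hant")` joins with no separator, while `"nan-Latn-tailo"` and `"ko"` keep spaces.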
[07:39:39] Consider asking in the lexicographic channel for input (re @Winston_Sung: Some considerations: [07:39:40] * How to handle zh-Hans and zh-Hant (and possibly zh-Hant-HK) when using different words/terms/glossaries?...) [07:55:29] What do you mean by gu-ji-biao/gu-ji-bi-ao/gu-ji-bia-o, by the way? (re @zauberviolino: zh-TW-t-zh-Hant-CN-x-gujibiao) [07:59:16] Working on the dump script I created in December. I'm interested in comparing the number of functions, implementations, and tests over time 😀 : https://tools-static.wmflabs.org/bridgebot/f1ff9907/file_79165.jpg [08:03:09] Since December, WMF changed the rate limit, it seems. I got 429 when using the undocumented API to get the test results, so I slowed it down and added a queue. Now it takes forever even for 100 ZIDs out of the 3.8k total 🙈 [08:04:53] 10 min for 100 ZIDs equals 6.5 h total running time. [08:05:55] That is not very practical. (re @Npriskorn: 10 min for 100 ZIDs equals 6.5 h total running time.) [08:06:28] I have a tool that allows you to perform JSONata queries on the objects (so you can figure out essentially any statistic), and I ran into this (re @Npriskorn: Since December, WMF changed the rate limit, it seems. I got 429 when using the undocumented API to get the test results, so I slow...) [08:06:36] This is exactly the number of function calls for a Start-Class page. (re @Npriskorn: 10 min for 100 ZIDs equals 6.5 h total running time.) [08:06:48] I slowed it down a lot, maybe more than needed, as I'm still testing. I'll try to crank it up a bit when this run is done (re @Csisc1994: That is not very practical.) [08:08:11] We somehow need to find a way to store the output of functions in the back end and add a refresh button to refresh a specific function when it is updated. [08:09:20] Isn't the rate limit only for logged-out users? (re @Npriskorn: Since December, WMF changed the rate limit, it seems. I got 429 when using the undocumented API to get the test results, so I slow...) [08:10:00] No.
It exists in Abstract Wikipedia as well. Only in Wikifunctions when logged in. (re @NicolasVIGNERON: Isn't the rate limit only for logged-out users?) [08:12:02] Wait, are you saying that rate limits exist even for logged-in users on Abstract Wikipedia? Wouldn't that be concerning, given the number of function calls on some abstract articles? [08:12:14] would you be willing to write a ticket? (re @Csisc1994: We somehow need to find a way to store the output of functions in the back end and add a refresh button to refresh a specific fu...) [08:13:12] You can write a ticket. (re @Npriskorn: would you be willing to write a ticket?) [08:13:49] But I can take a screenshot as proof. [08:14:48] got rate limited again even though I slowed it down to a snail's pace: https://tools-static.wmflabs.org/bridgebot/d656f464/file_79167.jpg [08:15:19] Are you setting a user agent? [08:16:05] I think we should file a ticket for documented, stable APIs for getting function results and test statuses, so we can use the system even without the website. [08:16:12] yep (re @Jan_ainali: Are you setting a user agent?) [08:16:24] Unrelated, but I feel like the function catalogue of Wikifunctions will get more and more unsustainable as time goes on. I feel like using categories for functions would be a much better approach, especially with the influx of new functions from Abstract Wikipedia's release. [08:18:47] Unfortunately, categories can't currently be put on function pages, only talk pages. And then in the categories all you see are ZIDs... (re @wmtelegram_bot: Unrelated, but I feel like the function catalogue of Wikifunctions will get more and more unsustainable as time goes ...) [08:18:57] Hmm, "low" is not really helpful on [[mw:Wikimedia_APIs/Rate_limits#Limits]] [08:19:42] I agree it's not sustainable as it is now. [08:19:43] We have requested structured data for ZIDs before, but it got rejected. [08:19:44] I'm not sure categories have been suggested.
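For the rate-limited dump script, the usual pattern is client-side throttling with exponential backoff on HTTP 429. The sketch below is a hedged illustration (the delay values and cap are assumptions, not WMF's documented limits); the second helper just reproduces the running-time arithmetic from above: at 100 ZIDs per 10 minutes, 3.8k ZIDs take roughly 6.3 hours.

```python
# Sketch of client-side throttling for a scraping script that keeps
# hitting HTTP 429. The delay values are illustrative assumptions,
# not WMF's documented limits.

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff: 1 s, 2 s, 4 s, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))

def estimated_hours(total_zids, zids_per_batch=100, minutes_per_batch=10):
    """Total running time at the pace observed above (100 ZIDs per 10 min)."""
    return total_zids / zids_per_batch * minutes_per_batch / 60

# estimated_hours(3800) is about 6.3 hours, matching the estimate above.
```

On a 429 response, sleep for `backoff_delay(attempt)` and retry; setting a descriptive User-Agent, as suggested above, is the other half of the etiquette.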
[08:19:46] They could perhaps be embedded in the ZID JSON? (re @wmtelegram_bot: Unrelated, but I feel like the function catalogue of Wikifunctions will get more and more unsustainable as time goes ...) [08:20:30] Maybe it could be embedded into the JSON in some form? [08:21:08] As mainspace ZID pages are entirely JSON [08:21:27] https://tools-static.wmflabs.org/bridgebot/cbf3d030/file_79168.jpg [08:21:39] https://tools-static.wmflabs.org/bridgebot/8cb909c6/file_79169.jpg [08:21:42] https://tools-static.wmflabs.org/bridgebot/20b57d91/file_79170.jpg [08:22:09] Screenshots at t=0min, t=4min, and t=8min [08:23:05] I find most functions by searching. For this it helps to have lots of aliases. So whenever I want to purge the cache for a function, I add an alias or two to nudge it. [08:26:27] Thanks for the link, I'll try logging in; perhaps that solves the issue. (re @Jan_ainali: Hmm, "low" is not really helpful on [[mw:Wikimedia_APIs/Rate_limits#Limits]]) [08:29:26] I feel like there should be some way to add edit summaries to connections or disconnections without having to use that wikilambda-source script. (Sorry, I think I'm just using this now to air my grievances with Wikifunctions) [08:35:02] If you add your “grievances” to [[Wikifunctions:Project chat]] we can tag them with the corresponding Phabricator ticket. (re @wmtelegram_bot: I feel like there should be some way to add edit summaries to connections or disconnections without having to use that...) [08:39:56] The more the merrier, perhaps… but if there’s more than one, you can just shuffle them. Given that we should always have a long form and a short form, there will always be two. (I just made that rule up.) (re @u99of9: I find most functions by searching. For this it helps to have lots of aliases. So whenever I want to purge the cache for a funct...)
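The "embed categories in the ZID JSON" idea could look something like the sketch below. To be clear, the `categories` key is purely hypothetical: no such key exists in the real persistent-object (Z2) schema today, and this only illustrates the shape of the proposal.

```python
import json

# Hypothetical sketch: tag a persistent ZID object with categories by
# adding an extra top-level key. The "categories" key does NOT exist
# in the real Z2 schema; it only illustrates the idea discussed above.

def add_categories(zobject_json, categories):
    """Return the object's JSON with the given categories merged in."""
    obj = json.loads(zobject_json)
    merged = set(obj.get("categories", [])) | set(categories)
    obj["categories"] = sorted(merged)
    return json.dumps(obj, ensure_ascii=False)

def get_categories(zobject_json):
    """Read the hypothetical categories list back out of the JSON."""
    return json.loads(zobject_json).get("categories", [])
```

Because mainspace ZID pages are entirely JSON, as noted above, a tool could index such a key without any wikitext category support; the open question is getting it into the schema.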
[10:13:43] My current implementation is to [10:13:44] * Add a language parameter if the other parameters only call something multilingual, e.g., a Wikidata entity, or if there is more than 1 string parameter [10:13:46] * Use monolingual text for a single string parameter (re @OverflowCat: also there is a lang code param in monolingual text) [10:16:26] For example, [10:16:28] https://www.wikifunctions.org/wiki/Z32790 [10:34:29] Nice… and I think Z16277 is fixed (for now). (re @Winston_Sung: For example, [10:34:29] https://www.wikifunctions.org/wiki/Z32790) [10:39:27] I hope it doesn't add extra clicks if we don't want to edit the summary. I'm quite happy with the way it works now. (re @wmtelegram_bot: I feel like there should be some way to add edit summaries to connections or disconnections without having to use that...) [10:42:26] I just disconnected Z23758 and then had to add a couple of aliases to Z16277 just to explain why… that was quite a few extra clicks! (re @u99of9: I hope it doesn't add extra clicks if we don't want to edit the summary. I'm quite happy with the way it works now.) [10:44:50] Maybe disconnections typically deserve explanations but connections don't? (re @Al: I just disconnected Z23758 and then had to add a couple of aliases to Z16277 just to explain why… that was quite a few extra cli...) [10:45:46] I’d say you just hit the nail on the head 😎👍 (re @u99of9: Maybe disconnections typically deserve explanations but connections don't?) [10:48:56] While everyone is here, can you explain where we landed with language configurations? It sounds like the previous composition cannot be supported by V2. But maybe "quoting" something will fix it? But then we have another, slower implementation anyway?? [10:49:49] It should be easier if we have something like "$1 of $2" format strings, so we don't fragment strings ("lego messages").
(re @Winston_Sung: For example, [10:49:50] https://www.wikifunctions.org/wiki/Z32790) [11:05:18] Yeah, but there seems to be some caching that still shows failures in the test cases. (re @Al: Nice… and I think Z16277 is fixed (for now).) [11:06:11] No. I think someone misunderstood the issue. I hope T419789#11782917 clarifies. It might technically be possible to call a code function with a quoted Z60, but I think that’s blocked by *T409229*, so I’m in no hurry to try. The team should fix the immediate issue, fix the integration-testing failure by omission, and work on a scalable solution for configuration by language. If the first point is addressed without the second and third being addressed, someone will have to raise a separate ticket. No doubt some of the larger configurations are already bringing Abstract Wikipedia to its knees, but I can’t see any way round that for the time being. (re @u99of9: While everyone is here, can you explain where we landed with language configurations? It sounds like the previous composition ca...) [11:09:17] Thanks. Sorry, I hadn't read your reply in the task until after I posted here. This makes more sense now. (re @Al: No. I think someone misunderstood the issue. I hope T419789#11782917 clarifies. It might technically be possible to call a code ...) [11:16:41] The Swedish case now results in [11:16:41] 瑞典是位于国家的欧洲。 (re @Winston_Sung: Yeah, but there seems to be some caching that still shows failures in the test cases.) [11:17:47] Z32903 (re @Al: The Swedish case now results in [11:17:47] 瑞典是位于国家的欧洲。) [11:28:10] Well… we could hardcode a ZID-to-language-tag lookup in the JavaScript, I suppose 🤔 (re @u99of9: Thanks. Sorry I hadn't read your reply in the task until after I posted here. This makes more sense now.) [12:01:29] Oops.
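The "$1 of $2" idea, i.e. MediaWiki-style positional format strings instead of concatenating fragments ("lego messages"), can be sketched as below. The helper name is illustrative, and the Chinese pattern is only an assumed example of letting each language reorder the slots, which is exactly what the mis-ordered Swedish output above lacks.

```python
import re

# Sketch: MediaWiki-style "$1 of $2" positional substitution, so each
# language supplies one whole pattern instead of concatenated fragments.
def format_message(pattern, *args):
    """Replace $1, $2, ... in the pattern with the positional arguments."""
    return re.sub(r"\$(\d+)", lambda m: str(args[int(m.group(1)) - 1]), pattern)

# English and Chinese order the slots differently but share arguments:
#   format_message("$1 is a country located in $2", "Sweden", "Europe")
#   format_message("$1是位于$2的国家。", "瑞典", "欧洲")
```

Because the whole sentence lives in one pattern per language, slot order is a translation decision rather than a composition decision.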
(re @Al: The Swedish case now results in [12:01:31] 瑞典是位于国家的欧洲。) [12:03:35] This avoids measure words, though (re @Al: The Swedish case now results in [12:03:35] 瑞典是位于国家的欧洲。) [12:06:26] Oh, the parameter values were incorrect in the test case. (re @Al: The Swedish case now results in [12:06:28] 瑞典是位于国家的欧洲。) [12:07:54] Ah, that’s a reassuring “failure” then 😎👍 (re @Winston_Sung: Oh, the parameter values were incorrect in the test case.) [12:10:13] 《古籍印刷通用字规范字形表》 (GB/Z 40637-2021), the national standard for traditional characters in China. But the use of the Tongyong Guifan Hanzi Biao is also widespread. So I guess you can say there are two zh-Hant-CN variants (re @Winston_Sung: What do you mean by gu-ji-biao/gu-ji-bi-ao/gu-ji-bia-o, by the way?) [12:26:54] I've fixed the parameter order, but it still shows as failed. (re @Al: Ah, that’s a reassuring “failure” then 😎👍) [12:39:44] I think that was just the caching. Both tests now show as passing. (re @Winston_Sung: I've fixed the parameter order, but it still shows as failed.) [12:40:33] Yeah. Thanks. [12:43:10] By the way, don't just write "zh"; use a specific language tag like "zh-Hans", "zh-Hant", or "zh-Hant-HK" whenever possible. (re @Al: I think that was just the caching. Both tests now show as passing.) [12:46:36] Did I do that? I’m sorry. The tag for Z1589 is just “zh-hk”. Is that incorrect? Please file a task on Phabricator, in that case. (re @Winston_Sung: By the way, don't just write "zh"; use a specific language tag like "zh-Hans", "zh-Hant", or "zh-Hant-HK" whenever possible.) [12:52:48] zh-hk: Wikimedia [12:52:49] zh-Hant-HK: BCP 47 (re @Al: Did I do that? I’m sorry. The tag for Z1589 is just “zh-hk”. Is that incorrect? Please file a task on Phabricator, in that case.) [13:16:04] I’ve added that in Z1589 as an “en” alias and the “mul” label.
(re @Winston_Sung: zh-hk: Wikimedia [13:16:04] zh-Hant-HK: BCP 47) [15:03:34] This would be a better approach, probably: [15:03:35] https://www.wikifunctions.org/wiki/Z33048 [15:03:37] https://www.wikifunctions.org/wiki/Z33037 (re @OverflowCat: Creating 9 different functions just for Mandarin Chinese will be insane) [16:15:37] Is "A woman is a grown gal." the expected output of https://abstract.wikipedia.org/wiki/Q467 ? [16:16:35] No, I think it's a flaw in lexeme search [16:17:45] I think for this case just getting the English item labels should be enough, so I don't know why we're fetching lexemes [17:21:52] where is the 'gal' coming from? [17:23:54] L1504131, I think [17:24:13] (especially L1504131-S1#P5137) [18:04:47] huh, and L1504131 has P5137 set to Q84048852, even though L3340 has it set to Q3031. [18:08:05] and the fact that about 30% of the time the "Generated text" box yields "Reached max retries. Try again later." (with no manual retries on my part) is known, yes? [18:23:58] I have now changed this so gal has 'girl' rather than 'female human' as its item. (re @abartov: huh, and L1504131 has P5137 set to Q84048852, even though L3340 has it set to Q3031.) [18:24:29] (and, er, removed 'Kuh' from 'female human's aliases.) [18:37:56] Not sure we have a ticket for that one yet. (re @abartov: and the fact that about 30% of the time the "Generated text" box yields "Reached max retries. Try again later." (with no manual retri...) [18:47:04] That exists as slang in Danish too (ko). In Swedish the rough equivalent is kossa (according to ChatGPT; I haven't actually heard it) (re @abartov: (and, er, removed 'Kuh' from 'female human's German aliases.)) [19:50:52] oh, I don't doubt its use as a pejorative (indeed, I have been aware it exists), and a Lexeme should indeed document it, but the Wikidata *item* should not have pejoratives as aliases! [19:50:52] Just as Q42884 should not have 'Krauts' as an alias, nor Q384593 'pig'.
(re @Npriskorn: that exists as a pejorative slang in Danish also (ko). In Swedish the equivalent is kossa (according to chatgpt, I never actuall...) [20:28:52] 4716 [20:32:22] It seems we do: https://phabricator.wikimedia.org/T420630 (re @Npriskorn: Not sure we have a ticket for that one yet.) [20:49:12] I think that is incorrect. The sense in question is for “girl or woman”, so it was correctly linking to the item for that sense, “female human”. In a neutral context, I would say a “gal” is more likely to be an adult than a child. (re @abartov: I have now changed this so gal has 'girl' rather than 'female human' as its item.) [21:10:29] 'girl' itself in actual usage is also quite likely to refer to an adult, despite its original denotation. (re @Al: I think that is incorrect. The sense in question is for “girl or woman”, so it was correctly linking to the item for that sense ...) [21:11:01] Would it be possible to create a gadget that applies different styles to functions, implementations, and test cases in recent changes and contributions? [21:20:04] Perhaps L3340 should have an additional sense that links to Q84048852, but the existing sense of L1504141, L1504141-S1, is not a sense whose item should be Q3031. (re @abartov: 'girl' itself in actual usage is also quite likely to be an adult, despite its original denotation.) [21:21:09] L1504131 (re @wikilinksbot: L3340 – girl [en: English] [21:21:10] Q84048852 – female human [21:21:11] L1504141 – piektdiena [lv: Latvian] [21:21:13] L1504141-S1 [21:21:14] Q3031 👧 girl)
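The zh-hk vs. zh-Hant-HK exchange earlier is the familiar mismatch between Wikimedia's legacy variant codes and BCP 47 tags. A small lookup table covers it; note that only the zh-hk / zh-Hant-HK pair is confirmed in the discussion above, and the other rows are the commonly cited equivalents, included as assumptions to be verified.

```python
# Sketch: map Wikimedia's legacy Chinese variant codes to BCP 47 tags.
# Only zh-hk -> zh-Hant-HK comes from the discussion above; the other
# entries are the commonly cited equivalents and should be verified.
WIKIMEDIA_TO_BCP47 = {
    "zh-cn": "zh-Hans-CN",
    "zh-sg": "zh-Hans-SG",
    "zh-my": "zh-Hans-MY",
    "zh-tw": "zh-Hant-TW",
    "zh-hk": "zh-Hant-HK",
    "zh-mo": "zh-Hant-MO",
}

def to_bcp47(code):
    """Return the BCP 47 tag for a Wikimedia code, or the code unchanged."""
    return WIKIMEDIA_TO_BCP47.get(code.lower(), code)
```

A hardcoded table like this is also roughly what the "ZID-to-language-tag lookup in the JavaScript" suggestion above would amount to, just keyed by ZID instead of legacy code.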