[09:55:48] Alphos: It's better to respond on Wiki
[10:27:11] multichill seems a bit too intertwined with edits that shouldn't be reverted for rollbot to come in
[10:31:57] abbe98[m], hey, are you here?
[10:37:16] anyone here know how I can form a SPARQL query to sort by someone's age?
[10:37:53] if they're still alive, sort by their date of birth
[10:37:57] Jhs: ?person wdt:P569 ?dob. ……… } ORDER BY ?dob
[10:38:24] WikidataFacts, Alphos, they're all dead
[10:38:30] otherwise, BIND the subtraction to an ?age variable, and sort by that
[10:38:38] or: ?person wdt:P569 ?dob. OPTIONAL { ?person wdt:P570 ?dod. } } ORDER BY (COALESCE(?dod, NOW()) - ?dob)
[10:38:48] (untested, may have bugs)
[10:39:05] Alphos, right, i've tried that, but I think I'm messing up the syntax
[10:39:22] I have:
[10:39:27] SELECT ?item ?itemLabel ?age
[10:39:28] WHERE
[10:39:28] {
[10:39:28] ?item wdt:P106 wd:Q219477 .
[10:39:28] ?item wdt:P27 wd:Q20 .
[10:39:28] ?item wdt:
[10:39:30] ?item wdt:P569 ?dob . ?item wdt:P570 ?dod . BIND(YEAR(?dod)-YEAR(?dob) as ?age
[10:39:30] short url
[10:39:32] SERVICE wikibase:label { bd:serviceParam wikibase:language "nb,nn,en,fi" }
[10:39:36] }
[10:40:00] http://tinyurl.com/ycmh48wx
[10:40:30] http://tinyurl.com/yatfcj37
[10:40:43] but that’s a lot of people with age 0 and 1…
[10:41:22] http://tinyurl.com/ybc5gwjw
[10:41:42] WikidataFacts lots of people with the same dob as dod
[10:42:11] thank you very much!
[10:42:19] who despite their short lives, worked as missionaries, impressive :P
[10:42:22] improved age calculation: http://tinyurl.com/y9b8n2wa
[10:42:30] nikki, hehe
[10:42:35] this is exactly why i needed this query ;)
[10:42:40] (365.2425 is the average length of a year according to the Gregorian calendar)
[10:43:00] there were lots of children in the database I imported, and I removed most of them beforehand, but some got through
[10:43:23] Jhs: so is this a database of missionaries, except also with their children?
[10:43:55] WikidataFacts, yup.
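The age calculation discussed above (date of death, or today for the living, minus date of birth, measured in days and divided by 365.2425) can be sketched in plain Python. This is an illustrative reimplementation of the SPARQL `COALESCE(?dod, NOW()) - ?dob` logic, not the actual query behind the tinyurl links; the function name is mine.

```python
from datetime import date

GREGORIAN_YEAR_DAYS = 365.2425  # average Gregorian year length, as noted in the chat


def age_in_years(dob, dod=None, today=None):
    """Mirror of the SPARQL logic: COALESCE(?dod, NOW()) - ?dob, in days / 365.2425."""
    today = today or date.today()
    end = dod if dod is not None else today  # COALESCE(?dod, NOW())
    return (end - dob).days / GREGORIAN_YEAR_DAYS


# Someone who died on their date of birth gets age 0 -- these are the
# "lots of people with the same dob as dod" cases mentioned in the chat.
print(age_in_years(date(1900, 1, 1), date(1900, 1, 1)))   # 0.0
print(round(age_in_years(date(1900, 1, 1), date(1970, 1, 1))))  # 70
```

Note that subtracting full dates this way avoids the off-by-up-to-a-year error of the earlier `YEAR(?dod)-YEAR(?dob)` attempt.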
[11:38:51] hey halfak, you seem to be all over the place! ;-)
[11:39:02] o/ :D
[11:39:07] I hope in a good way :)
[11:39:22] Hack, cite, metrics, etc
[11:39:37] * halfak does all the things! \o/
[11:39:50] But yeah, I'm pretty exhausted. Some hard tradeoffs in trying to do it all.
[11:39:51] I was wondering. Your quality model uses the suggestions. Does that also include qualifier suggestions?
[11:40:13] It actually doesn't use suggestions yet. I've been working through the suggestions system in order to use it.
[11:40:48] Seems the model has probably learned a bit of suggestions without explicit signal, but I think that'll make a big difference.
[11:40:56] For https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings/Collection/Rijksmuseum I seem to be running out of suggestions for a lot of paintings :-)
[11:41:10] So right now, I'm working with Sjoerd on thinking through the things suggestions doesn't do great.
[11:41:28] Qualifier suggestions are currently broken btw
[11:41:34] Oh awesome. Do you think that the suggestions system is doing a good job and this content space is reaching item completeness?
[11:41:43] It's correct in the DB and API, but the interface isn't showing the right ones
[11:41:57] (it doesn't grab the right data)
[11:43:14] halfak: https://phabricator.wikimedia.org/T102324
[11:45:43] The suggestion system helps to get from a baby item to a quite grown item. Doesn't mean it's done. Probably hitting the Pareto principle there
[11:50:27] multichill, gotcha. I've been thinking about a somewhat simple extension to the suggestion system that I think will have a lot of fitness.
[11:51:01] I'd use conditional (rather than independent) probability based on instance-of and subclass-of values for each property correlation.
[11:53:22] multichill, one of the other ideas I've been working on is training the property recommender on seemingly high quality items.
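The "conditional rather than independent probability" idea above can be sketched as follows: instead of ranking properties by how often they occur globally, rank them by how often they occur on items sharing the target's instance-of/subclass-of values. This is a toy sketch with invented data structures, not the Wikibase PropertySuggester implementation.

```python
from collections import defaultdict


def conditional_suggestions(items, target_classes):
    """Rank properties by P(property | item has one of target_classes)."""
    prop_counts = defaultdict(int)
    n_matching = 0
    for item in items:
        # item: {"classes": set of P31/P279 values, "properties": set of PIDs}
        if item["classes"] & target_classes:
            n_matching += 1
            for prop in item["properties"]:
                prop_counts[prop] += 1
    if n_matching == 0:
        return []
    return sorted(((p, c / n_matching) for p, c in prop_counts.items()),
                  key=lambda pair: -pair[1])


# Toy corpus: humans (Q5) usually carry P569 (date of birth); a painting doesn't.
items = [
    {"classes": {"Q5"}, "properties": {"P569", "P27"}},
    {"classes": {"Q5"}, "properties": {"P569"}},
    {"classes": {"Q3305213"}, "properties": {"P170"}},
]
print(conditional_suggestions(items, {"Q5"}))
# [('P569', 1.0), ('P27', 0.5)] -- P569 ranks first for humans
```

Conditioning on class membership is what lets a property like P569 dominate for humans while never being suggested for paintings.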
[11:53:34] This would boost the probability for properties that only exist on high quality items.
[11:53:49] * nikki wishes the suggestions thing wouldn't just break on certain properties
[11:54:53] nikki, Sjoerd tells me that the qualifier recommender doesn't work on properties that were just added to an item. Is that what you are talking about, or something else?
[11:55:32] I mean something else: try adding a new statement on https://www.wikidata.org/wiki/Q11532584 and see how it suggests absolutely nothing, not even instance of/subclass of
[11:56:41] nikki, interesting. It seems like we should be able to recommend at least instance-of and subclass-of by default if it isn't present.
[11:56:48] That should be 100% certainty!
[11:56:55] that's what I'd like it to do
[11:56:57] You're gonna have one of those wo.
[11:57:51] nikki, looks like that is in the list I'm working from https://www.wikidata.org/wiki/User:Sjoerddebruin/Entity_suggester
[11:58:01] Did you want to look into doing some of the technical work?
[11:59:06] I wouldn't know how to do the technical work :/
[11:59:41] but if someone comes up with a new version to try out, I'm happy to test it
[11:59:58] also I was the one who added it to that page :D
[12:01:39] Cool! I'll keep you in mind for reviewing lists of sample recommendations. :)
[12:03:09] halfak: It's very much a McDonald's system right now, not haute cuisine ;-)
[12:03:59] Introducing bias towards high quality/very complete items is a good next step
[12:04:04] I think the no-suggestions thing happens when the only statements use properties which are on the excluded list (ones which are too broadly used to produce useful suggestions), but instead of treating it the same as no statements, it gets confused: it sees statements but no properties with suggestions, and suggests nothing at all
[12:04:40] nikki, aha! That makes sense and should be an easy fix (he says, not knowing for sure)
[12:05:27] lol multichill. Good way to describe it.
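The bug nikki describes, and the "easy fix", can be sketched like this: when every property on the item is on the excluded list, fall back to the same defaults an empty item would get (instance of / subclass of) instead of returning nothing. The property IDs on the excluded list here are hypothetical, and this is my illustration of the fix, not the actual PropertySuggester code.

```python
EXCLUDED = {"P1448"}  # hypothetical: properties too broadly used to give useful signal
DEFAULT_SUGGESTIONS = ["P31", "P279"]  # instance of / subclass of


def suggest(item_properties, cooccurrence):
    """cooccurrence maps a present property to the properties it predicts."""
    usable = [p for p in item_properties if p not in EXCLUDED]
    if not usable:
        # The bug described above is effectively returning [] here.
        # The fix: treat this like an empty item and fall back to defaults.
        return [p for p in DEFAULT_SUGGESTIONS if p not in item_properties]
    suggestions = []
    for p in usable:
        for s in cooccurrence.get(p, []):
            if s not in item_properties and s not in suggestions:
                suggestions.append(s)
    return suggestions


# An item whose only statement uses an excluded property still gets suggestions:
print(suggest({"P1448"}, {}))  # ['P31', 'P279']
```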
I think I'll be looking at the data processing side of suggestions pretty soon, so hopefully we'll upgrade to something on the level of a good pub burger soon.
[12:05:50] glorian_wd, ^ fyi
[12:07:15] halfak: We talked about the item maturing from an empty item to a featured item this weekend. Some tools work really well on baby items, but completely mess up mature items
[12:07:20] Not sure how to document that
[12:07:58] For example, using PetScan to get some initial statements to sort things out works really well, but if you do those edits on already filled items you'll probably mess things up
[12:08:38] multichill, interesting. My sense is that a good "epic" task on Phab would be a good start.
[12:08:43] * halfak creates.
[12:08:46] So if you expose the maturity in a way that it can be used to query and filter, tools can focus on the right items to work on
[12:10:07] So you could just set up a query like https://www.wikidata.org/wiki/User:Multichill/Empty_items_with_Dutch_label to find not very mature items in a certain field
[12:12:07] A possible technical implementation would be to measure it, say on a scale from 0 (empty) to 100 (most mature item), and store it in the page_props table. That also exposes it in SPARQL
[12:12:34] https://phabricator.wikimedia.org/T166426 "Improve tools for completing an item on Wikidata"
[12:12:58] multichill, we can use ORES for the scoring.
[12:13:09] It won't be perfect, but it'll be useful.
[12:13:18] And then iterate from there as ORES gets smarter.
[12:13:50] Should be a flexible scale
[12:14:09] So a very good item scores 90 now; the same item, unchanged, might only score 50 in a couple of years
[12:14:31] I'll file a task for this case
[12:15:20] +1 ORES works that way for article quality too. The scale changes. History looks different depending on how we feel about things now. :)
[12:20:17] halfak: https://phabricator.wikimedia.org/T166427 feel free to completely change it ;-)
[12:21:08] BLP would be an interesting case too.
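The 0–100 maturity scale proposed above, and the idea that tools should filter on it, can be sketched as follows. The scoring heuristic here is entirely invented for illustration (the real score would come from ORES, per the discussion); the point is the shape of the interface: a clamped 0–100 number that bulk-editing tools can threshold on.

```python
def maturity_score(n_statements, n_labels, n_references, max_score=100):
    """Toy 0-100 maturity heuristic -- invented weights, stand-in for ORES."""
    raw = 10 * n_statements + 2 * n_labels + 5 * n_references
    return min(max_score, raw)


def safe_for_bulk_edits(item, threshold=30):
    """Tools adding initial statements (e.g. via PetScan) should skip mature items."""
    return maturity_score(**item) < threshold


baby = {"n_statements": 1, "n_labels": 2, "n_references": 0}     # scores 14
mature = {"n_statements": 9, "n_labels": 20, "n_references": 4}  # capped at 100
print(safe_for_bulk_edits(baby), safe_for_bulk_edits(mature))    # True False
```

Storing such a number in `page_props` (as suggested above) would make the same threshold usable from SPARQL, so queries like the "empty items with Dutch label" page could filter on it directly.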
[12:23:14] I think we should rename this model from "item quality" to "item completeness"
[12:23:22] The ORES one
[12:46:59] halfak: o/ . I just had my lunch
[12:47:29] Hey! Just wanted you to know about the discussions above re. property suggestions.
[12:47:59] halfak: yeah. I have quickly read it :)
[12:48:01] You don't need to think about this yet. Your current work is plenty. I'll be thinking about this if and when we get to next steps for you re. property suggestions :)
[13:36:50] halfak: o/
[13:36:55] https://usercontent.irccloud-cdn.com/file/63pR86y0/
[13:37:13] This is what I have achieved so far: I added one entry called "high probability properties" for the wbsgetsuggestions API on my local MediaWiki
[13:37:30] because it runs on my local install, the PIDs are dummy values
[13:38:59] OK, two thoughts. Why do you need an additional field for this?
[13:39:14] Well… let's just start with that thought.
[13:42:28] halfak: because hoo said they still use the original property suggester entry (the one which does not generate property suggestions if the item already has many properties) for some purpose. So I feel I shouldn't overwrite that entry
[13:42:33] and instead create a new entry
[13:43:02] Maybe we could add a flag for "include_all_suggestions" or something like that.
[13:43:25] halfak: oh. Do you mean add a flag on the API?
[13:43:34] API URL*
[13:44:01] Yes
[13:44:10] something like "api.php?action=wbsgetsuggestions&entity=Qxxx&include_all_suggestions=true"
[13:44:12] ?
[13:44:17] Default would be to filter out properties that already exist.
[13:45:19] halfak: got it. Then I have to dig into how to do it
[13:46:16] anomie can probably help too.
[13:46:31] He's sort of the de facto product owner for api.php
[13:46:54] halfak: kk
[14:14:48] halfak: it just came to my mind: I think we should not include all properties for a given property pair, because the result could yield a long page. For instance, the property pair P31: Q5 yields 100 recommended properties.
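The `include_all_suggestions` flag being proposed above boils down to this filtering rule: by default, drop suggestions for properties the item already has; with the flag set, return everything. The `include_all_suggestions` parameter name comes from the chat, but this function is my sketch of the intended semantics, not the actual wbsgetsuggestions implementation.

```python
def wbs_get_suggestions(all_suggestions, existing_properties,
                        include_all_suggestions=False):
    """Default: filter out properties already on the item.
    With the flag set, return everything (client filters instead)."""
    if include_all_suggestions:
        return list(all_suggestions)
    return [p for p in all_suggestions if p not in existing_properties]


suggested = ["P569", "P570", "P27"]
existing = {"P27"}
print(wbs_get_suggestions(suggested, existing))        # ['P569', 'P570']
print(wbs_get_suggestions(suggested, existing, True))  # ['P569', 'P570', 'P27']
```

Making the unfiltered behavior opt-in is what preserves compatibility with the existing JavaScript client, which expects the already-filtered list.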
Probably we should set some filter, such as only including properties with a probability greater than or equal to 0.5.
[14:19:52] ah wait. Just realized we can set the limit on the API.
[14:20:16] right. and using a continuation
[16:37:27] halfak: what would be the problems if we keep showing all suggested properties in a new entry?
[16:38:04] bandwidth, and we'd have to change the JavaScript in Wikidata to filter on the client side.
[16:38:34] If we add a flag to give us all the properties, then we don't waste bandwidth and we preserve compatibility.
[16:39:47] halfak: ok
[17:40:25] :D
[17:43:45] hoo: can you post our current entity suggester blacklist?
[17:44:10] Sure, one second
[17:46:25] sjoerddebruin: https://phabricator.wikimedia.org/P5494 ;)
[17:48:06] hoo: oops, sorry. I meant your local workaround file for the calculations. Blame the weather. :P
[17:50:06] Sure
[17:52:55] sjoerddebruin: https://phabricator.wikimedia.org/P5495
[17:53:16] Saying it has rough edges is an understatement
[17:53:43] But it might help with getting a view of how the current system works.
[17:54:50] I hope so, yes
[18:01:52] fyi halfak: those are the fixes we've talked about. We currently exclude external IDs, some set of properties, and the P31 suggestions for certain date properties, to make the suggestions a little bit better to work with ^
[20:58:22] Jesus, VIAF is really fast with merging.
[20:59:33] Two weeks. https://www.wikidata.org/w/index.php?title=Q29907218&type=revision&diff=491238801&oldid=485401851
[21:01:10] We should have a way to track this
[21:05:20] sjoerddebruin: RKD is much faster! :P
[21:05:32] Still, I'm impressed.
[21:05:49] Haha, yes, it sure is nice to see
[21:19:28] hoo|away: https://gerrit.wikimedia.org/r/#/c/354246/ is marked as merge conflict…
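The two trimming mechanisms discussed above (a probability threshold, and a `limit` with continuation) combine naturally: threshold first, then page through what remains. This is a sketch of that behavior under my own invented function signature, using an offset-style continuation for illustration.

```python
def paged_suggestions(scored, threshold=0.5, limit=10, continue_from=0):
    """scored: list of (property, probability) pairs, sorted descending.
    Returns one page plus the offset for the next request (None when done)."""
    kept = [(p, s) for p, s in scored if s >= threshold]
    page = kept[continue_from:continue_from + limit]
    more = continue_from + limit < len(kept)
    next_offset = continue_from + limit if more else None
    return page, next_offset


# 100 suggestions, like the P31: Q5 example above; only the strong ones survive
scored = [("P%d" % i, 1.0 - i * 0.01) for i in range(100)]
page, cont = paged_suggestions(scored, threshold=0.5, limit=10)
print(len(page), cont)  # 10 10
```

This keeps any single response small (the bandwidth concern above) while still letting a client that wants everything keep following the continuation.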