[06:49:39] Hey everyone. I'm trying to use Wikidata to look up brand logos programmatically. I've written a SPARQL query, https://gist.github.com/clanstyles/0d56096e73d175b312e72ec48531bda6, which keeps timing out. I'm also open to any and all suggestions as to why ?logo_image is sometimes blank.
[07:02:14] styles_: http://tinyurl.com/y6vd6knk works well for me
[07:02:52] ?logo_image would be blank because you're not actually assigning anything to ?logo_image
[07:03:52] gotcha
[07:05:19] I'm actually trying to switch up the query now
[07:05:30] I'm noticing the FILTER with the lower-case search is pretty slow
[07:05:37] https://gist.github.com/clanstyles/f0339c3d1a47cc7e7cd6042c8f879a69
[07:06:04] This isn't working, but the idea was to look up the official website. The problem is the website can be http://www.google.com, https://www.google.com, https://www.google.com/ etc.
[07:06:09] there are a ton of variations
[07:06:13] Can I just look it up by host?
[07:06:40] https://www.wikidata.org/wiki/Property:P856
[07:21:13] styles_: http://tinyurl.com/ydf6zzpo ?
[07:21:29] https://gist.github.com/clanstyles/f17b225d1a2ba23b580a41c7030656ca
[07:21:36] I came up with this, but it works on NBC and not on DSW
[07:21:41] I bet the data is tagged weird
[07:22:19] It's missing the website
[07:22:59] reosarevok, what does the SERVICE statement do?
[07:23:28] It picks one label, basically: ?item rdfs:label ?itemLabel would give you one result per language label, IIRC
[07:25:51] reosarevok, try menswearhouse.com as the input for yours
[07:25:55] It doesn't work in this case
[07:26:04] https://www.wikidata.org/wiki/Q3305660
[07:26:10] But it is listed as an official website under this company
[07:29:51] styles_: sure, but there's no logo image
[07:30:04] (there's "image", but that's different)
[07:30:10] Yeah, some of these aren't in the system or are missing data
[07:30:13] yeah
[07:31:02] That item also seems crappy: the two brands, Tailored Brands and Men's Wearhouse, should probably have separate items
[07:31:16] But I haven't heard of either before, so :D
[07:31:21] haha
[07:31:30] Basically I have a list of 40k brands and need to get logos for them
[07:31:47] this seems to have a 10% success rate
[07:32:19] Try "pizzahut.com"
[07:32:26] it's got a logo & website and it fails
[07:34:36] I'm starting to wonder if Google would be a better choice
[07:58:32] styles_: the logo is set as deprecated because they haven't used that logo since 1999
[07:58:41] yeah
[07:58:46] I found a company, Clearbit, that does this
[07:58:52] I'm reaching out to them; their dataset seems pretty complete
[12:08:58] 15.6 million edits last month. http://wikidata.wikiscan.org/date/201802/stats
[12:09:08] Wait, I could say 15.7 :P
[12:27:13] Thiemo_WMDE, hi, did you see my comment on Phab? :)
[12:34:45] SMalyshev: are there problems with Elasticsearch updating atm?
[12:37:41] Or is this related to https://phabricator.wikimedia.org/T188595?
[12:41:02] Lydia_WMDE: can you check if it is just me? New items aren't showing up in search and terms aren't getting updated.
[12:41:27] sjoerddebruin: ohnoes
[12:41:28] looking
[12:41:30] I can't even save a new item as a value!
[12:41:39] sjoerddebruin: even with the Q-ID?
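The pizzahut.com failure at 07:58 comes from rank: the truthy `wdt:P154` predicate only returns best-rank statements, so a logo marked deprecated never appears. A minimal sketch, assuming the standard Wikidata Query Service prefixes (`wdt:`, `p:`, `ps:`, `wikibase:`, `bd:`), of a query builder that reads every logo (P154) statement through `p:`/`ps:` and returns its rank so the caller can decide whether an old logo is acceptable. The batching helper and its size of 200 websites per request are hypothetical choices for working through a 40k-brand list without hitting the query timeout, not tested limits.

```python
def build_logo_query(websites: list[str]) -> str:
    """Return a WDQS query mapping official-website IRIs to logos, deprecated ones included."""
    values = " ".join(f"<{w}>" for w in websites)
    return f"""SELECT ?item ?itemLabel ?website ?logo ?rank WHERE {{
  VALUES ?website {{ {values} }}
  ?item wdt:P856 ?website .
  # p:/ps: reaches every P154 statement; wdt:P154 would skip deprecated-rank logos.
  OPTIONAL {{
    ?item p:P154 ?logoStatement .
    ?logoStatement ps:P154 ?logo ;
                   wikibase:rank ?rank .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def batches(items: list[str], size: int = 200) -> list[list[str]]:
    """Split a long website list into chunks, one query per chunk (hypothetical batch size)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    for chunk in batches(["https://www.pizzahut.com/", "https://www.nbc.com/"]):
        print(build_logo_query(chunk))
```

Keeping the logo in an OPTIONAL block means brands that have a website but no logo still come back, which makes it easier to separate "brand not on Wikidata" from "brand found, logo missing" when measuring the success rate.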
[12:41:45] Yes
[12:41:50] ok checking
[12:42:01] Trying to add https://www.wikidata.org/wiki/Q50286889 as family name to https://www.wikidata.org/wiki/Q50185873
[12:43:13] Hm, now it shows up.
[12:43:35] Oh wait, pasting the wrong URL
[12:46:23] sjoerddebruin: https://www.wikidata.org/wiki/Q4115189
[12:46:31] i just added a statement with a new item there
[12:46:40] using the Q-ID
[12:47:18] sjoerddebruin, i had that earlier today as well. after 10 minutes they showed up
[12:48:07] Jhs: that is probably https://phabricator.wikimedia.org/T183053
[12:48:20] but i understood sjoerddebruin's issue to be a bit different
[12:48:21] hmmm
[12:48:24] It's working now; it showed up after ~15 minutes.
[12:48:37] ok so then likely an update issue with elastic
[12:48:48] could be an increase in job queue rate / delay
[12:48:51] stas is working on reducing the time it takes for the index to update
[12:48:52] Yeah, they are rebooting a lot of servers, it seems.
[12:48:56] * aude waves :)
[12:49:01] https://grafana.wikimedia.org/dashboard/db/job-queue-rate?orgId=1
[12:49:02] hey aude :)
[12:49:04] ah!
[12:49:10] yeah that makes sense then i guess
[12:49:16] sjoerddebruin: thanks for the ping
[12:49:21] i assume it'll fix itself then
[12:49:22] maybe an increase in editing on wikidata
[12:50:39] looking at https://phabricator.wikimedia.org/T183053 it seems stas merged the necessary patch
[12:50:45] https://grafana.wikimedia.org/dashboard/db/job-queue-health?refresh=1m&orgId=1
[12:50:48] so hopefully the underlying issue is solved soon
[12:51:35] ah, that patch was just merged today
[12:51:43] so it would go out next wednesday
[12:53:29] additionally the elastic cluster is being restarted, so delays may be higher than usual
[12:54:53] * gehel is jumping on that discussion here as well...
[12:55:56] yep, we are restarting all elasticsearch servers, and this has some impact...
[12:56:18] ok thanks
[13:04:42] for most wikis, this is entirely transparent. But it seems that wikidata relies more heavily on low indexing lag
[13:05:07] note to self: better communicate the planned maintenance of elasticsearch to wikidata
[13:05:20] :D
[13:05:52] but sadly, we are still going to have maintenance, and it is still going to generate indexing lag...
[13:06:04] sorry :/
[13:06:06] Now I'm aware, so I'll do other things in the meantime.
[22:36:04] Lydia_WMDE: Howdy, just to confirm: is the wikidata team ready for the review of the WikibaseLexeme extension? (I'm a little confused by the bug being marked WIP)
[22:36:23] and just to keep the information flowing, the restart of the elasticsearch cluster is complete; search should not be lagging anymore
[22:36:33] well, at least not until the next restart...