[05:24:44] @SothoTalKer did you resolve it?
[05:58:59] PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:59:25] PROBLEM - Blazegraph process -wdqs-blazegraph- on wdqs1010 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[05:59:27] PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs1010 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[05:59:37] PROBLEM - WDQS HTTP Port on wdqs1010 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time
[06:00:17] RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[06:00:45] RECOVERY - Blazegraph process -wdqs-blazegraph- on wdqs1010 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[06:00:47] RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs1010 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[06:00:57] RECOVERY - WDQS HTTP Port on wdqs1010 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.031 second response time
[10:40:40] the project chat should be more browsable again
[11:04:56] sjoerddebruin: I left an update on the WikiWetenschappers page about ShEx... your input is very much appreciated
[11:05:08] egonw_: will take a look :)
[11:54:14] pintoch: if you are curious how maxlag works now... https://grafana.wikimedia.org/d/000000170/wikidata-edits?refresh=1m&orgId=1&from=now-6h&to=now
[11:54:31] you can clearly see quickstatements respecting it :)
[11:57:33] oh that’s a very neat graph :)
[11:59:28] nice!
[12:51:33] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&fullscreen&orgId=1&from=now-6h&to=now
[13:35:19] sjoerddebruin: I think it's Daniel again, so: https://www.wikidata.org/wiki/User_talk:Daniel_Mietchen#Editing_speed
[13:35:57] I hope Magnus has time to add batch queueing next week...
[13:36:58] it would be nice, but are we sure it would solve the problem?
[13:37:20] The solution would be query service lag > max lag
[13:37:29] yeah clearly
[13:37:45] And for the rest, we're hitting Blazegraph restrictions afaik
[13:38:59] if there was a way not to update the RDF serialization of the entire item every time, I assume it would help a lot
[13:39:24] also having QS do atomic updates like OpenRefine does, potentially
[13:39:35] i think there was a task for that...
[13:40:14] https://phabricator.wikimedia.org/T212826 ?
[13:41:07] thanks, token awarded :)
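A minimal sketch of the maxlag behaviour being discussed here, as documented for the MediaWiki API: the client sends maxlag=N with every write, and if the server's reported lag exceeds N the API refuses the edit with error code "maxlag" and a Retry-After header, so the bot sleeps and retries. The helper name, retry budget and the pre-authenticated requests session are assumptions for illustration, not QuickStatements' actual code.

    import time
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def post_edit_respecting_maxlag(session, edit_params, maxlag=5, max_retries=10):
        """POST an edit with the maxlag parameter; back off while the server is lagged.

        `session` is assumed to be a requests.Session that is already logged in,
        and `edit_params` a dict carrying the edit action and its CSRF token.
        """
        params = dict(edit_params, maxlag=maxlag, format="json")
        for _ in range(max_retries):
            response = session.post(API, data=params)
            data = response.json()
            if data.get("error", {}).get("code") != "maxlag":
                return data  # success, or an unrelated error for the caller to handle
            # Server is lagged: honour Retry-After (default to 5 s) and try again.
            time.sleep(int(response.headers.get("Retry-After", 5)))
        raise RuntimeError("lag stayed above maxlag=%s for too long" % maxlag)

The "query service lag > max lag" proposal above would feed WDQS lag into the same mechanism, so a loop like this would automatically slow down whenever the query service falls behind.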
[13:59:51] pintoch: to follow up from our discussion yesterday, it seems articles do not have more statements than chemicals (using 500 instances of each): https://twitter.com/egonwillighagen/status/1121413364645863424
[14:01:39] egonw_: ok! so we are lucky that there is not as much activity in chemistry as in Wikicite…
[14:02:03] indeed :)
[14:02:04] (the WDQS lag started increasing again today when Daniel resumed three QS batches)
[14:02:12] yes, seeing that
[14:03:35] note that statements are just one part of the item size, too: I assume labels, descriptions and aliases also play a role in item size
[14:08:07] pintoch: follow up: https://twitter.com/egonwillighagen/status/1121415537123364864
[14:08:24] pintoch: absolutely... also qualifiers and references
[14:09:01] ok, what other class do we expect to be large or small...
[14:09:18] sjoerddebruin: what "type" should I add? you had something particularly large yesterday, didn't you?
[14:18:21] pintoch: well, it seems that when we consider subclasses, articles do look larger, better visible after a log2 transform: https://twitter.com/egonwillighagen/status/1121417684867010561
[14:19:08] nice!
[14:19:24] I'll now try with a larger sample...
[14:19:28] 500 is not too large
[14:19:33] sjoerddebruin: I have blocked Daniel for now, let's see if WDQS likes that
[14:20:49] pintoch: okay, that settles it :)
[14:21:24] (umm... about size=5000, not your message to Sjoerd)
[14:22:10] https://twitter.com/egonwillighagen/status/1121419027086282752
[14:22:20] ok, going to be offline for some time now
[14:28:36] pintoch: it's probably not the first time batch edits happen to large items, why does it suddenly affect the lag so much?
[14:29:30] good question! yes Daniel has been adding topics like this for a while…
[14:31:48] I am not sure when Lucas_WMDE patched QS so that edits are made via the correct accounts, but maybe this increased the editing volume of QS (since the background jobs were capped by QuickStatementsBot's editing speed before)?
[14:32:56] that was a while ago
[14:33:14] October 2018, says git log
[14:33:28] ah ok, so it's not that
[14:34:36] lag slowly going down again
[15:05:32] it's more or less back to normal
[15:29:42] sjoerddebruin: fyi: https://github.com/fnielsen/scholia/issues/683
[15:30:20] egonw_: thank you :)
[15:36:00] pintoch, sjoerddebruin: the source code: https://egonw.github.io/wikidata-item-size/wikidata_item_size.html
[16:02:22] ok, done a few more types... articles are not particularly large... it seems my choice of chem compounds and professors was poor :) https://twitter.com/egonwillighagen/status/1121444289693790209
[16:03:14] so, if the hypothesis that size matters is true, then edits on articles should not be particularly bad... (but still just looking at the number of statements)
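The kind of sampling egonw_ describes can be reproduced with one WDQS query per class, since Wikidata exposes a wikibase:statements count on every item. A rough Python sketch along those lines, not egonw_'s actual script (which is linked above): the two example QIDs, the 500-item limit and the median summary are assumptions, and a plain LIMIT is not a random sample.

    import statistics
    import requests

    WDQS = "https://query.wikidata.org/sparql"

    # wikibase:statements is the per-item statement count maintained by Wikidata.
    # NOTE: LIMIT without ORDER BY just takes the first 500 instances the query
    # engine happens to return; it is not a random sample.
    QUERY = """
    SELECT ?item ?statements WHERE {
      ?item wdt:P31 wd:%s ;
            wikibase:statements ?statements .
    }
    LIMIT 500
    """

    def statement_counts(class_qid):
        response = requests.get(
            WDQS,
            params={"query": QUERY % class_qid, "format": "json"},
            headers={"User-Agent": "item-size-sample/0.1 (illustrative example)"},
        )
        response.raise_for_status()
        rows = response.json()["results"]["bindings"]
        return [int(row["statements"]["value"]) for row in rows]

    # Example classes: Q13442814 = scholarly article, Q11173 = chemical compound.
    # Swap wdt:P31 for wdt:P31/wdt:P279* to include subclasses, as discussed above.
    for label, qid in [("scholarly article", "Q13442814"),
                       ("chemical compound", "Q11173")]:
        print(label, "median statements:", statistics.median(statement_counts(qid)))

As noted in the conversation, statement count is only a proxy for item size; labels, descriptions, aliases, qualifiers and references all add to the RDF that has to be re-serialized on each edit.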
[16:29:08] pintoch, sjoerddebruin: a way to solve the issue of several edits right behind each other (which is also very normal for manual editing) is to not run the RDF conversion for WDQS on every edit, but throttle it to once every 3 seconds
[16:29:37] even down to 2.5 seconds, it would halve the load on the system for those problematic edits
[16:29:55] (without additional hardware)
[16:31:34] settlements are another category with about the same number of statements as articles
[16:44:09] yeah, it's not clear to me whether this is already done or not
[16:45:03] SMalyshev: ^
[16:46:19] Not sure how throttling would help... unless we insert a separate filtering pipeline for edits
[16:46:33] just shifting back in time won't change anything
[16:47:19] I don't think that is what egonw means
[16:47:23] adding an event filter might help, but would require some work. it is not done right now
[16:47:56] if there are two consecutive edits to Q123 in a short period of time, Q123's RDF serialization should be updated only once
[16:49:16] pintoch: well, the problem is that when we're processing the first edit we have no idea whether the other one is coming... we can just load the latest RDF, which is happening now, but that precludes use of cache for RDF data, which is why we'll probably be moving away from it
[16:49:58] data shows very close consecutive edits are relatively rare, tons of edits to different items are much more common
[16:50:23] but you are already processing updates in batches, right? so these batches are too small to see the same edit multiple times…
[16:50:32] *the same item, I mean
[16:50:35] yes
[16:50:44] and by too small I mean like 1000 edits
[16:51:03] yeah. So it would help if QS bundled up its edits correctly.
[16:51:09] within that batch size, repeat edits are relatively rare
[16:51:47] pintoch: that would require developing an external edit filter, because with the current batch size it's likely impossible to see two consecutive (human) edits
[16:51:54] too many other edits in between
[16:52:03] makes sense
[16:52:35] if we had a filter that batches in larger chunks (and maybe also does some delay and so on) it could be possible, but we don't have that right now
[16:52:49] it can be done but requires some work...
[16:52:52] right, so https://phabricator.wikimedia.org/T212826 is likely to be more effective I guess
[19:46:21] nope, i didn't. had something to do with how the object was stored in js i think.
[20:12:25] hi
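To make the per-batch deduplication idea from the 16:47–16:52 exchange concrete: if the updater collects change events into batches of roughly 1000 edits, only the newest revision of each item needs an RDF reload, however many edits that item received inside the batch. A toy Python sketch of that collapse step; the Change record and its field names are invented for illustration and do not reflect the actual wdqs updater code, and as SMalyshev notes, repeat edits within a batch are rare, so the saving is modest.

    from collections import OrderedDict
    from typing import Iterable, List, NamedTuple

    class Change(NamedTuple):
        entity_id: str    # e.g. "Q123"
        rev_id: int       # revision that produced this change event
        timestamp: float

    def dedupe_batch(changes: Iterable[Change]) -> List[Change]:
        """Collapse a batch so every entity appears once, keeping its newest revision.

        Loading the RDF for that newest revision also covers all earlier edits the
        same item received within the batch, so one reload replaces several.
        """
        latest = OrderedDict()
        for change in changes:
            seen = latest.get(change.entity_id)
            if seen is None or change.rev_id > seen.rev_id:
                latest[change.entity_id] = change
        return list(latest.values())

    # Three edits to Q123 plus one to Q42 collapse to two RDF reloads instead of four.
    batch = [Change("Q123", 100, 1.0), Change("Q42", 101, 2.0),
             Change("Q123", 102, 3.0), Change("Q123", 103, 4.0)]
    assert [c.entity_id for c in dedupe_batch(batch)] == ["Q123", "Q42"]

A larger batching window (or an extra delay, as suggested above) would catch more duplicates, at the cost of the very update lag the filter is meant to reduce.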