[00:55:12] Hey volans do you know why jamwiki is not replicated to labs yet?
[07:26:54] jamwiki IS being replicated to labs
[07:27:34] maybe labs needs to force a puppet run to expose permissions?
[09:07:03] another reason weighing in favour of 5.7 for the dbstores: multi-tenant replication
[09:07:39] individual threads for each wiki, so lag would be less of an issue
[10:11:38] I'm importing from dbstore1002 to sanitarium, ignore slave stopped on both
[10:12:06] I've also found the sanitarium replication problems: T134349
[10:12:06] T134349: Upgrade db1069 - https://phabricator.wikimedia.org/T134349
[11:52:48] hey jynus! could I get you to look at something briefly?
[12:50:32] addshore, tell me
[12:50:51] Take a look at https://grafana.wikimedia.org/dashboard/db/mediawiki-watcheditemstore
[12:51:02] This shows the caching in the class WatchedItemStore in core
[12:51:51] the caching saves around 1000 select queries to slaves across the whole cluster / all dbs per min
[12:52:50] the question is basically, can we kill the caching? or should we keep it?
[12:52:51] and when I say 1000, basically it's 1000 cache hits for 9000 cache misses
[12:52:56] (on average)
[12:53:14] let me look, but if your question contains "caching", usually I am the wrong person to ask
[12:53:56] I have no idea what those values mean
[12:54:22] well, they are lightweight queries, and would 1000 more queries spread across all slaves per min hurt you? :)
[12:54:58] 1000 queries where? and which queries?
[12:55:09] and in how much time?
[12:55:59] I am missing important info- I assume this is mediawiki-related. mediawiki is the most important app I support, but only one of many
[12:56:12] yes mediawiki
[12:56:16] is there a ticket?
[12:56:21] a code link?
[12:57:01] https://github.com/wikimedia/mediawiki/blob/master/includes/WatchedItemStore.php#L493
[12:57:28] that is the only query we are talking about here, selecting a single column from watchlist essentially, using the indexes too so not heavy imo
[12:57:30] ok, NOW we start to talk :-)
[12:57:57] let me check the table structure
[12:58:08] IMO the caching is probably not needed, hence why I thought I would quickly ask you and then remove it
[12:59:32] wait, because while I think I now understand what you mean
[12:59:45] things are not always easy
[13:00:09] in particular, sometimes there are hot spots under special circumstances
[13:03:36] so, that select should be "fast"
[13:04:22] let me give you an idea based on actual data
[13:08:17] 0.01 seconds on the largest production database
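(Editor's note: a minimal sketch of the kind of single-column watchlist lookup discussed above, around the WatchedItemStore.php#L493 query. The column and index names follow the standard MediaWiki watchlist schema; the exact shape of the production query is an assumption.)

    -- Assumed shape of the WatchedItemStore select: one column, matched
    -- via the (wl_user, wl_namespace, wl_title) unique index on watchlist.
    SELECT wl_notificationtimestamp
    FROM watchlist
    WHERE wl_user = 12345              -- hypothetical user id
      AND wl_namespace = 0
      AND wl_title = 'Example_page';   -- hypothetical title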
[13:10:03] now, if you want to change the cache policy, you should ask for a +1 from one of the more senior devs/perf members
[13:11:07] would I be worried about sending that query to live servers? No. But I cannot predict the impact on performance per request
[13:12:08] especially now that we still do not have proper fine-grained measurement of db performance
[13:12:51] I agree in general with the idea, assuming the hit ratio is low
[13:13:52] watchlist in general has issues, but with loading all users' lists (e.g. on editing)
[13:14:15] maybe those should be cached somehow instead (they can end up timing out)
[13:36:03] jynus: okay, I'll put a patch up and get someone to review it (and I'll also add you)
[13:36:26] the level of caching right now in the watchlist stuff is so small, removing it and starting over might be the best bet
[13:36:59] not only caching
[13:37:07] handling in general
[13:37:20] are you aware of the RFC related to the watchlist?
[13:37:38] (sorry, I may be asking the one who started it, cannot remember :-))
[13:37:40] if you're talking about watchlist expiry then yes, otherwise I may have missed it
[13:37:45] yes ;)
[13:37:54] yes, that
[13:38:40] despite what people think, my mediawiki knowledge is purely by osmosis
[13:38:45] :D
[13:38:48] and from seeing errors/the backend side
[13:39:12] for example, the pseudo-ORM that mediawiki has is completely strange to me
[13:39:14] is there a way to see which queries on the watchlist table hurt the most?
[13:39:22] yes
[13:39:33] we actually have been working on implementing that
[13:39:45] that's awesome
[13:39:51] we still have not a cool graph/dashboard
[13:39:59] *don't have
[13:40:07] but we have the raw data
[13:40:12] oooh, speaking of dashboards.... https://gerrit.wikimedia.org/r/#/c/284184/
[13:40:33] yeahhhhhh
[13:40:47] ganglia is not like... the future
[13:41:09] we can deploy it, but there may not even be mysql data about codfw
[13:41:35] we will move either to graphite or prometheus
[13:42:33] woo!
[13:49:55] woo good or woo bad?
[13:51:07] addshore, https://phabricator.wikimedia.org/P2999
[13:51:16] woo good ;)
[13:51:34] ahh and great! I'll definitely take a look at those queries
[13:51:42] that is the summary of all queries executed on watchlist
[13:51:51] ordered by "server time"
[13:52:13] in the last X days
[13:52:26] that's amazing data to have! :)
[13:52:44] (Aggregated by digest)
[13:56:23] however, I think that doesn't add queries to fail (timeout), for that we have to go to kibana
[13:56:33] s/to/that/
[16:08:24] jynus: that is / was only queries on masters?
[16:59:44] addshore, sorry?
[17:00:05] is https://phabricator.wikimedia.org/P2999 just for queries to the master db?
[17:00:27] it is a single, smallish slave
[17:00:40] if it has updates, they can be the ones from the master, replicated
[17:00:58] ahhh, okay!
[17:01:03] I think it would not have API, recentchanges, etc.
[17:01:10] I can get those, too
[17:01:24] I think there is a special role for that?
[17:01:37] I think there is a role for 'watchlist' too
[17:01:50] or maybe that is bundled with recentchanges
[17:02:04] I can check that
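(Editor's note: regarding the P2999 paste discussed around 13:51, a per-digest summary of watchlist queries ordered by "server time": on MySQL/MariaDB this kind of report can be pulled from performance_schema. The exact query used is not in the log, so the following is only an assumed equivalent.)

    -- Assumed equivalent of the P2999 report: statements touching watchlist,
    -- aggregated by digest and ordered by total server time.
    SELECT DIGEST_TEXT,
           COUNT_STAR              AS calls,
           SUM_TIMER_WAIT / 1e12   AS total_time_seconds,  -- timers are in picoseconds
           SUM_ROWS_EXAMINED       AS rows_examined
    FROM performance_schema.events_statements_summary_by_digest
    WHERE DIGEST_TEXT LIKE '%watchlist%'
    ORDER BY SUM_TIMER_WAIT DESC
    LIMIT 20;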