[02:37:53] https://gerrit.wikimedia.org/r/#/c/265427/
[02:38:06] wrong channel
[08:39:09] jynus: https://gerrit.wikimedia.org/r/265466
[08:41:13] Given what we saw with the "small" slaves before, and the higher number of query failures coming from db1045, I decided not to put load onto it
[08:41:44] also that will make sure it won't receive wb_terms search traffic
[09:44:19] I don't much like having only 2 servers with the main load
[09:44:34] hm
[09:44:59] I agree with the logic in terms of performance/load balancing
[09:45:29] we also have to weigh high availability, in case one of the 2 fails completely
[09:45:58] Well, the LoadBalancer should help us out if one of the two fails completely
[09:45:59] I would suggest having another one with non-0 weight, even if very little
[09:46:11] but I'm not sure the others could keep up with all the load
[09:46:30] that is why they have a weight of 500
[09:46:59] Is that to keep the other server "hot"?
[09:47:14] yep, among other reasons
[09:47:28] it doesn't have to be 1/3 of the load
[09:47:37] but 1/10, etc.
[09:48:25] look at all the other shards: I have a minimum of 3 servers pooled
[09:48:46] s5 is special (in a bad way)
[09:48:48] but I see the point
[09:49:31] it doesn't have to be the api one, it can be the rc one, but I do not trust that one because of the index changes
[09:50:18] what about the dump/vslow one?
[09:50:30] But db1049 has high iowait anyway
[09:50:36] so it's more of a general question
[09:50:52] (which, btw, I have to check to see if they work after my fix yesterday)
[09:51:09] I think the problem here is that we need an extra server to have 2 api servers
[09:51:46] that would have solved many of the problems yesterday
[09:51:51] Yes
[09:52:24] I think we will for codfw
[09:52:27] also that would make us more resistant to db1070 or 71 going bad
[09:53:36] problem is I do not have new servers yet
[09:53:48] In eqiad?
[09:53:49] but they are coming soon
[09:54:22] Amended the change to put db1045 back to 50 (like it was before yesterday)
[09:55:13] the problem with dump is that normally it has little load, but at peaks it has a lot, which is not good for combining it with regular load
[09:55:41] its buffer pool will usually be trashed
[09:55:57] because of the giant selects of old data
[09:56:18] so we don't usually use them
[09:56:41] Makes sense
[09:56:58] Maybe we should have a wikibase and/or wb_terms load group
[09:57:14] I wanted to do that before, but it's not trivial to get into all the abstraction layers
[09:57:24] (within Wikibase)
[10:04:45] jynus: I'm not going to be around for probably the next few hours
[10:04:54] Don't wait for me with that change or anything
[10:05:20] ok
[10:06:05] please note that I think the API wasn't exhausted in capacity, but in "serial" capacity, if that makes sense
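The weighting argument above (two main replicas, plus a third with a small non-zero weight so it stays "hot" and can absorb load if one main replica fails) can be illustrated with a short sketch. It assumes a simplified balancer that picks a replica at random in proportion to its weight and redistributes a failed replica's share across the rest; this is not MediaWiki's actual LoadBalancer code. The server names (`main-a`, `main-b`, `small`) and the weights are hypothetical, chosen so the small server gets the "1/10" share suggested in the log.

```python
import random


def load_shares(weights, down=()):
    """Return each pooled server's expected fraction of read queries.

    Servers in `down` (and servers with weight 0) are excluded; their
    share is redistributed in proportion to the remaining weights.
    """
    live = {name: w for name, w in weights.items() if name not in down and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no server with non-zero weight is available")
    return {name: w / total for name, w in live.items()}


def pick_server(weights, rng=random, down=()):
    """Pick one live server at random, with probability proportional to its weight."""
    shares = load_shares(weights, down)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]


# Hypothetical weights (not the real s5 config): two main replicas plus
# one small-weight replica kept warm with ~1/10 of the traffic.
weights = {"main-a": 450, "main-b": 450, "small": 100}

print(load_shares(weights))                  # → {'main-a': 0.45, 'main-b': 0.45, 'small': 0.1}
print(load_shares(weights, down={"main-a"}))
print(pick_server(weights, rng=random.Random(42)))
```

With `main-a` down, the shares become roughly 0.82 for `main-b` and 0.18 for `small`: the small replica's load nearly doubles, which is only tolerable if its buffer pool was already warm. With its weight set to 0 instead, `main-b` would take 100% of the traffic on a cold failover.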