[10:12:55] jynus: about db1071, there were spikes during the last 2 days around 8pm, but the strange part is that if I do -rpc AND -api they almost disappear, yet db1071 is not in the API role. Does this mean that there are API calls going to db1071 too? [10:13:57] yes [10:14:56] I think it is only a job-connection issue [10:15:20] the others are symptoms only [10:15:49] I will check db1070 anyway [10:16:52] to be fair, there are lots of concurrent connections there, 200/server, with peaks of up to 400 [10:17:02] in other services it is around 50-60 [10:17:13] that is again probably due to the missing db1058 [10:17:25] we need the new servers there ASAP [10:18:24] and stats sent to grafana, so that in a case like this we could sum connections for all hosts to see if something changed [10:19:14] we can already check it on the masters (or on tendril) [10:19:26] check how the s2 and s5 masters are overloaded [10:19:34] that usually means job queue [10:19:43] and there is indeed high job activity [10:21:18] check those spikes: https://grafana-admin.wikimedia.org/dashboard/db/job-queue-health [10:22:38] yep [10:25:07] volans, you may need to deploy https://gerrit.wikimedia.org/r/#/c/291696/ [10:26:08] jynus: yes, I just don't know how... I asked you in -operations once merged ;) [10:30:42] oh, if I hadn't told you already, it's because I would do what you are about to do: check where it is installed in puppet and rebase that [10:31:00] mw1152 [10:31:11] I learned it once and unlearned it immediately [10:31:20] I was already there, ah, so a manual rebase :) [10:31:31] I thought of some magical automation :-P [10:39:29] I wouldn't be surprised if even puppet did that [10:42:38] I tried a puppet noop run, it didn't ;) [10:42:50] I'm updating the docs
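
The check discussed at the top of the log (whether api/rpc traffic is really reaching db1071, and which accounts explain the ~200 connections per server) can be approximated with a processlist breakdown. This is only a rough sketch of that idea, not the tooling actually used; the host name, the monitoring credentials and the PyMySQL dependency are assumptions for illustration.

```python
import pymysql  # assumed available; any MySQL client library would do


def connection_counts(host, user, password):
    """Return (user, connection count) pairs from the live processlist."""
    conn = pymysql.connect(host=host, user=user, password=password,
                           database="information_schema")
    try:
        with conn.cursor() as cur:
            # High counts for the api/rpc or job-runner accounts would show
            # where the extra connections on db1071 are coming from.
            cur.execute(
                "SELECT USER, COUNT(*) AS conns "
                "FROM PROCESSLIST GROUP BY USER ORDER BY conns DESC"
            )
            return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical host and credentials, only to show the shape of the call.
    for db_user, conns in connection_counts("db1071.eqiad.wmnet",
                                            "watchdog", "secret"):
        print(f"{db_user}: {conns}")
```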
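
The "sum connections for all hosts" idea mentioned at 10:18:24 would normally be a Grafana panel fed by per-host metrics; the sketch below only approximates it client-side by polling Threads_connected on each replica and totalling the values. The replica list and credentials are placeholders, not the real section configuration.

```python
import pymysql  # assumed available

# Example replica list only; the real set of hosts per section lives in
# the puppet/mediawiki configuration, not here.
SECTION_REPLICAS = ["db1070.eqiad.wmnet", "db1071.eqiad.wmnet"]


def threads_connected(host, user, password):
    """Read the Threads_connected status variable from one host."""
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
            _name, value = cur.fetchone()
            return int(value)
    finally:
        conn.close()


if __name__ == "__main__":
    per_host = {h: threads_connected(h, "watchdog", "secret")
                for h in SECTION_REPLICAS}
    for host, conns in per_host.items():
        print(f"{host}: {conns}")
    # The number that would go on the Grafana panel: the section-wide total.
    print(f"total: {sum(per_host.values())}")
```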