[02:18:28] commented on T107282. general +1, but not to reducing buffer pool on any s[1-7], including s3. let's just fix/split/upgrade s3 instead
[02:19:29] correct, we have some tables that are not suitable for pt-table-sync
[02:19:40] or pt-osc, for that matter
[02:20:48] s3 could have many objects altered back into central tablespace, as temporary fix
[02:23:40] the db1035 issue you also saw on s1 and s4 -- want to hear more. i.e., symptom of thread pool not handling new connections fast enough, even with far fewer objects, or actual resource starvation even with few objects?
[10:13:03] let's differentiate here, there are some (right now, very minor) db1035 issues
[10:13:21] and there are timeout job issues
[10:14:00] db1035 restart only made the second problem more frequent
[10:14:24] to see the extent of timeouts, search for message="Error connecting" url="/rpc/RunJobs.php"
[10:14:46] on wfLogDBError dashboard on the logs
[10:17:19] not a long time ago, db1060 had a peak of unable to connect errors at 9:05 UTC
[10:19:26] I am not sure yet if this is ops-related or logic-related, because only jobs fail substantially, not other connections
[10:26:07] my suspicion is on client side, not db side, because it only happens with a very particular traffic
[10:26:28] tcp port exhaustion on client? timeout was reverted?
[10:27:38] maybe only happens on jobs because they create many concurrent connections?
[10:28:22] but there is no mw* host correlation
[10:33:56] It could be on the servers, but max_connections never gets saturated
[20:18:32] see https://gerrit.wikimedia.org/r/228134
[20:19:36] having labs::db and mariadb::labs created an issue with ferm rules
[20:20:57] Coren suggested to deprecate mariadb::labs, I kinda agree, separate roles and they own labsdbs
[20:30:13] jynus, springle: Know anything about the "reset-mysql-slave" script?
[20:31:20] wow, that sounds dangerous
[20:31:33] :)
[20:31:52] where are you reading that and what do you really want to do
[20:32:12] I want to clean up files in the puppet repo that reference old crap
[20:32:17] This one contains "/home/wikipedia"
[20:32:23] to get the MASTER_PASSWORD, which sounds fun
[20:32:28] ok, give a link
[20:32:30] tries to read /home/wikipedia/doc/repl-password ...
[20:32:35] modules/scap/files/reset-mysql-slave
[20:32:37] in the puppet repo
[20:32:50] I have no intention of running this, don't worry :p
[20:33:05] ok, that is a better introduction
[20:33:10] :D
[20:33:15] than "BTW, I just executed it"
[20:33:44] "Oh hey I just executed this dangerous-looking script. What does it do btw?"
[20:35:00] Probably hasn't been touched for 4+ years
[20:35:52] it is a script to "change master" of a slave
[20:36:09] for me it is useless, and I bet also for springle
[20:36:30] there are better scripts on operations/software/dbtools
[20:37:59] I would not need a script for that, so comment it against deployers, delete if not useful
[20:38:45] ^Krenair
[20:41:33] can I ask you a favour, I am about to disconnect
[20:42:47] can you try to move T107265 (not necessarily fix), I would put it higher than low right now
[20:43:24] ^this is for Krenair or sean
[20:44:34] It has been dominating our warning logs
[20:46:02] upped the priority to high
[20:47:17] sorry, race condition, didn't mean to lower the priority
[20:47:39] mark has told me to go to bed :-), cheers!