[20:58:43] jynus: replag graph for s1 is beautiful for the week, and miserable for s4 though :(
[20:59:30] I still see horribly big transactions logged in https://logstash.wikimedia.org/goto/6545a6fc4d0aea327b0396856c89cee6
[20:59:39] hoo: ^
[21:00:16] maybe I should make something like $wgMaxUserDBWriteDuration apply to jobs >:D
[21:04:54] well, I mentioned that I should probably create a new group, jobs
[21:05:00] on dbs
[21:05:13] so that at least we can do quick and dirty mitigations
[21:05:52] AaronSchulz: Yikes… are these all ChangeNotification jobs?
[21:06:11] seems largely so
[21:06:48] we had some fun before this week with cirrus bringing down s1
[21:07:53] Probably next week we will enable https://grafana.wikimedia.org/dashboard/db/mysql for eqiad and will get more insights
[21:09:44] AaronSchulz: :/ Splitting these up has been on our long list of things since forever
[21:10:18] is there any short-term quick & dirty batching you can do in the job?
[21:10:52] To a certain degree, yes
[21:10:58] that will mean more jobs
[21:11:07] but I guess that's expected
[21:12:13] "/* MediaWiki::restInPeace */" ?
[21:12:34] that does the final commit
[21:12:44] and enqueues jobs and stuff
[21:12:58] I see heartbeat taking 7 seconds to do a REPLACE?
[21:13:31] how can that be? it is a single REPLACE, on a non-transactional engine
[21:14:12] some kind of load issue that piles up queries?
[21:21:16] average query time is 0.001 seconds; max is 9.999
[21:22:56] do we run enqueued jobs on restInPeace on wikimedia sites hoo?
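[Editor's note] The "quick & dirty batching" discussed above could look roughly like the following. This is a hypothetical sketch only, not Wikibase's actual job API: the job structure, field names, and batch size are invented. The idea is simply that one huge ChangeNotification job writing everything in a single long transaction becomes many small jobs, each of which commits quickly.

```python
# Hypothetical sketch of the batching idea from the conversation above:
# instead of one job processing all changes in one long transaction,
# split the change list into chunks and enqueue one small job per chunk.
# All names here (job type, fields, BATCH_SIZE) are invented for illustration.

BATCH_SIZE = 100  # assumed tuning knob; smaller batches -> shorter transactions


def chunk(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def split_into_jobs(change_ids, batch_size=BATCH_SIZE):
    """Turn one big notification job into many small ones."""
    return [{"type": "ChangeNotification", "changes": batch}
            for batch in chunk(change_ids, batch_size)]


jobs = split_into_jobs(list(range(250)), batch_size=100)
# 250 changes become three jobs of 100 + 100 + 50 changes each.
```

As noted in the log, this trades transaction size for job count: more queue traffic, but each job holds its write locks for a much shorter time.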
[21:23:48] I think so
[21:24:45] wait, there is something: a pileup of reads
[21:25:23] which may mean I should convert this table to InnoDB
[21:25:34] wow, that was unexpected
[21:27:00] Aaron, do not make many plans yet; I think I detected a db issue that could cause delays in replication handling, which by itself can cause contention issues, which can cause issues on other queries
[21:28:03] so the lag is artificial
[21:28:16] I mean, it doesn't help having large transactions
[21:28:23] Krenair: we are not supposed to
[21:28:29] I remember seeing that and it was odd
[21:28:39] it should always be JobRunner::commitMasterChanges
[21:29:03] I saw lots of transaction-related logs recently
[21:29:09] I thought we didn't too
[21:29:16] In fact, I had to do -("Implicit transaction already active"|"Explicit commit of implicit transaction"|"Implicit transaction expected")
[21:29:25] on kibana to see something
[21:50:23] jynus: where did you see that?
[21:52:51] https://logstash.wikimedia.org/goto/0fb0bd7bd2a6a986b7a1a3186d296958
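[Editor's note] The "lag is artificial" remark above makes sense given how heartbeat-based lag measurement (pt-heartbeat style) works: apparent lag is the wall clock minus the timestamp of the newest heartbeat row visible on the replica. If the REPLACE on the master stalls for 7 seconds behind a pileup of reads, the heartbeat row itself is stale, so measured lag inflates even when replication is keeping up. A minimal sketch, with invented timestamps (the function and values are illustrative, not Wikimedia's monitoring code):

```python
from datetime import datetime, timedelta


def apparent_lag(now, last_replicated_heartbeat_ts):
    """pt-heartbeat-style lag: wall clock minus the newest heartbeat
    timestamp that has made it to the replica, in seconds."""
    return (now - last_replicated_heartbeat_ts).total_seconds()


now = datetime(2016, 1, 1, 21, 13, 0)

# Master wrote the heartbeat on schedule and it replicated promptly:
real_lag = apparent_lag(now, now - timedelta(seconds=1))

# Master's REPLACE was stuck 7s behind a read pileup before it even ran,
# so the replica's newest heartbeat row is old despite near-zero real delay:
inflated_lag = apparent_lag(now, now - timedelta(seconds=8))
```

This is why converting the heartbeat table to InnoDB is floated above: on a non-transactional engine, queued reads can block the single-row REPLACE and poison the lag signal.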