[01:11:29] jynus: ignoring that^ for now. I need to get db1034 done (repartitioning as s7 logpager)
[05:54:24] I can take it from here
[06:07:14] I do not think "it" needs clarification- I am answering your last comment about T104471
[06:18:02] there is an alarm right now for the SQL thread at db2029- it is not accurate, the actual issue there is that max_connections is full
[06:37:02] I've just depooled db2029 from codfw. I am confused about why I had to do that- I assumed there was no traffic there. Will investigate.
[07:15:01] There seem to be some accidental health checks on codfw- not real traffic. The problem here is that db029 got saturated with connections.
[07:15:24] ^db2029, I mean
[07:55:18] I still do not have all the details of the current traffic on codfw or of why connections to db2029 failed. There is a long tail of failed connections to db2029. I will wait and see.
[10:54:49] ^the long tail of connection errors to db2029 on kibana is gone. The server is "pooled" but with a weight of 0. We will have to do more testing, but for now I will leave it at 0
[10:55:41] I will try to fix dbstore2X. The plan is:
[10:56:26] 1) Confirm that a change to the replication filtering configuration fixes the issues
[10:57:54] 2) Stop replication on a different s7 codfw db node, export the tables in InnoDB (binary) format, reimport them, and convert them to TokuDB
[10:58:27] 3) Check other production nodes with similar replication filtering rules
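(A minimal sketch of what step 2 above could look like in SQL, assuming a MariaDB setup; the donor host is left unspecified, and the database and table names below are placeholders - fawiki stands in for any s7 wiki. Note that the export actually run later in the day was a logical one, per the 20:14 status note further down.)

    -- On the donor s7 codfw node, pause replication so the export is consistent:
    STOP SLAVE;
    -- ... export the s7 tables here, then resume replication on the donor:
    START SLAVE;

    -- On dbstore2001/dbstore2002, once the tables have been reimported as InnoDB,
    -- convert each of them to TokuDB, for example:
    ALTER TABLE fawiki.revision ENGINE=TokuDB;

    -- A query like this can generate the full list of conversion statements:
    SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name, '` ENGINE=TokuDB;')
      FROM information_schema.tables
     WHERE table_schema = 'fawiki' AND engine = 'InnoDB';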
[11:00:32] I am still working with yuvi on correcting the labsdb grants
[11:18:44] db1002 just had a disk failure; acking it because the host may be decommissioned (T103005)
[13:08:13] jynus: did you say db1002 can be decom'd? (the wording in the ticket is a bit unclear to me)
[13:09:37] sorry I wasn't clear, JohnFLewis
[13:09:46] It may or may not be
[13:09:53] not decided yet
[13:10:02] but do not replace it either
[13:10:18] (the disk, I mean)
[13:10:53] but I wanted to report it in case it is useful for spotting bad batches, etc.
[13:11:00] Just wanted to check, as if it were to be decommissioned I was going to provide some patches for it
[13:11:17] patches?
[13:11:43] Puppet code/MW config changes
[13:12:13] oh, in my opinion do not bother for db1002-7
[13:12:25] they are not in production in any case
[13:13:56] do you see any specific problem, JohnFLewis, or was it a "I am going to do it for all db* hosts" thing?
[13:15:14] I see no problem; I just saw the ticket in my email and the talk of decommissioning, and thought I could provide patches :)
[13:16:05] let's wait, but they will eventually be decommissioned or turned into misc servers
[13:16:38] if they are referenced in the mediawiki config, feel free to delete them from there
[13:17:12] but I would like to check them before erasing/repurposing them
[13:17:19] I'll have a look to see if they can be removed; the hostname->IP entries shouldn't be, but I think there are more
[13:17:36] thanks if you do that!
[13:23:04] I just looked at where they're included: it's db-secondary, which includes a pmtpa db server as well
[13:23:24] ugh, that is pure archeology
[13:23:56] and exactly the reason that ticket is marked "for later"
[13:24:08] :-)
[13:24:53] Probably not even worth the bytes to write patches for them
[13:25:37] no, doing something eventually is needed- especially for puppet and the wikimedia config
[13:25:49] to get rid of old configs and so on
[13:26:16] but not a priority while there are small issues on codfw right now
[13:26:40] I can't actually see how db-secondary is used, and it looks out of date (last updated a year ago). I'll remove it and add you to the change so you can decide whether any of it is still needed
[13:27:22] fine with me
[13:28:25] and thank you again!
[13:31:40] https://gerrit.wikimedia.org/r/#/c/222289/
[13:32:38] allow me a few days to double-check it, ok?
[13:32:52] Yeah, that's fine
[13:38:35] also, if it is not an emergency, I would like Sean to see it too
[13:46:33] I added Sean to the patch as well anyway
[13:46:58] yep, I saw it, but he should be sleeping by now :-)
[14:33:59] I've just created T104573 for the issues with db2029
[14:34:22] I have depooled it for the time being (SAL)
[18:09:37] finished the export of s7 from db2047; there will be lag on that server for a few minutes
[20:14:36] status right now: both dbstore2001 and dbstore2002 are getting a logical import of s7; do not start 's7' replication while it is ongoing
[20:15:27] still pending: export and import of m3 on dbstore2001
[20:20:07] db2047 can be put back into production (it was used only for the exports), but not db2029: T104573
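(For context, a minimal sketch of how the named 's7' replication connection on the dbstore hosts would be handled, assuming MariaDB multi-source replication with one connection per section, as the quoting of 's7' above suggests; the 'm3' name comes from the pending item above and everything else is illustrative.)

    -- Keep the channel stopped while the logical import is running:
    STOP SLAVE 's7';

    -- Once the import has finished, check and restart only that channel:
    SHOW SLAVE 's7' STATUS\G
    START SLAVE 's7';

    -- Other connections, e.g. 'm3' (whose export/import is still pending),
    -- are not affected by stopping or starting 's7'.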