[01:11:29] jynus: ignoring that^ for now. I need to get db1034 done (repartitioning as s7 logpager)
[05:54:24] I can take it from here
[06:07:14] I do not think "it" needs clarification- I am answering your last comment about T104471
[06:18:02] there is an alarm right now for the SQL thread at db2029- it is not accurate, the actual issue there is that max_connections is full
[06:37:02] I've just depooled db2029 from codfw. I am confused about why I had to do that- I assumed there was no traffic there. Will investigate.
[07:15:01] There seem to be some accidental health checks on codfw- not real traffic. The problem here is that db029 got saturated with connections.
[07:15:24] ^db2029, I mean
[07:55:18] I still do not have all the details of the current traffic on codfw or of why connections to db2029 failed. There is a long tail of failed connections to db2029. I will wait and see.
[10:54:49] ^the long tail of connection errors to db2029 on kibana is gone. The server is "pooled" but with a weight of 0. We will have to do more testing, but for now I will leave it at 0
[10:55:41] I will try to fix dbstore2X. The plan is:
[10:56:26] 1) Confirm that a change to the replication filtering configuration fixes the issues
[10:57:54] 2) Stop replication on a different s7 codfw db node, export the tables in InnoDB (binary) format, reimport them, and convert them to TokuDB
[10:58:27] 3) Check other production nodes with similar replication filtering rules
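(A minimal sketch of what step 2 above could look like in SQL, assuming a MariaDB setup; the donor host is left unspecified, and the database and table names below are placeholders - fawiki stands in for any s7 wiki. Note that the export actually run later in the day was a logical one, per the 20:14 status note further down.)

    -- On the donor s7 codfw node, pause replication so the export is consistent:
    STOP SLAVE;
    -- ... export the s7 tables here, then resume replication on the donor:
    START SLAVE;

    -- On dbstore2001/dbstore2002, once the tables have been reimported as InnoDB,
    -- convert each of them to TokuDB, for example:
    ALTER TABLE fawiki.revision ENGINE=TokuDB;

    -- A query like this can generate the full list of conversion statements:
    SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name, '` ENGINE=TokuDB;')
      FROM information_schema.tables
     WHERE table_schema = 'fawiki' AND engine = 'InnoDB';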
[11:00:32] I am still working with yuvi on correcting the labsdb grants
[11:18:44] db1002 just had a disk failure; acking it because the host may be decommissioned (T103005)
[13:08:13] jynus: did you say db1002 can be decom'd? (the wording in the ticket is a bit unclear to me)
[13:09:37] sorry I wasn't clear, JohnFLewis
[13:09:46] It may or may not be
[13:09:53] not decided yet
[13:10:02] but do not replace it either
[13:10:18] (the disk, I mean)
[13:10:53] but I wanted to report it in case it is useful for spotting bad batches, etc.
[13:11:00] Just wanted to check, as if it were to be decommissioned I was going to provide some patches for it
[13:11:17] patches?
[13:11:43] Puppet code/MW config changes
[13:12:13] oh, in my opinion do not bother for db1002-7
[13:12:25] they are not in production in any case
[13:13:56] do you see any specific problem, JohnFLewis, or was it a "I am going to do it for all db* hosts" thing?
[13:15:14] I see no problem; I just saw the ticket in my email and the talk of decommissioning, and thought I could provide patches :)
[13:16:05] let's wait, but they will eventually be decommissioned or turned into misc servers
[13:16:38] if they are referenced in the mediawiki config, feel free to delete them from there
[13:17:12] but I would like to check them before erasing/repurposing them
[13:17:19] I'll have a look to see if they can be removed; the hostname->IP entries shouldn't be, but I think there are more
[13:17:36] thanks if you do that!
[13:23:04] I just looked at where they're included: it's db-secondary, which includes a pmtpa db server as well
[13:23:24] ugh, that is pure archeology
[13:23:56] and exactly the reason that ticket is marked "for later"
[13:24:08] :-)
[13:24:53] Probably not even worth the bytes to write patches for them
[13:25:37] no, doing something eventually is needed- especially for puppet and the wikimedia config
[13:25:49] to get rid of old configs and so on
[13:26:16] but not a priority while there are small issues on codfw right now
[13:26:40] I can't actually see how db-secondary is used, and it looks out of date (last updated a year ago). I'll remove it and add you to the change so you can decide whether any of it is still needed
[13:27:22] fine with me
[13:28:25] and thank you again!
[13:31:40] https://gerrit.wikimedia.org/r/#/c/222289/
[13:32:38] allow me a few days to double-check it, ok?
[13:32:52] Yeah, that's fine
[13:38:35] also, if it is not an emergency, I would like Sean to see it too
[13:46:33] I added Sean to the patch as well anyway
[13:46:58] yep, I saw it, but he should be sleeping by now :-)
[14:33:59] I've just created T104573 for the issues with db2029
[14:34:22] I have depooled it for the time being (SAL)
[18:09:37] finished the export of s7 from db2047; there will be lag on that server for a few minutes
[20:14:36] status right now: both dbstore2001 and dbstore2002 are getting a logical import of s7; do not start 's7' replication while it is ongoing
[20:15:27] still pending: export and import of m3 on dbstore2001
[20:20:07] db2047 can be put back into production (it was used only for the exports), but not db2029: T104573
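(For context, a minimal sketch of how the named 's7' replication connection on the dbstore hosts would be handled, assuming MariaDB multi-source replication with one connection per section, as the quoting of 's7' above suggests; the 'm3' name comes from the pending item above and everything else is illustrative.)

    -- Keep the channel stopped while the logical import is running:
    STOP SLAVE 's7';

    -- Once the import has finished, check and restart only that channel:
    SHOW SLAVE 's7' STATUS\G
    START SLAVE 's7';

    -- Other connections, e.g. 'm3' (whose export/import is still pending),
    -- are not affected by stopping or starting 's7'.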