[08:55:24] Amir1: Is MW ready to have ms3 as a section in dbctl even if not used?
[11:19:59] mildly concerned to have another report https://phabricator.wikimedia.org/T387340#10589479 of MW losing a race with itself when writing an object, resulting in MW thinking the object is gone when in fact it is present on swift.
[11:20:26] Not sure if this is a user trying some new workflow that is exposing an old race, or if something has changed in MW that is now losing this race more often
[11:26:36] PROBLEM - MariaDB sustained replica lag on x1 on db1220 is CRITICAL: 12.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1220&var-port=9104
[11:28:36] PROBLEM - MariaDB sustained replica lag on x1 on db1220 is CRITICAL: 10.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1220&var-port=9104
[11:31:36] RECOVERY - MariaDB sustained replica lag on x1 on db1220 is OK: (C)10 ge (W)5 ge 1 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1220&var-port=9104
[11:52:14] Interesting observation that our older Config-J systems have special firmware on the controllers - https://phabricator.wikimedia.org/T384003#10588536
[12:15:52] Amir1: I am ready to start working on ms2, but that means that x2 will stop having any standby replica
[12:15:58] as ms3 is done
[12:16:07] So if there's an x2 host failure there is no host to fail over to
[12:16:25] if that happens, maybe we can force push the changes? :D
[12:16:40] XD
[12:16:45] maybe I should do it Monday
[12:16:46] just in case
[12:17:07] yeah, let's do it on Monday, we can't use it until Monday anyway
[12:55:54] Emperor: does that potentially mean that the other controller we want to try may not solve the problem?
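The "MW losing a race with itself" report above is a classic check-after-write race against an eventually visible object store: the object has been written, but a status check runs before the write is visible and wrongly concludes the object is gone. A minimal, deterministic sketch of the failure mode (toy store, hypothetical names; not Swift's or MediaWiki's actual API):

```python
import time

class EventuallyConsistentStore:
    """Toy object store where writes only become visible after a delay,
    so a HEAD that races the PUT reports the object as missing even
    though it is safely stored. Illustrative only."""

    def __init__(self, visibility_delay=0.05):
        self._objects = {}  # name -> (payload, time the write becomes visible)
        self._delay = visibility_delay

    def put(self, name, payload):
        # The write is durable immediately, but not yet visible to readers.
        self._objects[name] = (payload, time.monotonic() + self._delay)

    def head(self, name):
        entry = self._objects.get(name)
        if entry is None:
            return False
        _, visible_at = entry
        # A check before visible_at looks exactly like "object is gone".
        return time.monotonic() >= visible_at

store = EventuallyConsistentStore()
store.put("thumb/example.jpg", b"...")
racing_check = store.head("thumb/example.jpg")   # races the write
time.sleep(0.06)
later_check = store.head("thumb/example.jpg")    # write now visible
print(racing_check, later_check)  # False True
```

This is why the report distinguishes "object is gone" (as MW concludes) from "object is present on swift" (as a later check shows): both checks are correct at the instant they run.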
[12:56:32] this extra information does not provide a lot of encouragement...
[13:02:09] kwakuofori: no idea, sorry :(
[13:02:51] e.lukey is looking to get to the point of trying a controller reset (but they need a newer storcli to attempt that), which may or may not help
[13:04:12] hmm... ok, I guess we just wait and see
[13:06:20] I think so, yes.
[13:06:58] A quote for the controller is in the works, at least
[16:05:55] @elukey I left a question for you on https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1084813 - should I flag the comment as "resolved" for you to see it?
[16:07:25] federico3: o/ nono, I get an email if you answer; usually we leave it to the person who originally created the comment to resolve it (unless it is something trivial like a "typo" etc.)
[16:18:06] federico3: tried to reply; for your confirm_on_failure + retry question I don't have a solid answer, I'll try to read up on it
[16:18:33] in any case, I'll try to help speed up the code patches (whatever I can; for the more complicated things we'll need to defer to Riccardo)
[16:19:25] elukey: I'm doing really minor cleanups - the bulk of the code is there
[16:20:25] (I'm also e2e testing the script by running a real clone process)
[16:21:10] federico3: I am aware that you are picking up previous work, but I wouldn't call it minor cleanup; some code is being added and, for better or worse, you are the owner of it now :D
[16:21:50] and I am totally aware that this is tested etc., but I am worried about other folks using it without your context and ending up with terse stacktraces to debug
[16:22:13] it is also fine to leave things as they are, with minor error message summaries etc.
[16:22:16] nothing really major
[16:22:21] and we refine as we go
[16:23:39] I was referring to the last few commits triggering the linters; I just have to fix the docstrings and I don't expect to push any substantial changes to this CR, so it can be merged
[16:24:23] ah right okok
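The confirm_on_failure + retry question discussed above is about how a cookbook should handle transient failures: retry automatically a few times, and only then involve a human. A plain-Python sketch of the retry half of that pattern (generic decorator with hypothetical names; this is not the actual spicerack/wmflib implementation):

```python
import functools
import time

def retry(tries=3, delay=0.01, exceptions=(RuntimeError,)):
    """Retry the wrapped function up to `tries` times on the listed
    exceptions, sleeping `delay` seconds between attempts, and re-raise
    the last exception if every attempt fails. Illustrative sketch only."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    last_exc = exc
                    if attempt < tries:
                        time.sleep(delay)
            raise last_exc
        return wrapper
    return decorator

calls = []

@retry(tries=3, delay=0)
def flaky_step():
    """Fails twice, then succeeds, like a transient network hiccup."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky_step(), len(calls))  # ok 3
```

The open question in the chat is how such a retry loop composes with an interactive confirm-on-failure prompt: whether the prompt should fire after each failed attempt or only once the retries are exhausted, which is exactly the design point deferred to documentation and review.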