[05:07:04] <wikibugs>	 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277822 (10Marostegui) >>! In T196840#4277313, @mmodell wrote: > @marostegui: I canceled some of the queued jobs which should have helped somewhat. The only thing I know to do...
[05:09:53] <wikibugs>	 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277823 (10mmodell) I've got the queue down to 3.1M by canceling jobs. There is still write traffic involved even to delete the jobs so it hasn't really reduced the traffic as...
[05:20:28] <marostegui>	 All pre steps are done
[05:24:51] <jynus>	 we should enable semi sync shortly after the maintenance, which I think requires a reconnection from all (most) slaves, starting with the most powerful
[05:25:08] <marostegui>	 yep!
[05:33:40] <jynus>	 actually, I see semisync enabled already on the replica
[05:34:02] <jynus>	 Rpl_semi_sync_master_clients | 4
[05:34:09] <marostegui>	 maybe you did it yesterday?
[05:34:13] <jynus>	 probably enabled after the restart + topology changes
[05:34:20] <marostegui>	 aaaah
[05:34:24] <marostegui>	 yeah could be
[05:39:55] <wikibugs>	 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277839 (10jcrespo) p:05High>03Normal I don't think this is high from our perspective- they have dedicated db resources and the replica is up to data, and were aware of the...
[05:50:13] <jynus>	 what is our most active wiki on s2 right now?
[05:51:00] <marostegui>	 I guess itwiki or ptwiki
[05:51:49] <marostegui>	 I will polish this later: https://wikitech.wikimedia.org/wiki/MariaDB#Production_section_failover_checklist
[05:51:52] <marostegui>	 but at least it is copied there
[05:53:05] <jynus>	 marostegui: you take care of deploys and I monitor (the automation is not ready yet)?
[05:53:10] <marostegui>	 yep!
[05:53:18] <jynus>	 there is also https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_master_(a.k.a._promoting_a_new_slave_to_master) but it is meant more for emergencies
[05:54:34] <jynus>	 I will copy binlog info into line 19
[05:54:37] <jynus>	 or you do
[05:54:39] <marostegui>	 I will
[05:54:54] <jynus>	 and I will confirm I agree
[05:54:58] <marostegui>	 awesome
[05:55:14] <marostegui>	 I am going to merge, but not deploy, the read only change
[05:55:48] <jynus>	 we should not spend much time until 16 and 24
[05:55:55] <jynus>	 *between
[05:56:00] <jynus>	 let's move to operations
[05:56:05] <marostegui>	 yes
[06:08:57] <wikibugs>	 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277853 (10mmodell) The gerrit notedb migration was a one time event, so it shouldn't really be something that happens with every update.
[06:09:27] <jynus>	 ro around 6:00 -> 6:08
[06:10:11] <jynus>	 no errors on logs
[06:10:27] <marostegui>	 yeah, 06:01 to 06:08 according to the deployment logs
[06:11:10] <jynus>	 no stall errors that I can see
[06:11:50] <jynus>	 no fatas
[06:11:53] <jynus>	 *fatals
[06:31:04] <wikibugs>	 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4277859 (10jcrespo)
[06:32:03] <wikibugs>	 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4277864 (10Marostegui) This was completed. read only time started at 06:01 read only time finished at 08:08 Total read only time was around 7 minutes
[06:37:48] <wikibugs>	 10DBA: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277886 (10Marostegui)
[06:38:05] <wikibugs>	 10DBA: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277900 (10Marostegui) p:05Triage>03Normal
[06:38:26] <wikibugs>	 10DBA: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277886 (10Marostegui)
[06:38:28] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#4277903 (10Marostegui)
[06:38:42] <wikibugs>	 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (10Marostegui) 05Open>03Resolved a:03Marostegui
[06:38:44] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#3940866 (10Marostegui)
[06:40:15] <wikibugs>	 10DBA, 10MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), 10Patch-For-Review, 10Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4277907 (10Marostegui)
[06:40:50] <wikibugs>	 10DBA, 10MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), 10Patch-For-Review, 10Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4060892 (10Marostegui) s2 (T194870 )was failed over to a different master, and the new master has the sche...
[06:42:20] <wikibugs>	 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (10Marostegui)
[08:10:05] <jynus>	 should we copy db1054's data to the new candidate just in case?
[08:10:34] <marostegui>	 or do a compare.py?
[08:10:39] <jynus>	 or alternatively, run a comparison? (not suggesting, just asking)
[08:10:46] <jynus>	 one of the 2
[08:10:52] <marostegui>	 yeah, I will go for the comparison
[08:11:41] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4278092 (10Marostegui)
[08:22:40] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278128 (10Marostegui) p:05Triage>03Normal
[08:23:37] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278128 (10Marostegui) I would like to suggest July 18th (Wednesday) at 06:00AM UTC as a failover date
[08:24:24] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278158 (10Marostegui)
[08:24:27] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#3940869 (10Marostegui)
[08:27:28] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278162 (10jcrespo) Seems ok to me at first. I would also like to check for blockers for the parent task, even if they are not blockers for this subtask.
[08:28:11] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278164 (10Marostegui)
[08:28:28] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278165 (10jcrespo)
[08:28:47] <marostegui>	 that is something I was thinking about too
[08:28:51] <marostegui>	 I think we should go for it
[08:29:00] <marostegui>	 let's give the new s2 master some weeks
[08:29:01] <marostegui>	 before
[08:29:05] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278128 (10jcrespo)
[08:29:06] <marostegui>	 and then go for it if it goes fine
[08:29:35] <jynus>	 ^see my description suggestions
[08:29:53] <marostegui>	 yeah
[08:29:57] <marostegui>	 that is what I was saying
[08:30:04] <marostegui>	 that maybe give db1066 a few weeks
[08:30:10] <marostegui>	 to make sure we don't see any regression
[08:30:17] <jynus>	 I was counting on that
[08:30:23] <jynus>	 in any case
[08:30:54] <jynus>	 the row is B?
[08:31:38] <jynus>	 only those hosts https://phabricator.wikimedia.org/T183585#3979437 ?
[08:32:31] <jynus>	 but there is no 52 there, or am i blind?
[08:33:22] <jynus>	 I think we don't have a complete list yet
[08:36:01] <marostegui>	 It is there
[08:36:02] <marostegui>	 B3
[08:36:22] <marostegui>	 at least on rack tables
[08:36:32] <marostegui>	 let's ping arzhel
[08:41:28] <jynus>	 es3 master is also there
[08:42:10] <jynus>	 should we move, eg. es1019 to C and switch there?
[08:43:41] <marostegui>	 +1 yeah
[08:44:41] <jynus>	 we can move es1019 or es1017
[08:44:45] <jynus>	 any preference?
[08:44:52] <jynus>	 e.g. one less stable
[08:44:55] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278187 (10Marostegui)
[08:44:58] <jynus>	 or other thing
[08:45:39] <marostegui>	 let's see if we have some HW issues history
[08:46:15] <jynus>	 es1017 seems it has less crash history
[08:47:29] <marostegui>	 and mgmt seems working
[08:47:34] <marostegui>	 so could be a good candidate
[08:50:32] <jynus>	 I am trying to find a place
[08:50:46] <marostegui>	 maybe we have hosts to decommission on row C?
[08:50:55] <jynus>	 db1056 maybe?
[08:51:15] <marostegui>	 yep
[08:51:20] <marostegui>	 looks like a good one to take out
[08:54:54] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278217 (10jcrespo)
[08:55:09] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278232 (10jcrespo)
[09:00:02] <wikibugs>	 10DBA: switchover es1014 to es1017 - https://phabricator.wikimedia.org/T197073#4278245 (10jcrespo)
[09:01:20] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278273 (10jcrespo)
[09:01:22] <wikibugs>	 10DBA, 10Operations, 10decommission, 10ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736#4278272 (10jcrespo)
[09:04:30] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278276 (10jcrespo)
[09:05:02] <wikibugs>	 10DBA: switchover es1014 to es1017 - https://phabricator.wikimedia.org/T197073#4278245 (10jcrespo)
[09:05:26] <jynus>	 so you can see the hierarchy here: https://phabricator.wikimedia.org/T183585
[09:06:20] <wikibugs>	 10DBA, 10Operations, 10decommission, 10ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736#4278296 (10jcrespo) p:05Low>03Normal Not low anymore, based on my proposal of 1 server movement.
[09:06:38] <AaronSchulz>	 jynus: you mentioned some things not being multi-DC ready. Is there a task of what you had in mind (not necessarily proposals, just problem statements)?
[09:06:51] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278310 (10jcrespo) p:05Triage>03Normal
[09:07:03] <jynus>	 AaronSchulz: I have some ideas
[09:07:24] <jynus>	 I have not written them because I have not made up my mind and you may hate them
[09:08:00] <jynus>	 but it can be summarized as: I think GTID is broken and messy, both at application and app side
[09:08:19] <jynus>	 and I know you have spent a lot of time of making it work
[09:08:24] <jynus>	 and so we did
[09:09:16] <jynus>	 but maybe it is time to thing about alternatives, that dont require different handling for mysql and mariadb and are more reliable
[09:09:48] <jynus>	 it also relates to replication control, which will need changes to make it work cross-dc
[09:10:17] <jynus>	 and I have doubts it will properly scale though wan
[09:10:27] <jynus>	 (as it is now)
[09:10:50] <jynus>	 agreeing or now, do you see my themes and fears, AaronSchulz?
[09:11:44] <AaronSchulz>	 jynus: you mean things like "wait for slaves to catch up in the remote DC"?
[09:11:47] <jynus>	 yep
[09:11:49] <AaronSchulz>	 which we totally ignore atm
[09:12:00] <jynus>	 also relability issues
[09:12:14] <jynus>	 within-dc network shortages are rare
[09:12:21] <jynus>	 but quite frequent cross-dc
[09:12:39] <jynus>	 and we already have issues with the first ones :-)
[09:13:07] <jynus>	 maybe not full outages (there is physical redundancy)
[09:13:23] <jynus>	 but degradations of latency, etc.
[09:14:02] <jynus>	 also I would like to combine mysql and mariadb hosts on the same section
[09:14:18] <jynus>	 so we are not tied to a single vendor
[09:15:13] <jynus>	 all of these may seem too complex, but I have ideas to make this work with very little code changes
[09:15:43] <jynus>	 I want to sync with you in problem statement- do you want me to write a list of issues/needs I see?
[09:16:13] <AaronSchulz>	 jynus: it would be good to have that in writing with bullet points and so on
[09:16:24] <jynus>	 for example, tim's comment on
[09:16:36] <jynus>	 "do not wait for all replicas"
[09:16:49] <jynus>	 makes much more sense if a remote dc is involved
[09:17:08] <jynus>	 not his comment, the opposite position
[09:17:50] <jynus>	 I already commented to him the problem with extra lag plus multiple tiers of replicas being an order of magnitude slower
[09:18:10] <jynus>	 but I am not sure if it is fully realized
[09:18:55] <jynus>	 maintenance scripts running 10x slower may not be acceptable for developers, etc.
[09:19:17] <jynus>	 and more servers more things that can go bad
[09:19:30] <jynus>	 I will write an RFC-like on phabricator and add you aaron
[09:19:39] <jynus>	 ^AaronSchulz
[09:19:42] <AaronSchulz>	 jynus: have you considered "black hole" type servers that are just there for replication? (just a random thought)
[09:19:47] <AaronSchulz>	 *?
[09:20:03] <jynus>	 I didn't get that last thing, for what of the problems?
[09:20:38] <jynus>	 basically, I am not sure what you mean with black hole servers in context
[09:20:42] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278364 (10Marostegui)
[09:20:42] <AaronSchulz>	 I assume you meant that 1xxx => 1xxy => 2xxz is slower than 1xxx => 2xxz
[09:20:53] <jynus>	 yes
[09:21:08] <jynus>	 we lose parallelism 
[09:21:40] <jynus>	 if you mean some kind of fast relay, it would help (e.g. reducing the consitency of the non-primary master)
[09:21:57] <AaronSchulz>	 so instead of (receive event => apply event => log it => send it out) you'd get rid of one of the steps in the middle
[09:22:12] <AaronSchulz>	 (apply)
[09:22:24] <jynus>	 but it is not the only issue- intermediate replicas in general avoid parralel application of changes
[09:23:01] <jynus>	 *prevent
[09:23:17] <jynus>	 there are techniques, tunings, etc.
[09:23:31] <jynus>	 but WAN latency is mostly based on light speed :-)
[09:24:22] <jynus>	 so we just have to accept replication is not instant and work around that
[09:24:54] <jynus>	 you gave options, like higher locking on write
[09:25:25] <jynus>	 but basically we have to desing any changes with that in mind
[09:26:25] <jynus>	 the decisions about connection pooling are mostly ready
[09:26:30] <jynus>	 (changing topic)
[09:27:10] <jynus>	 for now we will setup proxysql on a codfw master and I will ask you (and others) to test it from mwdebug/canary servers
[09:27:35] <jynus>	 codfw master means that you will connect to separate port on a codfw master that will proxy to the real master
[09:28:01] <jynus>	 not sure if you saw https://phabricator.wikimedia.org/T196378#4254120
[09:28:57] <jynus>	 mediawiki will not know about TLS for now (it will be on the first itteration as if it is connecting to a non-tls db)
[09:30:17] <AaronSchulz>	 I'd imagine gtid waiting could use WaitConditionLoop on pt_heartbeat (or some other such table) instead. The polling interval could increase if the lag is higher and decrease when it's low ("lag" as in now - heartbeat)...and cache could also be used to short-circuit. Something like that would be vastly simpler and would work with mixed maria/mysql. I should probably at least add a mode for that.
[09:30:23] <jynus>	 AaronSchulz: so I don't have yet actionables, just wanted to give you a heads up of a lots of requests for support we will be generating soon (no matter what is the actual iomplementation)
[09:30:34] <jynus>	 AaronSchulz: that was exactly my mind
[09:30:44] <jynus>	 gtid comparisons have gone craxy
[09:31:08] <jynus>	 and even without mediawiki, on purse server side, we are thinking of abandon it
[09:31:12] <jynus>	 *server
[09:31:26] <jynus>	 because the mess when masters change
[09:31:28] <AaronSchulz>	 "real master" as in eqiad or the local "standby master"?
[09:31:34] <jynus>	 yes, sorry
[09:31:39] <jynus>	 I mean active (rw)
[09:31:53] <jynus>	 the other being standby or passive (ro)
[09:32:25] <jynus>	 in the future, with more work, we could even make full write-write dc if pooling is effective
[09:32:26] <AaronSchulz>	 right, for latency testing
[09:32:42] <jynus>	 but that would require way more changes outside of databases
[09:32:48] <jynus>	 we are not yet even thinking about
[09:33:08] <jynus>	 but yes, one proxy per master
[09:33:22] <jynus>	 so SPOF is still SPOF, but we don't add more :-)
[09:33:54] <jynus>	 and configuration should be trivial -master is the local master, either the proxy or the real mysql
[09:34:05] <jynus>	 based on the primary dc configuration
[09:34:29] <jynus>	 AaronSchulz: I love that we are in sync, based on your heartbeat comment
[09:35:06] <jynus>	 obviously, we would need to make heartbeats more frequent: every 0.1 seconds or so
[09:36:07] <jynus>	 I will soon create a ticket with ongoing problems/pain points and share it with you
[09:36:15] <jynus>	 so we can both think of solutions
[09:36:44] <jynus>	 shouldn't you be sleeping, BTW?
[09:36:50] <jynus>	 :-D
[09:37:16] <AaronSchulz>	 my initial apprehension was general dislike of polling...but I think most POST + GET (from redirect) cycles probably involve the replica already being caught up by then and thus no waiting. A few tight polling queries would usually be all that is needed. With cache to help and dynamic waiting, I think it can be made quite reasonable, even without proper blocking I/O. 
[09:37:41] <jynus>	 yeah, no solutions is perfect
[09:37:58] <jynus>	 but we can iterate on either fixing the current one
[09:37:59] <AaronSchulz>	 meh, I tend to get up late, or nap midday, or stay up all day sometimes
[09:38:10] <jynus>	 or think of alternatives
[09:38:10] * AaronSchulz does not recommend that though
[09:39:00] <jynus>	 the whole gtid_status:X-Y-Z,U-V-X,Q-R-S is really bad
[09:39:05] <jynus>	 and will only get worse
[09:39:16] <jynus>	 we are also reasearching being able to clean up those
[09:39:40] <jynus>	 but in the past it was painful- required stop of all servers at the same time
[09:40:07] <jynus>	 we want to test MySQL 8.0, which has some features that would be advantageous
[09:40:20] <jynus>	 but we cannot pool it because gtid is configured per section
[09:40:45] <jynus>	 (we are not planning a migration, but we don't want to be tied to a single vendor)
[09:41:18] <jynus>	 the mysql<->mariadb diversion will only cause headaches
[09:42:03] <jynus>	 have some rest, AaronSchulz- I will add you to a ticket as promised with a list of pain points so you can help us solve it
[09:42:13] <jynus>	 thank you very much for you alway helpful hand!
[09:42:25] <AaronSchulz>	 I also suspect that gtid_current_pos had some bug that caused me to revert back to using gtid_slave_pos
[09:42:40] <jynus>	 I wouldn't be surprised
[09:43:06] <jynus>	 pain here for pure sysadmin tasks too
[09:43:21] <AaronSchulz>	 it only failed sometimes (a minority but a huge # of events) and not on servers with a bunch of replication channels or local writes or fancy stuff
[09:43:31] <AaronSchulz>	 so I can't see it being some subtle semantic thing afaik
[09:43:48] <AaronSchulz>	 I know a bug is filed that it fails to update sometimes upstream
[09:43:56] <AaronSchulz>	 not really a confidence builder
[09:44:05] <jynus>	 we are writing our own automatic topology changes script based on binlogs + heartbeats
[09:44:16] <jynus>	 because it doesn't try to be smart and give us control
[09:44:39] <jynus>	 and doesn't break if we accidentaly write to the replicas
[09:45:18] <jynus>	 gtid both for mariadb and mysql was such a great idea
[09:45:38] <jynus>	 but it is turning up badly for many people in practice (not only us)
[09:46:04] <jynus>	 I shared experiences with other people with large installations, and they share our pain
[10:50:17] <jynus>	 marostegui: it started working https://phabricator.wikimedia.org/P7254
[10:50:57] <marostegui>	 yeah, saw it on gerrit
[10:51:03] <marostegui>	 it looks very promising! 
[10:51:09] <jynus>	 a few hours too late, I know
[10:51:19] <marostegui>	 it is never too late, we will have more failovers!
[10:51:38] <jynus>	 I can add the replica switch, too, but that needs more work
[10:51:48] <jynus>	 the topology changes, I mean
[10:52:00] <marostegui>	 yeah, we can leave that for a second phase, we have a working script already
[10:52:04] <marostegui>	 so no need to refactor that!
[10:52:11] <jynus>	 that needs a source of truth
[10:53:14] <jynus>	 or maybe I can auto-discover replicas?
[10:53:38] <marostegui>	 yeah, that'd be nice, but that has problems
[10:53:50] <marostegui>	 and it is that we can miss replicas if for whatever reason they are down
[10:53:54] <jynus>	 yeah, we may only change a subset of them
[10:53:59] <marostegui>	 the source of truth will tell you: that is a replica, but it is down
[10:54:17] <jynus>	 well, I have checks for stupid things already, that is easy to do
[10:54:23] <marostegui>	 ah :)
[10:54:33] <jynus>	 for example, the #1 this is trying to failover a host to itslef
[10:54:39] <jynus>	 and it fails
[10:55:17] <marostegui>	 yeah
[10:55:24] <marostegui>	 we can add a bunch of those
[10:55:35] <marostegui>	 once we have the source of truth or harcoding things
[10:55:41] <marostegui>	 like: oh, you are a sanitarium master -> fail
[10:56:04] <jynus>	 it also fails if the mater has a replication channel already
[10:56:10] <jynus>	 or the master is not a direct replica
[10:56:20] <marostegui>	 yeah those are good
[10:56:31] <jynus>	 or read onlys are strange
[11:00:57] * _joe_ lunch
[11:01:34] <jynus>	 _joe_: wrong channel?
[11:01:49] <jynus>	 (not complaining if you invite us)
[11:23:29] <_joe_>	 jynus: ahah yeah
[13:49:56] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4279223 (10Marostegui)
[13:50:10] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277886 (10Marostegui) main tables have been checked without any differences.
[13:50:40] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4279225 (10Marostegui)
[15:26:30] <wikibugs>	 10DBA, 10MediaWiki-Configuration, 10Operations: Create tool to handle the state of database configuration in MediaWiki in etcd - https://phabricator.wikimedia.org/T197126#4279638 (10Joe)
[15:26:43] <_joe_>	 marostegui, jynus, volans ^^
[15:27:14] <_joe_>	 can you look at that ticket and give me some feedback before the end of the week? I plan on working on it next week while you enjoy yourselves in Prague
[15:27:35] <jynus>	 may I suggest a vision change?
[15:27:46] <jynus>	 nothing too radical
[15:28:43] <jynus>	 like mw developers in the past, I don't need high level interfaces "warmup" is too high level
[15:28:52] <jynus>	 (they offered that functionality)
[15:29:14] <_joe_>	 that's what we talked about when we had our meeting :)
[15:29:17] <jynus>	 we don't need that, we want low level- we can later do warmup scripts
[15:29:35] <jynus>	 yes, but you asked what is the typical things we do
[15:29:46] <_joe_>	 it's actually different from doing anything lower-level, which you can still do of course
[15:29:47] <jynus>	 warmup is something we do often, but not something se need
[15:29:51] <_joe_>	 this is the high-level interface
[15:30:10] <_joe_>	 if you want to script it, you should use the python code I'll write as a library, most probably
[15:31:09] <jynus>	 we need to setup individual values, the verb is "set" not "warmup", which would be otherwise too complex or too limited
[15:31:52] <volans>	 I can have a look later or tomorrow
[15:32:51] <_joe_>	 no, what we agreed  on in april was that "warmup" would be used just to ramp-up to the basic weights you already decided and that you can set with "edit"
[15:33:06] <_joe_>	 so if you want to change an individual weight you use that
[15:33:13] <jynus>	 ok
[15:33:18] <jynus>	 then maybe the problem
[15:33:20] <_joe_>	 but ok, I will leave the functionality and leave that for later
[15:33:24] <jynus>	 is a missunderstanding
[15:33:28] <jynus>	 on what warmup means
[15:33:31] <jynus>	 on my side
[15:33:41] <jynus>	 maybe we agree, but warmup is a bad name
[15:33:44] <jynus>	 :-)
[15:34:19] <jynus>	 if warmup == 'change weight to X' we agree
[15:34:49] <_joe_>	 it was more like "send this server X% of the traffic it would get normally"
[15:34:54] <jynus>	 I think we agree then
[15:34:59] <jynus>	 just warmup is missleading
[15:35:02] <jynus>	 for me
[15:35:18] <jynus>	 because we would use warmup to reduce the load too
[15:35:39] <jynus>	 eg. wamup 0.5 (reduce traffic by 50%)
[15:35:58] <jynus>	 so we just need to change warmup by other verb  and that will make us happy :-)
[15:36:12] <jynus>	 literally a string change on the proposal :-)
[15:36:50] <jynus>	 setweight or something
[15:37:17] <jynus>	 I don't have a good alternative right now, but now I understand what you mean
[15:38:14] <jynus>	 I tought at firt that was like a multiplier
[15:38:17] <jynus>	 *first
[15:41:18] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4279770 (10Marostegui)
[15:42:03] <_joe_>	 well, it is a multiplier of the weight the server should have in a normal situation;
[15:43:18] <jynus>	 yes, as I said, it was the name that was missleading to me
[15:43:38] <wikibugs>	 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4279792 (10Marostegui)
[15:43:42] <jynus>	 now that I understod it better it is ok
[15:44:05] <_joe_>	 cool :)
[15:44:16] <jynus>	 I would have some questions about expectations
[15:45:49] <jynus>	 on the details
[15:45:55] <jynus>	 willl write on the ticket
[15:47:09] <jynus>	 thanks for writing that, _joe_
[16:14:38] <wikibugs>	 10DBA, 10MediaWiki-User-management, 10Anti-Harassment (AHT Sprint 23): Draft a proposal for granular blocks table schema(s), submit for DBA review - https://phabricator.wikimedia.org/T193449#4169871 (10dbarratt)
[16:15:23] <wikibugs>	 10DBA, 10MediaWiki-User-management, 10Anti-Harassment (AHT Sprint 23): Draft a proposal for granular blocks table schema(s), submit for DBA review - https://phabricator.wikimedia.org/T193449#4169871 (10dbarratt)
[16:45:02] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1053 - https://phabricator.wikimedia.org/T194634#4280161 (10jcrespo)
[16:45:43] <wikibugs>	 10DBA, 10decommission: Decommission db1053 - https://phabricator.wikimedia.org/T194634#4204024 (10jcrespo) a:03RobH
[16:47:01] <wikibugs>	 10DBA, 10decommission: Decommission db1059 - https://phabricator.wikimedia.org/T196606#4280170 (10jcrespo) a:05jcrespo>03RobH
[16:48:22] <wikibugs>	 10DBA, 10decommission: Decommission db1051 - https://phabricator.wikimedia.org/T195484#4280180 (10jcrespo) a:05jcrespo>03RobH ready for decomm.
[16:51:17] <wikibugs>	 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#4280190 (10jcrespo) db1054 is pending wait everything is working as expected on s2- db1052 process has not started yet. All others are ready for robh/dcops to continue as noted on the subtasks.
[18:02:42] <wikibugs>	 10DBA, 10MediaWiki-Configuration, 10Operations: Create tool to handle the state of database configuration in MediaWiki in etcd - https://phabricator.wikimedia.org/T197126#4280521 (10Volans) Quick first feedback/questions on the proposal:  > dbconfig get NAME gets you all the current configuration of a mysql...