[05:07:04] 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277822 (10Marostegui) >>! In T196840#4277313, @mmodell wrote: > @marostegui: I canceled some of the queued jobs which should have helped somewhat. The only thing I know to do...
[05:09:53] 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277823 (10mmodell) I've got the queue down to 3.1M by canceling jobs. There is still write traffic involved even to delete the jobs so it hasn't really reduced the traffic as...
[05:20:28] All pre steps are done
[05:24:51] we should enable semi sync shortly after the maintenance, which I think requires a reconnection from all (most) slaves, starting with the most powerful
[05:25:08] yep!
[05:33:40] actually, I see semisync enabled already on the replica
[05:34:02] Rpl_semi_sync_master_clients | 4
[05:34:09] maybe you did it yesterday?
[05:34:13] probably enabled after the restart + topology changes
[05:34:20] aaaah
[05:34:24] yeah could be
[05:39:55] 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277839 (10jcrespo) p:05High>03Normal I don't think this is high from our perspective- they have dedicated db resources and the replica is up to date, and were aware of the...
[05:50:13] what is our most active wiki on s2 right now?
[05:51:00] I guess itwiki or ptwiki
[05:51:49] I will polish this later: https://wikitech.wikimedia.org/wiki/MariaDB#Production_section_failover_checklist
[05:51:52] but at least it is copied there
[05:53:05] marostegui: you take care of deploys and I monitor (the automation is not ready yet)?
[05:53:10] yep!
[05:53:18] there is also https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_master_(a.k.a._promoting_a_new_slave_to_master) but it is meant more for emergencies
[05:54:34] I will copy binlog info into line 19
[05:54:37] or you do
[05:54:39] I will
[05:54:54] and I will confirm I agree
[05:54:58] awesome
[05:55:14] I am going to merge, but not deploy, the read only change
[05:55:48] we should not spend much time until 16 and 24
[05:55:55] *between
[05:56:00] let's move to operations
[05:56:05] yes
[06:08:57] 10DBA, 10Gerrit, 10Operations, 10Phabricator: Massive increase of writes in m3 section - https://phabricator.wikimedia.org/T196840#4277853 (10mmodell) The gerrit notedb migration was a one time event, so it shouldn't really be something that happens with every update.
[06:09:27] ro around 6:00 -> 6:08
[06:10:11] no errors on logs
[06:10:27] yeah, 06:01 to 06:08 according to the deployment logs
[06:11:10] no stall errors that I can see
[06:11:50] no fatas
[06:11:53] *fatals
[06:31:04] 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4277859 (10jcrespo)
[06:32:03] 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4277864 (10Marostegui) This was completed.
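A minimal sketch of the kind of semi-sync check discussed above (05:33-05:34), where `Rpl_semi_sync_master_clients | 4` confirmed semi-sync was already active after the restart and topology changes. This is not the actual WMF tooling; the host and credentials are placeholders and pymysql is assumed to be available.

```python
#!/usr/bin/env python3
"""Check whether semi-sync replication is active on a master (sketch only)."""
import pymysql


def semi_sync_status(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            # Global status exposes Rpl_semi_sync_master_clients, among others
            cur.execute("SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%'")
            return dict(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    status = semi_sync_status("db-master.example", "repl_check", "secret")
    clients = int(status.get("Rpl_semi_sync_master_clients", 0))
    print("semi-sync replica clients connected:", clients)
```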
read only time started at 06:01
read only time finished at 06:08
Total read only time was around 7 minutes
[06:37:48] 10DBA: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277886 (10Marostegui)
[06:38:05] 10DBA: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277900 (10Marostegui) p:05Triage>03Normal
[06:38:26] 10DBA: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277886 (10Marostegui)
[06:38:28] 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#4277903 (10Marostegui)
[06:38:42] 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (10Marostegui) 05Open>03Resolved a:03Marostegui
[06:38:44] 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#3940866 (10Marostegui)
[06:40:15] 10DBA, 10MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), 10Patch-For-Review, 10Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4277907 (10Marostegui)
[06:40:50] 10DBA, 10MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), 10Patch-For-Review, 10Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4060892 (10Marostegui) s2 (T194870) was failed over to a different master, and the new master has the sche...
[06:42:20] 10DBA, 10Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (10Marostegui)
[08:10:05] should we copy db1054's data to the new candidate just in case?
[08:10:34] or do a compare.py?
[08:10:39] or alternatively, run a comparison? (not suggesting, just asking)
[08:10:46] one of the 2
[08:10:52] yeah, I will go for the comparison
[08:11:41] 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4278092 (10Marostegui)
[08:22:40] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278128 (10Marostegui) p:05Triage>03Normal
[08:23:37] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278128 (10Marostegui) I would like to suggest July 18th (Wednesday) at 06:00AM UTC as a failover date
[08:24:24] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278158 (10Marostegui)
[08:24:27] 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#3940869 (10Marostegui)
[08:27:28] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278162 (10jcrespo) Seems ok to me at first. I would also like to check for blockers for the parent task, even if they are not blockers for this subtask.
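For the comparison mentioned above between db1054 and the new candidate, a very rough sketch of the idea follows. This is not the compare.py referenced in the chat; hosts, credentials and table names are placeholders, and since CHECKSUM TABLE locks the table while it runs, a chunked tool such as pt-table-checksum would be preferable on production-sized tables.

```python
#!/usr/bin/env python3
"""Compare per-table checksums between two hosts (illustrative sketch only)."""
import pymysql


def table_checksum(host, schema, table, user, password):
    conn = pymysql.connect(host=host, user=user, password=password, db=schema)
    try:
        with conn.cursor() as cur:
            cur.execute("CHECKSUM TABLE `{}`".format(table))
            return cur.fetchone()[1]  # row is (table_name, checksum)
    finally:
        conn.close()


def compare(host_a, host_b, schema, tables, user, password):
    for table in tables:
        a = table_checksum(host_a, schema, table, user, password)
        b = table_checksum(host_b, schema, table, user, password)
        print("{}.{}: {}".format(schema, table, "OK" if a == b else "DIFFERS"))


if __name__ == "__main__":
    compare("old-master.example", "candidate.example", "itwiki",
            ["page", "revision", "text"], "compare_user", "secret")
```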
[08:28:11] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278164 (10Marostegui)
[08:28:28] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278165 (10jcrespo)
[08:28:47] that is something I was thinking about too
[08:28:51] I think we should go for it
[08:29:00] let's give the new s2 master some weeks
[08:29:01] before
[08:29:05] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278128 (10jcrespo)
[08:29:06] and then go for it if it goes fine
[08:29:35] ^see my description suggestions
[08:29:53] yeah
[08:29:57] that is what I was saying
[08:30:04] that maybe give db1066 a few weeks
[08:30:10] to make sure we don't see any regression
[08:30:17] I was counting on that
[08:30:23] in any case
[08:30:54] the row is B?
[08:31:38] only those hosts https://phabricator.wikimedia.org/T183585#3979437 ?
[08:32:31] but there is no 52 there, or am i blind?
[08:33:22] I think we don't have a complete list yet
[08:36:01] It is there
[08:36:02] B3
[08:36:22] at least on rack tables
[08:36:32] let's ping arzhel
[08:41:28] es3 master is also there
[08:42:10] should we move, eg. es1019 to C and switch there?
[08:43:41] +1 yeah
[08:44:41] we can move es1019 or es1017
[08:44:45] any preference?
[08:44:52] e.g. one less stable
[08:44:55] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278187 (10Marostegui)
[08:44:58] or other thing
[08:45:39] let's see if we have some HW issues history
[08:46:15] es1017 seems it has less crash history
[08:47:29] and mgmt seems working
[08:47:34] so could be a good candidate
[08:50:32] I am trying to find a place
[08:50:46] maybe we have hosts to decommission on row C?
[08:50:55] db1056 maybe?
[08:51:15] yep
[08:51:20] looks like a good one to take out
[08:54:54] 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278217 (10jcrespo)
[08:55:09] 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278232 (10jcrespo)
[09:00:02] 10DBA: switchover es1014 to es1017 - https://phabricator.wikimedia.org/T197073#4278245 (10jcrespo)
[09:01:20] 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278273 (10jcrespo)
[09:01:22] 10DBA, 10Operations, 10decommission, 10ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736#4278272 (10jcrespo)
[09:04:30] 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278276 (10jcrespo)
[09:05:02] 10DBA: switchover es1014 to es1017 - https://phabricator.wikimedia.org/T197073#4278245 (10jcrespo)
[09:05:26] so you can see the hierarchy here: https://phabricator.wikimedia.org/T183585
[09:06:20] 10DBA, 10Operations, 10decommission, 10ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736#4278296 (10jcrespo) p:05Low>03Normal Not low anymore, based on my proposal of 1 server movement.
[09:06:38] jynus: you mentioned some things not being multi-DC ready. Is there a task of what you had in mind (not necessarily proposals, just problem statements)?
[09:06:51] 10DBA, 10Operations, 10ops-eqiad: Physically move es1017 from D to C row - https://phabricator.wikimedia.org/T197072#4278310 (10jcrespo) p:05Triage>03Normal
[09:07:03] AaronSchulz: I have some ideas
[09:07:24] I have not written them because I have not made up my mind and you may hate them
[09:08:00] but it can be summarized as: I think GTID is broken and messy, both on the server and application side
[09:08:19] and I know you have spent a lot of time making it work
[09:08:24] and so we did
[09:09:16] but maybe it is time to think about alternatives, that don't require different handling for mysql and mariadb and are more reliable
[09:09:48] it also relates to replication control, which will need changes to make it work cross-dc
[09:10:17] and I have doubts it will properly scale through WAN
[09:10:27] (as it is now)
[09:10:50] agreeing or not, do you see my themes and fears, AaronSchulz?
[09:11:44] jynus: you mean things like "wait for slaves to catch up in the remote DC"?
[09:11:47] yep
[09:11:49] which we totally ignore atm
[09:12:00] also reliability issues
[09:12:14] within-dc network shortages are rare
[09:12:21] but quite frequent cross-dc
[09:12:39] and we already have issues with the first ones :-)
[09:13:07] maybe not full outages (there is physical redundancy)
[09:13:23] but degradations of latency, etc.
[09:14:02] also I would like to combine mysql and mariadb hosts on the same section
[09:14:18] so we are not tied to a single vendor
[09:15:13] all of these may seem too complex, but I have ideas to make this work with very little code changes
[09:15:43] I want to sync with you on the problem statement- do you want me to write a list of issues/needs I see?
[09:16:13] jynus: it would be good to have that in writing with bullet points and so on
[09:16:24] for example, tim's comment on
[09:16:36] "do not wait for all replicas"
[09:16:49] makes much more sense if a remote dc is involved
[09:17:08] not his comment, the opposite position
[09:17:50] I already commented to him the problem with extra lag plus multiple tiers of replicas being an order of magnitude slower
[09:18:10] but I am not sure if it is fully realized
[09:18:55] maintenance scripts running 10x slower may not be acceptable for developers, etc.
[09:19:17] and more servers means more things that can go bad
[09:19:30] I will write an RFC-like task on phabricator and add you aaron
[09:19:39] ^AaronSchulz
[09:19:42] jynus: have you considered "black hole" type servers that are just there for replication? (just a random thought)
[09:19:47] *?
[09:20:03] I didn't get that last thing, for which of the problems?
[09:20:38] basically, I am not sure what you mean with black hole servers in context
[09:20:42] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4278364 (10Marostegui)
[09:20:42] I assume you meant that 1xxx => 1xxy => 2xxz is slower than 1xxx => 2xxz
[09:20:53] yes
[09:21:08] we lose parallelism
[09:21:40] if you mean some kind of fast relay, it would help (e.g. reducing the consistency of the non-primary master)
[09:21:57] so instead of (receive event => apply event => log it => send it out) you'd get rid of one of the steps in the middle
[09:22:12] (apply)
[09:22:24] but it is not the only issue- intermediate replicas in general avoid parallel application of changes
[09:23:01] *prevent
[09:23:17] there are techniques, tunings, etc.
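To make the "different handling for mysql and mariadb" point above concrete: the two servers expose GTID positions in incompatible formats, so any position comparison needs separate code paths. The example values below are made up; this is only an illustration, not how MediaWiki or the WMF tooling parses them.

```python
# MariaDB: comma-separated domain-server_id-sequence triplets
mariadb_pos = "0-171970580-2263162314,1-171966507-116724165"

# MySQL/Percona: server UUID followed by transaction interval sets
mysql_set = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5:11-18"


def parse_mariadb(pos):
    """Return {domain_id: (server_id, sequence)} for a MariaDB GTID position."""
    result = {}
    for triplet in pos.split(","):
        domain, server, seq = (int(x) for x in triplet.split("-"))
        result[domain] = (server, seq)
    return result


def parse_mysql(gtid_set):
    """Return {source_uuid: [(start, end), ...]} for a single-source MySQL GTID set."""
    uuid, *intervals = gtid_set.split(":")
    ranges = []
    for part in intervals:
        start, _, end = part.partition("-")
        ranges.append((int(start), int(end or start)))
    return {uuid: ranges}


print(parse_mariadb(mariadb_pos))
print(parse_mysql(mysql_set))
```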
[09:23:31] but WAN latency is mostly based on light speed :-)
[09:24:22] so we just have to accept replication is not instant and work around that
[09:24:54] you gave options, like higher locking on write
[09:25:25] but basically we have to design any changes with that in mind
[09:26:25] the decisions about connection pooling are mostly ready
[09:26:30] (changing topic)
[09:27:10] for now we will setup proxysql on a codfw master and I will ask you (and others) to test it from mwdebug/canary servers
[09:27:35] codfw master means that you will connect to a separate port on a codfw master that will proxy to the real master
[09:28:01] not sure if you saw https://phabricator.wikimedia.org/T196378#4254120
[09:28:57] mediawiki will not know about TLS for now (it will be on the first iteration as if it is connecting to a non-tls db)
[09:30:17] I'd imagine gtid waiting could use WaitConditionLoop on pt_heartbeat (or some other such table) instead. The polling interval could increase if the lag is higher and decrease when it's low ("lag" as in now - heartbeat)...and cache could also be used to short-circuit. Something like that would be vastly simpler and would work with mixed maria/mysql. I should probably at least add a mode for that.
[09:30:23] AaronSchulz: so I don't yet have actionables, just wanted to give you a heads up of the lot of requests for support we will be generating soon (no matter what the actual implementation is)
[09:30:34] AaronSchulz: that was exactly on my mind
[09:30:44] gtid comparisons have gone crazy
[09:31:08] and even without mediawiki, on purse server side, we are thinking of abandon it
[09:31:12] *server
[09:31:26] because of the mess when masters change
[09:31:28] "real master" as in eqiad or the local "standby master"?
[09:31:34] yes, sorry
[09:31:39] I mean active (rw)
[09:31:53] the other being standby or passive (ro)
[09:32:25] in the future, with more work, we could even make full write-write dc if pooling is effective
[09:32:26] right, for latency testing
[09:32:42] but that would require way more changes outside of databases
[09:32:48] we are not yet even thinking about it
[09:33:08] but yes, one proxy per master
[09:33:22] so SPOF is still SPOF, but we don't add more :-)
[09:33:54] and configuration should be trivial - master is the local master, either the proxy or the real mysql
[09:34:05] based on the primary dc configuration
[09:34:29] AaronSchulz: I love that we are in sync, based on your heartbeat comment
[09:35:06] obviously, we would need to make heartbeats more frequent: every 0.1 seconds or so
[09:36:07] I will soon create a ticket with ongoing problems/pain points and share it with you
[09:36:15] so we can both think of solutions
[09:36:44] shouldn't you be sleeping, BTW?
[09:36:50] :-D
[09:37:16] my initial apprehension was general dislike of polling...but I think most POST + GET (from redirect) cycles probably involve the replica already being caught up by then and thus no waiting. A few tight polling queries would usually be all that is needed. With cache to help and dynamic waiting, I think it can be made quite reasonable, even without proper blocking I/O.
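A sketch of the heartbeat-based wait described above: poll a pt-heartbeat style table on the replica, treat "now - latest heartbeat" as the lag, and back off when the lag is large. This is not MediaWiki's actual WaitConditionLoop; the table/column names, timestamp format, host and credentials are assumptions.

```python
#!/usr/bin/env python3
"""Wait for a replica to catch up using a heartbeat table (sketch only)."""
import time
from datetime import datetime

import pymysql


def replica_lag(conn):
    """Seconds between now (UTC) and the newest heartbeat row on the replica."""
    with conn.cursor() as cur:
        cur.execute("SELECT MAX(ts) FROM heartbeat.heartbeat")
        (ts,) = cur.fetchone()
    if isinstance(ts, str):
        # assumed: a microsecond-precision ISO-style string, as pt-heartbeat writes
        ts = datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")
    return (datetime.utcnow() - ts).total_seconds()


def wait_for_replica(conn, max_lag=1.0, timeout=10.0):
    """Block until lag <= max_lag or the timeout expires; return success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        lag = replica_lag(conn)
        if lag <= max_lag:
            return True
        # adaptive interval: poll tightly when close, back off when far behind
        time.sleep(min(max(lag / 10.0, 0.05), 1.0))
    return False


if __name__ == "__main__":
    replica = pymysql.connect(host="db-replica.example", user="wait_user",
                              password="secret")
    print("caught up:", wait_for_replica(replica, max_lag=1.0))
```

With heartbeats written every 0.1 seconds, as suggested above, the measured lag stays fine-grained enough for this kind of polling to be meaningful.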
[09:37:41] yeah, no solution is perfect
[09:37:58] but we can iterate on either fixing the current one
[09:37:59] meh, I tend to get up late, or nap midday, or stay up all day sometimes
[09:38:10] or thinking of alternatives
[09:38:10] * AaronSchulz does not recommend that though
[09:39:00] the whole gtid_status:X-Y-Z,U-V-X,Q-R-S is really bad
[09:39:05] and will only get worse
[09:39:16] we are also researching being able to clean up those
[09:39:40] but in the past it was painful- it required stopping all servers at the same time
[09:40:07] we want to test MySQL 8.0, which has some features that would be advantageous
[09:40:20] but we cannot pool it because gtid is configured per section
[09:40:45] (we are not planning a migration, but we don't want to be tied to a single vendor)
[09:41:18] the mysql<->mariadb divergence will only cause headaches
[09:42:03] have some rest, AaronSchulz- I will add you to a ticket as promised with a list of pain points so you can help us solve it
[09:42:13] thank you very much for your always helpful hand!
[09:42:25] I also suspect that gtid_current_pos had some bug that caused me to revert back to using gtid_slave_pos
[09:42:40] I wouldn't be surprised
[09:43:06] pain here for pure sysadmin tasks too
[09:43:21] it only failed sometimes (a minority but a huge # of events) and not on servers with a bunch of replication channels or local writes or fancy stuff
[09:43:31] so I can't see it being some subtle semantic thing afaik
[09:43:48] I know a bug is filed upstream that it fails to update sometimes
[09:43:56] not really a confidence builder
[09:44:05] we are writing our own automatic topology changes script based on binlogs + heartbeats
[09:44:16] because it doesn't try to be smart and gives us control
[09:44:39] and doesn't break if we accidentally write to the replicas
[09:45:18] gtid both for mariadb and mysql was such a great idea
[09:45:38] but it is turning out badly for many people in practice (not only us)
[09:46:04] I shared experiences with other people with large installations, and they share our pain
[10:50:17] marostegui: it started working https://phabricator.wikimedia.org/P7254
[10:50:57] yeah, saw it on gerrit
[10:51:03] it looks very promising!
[10:51:09] a few hours too late, I know
[10:51:19] it is never too late, we will have more failovers!
[10:51:38] I can add the replica switch, too, but that needs more work
[10:51:48] the topology changes, I mean
[10:52:00] yeah, we can leave that for a second phase, we have a working script already
[10:52:04] so no need to refactor that!
[10:52:11] that needs a source of truth
[10:53:14] or maybe I can auto-discover replicas?
[10:53:38] yeah, that'd be nice, but that has problems
[10:53:50] and it is that we can miss replicas if for whatever reason they are down
[10:53:54] yeah, we may only change a subset of them
[10:53:59] the source of truth will tell you: that is a replica, but it is down
[10:54:17] well, I have checks for stupid things already, that is easy to do
[10:54:23] ah :)
[10:54:33] for example, the #1 thing is trying to failover a host to itself
[10:54:39] and it fails
[10:55:17] yeah
[10:55:24] we can add a bunch of those
[10:55:35] once we have the source of truth or hardcoding things
[10:55:41] like: oh, you are a sanitarium master -> fail
[10:56:04] it also fails if the master has a replication channel already
[10:56:10] or the master is not a direct replica
[10:56:20] yeah those are good
[10:56:31] or the read only values are strange
[11:00:57] * _joe_ lunch
[11:01:34] _joe_: wrong channel?
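Pre-flight checks in the same spirit as the ones mentioned above might look roughly like the following. This is not the actual script behind P7254; the structure, connection handling and exact checks are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Sanity checks before a master switchover (illustrative sketch only)."""
import pymysql


class PreflightError(Exception):
    """Raised when a check fails and the switchover must be aborted."""


def fetch_one(conn, query):
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute(query)
        return cur.fetchone()


def preflight(master, candidate, master_host, candidate_host):
    # 1. refuse to fail over a host to itself
    if master_host == candidate_host:
        raise PreflightError("master and candidate are the same host")

    # 2. the candidate must have a replication channel at all...
    status = fetch_one(candidate, "SHOW SLAVE STATUS")
    if status is None:
        raise PreflightError("candidate has no replication channel")

    # 3. ...and it must replicate directly from the current master
    if status["Master_Host"] != master_host:
        raise PreflightError("candidate is not a direct replica of the master")

    # 4. read_only values should look sane: master writable, candidate read-only
    if fetch_one(master, "SELECT @@read_only AS ro")["ro"] != 0:
        raise PreflightError("current master is unexpectedly read_only")
    if fetch_one(candidate, "SELECT @@read_only AS ro")["ro"] != 1:
        raise PreflightError("candidate is unexpectedly writable")
```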
[11:01:49] (not complaining if you invite us)
[11:23:29] <_joe_> jynus: ahah yeah
[13:49:56] 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4279223 (10Marostegui)
[13:50:10] 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4277886 (10Marostegui) main tables have been checked without any differences.
[13:50:40] 10DBA, 10Patch-For-Review: Decommission db1054 - https://phabricator.wikimedia.org/T197063#4279225 (10Marostegui)
[15:26:30] 10DBA, 10MediaWiki-Configuration, 10Operations: Create tool to handle the state of database configuration in MediaWiki in etcd - https://phabricator.wikimedia.org/T197126#4279638 (10Joe)
[15:26:43] <_joe_> marostegui, jynus, volans ^^
[15:27:14] <_joe_> can you look at that ticket and give me some feedback before the end of the week? I plan on working on it next week while you enjoy yourselves in Prague
[15:27:35] may I suggest a vision change?
[15:27:46] nothing too radical
[15:28:43] like mw developers in the past, I don't need high level interfaces- "warmup" is too high level
[15:28:52] (they offered that functionality)
[15:29:14] <_joe_> that's what we talked about when we had our meeting :)
[15:29:17] we don't need that, we want low level- we can later do warmup scripts
[15:29:35] yes, but you asked what are the typical things we do
[15:29:46] <_joe_> it's actually different from doing anything lower-level, which you can still do of course
[15:29:47] warmup is something we do often, but not something we need
[15:29:51] <_joe_> this is the high-level interface
[15:30:10] <_joe_> if you want to script it, you should use the python code I'll write as a library, most probably
[15:31:09] we need to setup individual values, the verb is "set" not "warmup", which would be otherwise too complex or too limited
[15:31:52] I can have a look later or tomorrow
[15:32:51] <_joe_> no, what we agreed on in april was that "warmup" would be used just to ramp-up to the basic weights you already decided and that you can set with "edit"
[15:33:06] <_joe_> so if you want to change an individual weight you use that
[15:33:13] ok
[15:33:18] then maybe the problem
[15:33:20] <_joe_> but ok, I will leave the functionality and leave that for later
[15:33:24] is a misunderstanding
[15:33:28] on what warmup means
[15:33:31] on my side
[15:33:41] maybe we agree, but warmup is a bad name
[15:33:44] :-)
[15:34:19] if warmup == 'change weight to X' we agree
[15:34:49] <_joe_> it was more like "send this server X% of the traffic it would get normally"
[15:34:54] I think we agree then
[15:34:59] just warmup is misleading
[15:35:02] for me
[15:35:18] because we would use warmup to reduce the load too
[15:35:39] e.g. warmup 0.5 (reduce traffic by 50%)
[15:35:58] so we just need to change warmup for another verb and that will make us happy :-)
[15:36:12] literally a string change on the proposal :-)
[15:36:50] setweight or something
[15:37:17] I don't have a good alternative right now, but now I understand what you mean
[15:38:14] I thought at firt that was like a multiplier
[15:38:17] *first
[15:41:18] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4279770 (10Marostegui)
[15:42:03] <_joe_> well, it is a multiplier of the weight the server should have in a normal situation;
[15:43:18] yes, as I said, it was the name that was misleading to me
[15:43:38] 10DBA: Failover db1052 (s1) db primary master - https://phabricator.wikimedia.org/T197069#4279792 (10Marostegui)
[15:43:42] now that I understood it better it is ok
[15:44:05] <_joe_> cool :)
[15:44:16] I would have some questions about expectations
[15:45:49] on the details
[15:45:55] will write on the ticket
[15:47:09] thanks for writing that, _joe_
[16:14:38] 10DBA, 10MediaWiki-User-management, 10Anti-Harassment (AHT Sprint 23): Draft a proposal for granular blocks table schema(s), submit for DBA review - https://phabricator.wikimedia.org/T193449#4169871 (10dbarratt)
[16:15:23] 10DBA, 10MediaWiki-User-management, 10Anti-Harassment (AHT Sprint 23): Draft a proposal for granular blocks table schema(s), submit for DBA review - https://phabricator.wikimedia.org/T193449#4169871 (10dbarratt)
[16:45:02] 10DBA, 10Patch-For-Review: Decommission db1053 - https://phabricator.wikimedia.org/T194634#4280161 (10jcrespo)
[16:45:43] 10DBA, 10decommission: Decommission db1053 - https://phabricator.wikimedia.org/T194634#4204024 (10jcrespo) a:03RobH
[16:47:01] 10DBA, 10decommission: Decommission db1059 - https://phabricator.wikimedia.org/T196606#4280170 (10jcrespo) a:05jcrespo>03RobH
[16:48:22] 10DBA, 10decommission: Decommission db1051 - https://phabricator.wikimedia.org/T195484#4280180 (10jcrespo) a:05jcrespo>03RobH ready for decomm.
[16:51:17] 10DBA, 10Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#4280190 (10jcrespo) db1054 is pending while we confirm everything is working as expected on s2- db1052 process has not started yet. All others are ready for robh/dcops to continue as noted on the subtasks.
[18:02:42] 10DBA, 10MediaWiki-Configuration, 10Operations: Create tool to handle the state of database configuration in MediaWiki in etcd - https://phabricator.wikimedia.org/T197126#4280521 (10Volans) Quick first feedback/questions on the proposal: > dbconfig get NAME gets you all the current configuration of a mysql...
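To illustrate the semantics settled on above - the ramp-up factor is a multiplier of the weight each server would normally have, not an absolute weight - here is a tiny worked example. This is not the dbconfig tool proposed in T197126; the section data, hostnames and function names are made up.

```python
#!/usr/bin/env python3
"""Weight-multiplier semantics for ramping a server up or down (illustration)."""

# normal (target) weights for a section, as they would live in the config store
normal_weights = {"db1066": 500, "db1074": 300, "db1076": 300}


def ramped_weights(weights, host, factor):
    """Return the section weights with `host` scaled by `factor` (0.0-1.0)."""
    scaled = dict(weights)
    scaled[host] = int(weights[host] * factor)
    return scaled


# "send db1066 50% of the traffic it would get normally"
print(ramped_weights(normal_weights, "db1066", 0.5))
# ramping back to full weight is simply factor 1.0
print(ramped_weights(normal_weights, "db1066", 1.0))
```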