[06:16:08] 10DBA, 10Product-Infrastructure-Team-Backlog: DBA review for the PushNotifications extension - https://phabricator.wikimedia.org/T246716 (10Marostegui) p:05Triage→03Medium The intent is to also release it in Q4? Will this involve schema changes? If so, which tables are likely to be involved - th...
[06:18:07] 10DBA: Move db1111 from test-s4 to s8 - https://phabricator.wikimedia.org/T246447 (10Marostegui) 05Open→03Resolved db1111 is now serving in s8 with the same weight as db1126 (400)
[06:52:19] 10DBA, 10Patch-For-Review: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1096.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202003030652_...
[07:15:23] 10DBA: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1096.eqiad.wmnet'] ` and were **ALL** successful.
[07:41:07] Morning marostegui :)
[07:41:14] hey
[07:41:22] Happy if I continue upping the limit?
[07:42:00] addshore: We are expecting to do a change on external store today at 09:00 AM UTC, so if you increase it, make sure to stop at least 30 minutes before our change if possible
[07:42:11] They should not collide with each other, but let's be careful
[07:42:18] Ack!
[07:43:00] Let me get cache warming going now and see how far that gets :)
[07:43:00] If it's ready more than 30 mins before, I'll do it before; if not, I'll do it after!
[07:44:33] cool
[07:45:50] cache warming start :)
[07:45:53] *started
[07:56:14] the cache warming goes at the same speed on both hosts now that they have the same load :)
[08:00:44] yeah, they are the same hw as well
[08:01:20] <3
[08:01:52] less than 60k ops on the old store now, and just over 100k ops on the new one :)
[08:03:32] https://usercontent.irccloud-cdn.com/file/TSZry8Ad/image.png
[08:03:35] migration so far ^^
[08:14:09] how long do you reckon it will still take?
[08:17:09] I think you'll have to do your external store thing before I do anything :)
[08:17:24] oh, or do you mean the migration in its entirety?
[08:17:26] I mean in total :)
[08:17:38] so the last pass of the migration script is at 58.8 million
[08:17:53] it progresses at maybe 1 million a day
[08:18:19] It could easily go faster, but right now it has this silly 1 second sleep in between batches to make sure we don't lag codfw
[08:18:49] If we wanted to, I think we could get reads all the way up to 50 million today :P
[08:20:03] although the cache warming to get to 50 million might take 15-17 hours alone :P
[08:20:18] 'tis not a high speed sport :D
[08:20:33] but it should get faster as items have fewer terms
[08:20:50] Once the train rolls out this week, we can stop writing to some of the old store we think :)
[08:21:01] ie once it is on wikidata, so wednesday
[08:28:49] Yup, I won't do a config change before your external store stuff :) I'll just keep the caches warm
[08:28:56] Thanks
[08:28:57] :)
[09:57:28] 10DBA, 10Core Platform Team Workboards (Clinic Duty Team), 10Goal, 10Patch-For-Review: Enable es4 and es5 as writable new external store sections - https://phabricator.wikimedia.org/T246072 (10Marostegui) es4 has been deployed successfully, it is now being written. We are going to give it 24h and tomorrow...
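For reference, the "silly 1 second sleep" discussed at 08:18 is the batch throttle of the new term store migration. A minimal sketch of what that throttling amounts to, assuming standard replication and a mysql client session; the real migration is a MediaWiki maintenance script and uses its own lag checks, so the statements below are illustrative only:

```sql
-- Before the next batch, check that the codfw replica is keeping up
-- (Seconds_Behind_Master should stay near 0):
SHOW SLAVE STATUS\G

-- The fixed pause applied between each batch of 100 items:
SELECT SLEEP(1);
```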
[10:08:23] going from 12 -> 15 million now
[10:08:45] ok
[10:12:37] 15 million done and saw barely any increase
[10:12:51] I'm going to warm the cache for 15->20 million now
[10:28:07] I don't know the name of the new servers by heart yet :-(
[10:29:09] haha
[10:29:12] they'll come
[10:29:15] in order to update spicerack for the switchdc, what will be the 'es' cluster RW? just 4 and 5?
[10:29:23] *clusters
[10:29:25] es4 and es5 yep
[10:29:25] volans: we are transitioning
[10:29:29] I know
[10:29:32] ok
[10:29:32] the end game
[10:29:40] than what manuel said
[10:29:40] to prepare the patch
[10:29:44] ok
[10:29:53] es1,2,3 will be ro
[10:29:55] thanks <3
[10:39:57] {sent} :)
[12:35:55] going to go to 20 mill in the next few mins
[12:50:04] now at 20 mill, and warming the next 5 mill
[13:38:58] k
[13:39:17] db1126 is having spikes again
[13:39:27] let's not increase more today after that last one
[14:13:30] db1111 has around the same weight
[14:13:42] but cpu scaling is much worse
[14:13:43] yes, they have the same weight
[14:14:05] I wonder if there are also some other db1126 specifics that make it have more issues
[14:14:25] db1111 is buster+10.4
[14:14:30] that's the only difference
[14:14:40] and db1126 is stretch+10.1
[14:14:41] ah, so maybe improved compression
[14:15:00] let me check things like hw misconfiguration and/or cpu governor
[14:16:17] they have the same governor
[14:16:23] (I checked when I first deployed it)
[14:18:48] disk size is larger, but that should not matter
[14:19:24] maybe 10.4 just performs better?
[14:19:39] Is VD emulated : Yes/No
[14:22:42] The disk throughput and IOPS are almost the same on both hosts
[14:24:01] the graphs are pretty much equal for both hosts, but it seems that db1126 decides to use more CPU
[14:24:21] yeah, don't see any obvious reason, let me check table status
[14:25:34] both have all tables compressed
[14:25:44] yep
[14:26:31] I will check bugs or reports of high cpu usage between 10.1 and 10.4
[14:28:08] I am checking another 10.4 host compared to 10.1 (for enwiki)
[14:28:13] And it does seem to use less CPU
[14:28:33] So maybe 10.4 has improved on that front
[14:28:59] "Extremely high cpu usage and slow performance after upgrade from 5.5.52 to 10.1.29"
[14:29:12] ha!
[14:29:18] Resolution: Cannot Reproduce
[14:29:41] I am going to repool db1096:3315 and db1096:3316 (even though we still don't have graphs)
[14:29:59] Should be fine anyways
[14:30:14] Those will also give us a direct cpu comparison once we have the exporter fixed
[14:30:22] So we can compare other hosts with their exact equivalent on 10.1
[14:31:44] nothing really apparent
[14:34:04] ok with repooling that host even though we are blind on a graph level?
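For the switchdc question above (10:29): the end state is es4 and es5 read-write, es1-es3 read-only. A minimal sketch of how that state can be confirmed on any given es host once the transition is done; this is just the standard global flag, not the actual spicerack/switchdc check:

```sql
-- Expected to return 1 (ON) on es1-es3 and 0 (OFF) on es4/es5 after the change;
-- the section-to-host mapping is assumed here for illustration.
SELECT @@global.read_only;
```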
[14:34:08] shouldn't be a big deal
[14:34:16] it is s5 and s6, which don't have load issues
[14:36:47] up to you, I was going to look into the prometheus issue in a minute
[14:37:04] yeah, no rush about it, I think we are fine without having that for a while
[14:37:16] I just don't want to leave them depooled for long
[14:41:52] This is the worst wikidata query at db1126: https://phabricator.wikimedia.org/P10589
[14:41:53] * addshore reads up
[14:43:42] while it is not slow on average, it reads 1976 rows for each 6 it returns
[14:44:27] executed 138697646 times on that server in a week
[14:44:55] the high CPU usage lines up with me warming the cache on db1126
[14:45:16] or 200 times per second
[14:45:18] interesting that db1111 seems slightly less loaded than db1126
[14:47:58] "while it is not slow on average, it reads 1976 rows for each 6 it returns" << yup, this is why I raised the idea of uncompressing the wbt_item_terms table, as I figure that is the biggest CPU hit
[14:48:45] ?
[14:49:00] row counters there are independent of compression
[14:49:10] they are just handler api calls
[14:49:49] I figured that more rows being returned and filtered = more cpu usage for decompression? (that was a guess though from me)
[14:50:24] addshore: if you give me some values for that query, I can try to check the query plan on 10.4 and 10.1 too
[14:50:31] can do
[14:50:45] I think you can take over while I check prometheus
[14:50:50] definitely :)
[14:50:58] wanted to share what I got so far
[14:51:26] also if there is a change on handler calls
[14:51:33] or some other optimization
[14:51:48] SELECT /* Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::selectTermsViaJoin */ wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id FROM `wbt_term_in_lang`, `wbt_text_in_lang`, `wbt_text`, `wbt_item_terms` WHERE (wbtl_text_in_lang_id=wbxl_id) AND (wbxl_text_id=wbx_id) AND wbit_item_id IN (148, 159, 334, 423, 668, 836, 869, 881, 928, 1867, 4970, 5826, 8646, 8686, 11725, 11736, 14773, 15174, 15175, 15176) AND
[14:51:48] (`wbit_term_in_lang_id` = wbtl_id) AND wbtl_type_id = 2 AND wbxl_language = 'zh' /* 48e190962e41cab73577ba2c90524037 db1126 wikidatawiki 2s */
[14:51:54] bah, was meant to pastebin ...
[14:52:03] addshore: cool, let me check that
[14:52:20] https://phabricator.wikimedia.org/P10590 paste
[14:53:21] nah, the query plan is exactly the same
[14:53:49] so that query in words is.... look up the zh descriptions for 20 items
[14:54:40] the query is slightly faster on 10.4 indeed (0.06 vs 0.3)
[14:54:47] the fact it has to look up 6521 rows to do that is indeed a bit sucky :/
[14:55:21] let me examine what the optimizer is actually doing
[14:55:25] with 10.4 we now can do that!
[14:55:30] :D
[14:55:50] prometheus-mysqld-exporter: error: unknown short flag '-w', try --help
[14:56:22] that's coming from the -web-listen I believe
[14:56:34] I tried --web-listen and it worked
[14:56:42] I believe, I don't remember well
[14:56:57] I am guessing -web.listen-address ":13315"
[14:56:59] yeah
[14:57:11] so maybe the format changed again since last time
[14:57:31] or I only tested it on non-multiinstance
[14:57:37] that is an easy fix
[14:58:30] addshore: Looks like the chosen index is actually the best one to choose, based on the optimizer cost: https://phabricator.wikimedia.org/P10593
[14:58:56] the other plans would end up scanning more rows from what I can see
[14:59:08] would smaller batches change that at all?
[14:59:21] we can try that, do you want to draft a query for me?
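The "with 10.4 we now can do that" remark refers to the optimizer trace, which MariaDB added in 10.4. A sketch of how the plan choice for the selectTermsViaJoin query can be inspected this way; the trace in P10593 was presumably produced with something along these lines:

```sql
-- Enable the per-session optimizer trace (MariaDB 10.4+), run the query under
-- investigation, then read back the recorded trace.
SET optimizer_trace = 'enabled=on';

SELECT wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id
FROM wbt_term_in_lang, wbt_text_in_lang, wbt_text, wbt_item_terms
WHERE wbtl_text_in_lang_id = wbxl_id
  AND wbxl_text_id = wbx_id
  AND wbit_item_id IN (148, 159, 334, 423, 668, 836, 869, 881, 928, 1867,
                       4970, 5826, 8646, 8686, 11725, 11736, 14773, 15174, 15175, 15176)
  AND wbit_term_in_lang_id = wbtl_id
  AND wbtl_type_id = 2
  AND wbxl_language = 'zh';

-- The trace lists the cost the optimizer assigned to each candidate index,
-- which is how the chosen plan was confirmed to be the cheapest.
SELECT * FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```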
[14:59:25] wbit_item_id IN (148, 159, 334, 423, 668, 836, 869, 881, 928, 1867)
[14:59:30] checking
[14:59:33] ^^ replace the other IN with that :)
[15:00:03] right now they are at a max batch size of 20 items
[15:00:31] it scans 4.2k
[15:01:30] is there any user impact if you reduce the batch size?
[15:02:36] let me check other indexes
[15:03:02] we can definitely try decreasing it, it'll lead to a slight increase in execution time in the cases where batches are needed, I guess, due to sql latency
[15:03:52] nah, the other possible index wbt_item_terms_term_in_lang_id_item_id is a lot worse (as expected)
[15:03:58] and as the optimizer trace showed
[15:04:04] but I think essentially the db servers will end up doing the same amount of work, just in smaller batches, as 2303 + 4218 (which were my 2 10-item batches) = the number of rows grabbed by the full batch of 20
[15:04:17] would a different index make sense?
[15:04:33] No, but I wanted to check if the optimizer was right in its trace
[15:05:02] I reported a bug to 10.4 the other day, where the optimizer was being silly and choosing the wrong plan: https://jira.mariadb.org/browse/MDEV-21813
[15:05:11] but in this case it is doing the same thing on 10.1 and 10.4
[15:05:56] I think I am going to decrease a bit the weight for db1126
[15:06:05] and leave it slightly under db1111 for now
[15:06:15] not much, but just a bit
[15:08:13] So far 10.4 is working well, I have only found two issues, that one and another one which is about to get fixed I believe: https://jira.mariadb.org/browse/MDEV-21794
[15:09:08] It's probably a bit of an intense day in terms of db load: cache warming 2 db servers, migration running at "full tilt" :P
[15:09:16] and all writes still ongoing
[15:09:34] I have decreased its weight a bit and it is now getting some breathing room
[15:09:42] I would hope we can see some load decrease once we start turning some of the old term store writes off from wednesday
[15:09:44] let's not change config until tomorrow?
[15:09:49] marostegui: ack! :)
[15:10:04] addshore: are you doing that tomorrow?
[15:10:20] Yes, once the train rolls tomorrow we can start turning some of the writes off
[15:10:24] Cool
[15:10:36] We will start with properties, and if that goes well we can start turning off batches of items too
[15:10:37] We are going to set es2 in read only tomorrow at 9AM UTC
[15:10:44] but hopefully it will be done in 1h or less
[15:10:47] if things go well
[15:10:47] ack!
[15:11:10] so can you repeat again what that will imply?
[15:11:13] stop writing to both?
[15:11:19] for a certain number of items?
[15:11:26] I'll leave the cache warming that I have running on the 2 dbs now, it should end in the next hour, and I will redo it tomorrow
[15:11:43] so step 1 would be to stop writing to wb_terms for property changes
[15:12:14] step 2 will be going through turning off writes to wb_terms for batches of items, probably first 100k, then 500k, then 1 million, 2 million etc
[15:12:33] in theory we can go all the way up to where we are reading from now (20 million) but we want to allow some time to spot issues
[15:12:44] yeah, makes total sense
[15:12:58] so step 1 is done kind of ON/OFF?
[15:13:11] yup
[15:13:21] and that's to be done on Wed?
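The per-batch figures above (4.2k rows scanned for a 10-item batch, 2303 + 4218 for the two halves versus one batch of 20) can be measured from the handler counters, which, as noted earlier, are independent of compression. A sketch of one way to take that measurement in a mysql session; the shortened 10-item IN list from the draft above is used, and the numbers are illustrative only:

```sql
-- Reset the session status counters, run one batch, then check how many index
-- rows were actually read to produce the result.
FLUSH STATUS;

SELECT wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id
FROM wbt_term_in_lang, wbt_text_in_lang, wbt_text, wbt_item_terms
WHERE wbtl_text_in_lang_id = wbxl_id
  AND wbxl_text_id = wbx_id
  AND wbit_item_id IN (148, 159, 334, 423, 668, 836, 869, 881, 928, 1867)
  AND wbit_term_in_lang_id = wbtl_id
  AND wbtl_type_id = 2
  AND wbxl_language = 'zh';

-- Handler_read_key / Handler_read_next approximate the rows examined per batch.
SHOW SESSION STATUS LIKE 'Handler_read%';
```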
[15:13:23] in theory, at that point we could start removing rows from wb_terms
[15:13:28] but I don't know if that is desired
[15:13:39] or if we just don't touch it, as last time we tried to remove things from the table it exploded
[15:13:40] :P
[15:13:52] It won't make any difference disk-wise, as we'd need to optimize it anyway
[15:14:01] Yes, step 1 will happen wednesday after the train rolls to wikidatawiki
[15:14:02] It is probably better if we can wait until we can drop it entirely
[15:14:13] yup, I'm pro waiting until we can totally drop it
[15:14:30] how long do you think it will take to reach that point?
[15:14:48] early april
[15:14:53] <3
[15:15:13] I am curious to see how long it will take for the drop table to run
[15:15:16] It could probably go quite a lot faster if we didn't have this 1 second sleep in between batches of 100 :P
[15:15:51] haha
[15:15:59] 30000000/100*1/60/60/24 = 3.47 days of maybe unneeded sleeping
[15:15:59] xD
[15:16:02] Better be safe!
[15:16:04] *unneeded
[15:16:42] Yeah, but if you remove that sleep, I wonder what XX would be, where XX is the number of sleepless nights for the DBAs!
[15:17:03] :D
[15:18:19] I am very excited to know if that CPU usage reduction comes from 10.4 itself
[15:18:25] that'd be good news
[15:26:50] 10DBA: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 (10Marostegui)
[15:35:11] spike of Wikibase\Lib\Store\DivergingEntityIdException from line 154 of /srv/mediawiki/php-1.35.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/WikiPageEntityRevisionLookup.php: Revision 1128156192 belongs to P3090 instead of ?
[15:35:17] addshore: ^
[15:36:28] it is not ongoing but please have a look at -operations
[15:36:33] *looks*
[15:37:06] marostegui: I created a patch for multiinstance
[15:37:15] oh, I will take a look in a sec
[15:37:17] but it will need to puppet-compile so it is a noop
[15:37:36] don't worry, just as a heads up that I had a fix
[15:37:42] thanks :*
[15:41:40] jynus: add me as a reviewer when you feel comfortable with the patch, thanks
[15:42:04] ok, not yet
[15:42:13] sure, no rush
[15:58:07] cache warming ended on the 2 servers now btw (just FYI). I won't do any more config today
[16:00:32] addshore: nice, db11126 went back to more normal CPU values
[16:00:53] my hunch is the cache warming takes more of a toll on db1126
[16:01:01] db11126 <--- that would be a nightmare
[16:01:06] XDDDDD
[16:01:16] the cpu level this morning before I started the warming was pretty equal on both hosts
[16:01:24] jynus: but still better than db1021 that you mentioned earlier.... no SSDs there!
[16:01:41] several thousand mysql servers per dc at our level of technology?
[16:47:23] "no user or password specified under [client] in /var/lib/prometheus/.my.s5.cnf"
[16:47:47] of course there is no password - that is why we use socket_auth with an unprivileged account!
[16:50:48] so, aside from reporting the bug, the workaround will be to add a fake password!
[16:51:47] ah, no
[16:51:52] there is something else
[16:57:02] no, I was right, if I add a fake password, it works
[16:57:15] sending a bug to debian
[17:01:09] hahaha
[17:01:14] please send me the link!
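The "won't make any difference disk-wise" point at 15:13 is about InnoDB tablespaces: deleting rows only marks space as reusable inside the .ibd file, so the file does not shrink until the table is rebuilt, whereas dropping the table frees the space in one go (with innodb_file_per_table). A sketch of the trade-off, illustrative only and not a plan agreed on in the channel; the column name is assumed from the old wb_terms schema:

```sql
-- Removing rows does not shrink the tablespace file on disk:
DELETE FROM wb_terms WHERE term_entity_type = 'property' LIMIT 1000;

-- Reclaiming that space would need a full rebuild, itself an expensive copy
-- of a very large table:
OPTIMIZE TABLE wb_terms;

-- Which is why waiting until the whole table can simply be dropped is cheaper:
DROP TABLE wb_terms;
```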
[17:45:10] seeing issues like https://github.com/prometheus/mysqld_exporter/issues/376 open for a year, I won't have much faith in upstream
[17:45:31] bug is at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953040
[17:48:21] checking
[17:49:03] I've added a manual workaround on db1096 for now, confirming that the previous patch works
[17:49:10] and to avoid losing metrics until merge
[17:49:14] we can merge tomorrow
[17:49:42] (as metrics are flowing after the workaround, it just needs to be on puppet)
[17:50:21] I wonder if Debian will fix that or will wait for upstream
[17:50:47] last time, with the atop issue I reported, the maintainer refused to fix it
[17:50:58] in a not very nice communication style, I have to say
[17:50:59] actually, the upstream one was open to it
[17:51:10] it was the debian one that argued not to fix it
[17:51:18] because "I had too much memory"
[17:51:21] yeah, that's what I'm saying
[17:51:33] (even if in the end it wasn't related to memory scaling)
[17:51:38] that in this case it looks like the other way around, so I wonder how they'll treat it
[17:51:57] let's merge tomorrow yeah
[17:52:15] I don't know, I heard one of the 2 maintainers of that package hates dbas :-D
[17:52:25] hahaha
[17:52:39] he will probably mistreat us :-P
[17:52:42] but he lives close enough so I can drop by and have a chat!
[17:53:11] I love the fake password you've chosen xddddd
[17:53:36] I think that would be it for today
[17:53:46] yeah, let's call it a day
[17:53:51] thanks for the help!
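For context on the fake-password workaround: the monitoring account is passwordless on purpose because it authenticates through MariaDB's unix_socket plugin, so the exporter's "no user or password specified" check is spurious. A sketch of that kind of account, with the user name and grants assumed for illustration rather than taken from the production definition:

```sql
-- The account logs in over the local socket as the matching OS user, so no
-- password ever needs to be stored in /var/lib/prometheus/.my.*.cnf.
CREATE USER 'prometheus'@'localhost' IDENTIFIED VIA unix_socket;
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'prometheus'@'localhost';
```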