[01:42:17] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[01:44:35] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[03:12:31] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[03:17:19] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[05:09:56] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[05:13:52] DBA, decommission-hardware, Patch-For-Review: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[05:19:07] DBA, SRE, Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Marostegui) Sounds good, ping me whenever you want to get the databases deleted. Is it needed to take a final backup from these testing databases? I am off Thursday and Friday, but @Kormat...
[05:31:28] marostegui: morning, can I bother you with some stuff?
[05:31:36] what's up!
[05:31:43] Three things!
[05:32:07] First: arwiki's watchlist is now half, it's a couple of GBs so not sure shrinking would be useful but don't be alarmed
[05:32:17] if you alter it and it goes really tiny
[05:32:41] ah cool, I don't think I will do it, 2-3GB isn't that much for the pain of having to depool all the hosts and such
[05:32:49] yeah
[05:32:57] The second: I just wrote a dashboard for tracking drifts reported by the tool https://drift-tracker.toolforge.org/
[05:33:09] oh nice!!!!!!
[05:33:15] I haven't set all tracking tickets yet but I'm on it
[05:33:18] this is very cool
[05:33:36] this checks one host per section?
[05:33:39] or all of them?
[05:33:45] all of them
[05:33:48] sweet!
[05:33:54] click on the button to see the detailed report
[05:34:16] yeah, I am seeing it, but I thought the original drifts only reported one host per section, or was it one wiki only?
[05:34:23] but one random wiki (otherwise, s3 will explode it)
[05:34:34] ah that was it
[05:35:10] slowly I make this more automated and add more than core so you can choose categories, etc.
[05:35:20] it is good that some of them only have one section with the drifts
[05:35:38] yeah it's mostly s10 and due to the charset thingy
[05:36:07] There are things I don't get though
[05:36:21] Ie: https://phabricator.wikimedia.org/T277116 this reports many wikis but on the dashboard says only s10
[05:37:39] ah, that was my mistake in setting the tracking
[05:37:52] there are two tracked under T277116
[05:37:53] T277116: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production - https://phabricator.wikimedia.org/T277116
[05:38:04] one is wrong. I need to fix that
[05:38:13] sure, no rush, i was just confused :)
[05:39:08] I hope I can get most of it at least tracked so we can handle them during the switchover
[05:39:21] yeah, I have a list of things I want to do during the switchover
[05:39:32] and will probably include also the 3 you created yesterday
[05:39:41] and revision table is being abstracted, list of its drifts will be fun
[05:39:54] indeed
[05:40:06] (the only table left)
[05:40:21] The third thing: the commons and image table issue
[05:40:39] I wrote a patch that's a rather straightforward and simple fix
[05:40:47] I even wrote a maint script to clean the mess
[05:41:25] which of the issues it has? :)
[05:41:33] pdf/djvu
[05:41:43] ah right!
[05:41:59] waiting for it to be merged but once it's done, I'll probably start cleaning image table in commons (it moves the img_metadata field to ES and compresses it)
[05:42:09] that's going to be fun...
[05:42:12] it's more than 90% of the table
[05:42:13] with such a huge table
[05:42:41] the good thing is that it has around 90M rows or so but the pdf/djvus are only 3M in total
[05:43:33] I can simply run it on these rows
[05:43:53] That is very good news, image table really needed love
[05:44:13] can you check the size on disk? I feel it'll give you a lot of space in s4
[05:44:26] (next links table once I'm done with this monster)
[05:44:34] yep, let me see
[05:45:02] it is 361GB....compressed
[05:45:06] probably it'll improve innodb buffer pool efficiency a lot
[05:45:16] definitely
[05:45:30] 361GB is crazy
[05:45:36] pretty much all the buffer pool
[05:45:39] the table or s4 in total?
[05:45:43] the table
[05:45:51] oh boy
[05:46:07] it'll be around 30GB once this is done (hopefully)
[05:46:18] wtfffff
[05:46:20] really?
[05:46:30] I told you, 90% of it is just pdf/djvu BS
[05:46:44] which will be moved to ES
[05:46:47] (and compressed)
[05:46:50] it will be nice to optimize it on eqiad at least, while codfw is active
[05:47:11] Yeah, have fun :D
[05:47:24] let me see if I can convince Daniel to merge the patch :D
[05:47:32] hahaha
[05:48:01] have fun XD
[05:48:14] actually the solution was suggested by Tim Starling so I hope it'll be easy
[05:57:57] DBA, Datasets-General-or-Unknown, Patch-For-Review, Sustainability (Incident Followup), WorkType-NewFunctionality: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (Ladsgroup) I just...
[06:16:35] DBA, DiscussionTools, Editing-team, Performance-Team, and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Marostegui) Still running: `May 10 04:23:37 mwmaint1002 mediawiki_job_parser_cache_purging[56878]: ..........................
[07:15:43] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (Marostegui) @Kormat was this ever released? I don't recall if I had to manually update the master or not on the last s1 switchover. Maybe this can be released and tested during the upcoming s6 switch?
[07:18:54] DBA, SRE, Datacenter-Switchover: When switching DCs, update pc hosts in tendril - https://phabricator.wikimedia.org/T266723 (Marostegui) This is probably not worth the effort if we are expecting to drop tendril "soon". We can update these manually for the next switch (and switch back) and hopefully f...
[08:49:37] DBA, SRE, Datacenter-Switchover: When switching DCs, update pc hosts in tendril - https://phabricator.wikimedia.org/T266723 (Kormat) I discussed this with @RLazarus back in October, and we agreed it's not worth the effort given the impending any-day-now™ tendril decomm. (I forgot to update the task w...
[08:54:25] DBA, SRE, Datacenter-Switchover: When switching DCs, update pc hosts in tendril - https://phabricator.wikimedia.org/T266723 (Marostegui) Open→Declined
[09:46:04] DBA, decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (Marostegui) Host depooled
[09:50:23] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (Kormat) No, not yet. Various roadblocks have held up the next release of wmfmariadbpy, though hopefully they're mostly resolved by now. I'll be working on it this week.
[11:52:07] DBA: Investigate changing innodb_fast_shutdown from 1 to 0 - https://phabricator.wikimedia.org/T282443 (Marostegui)
[11:52:24] DBA: Investigate changing innodb_fast_shutdown from 1 to 0 - https://phabricator.wikimedia.org/T282443 (Marostegui) p:Triage→Medium
[11:52:33] ^ thoughts?
[12:07:27] marostegui: seems like something worth testing
[12:08:09] \o/
[13:20:46] marostegui, have you tried?
[13:20:50] *it
[13:21:02] because last time I tried, I had to kill mysql after 12 hours
[13:42:04] not in this environment, that's what I mentioned on the ticket, that we need to evaluate it, as it might be a no-go for mw hosts
[17:13:17] DBA, DiscussionTools, Editing-team, Performance-Team, and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (DLynch) If the purge script is getting unreliable, should we look into that being related to our space issues?
[17:46:24] Data-Persistence-Backup, Patch-For-Review: Setup backup1003 and backup2003 as the storage location for es bacula backups - https://phabricator.wikimedia.org/T282249 (jcrespo) The software reconfiguration for the migration went surprisingly well. The only issue is that because no new backups are being tak...
[18:14:37] DBA, DiscussionTools, Editing-team, Performance-Team, and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Krinkle) It takes 4-5 days to run instead of <24h, and this has been the case for a few months. Overall though, it should...
[18:15:50] DBA, Patch-For-Review: Switchover s6 from db1131 to db1173 - https://phabricator.wikimedia.org/T282124 (LSobanski) p:Triage→Medium
[18:17:08] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (LSobanski) p:Triage→Medium a:Marostegui Assigning to Manuel to confirm if this can go into Ready.
[18:17:32] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (LSobanski) p:Triage→Medium a:Marostegui Assigning to Manuel to confirm if this can go into Ready.
[18:17:59] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (LSobanski) p:Triage→Medium a:Marostegui Assigning to Manuel to confirm if this can go into Ready.
[18:24:49] DBA, SRE, Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (LSobanski) Actually, @Kormat is out as well so it'll have to either happen by Wednesday or next week.
[19:00:53] DBA, SRE, Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Legoktm) We don't need a backup, it's all just emails of people saying "Test" etc. :P, I'll delete the VM tomorrow (Pacific Time) so it should be ready for DBA deletion on Wednesday :)