[05:41:00] 10DBA, 10Patch-For-Review: decommission dbproxy1011.eqiad.wmnet - https://phabricator.wikimedia.org/T249590 (10Marostegui)
[05:54:32] 10DBA, 10Operations, 10Patch-For-Review, 10Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (10Marostegui) Let's move pc1008 back to pc2 master. Also, as pc1010 is replicating still from pc1, its disk was around 88...
[06:00:34] 10DBA: Drop wikidatawiki.wb_items_per_site_old from s8 hosts - https://phabricator.wikimedia.org/T250345 (10Marostegui)
[06:00:52] 10DBA: Drop wikidatawiki.wb_items_per_site_old from s8 hosts - https://phabricator.wikimedia.org/T250345 (10Marostegui) p:05Triage→03Medium
[06:59:22] 10DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (10Marostegui)
[07:16:39] should we merge 589188 now?
[07:17:08] I wanted to wait for kormat to be around, so he can actually do the deploy himself
[07:17:23] ok
[07:17:29] you want to live dangerously i see :)
[07:17:30] We don't have many chances to deploy mediawiki lately thanks to dbctl!
[07:17:44] should we delete and optimize the tables?
[07:17:52] which ones?
[07:17:52] deleting binlogs only seems like a small thing
[07:18:04] there are likely entries not purged automatically
[07:18:12] you are talking about pc1010, right?
[07:18:19] pc1008
[07:18:39] pc1008 is having no space issues
[07:19:36] can I test on one table to verify no maintenance is needed?
[07:19:43] sure
[07:20:05] it should take no more than 15 minutes and I will feel calmer :-D
[07:20:33] pc1008 was entirely purged when reinstalled
[07:20:51] we could do a defragmentation if we wanted to
[07:20:59] But a purge isn't necessary as it has been replicating from pc1010
[07:21:01] sure, not worried about that
[07:21:03] but about data
[07:21:18] ah, so it was wiped?
[07:21:22] yes
[07:21:28] ah, I thought data was maintained
[07:21:33] no
[07:21:39] I purged it to start fresh
[07:21:50] in any case, let me check it on one table
[07:22:01] that's why I waited almost a month to get it back, so its data was almost there and the replacement wouldn't cause massive hit rate decreases
[07:24:05] before state: https://phabricator.wikimedia.org/P10996
[07:27:39] running optimize now on pc255, it should take around 15 minutes
[07:27:45] That is exactly what I suggested above
[07:32:23] it actually did it in only 3 minutes, probably due to the lack of load
[07:34:04] https://phabricator.wikimedia.org/P10996#63315
[07:34:27] not much gain, as expected
[07:34:40] yeah, but now we are sure :-D
[07:35:54] note the analytics re-running
[07:36:51] also it changed the row format from compact to dynamic
[07:37:38] I think they should be compact
[07:38:36] or maybe they should all be dynamic?
[07:38:52] noting it to check later
[07:39:28] I will go back to backup2002; heads up, I will be running backups of es2XXX hosts later
[07:39:38] cool, I will deploy this with kormat
[08:12:10] 10DBA, 10Operations, 10serviceops, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo)
[08:13:00] 10DBA, 10Operations, 10serviceops, 10Goal, 10Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (10jcrespo) 05Open→03Resolved a:03jcrespo I am going to close this, leave pending work at T238048.
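For reference, the check-then-rebuild done on pc255 above looks roughly like the following. This is a minimal sketch: the schema name parsercache is an assumption (only the table name pc255 appears in the chat), and the real before/after numbers are in P10996.

    -- Before state: data/index size, reclaimable free space and row format.
    SELECT table_name, data_length, index_length, data_free, row_format
    FROM information_schema.tables
    WHERE table_schema = 'parsercache'   -- assumed schema name
      AND table_name = 'pc255';

    -- Rebuild the table to reclaim space left behind by purged rows.
    -- On InnoDB this is mapped to a full table rebuild (ALTER TABLE ... FORCE),
    -- which also rebuilds the table with the server's current default row
    -- format; that is consistent with the COMPACT -> DYNAMIC change seen above.
    OPTIMIZE TABLE parsercache.pc255;

    -- After state: compare against the numbers captured before the rebuild.
    SELECT table_name, data_length, index_length, data_free, row_format
    FROM information_schema.tables
    WHERE table_schema = 'parsercache'
      AND table_name = 'pc255';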
[08:21:54] I would really want a review of https://gerrit.wikimedia.org/r/c/operations/puppet/+/589263
[08:22:00] especially the node selection
[08:22:22] jynus: with you in a sec, finishing the parsercache stuff with kormat
[08:22:28] yep, no
[08:22:30] np
[08:22:37] :**
[08:33:04] it's taking forever because i keep distracting marostegui with questions :)
[08:36:12] 10DBA, 10Operations, 10Patch-For-Review, 10Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (10Kormat) pc1008 is back as pc2 master and pc1010 will be cleaned up, purged and then returned to the spare pool
[09:59:50] alert fired for db1092 replication lag. the lag is >1d. this smells like an expired silence.
[10:00:02] yeah, check -operations :)
[10:00:09] downtimed again
[10:00:22] ah hah
[10:00:48] 10DBA, 10Operations: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (10Dzahn)
[10:04:42] 10DBA, 10Operations: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (10Dzahn) - schedule a short maintenance window for phabricator - change the passwords live - change the passwords in private repo in class passwords::mysql::phabricator - run puppet on phabr...
[10:08:23] ^marostegui
[10:35:06] 10DBA, 10Operations: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (10Marostegui) happy to help
[13:57:25] I need to deploy this today: https://gerrit.wikimedia.org/r/c/operations/puppet/+/589303
[13:58:22] I accept later renamings of the actual classes
[14:00:43] checking
[14:01:13] I accept them now, I mean I will apply them later :-D
[14:01:29] I configured bacula by accident on backup2002
[14:01:44] and now it would store the backups on the Databases pool
[14:01:53] for which we don't have enough space for es backups
[14:03:05] I think in the end I will generate backups on both DCs, and store them on the cross-dc bacula for redundancy
[14:03:19] like in a cross
[14:03:38] so backup2002 will also talk directly to bacula?
[14:04:28] dbprov2* backups will go to bacula/backup1XXX and dbprov1* backups will go to bacula/backup2*
[14:04:35] but that is for later
[14:04:41] yeah I mean this
[14:04:43] yes, backup2002
[14:04:52] will be the place where they are generated AND bacula
[14:04:57] but that is not part of the patch
[14:05:06] (will require more work later)
[14:05:06] [16:01:29] I configured bacula by accident on backup2002
[14:05:07] [16:01:44] and now it would store the backups on the Databases pool
[14:05:22] for now I just want to remove bacula from backup2002
[14:05:28] Re: those comments
[14:05:36] which is what that patch does
[14:05:53] ok, now I get it
[14:05:55] later I will re-add it
[14:06:02] but pointing to the right place
[14:06:10] I did +1 with a minor language typo
[14:06:14] not to the Databases pool, which we would fill :-D
[14:06:15] that can be fixed in a different patch
[14:06:17] thanks
[14:06:35] I also try to keep you up to date with my crazy ideas :-D
[14:06:45] sorry this was in a rush because I did it wrong
[14:06:48] No worries
[14:07:01] I think I am going to work out a bit, do you need me for anything else now?
[14:07:07] no, thank you!
[14:07:10] thanks! bye!
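For reference, the "change the passwords live" step from the T250361 checklist amounts to rotating the application account passwords on the database side during the maintenance window. A rough sketch, assuming a MariaDB server and a hypothetical 'phabricator'@'10.%' account (the real account names and hosts are not given in the log):

    -- Rotate the password on the database side first (hypothetical account).
    SET PASSWORD FOR 'phabricator'@'10.%' = PASSWORD('new-much-longer-random-password');

    -- Not strictly required after SET PASSWORD, but harmless, and needed after
    -- any manual edits to the grant tables.
    FLUSH PRIVILEGES;

    -- Verify the account still authenticates before closing the window, e.g.
    -- from the Phabricator host:
    --   mysql -u phabricator -p -h <db host> -e 'SELECT 1'
    -- The matching value in the private puppet repo
    -- (passwords::mysql::phabricator) then has to be updated and puppet run on
    -- the Phabricator hosts, per the checklist in the task.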
[14:07:14] I will ask you about names another day
[14:07:30] (better naming and documentation for clarity)
[14:15:39] it is nice that we have monitoring of backups, which let me detect this before it was too late :-D
[20:13:16] 10DBA, 10MediaWiki-Page-derived-data, 10Schema-change: Avoid MySQL's ENUM type, which makes keyset pagination difficult - https://phabricator.wikimedia.org/T119173 (10Krinkle) Tagging DBA to make a decision on whether or not this is a Bad Thing (TM), then resourcing/migration can follow or be declined.
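For context on T119173, a hypothetical illustration of why ENUM columns make keyset (seek) pagination awkward. The table and column names below are made up for illustration, not MediaWiki's actual schema:

    -- With an ENUM column, values sort and compare by their declaration index,
    -- not by the visible string, and the allowed set is baked into the DDL.
    CREATE TABLE example_log (
        el_id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        el_status ENUM('new', 'active', 'archived') NOT NULL,
        KEY el_status_id (el_status, el_id)
    );

    -- Keyset pagination over (el_status, el_id): the seek condition for the
    -- next page has to reproduce the enum's internal ordering, which the
    -- application cannot see from the values themselves; adding a new status
    -- later also requires an ALTER TABLE.
    SELECT el_id, el_status
    FROM example_log
    WHERE (el_status, el_id) > ('active', 12345)
    ORDER BY el_status, el_id
    LIMIT 100;

    -- A common alternative is a small integer column whose meaning lives in
    -- the application (or a lookup table); the ordering is then explicit.
    CREATE TABLE example_log2 (
        el_id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        el_status TINYINT UNSIGNED NOT NULL,  -- e.g. 0=new, 1=active, 2=archived
        KEY el_status_id (el_status, el_id)
    );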