[00:17:42] DBA, Operations, ops-eqiad: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (Jclark-ctr) @jcrespo. Battery replacement delivery date is 02/22/20 Please message me on irc for what time works best for you for replacement. I can accommodate your schedule
[00:34:32] DBA, Operations, ops-eqiad, Patch-For-Review: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet - https://phabricator.wikimedia.org/T241359 (Jclark-ctr) a:Jclark-ctr→Cmjohnson Below are switch ports for host are racked cabled and updated netbox. Handing over to...
[05:54:22] DBA, Expiring-Watchlist-Items, Community-Tech (Kanban-Q3-2019-20), Core Platform Team Workboards (Clinic Duty Team), MW-1.35-notes (1.35.0-wmf.19; 2020-02-11): Create required table for new Watchlist Expiry feature - https://phabricator.wikimedia.org/T240094 (Samwilson) > Note that MediaWiki'...
[06:10:11] DBA, Operations, Phabricator, Release-Engineering-Team (Development services): Upgrade and restart m3 (phabricator) master (db1128) - https://phabricator.wikimedia.org/T244566 (Marostegui) Open→Resolved This was done successfully.
Downtime was 56 seconds: 06:01:19 - 06:02:15
[06:10:16] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[06:10:25] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[06:11:32] DBA: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453 (Marostegui)
[06:43:57] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[07:08:29] DBA: Possibly replace db1087 (s8) with db1127 (x1) - https://phabricator.wikimedia.org/T245107 (Marostegui)
[07:09:11] DBA: Possibly replace db1087 (s8) with db1127 (x1) - https://phabricator.wikimedia.org/T245107 (Marostegui) p:Triage→Medium
[07:09:13] DBA: Possibly replace db1087 (s8) with db1127 (x1) due to disk space constrains - https://phabricator.wikimedia.org/T245107 (Marostegui)
[07:22:38] DBA, Cloud-Services, cloud-services-team: Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (Marostegui) This is ready for WMCS to create the views. I have created `ngwikimedia_p` and granted `labsdbuser` grants for it. #cloud-services-team please go ahead an...
[07:27:47] DBA: Productionize es1020-es1025, es2020-es2025 - https://phabricator.wikimedia.org/T243052 (Marostegui)
[07:56:44] DBA, Operations, ops-eqiad, Patch-For-Review: (Needed by 31st January) eqiad: rack/setup/install es102[0-5].eqiad.wmnet - https://phabricator.wikimedia.org/T241359 (Marostegui)
[08:40:07] DBA, Epic, Goal: Setup es4 and es5 replica sets for new read-write external store service - https://phabricator.wikimedia.org/T226704 (Marostegui)
[08:48:29] DBA, Operations, Goal, Patch-For-Review: Implement logic to be able to perform partial/incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 (Marostegui) We can also take into consideration reading the binlogs from a given file/position using the coordinates we store on the...
[09:03:32] DBA: Test MariaDB 10.4 in production - https://phabricator.wikimedia.org/T242702 (Marostegui) I have given this host more weight now, it is now serving with weight 100, which is 6% of enwiki main traffic.
[10:10:36] DBA, Operations, Goal, Patch-For-Review: Implement logic to be able to perform partial/incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 (jcrespo) >>! In T244884#5879850, @Marostegui wrote: > We can also take into consideration reading the binlogs from a given file/posit...
[10:11:02] DBA, Operations, Goal, Patch-For-Review: Implement logic to be able to perform full and incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 (jcrespo)
[10:16:01] DBA, Operations, ops-eqiad: db1095 backup source crashed: broken BBU - https://phabricator.wikimedia.org/T244958 (jcrespo) Thank, Jclark-ctl. No need to wait for us in this particular case, as it is as important that the service was immediately moved elsewhere and the data considered irrecoverable (b...
[10:17:05] ^ jynus will you reimage it or just apt-full upgrade?
[10:17:12] just curious
[10:17:26] probably both
[10:18:31] both?
one doesn't exclude the other?
[10:18:37] maybe testing buster?
[10:19:02] then revert to stretch?
[10:19:50] ah, that's good
[10:19:54] wouldn't hurt
[10:20:41] (assuming db1140 works well, there will be room for experimentation
[10:22:17] did you have only 1 buster host right now?
[10:22:24] yep
[10:22:24] or just 1 core/load?
[10:22:27] db1106
[10:22:32] db1107 sorry
[10:22:39] but are there others not in production?
[10:23:19] wasn't test-s1 buster?
[10:23:25] even if percona
[10:23:39] yes, db1114 is buster too (percona)
[10:23:45] I cannot remember
[10:24:01] those are the only two
[10:24:03] that can be changed
[10:24:22] db1107 is 10.4+buster in production, db1114 is percona+buster replicating from db1107
[10:24:49] what about test-s1 on codfw? stretch?
[10:25:04] I believe so, I don't recall touching those
[10:25:20] yep, db2102 confirmed stretch
[10:25:48] backup workflow is complicated
[10:25:52] because we only have one
[10:26:03] and there will be a time where we are 50/50
[10:26:42] I guess we upgrade one dc when relatively confident?
[10:27:04] or keep in stretch for longer as there is an upgrade path?
[10:27:29] not sure yet, first we have to test 1 host per section with buster
[10:27:33] we'll see
[10:27:43] especially ES
[10:28:11] well, we have logical backups weekly
[10:28:16] should the worst happen
[11:19:25] DBA, Operations, procurement: eqiad: 9 database servers - https://phabricator.wikimedia.org/T245137 (Marostegui)
[15:53:44] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (bd808)
[15:53:54] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (bd808)
[16:48:01] DBA, DC-Ops, Operations, ops-eqiad: es1019: reseat IPMI - https://phabricator.wikimedia.org/T243963 (wiki_willy) @Cmjohnson - looks like it's too late to do this one today, since they need 24hrs to depool. Chatted with @Marostegui briefly, so just let know them know when's a good date for you t...