[08:13:55] jynus: I have installed 10.1.33-1 on db1120 and I am having some issues when running systemctl start mariadb@s2, not sure if it is related to the package and the new systemd unit, just mentioning
[08:13:59] I am checking it
[08:14:34] did you go over my checklist?
[08:15:12] the one in our etherpad?
[08:15:46] I went through that one, and I am checking why puppet didn't create /run/mysqld
[08:16:08] it is created on reboot
[08:16:24] Server reboot?
[08:16:28] it cannot be created on package start because of the bug
[08:16:30] yes
[08:16:32] riiight
[08:16:33] ok
[08:16:41] let's see
[08:16:43] it cannot be fixed until buster
[08:16:50] or it will create the issues I reported
[08:21:36] I guess I could add, additionally "ExecStart=/usr/bin/systemd-tmpfiles --create"
[08:25:42] worked fine on reboot indeed
[08:27:01] I will add ExecStartPre=/bin/systemd-tmpfiles --create
[08:27:12] until there is a better option available
[08:27:15] Cool!
[08:29:20] that wouldn't work either
[08:31:54] I can create some workarounds
[08:32:09] but I think it is best to stick to the current method: reboot
[08:32:17] until a newer systemd
[08:32:18] It is not a really big deal I think, it is just the first time it bit me, so I was a bit like: eh uh :)
[08:32:28] I would say do not spend time on it
[08:32:38] note I documented it on the checklist
[08:32:57] and it should happen on a reimage, where a reboot happens after puppet ran
[08:33:05] Yeah, I wasn't aware of the reboot part, I just added it
[08:33:06] *not happen on a reimage
[12:48:09] Going to deploy a schema change on s3 with replication on codfw
[12:50:03] noted
[12:50:14] thanks for the comment!
[12:50:25] it is helpful
[12:50:36] :)
[13:58:56] db1105@s2 started
[13:59:03] s1 says it is corrupted
[13:59:36] :__(
[13:59:55] so gtid isn't great haha
[14:00:29] well, it was a powerdown (sigint) followed by a sigkill
[14:01:42] I was kidding :)
[14:02:12] maybe we can recover from backups and then do all the partitioning manually? :(
[14:02:20] or mydumper
[14:39:41] I am going to enable notifications on db1064, they are disabled. Probably a leftover from when it was moved to x1
[14:40:18] +1
[15:00:44] running "recover_section.py --host db1105.eqiad.wmnet --port=3311 --password=$pass /srv/backups/latest/dump.s1.2018-05-15--17-00-02" on dbstore1001
[15:01:09] nice :)
[15:01:16] we will have to do the partitioning afterwards, no?
[15:01:23] yep
[15:01:28] but that should be ok
[15:01:53] I could edit the table definition in advance
[15:01:56] but the other time
[15:02:07] it takes so much time that I prefer to catch up replication
[15:02:13] and then partition/compress
[15:02:23] in this case the compression will happen by default
[15:02:26] yeah, agreed
[15:02:55] if I was in a hurry I could copy from codfw
[15:03:02] but we are ok for now
[15:03:46] yeah, we are fine
[15:04:03] on the bright side, we may have a working transfer.py :-)
[15:04:34] it took 3 days to import to dbstore1001
[15:04:45] it probably will take just 12 hours on db1105
[15:04:53] it could have been much worse!
[15:05:50] and we get to test the backups again!
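A rough sketch of the "catch up replication, then partition/compress" check discussed above, assuming the standard MariaDB client with credentials in the local .my.cnf; the exact invocation is not in the log and is only illustrative:

  # Hypothetical check on the rebuilt s1 instance (db1105, port 3311, as quoted in the log):
  # wait for the replica to catch up before starting the partitioning/compression ALTERs.
  mysql -h db1105.eqiad.wmnet -P 3311 -e 'SHOW SLAVE STATUS\G' \
    | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'
  # Slave_IO_Running and Slave_SQL_Running should both be 'Yes', and
  # Seconds_Behind_Master should trend toward 0 before any ALTER TABLE is run.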
[15:05:51] :)
[15:06:07] I have been recovering from them for a while
[15:06:16] when building new hosts
[15:06:38] the db2067 story doesn't end
[15:07:22] the new disk is still recovering
[15:07:22] :)
[15:09:32] DBA: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4212030 (Marostegui)
[15:10:24] db1106 should be finally up again
[15:11:16] \o/
[15:28:02] DBA, Operations, ops-eqiad: Degraded RAID on db1064 - https://phabricator.wikimedia.org/T194885#4212056 (Marostegui) Open→Resolved a: Marostegui This is now fixed, I am going to fail the other disk and a new task will be created ``` Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id:...
[16:16:31] I am also running a quick compare.py on s2 to check that db1115 really came up without corruption
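For the db1064 RAID rebuild referenced above, a sketch of the kind of MegaCli checks that produce the quoted "Virtual Drive" output, assuming an LSI MegaRAID controller with the CLI installed as megacli; the actual commands used are not in the log:

  # Hypothetical verification that the virtual drive and physical disks on db1064 are healthy again
  megacli -LDInfo -Lall -aALL | grep -E 'Virtual Drive|^State'
  megacli -PDList -aALL | grep -E '^Slot|Firmware state'
  # The virtual drive State should read 'Optimal' and every disk should be 'Online, Spun Up'
  # before deliberately failing the other disk for the follow-up task.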