[07:47:42] marostegui: see my date proposals on sre travel spreadsheet
[07:51:31] checking
[07:52:47] jynus: sure, that works
[07:53:40] or we can fly the 9th just at different hours
[07:54:04] I can check
[07:54:31] it depends on where you'd fly from
[07:55:22] From Madrid there is a flight at 16:05 and another one at 10:30
[07:55:46] so i could fly either in the morning or in the evening depending on yours
[07:56:18] normally I cannot fly in the morning
[07:56:39] because I have to travel in public transport kms away
[07:56:40] so I could take the 10:30 one and at 12:15 I am available again
[08:57:26] I think I am going to set up dbprov2002 and then set es2001 as a spare
[08:57:41] :)
[09:03:31] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['dbprov2002.codfw.wmnet'] ` The...
[09:24:04] let me know if you have some ideas:
[09:24:10] PXE-E51: No DHCP or proxyDHCP offers were received.
[09:24:23] Mac is correct
[09:24:27] dns looks right
[09:24:31] link seems up
[09:24:59] wrong vlan on the switch?
[09:25:20] on the switch port that is
[09:25:48] I will mention it on the ticket, just double checking if I am mistaken
[09:25:53] I have seen that before and it turned out to be the vlan
[09:26:01] or something stupid I did wrong
[09:26:39] if the mac is correct and matches the one on the netboot file, I would say it could be the port config
[09:26:55] thanks, I will mention it to papaul
[09:27:30] the other thing I saw in the past is configuring the wrong port, but this is not - link is 10Gb on the right one and "0" on the other one we don't use
[09:32:41] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo) a:jcrespo→Papaul @papaul we need help from you. We cannot network boot on dbprov2002 (we did on dbprov2001 already). `...
[10:48:10] DBA: grant Tarrow access to test-s4 servers - https://phabricator.wikimedia.org/T219613 (Tarrow)
[10:49:13] marostegui: I guess I can just do the ^^ ticket? by copying the grants over into his home dir?
[10:49:42] addshore: I am checking the NDA signatures
[10:50:13] ack :) I mean, he definitely has an nda as he already has cluster access ;)
[10:50:19] & deployment
[10:50:19] ah roger
[10:50:23] then yeah
[10:50:45] :)
[10:51:10] oh, not sure i can actually make the file there
[10:51:15] so marostegui i might have to leave it to you!
[10:51:38] let me check
[10:53:03] marostegui: it would be nice to have a list of "owners/users" for that, for when we need to do maintenance
[10:53:05] tarrow: can you check your home at mwmaint1002?
[10:53:21] so we can contact them
[10:53:35] Good idea, I always make sure they know that those hosts can go anytime (sort of)
[10:53:58] maybe they can just create a wiki themselves
[10:54:25] (wiki page)
[10:54:42] yeah
[10:55:17] I think we can place it under https://wikitech.wikimedia.org/wiki/MariaDB or something
[10:55:29] I just filled tarrow in regarding the way to access the hosts and which users exist there
[10:55:46] addshore: I left the file at his home
[10:55:50] tarrow: ^
[10:56:02] amazing!
[10:56:29] once it is confirmed let me know and feel free to close that task
[10:56:34] I will comment there for the record
[10:56:41] ack!
[10:57:21] DBA: grant Tarrow access to test-s4 servers - https://phabricator.wikimedia.org/T219613 (Marostegui) As spoken on IRC, @Tarrow has an NDA and already has access to the cluster. I have left the grants file at his home at `mwmaint1002`
[10:58:07] all good! works for me
[10:58:15] =]
[10:58:40] when do the credentials roll over? Will some script update that grants file?
[10:59:12] tarrow: It is a testing cluster, very few people have access to it, it is not always guaranteed to be up and the data doesn't change, it is static
[10:59:36] cool, fine :)
[10:59:37] DBA: grant Tarrow access to test-s4 servers - https://phabricator.wikimedia.org/T219613 (Marostegui) Open→Resolved a:Marostegui ` ˜/tarrow 11:58> all good! works for me `
[11:09:38] marostegui: thanks for sorting everything out so fast:)
[11:14:18] I have created this for now: https://wikitech.wikimedia.org/wiki/MariaDB#Testing_servers
[11:14:35] thanks!
[11:15:24] they were bought I think for MCR usage, I may add some extra stuff later
[11:15:54] meanwhile I had create a transfer.py --type=decompress
[11:15:55] sure, please complete more stuff!
[11:16:02] *created
[11:16:35] so one can do transfer.py --type=decompress host1:backup.tar.gz host2:/srv/datadir
[11:16:48] and it won't prepare anything
[11:16:49] no?
[11:17:05] in theory it should be prepared already
[11:17:39] transfer doesn't touch the files/data at all
[11:18:14] ah so to recover a host we can use that thing then?
[11:18:25] like a new host, you want to populate it from an existing backup
[11:18:28] that would work?
[11:18:39] that is the plan
[11:18:51] I mean, that'd be the command to run?
[11:18:57] apart from setting up replication etc
[11:18:58] I didn't want to work on this yet
[11:19:10] but one cannot check the backups are ok without recovering them
[11:19:18] sure, I am just trying to see if I am following it right
[11:19:26] yeah, the other parts definitely will not happen yet
[11:19:44] but at least I want an easy way to test them, even if manually
[11:19:50] of course!
[11:20:15] for example, imagine that the tar.gz runs out of disk space
[11:20:27] and the error handling doesn't get that
[11:20:39] or the prepare doesn't end and the error handling doesn't get it either
[11:20:45] I want to check a full process
[11:21:14] yeah, indeed
[11:25:13] do we have any large host that is yet to be provisioned?
[11:25:40] like online but with no data?
[11:25:44] yeah
[11:25:47] no :(
[11:26:07] I will use es hosts, then
[11:26:24] coolio
[11:37:40] tarrow: yw!
[13:18:02] DBA: Create a recovery/provisioning script for database binary backups - https://phabricator.wikimedia.org/T219631 (jcrespo) p:Triage→High
[13:24:28] DBA, Patch-For-Review: Create a recovery/provisioning script for database binary backups - https://phabricator.wikimedia.org/T219631 (jcrespo) ` time ./transfer.py --type=decompress --compress --no-checksum --no-encrypt dbprov2001.codfw.wmnet:/srv/backups/snapshots/archive/snapshot.x1.2019-03-28--12-56-4...
[14:20:41] DBA, Patch-For-Review: Create a recovery/provisioning script for database binary backups - https://phabricator.wikimedia.org/T219631 (Marostegui) Nice work!!
:)
[14:24:05] DBA, Patch-For-Review: Document clearly the mariadb backup and recovery setup - https://phabricator.wikimedia.org/T205626 (Marostegui) Self note: TODO: expand the following point on https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Recovering_a_Snapshot //"Setup replication based on the GTID coordinat...
[14:44:54] DBA, Patch-For-Review: Create a recovery/provisioning script for database binary backups - https://phabricator.wikimedia.org/T219631 (Marostegui) Heh, I got confused. I was taking a look at the `xtrabackup_*` files on es2002 after that transfer and I saw very strange things and then I realised that those...
[18:50:02] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (Papaul) @jcrespo the problem was that ge-4/0/3 was already part of private1-b-codfw and not xe-4/0/3 so the install is in progress. will...
[18:56:15] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (Papaul) a:Papaul→jcrespo @jcrespo all yours let me know if you have any questions
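
Note on the 11:20 discussion: the concern there is that decompressing a tar.gz backup could run out of disk space without the error handling noticing. A minimal Python sketch of a pre-flight check for that case follows. It is not taken from transfer.py; the function names and the `free_bytes` parameter are hypothetical, and it assumes a local, trusted archive rather than a transfer between hosts.

```python
import shutil
import tarfile
from pathlib import Path


def uncompressed_size(archive: Path) -> int:
    """Sum the sizes of all regular files stored in a .tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sum(m.size for m in tar.getmembers() if m.isfile())


def extract_with_space_check(archive: Path, target: Path, free_bytes=None) -> int:
    """Extract `archive` into `target`, refusing to start if space is short.

    `free_bytes` is a hypothetical override used for testing; by default the
    free space of the filesystem holding `target` is queried.
    Returns the number of uncompressed bytes the archive holds.
    """
    needed = uncompressed_size(archive)
    if free_bytes is None:
        free_bytes = shutil.disk_usage(target).free
    if needed > free_bytes:
        # Fail loudly before writing anything, instead of leaving a
        # half-extracted datadir behind.
        raise RuntimeError(
            f"need {needed} bytes but only {free_bytes} free on {target}"
        )
    with tarfile.open(archive, "r:gz") as tar:
        # Assumes the archive is one of our own backups (trusted content).
        tar.extractall(target)
    return needed
```

Checking before extraction only covers the disk-space case; a prepare step that never finishes would need a separate timeout, which this sketch does not attempt.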