[07:06:56] I've refreshed ms3 codfw master with a new host, if you see something weird let me know, as the process is very delicate
[07:12:53] hi, marostegui, could you double check that my rebase is correct against the current status?
[07:12:56] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1276382
[07:13:04] checking
[07:13:34] +1ed again
[07:13:38] looks good
[07:14:00] just making sure you were not working with db2253 in the meantime
[07:14:14] will merge as is now
[07:14:18] nop, not today probably
[07:14:20] thanks though
[07:14:39] no problem
[07:24:01] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on db2250:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:18:16] Amir1, marostegui: can I start the rolling reboot on es6 and then es7 using rolling_restart.py? Any blocker or glitch to be aware of?
[08:18:57] federico3: double check with jaime to make sure no backups are running
[08:19:46] the rolling_restart.py script checks for running backups and skips hosts as needed
[08:20:04] nice then
[08:20:25] (you mean jynus?)
[08:20:54] yes
[08:21:17] no ongoing backups now, how long does it take?
backups run on the night from Monday to Tuesday
[08:25:19] should be done well before night
[08:36:59] 👍
[09:10:20] FYI: Apr 23 09:09:02 prometheus2006 mysqld_exporter_config.py[734533]: Writing mysql-dbstore_codfw.yaml
[09:10:48] if there has been any schema change in the last 24 hours for s1, it may need reapplying for db2250:s1
[09:44:01] RESOLVED: SystemdUnitFailed: prometheus-mysqld-exporter.service on db2250:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:20:10] I've refreshed pc2 codfw master with a new host, if you see something weird let me know, as the process is very delicate
[15:15:57] do you know if there are more backup sources or backup1 dbs scheduled to be installed this Q?
[15:16:52] jynus: https://phabricator.wikimedia.org/T412415 those have arrived, not sure if any of the refreshes (listed there too) are backup sources
[15:17:02] I will check myself
[15:17:38] is there a list of decoms, or just first + number of hosts?
[15:18:53] This is covered by "Refresh of db11[50-82,84]"
[15:19:02] Those are the hosts that are being refreshed
[15:19:02] thanks, that I couldn't find
[15:19:06] I will check those
[15:19:17] jynus: 34 will be decommissioned, but we bought only 26
[15:19:25] oh
[15:19:31] if there are backup sources involved, those will get a host, so no worries
[15:19:47] a worry for another time, anyway
[15:20:13] yep :)
[15:20:32] I will create the "productionize task" soon, so I will ping you there too with the host that will be "yours"
[15:23:11] https://phabricator.wikimedia.org/P91376
[15:24:13] db1150 & db1171 for backup sources
[15:24:35] I will assign some of the new hosts to those :)
[15:30:53] I will be leaving a backup running to test the new host; it will have a 50% chance of failing, because I always forget some minor thing on setup (which gets thoroughly checked on backup), so you can ignore it if you see some job error tomorrow morning