[08:54:33] we have a PSU fault on es2055
[09:01:51] sometimes monitoring will make a ticket automatically...
[10:45:35] federico3: looking at the essential work doc, you look to have added a new section the other day, but it's still called week of 18 August, and copies the object storage update from the week of 18 August. I think that's an error and you meant to make a new week of 25 August section, so I should update the date and the object storage bits. Is that correct?
[10:45:55] [alternatively, you might have been meaning to add things to the 18th August section]
[10:46:36] oh, I'll move those to the 25th
[10:47:24] thanks
[11:33:57] thank you
[11:43:10] Amir1: updated https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182592
[11:43:52] Amir1: also can I start the schema change on s3 in codfw?
[11:49:37] federico3: I'm running schema change on s3
[11:49:42] https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance
[11:49:47] It'll be over soon though
[11:50:05] I can wait or do s7 instead
[11:50:29] wait a bit since I want to do s7 next :P
[13:06:57] Amir1: can I start the upgrade and clone of es2049 or maybe we want to do the clone on Monday?
[13:07:12] let's do it on Monday
[13:07:34] good to go for the upgrade now?
[13:31:25] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on es2049:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:49:11] ^-- expected?
[13:49:53] yeah, it's being provisioned which shouldn't trigger an alert
[15:33:07] Amir1: can I start the upgrade on es2049 now?
[15:34:08] also can I start the schema change on s3? I see yours is not running on cumin1002
[15:37:49] * Emperor is just going to gently note it's Friday EU-afternoon
[16:01:58] I'm also a bit confused by the notion of upgrade. It's a new host being provisioned, it doesn't have anything to upgrade. What am I missing here?
[16:03:19] s3 is done on my side
[17:31:40] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on es2049:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:51:25] FIRING: [2x] SystemdUnitFailed: prometheus-mysqld-exporter.service on es2049:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:51:40] FIRING: [2x] SystemdUnitFailed: prometheus-mysqld-exporter.service on es2049:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:35:02] Is this going to alert all weekend? Can we at least silence it until Monday?
[22:40:10] I'll try to silence it
[22:43:59] downtimed, let's see if that does anything
[22:44:27] this is empty but can't say whether it's real or not, https://alerts.wikimedia.org/?q=alertname%21%3DSystemdUnitFailed&q=team%3D~data-persistence&q=%40state%3Dactive
[22:44:54] is this triggered by the deployment through puppet?
[22:45:19] it's not fully provisioned, that's the problem
[22:45:42] so puppet is checking for a service that's not set up (yet) and fails
[22:46:28] but.. should puppet have run and deployed mariadb? I was planning to run a host update to get all the packages deployed
[22:49:31] I don't think puppet deploys mariadb, most notably the data is missing
[22:49:43] needs cloning from another host first
[23:13:02] (fwiw the wmf-mariadb1011 package is deployed)
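
Side note on the alerts.wikimedia.org link pasted at 22:44:27: it packs three filters into percent-encoded `q=` parameters, which makes it hard to tell at a glance what the "empty" result was filtering for. A minimal Python sketch that decodes them with the standard library follows. The helper name `decode_filters` is made up for illustration, and reading `!=` and `=~` as the usual label matchers is an assumption about the dashboard, not something confirmed in the log.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the 22:44:27 message.
ALERTS_URL = (
    "https://alerts.wikimedia.org/"
    "?q=alertname%21%3DSystemdUnitFailed"
    "&q=team%3D~data-persistence"
    "&q=%40state%3Dactive"
)


def decode_filters(url: str) -> list[str]:
    """Return the percent-decoded q= filter expressions from a dashboard URL.

    Hypothetical helper: it only decodes the query string and does not
    talk to the alerting service.
    """
    query = urlsplit(url).query
    # parse_qs collects repeated keys into a list and decodes %XX escapes.
    return parse_qs(query).get("q", [])


filters = decode_filters(ALERTS_URL)
for expression in filters:
    print(expression)
```

Read literally, the first decoded filter is `alertname!=SystemdUnitFailed`, so that view lists active data-persistence alerts other than SystemdUnitFailed. If that reading is right, an empty result there would not by itself show whether the downtime took effect for the es2049 alert.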