[02:33:10] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on es1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:33:34] FIRING: PuppetFailure: Puppet has failed on es1050:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:13:10] FIRING: SystemdUnitFailed: prometheus-mysqld-exporter.service on es1056:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:55:34] temporarily stopping the restart service
[08:20:22] federico3: it's being provisioned
[08:23:55] hello marostegui :)
[08:24:24] yes I've noticed it was being worked on so I just stopped the restarter service to stop the alerts
[08:25:43] sweet
[08:26:01] puppet will start it again I guess
[12:00:01] hah, google calendar is not giving me the link to meet for the meeting
[12:06:43] * Emperor has sent the link by privmsg
[12:33:59] https://www.irccloud.com/pastebin/lK1DPwFo/part%20of%20the%20ouptut%20from%20mysql_upgrade
[12:55:22] turns out I need a *powered* USB hub because mine drops to 4.4 v :D
[13:53:21] I'm going to run the major version upgrade cookbook on db1176
[14:11:54] ICYMI: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/3EITZKJUQHP4CW223XT27BEKDTWUHL7V/
[14:14:58] kwakuofori: welcome back!
[15:01:50] federico3: thanks!
[15:40:03] marostegui: when you have a sec https://phabricator.wikimedia.org/T391581#10867540
[16:33:17] urandom: just a heads-up, there's quite a bit of eqiad restbase flapping today. I'm not certain but I think I saw some nodes down
[16:33:58] hnowlan: I'm doing reboots
[16:34:35] sorry, just seeing the alerts in #-operations, I'm not sure why that's happening though, silences should be created
[16:34:37] ahh okay, grand
[16:35:20] and it's just tcp_cassandra_a_ssl_ip4...
[17:13:21] sre.ganeti.makevm is hanging in both eqiad and codfw, any idea?
[17:14:36] e.g. it took 2.5 minutes only to fetch data from netbox