[00:57:25] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter@s4.service on db1150:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:32:25] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter@s3.service on db1150:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:11:51] are those alerts from db1150 expected?
[07:13:17] (I dunno if it'll help, but my toot about our vacancy has got a reasonable amount of boosting)
[07:15:03] yes
[07:15:11] I am "fixing it"
[07:16:16] I also mentioned we should get rid of predictive disk failure, because provisioning makes it fire
[07:22:25] RESOLVED: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter@s3.service on db1150:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:39:52] jynus: sorry to be a bore; I know you've previously reviewed https://gerrit.wikimedia.org/r/c/operations/puppet/+/1190674 but would you mind having another look, please? I'm now leaving ms-be1088 out of the rings so e.lukey can do some boot-testing on it
[08:42:09] checking
[08:43:57] lgtm
[08:44:51] TY :)
[09:27:51] zabe: do you want to announce the rev_sha1 thing or should I?
[09:35:41] Amir1: Do you mean on the cloud mailing list?
[09:35:47] yes
[09:37:13] I can do it
[09:37:27] go for it
[09:37:39] How long do we want to give folks to migrate away?
[09:37:45] three weeks?
[09:37:52] sounds good
[09:38:10] Thank you!
[10:40:25] FIRING: SystemdUnitFailed: puppet-agent-timer.service on ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:43:57] (/Stage[main]/Profile::Swift::Storage/Swift::Init_device[/dev/sdf]/Exec[mkfs-/dev/sdf1]/unless) Check "xfs_admin -l /dev/sdf1" exceeded timeout
[10:45:02] (said command returns almost immediately now)
[10:50:25] RESOLVED: SystemdUnitFailed: puppet-agent-timer.service on ms-be2067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:51:22] that's interesting, since it's still running (and this time it's xfs_db -x -p xfs_admin -r -c label /dev/sdi1 that seems stuck)
[10:52:19] If that doesn't sort itself out soon, I'll reboot it.
[10:56:51] going for lunch; I will merge 1192501 when I come back
[10:57:58] sort> it didn't. Rebooting.
[14:12:50] federico3: can you run your schema change on eqiad hosts? T401906
[14:12:51] T401906: Add default value for afl_ip and remove default value for afl_ip_hex in abuse_filter_log table - https://phabricator.wikimedia.org/T401906
[14:13:38] actually let me just run it with replication on a couple of them
[14:13:52] sure
[14:14:09] want to try using the wrapper perhaps?
[14:15:15] no, it's a different thing; when eqiad is depooled, I can just run the alter on the master of the dc with replication
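A minimal sketch of what the T401906 alter could look like, run on a master with replication enabled as described above. The table and column names come from the task title, but the exact DDL, the chosen default value, and the host/database names are all assumptions for illustration:

  #!/bin/bash
  # Hypothetical sketch only: the DDL is inferred from the task title,
  # and the host/database names below are made-up placeholders.
  wiki_db="enwiki"                    # placeholder database
  master="db-master.example.test"     # placeholder master host
  sql="ALTER TABLE abuse_filter_log
         ALTER COLUMN afl_ip SET DEFAULT '',
         ALTER COLUMN afl_ip_hex DROP DEFAULT;"
  # Running on the master of the depooled DC lets the change
  # replicate down to the replicas, per the discussion above.
  mysql -h "$master" "$wiki_db" -e "$sql"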
[14:15:37] but not right now actually, I have to go. Feel free to start the script (which is safer) on the eqiad hosts for now
[14:15:44] I'll try to get to it tomorrow
[14:16:01] ok
[15:43:20] I'd like a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1192575 please, to make newly ordered ms-be nodes use our ms-be EFI preseed setup.
[15:59:10] looking
[16:01:41] why don't we unroll these regexps 😭
[16:08:59] I think they're globs not regexes
[16:44:09] FTR, we caused a spike in sessionstore GET requests for ~2 hours (4-5x the normal value)
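On the globs-vs-regexes exchange above: globs match literally apart from a few wildcards and are implicitly anchored, while regexes treat more characters as metacharacters and need explicit anchors. A minimal bash sketch of the difference (the hostname and patterns are made up):

  #!/bin/bash
  host="ms-be1090"   # made-up hostname
  # Glob match: '*' matches any run of characters; the pattern is
  # implicitly anchored to the whole string.
  if [[ $host == ms-be1* ]]; then echo "glob match"; fi
  # Regex match: '=~' uses POSIX extended regexes, so '.' and '*' are
  # metacharacters and anchoring must be written out.
  if [[ $host =~ ^ms-be1[0-9]+$ ]]; then echo "regex match"; fi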