[13:16:07] I'm seeing some alerts from tools-prometheus-6 being down and now tools-prometheus-7
[13:16:18] -6 is back up but it flapped twice since this morning
[13:26:26] -7 is indeed unreachable, "virsh console" is also getting stuck
[13:28:50] I can see a big spike in server load for -6 this morning
[13:34:42] aaand -7 is back
[15:21:01] tools-prometheus-6 is not responding again (3rd time today). andrewbogott can you think of anything to check? maybe it's just excessive load, but it's quite a beefy vm
[15:23:35] dhinus: The only (unlikely) thing I can think of is if things are too crowded on the hypervisor and it's getting CPU starved -- I squished everything together to make room for the switch restart.
[15:23:40] I'll move it and we'll see if that makes a difference.
[15:24:20] thanks, good idea!
[15:28:28] well, -6 now has a hypervisor all to itself and I still can't ssh
[15:28:43] although I guess it might've been broken already before I moved it
[15:29:27] Did you have to reboot it before?
[15:30:55] dhinus: should I reboot -6?
[15:34:56] * andrewbogott does
[15:38:06] the oomkiller was firing before I restarted it
[15:38:55] so not a noisy neighbor issue
[16:05:48] andrewbogott: sorry, didn't see your ping. I don't know much about that host, but I assume it's safe to reboot
[16:05:58] this morning it came back on its own after 1 hour of being unresponsive
[16:06:31] Ok. Will see how it goes
[16:06:52] where did you see the oomkiller was firing?
[16:08:05] could you ssh to it? I could not when I tried earlier
[16:24:19] now cloudinfra-cloudvps-puppetserver-1 is alerting :/
[16:33:06] ssh and "virsh console" are not working
[16:40:13] I'll try forcing a reboot
[16:45:41] "openstack server reboot" fixed it
[16:46:53] I'm looking at the journald log for the past hour and I see "sssd_nss[2528288]: Shutting down (status = 0)"
[16:52:37] I remember sssd caused some issues to other vms in the past, but I'm not sure this is the same issue
[16:53:10] if this keeps happening I will open a task tomorrow
[16:53:24] * dhinus off for today