[11:28:30] !log admin [codfw1dev] created VM bastion-codfw1dev-04 to replace current bastion -03 (T374828)
[11:28:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:28:37] T374828: openstack: vxlan: verify nova proxy and floating IPs work with new VXLAN-based network - https://phabricator.wikimedia.org/T374828
[13:31:38] !status OK, upgrading k8s, some pods will be restarted
[17:02:41] quick bridge test
[17:02:47] and back
[17:02:52] okay, all working, don’t mind me :)
[17:34:23] I have a few pods stuck with a Terminating status. Would you like a task or a list here?
[17:36:15] where are they?
[17:36:20] (which tool)
[17:36:32] tool-jjmc89-bot
[17:37:32] looking
[17:38:58] they seem stuck on NFS
[17:39:34] I figured - that's usually the case when this happens
[17:41:13] yep, we have a bit of an extra-unstable ceph cluster for a few days as we are draining a rack on it, and it's getting close to its full limit
[17:41:22] that makes nfs a bit flaky too
[17:51:17] restarting the last worker, that should get rid of it
[17:58:39] thanks, dcaro
[18:09:17] np, /me off for the day
[19:17:51] !log lucaswerkmeister@tools-bastion-13 tools.quickcategories toolforge envvars create TOOL_DATABASE__TOOLSDB true
[19:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[19:17:56] !log lucaswerkmeister@tools-bastion-13 tools.quickcategories toolforge envvars create TOOL_DATABASE__DB s53976__quickcategories
[19:17:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[19:20:15] !log lucaswerkmeister@tools-bastion-13 tools.quickcategories deployed a82f584c44 (read toolsdb from envvars); commented out DATABASE section in config.yaml, should use envvars instead
[19:20:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[22:13:49] I have a webservice stuck pending in `get pods`. `kubectl logs -f podname` returns no output.
[22:14:00] is tools healthy?
[22:16:13] huh, idk. restart didn't fix it. then 5+ mins later a stop and a separate start did fix it.
[22:16:24] I wish the log had said something
[22:53:44] @jeremy_b: you might be able to see what happened with `kubectl get events`. A Pod in "pending" usually means it was waiting for an exec node that could handle the request or for the image to pull from the registry. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
[23:22:41] No resources found in tool-${TOOL} namespace.
[23:22:55] but it was no longer pending when I asked for events
[23:35:42] yeah, events aren't kept for very long in my experience
[23:36:10] but if you look at them in time (next time) they can still be useful
[23:49:50] oh fun. stashbot's pod isn't finding a place to run from.
[23:50:59] @jeremy_b: https://phabricator.wikimedia.org/P69197 is an example of what events shows when things are not great on the kubernetes cluster side of things
[23:51:51] why are 15 nodes unschedulable is the question of the moment
[23:52:24] and k8s-status is hard down? blerg
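
A sketch of the envvars setup referenced in the 19:17–19:20 log entries, assuming the standard Toolforge CLI `envvars` subcommands; the variable names are the ones logged above, and the `list` call is only a suggested way to confirm the result:

```sh
# Run from a Toolforge bastion after `become quickcategories` (hypothetical session)
toolforge envvars create TOOL_DATABASE__TOOLSDB true
toolforge envvars create TOOL_DATABASE__DB s53976__quickcategories
toolforge envvars list   # confirm both variables exist before removing the DATABASE section from config.yaml
```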
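
For the Pending-webservice thread starting at 22:13, a minimal debugging sequence along the lines suggested at 22:53, assuming the tool's namespace is `tool-stashbot` and `<podname>` is a placeholder; events are short-lived, so this only helps while the Pod is still stuck:

```sh
kubectl get pods -n tool-stashbot                               # find the Pod stuck in Pending
kubectl describe pod <podname> -n tool-stashbot                 # "Events:" section shows scheduling/image-pull failures
kubectl get events -n tool-stashbot --sort-by=.lastTimestamp    # recent events for the namespace
kubectl get nodes                                               # look for NotReady / SchedulingDisabled workers
```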