[18:54:28] I've just redeployed a pod in codfw and it's getting `UnknownHostException: flink-zk2001.codfw.wmnet: Temporary failure in name resolution`. But from a deploy host I can ping (-4 and -6) zk. Any ideas?
[18:58:26] it's not isolated to zk2001, separate retries have mentioned zk2002 and zk2003
[19:27:43] from playing around in the shells (calling gethostbyname from perl), it looks like both a working and a broken container are using 10.192.72.3 as the DNS server, but on the broken one requests are timing out (firewalled?)
[19:35:44] ^^ looking at this, best guess is we need to update for calico network policies
[20:18:13] per above, ebernhardson was able to get it to work by redeploying. When the original pod `flink-app-producer-b65b89bf5-5hff9` was scheduled on `wikikube-worker2043.codfw.wmnet`, it had no networking. Just a heads-up in case anyone wants to take a look. I don't see anything obviously wrong w/ that host
[20:18:59] just doing a cursory glance at https://wikitech.wikimedia.org/wiki/Calico
[20:23:17] Sigh, something has gone wrong with the reimaging then. I'll have a look tomorrow, but in the meantime could you cordon it please, inflatador?
[20:23:42] akosiaris ACK, will do
[20:23:50] Thanks!
[20:23:58] akosiaris do we have a cookbook for this or is it just the regular kubectl cmd?
[20:24:40] ah, looks like a cookbook https://wikitech.wikimedia.org/wiki/Kubernetes/Administration
[20:25:39] Yeah there is a cookbook, best run that one.
[20:28:31] akosiaris cool, just ran the cookbook and it looks to have worked correctly
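
For anyone repeating the 19:27 check (resolving a name from inside a working and a broken container), here is a minimal Python sketch of the same idea. It is not the perl one-liner that was actually run; the function name `check_resolution` and the injectable `resolver` parameter are made up for illustration. It only separates a temporary failure (`EAI_AGAIN`, the getaddrinfo error behind "Temporary failure in name resolution") from other lookup errors. It says nothing about why the resolver is unreachable.

```python
import socket
import sys


def check_resolution(hostname, resolver=socket.getaddrinfo):
    """Try one lookup of hostname and return a (status, detail) tuple.

    `resolver` is a hypothetical hook so the function can be exercised
    without real DNS; by default it is the standard socket.getaddrinfo.
    """
    try:
        infos = resolver(hostname, None)
    except socket.gaierror as exc:
        # EAI_AGAIN is a temporary failure: the resolver could not be
        # reached or did not answer in time.
        if exc.errno == socket.EAI_AGAIN:
            return ("temporary_failure", str(exc))
        return ("failed", str(exc))
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return ("ok", ", ".join(addresses))


if __name__ == "__main__":
    # Default to the one hostname quoted in full in the log above.
    hosts = sys.argv[1:] or ["flink-zk2001.codfw.wmnet"]
    for host in hosts:
        status, detail = check_resolution(host)
        print(f"{host}: {status} ({detail})")
```

Running it in both containers and comparing the status would show the difference described above, assuming a Python interpreter is available in the image.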