[09:43:54] lunch
[12:51:40] greetings
[13:05:37] inflatador: I saw your invitation to the Ansible Learning Circle, but since it was sent to our mailing list, it does not show up in my calendar. Could you add the people on the team to this event directly?
[13:21:36] gehel ACK, will send again
[13:22:01] Thanks !
[13:34:22] inflatador: what's your level of familiarity with docker?
[13:35:48] Trey314159 I'm steadily mediocre. Why, what's up?
[13:36:06] Mine is close to zero, so I'm stuck trying to help ejoseph get a properly running terminal/command line within the container
[13:38:13] it's usually "docker exec -it /bin/bash" - or something like that
[13:38:29] ^^
[13:39:18] Feel free to send me a Meet link if you want me to jump on
[13:39:22] topranks: weird things are happening.. he gets a terminal, but doesn't have a username (it actually says "I have no name!" in the prompt)
[13:39:35] inflatador: https://meet.google.com/swt-hpvi-mgx?authuser=0
[13:39:48] That's fairly typical, there might not be much of the OS exposed within the container
[13:39:50] It's a container - you shouldn't expect a normal userland
[14:35:40] ryan-kemper FYI for when you come in, I noticed some omega (9443) shards getting stuck moving between elastic1100 and another one of the new hosts (1098 maybe?). I restarted the omega service on elastic1100 but omega's still stuck in yellow. Continuing to troubleshoot...
[14:56:13] looks like firewall issues again, running puppet agent again but I doubt it will fix it
[14:59:59] also, won't make retro today
[15:02:55] ryankemper: retrospective if you are around: https://meet.google.com/eki-rafx-cxi
[15:53:06] back
[15:53:38] ryankemper looks like the same firewall issues we saw yesterday, cluster communication ports (9300/9500) are open but not REST API ports (9200/9400). Do you remember what we did to fix that?
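Editor's note: the 15:53:38 message describes cluster communication ports (9300/9500) being reachable while the REST API ports (9200/9400) are not. A minimal sketch of how that could be checked from another host is below. The port numbers and their labels come from the log; the function names are hypothetical, and a successful TCP connect only shows that the firewall lets the connection through, not that Elasticsearch itself is healthy.

```python
import socket

# Port roles as described in the log; the dictionary itself is illustrative.
ELASTIC_PORTS = {
    9200: "REST API",
    9400: "REST API",
    9300: "cluster communication",
    9500: "cluster communication",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe(host: str, ports=ELASTIC_PORTS, timeout: float = 2.0) -> dict:
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling probe("elastic1100") from a neighbouring cluster member would return a dictionary such as {9200: False, 9400: False, 9300: True, 9500: True} in the situation the log describes. The short hostname is taken from the log and may need a fully qualified form in practice.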
[16:24:05] looks like the 1100-1102 hosts are in a different network from the others (10.64.136.0/24), best guess at the moment is that whatever ferm doesn't think that's a valid elastic network.
[16:30:34] disabled puppet and stopped elasticsearch on 1100-1102
[16:31:09] bleh, phone got unplugged
[16:31:15] attempts to ban via ES API fail with `failed to process cluster event (reroute_after_cluster_update_settings) within 30s`
[16:31:24] up now
[16:31:30] inflatador: looking
[16:31:45] np, looks like we're back to green for omega
[16:32:03] banning worked now too
[16:32:26] yeah looks like the api thought it was getting too many state changes or something
[16:32:58] yeah, or maybe it couldn't propagate that change to the broken hosts, so it gave up?
[16:34:51] networks might be in modules/base/templates/firewall/defs.erb in puppet, still checking
[16:47:45] heads up, we are red in the main eqiad cluster, probably just a bad index alias...checking it out now
[16:51:03] OK, fixed
[17:59:56] firewall task created at https://phabricator.wikimedia.org/T315038 , feel free to add more details as needed
[18:04:22] dr appointment, back in ~90m hopefully
[18:29:23] coldbooting router, will be @ sre pairing in 5'
[18:33:41] ryankemper: I have a last minute meeting with Willy, I'll be a bit late
[18:33:49] gehel: ack
[20:15:36] back, but my eyes are still blurred from dilation at the eye doctor. I'm going to take another hour and hope things are better by that time
[21:21:23] Eyes are still blurry. I'm going to call it a day. See you tomorrow!
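Editor's note: the working theory at 16:24:05 is that hosts 1100-1102 sit in 10.64.136.0/24 and that this network is missing from the firewall's list of allowed networks. A minimal sketch of that kind of membership check, using Python's standard ipaddress module, is below. Only the 10.64.136.0/24 range comes from the log. The function name is hypothetical, and any allowed-network list or host address passed to it would have to come from the real firewall definitions and inventory.

```python
import ipaddress


def hosts_outside(allowed_networks, host_addrs):
    """Return the hosts whose address falls in none of the allowed networks.

    allowed_networks: iterable of CIDR strings, e.g. ["10.64.0.0/22"]
    host_addrs: mapping of host name to IP address string
    """
    nets = [ipaddress.ip_network(n) for n in allowed_networks]
    return {
        name: addr
        for name, addr in host_addrs.items()
        if not any(ipaddress.ip_address(addr) in net for net in nets)
    }
```

For example, with a made-up allowed list of ["10.64.0.0/22"] and a made-up address 10.64.136.10 for elastic1100, the function reports elastic1100 as outside the allowed networks. Adding "10.64.136.0/24" to the list makes the result empty, which is the outcome the firewall fix would aim for.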