[00:16:56] ryankemper interestingly enough, there was an alert for unexpected MSS value on cirrussearch1081 https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[00:17:11] you just restarted it, right?
[00:18:00] inflatador: yup 1081 was the one I restarted. or rather stopped but puppet wasn't disabled so it's back up now
[00:18:55] ryankemper did it join the cluster OK? after split brain it's safer to wipe out its datadir and let it join as a new node
[00:20:16] inflatador: nevermind it's not back up now. must have mixed up my tabs
[00:21:22] ryankemper ACK, I'd just leave it out and leave off puppet until we figure out the MSS alert thing, probably till tomorrow
[00:22:11] inflatador: disabled puppet on 1081
[00:22:18] it's possible I screwed up the hieradata for that hosts' IPIP
[00:22:32] but we are good to repool eqiad now, will do that unless you have any concerns
[00:23:09] inflatador: we're still yellow status, repooling is fine but might be wise to wait for green while stuff reshuffles
[00:23:25] ryankemper ACK, will wait. I'm also gonna wipe the datadir for 1081, better to be safe with that
[00:23:32] sounds good
[00:24:55] If you wanna work on an incident report, https://wikitech.wikimedia.org/w/index.php?create=Create+report&title=Incidents%2F2025-07-07+Cirrussearch+outage&redirect=no is a good place to start..otherwise it can wait until tomorrow
[00:26:39] just downtimed 1081 for the next 12 hrs
[00:33:53] ryankemper sorry, I misread your update. I actually am repooling now
[00:37:11] BTW, this is the best panel for seeing the changes by pooling/depooling https://grafana.wikimedia.org/goto/7gTI1_UHR?orgId=1
[10:03:16] I got pinged by traffic because cirrussearch1081 is generating alerts due to this MSS inconsistency. Conversation happening here: https://wikimedia.slack.com/archives/C055QGPTC69/p1753177774854319
[10:12:48] I have depooled cirrussearch1081, as pybal probes were failing anyway, so it wasn't really pooled.
[10:13:09] Pybal is also expecting a service to be listening on port 9200 on this host and nothing is currently listening on that port.
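(Editor's sketch, not part of the log.) The last message amounts to "nothing accepts TCP connections on port 9200 on this host". A rough way to confirm that from another machine is a plain TCP connect probe. The helper name `is_port_listening` and the default timeout are assumptions made for illustration. This is not Pybal's actual health check, which may use a different monitor and probe type.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; it checks reachability of the
    port, not whether the search service behind it is healthy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Called as `is_port_listening("<host>", 9200)` with the real hostname substituted, a `False` result would be consistent with the failing probes described above, although a firewall drop would look the same from the outside.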