[01:34:24] lvs1015 had apparently crashed about 4 hours ago
[01:34:34] i just powercycled it and now it's back
[01:34:55] it was showing in icinga as 1 host down but almost all the services on it were just UNKNOWN, not CRIT
[01:35:44] and the "PyBal BGP sessions are established" check can even stay OK/green when the status is "NaN" and parent host down, btw
[01:36:24] so it wasn't very obvious on IRC.. but noticeable on web UI
[01:36:55] i see all the checks turning green and there did not appear to be any hardware issue at boot..running puppet
[07:56:17] Traffic, Operations: ATS strict round robin parent select policy doesn't work as expected - https://phabricator.wikimedia.org/T242778 (MoritzMuehlenhoff) p:Triage→Normal
[09:56:19] Traffic, netops, Operations, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090 (ayounsi) Open→Resolved Done, dashboard and doc updated. https://grafana.wikimedia.org/d/000000513/ping-offload
[10:07:24] Traffic, Operations: ATS strict round robin parent select policy doesn't work as expected - https://phabricator.wikimedia.org/T242778 (Vgutierrez) Apparently, when we disabled DNS resolution for parent requests to fix T232209 we introduced part of the issue. On short-lived ATS instances, enabling `proxy....
[10:23:20] Traffic, Operations: ATS strict round robin parent select policy doesn't work as expected - https://phabricator.wikimedia.org/T242778 (Vgutierrez) before enabling DNS resolution on cp3052: ` vgutierrez@cp3052:~$ for port in {3120..3127}; do ss "( dport = $port or sport = $port )" |wc -l; done 3 144 4 13...
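The per-port loop in the last message prints bare counts, and because `ss` prints a header line by default, each `wc -l` result includes that header. A minimal sketch, assuming the same port range 3120..3127, that labels each count and reports the spread; the function names `count_per_port` and `summarize_counts` are hypothetical and not from the ticket:

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "port count" pairs on stdin and prints the
# smallest and largest count, to show how evenly connections are spread.
summarize_counts() {
  awk '
    NR == 1  { min = $2; max = $2; minp = $1; maxp = $1 }
    $2 < min { min = $2; minp = $1 }
    $2 > max { max = $2; maxp = $1 }
    END { if (NR > 0) printf "min=%d (port %s) max=%d (port %s)\n", min, minp, max, maxp }
  '
}

# Hypothetical helper: emits one "port count" line per port.
# `ss -H` omits the header line, so the count covers matching sockets only.
count_per_port() {
  local port
  for port in {3120..3127}; do
    printf '%s %s\n' "$port" "$(ss -H "( dport = :$port or sport = :$port )" | wc -l)"
  done
}
```

On a cache host this would be run as `count_per_port | summarize_counts`. A large gap between min and max would point at uneven parent selection, which is what the ticket describes.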
[16:22:35] Traffic, Discovery, Operations, Wikidata, and 2 others: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Gehel) Open→Resolved
[18:16:29] Traffic, Operations, Phabricator, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) p:Normal→High
[20:33:45] ema: is the code from this patch for ats-fe, -be or both? https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/559711/
[20:36:32] Krinkle: ats-tls, which is the fe-most layer, but has not replaced varnish-fe (the current stack from the outside-in view is ats-tls -> varnish-fe -> ats-be)
[20:38:49] Right, OK. Makes sense. Wanted to check because if it was ats-be then it would only strip it for some of the cases (those where MW sets it, not the ones where varnish-fe sets it)
[20:38:59] perfect