[08:09:13] Krinkle: too late! Let's merge the change after all-hands, shall we?
[09:53:37] :)
[11:48:37] Traffic, Operations: varnishmtail panics on buster - https://phabricator.wikimedia.org/T243591 (Vgutierrez)
[11:50:57] Traffic, Operations: varnishmtail panics on buster - https://phabricator.wikimedia.org/T243591 (Vgutierrez) p:Triage→Normal
[11:51:53] hmm that's interesting... that's actually mtail panicking
[11:52:43] ohh interesting version choice debian: 3.0.0~rc19-2
[11:53:31] is that rc17 or 19v2? :P
[11:53:53] lol
[11:57:35] Traffic, Operations: varnishmtail panics on buster - https://phabricator.wikimedia.org/T243591 (Vgutierrez) that panic comes from mtail itself, after all varnishmtail is running: `/usr/bin/varnishncsa -n frontend -c -b -F "${FMT}" | mtail -progs "${PROGS}" -logs /dev/stdin` on the same host, atsmtail l...
[14:09:34] vgutierrez: not sure of details but I know shdubsh was looking at mtail Debian packaging, maybe for a working version?
[14:10:28] so we have a specific version working for us on stretch
[14:10:35] cdanis: hey! Yeah we need 3.0.0~rc5 unfortunately
[14:10:52] support for reading from stdin got broken on later versions IIRC
[14:10:58] ah, right
[14:13:20] I can't find the relevant issue though, there's https://github.com/google/mtail/issues/3 filed by godog with contributions from jbond42, which is similar
[14:14:07] so the issue in practice is that with newer versions we get:
[14:14:08] panic: runtime error: index out of range
[14:16:01] ema: i think with newer versions you can use `-log /dev/stdin` instead of `-logfds 1`, that's what i did to get it working with i think rc19, however i think godog may have found an additional issue
[14:16:50] never mind, i hadn't read the ticket, i see that switch is already in use
[14:16:56] https://phabricator.wikimedia.org/T225604#5345679
[14:16:57] ema: okay
[14:17:51] cdanis: thanks for the reminder :) although i don't think i spent much more time on it than that i'm afraid
[14:26:01] correction: the problem with more recent mtail versions is that all our ci nosetests depend on the -one_shot_metrics argument, which got removed
[14:26:24] the current crashes we're seeing on cp4032, recently upgraded to buster, seem to be due to some other reason
[14:26:55] so I've downgraded mtail on cp4032
[14:27:02] and now we got data on https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?orgId=1&var-site=ulsfo%20prometheus%2Fops&var-instance=cp4032&fullscreen&panelId=73&from=now-15m&to=now
[14:27:26] \o/ but also a bit of /o\
[14:27:31] yup
[14:27:35] +1 to upload it to apt.wm.o?
[14:27:42] definitely
[14:27:48] lovely
[14:28:51] Traffic, Operations: varnishmtail panics on buster - https://phabricator.wikimedia.org/T243591 (Vgutierrez) Open→Resolved a:Vgutierrez Solved after rebuilding mtail 3.0.0~rc5 for buster
[14:28:54] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (Vgutierrez)
[14:30:42] BTW, note that it's something related to a text mtail program
[14:30:57] cause on cp4026 it has been happily running: https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?orgId=1&fullscreen&panelId=73&from=now-24h&to=now&var-site=ulsfo%20prometheus%2Fops&var-instance=cp4026
[14:31:20] interesting
[14:32:29] so the culprit must be varnishrls.mtail
[14:33:38] I see that cp4026 is running 3.0.0~rc5 too though, but it seems you just downgraded it
[14:33:54] yep
[14:33:59] for consistency's sake
[14:34:20] I don't wanna go crazy due to mtail version discrepancies
[14:34:33] (even more crazy than right now of course)
[14:35:30] let's open an issue on google/mtail?
[14:36:26] +1
[14:43:31] filed as https://github.com/google/mtail/issues/289
[14:43:58] Traffic, Operations: varnishmtail panics on buster - https://phabricator.wikimedia.org/T243591 (Vgutierrez) reported to upstream as https://github.com/google/mtail/issues/289
[15:55:56] Traffic, Operations: varnishmtail panics on buster - https://phabricator.wikimedia.org/T243591 (colewhite)
[16:20:07] vgutierrez: Gah, this again. It'd be helpful if we had a way to reproduce this outside of production.
[16:21:17] Looking around at other open mtail issues, I don't think we're the only ones seeing it.
[16:23:39] vgutierrez: not reproducible in beta? Lack of traffic or something else?
[18:13:52] Domains, Traffic, DNS, Operations: Donate wikiźródła.pl and wikisłownik.pl to the Foundation - https://phabricator.wikimedia.org/T240446 (CRoslof) Stalled→Resolved a:CRoslof These domain names have now been transferred to the Foundation and I've updated them to use the Foundation's na...