[08:53:02] Traffic, Varnish, Operations, Patch-For-Review: varnishmedia: repeated calls to flush_stats() - https://phabricator.wikimedia.org/T132474#2317462 (ema) Open>Resolved a:ema
[10:54:08] Traffic, Beta-Cluster-Infrastructure, Labs, Operations: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2317668 (hashar) I have checked after the week-end and deployment-cache-upload04 shows the FD leak. Via `lsof -X -n|grep deleted`: * Lot of...
[11:05:19] Traffic, Beta-Cluster-Infrastructure, Labs, Operations: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2317675 (Joe) @hashar the reason you see all those deleted "varnishd" lines is that varnish has been updated on disk but not restarted, which...
[11:10:04] Traffic, Beta-Cluster-Infrastructure, Labs, Operations: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2317678 (Joe) So the problem - that we have in production too (!!!) is that the logrotate receipt calls ``` invoke-rc.d varnishlog reload ```...
[11:11:31] Traffic, Beta-Cluster-Infrastructure, Labs, Operations: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2317679 (Joe) p:Low>High
[11:11:37] <_joe_> ema: ^^
[12:12:03] _joe_: so logrotate reloads varnishlog but the systemd unit doesn't support reloads
[12:12:34] <_joe_> yes
[12:12:40] <_joe_> ema: I am going to work on it now
[12:13:00] _joe_: ok, are you going to add reload support to the unit?
[12:13:15] or were you planning on changing the logrotate part?
[12:13:15] <_joe_> yes
[12:13:19] <_joe_> the former
[12:13:22] perfect, +1
[12:13:26] <_joe_> actually, both
[12:26:08] <_joe_> ema: should I modify our varnish debian package?
[12:26:29] mmh but do we actually need varnish.log, I was thinking?
[12:26:36] <_joe_> probably not
[12:26:49] <_joe_> also it only looks at the backend, I think
[12:27:02] right, and all the stuff for analytics is reading straight from the vsm files
[12:28:33] <_joe_> so let's wait for bblack, but stopping varnishlog altogether could be a good idea
[12:28:47] yeah, I think it's the way to go
[12:29:00] <_joe_> ema: are we using the jessie varnish package as a base for ours?
[12:29:11] _joe_: yep
[12:29:12] <_joe_> because that is a straightforward bug of that package
[12:29:25] <_joe_> pretty embarrassing indeed
[12:29:37] _joe_: https://github.com/wikimedia/operations-debs-varnish4/tree/debian-wmf
[12:30:10] that's our branch with local patches
[12:30:53] <_joe_> that's varnish 4
[12:30:58] <_joe_> what about varnish 3?
[12:31:04] <_joe_> let me look at varnish 4 btw
[12:31:28] _joe_: https://github.com/wikimedia/operations-debs-varnish/tree/3.0.6-plus-wm
[12:33:57] <_joe_> ok
[12:34:14] <_joe_> I want to check if the bug is upstream as well
[12:34:39] _joe_: varnishlog has been removed from v4
[12:34:42] https://github.com/wikimedia/operations-debs-varnish4/commit/0b9cd9a54b2c7b6aaa5afd30e3311749bf84a070
[12:35:51] and v3 is pretty old stuff (eg: only in oldstable) https://packages.qa.debian.org/v/varnish.html
[12:36:04] <_joe_> yes, no reason to report a bug
[12:38:12] Traffic, Beta-Cluster-Infrastructure, Labs, Operations: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2318200 (Joe) A third option is we just stop varnishlog as...
[12:50:25] _joe_: perhaps something like this?
https://gerrit.wikimedia.org/r/290208
[12:53:36] <_joe_> lol
[12:54:04] <_joe_> I added the condition in varnish::common instead
[12:54:09] :)
[12:54:12] <_joe_> (https://gerrit.wikimedia.org/r/290209)
[12:54:21] <_joe_> whatever you think is cleaner
[12:54:50] <_joe_> and no, actually your patch is incorrect
[12:56:30] ?
[12:56:40] <_joe_> see my comment
[12:56:46] <_joe_> puppet delicacies...
[12:58:40] _joe_: I don't think we declare the varnishlog service anywhere else
[12:59:26] <_joe_> ema: but you can define two varnish::logging instances
[12:59:29] <_joe_> and in fact we do
[12:59:30] oooh
[13:00:14] <_joe_> "nothing with a non-variable name should be in a puppet define"
[13:00:34] <_joe_> where non-variable is anything /not/ based on $title :)
[13:00:57] OK so varnish::common is a better idea
[13:04:48] <_joe_> it works :)
[13:05:27] _joe_: I'd do that only if !varnish_version4 though
[13:05:44] v4 does not provide the service at all
[13:05:52] <_joe_> ema: yeah it's inside a conditional in fact
[13:06:05] <_joe_> I added it to the varnishlog.py conditional
[13:06:20] wonderful!
[13:07:44] thanks :)
[13:12:17] Traffic, Beta-Cluster-Infrastructure, Labs, Operations, Patch-For-Review: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2318265 (Joe) Open>Resolved
[14:46:49] https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud-update.pdf
[14:54:25] among other interesting things: cluster load is also taken into account by DNS, not only geolocation
[14:57:30] and they also use DR
[15:33:56] netops, DC-Ops, Operations, ops-codfw: setup wifi in codfw - https://phabricator.wikimedia.org/T86541#2318735 (RobH) p:Normal>High Papaul is still using the mifi for ALL onsite work. Is there anything that either Papaul or I can do to move the setup of wifi in codfw along?
[16:13:56] netops, Operations: cr2-codfw LUCHIP/trinity_pio error messages - https://phabricator.wikimedia.org/T134932#2318924 (faidon) Mark rebooted the FPC on Friday: > 13:45 mark: Enabled cr2-codfw et-0/* interfaces, reenabling OSPF/OSPF3 > 13:38 mark: Bringing cr2-codfw FPC 0 back up > 13:37 mark: Offlinin...
[16:48:07] Traffic, Analytics, Operations, Performance-Team: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310174 (Milimetric) I have a quick suggestion to make this play nice with client-side sampling. First, the problem: some Event Logging instrumentation randomly only sends b...
[18:42:21] Traffic, Operations, Performance-Team, Patch-For-Review, perfnotice: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2320679 (Krinkle)
[21:52:15] Traffic, Wikimedia-Apache-configuration, DNS, Operations, Wikimedia-General-or-Unknown: m.{project}.org portal/redirect consistency - https://phabricator.wikimedia.org/T78421#2321287 (Jdlrobson)
[21:52:17] Traffic, Wikimedia-Apache-configuration, MediaWiki-extensions-ZeroBanner, Operations, and 4 others: m.wikipedia.org incorrectly redirects to en.m.wikipedia.org - https://phabricator.wikimedia.org/T69015#2321285 (Jdlrobson) Open>stalled Stalled per https://phabricator.wikimedia.org/T69015#...