[09:38:39] moritzm: cp3034 and cp3040 have been running jessie 8.7 with 4.4.2-3+wmf8 for a while now with no issues, OK to proceed with the upgrade on other cache hosts? Also upgrading to openssl 1.0.2k?
[09:42:44] yeah, sounds good!
[10:06:47] alright then I'll start with maps
[12:22:30] <_joe_> ema, bblack I need your help in performing a ban in varnish
[12:25:01] <_joe_> as in
[12:25:25] <_joe_> banning a certain client to make rogue requests to the mw api
[12:25:32] <_joe_> but that might not be necessary
[16:45:29] Traffic, Operations, User-Elukey: prometheus-vhtcpd-stats cronspamming if vhtcpd is not running yet - https://phabricator.wikimedia.org/T157353#3002382 (ema)
[17:03:53] netops, Operations: netops: switch all subnets to use install1002/2002 as DHCP - https://phabricator.wikimedia.org/T156109#3002420 (Dzahn) a:akosiaris>Dzahn
[17:33:32] 503 spikes in esams
[17:33:35] ema: reboots?
[17:34:02] bblack: yep, stopping them for now
[17:38:53] maybe lvs3 lacking etcd updates?
[17:50:36] mmh no I wouldn't say so, all servers reported as being down in pybal.log were also correctly identified as depooled by lvs3001
[17:51:30] now, who knows if the servers were actually depooled in LVS
[17:52:29] we've still got two esams servers to reboot, let's see if they get depooled for real
[17:55:14] yeah depooling seems to work fine
[17:56:30] (now)
[17:59:40] ok :)
[18:02:08] netops, Operations: netops: switch all subnets to use install1002/2002 as DHCP - https://phabricator.wikimedia.org/T156109#3002687 (Dzahn) a:Dzahn>mark
[18:05:59] 5 text hosts left, I'll finish the reboots with a longer sleep after depool
[18:10:29] strange that only esams was affected though
[18:11:11] well ulsfo too, to a lesser extent
[18:18:09] 503s again, looking
[18:20:07] ulsfo and esams mostly
[18:24:45] heh the fact that the icinga check kicks in minutes after the issue is gone doesn't help
[18:29:25] sweet, we've got varnishmedia.service on some text machines :)
[18:29:45] (that explains the 'check systemd state' alerts)
[18:32:19] nice
[18:32:53] so the 503s in ulsfo and esams seem to happen pretty much at the same time
[18:33:56] well the spikes happen *exactly* at the same time
[18:42:55] ok, text is done with the reboots
[18:43:20] tomorrow I'll work on upload and follow closely what happens
[18:52:21] can i shutdown cp3011-cp3022
[18:52:42] from https://phabricator.wikimedia.org/T130883
[18:58:24] bblack: ^ ?
[18:58:30] time to eat! see you tomorrow
[19:47:54] mutante: yes I think you can
[19:48:21] bblack: ok, thanks
[19:54:31] Traffic, Operations, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, and 2 others: Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests - https://phabricator.wikimedia.org/T137990#3003140 (JMinor) p:Normal>Low
[23:58:05] HTTPS, Traffic, Collection: Book collections communicate with pediapress using http: - https://phabricator.wikimedia.org/T157398#3003811 (Platonides)