[07:34:21] we now have purged metrics in prometheus: https://grafana.wikimedia.org/d/RvscY1CZk/purged?orgId=1&from=1586416687917&to=1586417622638
[07:37:51] \o/
[07:38:41] hmm if I'm reading that properly.. purged is generating up to 14k PURGE requests per second?
[07:38:49] 10Traffic, 10Operations, 10Security: HTTP MediaWiki API GET requests to Wikimedia wikis should not be redirected to HTTPS when they have a session cookie or Authorization header - https://phabricator.wikimedia.org/T247490 (10MoritzMuehlenhoff) p:05Triage→03Medium
[07:40:04] pure madness
[07:40:29] 10Traffic, 10OpenRefine, 10Operations, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team): Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10MoritzMuehlenhoff) p:05Triage→03High
[07:46:00] vgutierrez: yeah, that spike there is ~ 7K http PURGE per layer
[07:53:31] 10Traffic, 10OpenRefine, 10Operations, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team): Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Joe) As far as I can see, apache (which sits beyond envoy) emits all cooki...
[08:06:31] 10Traffic, 10OpenRefine, 10Operations, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team): Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Pintoch) Switching on headers capitalization would be absolutely fantastic...
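[editor's note] The 14k/s vs ~7K-per-layer figures above come from the linked Grafana dashboard; a PromQL sketch of the kind of query behind such a panel, assuming the purged daemon exports a request counter named something like `purged_http_requests_total` (the actual metric name and labels are an assumption, not confirmed by the log):

```promql
# per-second PURGE rate over the last 5 minutes, broken down by
# cache layer (frontend/backend); metric and label names are illustrative
sum by (layer) (rate(purged_http_requests_total[5m]))
```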
[09:41:47] 10Traffic, 10Operations: varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (10ema)
[09:41:55] 10Traffic, 10Operations: varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (10ema) p:05Triage→03High
[09:43:58] 10Traffic, 10Operations: varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (10ema)
[09:44:12] 10Traffic, 10Operations: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (10ema)
[09:45:35] I'm gonna try disabling the limit on transient storage on cp3051 and see what happens ^
[10:17:16] without the limit, transient usage occasionally goes up to 13G and rarely drops below 8G
[10:18:45] oh
[10:19:01] I think we're not correctly capping the ttl of hfp objects on cache_upload
[10:19:17] varnishlog -n frontend -q 'VCL_return eq "pass"' | grep 'TTL HFP'
[10:24:39] mmh no, hfp on large objects just happens less often on text. We might want to introduce a low ttl cap on hfp for large objects though?
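[editor's note] The "low ttl cap on hfp for large objects" idea at 10:24 could be sketched in VCL roughly as below. This is a sketch only, assuming Varnish 5 semantics where `return (pass(DURATION))` from `vcl_backend_response` creates a hit-for-pass object in Transient; the size threshold (8+ digit Content-Length, i.e. ~10MB+) and the short cap are illustrative values, not the actual cluster VCL:

```vcl
sub vcl_backend_response {
    # For large uncacheable responses, create a short-lived hit-for-pass
    # object instead of a long-lived one, so Transient storage is not
    # pinned down by HFP objects for hours. Threshold and duration are
    # made-up example values.
    if (beresp.ttl <= 0s && beresp.http.Content-Length ~ "^[0-9]{8,}$") {
        return (pass(60s));
    }
}
```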
[10:27:54] so the experiment on cp3051 with the transient limit disabled is going as one would have expected: there are no fetch failures but we're using lots of transient memory
[10:28:53] I'm gonna try downgrading varnish to 5.1.3-1wm12 (that's the version without 0035-vbf_stp_condfetch_crash.patch) on cp3051 and see if somehow transient usage goes down
[10:34:46] mmh
[10:34:54] haven't restarted yet, I've just depooled the node
[10:36:31] it took a while for transient to be cleared, see https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?panelId=8&fullscreen&orgId=1&from=1586426597520&to=1586428560700&var-site=esams%20prometheus%2Fops&var-instance=cp3051
[10:40:05] but still it was cleared in much less time than the HFP object's TTLs I've been seeing
[10:40:21] which makes me wonder what exactly we are storing in transient
[10:40:40] objects', not object's
[10:48:33] downgraded to wm12, we're still using too much transient:
[10:48:35] https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?panelId=8&fullscreen&orgId=1&from=1586427854121&to=1586429276244&var-site=esams%20prometheus%2Fops&var-instance=cp3051
[10:51:46] I'll let cp3051 run with wm12 for now and go make lunch, one shouldn't draw conclusions on an empty stomach :)
[11:02:15] 10netops, 10Operations, 10cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (10aborrero) 05Open→03Stalled We are not planning on working on this anytime soon.
[12:27:21] 10Traffic, 10Operations: cache_upload varnish-fe exhausting transient memory - https://phabricator.wikimedia.org/T249809 (10ema)
[16:35:10] 10Traffic, 10OpenRefine, 10Operations, 10serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Joe) 05Open→03Resolved a:03Joe From my tests, now we get all cookies correctly set: ` $ curl --http1.1 -sIL...
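[editor's note] One way to chase the "what exactly are we storing in transient" question at 10:40 is to filter varnishlog on the storage backend; a sketch, assuming the Varnish 5 `Storage`, `TTL` and `Length` VSL tags (tag selection may need adjusting on the actual hosts):

```
# backend transactions whose object landed in Transient storage,
# showing storage backend, TTL decisions and object body size
varnishlog -n frontend -b -q 'Storage ~ "Transient"' -i Storage -i TTL -i Length
```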
[16:36:42] 10Traffic, 10OpenRefine, 10Operations, 10serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Joe) @Pintoch the behaviour should be restored now. Can you confirm old versions of OpenRefine work correctly now?
[16:56:42] 10Traffic, 10OpenRefine, 10Operations, 10serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (10Pintoch) Thanks a million, this is very kind of you! I can confirm this works, edits are coming through again fro...