[03:01:38] Traffic, DNS, Operations, Performance-Team, and 2 others: Add DNS for WebPageRelay hosts - https://phabricator.wikimedia.org/T242398 (Krinkle)
[04:24:35] Traffic, Operations, Phabricator, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) @Dzahn perhaps we should try just re-enabling aphlict as i...
[05:11:39] Traffic, Operations, Phabricator, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (mmodell) So it's worth pointing out that there are two types of con...
[07:42:05] bblack: it's perhaps useful to remember at this point that by default ATS caches up to 5 different alternates for the same url: proxy.config.cache.limits.http.max_alts
[07:42:56] apparently the reason for this is that while finding an object given the URL is "fast", going through the alternates is a sequential process
[07:44:25] we can disable the limit by setting max_alts to 0 if we decide that the sequential scan for alternates is acceptable, but we would need to check for cases where we don't normalize the headers we vary on
[07:44:53] maybe, or maybe this is a non-issue :)
[08:55:52] ema: in this case, we have 5 alternates that are supported server side
[08:56:03] clients could still be sending junk, but I doubt
[08:56:42] this was opened in Sept 2019, maybe the issue was fixed by something else in the meantime?
[08:56:51] when was the switch to ATS?
[09:03:15] gehel: the very last backends were converted on 2019-12-19, but the migration lasted several months
[09:05:58] so it is possible that this issue was accidentally solved by the migration to ATS? Any reason why the behaviour might have been different on varnish?
[09:09:39] gehel: which issue was opened in Sept 2019?
[09:10:36] ema: T232006
[09:10:36] T232006: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006
[09:10:56] I was assuming you were referring to that one with your earlier comments to Brandon
[09:11:57] yes, then I got confused because T237165 was opened in November
[09:12:07] T237165: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165
[09:17:16] gehel: so the premise of the bug report seems false ("the Vary response header doesn’t include the Accept request header"). As you mentioned in T232006#5744249 the appserver *does* respond with "Vary: Accept"
[09:17:16] T232006: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006
[09:18:31] gehel: could you maybe update the ticket description to match current reality, and what (if anything) is broken right now?
[09:24:50] * gehel is still trying to figure out if anything is still broken or not :)
[09:25:40] I'm tempted to close as we don't seem to be able to reproduce and re-open if anyone has an issue again with this.
[09:25:40] :)
[09:25:59] I'll let maryum decide, I'll add a comment in the meantime
[09:26:04] thanks
[09:26:52] Traffic, Discovery, Operations, Wikidata, and 2 others: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Gehel) Thanks @BBlack, this now works as expected.
[09:28:54] Traffic, Operations, Wikidata, Wikidata-Query-Service, and 2 others: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006 (Gehel) a:Mstyles Investigation on this was blocked by T237165 (which is now resolved)....
[09:35:28] Traffic, Operations: varnish parent unable to send signals to child - https://phabricator.wikimedia.org/T242411 (ema)
[11:31:05] Traffic, Operations: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema)
[11:31:10] Traffic, Operations: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema) p:Triage→High
[11:33:20] Traffic, Operations: varnish-fe crashes due to "Error in munmap(): Cannot allocate memory" - https://phabricator.wikimedia.org/T242417 (ema)
[12:04:52] there's a "varnish frontend child restarted" alert for cp3054, known?
[12:11:49] I think so, ema ^^
[12:39:57] Traffic, Operations, Phabricator, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (kostajh) Could we please get an update on the timeframe for this?
[12:40:04] Traffic, Operations, Phabricator, serviceops, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Phabricator downtime due to aphlict and websockets (aphlict current disabled) - https://phabricator.wikimedia.org/T238593 (akosiaris) > I really can't think of any reason for it to be using...
[13:12:23] moritzm: the event is known, the reason is not
[13:13:01] alert ack'ed
[13:16:39] ack
[13:21:18] Traffic folks, please get your OKRs up in the OKR google doc by end of today :)
[13:21:21] drafts
[14:41:06] Traffic, Operations: varnish parent unable to send signals to child - https://phabricator.wikimedia.org/T242411 (ema) p:Triage→Normal
[20:20:11] Traffic, Operations, Wikidata, Wikidata-Query-Service, and 2 others: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006 (Mstyles) @gehel I think we can consider this closed unless someone is able to reproduce
[21:16:48] Traffic, Operations, Phabricator, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (srodlund) @kostajh We're in progress with the tech blog. It still ne...
[23:43:44] Traffic, Operations, Performance-Team: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (Krinkle)
[23:44:25] Traffic, Operations, Performance-Team: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (Krinkle) p:Triage→High
[23:45:08] Traffic, Operations, Performance-Team: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (Krinkle) I recall that in the pre-ATS setup, we explicitly configured the interaction between applayer and traffic to not request compressed re...
[23:45:24] Traffic, Operations, Performance-Team: Production load.php spends ~ 10% time doing output compression within PHP - https://phabricator.wikimedia.org/T242478 (Krinkle)