[00:43:40] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Create mw-web helmfile deployment - https://phabricator.wikimedia.org/T321900 (Krinkle)
[00:43:50] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Create mw-jobrunner helmfile deployment - https://phabricator.wikimedia.org/T321897 (Krinkle)
[00:44:00] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Create mw-videoscaler helmfile deployment - https://phabricator.wikimedia.org/T321899 (Krinkle)
[00:44:09] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Create mw-api-int helmfile deployment - https://phabricator.wikimedia.org/T321895 (Krinkle)
[00:44:31] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Create mw-api-ext helmfile deployment - https://phabricator.wikimedia.org/T321896 (Krinkle)
[10:08:06] Traffic, SRE: haproxy::site doesn't work as expected on the first puppet run - https://phabricator.wikimedia.org/T321684 (Vgutierrez) I believe this is not affecting cp instances. In your log, systemd is complaining about several notifications: ` systemd[1]: haproxy.service: Got notification message from...
[13:33:04] Traffic, Data-Services, SRE: 2022-09-04 Scraping from AS714 (Apple) against dumps.wikimedia.org saturating network links - https://phabricator.wikimedia.org/T317001 (jbond)
[14:20:41] netops, Cloud-Services, Infrastructure-Foundations, SRE, Puppet: Moving network::external to hiera broke much of labs - https://phabricator.wikimedia.org/T141959 (jbond) Open→Resolved a:jbond Im going to close this ticket assuming that the issues has been resolved in the mean time...
[15:15:46] Wikimedia-Apache-configuration, Observability-Logging, SRE, User-Joe: Gain visibility into httpd mod_proxy actions - https://phabricator.wikimedia.org/T188601 (jbond) is this related to some specific service or perhaps now no longer valid?
[15:26:59] Traffic, SRE: ATS flags origin servers as down during 60 seconds after a connect timeout - https://phabricator.wikimedia.org/T322420 (Vgutierrez)
[15:27:15] Traffic, SRE: ATS flags origin servers as down during 60 seconds after a connect timeout - https://phabricator.wikimedia.org/T322420 (Vgutierrez) p:Triage→Medium a:Vgutierrez
[15:31:15] Traffic, SRE: ATS flags origin servers as down during 60 seconds after a connect timeout - https://phabricator.wikimedia.org/T322420 (BBlack) Arguably we want this down server cache time to be very low or even disabled in the general case. It's not likely that caching the origin outage is going to help...
[15:33:41] Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, User-Dereckson: Create /community-beacon alternative entry point - https://phabricator.wikimedia.org/T155929 (jbond)
[15:35:25] Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, User-Dereckson: Create /community-beacon alternative entry point - https://phabricator.wikimedia.org/T155929 (jbond) Adding traffic to analyses the varnish comment, however i also wonder if this task is still valid or has perhaps b...
[16:00:04] Traffic, SRE: ATS flags origin servers as down during 60 seconds after a connect timeout - https://phabricator.wikimedia.org/T322420 (Vgutierrez) I think `proxy.config.http.connect.dead.policy` is also interesting for us: ` Controls what origin server connection failures contribute to marking a server de...
[19:30:15] Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, User-Dereckson: Create /community-beacon alternative entry point - https://phabricator.wikimedia.org/T155929 (Pcoombe) I'm not sure I understand the reasoning for this. Wikimedia's CentralNotice generally isn't blocked by ad lists...
[22:05:58] Traffic, SRE: strip non session cookies before cache lookup in ATS - https://phabricator.wikimedia.org/T316338 (Krinkle) >>! In T316338#8205843, @Vgutierrez wrote: > As a direct result cache hitrate shows up to a 100% increase in the text cluster at the ats layer […] Images for future reference, as from...
[22:52:01] Traffic, Discovery-Search, Observability-Alerting, SRE: Use DNS name instead of IP in PyBal alerts - https://phabricator.wikimedia.org/T322377 (Dzahn) The check is defined in: ` modules/pybal/manifests/monitoring.pp: nrpe::plugin { 'check_pybal_ipvs_diff': ` so it runs a command via NRPE o...
[23:32:30] Traffic, SRE, Patch-For-Review, Performance-Team (Radar), Upstream: ATS cache read p999 metrics shows up requests taking up to 1 second on cache read operations - https://phabricator.wikimedia.org/T317748 (Krinkle) For future reference, some additional graphs captured over a slightly wider ra...