[07:45:03] Traffic, Operations, observability: varnish: implement FetchError logging - https://phabricator.wikimedia.org/T224994 (ema)
[07:51:14] netops, Operations: check_ospf.py fails on mr1-eqsin - https://phabricator.wikimedia.org/T225905 (ayounsi) p:Triage→Normal
[07:56:13] Traffic, Operations, observability, Patch-For-Review: varnish: implement FetchError logging - https://phabricator.wikimedia.org/T224994 (ema) The initial plan of adding a synthetic header to varnish with the FetchError cause seems a little too complicated to implement. Send error logs to logstash...
[08:32:58] netops, DC-Ops, Operations, observability: Send some LibreNMS alerts to dcops and netops only - https://phabricator.wikimedia.org/T224180 (ayounsi) Open→Resolved a:ayounsi I made the change to email dcops + me for those alerts. All have a linked runbook. The alerting might need to b...
[09:15:44] Traffic, Analytics, Operations: Investigate varnish behavior change since new ATS-change in webrequest upload - https://phabricator.wikimedia.org/T225786 (ema) p:Triage→Normal
[09:30:13] Traffic, Analytics, Operations: Investigate varnish behavior change since new ATS-change in webrequest upload - https://phabricator.wikimedia.org/T225786 (ema) @JAllemandou thanks for the analysis! A few initial points that might help investigation with regards to ATS: * So far we've upgraded upload...
[10:01:58] Traffic, DNS, Operations: GSuite Test Domain Verification - https://phabricator.wikimedia.org/T223921 (ema) p:Triage→Normal
[11:36:38] Traffic, Analytics, Operations: Investigate varnish behavior change since new ATS-change in webrequest upload - https://phabricator.wikimedia.org/T225786 (JAllemandou) Hi @ema We can easily get data for older days if needed (we don't drop statistic-data). Here are the hosts with issues for June 6t...
[12:25:10] Traffic, Cloud-VPS, Operations, cloud-services-team (Kanban): cloudcontrol: decide on FQDN for service endpoints - https://phabricator.wikimedia.org/T223902 (Andrew) We had a session about this during the SRE summit. The conclusions were: - Use HA Proxy instead of trying to get into the LVS poo...
[12:25:57] Traffic, Cloud-VPS, Operations, cloud-services-team (Kanban): cloudcontrol: decide on FQDN for service endpoints - https://phabricator.wikimedia.org/T223902 (Andrew) The remaining task here is to make/update a wiki page about this.
[12:45:27] Traffic, Analytics, Operations: Investigate varnish behavior change since new ATS-change in webrequest upload - https://phabricator.wikimedia.org/T225786 (ema) Mmmh interesting. Certainly, the issue is not ATS-specific: eqsin is still running Varnish, and requests routed through eqsin do not involve...
[15:46:33] Acme-chief, Traffic, Operations: acme-chief staging time not working as expected - https://phabricator.wikimedia.org/T225945 (Vgutierrez)
[15:46:35] Acme-chief, Traffic, Operations: acme-chief staging time not working as expected - https://phabricator.wikimedia.org/T225945 (Vgutierrez) p:Triage→High
[15:46:48] Acme-chief, Traffic, Operations: acme-chief staging time not working as expected - https://phabricator.wikimedia.org/T225945 (Vgutierrez)
[15:46:56] wikibugs lag is pretty high
[15:52:08] vgutierrez: at least our dear bot hasn't committed seppuku yet!
[22:05:05] vgutierrez: there's rate limiting to prevent excessive flooding across all the channels wikibugs is in