[00:36:58] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (tramm) Stalled→Open Please change wikimedia.ee DNS record to refer to 185.7.252.114 (test page: http://wikimedia.ee.klient.veebimajutus.ee/).
[07:22:34] Traffic, DNS, Operations, Patch-For-Review: Add SPF record for non-canonical domains that are not parked - https://phabricator.wikimedia.org/T220786 (Vgutierrez)
[07:22:44] Traffic, Cloud-VPS, DNS, Mail, and 3 others: Set SPF (... -all) for toolserver.org - https://phabricator.wikimedia.org/T131930 (Vgutierrez) Open→Resolved a:herron→Vgutierrez
[07:26:05] Traffic, DNS, Operations, Patch-For-Review: Add SPF record for non-canonical domains that are not parked - https://phabricator.wikimedia.org/T220786 (Vgutierrez) Open→Stalled
[10:38:55] netops, Operations: cr2-esams: BGP flapping for AS 61955 (ipv4 and ipv6) - https://phabricator.wikimedia.org/T222424 (faidon)
[11:12:22] we had another few 503 upload@ulsfo spikes due to bots hitting cached 503s in ulsfo. Turns out proxy.config.http.negative_caching_enabled is not actually reloadable like the docs say
[11:13:17] other than that, it seems broken as mentioned here on Friday; asked the mailing list for clarification before filing a bug https://lists.apache.org/thread.html/aa008e05c55623cc346a14699adcc66304e282b7e8bc6ed16784e903@%3Cusers.trafficserver.apache.org%3E
[11:14:13] and I propose that we switch to requiring Cache-Control to determine cacheability https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509787/
[11:14:40] it seems the only sane way to be able to answer the question "will this response be cached, and if so for how long?"
[14:24:42] netops, DC-Ops, Operations: Juniper network device audit - all sites - https://phabricator.wikimedia.org/T213843 (ayounsi) This is almost done.
They added all the missing devices and almost fixed the "installed at" addresses, but got some of them wrong. I followed up with the correct ones.
[15:32:14] Traffic, Operations: tagged_interface sometimes exceeds IFNAMSIZ - https://phabricator.wikimedia.org/T209707 (Vgutierrez) Open→Resolved a:Vgutierrez
[15:32:19] \o/
[15:37:41] bblack: feedback on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509787/ much appreciated!
[15:39:06] yeah I looked earlier... seems legit at least for now until we get a better grip on things during the text stuff
[15:39:31] +1-ified
[15:39:38] ty :)
[15:40:32] ema: BTW, maybe it's worth investing a few seconds in adding a setting to toggle proxy.config.http.cache.required_headers?
[15:40:54] [nitpick] of course
[15:41:07] and optional :)
[15:44:09] vgutierrez: yes why not
[15:45:49] fun fun fun of today includes https://github.com/apache/trafficserver/issues/5486 and https://github.com/apache/trafficserver/issues/2505#issuecomment-491867831
[15:46:00] (and https://gerrit.wikimedia.org/r/#/c/operations/debs/trafficserver/+/509871)
[15:46:23] looking forward to having ReadOnlyDirectories=/etc
[15:46:41] that should bring the general sanity level up a bit
[15:49:35] ah, BTW, when earlier today I said that "proxy.config.http.negative_caching_enabled is not actually reloadable like the docs say", that wasn't true. The setting was not applied due to issues with `traffic_ctl config reload` (and that led to the fun fun fun part)
[15:54:42] ah!
[15:54:57] so it's probably that the negative-caching stuff actually works as intended?
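A minimal sketch of the policy proposed in the 509787 discussion: a response is cacheable only if Cache-Control carries an explicit positive lifetime, so "will this response be cached, and for how long?" can be answered from the headers alone. The function name `cache_ttl`, preferring `s-maxage` over `max-age`, and refusing `no-store`/`private` are assumptions for illustration. This is not how ATS itself evaluates proxy.config.http.cache.required_headers or negative caching.

```python
from typing import Dict, Optional


def cache_ttl(headers: Dict[str, str]) -> Optional[int]:
    """Return a TTL in seconds, or None if the response must not be cached.

    Hypothetical "Cache-Control required" policy: only an explicit
    Cache-Control lifetime makes a response cacheable. Expires,
    Last-Modified heuristics and status-based (negative) caching
    are deliberately ignored.
    """
    # Header names are case-insensitive; take the Cache-Control value if present.
    cc = next((v for k, v in headers.items() if k.lower() == "cache-control"), None)
    if cc is None:
        return None  # no Cache-Control at all: never cache

    # Parse 'a, b=1, c="x"' into {"a": "", "b": "1", "c": "x"}.
    directives: Dict[str, str] = {}
    for part in cc.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"')

    if "no-store" in directives or "private" in directives:
        return None  # explicitly uncacheable for a shared cache

    # Assumption: the shared-cache lifetime (s-maxage) wins over max-age.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                ttl = int(directives[key])
            except ValueError:
                return None  # malformed lifetime: treat as uncacheable
            return ttl if ttl > 0 else None
    return None  # Cache-Control present but without a lifetime
```

Under this sketch a 503 like the ulsfo ones could only end up cached if the backend attached a Cache-Control lifetime to the error response itself.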
[15:58:31] nope, that's very likely very broken :)
[15:59:58] what happened on Friday is https://lists.apache.org/thread.html/aa008e05c55623cc346a14699adcc66304e282b7e8bc6ed16784e903@%3Cusers.trafficserver.apache.org%3E
[16:00:54] then we (thought we) disabled negative caching, but changes did not actually get applied due to issues related to the fun fun fun
[16:27:19] but the original enabling of the negative caching and restricting it to a few codes, that did get applied and just didn't work right?
[16:27:34] (or did it also not get applied for the same basic reason)
[16:34:32] bblack: yes, that did get applied and is broken
[16:35:05] dear traffic team, could you please review this https://phabricator.wikimedia.org/T222210 so i can merge it? it is causing some trouble for releng
[16:35:13] sorry for pushing it a little bit
[16:38:22] done
[16:38:34] thank you bblack
[16:38:39] I don't get the earlier comment (being removed by your patch, not added) about avoiding caching loops though
[16:38:47] maybe it's from ancient history or who knows
[16:39:14] (oh or leftover from a past transition between the two sides, perhaps)
[16:41:56] both sites were introduced to address the important caveat listed here https://wikitech.wikimedia.org/wiki/Global_traffic_routing#Cache-to-application_routing
[16:42:14] maybe the wording was not the right one, but that was the intent
[16:46:22] vgutierrez: updated https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509787/ for your reviewing pleasure
[16:46:28] <3
[16:46:41] you know how to make me happy
[16:48:25] ema: https://gerrit.wikimedia.org/r/c/operations/puppet/+/509787/3/modules/profile/files/trafficserver/default.lua could it make any sense to chain those if blocks to avoid evaluating all of them if one already evaluated to true and do_not_cached() has been called?
[16:51:21] aka elseif
[16:52:15] vgutierrez: you're right as always
[18:10:49] that's quite impressive https://blog.cloudflare.com/argo-and-the-cloudflare-global-private-backbone/?a
[18:35:01] netops, Operations: Emergency syslog messages on asw1-eqsin - https://phabricator.wikimedia.org/T223156 (ayounsi) p:Triage→Normal
[20:01:12] we're seeing a spike of slower web performance in North America and Asia as of 8-9 hours ago, did any traffic-related work happen around that time?
[20:01:20] or incident, I guess
[20:02:21] https://grafana.wikimedia.org/d/000000230/navigation-timing-by-continent?orgId=1
[20:03:34] could be something else entirely or something organic, but it's always suspicious when one continent is significantly more affected than others
[20:15:02] Traffic, Operations, Performance-Team (Radar): Some load.php requests failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (Krinkle)
[20:18:58] nothing that I'm aware of at the network level
[22:18:30] netops, Operations, Operations-Software-Development, netbox, User-crusnov: Netbox report to validate network equipment data - https://phabricator.wikimedia.org/T221507 (crusnov) After digging and discussing I believe the way forward since the mapping is slightly ... weird between LibreNMS and...
[23:24:43] Wikimedia-Apache-configuration, Commons, SDC General, WikibaseMediaInfo, and 3 others: Make /entity/ alias work for Commons - https://phabricator.wikimedia.org/T222321 (Addshore) For wikidata this is done in puppet/modules/mediawiki/files/apache/sites/wikidata-uris.incl We should set up a sim...
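The 16:48 suggestion to chain the `if` blocks in default.lua ("aka elseif") can be illustrated with a small Python analogue, where `elif` plays the role of Lua's `elseif`. The three checks (`has_set_cookie`, `lacks_cache_control`, `varies_on_cookie`) are hypothetical stand-ins, not the conditions actually in 509787. Each check logs that it ran, so the difference in how many checks are evaluated is visible.

```python
from typing import Dict, List, Optional

Headers = Dict[str, str]


def make_checks(log: List[str]):
    """Hypothetical cacheability checks; each records that it was evaluated."""
    def has_set_cookie(h: Headers) -> bool:
        log.append("has_set_cookie")
        return "set-cookie" in h

    def lacks_cache_control(h: Headers) -> bool:
        log.append("lacks_cache_control")
        return "cache-control" not in h

    def varies_on_cookie(h: Headers) -> bool:
        log.append("varies_on_cookie")
        return "cookie" in h.get("vary", "").lower()

    return has_set_cookie, lacks_cache_control, varies_on_cookie


def decide_separate_ifs(h: Headers, log: List[str]) -> Optional[str]:
    # Independent `if` blocks: every check runs, even after one has matched.
    c1, c2, c3 = make_checks(log)
    decision = None
    if c1(h):
        decision = "do_not_cache"
    if c2(h):
        decision = "do_not_cache"
    if c3(h):
        decision = "do_not_cache"
    return decision


def decide_chained(h: Headers, log: List[str]) -> Optional[str]:
    # `if/elif` chain (Lua: if ... elseif ...): stops at the first match.
    c1, c2, c3 = make_checks(log)
    decision = None
    if c1(h):
        decision = "do_not_cache"
    elif c2(h):
        decision = "do_not_cache"
    elif c3(h):
        decision = "do_not_cache"
    return decision
```

For a response with `set-cookie`, the separate version evaluates all three checks while the chained one stops after the first. Chaining is only a safe refactor when every branch does the same thing and later checks have no side effects the rest of the code relies on; otherwise skipping them changes behavior, not just cost.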