[07:56:02] Traffic, Operations: ats-backend throttles connections under heavy load - https://phabricator.wikimedia.org/T254714 (Vgutierrez)
[07:59:14] Traffic, Operations, Phabricator, Security-Team: Accessing Phabricator from Tor - https://phabricator.wikimedia.org/T254568 (Fae) Playing with the Tor browser this morning, a work-around could be for users to keep trying new Tor circuits until they stop getting the Error 500 message. This appe...
[08:10:03] where can I find the equivalent of VCL logic for ATS? Is it in Puppet as well or on a different repo?
[08:11:10] puppet
[08:11:50] so.. we have a global lua script for every request.. that's https://github.com/wikimedia/puppet/blob/production/modules/profile/files/trafficserver/default.lua
[08:12:08] and we have some linked to specific remap rules
[08:12:18] they are in the same directory as the default.lua one
[08:13:02] and you can find the remap rules here: https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/trafficserver/backend.yaml
[08:13:24] as an example, you can see here https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/trafficserver/backend.yaml#L114-L117
[08:13:47] mwmaint.discovery.wmnet traffic using x-wikimedia-debug-routing.lua
[08:13:59] that's https://github.com/wikimedia/puppet/blob/production/modules/profile/files/trafficserver/x-wikimedia-debug-routing.lua
[08:15:22] thanks
[08:16:11] and for ats-tls we only have the global one: https://github.com/wikimedia/puppet/blob/production/modules/profile/files/trafficserver/tls.lua
[08:16:34] is there any header stripping logic that applies to upload?
[08:16:43] or a whitelist
[08:19:34] hmm from the client to the origin server or the other way around?
[08:19:54] tls.lua cleans some debug headers https://github.com/wikimedia/puppet/blob/production/modules/profile/files/trafficserver/tls.lua
[08:20:18] but I guess ema can give you more details on that :)
[08:20:30] gilles: and of course the VCL for varnish-frontend is still there
[08:22:51] Traffic, Commons, MediaWiki-File-management, Operations, and 2 others: ATS or Varnish incorrectly strips Content-Disposition header for webp thumbnails - https://phabricator.wikimedia.org/T254557 (Gilles) a:ema
[08:23:19] gilles: we remove various response headers at the ats-tls layer in do_global_send_response(): https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/files/trafficserver/tls.lua#62
[08:23:53] yeah, doesn't seem to be coming from that. context is https://phabricator.wikimedia.org/T254557
[08:26:07] hmm weird
[08:31:27] ah, nevermind, I think I've figured it out
[08:31:50] the header isn't being stored in swift, so when it gets pulled from swift directly (previously generated) it's not there anymore
[08:31:59] false alarm :) nothing to do with ATS/Varnish
[08:32:14] :)
[08:34:24] gilles: nice catch, I was trying a few different requests and could only occasionally reproduce
[08:34:46] for some reason some thumbnails do have those headers in the swift object, it's weird
[08:35:26] maybe it's just old objects stored in swift without it before we started storing the headers?
[08:36:03] oh no, ok of course if I request a fresh one from swift I get the header, but it's not stored permanently if I request the same thing again immediately
[08:36:06] so yeah, that's it
[08:36:51] for jpgs it does stick, which is probably why we haven't seen this before
[08:37:05] I thought swift just stored all headers you provide when saving the object
[08:39:20] yes, but you have to pass them to begin with. the codepath for saving and the headers that get served by the request are different
[08:39:34] because there are some debugging headers that we want thumbor to generate but not save to swift
[08:40:40] oh I see
[08:40:59] it's actually possible that it's wrong for JPGs as well, in fact. we're just a lot more likely to have old JPG thumbnails in Swift that MediaWiki saved with that header a long time ago
[09:08:21] netops, Operations: Telia eqiad<->codfw (IC-307235) outage ref: 01171084 - https://phabricator.wikimedia.org/T254674 (Dzahn) "we experienced a brief service disruption in a card in Selma, AL, impacting our transmission stretch between Atlanta and Houston. A cold reboot of the card restored service. We...
[09:33:43] netops, Operations: Telia eqiad<->codfw (IC-307235) outage ref: 01171084 - https://phabricator.wikimedia.org/T254674 (ayounsi) Open→Resolved a:ayounsi Thanks, all back to normal.
[10:38:08] vgutierrez, ema, can I get a +1 on https://gerrit.wikimedia.org/r/c/operations/dns/+/603409 ?
[10:38:13] * vgutierrez checking
[10:38:51] thx!
[10:48:45] the contractor we've had implement browser features for the past few months is interested in tackling issues we have in PHP, ATS and any C/C++ based FLOSS projects we rely on next FY if we retain his services
[10:49:20] how useful would it be to you all to have some extra help fixing upstream ATS bugs and/or implementing new features in it next FY?
[11:23:26] Traffic, Operations: ats-backend throttles connections under heavy load - https://phabricator.wikimedia.org/T254714 (jbond) p:Triage→Medium
[11:25:30] Traffic, Operations, Phabricator, Security-Team: Accessing Phabricator from Tor - https://phabricator.wikimedia.org/T254568 (jbond) p:Triage→Medium
[13:23:30] gilles: I think it could be useful, we'll discuss this today and get back to you!
[13:24:52] I've sent the request to Grant and Erika already, but if you see the value as well I think it would be helpful to get a +1 from your team about this sent to them. My thinking would be to assign him what is the most critical for the org, regardless of which upstream project it is. Obviously ATS is pretty critical :)
[13:36:54] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (ema) Open→Declined >>! In T242767#6199410, @MrJaroslavik wrote: > Hey, can be fixed this problem?...
[13:58:46] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) Hm, EventStreams uses the Server Sent Events for this very reason. I don't think anyone is expect...
[14:00:30] godog: https://gerrit.wikimedia.org/r/c/operations/puppet/+/603475 this kind of adjustment is enough or do we need to clean something on the prometheus side?
[14:03:43] vgutierrez: no action on the prometheus side no, you might see the temporary effect on the dashboards though when the timespan has both previous and new buckets
[14:04:31] I have to ask though, what's the rationale ?
[14:04:58] we have timeouts way higher than 1.2 secs
[14:05:07] 3, 35, 60 secs...
[14:05:17] so we'd like to have more insights than just "Infinite" :)
[14:06:51] heehh fair enough
[16:10:20] Traffic, Analytics, Operations: Compare logs produced by atskfafka with those produced by varnishkafka - https://phabricator.wikimedia.org/T254317 (Milimetric) p:Medium→High
[16:44:15] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (stjn) This is a very strange conclusion to this task. There was never an assumption that you do not need to...
[17:00:06] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (BBlack) >>! In T242767#6201754, @Ottomata wrote: [reordering a little] > What happens right now if someone h...
[17:06:21] I took so long writing a response, that I missed a newer comment coming in 15 minutes earlier :P
[17:06:59] which I used to get warned about by phab's little notifications, which were over websockets that are now disabled and also are a long-running server-push connection case, ironically
[17:07:20] or maybe not-ironically. I think I've lost all track of the difference between true irony and the misuse of the term.
[17:23:29] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) (Thanks for the response bblack!) > 2. Does the typical client handle the disconnect gracefully,...
[20:24:01] netops, DC-Ops, Operations, ops-eqiad, cloud-services-team (Hardware): (Need By: 2020-06-12) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (wiki_willy)
[21:26:02] Traffic, Analytics, Analytics-Kanban, EventStreams, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (stjn) >>! In T242767#6202740, @Ottomata wrote: > I guess I'd like to hear from the EventStreams users on thi...
[21:40:58] netops, Operations, ops-codfw: (Need by: End of July-2020 ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (Papaul)
[22:26:05] Traffic, Core Platform Team, Operations: Move wikitech purges to kafka - https://phabricator.wikimedia.org/T254828 (Pchelolo)