[03:39:45] Traffic, Discovery, Operations, Wikidata, Wikidata-Query-Service: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Dzahn) p: Triage→High
[04:25:32] cdanis: it's up to the traffic team, but looks like a good thing to do, and it's easy to deploy
[07:32:45] netops, Operations: Stale LibreNMS ports - https://phabricator.wikimedia.org/T242318 (ayounsi) p: Triage→Low
[07:37:20] hello everybody, after cdanis' request we added to https://turnilo.wikimedia.org/#webrequest_sampled_128 the 'aggregated_response_size' metric
[07:37:55] it is available from yesterday's data, so it will take a bit to have the full datasource with the new metric
[07:38:15] (once the old data without the metric gets dropped and overwritten with newer data)
[08:13:13] netops, Operations: Stale LibreNMS ports - https://phabricator.wikimedia.org/T242318 (ayounsi) From @Marostegui, the list of tables that have rows with `device_id = 20`: P10095#59005
[08:19:56] netops, Operations: Stale LibreNMS ports - https://phabricator.wikimedia.org/T242318 (Marostegui) If you need the exact rows just do: `select * from...` instead of `select count(*) from...` Let me know if you need further help.
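[Editor's aside: the `count(*)` vs. `select *` tip above can be shown with a self-contained sketch. This uses an in-memory sqlite3 table as a stand-in for the LibreNMS MySQL database; the `ports` schema and data here are made up for illustration, only the `device_id = 20` filter comes from the log.]

```python
import sqlite3

# Hypothetical stand-in for one LibreNMS table; the real schema differs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ports (port_id INTEGER, device_id INTEGER)")
db.executemany("INSERT INTO ports VALUES (?, ?)", [(1, 20), (2, 20), (3, 7)])

# count(*) only says how many rows reference the stale device...
(count,) = db.execute(
    "SELECT count(*) FROM ports WHERE device_id = 20"
).fetchone()

# ...while select * returns the rows themselves, for inspection before cleanup.
rows = db.execute(
    "SELECT * FROM ports WHERE device_id = 20 ORDER BY port_id"
).fetchall()

print(count)  # 2
print(rows)   # [(1, 20), (2, 20)]
```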
[08:40:42] Traffic, Operations: Provide non-canonical-redirect from every datacenter - https://phabricator.wikimedia.org/T242321 (Vgutierrez)
[08:41:06] Traffic, Operations: Provide non-canonical-redirect service from every datacenter - https://phabricator.wikimedia.org/T242321 (Vgutierrez) p: Triage→Normal
[09:46:59] https://blogs.dropbox.com/tech/2020/01/intelligent-dns-based-load-balancing-at-dropbox/
[10:14:53] gilles: wow how cool is kepler.gl
[10:16:06] so tl;dr would be: latency based routing > geo based routing
[10:16:28] pretty much
[10:16:44] when you have 20 PoPs
[10:16:49] exactly :)
[10:17:25] with 5 PoPs, geo based routing is probably a good enough tradeoff between perf and engineering costs
[10:29:38] Traffic, Operations, Performance-Team, observability: Ensure graphs used by Performance account for Varnish-to-ATS migration - https://phabricator.wikimedia.org/T233474 (ema) >>! In T233474#5786575, @Krinkle wrote: > @ema If I understand correctly, varnishrls does not yet require migration becaus...
[15:11:40] netops, Operations, Patch-For-Review: fastnetmon misreports attack type and protocol - https://phabricator.wikimedia.org/T241374 (CDanis) Open→Stalled Believe this has been worked around for now.
[15:16:38] a bunch of "Varnish frontend child restarted: (null)"
[15:16:47] jynus: known, ty
[15:17:02] they should be recovering soon
[15:17:08] cool, sorry
[16:55:27] Traffic, Discovery, Operations, Wikidata, Wikidata-Query-Service: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Mstyles) for clarification the correct response will contain a list that looks like this ` @prefix schema: . @...
[16:55:58] mutante: did you see anything in the iptables?
[16:58:26] maryum: no. if that was the case we wouldn't be getting any curl response.
i was thinking at first that was the case so wanted to rule out firewall and "missing proxy setting" but it's not that then
[17:02:54] mutante: ah okay thanks.
[19:42:54] Traffic, Discovery, Operations, Wikidata, and 2 others: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Gehel) a: Mstyles
[21:22:17] vgutierrez: still around by any chance? We're having issues with T237165
[21:22:18] T237165: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165
[21:22:55] we're getting different pages when querying nginx directly on the wdqs servers or when going through the external endpoint
[21:23:31] it looks like there is some rewrite going on upstream from the wdqs servers, but I have no idea how that part works...
[21:24:52] bblack: ^
[21:29:30] hmm right now at ats-be level ldf requests are being sent to wdqs1005
[21:30:27] https://github.com/wikimedia/puppet/blob/6dc32560a6cc544f394faa020133a40dc6784c27/hieradata/common/profile/trafficserver/backend.yaml#L150
[21:31:47] Traffic, Operations: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (BBlack) p: Triage→Normal
[21:32:00] Traffic, Operations: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (BBlack)
[21:32:03] Traffic, DNS, Operations, Research: Add wikiworkshop.org to the Foundation's DNS - https://phabricator.wikimedia.org/T240303 (BBlack)
[21:32:10] Traffic, Operations: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (BBlack)
[21:32:13] Traffic, DNS, Operations, Research: Add wikiworkshop.org to the Foundation's DNS - https://phabricator.wikimedia.org/T240303 (BBlack) Open→Resolved
[21:33:05] vgutierrez: it looks like the requests are rewritten from /bigdata/ldf to just /ldf
[21:33:19] gehel: from
https://phabricator.wikimedia.org/T237165#5788219 I get that's not right and it should point to a dns service discovery hostname balancing traffic across the 4 servers?
[21:34:02] is 1005 not in the usual set or something?
[21:34:25] nope, the LDF traffic is forwarded to a single server (some pagination issues)
[21:34:29] gehel: right.. cause the target of the remap lacks the url path
[21:34:36] ack
[21:34:48] let me get the laptop...
[21:34:55] vgutierrez: the remap at ATS level?
[21:35:03] vgutierrez: I can do it
[21:35:21] ack
[21:35:24] yes
[21:35:42] not an emergency, it's been broken for some time
[21:36:02] just wanted to make sure someone understands what's going on
[21:36:22] it should be replacement: https://wdqs1005.eqiad.wmnet/bigdata/ldf
[21:38:51] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/563281/1/hieradata/common/profile/trafficserver/backend.yaml
[21:39:06] ^ that *should* fix it, if I understand all the bits correctly
[21:39:23] yes
[21:39:49] will give it a spin!
[21:40:22] nice.. ats-be will reload the remap config as soon as puppet runs.. so no restart is required
[21:45:52] the CR looks good to me! Let's see if we can reproduce
[21:46:20] maryum: I'm off for today, can you check if this is fixed and update the ticket?
[21:47:14] bblack, vgutierrez: maryum will need some help from you as well for a caching issue on that LDF endpoint: T232006
[21:47:14] T232006: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients - https://phabricator.wikimedia.org/T232006
[21:47:28] gehel: should be applied now
[21:47:32] (the remap fix)
[21:47:33] vgutierrez, bblack: thanks a lot for the fix!
[21:47:58] bblack: yep, looks good now!
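[Editor's aside: the bug fixed above, a remap target lacking the URL path, can be sketched in Python. This is an illustration of prefix remapping in general, not ATS's actual remap implementation, and the broken target shown is hypothetical; only the fixed replacement URL comes from the log.]

```python
def remap(url: str, prefix: str, replacement: str) -> str:
    """Prefix remap: swap `prefix` for `replacement`, keeping the request's tail."""
    assert url.startswith(prefix)
    return replacement + url[len(prefix):]

req = "https://query.wikidata.org/bigdata/ldf?page=2"
src = "https://query.wikidata.org/bigdata/ldf"

# Hypothetical broken target with no path: the backend path is lost.
print(remap(req, src, "https://wdqs1005.eqiad.wmnet"))
# → https://wdqs1005.eqiad.wmnet?page=2

# Target including the path, as in the fix: the backend sees /bigdata/ldf again.
print(remap(req, src, "https://wdqs1005.eqiad.wmnet/bigdata/ldf"))
# → https://wdqs1005.eqiad.wmnet/bigdata/ldf?page=2
```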
[21:49:04] gehel: re: Accept header, I think the only normalization we have on that today is for specific limited cases like Restbase
[21:49:23] I would've thought it would Just Work by the standards w/ Vary: Accept, and we'd only maybe be adding normalization as an optimization
[21:49:35] but obviously, that must not be the case :)
[21:50:04] he, he, he... things are never what they look like...
[21:51:48] do you happen to know if the content-type and accept values actually match for the LDF case?
[21:52:13] anyways, we can sort it out on the ticket
[21:52:27] but I think ATS actually cares about matching C-T vs Accept and has some config related to it
[21:52:41] I would expect so, but I have not actually checked
[21:52:50] let's sort this out on the ticket, too late for me now
[21:52:54] yeah
[21:53:07] maryum might be able to help as well (better timezone)
[21:53:16] yes I'll take a look!!
[21:57:28] yes all of the content types are getting returned properly
[22:05:06] maryum: so I'm trying to get a succinct repro
[22:05:14] at least when I miss/pass, this seems right:
[22:05:18] bblack@haliax:~$ curl -v -H 'Accept: application/ld+json' 'https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP3417&page=2' -o /dev/null 2>&1|egrep '^< (content-type|x-cache):'
[22:05:22] < content-type: application/ld+json;charset=utf-8
[22:05:25] < x-cache: cp4030 miss, cp4029 pass
[22:05:27] bblack@haliax:~$ curl -v -H 'Accept: text/turtle' 'https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP3417&page=2' -o /dev/null 2>&1|egrep '^< (content-type|x-cache):'
[22:05:31] < content-type: text/turtle;charset=utf-8
[22:05:34] < x-cache: cp4031 miss, cp4029 pass
[22:06:21] getting a repetitive hit to test cache contents has been tricky, and the FE always seems to pass
[22:08:39] overall the behavior seems strange on our end of things
[22:08:57] it's always a frontend pass, and seems to
randomize backend selection like it's expecting to pass there, but it is backend-hittable
[22:09:04] without enough spamming I can load hits in lots of backends though
[22:12:15] I have yet to find a counter-example though (of hitting on the wrong content-type)
[22:17:02] Traffic, Discovery, Operations, Wikidata, and 2 others: LDF server has 404 errors for JS and CSS resources - https://phabricator.wikimedia.org/T237165 (Vahurzpu) It's consistently working for me now. Thanks!
[22:35:32] Traffic, Operations, Research: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (leila)
[22:41:27] Traffic, Operations, Research: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (leila) @bmansurov you can use this task for tracking and implementing the change for bringing the hosting of wikiworkshop.org to github. (For context: I had a chat wit...
[22:44:26] Traffic, Operations, Research: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (Reedy) >>! In T242374#5791357, @leila wrote: > @bmansurov you can use this task for tracking and implementing the change for bringing the hosting of wikiworkshop.org to...
[23:45:57] bblack: I'm not able to see the cache issue either.
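[Editor's aside: T232006, discussed above, is about cached responses not being varied by Accept. A toy model of the cache-key question; this is a sketch of RFC 7234 `Vary` semantics, not ATS's or Varnish's actual logic, and the origin behavior is simulated.]

```python
# Toy HTTP cache illustrating Vary: Accept.
cache: dict = {}

def cache_key(url: str, accept: str, honor_vary: bool) -> tuple:
    # When Vary: Accept is honored, the request's Accept header joins the
    # cache key; otherwise all clients share a single entry per URL.
    return (url, accept) if honor_vary else (url,)

def fetch(url: str, accept: str, honor_vary: bool) -> str:
    key = cache_key(url, accept, honor_vary)
    if key not in cache:
        # Stand-in origin: content negotiation echoes the Accept header.
        cache[key] = f"body as {accept}"
    return cache[key]

# Ignoring Vary: the turtle client gets the cached JSON body (the T232006 symptom).
fetch("/bigdata/ldf", "application/ld+json", honor_vary=False)
print(fetch("/bigdata/ldf", "text/turtle", honor_vary=False))
# → body as application/ld+json

cache.clear()
# Honoring Vary: each Accept value gets its own cache entry.
fetch("/bigdata/ldf", "application/ld+json", honor_vary=True)
print(fetch("/bigdata/ldf", "text/turtle", honor_vary=True))
# → body as text/turtle
```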