[09:32:07] Traffic, Operations, Patch-For-Review: ATS backend-side request-mangling - https://phabricator.wikimedia.org/T209021 (ema)
[11:23:12] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (fgiunchedi) I can indeed reproduce the problem when fetching e.g. https://upload...
[11:35:28] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (akosiaris) I can reproduce it as well. Received sizes and execution times are not...
[12:53:08] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Addshore) Poke as it is now 1 or 2 weeks since the last movement here. @BBlack I just cced you on the patch so it appears in your review queue....
[13:24:47] Traffic, Operations, Privacy: Disable WMF-Last-Access cookies for wmfusercontent.org - https://phabricator.wikimedia.org/T210167 (jijiki) p:Triage>Normal
[13:27:41] Traffic, Operations: INMARSAT geolocates to the UK, leading to requests going to esams - https://phabricator.wikimedia.org/T209785 (jijiki) p:Triage>Low @Reedy if you disagree with the priority I set, feel free to change it:)
[13:40:14] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (jijiki) p:Triage>High Should we merge this with T190988 or vice versa?
[13:44:36] netops, Cloud-Services, Operations, Patch-For-Review: Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (aborrero)
[13:44:57] netops, Operations, Patch-For-Review, cloud-services-team (Kanban): Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (aborrero)
[13:45:13] netops, Operations, Patch-For-Review, cloud-services-team (Kanban): Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (aborrero) p:Normal>Low
[14:24:41] Traffic, Operations, Patch-For-Review: ATS backend-side request-mangling - https://phabricator.wikimedia.org/T209021 (ema)
[14:57:20] Traffic, Operations, Patch-For-Review: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (ema)
[15:35:45] netops, Operations, ops-codfw: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (Papaul)
[15:40:28] netops, Operations, ops-codfw: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (Papaul)
[16:04:47] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (BBlack) They seem different, as T190988 is about faulty uploads (which I presume...
[16:31:51] netops, Operations, monitoring, Patch-For-Review: Add virtual chassis port status alerting - https://phabricator.wikimedia.org/T201097 (ayounsi) Open>Resolved a:ayounsi Done.
Runbook at https://wikitech.wikimedia.org/wiki/Network_monitoring#VCP_status
[16:36:18] bblack: haven't had time to dig yet, but the % of retransmits for the DNS recursors suddenly increased around 8pm UTC on Friday - https://grafana.wikimedia.org/dashboard/db/network-performances-global?panelId=18&fullscreen&edit&tab=alert&orgId=1&from=now-7d&to=now
[17:30:02] Traffic, Analytics, Operations, Performance-Team: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (fdans) Analytics needs x-analytics in every request, not only in debugging ones but we don't need to include it in the response headers. W...
[18:08:27] XioNoX: I think that point-in-time is just when the cluster names were finally defined for them in https://gerrit.wikimedia.org/r/c/operations/puppet/+/476393 , which caused them to first start showing up in that graph at all.
[18:09:08] ah okay!
[18:09:10] XioNoX: and if I had to take a guess, since they're recursors that handle all the DNS lookups from us to the outside world, it's probably pretty sadly-normal that they'd see lots of retransmits, due to broken TCP DNS out in the wild
[18:09:55] that makes sense
[18:09:58] but it might be worth checking in what the actual retranses look like, in case it's something stupid on our end (e.g. we've got some bad filtering of our own outbound TCP DNS somewhere)
[18:10:15] yeah, I'll do some packet capture and see what's up
[19:12:27] netops, Operations, ops-codfw: codfw row A recable and add QFX - https://phabricator.wikimedia.org/T210447 (ayounsi)
[22:15:49] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (Danmichaelo) This also affects CropTool (hosted on Tool Labs), I'm getting report...
[22:38:01] bblack, https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=10893724
[22:38:42] (in case you don't have info-en access, basically someone is using a public library computer to access wikipedia.org and getting NET::ERR_CERT_AUTHORITY_INVALID)
[22:41:35] might be interesting to see what is trying to issue a wikipedia.org cert, though it might turn out to be that their computer does not trust our CA
[22:42:35] or even that there is a filtering software
[22:42:53] and uses a self-signed certificate to show an error message
[22:43:31] that error looks like Firefox, doesn't it?
[22:43:42] yeah. I certainly prefer such a system that throws invalid cert errors at the user than one that can successfully trick them with a bogus local CA
[22:43:46] hmm, chrome
[22:46:10] Krenair: I don't have OTRS
[22:46:18] ok
[22:46:33] but also, it's possible the library computer's date is off by a lot, too
[22:46:42] that would be a different error I think
[22:46:49] welll
[22:47:01] might depend on the CA's validity dates?
[22:47:03] I would ask them to provide the certificate they receive
[22:47:32] should I? Or do you prefer to reply yourself?
[22:47:51] usually if they bother putting some TLS MITM in place for a network of library computers or whatever, they install fake root CAs on the clients and nobody ever sees an error
[22:47:52] Platonides, go for it
[22:47:59] might need to walk them through it a bit
[22:48:41] more than a bit, I'm afraid
[22:49:55] yeah
[23:14:26] grr
[23:14:39] Platonides, what's up?
[23:14:42] I would like to have Chromium devs providing the steps for that person
[23:14:49] haha
[23:15:02] I didn't remember that they had hidden the certificate details a few versions back
[23:15:05] they bury it pretty deep now don't they?
[23:15:09] yes
[23:15:17] in a tab of developer tools
[23:15:28] I got pissed off with that and other things and ditched it for FF
[23:15:44] I got pissed off for that, too
[23:21:37] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage, Patch-For-Review: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (BBlack) I think the patch reverted above was at fault. Wha...
[23:25:03] replied
[23:26:17] Platonides, looks good
[23:26:49] too long, imho
[23:26:53] yeah well
[23:26:54] but there was little to do there
[23:26:55] not your fault
[23:27:01] yup
[23:27:18] someone successfully responding to that will likely come back with quite a bit of info we can't share out of OTRS though
[23:27:43] the bad cert may be enough to identify their location
[23:27:49] I wasn't asking for anything confidential...
[23:27:54] no
[23:28:08] hmm, perhaps
[23:28:11] I'm saying when we get it we should treat it very carefully
[23:31:34] I wonder what's the user age
[23:49:03] Platonides, why?
[23:52:47] Traffic, Operations, Wikimedia-General-or-Unknown, media-storage, Patch-For-Review: Loading full versions of larger images from Commons stucks / repeatedly gets interrupted after a few MBs - https://phabricator.wikimedia.org/T210890 (BBlack) Open>Resolved a:BBlack I can't reproduc...