[01:16:41] Traffic, netops, Operations, Wikimedia-General-or-Unknown: Numerous people reporting issues saving edits and viewing previews/diffs - https://phabricator.wikimedia.org/T232491 (BBlack) I've made a temporary MTU-related fixup on the affected eqiad and esams cache hosts. Assuming we understand the...
[04:28:44] Traffic, netops, Operations, Wikimedia-General-or-Unknown: Numerous people reporting issues saving edits and viewing previews/diffs - https://phabricator.wikimedia.org/T232491 (John_of_Reading) Thank you! I've successfully previewed and edited in Firefox. I've also saved an edit in AWB, which had...
[08:43:57] Traffic, Operations, observability: varnish request rates showed a spike up while nginx request rates didn't - https://phabricator.wikimedia.org/T232574 (fgiunchedi)
[09:07:56] Traffic, Operations, observability: Alert in case of significant discrepancies between the number of nginx and varnish responses - https://phabricator.wikimedia.org/T232574 (ema) p: Triage→Normal
[09:14:50] Traffic, Operations, Wikimedia-Logstash, observability, User-Addshore: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (ema) >>! In T189333#5481492, @Krinkle wrote: > I re-ran my analysis today, and oddly enough the total number of fields it not only s...
[10:00:43] akosiaris: FTR I've looked at some of the origin servers connect errors on cp1075, those you mentioned yesterday
[10:01:13] akosiaris: it seems mostly *.planet.wikimedia.org and people.wikimedia.org are affected, adding them to SAN now
[10:14:34] Traffic, Operations, serviceops, Patch-For-Review: Applayer services without TLS - https://phabricator.wikimedia.org/T210411 (ema)
[10:17:16] Traffic, Operations, serviceops, Patch-For-Review: Migrate Failoid hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224559 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `tureis.codfw.wmnet` - tureis.codfw.wmnet - Removed from Puppet mast...
[10:18:56] Traffic, Operations, serviceops, Patch-For-Review: Migrate Failoid hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224559 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2001 for hosts: `roentgenium.eqiad.wmnet` - roentgenium.eqiad.wmnet - Removed from P...
[10:33:31] Traffic, Operations, serviceops, Patch-For-Review: Migrate Failoid hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224559 (MoritzMuehlenhoff) Open→Resolved New instances (failoid1001 and failoid2001) have been set up with Buster and are in use. The old instances (roentgenium...
[12:06:35] Just an idea for the next datacenter (PoP/CDN). Sao Paulo has the world's largest IXP (IXP.br), plus there are several submarine cables connecting Brazil to South Africa. This would improve this picture greatly https://commons.wikimedia.org/wiki/File:Latency_map_world.png Of course adding a new dc has so many aspects that can overshadow being close to a large IXP
[12:07:11] (btw. It would be great if RIPE Atlas ran the tests again after eqsin)
[12:09:54] Traffic, Operations: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602 (BBlack) p: Triage→Normal
[12:10:38] Traffic, Operations: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602 (BBlack)
[12:12:03] Amir1: yeah Sao Paulo is already our preferred next edge as far as under-served further-flung places, although there's some debate on priority between that and some other expansion plans (e.g. getting a second EMEA site for more redundancy and diversity for the huge traffic over there)
[12:12:50] an ideal medium-term state for us that's within reasonable reach would be to add both of the above-mentioned sites as well as something on the western side of asia.
[12:16:26] Amir1: there's an unofficial and slightly different map arzhel made post-eqsin for our global latencies here, as part of future planning efforts:
[12:16:29] https://people.wikimedia.org/~ayounsi/countrymap_www.wikipedia.org.html
[12:17:21] bblack: thanks for the note. For West Asia, that would make me happy, but we'd need to find a stable country (full disclosure, I'm from Iran)
[12:17:52] :)
[12:18:19] it's probably the lowest-priority of the three mentioned, so we haven't dug into ideal locations on that one much
[12:18:44] even something in ~NW india would help if there's a viable spot there
[12:19:02] Turkey would be a good option (ironically)
[12:19:35] for the ME, probably the second EMEA site will help enough as well that we don't need one directly in the ME for now
[12:19:51] esp if we choose a southeasterly EU location
[12:20:00] I see
[12:20:02] but cable maps don't always line up with intuitive geography either :)
[12:20:05] that sounds good
[12:20:42] yeah
[12:56:40] Traffic, Operations, Wikimedia-Logstash, observability, User-Addshore: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (fgiunchedi) >>! In T189333#5481492, @Krinkle wrote: > I re-ran my analysis today, and oddly enough the total number of fields it not...
[13:31:26] Traffic, netops, Operations, Wikimedia-General-or-Unknown: Numerous people reporting issues saving edits and viewing previews/diffs - https://phabricator.wikimedia.org/T232491 (Izno) >>! In T232491#5481717, @BBlack wrote: > Can any previous reporters confirm the same continued breakage, or new su...
[13:51:40] netops, Operations: BGP session down for AS4739 on cr4-ulsfo - https://phabricator.wikimedia.org/T230005 (elukey) @ayounsi I have sent two emails to their NOC but no answer, should we remove the peering config?
[13:52:48] netops, Operations: BGP session down for AS 20485 on cr2-esams - https://phabricator.wikimedia.org/T230004 (elukey) Open→Resolved ` elukey@re0.cr2-esams> show bgp summary | match 20485 80.249.210.177 20485 12254 7659 0 0 2d 14:25:53 Establ ` All good!
[13:59:08] Wikimedia-Apache-configuration, Performance-Team: Apache configuration: SVGs served by MediaWiki aren't gzipped - https://phabricator.wikimedia.org/T232615 (Gilles)
[14:00:22] netops, Operations: BGP sessions down on cr2-esams - https://phabricator.wikimedia.org/T232617 (elukey)
[14:13:15] netops, Operations: BGP sessions down on cr2-esams - https://phabricator.wikimedia.org/T232617 (jbond) p: Triage→Normal
[14:18:46] so, I haven't looked seriously at our envoy deployment or envoy in general
[14:19:03] but my quick question is: can we do TLS termination for raw TCP (not HTTPS) with it as well?
[14:19:10] (like haproxy would)
[14:20:06] we've had the authdns DoTLS thing on the back-burner for a while. Most of it's ready except puppetizing some generic TLS revproxy for raw TCP in front of it on the right port...
[14:20:27] (haproxy was looking better than nginx for that purpose when I last looked, but maybe now that we're pushing envoy around it's the best option?)
[14:21:29] https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/network_filters/tcp_proxy_filter seems possible!
[14:25:51] hmm it may not support outbound PROXY protocol on the backend, though :/
[14:26:00] it does seem to have support for incoming PROXY, but no docs on outbound
[14:27:39] https://github.com/envoyproxy/envoy/issues/1031
[14:32:45] they have a "transparent mode" that could work for sidecar, but it requires iptables mangling hacks :/
[14:32:49] https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/original_src_filter
[14:41:49] Traffic, netops, Operations, Wikimedia-General-or-Unknown: Numerous people reporting issues saving edits and viewing previews/diffs - https://phabricator.wikimedia.org/T232491 (Ahecht) It's working again for me as well.
[14:59:08] netops, Operations: BGP session down for AS4739 on cr4-ulsfo - https://phabricator.wikimedia.org/T230005 (ayounsi) Yep! I can walk you through it if needed.
[15:01:07] netops, Operations: BGP sessions down on cr2-esams - https://phabricator.wikimedia.org/T232617 (ayounsi) I think it's safe to delete 28598 if they don't reply to your most recent email. About 12871 you're correct, or they're migrating something. Best is to ask them, then delete the down sessions if no r...
[15:24:34] Wikimedia-Apache-configuration, Performance-Team, Patch-For-Review: Apache configuration: SVGs served by MediaWiki aren't gzipped - https://phabricator.wikimedia.org/T232615 (Krinkle) Nice catch. I could've sworn this was done already, but maybe it got lost somehow. Perhaps in the transition from bit...
[15:31:59] Wikimedia-Apache-configuration, Performance-Team, Patch-For-Review: Apache configuration: SVGs served by MediaWiki aren't gzipped - https://phabricator.wikimedia.org/T232615 (Gilles) a: Gilles
[15:32:17] Wikimedia-Apache-configuration, Performance-Team, Patch-For-Review: Apache configuration: SVGs served by MediaWiki aren't gzipped - https://phabricator.wikimedia.org/T232615 (Gilles) p: Triage→High
[15:38:40] Traffic, netops, Operations, Wikimedia-General-or-Unknown: Numerous people reporting issues saving edits and viewing previews/diffs - https://phabricator.wikimedia.org/T232491 (Harmonia_Amanda) It works for me too! Thank you!
[16:43:38] HTTPS, Traffic, Operations: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559 (Aklapper) @MBeat33: Could you maybe answer @BBlack's last question? Or do you know who could? (Asking you because of https://phabricator.wikimedia.org/T228672#5358426 )
[17:09:03] HTTPS, Traffic, Operations: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559 (Dzahn) T228672 says nobody in charge of the Shop is even on Phabricator :( Looks like we have to email merchandise@ to get this bumped.
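The envoy question from 14:18 to 14:32 turns on whether a TLS terminator in front of authdns can send the PROXY protocol header to its backend, so the DNS server still learns the real client address. For reference, a minimal Python sketch of the human-readable v1 header as described in the HAProxy PROXY protocol specification; the function name and the example addresses/ports are hypothetical, and this is not envoy's or authdns's code.

```python
import ipaddress


def proxy_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 header (hypothetical helper).

    Shape: b"PROXY <TCP4|TCP6> <src> <dst> <sport> <dport>\\r\\n".
    The proxy writes this once, at the start of the backend connection,
    before relaying any of the client's (already decrypted) payload.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    # Both endpoints must belong to the same address family.
    if src.version != dst.version:
        raise ValueError("source and destination must be the same IP family")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    family = "TCP4" if src.version == 4 else "TCP6"
    return f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


# A client reaching a DNS-over-TLS listener on port 853 (documentation addresses).
print(proxy_v1_header("192.0.2.10", "198.51.100.1", 51234, 853))
```

The spec also defines a binary v2 format carrying the same information; v1 is shown only because it is easy to read in a log or packet capture.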
[17:09:35] Traffic, FR-Q2-FY2019-20-cleanup-list, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Operations: Geoip lookup - Misidentifying country due to travelling - https://phabricator.wikimedia.org/T175691 (DStrine)
[17:11:10] HTTPS, Traffic, Operations: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559 (BBlack) That's kind of ridiculous...
[17:17:28] lol
[17:20:00] don't you know, we use betterworks no, not phabricator
[17:20:04] *now
[17:21:32] remember when phabricator got introduced so that we'd stop using different ticket systems in each team
[17:22:10] I pulled the comment and emailed them asking to create a phab account and respond
[17:23:10] cool. going by "email this address" i doubt they use any ticket system. unless it's an RT queue :)
[17:24:15] jynus: that is not helpful
[17:25:27] my comment wasn't very helpful either, sorry for starting the downhill slide there
[17:25:41] but it is kind of crazy, esp given the existing lengthy history on this trivial ticket :P
[17:26:05] ^ +1
[17:29:24] bblack: going to add the prepend to the routes we advertise to our peering/transit in eqsin
[17:29:39] XioNoX: do we need to?
[17:29:47] bblack: no
[17:29:56] I'd say hold unless we see a problem
[17:30:06] works for me :)
[17:30:17] we're already wondering about whether and how soon we might pull back on the other two sites' adverts given the MTU thing
[17:30:23] bblack: should we remove the prepend in esams?
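Background for the prepend exchange at 17:29: AS-path prepending repeats the originating ASN in the AS_PATH of a route announced to a chosen neighbour, so that remote networks whose earlier tie-breakers (such as LOCAL_PREF) come out equal see that path as longer and prefer another one. A toy Python sketch of only those two steps of BGP best-path selection; the `Route` type, helper names, and neighbour labels are hypothetical, and the ASNs come from the RFC 5398 documentation range rather than from this log.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """A received route, reduced to the attributes this toy comparison uses."""
    via: str
    local_pref: int
    as_path: list[int]


def prepend(as_path: list[int], own_asn: int, times: int) -> list[int]:
    """Return as_path with own_asn added `times` extra times at the front."""
    return [own_asn] * times + as_path


def best_route(routes: list[Route]) -> Route:
    """Pick the highest LOCAL_PREF, then the shortest AS_PATH.

    Real BGP continues with more tie-breakers (origin, MED, eBGP over iBGP,
    IGP cost, router ID); they are deliberately left out here.
    """
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


OWN_ASN = 64500  # documentation ASN standing in for the originating network
via_a = Route("transit-a", 100, [64501, OWN_ASN])
# Announcement to provider B was prepended twice before B passed it on.
via_b = Route("provider-b", 100, [64502] + prepend([OWN_ASN], OWN_ASN, 2))
print(best_route([via_b, via_a]).via)  # transit-a
```

The ordering of the key is the important design point: a remote network that sets a higher LOCAL_PREF on one neighbour will still pick that path however long the prepend is, which is why prepending only nudges, rather than dictates, inbound traffic.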
[17:30:30] and at least we'll have to add advmss to cp5xxx if we do turn it on
[17:30:31] ok
[17:30:47] I think decision still pending, need to ask CF some things, etc
[17:30:52] bblack: there is *some* traffic going through the eqsin tunnels, but it's really not much
[17:31:22] ah, I figured just the tunnel was configured but they weren't advertising yet, I hadn't checked
[17:31:32] hmmm
[17:31:47] I guess at this point, simplest answer will be to add cp5 to the mitigation list heh
[17:31:50] bblack: yeah they are advertising, but our regular transit/peering are preferred most of the time
[17:32:34] bblack: https://librenms.wikimedia.org/device/device=175/tab=port/port=18871/
[17:33:11] Traffic, Operations: GRE MTU mitigations - Tracking - https://phabricator.wikimedia.org/T232602 (BBlack)
[18:18:54] HTTPS, Traffic, Operations: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559 (MBeat33) Hi all, @Jseddon is on leave, but this is on his agenda for when he returns. I know he's engaged with Shopify about this issue.
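The advmss mitigation mentioned at 17:30 works by advertising a TCP MSS small enough that full-size segments still fit after tunnel encapsulation, instead of relying on path-MTU discovery through the GRE tunnel. A back-of-envelope Python sketch, assuming a 1500-byte path MTU and plain GRE over IPv4 (20-byte outer IPv4 header plus the 4-byte base GRE header, no checksum/key/sequence fields) and option-free inner IP and TCP headers; the values actually applied on the affected cache hosts are not stated in this log.

```python
IPV4_HEADER = 20      # bytes, no IP options
IPV6_HEADER = 40      # bytes, fixed header, no extension headers
TCP_HEADER = 20       # bytes, no TCP options
GRE_BASE_HEADER = 4   # bytes, GRE without checksum, key or sequence fields


def gre_inner_mtu(path_mtu: int = 1500, outer_header: int = IPV4_HEADER) -> int:
    """MTU left for the inner packet once outer IP + GRE headers are added."""
    return path_mtu - outer_header - GRE_BASE_HEADER


def advmss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload per segment that fits in `mtu`."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


inner = gre_inner_mtu()
print(inner, advmss(inner), advmss(inner, ipv6=True))  # 1476 1436 1416
```

On Linux, `advmss` is a per-route attribute in iproute2 (`ip route ... advmss N`), which is how a value like this can be applied to selected destinations without changing interface MTUs.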
[18:33:24] HTTPS, Traffic, Operations: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559 (BBlack) @MBeat33 + @Jseddon - Thank you for the update(s)
[22:04:04] Traffic, Analytics, Operations: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (Nuria)
[22:06:23] Traffic, Analytics, Operations: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (Nuria) This has the effect that these images are being considered content pageviews when they are just asset requests
[22:34:20] Traffic, Analytics, Operations: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (Nuria) I think we need to add proxy=googleweblight to x-analytics
[23:46:50] Traffic, Operations, Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (Krinkle)
[23:47:58] Traffic, Operations, Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (Krinkle)
[23:48:23] Traffic, Operations, Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (Krinkle)
[23:49:53] Traffic, Operations, Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (Krinkle) >>! At T232252, @Agusbou2015 wrote: > > I click on "Publish changes" (on any Wikimedia project) and changes are not saved. > Ste...
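Nuria's suggestion at 22:34 is to tag these requests in X-Analytics so that downstream pageview classification can exclude them even though they arrive as text/html. A rough Python sketch of how such a tag could be read and applied, assuming X-Analytics is a semicolon-separated list of key=value pairs; the function names and the classification rule are hypothetical illustrations, not the actual pageview definition.

```python
def parse_x_analytics(header: str) -> dict[str, str]:
    """Parse an assumed 'k1=v1;k2=v2' header; items without '=' are skipped."""
    fields: dict[str, str] = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def is_candidate_pageview(content_type: str, x_analytics: str) -> bool:
    """Toy rule: text/html responses count, unless tagged proxy=googleweblight."""
    if not content_type.lower().startswith("text/html"):
        return False
    return parse_x_analytics(x_analytics).get("proxy") != "googleweblight"


print(is_candidate_pageview("text/html; charset=utf-8", "ns=0;proxy=googleweblight"))  # False
print(is_candidate_pageview("text/html; charset=utf-8", "ns=0;page_id=42"))  # True
```

Tagging at the edge keeps the decision in one place: the content type stays whatever the proxy served, and consumers only need to check one extra key.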