[03:31:10] 10Traffic, 10Operations, 10Performance-Team (Radar): Some HTTP requests for MW failing due to "ERR_SPDY_PROTOCOL_ERROR 200" - https://phabricator.wikimedia.org/T220022 (10matmarex) Has anyone seen this issue again in the past two weeks? If not, the VE patch might have fixed it…
[04:45:49] 10Traffic, 10netops, 10Operations, 10Wikimedia-General-or-Unknown: Numerous people reporting issues saving edits and viewing previews/diffs - https://phabricator.wikimedia.org/T232491 (10Marostegui) 05Open→03Resolved Going to close this for now. Feel free to reopen if needed.
[10:40:48] ema: it looks like cp3034 has a much larger failed fetch error rate
[10:40:50] than the others
[10:41:10] do you know what's wrong?
[10:41:19] just curious, I ran into it
[10:42:48] effie: ENOEMA ;) cc vgutierrez
[10:43:11] that's interesting
[10:43:27] cp3034 is running ATS as the TLS termination layer
[10:43:35] effie: where are you seeing that?
[10:45:53] let me find it again
[10:49:46] hmmm https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&var-datasource=esams%20prometheus%2Fops&var-cache_type=upload&var-server=All&var-layer=frontend&from=1568576222591&to=1568803747192
[10:49:47] that one?
[10:50:19] https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?panelId=3&fullscreen&orgId=1&var-datasource=esams%20prometheus%2Fops&var-cache_type=upload&var-server=All&var-layer=frontend&from=now-6h&to=now
[10:50:28] yeah
[10:51:52] wonderful
[10:53:27] it's consistent for all the instances running ATS-TLS instead of nginx
[10:53:34] 🍿
[10:54:05] at least it is consistent :p
[10:54:07] 10Traffic, 10Operations, 10Wikidata, 10serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10WMDE-leszek) @Dzahn the instance was no longer in use, so I've just deleted it.
[10:54:22] should I file a task?
[10:54:32] please :)
[10:54:36] haha
[10:54:39] I was going to do it otherwise
[10:54:43] sure
[10:55:18] T231433 that's a nice parent
[10:55:19] T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[10:56:29] tx
[10:56:54] I'm wondering if....
[10:57:46] let me know the task id..
[10:57:53] sure
[10:59:35] I've got a suspect already... let me check the TS source code...
[11:01:16] 10Traffic, 10Operations: Higher failed fetches error rate on some caching servers - https://phabricator.wikimedia.org/T233205 (10jijiki)
[11:01:52] 10Traffic, 10Operations: Higher failed fetches error rate on some caching servers - https://phabricator.wikimedia.org/T233205 (10jijiki)
[11:01:55] 10Traffic, 10Operations: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10jijiki)
[11:20:24] hmmm
[11:20:53] something is telling me that the Proxy-Connection: close header sent by ATS is messing with varnish-fe
[11:21:51] https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?panelId=5&fullscreen&orgId=1&var-datasource=eqsin%20prometheus%2Fops&var-cache_type=upload&var-server=All&var-layer=frontend
[11:23:50] or even worse, with the ats-be behind varnish-fe :)
[11:49:56] yup... Proxy-Connection looks like the culprit :)
[12:18:23] 10Traffic, 10Operations: Higher failed fetches error rate on some caching servers - https://phabricator.wikimedia.org/T233205 (10Vgutierrez) It looks like ats-tls setting `Proxy-Connection` to `Close` is messing with varnish-fe<-->ats-be connections as can be seen in https://grafana.wikimedia.org/d/000000352...
[12:26:28] 10Traffic, 10Operations: Higher failed fetches error rate on some caching servers - https://phabricator.wikimedia.org/T233205 (10Vgutierrez) 05Open→03Resolved p:05Triage→03Normal a:03Vgutierrez Solved by preventing Proxy-Connection from spreading across varnish-fe and ats-be, thanks for reporting the...
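The resolution above works by preventing the Proxy-Connection header from spreading between hops. As a minimal illustration in Python (not the actual ATS/Varnish configuration, and all function names here are hypothetical): RFC 7230 treats Proxy-Connection and Connection, plus any header they name, as hop-by-hop headers that a proxy must consume rather than forward. A TLS terminator that forwards `Proxy-Connection: close` downstream can cause the next hop to tear down its connections, which matches the failed-fetch pattern diagnosed in T233205.

```python
# Sketch of hop-by-hop header stripping per RFC 7230; names are illustrative.
HOP_BY_HOP = {
    "connection", "proxy-connection", "keep-alive", "te",
    "trailer", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers):
    """Return a copy of `headers` that is safe to forward to the next hop."""
    # Headers listed inside Connection/Proxy-Connection are also hop-by-hop.
    named = set()
    for name, value in headers.items():
        if name.lower() in ("connection", "proxy-connection"):
            named.update(tok.strip().lower() for tok in value.split(","))
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in HOP_BY_HOP and name.lower() not in named
    }

incoming = {
    "Host": "upload.wikimedia.org",
    "Proxy-Connection": "close",  # the header implicated in T233205
    "Accept": "image/webp,*/*",
}
print(strip_hop_by_hop(incoming))  # Proxy-Connection is dropped
```

In the real deployment the equivalent stripping happens in the proxy layer itself; the sketch only shows why dropping the header at each hop keeps `close` from propagating from ats-tls through varnish-fe to ats-be.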
[12:26:32] 10Traffic, 10Operations: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[13:46:19] 10netops, 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10elukey) Thanks to the awesome work of @Jclark-ctr an-presto1001 and an-presto1003 are now reimaged, but an-p...
[13:47:59] 10netops, 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10elukey) Proposed fix for asw2-b: ` delete interfaces interface-range cloud-hosts1-b-eqiad member xe-4/0/5 s...
[13:55:48] 10netops, 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10akosiaris) >>! In T225128#5503053, @elukey wrote: > Proposed fix for asw2-b: > > ` > delete interfaces inte...
[14:02:56] 10netops, 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10elukey) Committed: ` elukey@asw2-b-eqiad# show | compare [edit interfaces interface-range vlan-cloud-hosts1...
[14:41:40] 10netops, 10Analytics, 10Analytics-Kanban, 10Operations, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10elukey) Ok so current status: * All hosts reimaged to buster and working * Renamed hostnames in netbox * Wai...
[14:46:20] 10netops, 10Analytics, 10Analytics-Kanban, 10Operations, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (10Ottomata) AWESOME thank youuuu
[17:28:54] 10Traffic, 10Operations, 10ops-esams: rack/setup/install cp30[50-65].esams.wmnet - https://phabricator.wikimedia.org/T233242 (10RobH) p:05Triage→03Normal
[17:29:05] 10Traffic, 10Operations, 10ops-esams: rack/setup/install cp30[50-65].esams.wmnet - https://phabricator.wikimedia.org/T233242 (10RobH)
[18:45:15] 10Wikimedia-Apache-configuration, 10Performance-Team, 10Patch-For-Review: Apache configuration: SVGs served by MediaWiki aren't gzipped - https://phabricator.wikimedia.org/T232615 (10Krinkle)
[19:20:47] 10Traffic, 10Analytics, 10Operations: We are not capturing IPs of original requests for proxied requests from operamini and googleweblight. x-forwarded-for is null and client-ip is the same as IP on Webrequest data - https://phabricator.wikimedia.org/T232795 (10herron) p:05Triage→03Normal
[20:34:36] 10netops, 10Operations: Instability of the Level3 link between cr2-eqiad and cr2-esams - https://phabricator.wikimedia.org/T228827 (10ayounsi) 05Open→03Resolved From Level3: > I appreciate your patience while we worked on gathering the data on these repair tickets. I've attached the repair ticket log abov...
[21:13:59] 10Traffic, 10Operations, 10Wikidata, 10serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10Dzahn) Thanks @WMDE-leszek! I think we can close this now @BBlack @Vgutierrez I don't see more leftovers to clean up.
[21:16:06] 10Traffic, 10netops, 10Operations: Configure interface damping on primary links - https://phabricator.wikimedia.org/T196432 (10ayounsi) 05Open→03Resolved All primary links of all transport pairs now have damping configured.
[21:56:27] 10netops, 10DC-Ops, 10Operations, 10ops-eqiad: Check for faulty optic asw-c-eqiad to cr1-eqiad - https://phabricator.wikimedia.org/T233265 (10Cmjohnson)
[23:14:31] 10netops, 10DC-Ops, 10Operations, 10ops-eqiad: Check for faulty optic asw-c-eqiad to cr1-eqiad - https://phabricator.wikimedia.org/T233265 (10Cmjohnson) Swapped both optics on cr1-eqiad and asw2-c xe-2/045. Giving it 24 hours to see if any errors return.