[06:28:56] (HAProxyEdgeTrafficDrop) firing: 65% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:33:56] (HAProxyEdgeTrafficDrop) resolved: (2) 63% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:03:56] (HAProxyEdgeTrafficDrop) firing: 56% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:08:56] (HAProxyEdgeTrafficDrop) resolved: 57% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:49:35] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1ea26f52-695b-41ae-a3b4-28808d44161a) set by ayounsi@cumin1001 for 4:00:00 on 3 host(s) and th...
[08:16:58] (PyBalBGPUnstable) firing: (4) PyBal BGP sessions on instance lvs1017 are failing - TODO - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[09:29:19] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=acbab0ff-4998-42b3-b0ad-a6be933dfff6) set by ayounsi@cumin1001 for 4:00:00 on 3 host(s) and th...
[09:31:58] (PyBalBGPUnstable) resolved: (4) PyBal BGP sessions on instance lvs1017 are failing - TODO - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[09:46:58] (PyBalBGPUnstable) firing: (4) PyBal BGP sessions on instance lvs1017 are failing - TODO - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[10:46:58] (PyBalBGPUnstable) resolved: (4) PyBal BGP sessions on instance lvs1017 are failing - TODO - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[10:50:27] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=01f0d013-5101-4278-93a6-1ea49f9dea28) set by ayounsi@cumin1001 for 1:00:00 on 2 host(s) and th...
[11:20:40] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi) eqiad and eqord went extremely well.
Thanks @cmooney for the [[ https://wikitech.wikimedia.org/wiki/Juniper_RE_i40e_firmware | firmware instructions ]]
[11:21:27] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi)
[11:24:49] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade network devices to Junos 20+ - https://phabricator.wikimedia.org/T316539 (ayounsi)
[11:25:23] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi) Open→Resolved
[13:49:00] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (Jclark-ctr) cableid 2207506656 fpc7 - fpc5 cableid 2207506655 fpc2 - fpc6 cableid 2207506658 fpc7 - fpc3 Ayounsi did you...
[16:06:56] (HAProxyEdgeTrafficDrop) firing: (5) 19% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[16:11:56] (HAProxyEdgeTrafficDrop) resolved: (6) 40% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[16:13:56] (HAProxyEdgeTrafficDrop) firing: (6) 31% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[16:18:56] (HAProxyEdgeTrafficDrop) resolved: (6) 63% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[16:20:56] (HAProxyEdgeTrafficDrop) firing: (6) 27% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[16:23:16] (VarnishTrafficDrop) firing: (4) Varnish traffic in codfw has dropped 21.291095888934912% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[16:28:16] (VarnishTrafficDrop) firing: (11) Varnish traffic in codfw has dropped 16.499398776584002% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[16:30:56] (HAProxyEdgeTrafficDrop) resolved: (6) 21% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[16:33:16] (VarnishTrafficDrop) resolved: (12) Varnish traffic in codfw has dropped 16.499398776584002% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[16:49:03] Traffic, InternetArchiveBot, SRE: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (Markjgraham_hmb) I would very much appreciate the opportunity to speak with whoever is in charge of the SRE team. I am mark@archive.org (917) 697-0110
[17:43:01] Traffic, InternetArchiveBot, SRE: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (KOfori) Hello Mark, I will be in touch with you concerning this.
[18:36:40] Traffic, InternetArchiveBot, SRE: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (Markjgraham_hmb) I just spoke with Kwaku Addo Ofori via a video call. Thank you Kwaku and the entire SRE team for your care and attention! I am grateful for your efforts...
[18:46:27] Traffic, InternetArchiveBot, SRE: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (KOfori) Thanks for speaking to me, Mark. We can confirm the bit about the IP being added to the allow list next week but I believe the best way forward is for us to work to...
[18:52:51] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp - https://phabricator.wikimedia.org/T317244 (RobH)
[18:56:41] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp - https://phabricator.wikimedia.org/T317244 (RobH)
[20:26:38] Domains, Analytics-Radar, SRE, Traffic-Icebox, WMF-General-or-Unknown: Don't set cookies in traffic layer for non-user facing domains (avoid false third-party cookie warning) - https://phabricator.wikimedia.org/T262996 (Krinkle) >>! In T262996#8002643, @Nemo_bis wrote: > Is this related to "T...
[20:26:58] Domains, Analytics-Radar, SRE, Traffic-Icebox, and 2 others: Don't set cookies in traffic layer for non-user facing domains (avoid false third-party cookie warning) - https://phabricator.wikimedia.org/T262996 (Krinkle)
[20:42:57] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp - https://phabricator.wikimedia.org/T317244 (RobH)
[21:20:58] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp - https://phabricator.wikimedia.org/T317244 (RobH)
[21:59:06] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp - https://phabricator.wikimedia.org/T317244 (RobH)
[21:59:22] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp - https://phabricator.wikimedia.org/T317244 (RobH) a:BBlack→RobH
[23:56:36] Traffic, decommission-hardware: decommission cp4021 &n cp4027 - https://phabricator.wikimedia.org/T318963 (RobH) Brandon, Both of these hosts have had the decom script run, but they still have references in config files/repo for removal.
[23:56:55] Traffic, decommission-hardware: decommission cp4021 &n cp4027 - https://phabricator.wikimedia.org/T318963 (RobH)