[04:10:44] Traffic, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) Open→Resolved a:tstarling
[05:29:56] Traffic, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) Here's a model of the benefit of the multi-DC project for users west of codfw. The servers are 30ms closer, but codfw seems a bit slower, so if yo...
[07:48:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 69.54865519134279% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[07:49:56] (HAProxyEdgeTrafficDrop) firing: 54% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:53:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 35.038340426029436% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[08:09:34] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3b336fa4-f522-4b10-abdb-d6be83f6a04a) set by ayounsi@cumin2002 for 2:00:00 on 3 host(s) and th...
[08:16:57] (PyBalBGPUnstable) firing: PyBal BGP sessions on instance lvs3007 are failing - https://grafana.wikimedia.org/d/000000488/pybal-bgp?var-datasource=esams%20prometheus/ops&var-server=lvs3007 - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[08:18:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in esams has dropped 2.0071758680392695% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[08:21:56] (PyBalBGPUnstable) firing: (3) PyBal BGP sessions on instance lvs3005 are failing - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[08:34:56] (HAProxyEdgeTrafficDrop) resolved: 67% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:37:56] (HAProxyEdgeTrafficDrop) firing: 68% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:45:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e0d9eb2b-5520-4f80-912e-3627c94e9982) set by ayounsi@cumin2002 for 2:00:00 on 3 host(s) and th...
[08:46:57] (PyBalBGPUnstable) resolved: (3) PyBal BGP sessions on instance lvs3005 are failing - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[08:47:56] (HAProxyEdgeTrafficDrop) resolved: 67% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:56:57] (PyBalBGPUnstable) firing: (6) PyBal BGP sessions on instance lvs3005 are failing - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[09:03:56] (HAProxyEdgeTrafficDrop) firing: 67% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:08:56] (HAProxyEdgeTrafficDrop) resolved: 64% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:31:44] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=ff1db65d-a6ee-4e20-ae07-837bbe264b2f) set by ayounsi@cumin2002 for 2:00:00 on 2 host(s) and th...
[09:31:57] (PyBalBGPUnstable) resolved: (3) PyBal BGP sessions on instance lvs3005 are failing - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[09:53:36] netops, Infrastructure-Foundations, SRE: Overlay VRF / VXLAN traffic failure between lsw1-f2-eqiad and lsw1-f3-eqiad - https://phabricator.wikimedia.org/T315038 (cmooney) Open→Resolved So after quite a bit of back-and-forth with Juniper and pulling logs etc. they say they can't see anything i...
[10:05:27] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi) cr2-esams and cr3-knams got upgraded as expected. cr3-esams failed as it requires a firmware upgrade, and only JTAC can provide us the firmware. We wi...
[10:06:38] Traffic, MediaWiki-Core-HTTP-Cache, MediaWiki-Page-history, SRE, and 3 others: History pages' caches not being invalidated after edits - https://phabricator.wikimedia.org/T317064 (Vgutierrez) Open→Resolved a:Joe ` #before the edit vgutierrez@carrot:~$ curl "https://test.wikipedia.org/...
[10:07:03] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi)
[10:16:16] (VarnishTrafficDrop) firing: Varnish traffic in drmrs has dropped 68.51628278229387% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[10:17:56] (HAProxyEdgeTrafficDrop) firing: 57% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:21:16] (VarnishTrafficDrop) firing: Varnish traffic in drmrs has dropped 44.905538705573406% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[10:50:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[10:51:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in drmrs has dropped 56.8118579280126% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[10:52:41] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney) The firmware provided by Juniper seems to be accepted by cr3-esams: ` cmooney@re0.cr3-esams> show system firmware | match "^Part|version|i40" Part...
[10:52:56] (HAProxyEdgeTrafficDrop) resolved: 64% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:52:58] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[14:59:29] Hi folks - I have managed to get the Clinic Duty untriaged jobs panel almost down to a sensible level, but there are still 4 Traffic jobs needing a priority - would you mind adding a priority to T122097 T300247 T315536 and T315676 please?
[14:59:30] T315536: Create program to interact with Atlas RIPE API - https://phabricator.wikimedia.org/T315536
[14:59:30] T315676: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676
[14:59:30] T122097: Set expiry time for GeoIP cookies - https://phabricator.wikimedia.org/T122097
[14:59:31] T300247: Remove old and unused libvarnishapi - https://phabricator.wikimedia.org/T300247
[15:01:00] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul) I was having the issue below upgrading mr1 to version 21 ` Validating against /config/rescue.conf.gz /config/rescue.conf.gz:61:(21) syntax error at 'rfc-co...
[15:01:30] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul)
[15:01:54] netops, Infrastructure-Foundations, SRE: Upgrade management routers and switches to Junos 21 - https://phabricator.wikimedia.org/T316529 (Papaul) ` papaul@mr1-codfw> show version Hostname: mr1-codfw Model: srx300 Junos: 21.2R3-S2.9 JUNOS Software Release [21.2R3-S2.9] `
[15:16:15] ...if you'd rather I'd just marked them all "medium" and/or remove the SRE tag, do say :)
[16:03:30] Emperor: I think you should be fine just triaging as medium for them all. We can always bump them up/down as needed. Thanks for doing that!
[16:03:49] oh, I'll add the priorities as was asked :)
[16:04:23] Traffic, SRE, Patch-For-Review: Create program to interact with Atlas RIPE API - https://phabricator.wikimedia.org/T315536 (BCornwall) p:Triage→Medium
[16:04:38] Traffic, SRE: Remove old and unused libvarnishapi - https://phabricator.wikimedia.org/T300247 (BCornwall) p:Triage→Medium
[16:04:56] Traffic, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, SRE: Set expiry time for GeoIP cookies - https://phabricator.wikimedia.org/T122097 (BCornwall) p:Triage→Medium
[16:04:59] Traffic, SRE, Patch-For-Review: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (BCornwall) p:Triage→Medium
[16:05:51] brett: brilliant, thanks :)
[16:23:18] Traffic, MediaWiki-Uploading: ATS 502 on uploading non-small files - https://phabricator.wikimedia.org/T299160 (Krinkle)
[16:23:42] Traffic, Commons, MediaWiki-Uploading, SRE-swift-storage, and 2 others: 502 Server Hangup Error on esams for "Upload a new version of this file" on Special:Upload on Commons - https://phabricator.wikimedia.org/T247454 (Krinkle)
[17:40:19] netops, Cloud-Services, Infrastructure-Foundations, SRE: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (Andrew) ` root@cloudcontrol2005-dev:~# dig +noall +answer SOA 16-29.57.15.185.in-addr.arpa. 16-29.57.15.185.in-addr.arpa. 120 IN SOA ns0.openstack.codfw1dev.wikime...
[18:23:57] (HAProxyEdgeTrafficDrop) firing: 65% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[18:28:57] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[20:09:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 66.52701816576015% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[20:10:56] (HAProxyEdgeTrafficDrop) firing: 65% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[20:14:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 54.79928980193598% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[20:29:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in esams has dropped 57.9655603164252% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[20:30:56] (HAProxyEdgeTrafficDrop) resolved: 65% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop