[02:33:28] netops, Operations, fundraising-tech-ops: Deploy pfw policy to allow https to frmon.frdev.wikimedia.org - https://phabricator.wikimedia.org/T215364 (ayounsi) a: ayounsi Done. No diff in codfw, 1 new rule in eqiad.
[03:10:36] netops, Operations, fundraising-tech-ops: Deploy pfw policy to allow https to frmon.frdev.wikimedia.org - https://phabricator.wikimedia.org/T215364 (cwdent) Open→Resolved Working, thanks @ayounsi!
[08:15:16] netops, Operations, Performance-Team (Radar): Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 (Gilles) I've updated the Google Spreadsheet with the figures up to yesterday. It seems like nothing changed from the end users' perspective in terms of median time-to-firs...
[08:51:10] Traffic, Operations, Performance-Team, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) The simplest architecture really is to touch Swift objects on every retrieval, which can be done async, but the unknown is how much extra lo...
[09:42:51] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (akosiaris)
[09:43:02] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (akosiaris) p: Triage→Normal
[09:49:36] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (akosiaris)
[10:00:01] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (akosiaris) cp3030 seems to be in some trouble since approximately 04:30 [1] [1] https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&var-datasource=esams%20prometheus...
[10:00:46] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (akosiaris) p: Normal→High
[10:08:11] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (akosiaris) p: High→Low The restart of varnish-frontend on cp3030 indeed resolved the issue. I'll lower the priority but leave the task open. Feel free to resolve it, however.
[10:41:55] Traffic, Operations, Performance-Team, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (fgiunchedi) IIRC object expiration was considered years ago (i.e. https://wikitech.wikimedia.org/wiki/Swift/ObjectExpiration) and at the time consid...
[12:33:49] Traffic, Operations: esams cache layer mangles downloads of specific url - https://phabricator.wikimedia.org/T215389 (Vgutierrez) Checking the rest of the text cluster in esams from bast3002 showed that all of them were affected. After restarting varnish-frontend the issue is gone. I'll leave the task o...
[14:52:14] Traffic, Cloud-VPS, Operations, Toolforge, Patch-For-Review: Wikimedia varnish rules no longer exempt all Cloud VPS/Toolforge IPs from rate limits (HTTP 429 response) - https://phabricator.wikimedia.org/T213475 (Cyberpower678) This patch doesn't seem to actually fix the issue, just restructur...
[15:36:01] Traffic, Operations, Performance-Team, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) >>! In T211661#4931056, @fgiunchedi wrote: > And indeed I share the concerns already mentioned, namely making sure we're able to have a bound o...
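Cyberpower678's 14:52:14 comment on T213475 concerns clients that are supposed to be exempt from rate limiting but still get HTTP 429. As a rough illustration only (Python, not the production Varnish VCL), the sketch below shows how an exemption list that covers just part of an address space lets the uncovered addresses be counted and throttled like any other client. The CIDR ranges are hypothetical RFC 5737 documentation blocks, and `LIMIT`, `is_exempt` and `FixedWindowLimiter` are illustrative names, not anything from the actual configuration.

```python
import ipaddress
from collections import defaultdict

# Hypothetical exempt ranges taken from the RFC 5737 documentation blocks;
# these are NOT the real Cloud VPS/Toolforge allocations.
EXEMPT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers only .0 to .127
]

# Hypothetical per-client request budget within one window.
LIMIT = 3


def is_exempt(client_ip: str) -> bool:
    """Return True if client_ip falls inside any exempt network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in EXEMPT_NETWORKS)


class FixedWindowLimiter:
    """Naive fixed-window counter; reset() starts a new window."""

    def __init__(self, limit: int = LIMIT) -> None:
        self.limit = limit
        self.counts = defaultdict(int)

    def status_for(self, client_ip: str) -> int:
        """Return the HTTP status a request from client_ip would receive."""
        if is_exempt(client_ip):
            return 200
        self.counts[client_ip] += 1
        return 200 if self.counts[client_ip] <= self.limit else 429

    def reset(self) -> None:
        self.counts.clear()


if __name__ == "__main__":
    limiter = FixedWindowLimiter()
    # 198.51.100.200 sits outside the exempt /25, so it is throttled.
    print([limiter.status_for("198.51.100.200") for _ in range(4)])  # [200, 200, 200, 429]
    # 198.51.100.10 is inside the /25 and is never throttled.
    print([limiter.status_for("198.51.100.10") for _ in range(4)])  # [200, 200, 200, 200]
```

The fixed-window counter is only the simplest stand-in for "a rate limit"; it is not meant to mirror how the production Varnish rules count requests.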
[16:46:58] Traffic, Operations, Performance-Team, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) It's true that even if we only clean up a portion of thumbnails, we're already in a good place. The operational goal is to free up space at...
[16:49:11] Traffic, Operations, Performance-Team, media-storage: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (Gilles) What we can do is: if we see that the thumbnail already has an X-Delete-After header on GET, we update it. If it doesn't have the header, we r...
[18:03:30] netops, Operations: Spike of multicast traffic - https://phabricator.wikimedia.org/T212273 (ayounsi) My guess so far is that the recabling triggered a bug in Junos VCF which caused a multicast storm that got propagated to all listeners, filling up links and exhausting resources. After some research, ther...
[19:16:47] Traffic, Operations, ops-ulsfo: cp4026 correctable dimm error - https://phabricator.wikimedia.org/T214516 (RobH) robh@cp4026:~$ sudo ipmi-sel ID | Date | Time | Name | Type | Event 1 | Apr-23-2017 | 23:39:37 | SEL | Event Logging Disabled...
[19:22:53] netops, Operations, ops-eqiad, ops-eqsin: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (ayounsi) >>! In T205487#4932700, @Cmjohnson wrote: > @ayounsi This is the contents that I am shipping to eqsin. Please confirm that is all you need > > 12 SFP-10GLR Transceivers > (2) 3M LC-...
[19:55:33] Traffic, Operations, ops-ulsfo: cp4026 correctable dimm error - https://phabricator.wikimedia.org/T214516 (RobH) Open→Resolved Ok, things I did to fix this system so far: * set system and services/mgmt to maint mode for 2 hours * updated task with full SEL log output * powered off system * u...
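Gilles's messages on T211661 (08:51:10 and 16:49:11) describe touching thumbnails on retrieval, asynchronously, by refreshing an expiry header on GET. The sketch below illustrates that idea against a hypothetical in-memory store; `ThumbStore`, `TTL_SECONDS` and `refresh_worker` are illustrative names, not Swift's API. In Swift itself, X-Delete-After is accepted on PUT/POST and stored as an absolute X-Delete-At timestamp, so the sketch models expiry as an absolute time. Because the 16:49 message is truncated, objects without an expiry are deliberately left untouched rather than guessing what was proposed for them.

```python
from __future__ import annotations

import queue
import threading
import time

# Hypothetical expiry window (30 days); the real value is an operational choice.
TTL_SECONDS = 30 * 24 * 3600


class ThumbStore:
    """Hypothetical in-memory stand-in for an object store with expiry.

    Each object maps to (data, delete_at), where delete_at is an absolute
    Unix timestamp (analogous to Swift's X-Delete-At) or None if unset.
    """

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, float | None]] = {}
        self._lock = threading.Lock()
        self._touches: queue.Queue[str | None] = queue.Queue()

    def put(self, name: str, data: bytes, delete_after: float | None = None) -> None:
        # Like X-Delete-After, a relative TTL is stored as an absolute time.
        delete_at = time.time() + delete_after if delete_after is not None else None
        with self._lock:
            self._objects[name] = (data, delete_at)

    def delete_at(self, name: str) -> float | None:
        with self._lock:
            return self._objects[name][1]

    def get(self, name: str) -> bytes:
        with self._lock:
            data, delete_at = self._objects[name]
        if delete_at is not None:
            # Thumbnail already has an expiry: queue a refresh so the read
            # path does not block on the extra write.
            self._touches.put(name)
        else:
            # The log message describing this case is truncated, so this
            # sketch leaves objects without an expiry untouched.
            pass
        return data

    def refresh_worker(self) -> None:
        """Drain queued touches, pushing each expiry TTL_SECONDS ahead."""
        while True:
            name = self._touches.get()
            try:
                if name is None:  # sentinel from stop()
                    return
                with self._lock:
                    if name in self._objects:
                        data, _ = self._objects[name]
                        self._objects[name] = (data, time.time() + TTL_SECONDS)
            finally:
                self._touches.task_done()

    def wait_for_refreshes(self) -> None:
        self._touches.join()

    def stop(self) -> None:
        self._touches.put(None)


if __name__ == "__main__":
    store = ThumbStore()
    worker = threading.Thread(target=store.refresh_worker, daemon=True)
    worker.start()
    store.put("with-expiry.jpg", b"jpeg bytes", delete_after=60)
    store.put("without-expiry.jpg", b"jpeg bytes")
    store.get("with-expiry.jpg")
    store.get("without-expiry.jpg")
    store.wait_for_refreshes()
    print(store.delete_at("with-expiry.jpg") - time.time() > 60)  # True
    print(store.delete_at("without-expiry.jpg"))  # None
    store.stop()
    worker.join()
```

Queueing the refresh keeps the extra write off the read path, which is the "can be done async" part of the 08:51 message; how much extra load those writes add is exactly the open question raised there.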
[21:04:05] Traffic, Operations, Proton, Reading-Infrastructure-Team-Backlog, and 3 others: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (pmiazga) @Tgr I assume you're still waiting for answers from @ema? Is there anything I can help you with?
[22:35:55] Traffic, Operations, Proton, Reading-Infrastructure-Team-Backlog, and 3 others: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (Tgr) Yeah, but I don't think this task should be a blocker (for either handover or production switchover)....
[22:36:09] Traffic, Operations, Proton, Reading-Infrastructure-Team-Backlog, and 3 others: Document and possibly fine-tune how Proton interacts with Varnish - https://phabricator.wikimedia.org/T213371 (Tgr)