[09:19:01] netops, Operations, SRE-swift-storage: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (ayounsi) Symptoms are a bit similar to T269313 but I don't think it's the same issue as the switch port is showing dropped multicast traffic for no reason. ` asw-d-codfw> show interfaces...
[09:24:12] netops, Operations, SRE-swift-storage, ops-codfw: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey)
[09:25:18] netops, Operations, SRE-swift-storage, ops-codfw: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey) @Papaul Hi! happy new year :) When you are in can you ping me or Filippo to swap the DAC between ms-be2050 and asw-d-codfw?
[09:25:29] netops, Operations, SRE-swift-storage, ops-codfw: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (elukey) p:Triage→Medium
[09:26:21] Traffic, Operations, Performance-Team: Enable webp thumbnails on all images for non-Commons wikis - https://phabricator.wikimedia.org/T269946 (Joe) @jbond you can't remove the operations tag without also removing the #traffic one. Not sure if it should given @Peachey88's comment above.
[09:30:12] Traffic, Operations, Technical-blog-posts: 3rd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T270074 (ema) Open→Resolved a:ema >>! In T270074#6701946, @srodlund wrote: > @ema these should all be fixed now. :-) I'll send...
[09:45:37] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1) - https://phabricator.wikimedia.org/T264398 (ema) Varnish 6.0.0 does not seem to be affected by the regression, here is the average `webperf_navtiming_responsestart...
[09:53:39] Traffic, Operations, ops-eqiad: Interface errors on asw2-a-eqiad:xe-4/0/7 (lvs1016) - https://phabricator.wikimedia.org/T271087 (ayounsi) p:Triage→Medium
[11:38:17] Traffic, netops, Operations, User-jbond: varnish filtering: should we automatically update public_cloud_nets - https://phabricator.wikimedia.org/T270391 (ayounsi) 2 other options: * Define a list of ASNs and get the matching prefixes from BGP (or API like RIPE stats) * Define a list of ASNs and g...
[14:39:00] Traffic, Operations, Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (ssingh)
[14:39:24] Traffic, Operations, Patch-For-Review: Integration tests for Wikidough - https://phabricator.wikimedia.org/T267424 (ssingh) Open→Resolved With https://gerrit.wikimedia.org/r/639838 merged, I am going to mark this as resolved as the first iteration of the test suite for Wikidough is now comple...
[15:29:32] netops, Operations, SRE-swift-storage, ops-codfw: ms-be2050 shows network errors - https://phabricator.wikimedia.org/T271041 (Papaul) Open→Resolved a:Papaul ` Queue: 8, Forwarding classes: mcast Queued: Packets : 0 0 pps B...
[16:41:57] Acme-chief, LDAP, cloud-services-team (Kanban): acme-chief ldap certs required chained (with intermediate CA) versions suddenly - https://phabricator.wikimedia.org/T271063 (dancy)
[17:08:41] Traffic, Operations: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625 (ema) All that CPU time is spent in the kernel, and specifically calling `mmap` a lot. I've seen `ksys_mmap_pgoff` featured prominently in `perf report` of affected nodes, and tracing for 10 secon...
[18:31:54] Acme-chief, Patch-For-Review: Add simple script for account creation - https://phabricator.wikimedia.org/T207372 (Andrew) This script is now in place on acme-chief cloud hosts as /usr/local/bin/create_acme_le_account.py I'm not sure if this task represented a grander ambition than that.
[19:00:57] Traffic, Operations, SRE-tools, User-crusnov: Some Traffic clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271144 (crusnov)
[19:37:25] Traffic, Operations, SRE-tools, IPv6, User-crusnov: Some Traffic clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271144 (Aklapper)
[20:22:08] netops, Operations, SRE-tools, IPv6, User-jbond: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn)
[20:22:10] netops, Operations, SRE-tools, IPv6, User-jbond: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn) adding netops because ping* offload servers are in their domain, right?
[20:22:50] netops, Analytics-Clusters, Operations, SRE-tools, and 2 others: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (Dzahn)
[20:34:35] netops, Operations, SRE-tools, IPv6, User-jbond: Some Foundation clusters do not appear to support IPv6 - https://phabricator.wikimedia.org/T271136 (elukey) Removing the Analytics tag since kafka-main servers are managed by SRE (it is the codfw cluster for the jobqueue etc..) :)
[22:43:17] Acme-chief, LDAP, cloud-services-team (Kanban): acme-chief ldap certs required chained (with intermediate CA) versions suddenly - https://phabricator.wikimedia.org/T271063 (Bstorm) p:Triage→Medium I think this is resolved now. I'll leave it open for a little while for people to disagree.