[05:33:24] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11979903 (ayounsi)
[05:34:19] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11979906 (ayounsi)
[05:34:28] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11979908 (ayounsi)
[07:38:27] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A5 maintenance - https://phabricator.wikimedia.org/T428020#11980094 (MLechvien-WMF)
[08:11:05] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11980301 (ayounsi)
[08:15:50] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11980302 (ayounsi)
[08:19:50] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11980303 (ayounsi) @cmooney see task description, let me know what you think of: * eqiad A/B : 208.80.155.32/28 - 2620:0:861:6::/64 * eqiad C/D : 208.80.155.48/28 - 2620:0:861:7::/64 *...
[08:22:29] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11980305 (cmooney) >>! In T422043#11980303, @ayounsi wrote: > @cmooney see task description, let me know what you think of: > * eqiad A/B : 208.80.155.32/28 - 2620:0:861:6::/64 > * eqiad...
[08:45:04] Traffic, Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11980368 (isarantopoulos)
[08:48:48] Traffic, Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11980377 (DPogorzelski-WMF) but in that case we would see a good amount of 404 since some mo...
[08:59:50] Traffic, Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11980405 (isarantopoulos) You are right, but given that we don't currently have production l...
[09:16:49] Traffic, SRE: WE5.2.13 Dumps UA enforcement - https://phabricator.wikimedia.org/T427836#11980480 (SLyngshede-WMF)
[09:20:59] Traffic, ServiceOps new, Machine-Learning-Team (Q4 FY2025-26): k8s changes needed to allow article topic (and other future isvcs) to use the kserve v2 inference protocol (and gRPC) - https://phabricator.wikimedia.org/T424049#11980492 (elukey) @isarantopoulos o/ before closing this task let's make sur...
[10:20:24] Traffic, Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11980745 (DPogorzelski-WMF) we are already in an active/active setup so it would be enough t...
[10:26:16] netops, Infrastructure-Foundations, Observability-Alerting, SRE, Patch-For-Review: Improve port-utilisation alerting to take QoS into account - https://phabricator.wikimedia.org/T384052#11980758 (cmooney) Open→Resolved a:cmooney Gonna close this one. Alert is in place and fir...
[10:30:45] netops, Infrastructure-Foundations, SRE: Change codfw dns hosts BGP peering to top-of-rack switch - https://phabricator.wikimedia.org/T376894#11980777 (cmooney) Open→Declined I'm going to close this one for now. Given we are moving the dns hosts to new vlans under T422043, during which t...
[10:31:06] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11980783 (cmooney)
[10:31:08] netops, Infrastructure-Foundations, SRE: Change codfw dns hosts BGP peering to top-of-rack switch - https://phabricator.wikimedia.org/T376894#11980784 (cmooney)
[10:35:01] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11980807 (ayounsi)
[12:38:21] Traffic, Liberica, Machine-Learning-Team, Prod-Kubernetes, and 2 others: Migrate ML k8s apiserver and services to IPIP - https://phabricator.wikimedia.org/T420438#11981196 (elukey) @isarantopoulos not incredibly urgent but Service Ops is asking teams to complete it before the end of Q1 (earlier i...
[12:41:28] Traffic, ServiceOps new, Machine-Learning-Team (Q4 FY2025-26): k8s changes needed to allow article topic (and other future isvcs) to use the kserve v2 inference protocol (and gRPC) - https://phabricator.wikimedia.org/T424049#11981211 (isarantopoulos) ack, will make sure to do that!
[12:56:42] Traffic, Beta-Cluster-Infrastructure, SRE: haproxy in Beta cluster has invalid config - https://phabricator.wikimedia.org/T428052#11981250 (Urbanecm_WMF)
[12:57:48] Traffic, Liberica, Machine-Learning-Team, Prod-Kubernetes, and 2 others: Migrate ML k8s apiserver and services to IPIP - https://phabricator.wikimedia.org/T420438#11981252 (isarantopoulos) @elukey Thanks for the ping! We'll make sure to do it and we'll reach out in case we have any questions.
[13:11:29] FIRING: HAProxyRestarted: HAProxy server restarted on cp2043:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cp2043&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[13:11:37] yeah
[13:19:30] Traffic, Lift-Wing, Semantic Search, Essential-Work, Machine-Learning-Team (Q4 FY2025-26): Transparent DNS Routing for LiftWing Services (eqiad vs Multi-DC) - https://phabricator.wikimedia.org/T422253#11981295 (elukey) @DPogorzelski-WMF the discovery records resolve to the IP of eqiad or codf...
[13:21:29] RESOLVED: HAProxyRestarted: HAProxy server restarted on cp2043:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=codfw%20prometheus/ops&var-instance=cp2043&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[13:36:38] ^ it's me
[13:59:27] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11981430 (ayounsi)
[13:59:57] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11981442 (ayounsi)
[14:00:50] netops, Infrastructure-Foundations: Create public vlans in eqiad and codfw - https://phabricator.wikimedia.org/T422043#11981458 (ayounsi)
[16:53:39] Traffic, Beta-Cluster-Infrastructure, SRE: Beta cluster haproxy does not support `warn-blocked-traffic-after` keyword - https://phabricator.wikimedia.org/T428052#11982409 (ssingh) >>! In T428052#11982183, @bd808 wrote: > The Beta Cluster cache nodes are Debian Bullseye running HAProxy version 2.8.18-...
[18:49:24] Traffic: Remove Digicert CAA records from most domains - https://phabricator.wikimedia.org/T428093 (BCornwall) NEW
[18:49:57] Traffic: Remove Digicert CAA records from most domains - https://phabricator.wikimedia.org/T428093#11982788 (BCornwall) Open→In progress p:Triage→Low
[19:00:52] Traffic, Patch-For-Review: Remove Digicert CAA records from most domains - https://phabricator.wikimedia.org/T428093#11982860 (ssingh) > CAA works at the subdomain level so we can set the records for payments.wikimedia.org to allow them their issuance while removing the unnecessary records for the rest o...
[19:02:16] Traffic, Patch-For-Review: Remove Digicert CAA records from most domains - https://phabricator.wikimedia.org/T428093#11982863 (taavi) >>! In T428093#11982860, @ssingh wrote: > So `payments` is a CNAME and hence we can't add a CAA record for it. Can we not add a CAA record to the CNAME targets (payments-...
[19:03:41] Traffic, Patch-For-Review: Remove Digicert CAA records from most domains - https://phabricator.wikimedia.org/T428093#11982872 (ssingh) >>! In T428093#11982863, @taavi wrote: >>>! In T428093#11982860, @ssingh wrote: >> So `payments` is a CNAME and hence we can't add a CAA record for it. > > Can we not ad...
[19:20:55] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, SRE: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11982896 (BCornwall) When pooling cp5032 I noticed that connection to kafka-jumbo1016.eqiad.wmnet:9093 (`10.64.154.15 via 10.132.1.1 de...
[19:26:35] brett: for https://phabricator.wikimedia.org/T427393#11982896 I think it's because https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common.yaml#L1651 needs updating
[19:26:58] XioNoX: Aha, thank you!
[19:37:23] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, and 2 others: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11982956 (BCornwall) @ayounsi helpfully pointed out that I needed to update hieradata/common.yaml with the new IP addresses. Thanks!
[20:43:11] also remember to update wmf-config/reverse-proxy.php in operations/mediawiki-config.git for any new ranges cp hosts will be in
[21:22:01] oh but of course...
[21:26:10] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, and 2 others: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11983173 (BCornwall)
[21:26:56] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, and 2 others: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11983180 (BCornwall) I was advised by @taavi to also update mediawiki-config's `wmf-config/reverse-proxy.php` ranges. I've updated...
[21:27:34] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, and 2 others: EQSIN: Setup VRRP on both routers for the new subnets - https://phabricator.wikimedia.org/T427393#11983181 (BCornwall)
[23:53:04] Traffic, Data-Persistence: Move thumbnail caching from upload cluster to text - https://phabricator.wikimedia.org/T427465#11983527 (Ladsgroup) Moving images to under the subdomain of the wikis bring a lot of complexities. These are things to come to mind right now (and there might be more): How to do CSP...