[09:39:05] Traffic: Provide debian packages for liberica - https://phabricator.wikimedia.org/T376600#10343022 (Vgutierrez) Open→Resolved
[09:39:07] Traffic: Test liberica BGP support - https://phabricator.wikimedia.org/T375464#10343019 (Vgutierrez) Open→Resolved liberica is using BGP as expected on lvs1013
[09:39:38] Traffic: harden liberica systemd service units - https://phabricator.wikimedia.org/T378341#10343024 (Vgutierrez) In progress→Resolved
[09:40:41] Traffic: bring katran to liberica - https://phabricator.wikimedia.org/T380450 (Vgutierrez) NEW
[09:41:18] Traffic: bring katran to liberica - https://phabricator.wikimedia.org/T380450#10343040 (Vgutierrez) p:Triage→High
[10:18:33] netops, Infrastructure-Foundations: Lumen codfw-ulsfo down (Nov 2024) - https://phabricator.wikimedia.org/T380451 (ayounsi) NEW p:Triage→High
[10:19:44] netops, Infrastructure-Foundations: Lumen codfw-ulsfo down (Nov 2024) - https://phabricator.wikimedia.org/T380451#10343097 (ops-monitoring-bot) ===== Automated diagnostic for Netbox circuit ID 102 --- **Interface cr4-ulsfo:xe-0/1/1** - admin-status: up - ⚠️ oper-status: down - interface-flapped: 2024-...
[10:24:58] netops, Infrastructure-Foundations: Lumen codfw-ulsfo down (Nov 2024) - https://phabricator.wikimedia.org/T380451#10343105 (ayounsi) > Ticket ID 30654682 has been successfully created.
[11:07:10] Traffic, MediaWiki-Core-HTTP-Cache, MediaWiki-REST-API, MW-Interfaces-Team, and 2 others: Determine http cache control and active purging for REST endpoints serving parsoid output - https://phabricator.wikimedia.org/T308424#10343277 (MSantos)
[11:23:30] Traffic, Citoid, Editing-team, RESTBase Sunsetting, serviceops: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10343341 (MSantos)
[11:41:25] netops, Ceph, Infrastructure-Foundations, SRE, Patch-For-Review: Configure DSCP marking for cloudceph* hosts - https://phabricator.wikimedia.org/T371501#10343409 (BTullis)
[11:44:51] Traffic, MediaWiki-Core-HTTP-Cache, MediaWiki-REST-API, MW-Interfaces-Team, and 2 others: Determine http cache control and active purging for REST endpoints serving parsoid output - https://phabricator.wikimedia.org/T308424#10343419 (MSantos) a:DAlangi_WMF→None
[12:43:58] netops, Traffic, Infrastructure-Foundations: eqiad/esams/drmrs LVS: use Netbox BGP flag - https://phabricator.wikimedia.org/T380469 (ayounsi) NEW
[12:56:33] Traffic, Citoid, Editing-team, RESTBase Sunsetting, serviceops: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10343780 (Mvolz) Once we do the switchover, Citoid endpoint: /api?format=&search= Becomes optional, as that's...
[13:10:32] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10343802 (Fabfur)
[13:45:43] Traffic, Data-Platform-SRE: Allow TLS authenticated client to write on new topics - https://phabricator.wikimedia.org/T380373#10343910 (brouberol) Just to make sure, that'd be on `kafka-jumbo`?
[13:52:45] netops, Infrastructure-Foundations: Lumen codfw-ulsfo down (Nov 2024) - https://phabricator.wikimedia.org/T380451#10343916 (cmooney) Seems to be back up: ` Nov 21 13:51:02 cr4-ulsfo rpd[29932]: RPD_OSPF_NBRUP: OSPF neighbor (realm ospf-v2 xe-0/1/1.0 area state changed from Exchang...
[14:18:48] hi traffic, i'd like to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092914 today, any preferences on when or how?
[14:23:26] whenever you want, I'm here in case I can help!
[14:42:25] Traffic, Data-Platform-SRE: Allow TLS authenticated client to write on new topics - https://phabricator.wikimedia.org/T380373#10344115 (Fabfur) >>! In T380373#10343910, @brouberol wrote: > Just to make sure, that'd be on `kafka-jumbo`? Yes, haproxykafka will only use kafka-jumbo
[14:42:45] ok, net effect of the change is we add one log-only rule to the cp-text servers there
[14:44:35] 👍
[14:46:04] ... oh lmao
[14:46:26] ?
[14:47:04] I broke a dashboard :D
[14:47:06] fixing it now
[14:47:32] https://i.imgur.com/3hIMVjs.png
[14:47:56] https://i.imgur.com/5fxJMJp.png
[14:47:57] curious
[14:48:03] it makes perfect sense actually
[14:49:04] sum(rate(haproxy_backend_bytes_out_total{proxy=~"tls|requestctl_filters", site="$site", cluster=~"cache_$cluster"}[5m])) by (instance)
[14:49:47] ok, it is deployed on all codfw cp-text
[14:56:24] Traffic, conftool, Patch-For-Review: Integrate requestctl haproxy rules into our TLS terminator - https://phabricator.wikimedia.org/T370745#10344157 (CDanis) Now deployed on all cp hosts in codfw. This of course caused a minor dashboard issue, in all plots that use `haproxy_backend_` metrics and lo...
[15:09:03] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 4 others: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10344210 (VRiley-WMF) A 10G connection has been placed into port 25 a...
[15:14:31] netops, Infrastructure-Foundations: Lumen codfw-ulsfo down (Nov 2024) - https://phabricator.wikimedia.org/T380451#10344249 (cmooney) Open→Resolved a:cmooney From Lumen: ` 2024-11-21 12:38:45 GMT - Splicing has commenced at the south end. Service affecting alarms are expected to begin clea...
[16:32:36] netops, Cloud-Services, DC-Ops, Infrastructure-Foundations, ops-eqiad: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503 (cmooney) NEW p:Triage→High The #Cloud-Services project tag is not intended to have any...
[16:35:55] netops, Cloud-Services, DC-Ops, Infrastructure-Foundations, ops-eqiad: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10344824 (JJMC89)
[16:42:05] netops, Cloud-Services, DC-Ops, Infrastructure-Foundations, ops-eqiad: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10344918 (dcaro)
[17:04:29] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, and 4 others: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10345111 (cmooney) >>! In T379790#10344210, @VRiley-WMF wrote: > A 10...
[17:16:20] Traffic, Data-Platform-SRE: Allow TLS authenticated client to write on new topics - https://phabricator.wikimedia.org/T380373#10345275 (elukey) +1 from my side!
[17:45:43] Traffic, Data-Platform-SRE: Allow TLS authenticated client to write on new topics - https://phabricator.wikimedia.org/T380373#10345543 (Fabfur) Open→Resolved Done from kafka-jumbo1009
[17:48:02] netops, Cloud-Services, DC-Ops, Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10345576 (dcaro)
[17:54:55] Traffic, DC-Ops, ops-codfw, SRE: codfw: cp2038 Correctable memory error on DIMM A3 - https://phabricator.wikimedia.org/T308459#10345600 (Papaul) Resolved→Open a:Papaul→Jhancock.wm Reopen this task since we are now seeing the error on DIMM B3. @Jhancock.wm since this server is out...
[17:57:11] Traffic, DC-Ops, ops-codfw, SRE: codfw: cp2038 Correctable memory error on DIMM A3 - https://phabricator.wikimedia.org/T308459#10345617 (Jhancock.wm) We have that on hand. @Vgutierrez (or anyone else in traffic) when is a good time to do this swap?
[17:58:38] Traffic, DC-Ops, ops-codfw, SRE: codfw: cp2038 Correctable memory error on DIMM A3 - https://phabricator.wikimedia.org/T308459#10345625 (ssingh) Hi Jenn. The host has been depooled so you can do it whenever you want. Thanks!
[18:05:43] Traffic, DC-Ops, ops-codfw, SRE: codfw: cp2038 Correctable memory error on DIMM A3 - https://phabricator.wikimedia.org/T308459#10345652 (Jhancock.wm) replaced with a new DIMM.
coming up now
[18:07:13] Traffic, DC-Ops, ops-codfw, SRE: codfw: cp2038 Correctable memory error on DIMM A3 - https://phabricator.wikimedia.org/T308459#10345658 (Jhancock.wm) a:Jhancock.wm→Papaul
[18:08:40] FIRING: VarnishPrometheusExporterDown: Varnish Exporter on instance cp2038:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[18:09:15] ^ downtimed
[20:18:40] RESOLVED: VarnishPrometheusExporterDown: Varnish Exporter on instance cp2038:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[20:19:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on @ cp2038 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=codfw&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[20:19:54] ^ server rebooted after maintenance
[20:21:31] fixed
[20:24:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on @ cp2038 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=codfw&var-cluster=cache_upload - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[20:25:18] Traffic, DC-Ops, ops-codfw, SRE: codfw: cp2038 Correctable memory error on DIMM A3 - https://phabricator.wikimedia.org/T308459#10346277 (ssingh) Open→Resolved
[20:40:35] Traffic, conftool, Patch-For-Review: Integrate requestctl haproxy rules into our TLS terminator - https://phabricator.wikimedia.org/T370745#10346333 (CDanis) The
good news: this works, we now see some `hap:` prefix rules appearing in x-requestctl. The bad news: especially on text, we aren't reusing...
[20:42:07] Traffic, conftool, Patch-For-Review: Integrate requestctl haproxy rules into our TLS terminator - https://phabricator.wikimedia.org/T370745#10346337 (Reedy)
[23:27:19] Hi, I'm working on getting puppet patches going for wdqs *internal* graph split services that we are working on launching
[23:27:34] (We've done this previously for public wdqs, now it's the same drill for wdqs-internal basically)
[23:28:05] Are there any dependencies for step 3 where I allocate the VIPs in netbox? See https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only) and https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox
[23:29:29] I've got the pages open in netbox to create the new IPs, (and I'm aware that I need to commit the dns changes afterwards once I actually press the buttons), but just wanted to check if anything needs to be in place before I allocate the IPs. (for ex the backend servers aren't ready yet etc, I don't think that's an issue but wanted to check)