[05:45:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp1114:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqiad&var-instance=cp1114 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[05:55:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp1114:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqiad&var-instance=cp1114 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:25:39] Traffic, DNS: authdns-update failing - https://phabricator.wikimedia.org/T428541 (Marostegui) NEW
[06:30:50] Traffic, DNS, SRE: authdns-update failing - https://phabricator.wikimedia.org/T428541#11997424 (FCeratto-WMF) This is also preventing https://gerrit.wikimedia.org/r/c/operations/dns/+/1286411 from being merged
[06:35:02] Traffic, DNS, SRE: authdns-update failing - https://phabricator.wikimedia.org/T428541#11997431 (Marostegui) p:Triage→High
[06:58:20] Traffic, DNS, SRE: authdns-update failing - https://phabricator.wikimedia.org/T428541#11997473 (AlexisJazz) I'm seeing "Failed to fetch notifications." when I try to view my notifications on enwiktionary, is this related? I'm also seeing ``https://en.wiktionary.org/api/rest_v1/page/html/non_sequitur...
[07:08:05] Traffic, DNS, SRE: authdns-update failing - https://phabricator.wikimedia.org/T428541#11997504 (elukey) The error is: ` netbox/codfw.wmnet:539 dse-k8s-wdqs2002.codfw.wmnet. A 10.192.47.10 netbox/codfw.wmnet:540 dse-k8s-wdqs2002.codfw.wmnet. AAAA 2620:0:860:125:10:192:46:9 netbox/codfw.wm...
[07:10:22] Traffic, DNS, SRE: authdns-update failing - https://phabricator.wikimedia.org/T428541#11997506 (elukey) I think the DNS name for https://netbox.wikimedia.org/ipam/ip-addresses/23579/ was 2002 instead of 2001, fixed it.
[07:24:11] Traffic, DNS, SRE: authdns-update failing - https://phabricator.wikimedia.org/T428541#11997541 (elukey) Open→Resolved a:elukey Confirmed, all good! The fix was https://netbox.wikimedia.org/extras/changelog/280023/ @AlexisJazz it shouldn't be related, it was just a minor DNS update th...
[13:21:58] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11999553 (ayounsi) Open→Resolved a:ayounsi Maintenance done, all servers except Ganeti and the ones mentioned by @jcrespo have been...
[13:26:48] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw: pod AB switches upgrade (2026) - https://phabricator.wikimedia.org/T426197#11999603 (ayounsi)
[13:27:42] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11999608 (MatthewVernon) ms swift in codfw looks OK after this work, thanks.
[13:51:39] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11999792 (ops-monitoring-bot) VM netflow2004.codfw.wmnet switching disk type to drbd
[14:04:20] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11999859 (ops-monitoring-bot) VM rpki2003.codfw.wmnet switching disk type to drbd
[14:08:00] netops, Traffic, Discovery-Search, Infrastructure-Foundations, and 3 others: codfw: rack A4 maintenance - https://phabricator.wikimedia.org/T427357#11999886 (MoritzMuehlenhoff) All Ganeti nodes are back in service
[14:38:50] Traffic, DNS, SRE: 10.67.28.73 reverse DNS showing 2(SERVFAIL) - https://phabricator.wikimedia.org/T428573#12000128 (cmooney) Thanks @Marostegui for the task.
There are a few components here that need to work. Firstly our authdns servers are responsible for the entire 10.0.0.0/8 range (10.in-addr.a...
[15:51:30] Traffic, Infrastructure-Foundations: eqsin: re-image rack 604 servers on new vlan - https://phabricator.wikimedia.org/T428229#12000666 (Papaul) @BCornwall hello just wanted to follow up on this to know when do you think you or your team will be done with the re-image. Our goal is to have all the re-image...
[16:06:00] Traffic, Infrastructure-Foundations: eqsin: re-image rack 604 servers on new vlan - https://phabricator.wikimedia.org/T428229#12000762 (BCornwall) Hi! We'll be working on these shortly, I was planning on starting today.
[17:52:36] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#12001497 (ssingh) Update: Based on the discussion in the Traffic meeting, we would like to pursue the option of getting the additional CPU in July 2026. CC @wiki_willy and @RobH -- please let us kno...
[17:54:49] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#12001508 (RobH) >>! In T414411#12001497, @ssingh wrote: > Update: Based on the discussion in the Traffic meeting, we would like to pursue the option of getting the additional CPU in July 2026. CC @w...
[18:02:07] Traffic, DNS, SRE: 10.67.28.73 reverse DNS showing 2(SERVFAIL) - https://phabricator.wikimedia.org/T428573#12001514 (cmooney) Seems the IP is in use for //mediawiki-dumps-legacy//: ` cmooney@dse-k8s-ctrl1001:~$ sudo kubectl get pods -o wide --all-namespaces | grep 10.67.28.73 mediawiki-dumps-legacy...
[18:33:38] Traffic, Infrastructure-Foundations, Patch-For-Review: eqsin: re-image rack 604 servers on new vlan - https://phabricator.wikimedia.org/T428229#12001641 (Papaul) Thank you
[19:23:20] Traffic, Diff-blog, Technical Blog: Redirect techblog.wikimedia.org to diff.wikimedia.org - https://phabricator.wikimedia.org/T417940#12001842 (CKoerner_WMF) >>! In T417940#11986184, @Aklapper wrote: > ? Are these followup tasks tracked somewhere? Ah! Thank you for identifying these. I'll talk to th...
[19:58:04] Traffic, Diff-blog, Technical Blog: Tech blog migration clean up - https://phabricator.wikimedia.org/T428675 (LGoto) NEW
[19:59:44] Traffic, Diff-blog, Technical Blog: Redirect techblog.wikimedia.org to diff.wikimedia.org - https://phabricator.wikimedia.org/T417940#12002012 (LGoto) Hi @Aklapper we've pulled the followup work into this subtask: https://phabricator.wikimedia.org/T428675 Please feel free to add anything we've missed...
[20:04:10] Traffic, Diff-blog, Technical Blog: Tech blog migration clean up - https://phabricator.wikimedia.org/T428675#12002085 (LGoto)
[20:13:09] Traffic, Diff-blog, Technical Blog: Tech blog migration clean up - https://phabricator.wikimedia.org/T428675#12002116 (LGoto)
[21:06:18] Traffic, DNS, SRE: 10.67.28.73 reverse DNS showing 2(SERVFAIL) - https://phabricator.wikimedia.org/T428573#12002497 (cmooney) @CDanis not sure if you have any thoughts here? I think because this is a job and not a service endpoint there is no DNS created. And from a bit of brief reading it doesn't...