[01:04:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[01:05:09] FIRING: [8x] LVSHighCPU: The host lvs5005:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[01:09:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[01:10:09] RESOLVED: [8x] LVSHighCPU: The host lvs5005:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5005 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[01:14:40] FIRING: [10x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[01:19:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:04:40] FIRING: [16x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:24:40] RESOLVED: [8x] VarnishHighThreadCount: Varnish's thread count on cp5025:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:36:23] netops, Infrastructure-Foundations, procurement, SRE, Patch-For-Review: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10321324 (ayounsi)
[08:42:52] Traffic, Patch-For-Review: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10321524 (Vgutierrez)
[08:53:35] Traffic: Upgrade haproxy to 2.8.12 on cp hosts - https://phabricator.wikimedia.org/T379891 (Vgutierrez) NEW
[08:53:46] Traffic: Upgrade haproxy to 2.8.12 on cp hosts - https://phabricator.wikimedia.org/T379891#10321554 (Vgutierrez) p:Triage→Medium
[09:24:49] Traffic: Upgrade haproxy to 2.8.12 on cp hosts - https://phabricator.wikimedia.org/T379891#10321669 (Fabfur) a:Fabfur
[11:13:49] Traffic: Upgrade haproxy to 2.8.12 on cp hosts - https://phabricator.wikimedia.org/T379891#10321938 (Fabfur) Open→In progress That's been already rolled out by @Vgutierrez on cp4044 and cp4052. On monday, if no worrisome events are noticed on these two hosts, we'll proceed with a gradual rollout acr...
[12:00:35] netops, Infrastructure-Foundations, netbox: Netbox: librenms report errors - https://phabricator.wikimedia.org/T379907 (ayounsi) NEW
[12:04:01] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435#10322139 (cmooney) >>! In T371435#10318507, @RobH wrote: > I'd hand this over to either John or Valerie as ops-eqiad for them to...
[13:29:20] netops, Infrastructure-Foundations, procurement, SRE, Patch-For-Review: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10322386 (RobH)
[14:11:28] Traffic, Observability-Alerting, SRE, Patch-For-Review: PuppetFailure alert is not being fired for host(s) where agent has failed - https://phabricator.wikimedia.org/T379807#10322574 (ssingh) Thanks for the investigation and fix @colewhite! >>! In T379807#10319559, @colewhite wrote: > The issue...
[14:51:18] netops, Infrastructure-Foundations, serviceops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10322697 (akosiaris) Cool, thanks. In that case, I randomly picked `wikikube-w...
[14:52:14] netops, Infrastructure-Foundations, serviceops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10322699 (akosiaris)
[15:44:47] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10322922 (ssingh)
[15:51:24] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10322960 (ssingh)
[15:52:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp4043:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=ulsfo%20prometheus/ops&var-instance=cp4043 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[16:09:04] sukhe: ^^is that you?
[16:09:12] that's cp4043 yes
[16:09:18] the 9.2.6 upgrade one
[17:27:48] Traffic, Data-Persistence, SRE, SRE-swift-storage, and 6 others: Change default image thumbnail size - https://phabricator.wikimedia.org/T355914#10323409 (Ladsgroup) Noting that we are starting to slowly drop all thumbnails in swift as a one-off clean up which would make the change in size of thu...
[17:47:25] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10323538 (Papaul) Open→Resolved This is done, re0 is now the master. Closing this task ` re0.cr1-eqiad> show chassis routing-engine Routing Engine statu...
[17:48:06] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10323542 (Papaul)
[19:39:39] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10324084 (Papaul)
[20:53:29] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 3 others: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10324366 (RobH)
[20:58:00] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 3 others: Decom prod infra side of the ulsfo-office link - https://phabricator.wikimedia.org/T379778#10324353 (RobH)
[21:23:01] Wikimedia-Apache-configuration, SRE, Traffic-Icebox, Wiki-Setup (Delete / Redirect): redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648#10324468 (Pppery)
[22:14:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp4043:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=ulsfo%20prometheus/ops&var-instance=cp4043 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[22:34:53] ^ host is depooled.
[22:35:07] brett: can you please downtime cp4043 for one day
[22:35:16] I am not near a laptop
[22:35:17] thanks
[22:35:49] ack
[22:37:35] done
[22:38:43] <3