[07:53:44] Traffic, ops-magru, SRE, Patch-For-Review: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10351574 (ops-monitoring-bot) Draining ganeti7003.magru.wmnet of running VMs
[08:38:49] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10351628 (Fabfur)
[08:50:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[08:55:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on durum7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[09:20:00] FIRING: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[09:25:00] RESOLVED: AnycastHealthcheckerRestarted: anycast-healthchecker service restarted on doh7001:9100 - https://wikitech.wikimedia.org/wiki/Anycast#Anycast_healthchecker_not_running - https://grafana.wikimedia.org/d/dxbfeGDZk/anycast?orgId=1&var-protocol=BGP&var-site=magru&var-cluster=All&var-ip_version=All - https://alerts.wikimedia.org/?q=alertname%3DAnycastHealthcheckerRestarted
[10:21:20] netops, Cloud-Services, DC-Ops, Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10352059 (cmooney) Link has been clean since the optic was replaced: {F57745141 width=600} I'll sug...
[10:35:56] Traffic: Package and deploy ATS 9.2.6 - https://phabricator.wikimedia.org/T379797#10352115 (Fabfur) Upgraded in ulsfo with `cookbook sre.cdn.roll-upgrade-ats --query 'A:cp-ulsfo and not (P{cp4043.*} or P{cp4051.*})' --reason '9.2.6 upgrade T379797' --version '9.2.6-1wm2' upgrade`
[11:28:34] Traffic, ops-magru, SRE, Patch-For-Review: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10352383 (MoritzMuehlenhoff) All VMs have moved away from ganeti7003/ganeti7004 and I've switched them to the insetup::infrastruct...
[11:31:24] netops, Cloud-Services, DC-Ops, Infrastructure-Foundations, and 2 others: Replace optics in cloudsw1-d5-eqiad et-0/0/52 and cloudsw1-e4-eqiad et-0/0/54 - https://phabricator.wikimedia.org/T380503#10352382 (cmooney) Ok the BGP downpref policy has been reverted, and we have routed traffic back runn...
[13:14:54] Traffic, ops-magru, SRE, Patch-For-Review: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10352818 (RobH) {F57745607} {F57745609}
[13:55:20] Traffic, ops-magru, SRE, Patch-For-Review: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10353072 (RobH) So the first swap went with some issues, detailed in my followup to the ticket just now: > Support, > > Please...
[14:16:18] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10353154 (RobH)
[14:35:18] Traffic, ops-magru, SRE, Patch-For-Review: magru: Incorrect racking for magru hosts (F-25G and Custom Config interchanged) - https://phabricator.wikimedia.org/T376737#10353260 (RobH) Ok, they introduced another mistake: > Support, > > 1HR3PZ3 shows a power supply failure (please check the pow...
[16:59:10] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354319 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `ganeti7003.magru.wmnet` - ganeti70...
[17:10:21] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354380 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7006.magru.wmnet` - cp7006.magru...
[17:20:10] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354407 (RobH)
[17:24:01] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354419 (RobH)
[17:39:48] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354527 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `ganeti7004.magru.wmnet` - ganeti70...
[17:43:29] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354538 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7008.magru.wmnet` - cp7008.magru...
[17:43:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7001 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[17:48:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7001 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[17:59:07] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354620 (RobH)
[18:18:18] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354737 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `lvs7003.magru.wmnet` - lvs7003.mag...
[18:28:27] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354804 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin2002 for hosts: `cp7015.magru.wmnet` - cp7015.magru...
[18:43:46] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354896 (RobH)
[18:49:20] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354915 (Fabfur)
[18:55:36] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354939 (RobH)
[18:58:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7006 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[18:59:52] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354962 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host ganeti7003.magru.wmnet with OS...
[19:03:17] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354984 (BCornwall)
[19:03:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.224:443 @ cp7006 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[19:05:47] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10354992 (BCornwall)
[19:59:04] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355125 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin2002 for host ganeti7003.magru.wmnet with OS boo...
[19:59:21] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355138 (RobH)
[20:00:11] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355146 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin2002 for host ganeti7004.magru.wmnet with OS...
[20:15:33] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355192 (RobH)
[20:38:07] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355245 (RobH)
[20:52:00] Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: installation tracking for hosts affected by magru re-shuffle - https://phabricator.wikimedia.org/T380307#10355267 (RobH)
[23:15:00] sukhe: brett:
[23:15:33] sukhe: brett: (sorry hit enter too early) made calendar event for tomorrow, ~22 hours from now. Patch set we'll be working off of is up here https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094074/
[23:16:32] er, starting 20 hours from now not 22 hours
[23:27:40] one thing I'm unsure about: for step #4 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094074/ (adding the service catalog entry) I removed the eqiad-site-specific stuff. I'm not sure what the protocol will be when it's eventually time to add eqiad though; ie if we have codfw deployed to production already will weird stuff happen if we move from `production` to `lvs_setup` when doing the rolling restart for eqiad