[00:08:26] FIRING: [43x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:19:03] FIRING: PuppetFailure: Puppet has failed on cumin2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[00:37:50] FIRING: DiskSpace: Disk space build2001:9100:/ 1.43% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:08:41] FIRING: [43x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:11:53] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11920464 (Papaul)
[04:19:04] FIRING: PuppetFailure: Puppet has failed on cumin2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[04:37:50] FIRING: DiskSpace: Disk space build2001:9100:/ 1.429% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[05:03:33] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11920516 (Marostegui) @ayounsi @FCeratto-WMF dbprpxy hosts will need a `systemctl reload haproxy` after the maintenance is done, even if they are not active.
[08:08:41] FIRING: [43x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:12:35] RESOLVED: DiskSpace: Disk space build2001:9100:/ 1.435% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:19:03] FIRING: PuppetFailure: Puppet has failed on cumin2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:57:48] hi, just a small heads up, the homer repo seems to be failing to pull "due to local changes" on cumin2002. Not impacting me, just reporting it in case it was not known or expected.
[08:58:19] worried it could cause some issue if running an operation from that host
[09:47:40] jynus: sorry this is my fault, I was made aware yesterday and got distracted, I'll fix it now
[09:47:59] (it is the result of a manual chmod I did to try and fix an issue and as I could have predicted just made things worse)
[09:50:11] no worries
[09:50:17] no need to say sorry
[09:50:56] just was giving a heads up in case it could cause issues on network updates or anything
[09:58:49] RESOLVED: PuppetFailure: Puppet has failed on cumin2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[10:07:04] jynus: no probs thanks for the heads up, all fixed now thanks
[10:21:02] netops, Infrastructure-Foundations, SRE: Create single Homer BGP group template to cover all variants - https://phabricator.wikimedia.org/T349116#11921022 (cmooney) Open→Declined Closing this one for now. We do need to look a this, but also we need to review in light of having both Junip...
[10:24:42] netops, Infrastructure-Foundations, ServiceOps new: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11921066 (jcrespo) @ayounsi not urgent, but please ping me with a time of start of maintenance when you have it so I can do what I mention above in advance.
[10:33:56] netops, Infrastructure-Foundations, SRE, Traffic: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772#11921091 (cmooney) Open→Resolved a:cmooney I'm going to close this one. I think everyone is agreed cross-rack links to have an LVS peer...
[10:44:42] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Network telemetry - collect device sub-interface statistics with gnmic - https://phabricator.wikimedia.org/T424683#11921120 (cmooney) Open→Resolved Ok this is rolled out and working. I have tried to update our dashboards wher...
[10:49:19] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Network telemetry - collect device sub-interface statistics with gnmic - https://phabricator.wikimedia.org/T424683#11921125 (cmooney) {F81386939 width=600}
[11:15:41] netops, Infrastructure-Foundations: Apply egress Source Address Validation on the Wikimedia core routers - https://phabricator.wikimedia.org/T372158#11921238 (cmooney) This is basically [[ https://datatracker.ietf.org/doc/html/rfc2827 | BCP38 ]] right? >>! In T372158#10097098, @Southparkfan wrote: >>>...
[12:08:41] FIRING: [43x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:03:26] FIRING: [44x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:54:08] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11921702 (Jhancock.wm)
[14:08:22] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11921799 (Jhancock.wm)
[14:08:32] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11921800 (Jhancock.wm)
[14:12:08] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11921818 (Jhancock.wm)
[14:16:06] Packaging, Abstract Wikipedia team, function-evaluator, Infrastructure-Foundations, Abstract Wikipedia Fix-It tasks: Package rustc from forky for wikimedia-bookworm so we can use it in an image like abstractwiki-rust - https://phabricator.wikimedia.org/T425341#11921828 (Jdforrester-WMF) Provi...
[14:17:02] Packaging, Abstract Wikipedia team, function-evaluator, Infrastructure-Foundations, Abstract Wikipedia Fix-It tasks: Package rustc from forky for wikimedia-bookworm so we can use it in an image like abstractwiki-rust - https://phabricator.wikimedia.org/T425341#11921832 (Jdforrester-WMF)
[14:42:53] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: FY2526 Q3 ulsfo: switch refresh - https://phabricator.wikimedia.org/T408510#11921972 (RobH)
[15:15:57] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11922147 (Jhancock.wm)
[15:37:47] Mail, Infrastructure-Foundations, SRE: Replace Exim with Postfix on mail servers - https://phabricator.wikimedia.org/T325394#11922256 (bd808)
[15:42:28] Mail, Infrastructure-Foundations, SRE: Replace Exim with Postfix on mail servers - https://phabricator.wikimedia.org/T325394#11922304 (bd808)
[16:43:40] netops, Infrastructure-Foundations, Patch-For-Review: POPs - free up 2xQSFP ports - https://phabricator.wikimedia.org/T424611#11922700 (cmooney) @RobH we've disabled the following links as part of the work on this, and will need to remove the current fibres and optic modules before we re-use the port...
[17:03:41] FIRING: [43x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:06:00] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Install new line card in cr2-eqiad slot 0, move card from slot 1 to cr1-eqiad slot 0 and configure - https://phabricator.wikimedia.org/T426343 (cmooney) NEW p:Triage→Medium
[17:06:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Install new line card in cr2-eqiad slot 0, move card from slot 1 to cr1-eqiad slot 0 and configure - https://phabricator.wikimedia.org/T426343#11922831 (cmooney)
[20:12:45] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11923273 (Papaul)
[20:34:59] netops, Infrastructure-Foundations, Data-Platform-SRE (2026-04-24 - 2026-05-15): Create depool hiera keys for cirrussearch hosts - https://phabricator.wikimedia.org/T426228#11923343 (bking) Open→In progress p:Triage→Medium a:bking
[20:42:06] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11923368 (Papaul)
[21:03:41] FIRING: [43x] SystemdUnitFailed: user@11984.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:45:43] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11923791 (Papaul)
[22:46:35] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11923792 (Papaul) Open→Resolved This is complete. thanks to @Jhancock.wm and @Jgreen