[00:12:25] FIRING: SystemdUnitFailed: sync-puppet-volatile.service on puppetserver2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:22:25] FIRING: [3x] SystemdUnitFailed: sync-puppet-volatile.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:27:25] FIRING: [5x] SystemdUnitFailed: sync-puppet-volatile.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:32:25] FIRING: [5x] SystemdUnitFailed: sync-puppet-volatile.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:37:25] FIRING: [5x] SystemdUnitFailed: sync-puppet-volatile.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:42:25] FIRING: [6x] SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:47:25] FIRING: [6x] SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:42:31] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10139563 (Papaul) >>! In T371434#10120335, @cmooney wrote: >>>! In T371434#10119784, @Papaul wrote: >> The diagram below will ou...
[03:55:47] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587 (Papaul) NEW
[03:55:56] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10139580 (Papaul) p:Triage→Medium
[03:56:49] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: codfw:frack:rack/install/configuration new switches - https://phabricator.wikimedia.org/T374587#10139581 (Papaul)
[04:27:25] FIRING: [2x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:11:35] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10139747 (ABran-WMF) ES replication source in the path has been moved (T374592), all remaining hosts are depoolable
[08:27:25] FIRING: [2x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:04:59] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10139909 (ABran-WMF) >>! In T374523#10136865, @cmooney wrote: >>>! In T374523#10136856, @ABran-WMF wrote: >> I'll get to T374425 to get...
[09:14:02] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10139921 (cmooney) >>! In T374523#10139909, @ABran-WMF wrote: > We can add it to today's maintenance if you're up to it. Let me know so...
[09:14:33] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10139922 (ABran-WMF) ack, adding it to the pile
[10:22:25] FIRING: [2x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:42:25] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:47:25] FIRING: [2x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:40:25] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614 (cmooney) NEW p:Triage→Medium
[13:03:12] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614#10140660 (cmooney)
[13:22:07] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#10140725 (elukey) Nasty issue found for sretest2001: T365167#10140713 In the provision cookbook we loop through t...
[13:22:50] netops, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619 (cmooney) NEW p:Triage→Low
[13:34:43] netops, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10140791 (cmooney)
[13:56:21] netops, Infrastructure-Foundations, SRE, Traffic: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10140887 (ssingh)
[13:57:46] netops, Infrastructure-Foundations, SRE, Traffic: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#10140885 (ssingh) Thanks for filing this task! This is indeed something we have discussed in the past but not formally so let's use this task to do th...
[14:47:30] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack, Patch-For-Review: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#10141141 (elukey) >>! In T365372#10140725, @elukey wrote: > Nasty issue found for sretest2001: T365167#10140713 >...
[15:27:35] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141318 (jcrespo) I've stopped codfw media backups. @cmooney Would it be possible to get preferencial time on mai...
[15:40:50] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141367 (ABran-WMF) @cmooney all nodes have been depooled
[15:48:55] FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:50:48] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141418 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=bb570977-8737-4373-95ac-3765685f6e5e) set...
[15:51:39] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141420 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5073d83c-c18b-41a0-aa78-a6da63b209f9) set...
[16:19:05] netops, collaboration-services, DC-Ops, Infrastructure-Foundations, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10141576 (cmooney) Everything moved successfully, all ports up on the new switch and everything responding to ping a...
[16:30:38] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Move db2209 uplink from asw-c5-codfw to lsw1-c5-codfw - https://phabricator.wikimedia.org/T374523#10141626 (cmooney) Will re-schedule for Tuesday Sep 17th
[16:30:51] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: Migrate servers in codfw racks C6 & C7 from asw to lsw - https://phabricator.wikimedia.org/T373101#10141621 (cmooney) Open→Resolved a:cmooney
[17:22:54] netops, fundraising-tech-ops, Infrastructure-Foundations: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. - https://phabricator.wikimedia.org/T373942#10141995 (Jgreen)
[17:33:25] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Move sretest2002 primary uplink to asw-d4-codfw - https://phabricator.wikimedia.org/T370475#10142054 (cmooney) So once we have completed the move for D4 next Tuesday I have a (hopefully) small request. Could the sretest2002 uplinks...
[18:18:53] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Move sretest2002 primary uplink to asw-d4-codfw - https://phabricator.wikimedia.org/T370475#10142127 (Jhancock.wm) yeah can do
[19:48:55] FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:27:41] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: lvs2011: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370891#10142876 (Papaul) a:Papaul
[23:28:05] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: lvs2013: move uplink to lsw1-c2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370927#10142877 (Papaul) a:Papaul
[23:48:55] FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed