[08:10:29] netops, Ceph, Infrastructure-Foundations: cephosd advertised v6 prefix flapping - https://phabricator.wikimedia.org/T376697 (ayounsi) NEW
[08:48:29] thank you XioNoX for the review. Just one last question: there are some nodes on Icinga called "ripe-atlas-" and "ripe-atlas- IPv6". I believe they are just placeholders for the checks we want to decommission. As far as I can see, these placeholders add an additional ICMP check to the RIPE Atlas anchor, but they don't provide any extra information. Please correct me
[08:48:31] if I’m wrong. Can we delete both the checks and the placeholders?
[09:08:50] SRE-tools, Data-Persistence-SRE, Spicerack: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701 (ABran-WMF) NEW
[09:12:46] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack, Patch-For-Review: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701#10209765 (ABran-WMF) Open→In progress
[09:29:04] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack, Patch-For-Review: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701#10209803 (ABran-WMF) p:Triage→Medium
[10:09:15] SRE-tools, Infrastructure-Foundations, serviceops-radar: Race condition on puppetdb in sre.hosts.rename cookbook - https://phabricator.wikimedia.org/T374351#10209938 (Clement_Goubert) Open→Resolved I don't think this has reoccurred during the rest of the rename campaign, resolving
[10:52:00] FIRING: [2x] CertAlmostExpired: Certificate for service cloudidm2001-dev:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#cloudidm2001-dev:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[12:43:55] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Spicerack: expand Supermicro support in the Redfish module - https://phabricator.wikimedia.org/T365372#10210397 (elukey) The new version of the cookbook is deployed, I am running it on insetup hosts listed in T376121 so we can apply the sa...
[12:47:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:55:59] SRE-tools, Data-Persistence-SRE, Spicerack: mysql_legacy: SQL query quote escape - https://phabricator.wikimedia.org/T376712 (ABran-WMF) NEW
[13:00:03] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack, Patch-For-Review: mysql_legacy: SQL query quote escape - https://phabricator.wikimedia.org/T376712#10210469 (ABran-WMF) p:Triage→Medium
[13:01:10] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10210484 (elukey)
[13:11:32] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10210529 (elukey)
[13:34:41] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack, Patch-For-Review: mysql_legacy: SQL query quote escape - https://phabricator.wikimedia.org/T376712#10210606 (ABran-WMF)
[13:35:05] SRE-tools, Data-Persistence-SRE, Infrastructure-Foundations, Spicerack, Patch-For-Review: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701#10210609 (ABran-WMF)
[13:58:04] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Support listing pooled / active authdns hosts (rather than all) - https://phabricator.wikimedia.org/T375014#10210766 (ssingh) >>! In T375014#10205990, @Volans wrote: > @ssingh what do you think of the above draft patch proposal?...
[14:16:43] SRE-tools, DC-Ops, Infrastructure-Foundations, Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10210910 (elukey)
[14:33:05] tappof: they're also to check that the Atlas are up and running by answering to pings, so not just a placeholder
[14:33:50] as long as there is a prometheus check that does something similar it's fine to remove it
[14:42:39] ok, XioNoX thank you
[14:52:00] FIRING: [2x] CertAlmostExpired: Certificate for service cloudidm2001-dev:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#cloudidm2001-dev:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[15:18:32] 11:12:59 <+jinxer-wm> FIRING: [8x] CertAlmostExpired: Certificate for service lsw1-e5-eqiad.mgmt.eqiad.wmnet:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[15:32:56] cdanis: thx!
[15:33:01] https://www.irccloud.com/pastebin/rv35wJg1/
[15:35:19] and did the same for the other 5 switches
[15:49:54] netops, Infrastructure-Foundations, SRE: Upgrade Management routers to 23.4R2-S2 - https://phabricator.wikimedia.org/T369504#10211237 (Papaul)
[15:50:08] netops, Infrastructure-Foundations, SRE: Upgrade Management routers to 23.4R2-S2 - https://phabricator.wikimedia.org/T369504#10211239 (Papaul)
[15:59:03] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10211299 (ayounsi) About phase 1. I checked the pfw1 config and steps here. Gave some feedback over IRC. Overall lgtm. I didn't check pha...
[16:47:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:57:25] FIRING: [3x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:07:25] FIRING: [3x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:12:25] FIRING: [3x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:52:15] Mail, Infrastructure-Foundations: Email Verification - https://phabricator.wikimedia.org/T376739 (EveBlevins) NEW
[18:47:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:52:00] FIRING: [2x] CertAlmostExpired: Certificate for service cloudidm2001-dev:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#cloudidm2001-dev:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[19:09:21] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10212159 (Papaul) @Jgreen @Dwisehaupt when do you think you will have time to relocate the 4 servers in the table that have "YES" on the...
[20:39:46] Mail, Infrastructure-Foundations, SRE: Lisa@wikipedia.org is receiving a large number of donor responses - https://phabricator.wikimedia.org/T375643#10212511 (Dzahn) Based on the example content, I am thinking maybe those few users just understand how it works. So that @wikipedia.org won't go to her...
[20:49:56] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10212539 (Papaul)
[20:57:00] Mail, Infrastructure-Foundations: Email Verification - https://phabricator.wikimedia.org/T376739#10212568 (Reedy) Open→Stalled >I would like it to automatically filter out the emails that are not active or are returning blank/bouncing back. You would like to filter these out where? Who is this...
[21:08:19] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10212583 (Papaul)
[22:37:41] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10212771 (Papaul) @Jhancock.wm we are going to put civi2001 on the new switch on port 7 since on U6 we have a 2U server so we will just be...
[22:38:38] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10212778 (Papaul)
[22:52:00] FIRING: [2x] CertAlmostExpired: Certificate for service cloudidm2001-dev:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#cloudidm2001-dev:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired