[02:13:55] FIRING: SystemdUnitFailed: upload_puppet_facts.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:13:55] FIRING: SystemdUnitFailed: upload_puppet_facts.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:01:39] SRE-tools, Infrastructure-Foundations: SREBatchBase: Print allowed aliases in help message - https://phabricator.wikimedia.org/T375590 (MoritzMuehlenhoff) NEW
[07:01:46] SRE-tools, Infrastructure-Foundations: SREBatchBase: Print allowed aliases in help message - https://phabricator.wikimedia.org/T375590#10174190 (MoritzMuehlenhoff) p:Triage→Medium
[09:20:14] SRE-tools, Infrastructure-Foundations: SREBatchBase: Print allowed aliases in help message - https://phabricator.wikimedia.org/T375590#10174602 (Volans) With the current API that's not possible because `allowed_aliases` is an instance property (not a class property) of the runner class, not the cookbook...
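Volans's point about `allowed_aliases` being an instance property rather than a class property can be illustrated in plain Python. This is only a sketch: every class and function name below is hypothetical and is not the spicerack/cookbook API. A help builder that only has the class in hand sees the `property` descriptor object, not the alias list, whereas a class attribute would be readable without instantiating the runner.

```python
from typing import List


class HypotheticalRunner:
    """Hypothetical stand-in for an SREBatchBase-style runner class."""

    def __init__(self, aliases: List[str]) -> None:
        # The aliases only exist once a runner object has been created.
        self._aliases = aliases

    @property
    def allowed_aliases(self) -> List[str]:
        """Instance property: needs a runner instance to produce a value."""
        return self._aliases


class HypotheticalRunnerWithClassAttr:
    """Hypothetical variant exposing the aliases as a class attribute."""

    allowed_aliases: List[str] = ["alias-a", "alias-b"]


def build_help(runner_cls: type) -> str:
    """Hypothetical help builder that only has access to the class, not an instance."""
    attr = getattr(runner_cls, "allowed_aliases")
    if isinstance(attr, property):
        # Accessed on the class, a property returns the descriptor itself, not a value.
        return "allowed aliases: <unknown until the runner is instantiated>"
    return f"allowed aliases: {', '.join(attr)}"


print(build_help(HypotheticalRunner))
print(build_help(HypotheticalRunnerWithClassAttr))
```

Exposing the aliases at class level (or via a classmethod) is one way to make them available when the help text is built; whether that fits the real runner design is not something the log settles.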
[10:13:55] FIRING: SystemdUnitFailed: upload_puppet_facts.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:38:57] SRE-tools, Infrastructure-Foundations: Add warning when provision cookbook is ran without the virtualization flag on hypervisors - https://phabricator.wikimedia.org/T344342#10175144 (ayounsi) Open→Resolved a:ayounsi
[13:25:43] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10175439 (Papaul)
[13:28:50] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10175445 (Papaul)
[13:41:58] Hello. We would like to create some more LDAP groups for the Airflow migration project and have their membership match some existing posix groups. e.g. `analytics-admins`
[13:42:43] Is there any existing tooling for checking that this membership of the two groups matches and alerting us if there is a discrepancy?
[14:06:42] btullis: o/ the only thing that I know is modules/openldap/files/cross-validate-accounts.py
[14:07:01] elukey: Cool, thanks. I will look into that.
[14:07:48] but it is tailored for ops/adm/etc.. some functions could easily adapt to your use case
[14:08:10] not sure if that file's scope is ok for the airflow project
[14:08:46] but maybe another timer/script that alerts DE could run on the same nodes
[14:08:56] just to separate concerns
[14:13:55] FIRING: SystemdUnitFailed: upload_puppet_facts.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:53:58] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10176000 (Jhancock.wm)
[14:56:36] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10176005 (Jhancock.wm) @papaul there's two more not marked in the comments that do not have 10G cards, but they are being decommed. civi20...
[15:46:03] netops, Infrastructure-Foundations, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10176212 (ayounsi) Resolved→Open ` cr3-ulsfo> show system alarms 1 alarms currently active Alarm time Class Description 2024-09-25 13:11:42 UTC Minor FPC 0 Min...
[15:51:19] Mail, Infrastructure-Foundations, SRE: Lisa@wikipedia.org is receiving a large number of donor responses - https://phabricator.wikimedia.org/T375643 (nisrael) NEW
[15:52:57] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, and 2 others: codfw:frack:servers migration task - https://phabricator.wikimedia.org/T375151#10176239 (Papaul) @Jhancock.wm thank you no worry on civi2001 and frpig2001. So we have a total of 8 servers that are running on 1G and we...
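The LDAP/posix membership check asked about at 13:42 comes down to a set comparison. The following is a minimal sketch under stated assumptions, not the logic of cross-validate-accounts.py: it assumes the member lists of the LDAP group and the matching posix group have already been fetched, and the helper names and sample users (`alice`, `bob`, `carol`) are hypothetical placeholders for the real lookups.

```python
from typing import Dict, Iterable, Set


def membership_diff(ldap_members: Iterable[str], posix_members: Iterable[str]) -> Dict[str, Set[str]]:
    """Return the users found in only one of the two groups (hypothetical helper)."""
    ldap_set = set(ldap_members)
    posix_set = set(posix_members)
    return {
        "only_in_ldap": ldap_set - posix_set,
        "only_in_posix": posix_set - ldap_set,
    }


def report(group: str, diff: Dict[str, Set[str]]) -> str:
    """Format a one-line summary suitable for a log line or alert text."""
    if not any(diff.values()):
        return f"{group}: memberships match"
    # Sort both the sides and the user names so the output is stable between runs.
    parts = [f"{side}={','.join(sorted(users))}" for side, users in sorted(diff.items()) if users]
    return f"{group}: MISMATCH " + " ".join(parts)


# Hypothetical sample data standing in for the real LDAP and posix lookups.
ldap_group = ["alice", "bob"]
posix_group = ["bob", "carol"]
print(report("analytics-admins", membership_diff(ldap_group, posix_group)))
```

In the spirit of the suggestion to run a separate timer/script on the same nodes, such a check could exit non-zero on a mismatch so that a failed-unit alert fires; that wiring is an assumption, not something the log describes.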
[15:53:30] Mail, Infrastructure-Foundations, SRE: Lisa@wikipedia.org is receiving a large number of donor responses - https://phabricator.wikimedia.org/T375643#10176263 (Reedy) Is she receiving them to her `@wikimedia.org` mailbox?
[16:47:28] netops, Infrastructure-Foundations, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10176493 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=dcd9deb3-f5d9-41d3-ade0-567f7154bb5b) set by ayounsi@cumin1002 for 7 days, 0:00:00 on 1 host(s) and their serv...
[17:07:04] jhathaway or other mail experts: in the final phase of switchover day 2, we have a command to run when validating mail is working [0] along with recommended bounds on queue wait. I'd like to improve the guidance on how to interpret the `exiqsumm` output.
[17:07:04] [0] https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Phase_10_-_verification_and_troubleshooting
[17:08:03] given that the oldest message across all destinations is quite old (hundreds of days), is there a recommended "thing" to look at? just the "newest" column?
[17:12:15] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10176609 (RobH) Adding in #ops-ulsfo project tag as I've been CC'd in at this point for the actual processing of the on-site steps for this failed hardware.
[17:12:41] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: cr3-ulsfo incident 22 Sep 2024 - https://phabricator.wikimedia.org/T375345#10176610 (RobH)
[18:01:15] swfrench-wmf: sorry, that command is outdated given our switch to postfix for ingress, let me take a look at a suitable replacement
[18:08:48] jhathaway: ack, thank you very much for doing so
[18:08:55] RESOLVED: SystemdUnitFailed: upload_puppet_facts.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:13:57] swfrench-wmf: updated, let me know if it makes sense
[18:16:34] jhathaway: this is great, thank you!
[18:17:02] (also just tried it, and yes makes sense)
[18:17:30] great
[18:25:12] o/
[18:26:00] I ran the decom cookbook and ran into a diff and needed clarification if it's ok to proceed.
[18:26:36] https://www.irccloud.com/pastebin/mm89its3/
[18:26:41] Here's the diff.
[18:30:55] arnoldokoth: that should be fine, I see this phab task for adding those switches, https://phabricator.wikimedia.org/T374587
[18:35:14] Thank you. 🚀