[04:59:56] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11334176 (Papaul)
[05:04:17] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11334177 (Papaul) @cmooney i update all the IP's to match the other POP sites. I will be re-running the configuration and validation sometimes this week in m...
[08:48:10] FIRING: [3x] GanetiCACertificateAboutToExpire: Ganeti CA certificate ganeti.example.com is about to expire - https://wikitech.wikimedia.org/wiki/Ganeti#Renew_cluster_certificates - TODO - https://alerts.wikimedia.org/?q=alertname%3DGanetiCACertificateAboutToExpire
[08:53:10] FIRING: [4x] GanetiCACertificateAboutToExpire: Ganeti CA certificate ganeti.example.com is about to expire - https://wikitech.wikimedia.org/wiki/Ganeti#Renew_cluster_certificates - TODO - https://alerts.wikimedia.org/?q=alertname%3DGanetiCACertificateAboutToExpire
[08:53:50] ^ example.com?
[09:01:25] that's expected, the ganeti CA is only used internally and all the scripts internally use example.com...
[09:02:56] I think I'll tweak the alert to not report the certname, but instead just the cluster name
[09:11:28] FIRING: [2x] NodeTextfileStale: Stale textfile for puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[09:11:28] FIRING: [2x] NodeTextfileStale: Stale textfile for config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:54:11] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11335237 (cmooney) Thanks @papaul. One to discuss with @ayounsi when he is back are the IPv6 gateway addresses on the vlans. ` on asw1-22 irb.411 public1-ul...
[12:53:25] FIRING: [4x] GanetiCACertificateAboutToExpire: Ganeti CA certificate ganeti.example.com is about to expire - https://wikitech.wikimedia.org/wiki/Ganeti#Renew_cluster_certificates - TODO - https://alerts.wikimedia.org/?q=alertname%3DGanetiCACertificateAboutToExpire
[13:11:43] FIRING: [2x] NodeTextfileStale: Stale textfile for config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:11:43] FIRING: [2x] NodeTextfileStale: Stale textfile for puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:28:12] netops, Infrastructure-Foundations, SRE: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067 (cmooney) NEW p:Triage→Medium
[13:52:08] netops, Infrastructure-Foundations, SRE: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11335585 (cmooney)
[13:53:58] netops, Infrastructure-Foundations, SRE: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11335590 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a04c020e-81be-4ee8-bf2f-5bcc8830a8da) set by cmooney@cumin1003 for 2:00:00...
[14:44:48] Mail, Infrastructure-Foundations, serviceops, SRE: Sendmail network error (deployment) - https://phabricator.wikimedia.org/T407723#11335888 (jhathaway) p:Triage→Medium a:jhathaway
[16:53:25] FIRING: [4x] GanetiCACertificateAboutToExpire: Ganeti CA certificate ganeti.example.com is about to expire - https://wikitech.wikimedia.org/wiki/Ganeti#Renew_cluster_certificates - TODO - https://alerts.wikimedia.org/?q=alertname%3DGanetiCACertificateAboutToExpire
[16:53:45] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11336622 (cmooney) Open→Resolved Uplinks moved, the actual gateway move from CR to switches we will wait until Nokia...
[17:11:43] FIRING: [2x] NodeTextfileStale: Stale textfile for config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[17:11:43] FIRING: [2x] NodeTextfileStale: Stale textfile for puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[20:53:25] FIRING: [4x] GanetiCACertificateAboutToExpire: Ganeti CA certificate ganeti.example.com is about to expire - https://wikitech.wikimedia.org/wiki/Ganeti#Renew_cluster_certificates - TODO - https://alerts.wikimedia.org/?q=alertname%3DGanetiCACertificateAboutToExpire
[21:11:43] FIRING: [2x] NodeTextfileStale: Stale textfile for config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[21:11:43] FIRING: [2x] NodeTextfileStale: Stale textfile for puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale