[00:30:09] FIRING: LVSHighCPU: The host lvs7001:9100 has at least its CPU 15 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs7001 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[00:35:09] RESOLVED: LVSHighCPU: The host lvs7001:9100 has at least its CPU 15 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs7001 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[10:47:27] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11952111 (jcrespo) >>! In T426199#11921066, @jcrespo wrote: > @ayounsi not urgent, but please ping me with a time of start of maintenance when y...
[11:24:12] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11952180 (FCeratto-WMF)
[12:40:59] Traffic, Bot detection and mitigation (WE4.10 hCaptcha), Documentation, Product Safety and Integrity (Sprint Iris (May 25 - Jun 12)): hcaptcha proxy: update wikitech page - https://phabricator.wikimedia.org/T411131#11952355 (OKryva-WMF)
[12:42:18] Traffic, ServiceOps new, ServiceOps-Services-Oids, Bot detection and mitigation (WE4.2 hCaptcha editing trial), and 2 others: hCaptcha: Stop using urldownloader for health checks of the secure-api.js file - https://phabricator.wikimedia.org/T421464#11952385 (OKryva-WMF)
[13:55:29] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11952593 (ayounsi)
[13:55:46] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11952594 (ayounsi)
[14:06:56] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations: bird bfd session with 172.20.1.1 down - Bad packet from 172.20.1.1 - unknown session id - https://phabricator.wikimedia.org/T427202 (fgiunchedi) NEW
[15:13:44] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp1104 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=eqiad&var-instance=cp1104 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[15:13:48] yeah reboot
[15:13:53] should it alert? no
[15:14:48] no, it shouldn't
[15:18:44] RESOLVED: [2x] HaproxyKafkaExporterDown: HaproxyKafka on cp1104 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[15:21:55] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11952859 (cmooney) >>! In T426199#11952111, @jcrespo wrote: >>>! In T426199#11921066, @jcrespo wrote: >> @ayounsi not urgent, but please ping me...
[15:33:12] netops, Infrastructure-Foundations, ServiceOps new, ServiceOps-Upgrades-Hardware: codfw: rack A2 maintenance - https://phabricator.wikimedia.org/T426199#11952887 (jcrespo) Thanks, then I will start the depool at 11:30 UTC in order to minimize backup lag.
[16:39:25] FIRING: [2x] SystemdUnitFailed: haproxy.service on cp5029:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:44:25] RESOLVED: [4x] SystemdUnitFailed: haproxy.service on cp5021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:03:29] FIRING: HAProxyRestarted: HAProxy server restarted on cp1109:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cp1109&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[23:08:29] RESOLVED: HAProxyRestarted: HAProxy server restarted on cp1109:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cp1109&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[23:51:29] FIRING: HAProxyRestarted: HAProxy server restarted on cp1111:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cp1111&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[23:56:29] RESOLVED: HAProxyRestarted: HAProxy server restarted on cp1111:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cp1111&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted