[00:03:33] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1057368 (owner: TrainBranchBot)
[02:39:21] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:39] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:40] FIRING: SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:34:40] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66969 and previous config saved to /var/cache/conftool/dbconfig/20240728-033440-marostegui.json
[03:34:50] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[03:49:47] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66970 and previous config saved to /var/cache/conftool/dbconfig/20240728-034946-marostegui.json
[04:04:54] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66971 and previous config saved to /var/cache/conftool/dbconfig/20240728-040453-marostegui.json
[04:20:01] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66972 and previous config saved to /var/cache/conftool/dbconfig/20240728-042000-marostegui.json
[04:20:02] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[04:20:05] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[04:20:15] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[04:20:21] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66973 and previous config saved to /var/cache/conftool/dbconfig/20240728-042021-marostegui.json
[04:41:41] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
[04:41:54] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
[04:42:01] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66974 and previous config saved to /var/cache/conftool/dbconfig/20240728-044200-marostegui.json
[04:42:06] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[06:04:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:09:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240728T0700)
[07:06:40] FIRING: SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:54:25] FIRING: SystemdUnitFailed: cadvisor.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:09:25] RESOLVED: SystemdUnitFailed: cadvisor.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:45:03] (PS1) NMW03: Increase edit count requirement for autoconfirmed on English Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/1057379
[08:45:21] (PS2) NMW03: Increase edit count requirement for autoconfirmed on English Wikivoyage Bug: T371186 Change-Id: I955e4a2be7c911b638ad8d9c862f469f8c02bd25 [mediawiki-config] - https://gerrit.wikimedia.org/r/1057379 (https://phabricator.wikimedia.org/T371186)
[08:45:49] (PS3) NMW03: Increase edit count requirement for autoconfirmed on English Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/1057379 (https://phabricator.wikimedia.org/T371186)
[08:46:55] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 29 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1057379 (https://phabricator.wikimedia.org/T371186) (owner: NMW03)
[10:11:25] FIRING: [2x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:56:01] ops-eqiad, SRE, DC-Ops: Degraded RAID on db1175 - https://phabricator.wikimedia.org/T371190 (ops-monitoring-bot) NEW
[11:06:25] FIRING: [2x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:07:09] ops-eqiad, SRE, DC-Ops: Degraded RAID on ms-be1056 - https://phabricator.wikimedia.org/T371192 (ops-monitoring-bot) NEW
[12:36:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 24.5% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[12:41:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-web at eqiad: 24.5% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-web&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy
[13:05:39] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:04:21] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:39:21] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:59:21] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:06:41] FIRING: SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:15:07] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66975 and previous config saved to /var/cache/conftool/dbconfig/20240728-181506-marostegui.json
[18:15:12] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[18:30:14] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66976 and previous config saved to /var/cache/conftool/dbconfig/20240728-183013-marostegui.json
[18:45:21] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66977 and previous config saved to /var/cache/conftool/dbconfig/20240728-184521-marostegui.json
[19:00:28] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66978 and previous config saved to /var/cache/conftool/dbconfig/20240728-190028-marostegui.json
[19:00:30] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[19:00:38] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[19:00:43] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[19:00:50] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66979 and previous config saved to /var/cache/conftool/dbconfig/20240728-190050-marostegui.json
[19:06:41] FIRING: SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:30:47] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, July 29 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1057033 (https://phabricator.wikimedia.org/T371026) (owner: Superzerocool)
[23:06:41] FIRING: SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:38:24] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1057416
[23:38:24] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1057416 (owner: TrainBranchBot)