[00:00:02] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T410589)', diff saved to https://phabricator.wikimedia.org/P85452 and previous config saved to /var/cache/conftool/dbconfig/20251122-000001-ladsgroup.json [00:00:07] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [00:00:19] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance [00:00:26] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1161 (T410589)', diff saved to https://phabricator.wikimedia.org/P85453 and previous config saved to /var/cache/conftool/dbconfig/20251122-000026-ladsgroup.json [00:02:56] (PS1) Cwhite: aptrepo: add component/opensearch27 [puppet] - https://gerrit.wikimedia.org/r/1208499 (https://phabricator.wikimedia.org/T410795) [00:02:58] (PS1) Cwhite: opensearch: add $apt_component parameter [puppet] - https://gerrit.wikimedia.org/r/1208500 (https://phabricator.wikimedia.org/T410795) [00:08:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [00:11:49] PROBLEM - MariaDB Replica Lag: s5 on clouddb1016 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 621.10 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:11:59] PROBLEM - MariaDB Replica Lag: s5 on an-redacteddb1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 629.90 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:12:07] PROBLEM - MariaDB Replica Lag: s5 on clouddb1020 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 639.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:12:14] !log
rzl@deploy2002 helmfile [staging] START helmfile.d/services/api-gateway: apply [00:12:15] PROBLEM - MariaDB Replica Lag: s5 on db1154 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 647.41 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [00:12:34] !log rzl@deploy2002 helmfile [staging] DONE helmfile.d/services/api-gateway: apply [00:16:01] !log rzl@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply [00:16:10] !log rzl@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply [00:38:09] PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [00:38:58] FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:39:36] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [00:40:30] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1208506 [00:40:30] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1208506 (owner: TrainBranchBot) [00:42:17] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl1003.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [00:43:17]
RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:53:54] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1208506 (owner: TrainBranchBot) [00:58:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:00:38] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image [01:03:57] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:06:34] (CR) Scott French: [C:+1] "Thanks, Reuven!"
[deployment-charts] - https://gerrit.wikimedia.org/r/1208484 (https://phabricator.wikimedia.org/T409510) (owner: RLazarus) [01:08:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:09:36] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [01:10:46] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1208529 [01:10:46] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1208529 (owner: TrainBranchBot) [01:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:33:57] FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:33:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:35:51] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1208529 (owner: TrainBranchBot) [01:38:09] RECOVERY -
Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [01:38:57] FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:43:57] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:53:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [01:54:57] FIRING: AlertLintProblem: Linting problems found for DiskSpace - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem [02:03:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [02:08:57] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - 
https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [02:13:57] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [02:17:40] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [02:31:43] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11398060 (Papaul) [02:48:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [02:53:57] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [02:54:36] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[02:58:57] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:03:57] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:08:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:13:57] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:18:57] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:28:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:33:57] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - 
https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:48:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:53:57] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [03:58:57] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:03:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:08:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:13:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:17:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:18:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:33:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:38:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [04:39:36] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [04:58:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:08:58] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - 
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:08:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:09:36] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [05:13:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:18:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:23:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:33:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:33:58] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - 
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:38:58] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:43:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [05:54:57] FIRING: AlertLintProblem: Linting problems found for DiskSpace - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem [06:08:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [06:18:51] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/1/1:1 (Transport: cr2-eqiad:xe-3/2/2 (Lumen, 442550293) {#12253_12334-2}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [06:23:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/1/1:1 (Transport: cr2-eqiad:xe-3/2/2 (Lumen, 442550293) {#12253_12334-2}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [06:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico 
container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [06:33:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [06:53:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [06:54:36] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [06:58:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [07:23:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [07:37:45] (CR) Dragoniez: "Waiting for feedback from the requestor on `abusefilter-revert`, `abusefilter-modify-restricted`, and `abusefilter-access-protected-vars`" [mediawiki-config] - https://gerrit.wikimedia.org/r/1208329 (https://phabricator.wikimedia.org/T407978) (owner: Dragoniez)
[07:43:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [07:48:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [07:53:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [07:58:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [08:08:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [08:23:58] RESOLVED: [2x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [08:31:40] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T410589)', diff 
saved to https://phabricator.wikimedia.org/P85454 and previous config saved to /var/cache/conftool/dbconfig/20251122-083140-ladsgroup.json [08:31:47] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [08:31:49] RECOVERY - MariaDB Replica Lag: s5 on clouddb1016 is OK: OK slave_sql_lag Replication lag: 0.05 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [08:31:59] RECOVERY - MariaDB Replica Lag: s5 on an-redacteddb1001 is OK: OK slave_sql_lag Replication lag: 0.05 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [08:32:07] RECOVERY - MariaDB Replica Lag: s5 on clouddb1020 is OK: OK slave_sql_lag Replication lag: 0.11 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [08:32:15] RECOVERY - MariaDB Replica Lag: s5 on db1154 is OK: OK slave_sql_lag Replication lag: 0.24 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [08:33:58] FIRING: CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [08:38:58] RESOLVED: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [08:39:36] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed 
[08:46:48] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P85455 and previous config saved to /var/cache/conftool/dbconfig/20251122-084647-ladsgroup.json [08:48:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-codfw:xe-0/1/1:1 (Transport: cr2-eqiad:xe-3/2/2 (Lumen, 442550293) {#12253_12334-2}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [08:48:58] FIRING: CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [09:01:56] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P85456 and previous config saved to /var/cache/conftool/dbconfig/20251122-090155-ladsgroup.json [09:09:36] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [09:17:03] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T410589)', diff saved to https://phabricator.wikimedia.org/P85457 and previous config saved to /var/cache/conftool/dbconfig/20251122-091703-ladsgroup.json [09:17:09] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [09:17:19] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance [09:17:27] !log 
ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1185 (T410589)', diff saved to https://phabricator.wikimedia.org/P85458 and previous config saved to /var/cache/conftool/dbconfig/20251122-091726-ladsgroup.json [09:28:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [09:33:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [09:39:36] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:48:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [09:54:57] FIRING: AlertLintProblem: Linting problems found for DiskSpace - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem [09:58:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [10:18:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - 
https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [10:23:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [10:33:58] RESOLVED: CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [10:53:58] FIRING: CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [10:54:36] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:58:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:03:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:08:58] 
FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:13:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:18:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:28:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:33:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:38:58] RESOLVED: CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [11:58:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - 
https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:03:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:08:58] RESOLVED: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:13:58] FIRING: CalicoHighMemoryUsage: Calico container calico-node-gq2sk:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:18:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:23:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - 
https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:33:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:38:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:39:36] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [12:48:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:54:03] FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [12:58:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [12:59:03] RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - 
https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures [13:09:36] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [13:13:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:18:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:23:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:33:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:39:36] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:41:31] PROBLEM - Docker registry HTTPS interface on registry2005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Docker [13:43:21] RECOVERY - Docker registry HTTPS interface on registry2005 is OK: HTTP OK: HTTP/1.1 200 OK - 3746 bytes in 0.338 second response time https://wikitech.wikimedia.org/wiki/Docker [13:43:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:48:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:53:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [13:54:57] FIRING: AlertLintProblem: Linting problems found for DiskSpace - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem [13:58:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:03:58] RESOLVED: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - 
https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:08:58] FIRING: CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:13:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:18:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:28:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:29:45] FIRING: CirrusStreamingUpdaterUnknownErrors: CirrusSearch consumer-search@codfw is failing write requests because of unknown errors - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterUnknownErrors [14:33:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - 
https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:38:58] RESOLVED: [2x] CalicoHighMemoryUsage: Calico container calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:39:45] RESOLVED: CirrusStreamingUpdaterUnknownErrors: CirrusSearch consumer-search@codfw is failing write requests because of unknown errors - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterUnknownErrors [14:43:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:48:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:53:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [14:54:36] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - 
https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:58:37] (CR) Dzahn: "we are getting some emails about a "AlertLintProblem collaboration-services" which links to https://wikitech.wikimedia.org/wiki/Alertmanag" [alerts] - https://gerrit.wikimedia.org/r/1201087 (https://phabricator.wikimedia.org/T408632) (owner: AOkoth) [14:58:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:03:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:05:00] (PS1) AOkoth: Revert "vrts: alert on vrts junk queue size" [alerts] - https://gerrit.wikimedia.org/r/1209112 [15:08:02] (CR) AOkoth: [C:+2] Revert "vrts: alert on vrts junk queue size" [alerts] - https://gerrit.wikimedia.org/r/1209112 (owner: AOkoth) [15:08:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:08:58] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:11] (Merged) jenkins-bot: Revert "vrts: alert on vrts junk queue size" [alerts] - https://gerrit.wikimedia.org/r/1209112 (owner: AOkoth) [15:13:58] FIRING: [2x] 
CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:18:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:23:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:33:58] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:38:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:43:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:48:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container 
calico-node-bfhcs:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=codfw&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:53:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [15:58:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:03:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:13:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:18:26] FIRING: ProbeDown: Service wdqs1015:443 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:21:17] FIRING: ProbeDown: Service wdqs2010:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2010:443 - 
https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:23:26] RESOLVED: [2x] ProbeDown: Service wdqs1015:443 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:23:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:26:17] RESOLVED: [3x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:28:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:33:12] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T410589)', diff saved to https://phabricator.wikimedia.org/P85459 and previous config saved to /var/cache/conftool/dbconfig/20251122-163311-ladsgroup.json [16:33:17] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [16:35:56] FIRING: [2x] ProbeDown: Service wdqs1015:443 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - 
https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:38:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:39:36] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [16:40:56] RESOLVED: [2x] ProbeDown: Service wdqs1015:443 has failed probes (http_query_wikidata_org_ldf_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1015:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:43:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:48:20] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P85460 and previous config saved to /var/cache/conftool/dbconfig/20251122-164819-ladsgroup.json [16:48:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [16:56:15] FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.844s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - 
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [17:01:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.844s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [17:03:27] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P85461 and previous config saved to /var/cache/conftool/dbconfig/20251122-170327-ladsgroup.json [17:03:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:08:58] FIRING: [3x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:09:36] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [17:13:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-6gdw7:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - 
https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:18:35] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T410589)', diff saved to https://phabricator.wikimedia.org/P85462 and previous config saved to /var/cache/conftool/dbconfig/20251122-171834-ladsgroup.json [17:18:40] T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589 [17:18:51] !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance [17:18:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:18:59] !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1200 (T410589)', diff saved to https://phabricator.wikimedia.org/P85463 and previous config saved to /var/cache/conftool/dbconfig/20251122-171858-ladsgroup.json [17:23:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:33:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:39:24] (PS14) Andrew Bogott: cloudidp-dev: Hiera changes to make more like normal idp nodes [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) [17:39:30] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) (owner: Andrew 
Bogott) [17:39:36] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:42:36] (PS15) Andrew Bogott: cloudidp-dev: Hiera changes to make more like normal idp nodes [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) [17:42:39] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [17:43:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [17:45:45] FIRING: CirrusStreamingUpdaterUnknownErrors: CirrusSearch consumer-search@eqiad is failing write requests because of unknown errors - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterUnknownErrors [17:50:44] FIRING: [2x] CirrusStreamingUpdaterUnknownErrors: CirrusSearch consumer-cloudelastic@eqiad is failing write requests because of unknown errors - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterUnknownErrors [17:51:30] (PS5) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [17:51:31] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1208370 (owner: Andrew Bogott) [17:54:57] FIRING: AlertLintProblem: Linting 
problems found for DiskSpace - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem [17:55:45] RESOLVED: [2x] CirrusStreamingUpdaterUnknownErrors: CirrusSearch consumer-cloudelastic@eqiad is failing write requests because of unknown errors - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterUnknownErrors [17:56:59] (PS6) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [17:56:59] (PS16) Andrew Bogott: cloudidp-dev: Hiera changes to make more like normal idp nodes [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) [17:57:20] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [17:59:53] (PS17) Andrew Bogott: cloudidp-dev: Hiera changes to make more like normal idp nodes [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) [18:00:03] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [18:05:35] (PS18) Andrew Bogott: cloudidp-dev: Hiera changes to make more like normal idp nodes [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) [18:05:35] (PS7) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [18:08:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - 
https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [18:13:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [18:23:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [18:54:36] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:58:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:02:39] (CR) Andrew Bogott: [C:+2] cloudidp-dev: Hiera changes to make more like normal idp nodes [puppet] - https://gerrit.wikimedia.org/r/1208350 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [19:03:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:08:58] RESOLVED: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:13:58] FIRING: [5x] CalicoHighMemoryUsage: 
Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:14:42] RESOLVED: AlertLintProblem: Linting problems found for DiskSpace - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem [19:16:05] (PS8) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [19:16:05] (PS1) Andrew Bogott: Update backend for cloudidp-dev.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1209220 (https://phabricator.wikimedia.org/T410294) [19:17:55] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:18:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:28:25] (CR) Krinkle: [C:+1] Fix db config for offline maint scripts [mediawiki-config] - https://gerrit.wikimedia.org/r/1208439 (https://phabricator.wikimedia.org/T410738) (owner: Ladsgroup) [19:28:58] (CR) Krinkle: [C:+1] tests: Make data providers static methods [mediawiki-config] - https://gerrit.wikimedia.org/r/1208328 (https://phabricator.wikimedia.org/T410731) (owner: D3r1ck01) [19:29:25] (PS2) Andrew Bogott: Update backend for cloudidp-dev.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1209220 (https://phabricator.wikimedia.org/T410294) [19:29:25] (PS9) Andrew Bogott: profile::idp: require many args
to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [19:30:48] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1209220 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [19:33:07] (PS3) Andrew Bogott: Update backend for cloudidp-dev.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1209220 (https://phabricator.wikimedia.org/T410294) [19:33:07] (PS10) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [19:33:15] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1209220 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [19:33:58] FIRING: [2x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://grafana.wikimedia.org/d/2AfU0X_Mz?var-site=eqiad&var-prometheus=k8s-staging&var-container_name=calico-node - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:36:59] (CR) Andrew Bogott: [C:+2] Update backend for cloudidp-dev.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1209220 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [19:38:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:48:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:53:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently
using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [19:56:02] PROBLEM - Host cloudidp2001-dev is DOWN: PING CRITICAL - Packet loss = 100% [20:00:18] RECOVERY - Host cloudidp2001-dev is UP: PING OK - Packet loss = 0%, RTA = 30.58 ms [20:03:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [20:08:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [20:13:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [20:18:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [20:23:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [20:39:36] FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - 
https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed [20:43:09] (PS11) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [20:43:09] (PS1) Andrew Bogott: idp_clouddev: use codfw1dev test ldap [puppet] - https://gerrit.wikimedia.org/r/1209270 (https://phabricator.wikimedia.org/T410294) [20:43:18] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1209270 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [20:43:40] (CR) A smart kitten: Set up tokwiki namespaces (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457) (owner: Majavah) [20:46:14] (CR) Andrew Bogott: [C:+2] idp_clouddev: use codfw1dev test ldap [puppet] - https://gerrit.wikimedia.org/r/1209270 (https://phabricator.wikimedia.org/T410294) (owner: Andrew Bogott) [20:48:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [20:53:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [21:02:50] PROBLEM - Host cloudidp2001-dev is DOWN: PING CRITICAL - Packet loss = 100% [21:03:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [21:09:36] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire -
https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [21:10:06] (CR) Tbodt: Set up tokwiki namespaces (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457) (owner: Majavah) [21:18:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [21:26:18] RECOVERY - Host cloudidp2001-dev is UP: PING OK - Packet loss = 0%, RTA = 30.64 ms [21:28:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [21:52:21] (CR) Majavah: Set up tokwiki namespaces (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457) (owner: Majavah) [21:56:49] PROBLEM - Host cloudidp2001-dev is DOWN: PING CRITICAL - Packet loss = 100% [21:58:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:08:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:10:24] (PS12) Andrew Bogott: profile::idp: require many args to be non-empty [puppet] - https://gerrit.wikimedia.org/r/1208370 [22:10:24] (PS1) Andrew Bogott:
idp_clouddev: profile::tlsproxy::instance::ssl_compatibility_mode: strong [puppet] - https://gerrit.wikimedia.org/r/1209345 [22:11:54] (CR) Andrew Bogott: [C:+2] idp_clouddev: profile::tlsproxy::instance::ssl_compatibility_mode: strong [puppet] - https://gerrit.wikimedia.org/r/1209345 (owner: Andrew Bogott) [22:13:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:18:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:25:18] RECOVERY - Host cloudidp2001-dev is UP: PING OK - Packet loss = 0%, RTA = 30.44 ms [22:28:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:43:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:48:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:53:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage -
https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [22:54:36] FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:03:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:08:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:11:46] (Abandoned) Huji: Enable ShortDescriptions on fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/688381 (https://phabricator.wikimedia.org/T282486) (owner: Huji) [23:13:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:15:33] (PS5) Huji: Increase AbuseFilter's emergency disable threshold for fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/763982 (https://phabricator.wikimedia.org/T302227) [23:18:10] FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:18:11] (CR) Huji: "This is still needed, some 3+ years later. Is there a chance you could deploy this in the next release cycle?
There is really nothing that" [mediawiki-config] - https://gerrit.wikimedia.org/r/763982 (https://phabricator.wikimedia.org/T302227) (owner: Huji) [23:18:58] FIRING: [4x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:23:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:26:51] PROBLEM - Host cloudidp2001-dev is DOWN: PING CRITICAL - Packet loss = 100% [23:28:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:36:22] RECOVERY - Host cloudidp2001-dev is UP: PING OK - Packet loss = 0%, RTA = 30.56 ms [23:38:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:48:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage [23:51:54] PROBLEM - Host cloudidp2001-dev is DOWN: PING CRITICAL - Packet loss = 100% [23:56:22] RECOVERY - Host cloudidp2001-dev is UP: PING OK - Packet loss = 0%, RTA = 30.53 ms [23:58:58] FIRING: [5x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently
using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage