[00:02:43] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:03:10] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:05:21] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69709 and previous config saved to /var/cache/conftool/dbconfig/20241014-000520-ladsgroup.json
[00:06:55] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:07:43] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:07:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:07:58] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:08:40] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1079676 (owner: TrainBranchBot)
[00:09:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:09:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224358 (phaultfinder)
[00:11:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:11:55] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:12:43] <jinxer-wm>	 RESOLVED: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:12:57] <jinxer-wm>	 RESOLVED: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:15:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[00:16:27] <jinxer-wm>	 RESOLVED: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:16:55] <jinxer-wm>	 RESOLVED: [5x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:17:46] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:19:27] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:20:28] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69710 and previous config saved to /var/cache/conftool/dbconfig/20241014-002027-ladsgroup.json
[00:20:28] <jinxer-wm>	 FIRING: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:21:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[00:24:09] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:25:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:26:55] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:29:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:30:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:31:55] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:34:09] <jinxer-wm>	 RESOLVED: [6x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:34:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:35:35] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69711 and previous config saved to /var/cache/conftool/dbconfig/20241014-003534-ladsgroup.json
[00:36:55] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: kube-controller-manager.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:41:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:41:55] <jinxer-wm>	 RESOLVED: [5x] SystemdUnitFailed: kube-controller-manager.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:44:12] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: kubestagemaster2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:44:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224362 (phaultfinder)
[00:45:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:46:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:49:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[00:50:28] <jinxer-wm>	 RESOLVED: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:50:42] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69712 and previous config saved to /var/cache/conftool/dbconfig/20241014-005042-ladsgroup.json
[00:50:47] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
[00:50:49] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
[00:50:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69713 and previous config saved to /var/cache/conftool/dbconfig/20241014-005056-ladsgroup.json
[00:51:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[00:54:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[00:55:43] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[00:58:50] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69714 and previous config saved to /var/cache/conftool/dbconfig/20241014-005849-ladsgroup.json
[00:59:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:00:43] <jinxer-wm>	 RESOLVED: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:01:55] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: kube-controller-manager.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:04:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:04:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[01:06:55] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: kube-controller-manager.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:09:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:10:28] <jinxer-wm>	 FIRING: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:10:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[01:12:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:13:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69715 and previous config saved to /var/cache/conftool/dbconfig/20241014-011356-ladsgroup.json
[01:14:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:15:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:17:54] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:22:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:24:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:25:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:27:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:27:54] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:29:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69716 and previous config saved to /var/cache/conftool/dbconfig/20241014-012903-ladsgroup.json
[01:29:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:30:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:33:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:35:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:38:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:39:35] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224439 (phaultfinder)
[01:44:11] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69717 and previous config saved to /var/cache/conftool/dbconfig/20241014-014410-ladsgroup.json
[01:44:16] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
[01:44:29] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
[01:44:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69718 and previous config saved to /var/cache/conftool/dbconfig/20241014-014435-ladsgroup.json
[01:45:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:47:54] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:50:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[01:50:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69719 and previous config saved to /var/cache/conftool/dbconfig/20241014-015030-ladsgroup.json
[01:52:54] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:54:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[01:54:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:59:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:04:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[02:05:12] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[02:05:28] <jinxer-wm>	 RESOLVED: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:05:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69720 and previous config saved to /var/cache/conftool/dbconfig/20241014-020537-ladsgroup.json
[02:06:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:07:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:07:54] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:11:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:11:58] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:12:54] <jinxer-wm>	 FIRING: [5x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:12:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[02:14:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[02:16:58] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:17:54] <jinxer-wm>	 RESOLVED: [5x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:19:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224490 (phaultfinder)
[02:20:44] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69721 and previous config saved to /var/cache/conftool/dbconfig/20241014-022044-ladsgroup.json
[02:26:58] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:29:58] <jinxer-wm>	 FIRING: ProbeDown: Service upload-https:443 has failed probes (http_upload-https_ip6) #page - https://wikitech.wikimedia.org/wiki/Runbook#upload-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:31:44] <jinxer-wm>	 FIRING: HaproxyUnavailable: HAProxy (cache_upload) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[02:32:46] <jinxer-wm>	 FIRING: [8x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:34:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service upload-https:443 has failed probes (http_upload-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#upload-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:35:51] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69722 and previous config saved to /var/cache/conftool/dbconfig/20241014-023551-ladsgroup.json
[02:35:56] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[02:36:10] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
[02:36:16] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69723 and previous config saved to /var/cache/conftool/dbconfig/20241014-023616-ladsgroup.json
[02:36:44] <jinxer-wm>	 RESOLVED: HaproxyUnavailable: HAProxy (cache_upload) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[02:36:58] <jinxer-wm>	 RESOLVED: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:37:13] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:39:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[02:40:58] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:41:50] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69724 and previous config saved to /var/cache/conftool/dbconfig/20241014-024149-ladsgroup.json
[02:42:46] <jinxer-wm>	 FIRING: [7x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:42:53] <jinxer-wm>	 FIRING: DDoSDetected: FastNetMon has detected an attack on eqsin #page - https://bit.ly/wmf-fastnetmon - https://w.wiki/8oU - https://alerts.wikimedia.org/?q=alertname%3DDDoSDetected
[02:43:44] <jinxer-wm>	 FIRING: HaproxyUnavailable: HAProxy (cache_upload) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[02:44:24] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: kube-controller-manager.service on kubestagemaster2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:46:25] <wikibugs>	 (PS1) Tim Starling: Enable {{USERLANGUAGE}} on Commons and Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/1079680 (https://phabricator.wikimedia.org/T4085)
[02:46:27] <jinxer-wm>	 FIRING: [7x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:48:44] <jinxer-wm>	 RESOLVED: HaproxyUnavailable: HAProxy (cache_upload) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[02:49:27] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[02:50:58] <jinxer-wm>	 RESOLVED: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:51:27] <jinxer-wm>	 FIRING: [7x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)   - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:52:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:52:53] <jinxer-wm>	 RESOLVED: DDoSDetected: FastNetMon has detected an attack on eqsin #page - https://bit.ly/wmf-fastnetmon - https://w.wiki/8oU - https://alerts.wikimedia.org/?q=alertname%3DDDoSDetected
[02:52:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[02:54:12] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[02:54:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:56:56] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69725 and previous config saved to /var/cache/conftool/dbconfig/20241014-025656-ladsgroup.json
[02:59:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:02:13] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:06:09] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:09:39] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224496 (phaultfinder)
[03:12:03] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69726 and previous config saved to /var/cache/conftool/dbconfig/20241014-031203-ladsgroup.json
[03:27:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69727 and previous config saved to /var/cache/conftool/dbconfig/20241014-032710-ladsgroup.json
[03:27:15] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[03:27:17] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[03:32:17] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
[03:32:31] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
[03:32:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69728 and previous config saved to /var/cache/conftool/dbconfig/20241014-033237-ladsgroup.json
[03:39:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69729 and previous config saved to /var/cache/conftool/dbconfig/20241014-033922-ladsgroup.json
[03:44:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224532 (phaultfinder)
[03:46:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service kubestagemaster2005:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster2005:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[03:46:59] <wikibugs>	 (CR) Tim Starling: [C:+1] Missing.php: Redirect Scots Wiktionary to Scots Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1078122 (https://phabricator.wikimedia.org/T249648) (owner: Pppery)
[03:49:12] <jinxer-wm>	 RESOLVED: KubernetesCalicoDown: kubestagemaster2005.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-staging&var-instance=kubestagemaster2005.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[03:51:12] <wikibugs>	 (CR) Bugreporter: "I think we should add it to CommonSetting instead (https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/7cd0d3710ced29a9cf9c1632ed" [mediawiki-config] - https://gerrit.wikimedia.org/r/1079680 (https://phabricator.wikimedia.org/T4085) (owner: Tim Starling)
[03:54:29] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69730 and previous config saved to /var/cache/conftool/dbconfig/20241014-035429-ladsgroup.json
[04:01:09] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:03:27] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - https://gerrit.wikimedia.org/r/1078122 (https://phabricator.wikimedia.org/T249648) (owner: Pppery)
[04:09:24] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:09:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69731 and previous config saved to /var/cache/conftool/dbconfig/20241014-040936-ladsgroup.json
[04:11:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster2003:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:19:09] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:19:24] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:21:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:22:57] <jinxer-wm>	 FIRING: KubernetesCalicoDown: kubestagemaster2005.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-staging&var-instance=kubestagemaster2005.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[04:23:28] <jinxer-wm>	 FIRING: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:23:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[04:24:09] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:24:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224538 (phaultfinder)
[04:24:43] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69732 and previous config saved to /var/cache/conftool/dbconfig/20241014-042443-ladsgroup.json
[04:24:48] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
[04:24:54] <wikibugs>	 (PS1) KartikMistry: Update MinT to 2024-10-11-113932-production [deployment-charts] - https://gerrit.wikimedia.org/r/1079682 (https://phabricator.wikimedia.org/T368521)
[04:25:02] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
[04:27:57] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: kubestagemaster2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[04:28:28] <jinxer-wm>	 RESOLVED: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:30:01] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
[04:30:15] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
[04:30:54] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:32:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[04:34:28] <jinxer-wm>	 FIRING: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:34:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:35:37] <wikibugs>	 (CR) Tim Starling: "What is the source for this list?" [mediawiki-config] - https://gerrit.wikimedia.org/r/1079054 (https://phabricator.wikimedia.org/T376923) (owner: Pppery)
[04:35:54] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:37:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[04:39:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:39:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:39:54] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:41:09] <jinxer-wm>	 FIRING: [4x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=GET - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:42:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[04:43:10] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:44:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:46:09] <jinxer-wm>	 RESOLVED: [4x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=GET - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:48:10] <jinxer-wm>	 RESOLVED: [2x] KubernetesRsyslogDown: rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:54:55] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:59:10] <jinxer-wm>	 FIRING: [2x] KubernetesRsyslogDown: rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:59:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[04:59:54] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:02:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[05:04:49] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224547 (phaultfinder)
[05:09:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[05:12:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:14:24] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:16:11] <wikibugs>	 (CR) Pppery: "I started with lists like https://meta.wikimedia.org/w/index.php?title=Wikisource#Wikisources_in_Wikipedia, https://meta.wikimedia.org/wik" [mediawiki-config] - https://gerrit.wikimedia.org/r/1079054 (https://phabricator.wikimedia.org/T376923) (owner: Pppery)
[05:19:24] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:29:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[05:32:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[05:34:10] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[05:34:28] <jinxer-wm>	 RESOLVED: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[05:37:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[05:39:24] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:41:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:42:46] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:42:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[05:44:24] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:52:46] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:52:57] <jinxer-wm>	 RESOLVED: KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-staging&var-instance=kubestagemaster2003.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[05:54:09] <jinxer-wm>	 FIRING: [2x] KubernetesAPILatency: High Kubernetes API latency (GET leases) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=GET - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[05:57:46] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:59:09] <jinxer-wm>	 RESOLVED: [4x] KubernetesAPILatency: High Kubernetes API latency (GET leases) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[06:01:57] <jinxer-wm>	 FIRING: KubernetesCalicoDown: kubestagemaster2005.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-staging&var-instance=kubestagemaster2005.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:04:24] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:06:09] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (GET endpoints) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=GET - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[06:06:57] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: kubestagemaster2004.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:08:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[06:09:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:10:28] <jinxer-wm>	 FIRING: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:11:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:13:57] <jinxer-wm>	 FIRING: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers" - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[06:14:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:16:09] <jinxer-wm>	 FIRING: [9x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[06:20:28] <jinxer-wm>	 RESOLVED: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:20:52] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
[06:20:54] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
[06:21:00] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
[06:21:09] <jinxer-wm>	 RESOLVED: [9x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[06:21:14] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
[06:21:15] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
[06:21:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:21:28] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
[06:21:35] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69733 and previous config saved to /var/cache/conftool/dbconfig/20241014-062135-arnaudb.json
[06:21:39] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[06:21:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:22:29] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[06:22:43] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[06:22:46] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:22:49] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69734 and previous config saved to /var/cache/conftool/dbconfig/20241014-062249-arnaudb.json
[06:24:24] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: imagecatalog_record.service on deploy2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:24:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10224612 (phaultfinder)
[06:25:05] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69735 and previous config saved to /var/cache/conftool/dbconfig/20241014-062505-arnaudb.json
[06:26:27] <jinxer-wm>	 FIRING: [6x] ProbeDown: Service kubestagemaster2003:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:26:57] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:31:57] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:35:09] <jinxer-wm>	 FIRING: [4x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[06:35:28] <jinxer-wm>	 FIRING: KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:36:57] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:37:24] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_ssh.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:40:09] <jinxer-wm>	 RESOLVED: [4x] KubernetesAPILatency: High Kubernetes API latency (GET ) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[06:40:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69736 and previous config saved to /var/cache/conftool/dbconfig/20241014-064012-arnaudb.json
[06:40:28] <jinxer-wm>	 FIRING: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:41:57] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[06:43:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[06:45:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:48:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on kubestage2002:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubestage2002 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[06:53:46] <wikibugs>	 SRE, serviceops: host rdb1014 is down - https://phabricator.wikimedia.org/T376961#10224623 (LSobanski)
[06:55:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69737 and previous config saved to /var/cache/conftool/dbconfig/20241014-065519-arnaudb.json
[06:55:21] <wikibugs>	 SRE: host elastic1064 is down - https://phabricator.wikimedia.org/T376960#10224628 (LSobanski) Open→Resolved
[06:55:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:56:47] <wikibugs>	 SRE: host elastic1064 is down - https://phabricator.wikimedia.org/T376960#10224626 (LSobanski) Host is up now, see {https://phabricator.wikimedia.org/T376881}
[06:57:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: kube-publish-sa-cert.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:00:05] <jouncebot>	 Amir1 and Urbanecm: Your horoscope predicts another UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T0700).
[07:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[07:00:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[07:02:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:02:58] <hashar>	 no patch scheduled
[07:03:18] <awight>	 excellent
[07:05:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[07:10:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69738 and previous config saved to /var/cache/conftool/dbconfig/20241014-071026-arnaudb.json
[07:10:28] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[07:10:30] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[07:10:41] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[07:10:48] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69739 and previous config saved to /var/cache/conftool/dbconfig/20241014-071048-arnaudb.json
[07:13:03] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69740 and previous config saved to /var/cache/conftool/dbconfig/20241014-071302-arnaudb.json
[07:22:01] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69741 and previous config saved to /var/cache/conftool/dbconfig/20241014-072201-arnaudb.json
[07:22:05] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[07:22:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:23:44] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+2] acme_chief: add SAN for requestctl.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/1078984 (https://phabricator.wikimedia.org/T371782) (owner: Giuseppe Lavagetto)
[07:27:24] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:28:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69742 and previous config saved to /var/cache/conftool/dbconfig/20241014-072810-arnaudb.json
[07:28:51] <wikibugs>	 (PS2) Brouberol: Import ceph-csi-cephfs chart [deployment-charts] - https://gerrit.wikimedia.org/r/1077872 (https://phabricator.wikimedia.org/T376406)
[07:28:51] <wikibugs>	 (PS3) Brouberol: Make it possible to deploy provisioner without the snahshotter [deployment-charts] - https://gerrit.wikimedia.org/r/1077873 (https://phabricator.wikimedia.org/T376406)
[07:28:51] <wikibugs>	 (PS3) Brouberol: Run the driver-registrar as root [deployment-charts] - https://gerrit.wikimedia.org/r/1077874 (https://phabricator.wikimedia.org/T376406)
[07:28:52] <wikibugs>	 (PS3) Brouberol: Disable the priviledged security context of the liveness-prometheus container [deployment-charts] - https://gerrit.wikimedia.org/r/1077875 (https://phabricator.wikimedia.org/T376406)
[07:28:52] <wikibugs>	 (PS3) Brouberol: Make it possible to create several storage classes [deployment-charts] - https://gerrit.wikimedia.org/r/1078387 (https://phabricator.wikimedia.org/T376406)
[07:28:54] <wikibugs>	 (PS6) Brouberol: Define the ceph-csi-cephfs admin_ng helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/1077878 (https://phabricator.wikimedia.org/T376406)
[07:28:58] <wikibugs>	 (CR) Brouberol: "Sure!" [deployment-charts] - https://gerrit.wikimedia.org/r/1077875 (https://phabricator.wikimedia.org/T376406) (owner: Brouberol)
[07:30:24] <wikibugs>	 (PS1) Michael Große: refactor(HomepageHooks): extract method for simpler modifyability [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - https://gerrit.wikimedia.org/r/1079894
[07:32:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:37:08] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69743 and previous config saved to /var/cache/conftool/dbconfig/20241014-073707-arnaudb.json
[07:38:06] <wikibugs>	 (PS1) Brouberol: ceph.backup.s3_local: fix typo in systemd timer command [puppet] - https://gerrit.wikimedia.org/r/1079922 (https://phabricator.wikimedia.org/T377104)
[07:39:07] <wikibugs>	 (CR) Brouberol: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4295/co" [puppet] - https://gerrit.wikimedia.org/r/1079922 (https://phabricator.wikimedia.org/T377104) (owner: Brouberol)
[07:39:25] <wikibugs>	 (PS2) Michael Große: refactor(HomepageHooks): extract method for simpler modifyability [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - https://gerrit.wikimedia.org/r/1079894
[07:41:02] <wikibugs>	 (Abandoned) Ayounsi: Monitoring rename pfw3-codfw to pfw1 add new fasw [puppet] - https://gerrit.wikimedia.org/r/1079216 (https://phabricator.wikimedia.org/T374176) (owner: Ayounsi)
[07:43:17] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69744 and previous config saved to /var/cache/conftool/dbconfig/20241014-074317-arnaudb.json
[07:45:20] <wikibugs>	 (PS2) Ayounsi: Prefix validator: ensure k8s role and site [software/netbox-extras] - https://gerrit.wikimedia.org/r/1079525 (https://phabricator.wikimedia.org/T354169)
[07:46:05] <wikibugs>	 (PS1) Brouberol: cloudnative-pg-cluster: Simplify the s3 bucket name [deployment-charts] - https://gerrit.wikimedia.org/r/1079926
[07:47:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:50:07] <wikibugs>	 (CR) Brouberol: [C:+2] cloudnative-pg-cluster: Simplify the s3 bucket name [deployment-charts] - https://gerrit.wikimedia.org/r/1079926 (owner: Brouberol)
[07:52:06] <wikibugs>	 (CR) Elukey: "Removing the +1 since I'd like to get more into the clusterrolebinding pattern, I had a chat with Janis and there may be a solution to mak" [deployment-charts] - https://gerrit.wikimedia.org/r/1077872 (https://phabricator.wikimedia.org/T376406) (owner: Brouberol)
[07:52:15] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69745 and previous config saved to /var/cache/conftool/dbconfig/20241014-075214-arnaudb.json
[07:52:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:52:28] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[07:52:36] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[07:55:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[07:57:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: kube-apiserver-safe-restart.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:58:23] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69746 and previous config saved to /var/cache/conftool/dbconfig/20241014-075823-arnaudb.json
[07:58:25] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[07:58:27] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[07:58:39] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[07:58:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69747 and previous config saved to /var/cache/conftool/dbconfig/20241014-075845-arnaudb.json
[08:00:28] <jinxer-wm>	 FIRING: [3x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[08:00:28] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
[08:00:52] <logmsgbot>	 !log jayme@cumin1002 END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM kubestagemaster2005.codfw.wmnet
[08:01:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69748 and previous config saved to /var/cache/conftool/dbconfig/20241014-080059-arnaudb.json
[08:01:11] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
[08:02:21] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
[08:05:28] <jinxer-wm>	 RESOLVED: [2x] KubernetesAPINotScrapable: k8s-staging@codfw is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[08:07:02] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
[08:07:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69749 and previous config saved to /var/cache/conftool/dbconfig/20241014-080721-arnaudb.json
[08:07:24] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
[08:07:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: kube-controller-manager.service on kubestagemaster2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:07:25] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[08:07:37] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
[08:07:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69750 and previous config saved to /var/cache/conftool/dbconfig/20241014-080744-arnaudb.json
[08:08:05] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
[08:09:35] <wikibugs>	 (PS3) Ayounsi: Prefix validator: ensure k8s role and site [software/netbox-extras] - https://gerrit.wikimedia.org/r/1079525 (https://phabricator.wikimedia.org/T354169)
[08:10:37] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
[08:10:53] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:11:17] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:11:27] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service kubestagemaster2004:6443 has failed probes (http_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster2004:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:11:53] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:12:07] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:12:52] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:13:55] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
[08:13:57] <jinxer-wm>	 RESOLVED: CalicoKubeControllersDown: Calico Kubernetes Controllers not running - https://wikitech.wikimedia.org/wiki/Calico#Kube_Controllers - TODO - https://alerts.wikimedia.org/?q=alertname%3DCalicoKubeControllersDown
[08:14:42] <jinxer-wm>	 RESOLVED: KubernetesCalicoDown: kubestagemaster2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-staging&var-instance=kubestagemaster2003.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[08:16:07] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69751 and previous config saved to /var/cache/conftool/dbconfig/20241014-081606-arnaudb.json
[08:16:07] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[08:16:26] <wikibugs>	 (PS1) Cathal Mooney: Add "no_smallnet" term to BGP6_outfilter policy map on CRs [homer/public] - https://gerrit.wikimedia.org/r/1079929
[08:21:30] <wikibugs>	 (CR) Volans: "The approach looks good, it would be nice to add a test for it. Couple of suggestions inline." [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: Arnaudb)
[08:22:09] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:22:41] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "LGTM!" [software/netbox-extras] - https://gerrit.wikimedia.org/r/1079525 (https://phabricator.wikimedia.org/T354169) (owner: Ayounsi)
[08:24:07] <wikibugs>	 (PS1) Giuseppe Lavagetto: Add api tokens for requestctl web ui [labs/private] - https://gerrit.wikimedia.org/r/1079930
[08:25:37] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+2 C:+2] Add api tokens for requestctl web ui [labs/private] - https://gerrit.wikimedia.org/r/1079930 (owner: Giuseppe Lavagetto)
[08:26:55] <wikibugs>	 (CR) Giuseppe Lavagetto: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1078985 (https://phabricator.wikimedia.org/T371782) (owner: Giuseppe Lavagetto)
[08:26:57] <wikibugs>	 (CR) Volans: mariadb: add data directory accessor (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701) (owner: Arnaudb)
[08:27:46] <wikibugs>	 (PS1) David Caro: toolforge::k8s::deployer: add click libraries [puppet] - https://gerrit.wikimedia.org/r/1079931
[08:29:57] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C:+1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/1079931 (owner: David Caro)
[08:30:37] <wikibugs>	 (PS6) Giuseppe Lavagetto: role::alerting_host: add web interface for requestctl [puppet] - https://gerrit.wikimedia.org/r/1078985 (https://phabricator.wikimedia.org/T371782)
[08:31:14] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69752 and previous config saved to /var/cache/conftool/dbconfig/20241014-083113-arnaudb.json
[08:32:08] <wikibugs>	 (CR) Giuseppe Lavagetto: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4296/co" [puppet] - https://gerrit.wikimedia.org/r/1078985 (https://phabricator.wikimedia.org/T371782) (owner: Giuseppe Lavagetto)
[08:34:35] <wikibugs>	 (PS1) Ilias Sarantopoulos: ml-services: enable multiprocessing for enwiki-damaging [deployment-charts] - https://gerrit.wikimedia.org/r/1079932 (https://phabricator.wikimedia.org/T363336)
[08:35:57] <wikibugs>	 (CR) David Caro: [C:+2] toolforge::k8s::deployer: add click libraries [puppet] - https://gerrit.wikimedia.org/r/1079931 (owner: David Caro)
[08:36:00] <wikibugs>	 (PS3) Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712)
[08:36:43] <wikibugs>	 (CR) Arnaudb: "Test written!" [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: Arnaudb)
[08:40:02] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[08:40:31] <wikibugs>	 (CR) Ayounsi: [C:+1] "good catch!" [homer/public] - https://gerrit.wikimedia.org/r/1079929 (owner: Cathal Mooney)
[08:42:14] <wikibugs>	 (CR) Ayounsi: [C:+2] Prefix validator: ensure k8s role and site [software/netbox-extras] - https://gerrit.wikimedia.org/r/1079525 (https://phabricator.wikimedia.org/T354169) (owner: Ayounsi)
[08:42:32] <wikibugs>	 (PS5) Arnaudb: mariadb: add data directory accessor [software/spicerack] - https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701)
[08:42:52] <wikibugs>	 (CR) Arnaudb: mariadb: add data directory accessor (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701) (owner: Arnaudb)
[08:43:17] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[08:43:23] <wikibugs>	 (PS6) Arnaudb: mariadb: add data directory accessor [software/spicerack] - https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701)
[08:43:28] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
[08:43:56] <wikibugs>	 (Merged) jenkins-bot: Prefix validator: ensure k8s role and site [software/netbox-extras] - https://gerrit.wikimedia.org/r/1079525 (https://phabricator.wikimedia.org/T354169) (owner: Ayounsi)
[08:44:35] <wikibugs>	 (CR) CI reject: [V:-1] mysql_legacy: double quote escape in run_query [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: Arnaudb)
[08:44:41] <wikibugs>	 (PS1) Ayounsi: Netbox: enable prefix validator on -next [puppet] - https://gerrit.wikimedia.org/r/1079933 (https://phabricator.wikimedia.org/T354169)
[08:44:42] <wikibugs>	 (PS1) Ayounsi: Netbox: enable prefix validator in prod [puppet] - https://gerrit.wikimedia.org/r/1079934 (https://phabricator.wikimedia.org/T354169)
[08:46:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69753 and previous config saved to /var/cache/conftool/dbconfig/20241014-084620-arnaudb.json
[08:46:23] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[08:46:24] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[08:46:36] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[08:46:43] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69754 and previous config saved to /var/cache/conftool/dbconfig/20241014-084643-arnaudb.json
[08:46:58] <wikibugs>	 ops-eqiad, SRE, SRE-swift-storage, DC-Ops: Disk (sdv) failed on ms-be1065 - https://phabricator.wikimedia.org/T376775#10224838 (MatthewVernon) p:Triage→High
[08:47:09] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-staging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:47:28] <wikibugs>	 (PS1) JMeybohm: kubernetes: Create profile::kubernetes::container_runtime [puppet] - https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408)
[08:48:04] <wikibugs>	 (CR) CI reject: [V:-1] kubernetes: Create profile::kubernetes::container_runtime [puppet] - https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408) (owner: JMeybohm)
[08:48:23] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
[08:48:57] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69755 and previous config saved to /var/cache/conftool/dbconfig/20241014-084856-arnaudb.json
[08:48:58] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
[08:49:05] <wikibugs>	 (PS2) JMeybohm: kubernetes: Create profile::kubernetes::container_runtime [puppet] - https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408)
[08:49:08] <wikibugs>	 (CR) Ayounsi: [C:+2] Netbox: enable prefix validator on -next [puppet] - https://gerrit.wikimedia.org/r/1079933 (https://phabricator.wikimedia.org/T354169) (owner: Ayounsi)
[08:49:19] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
[08:49:21] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
[08:55:24] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
[08:55:27] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
[08:55:42] <wikibugs>	 (CR) Volans: mysql_legacy: double quote escape in run_query (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: Arnaudb)
[08:56:28] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Add "no_smallnet" term to BGP6_outfilter policy map on CRs [homer/public] - https://gerrit.wikimedia.org/r/1079929 (owner: Cathal Mooney)
[08:57:00] <wikibugs>	 (Merged) jenkins-bot: Add "no_smallnet" term to BGP6_outfilter policy map on CRs [homer/public] - https://gerrit.wikimedia.org/r/1079929 (owner: Cathal Mooney)
[08:57:15] <wikibugs>	 (PS4) Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712)
[08:58:28] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
[08:58:37] <wikibugs>	 (CR) Arnaudb: mysql_legacy: double quote escape in run_query (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: Arnaudb)
[08:59:47] <wikibugs>	 (CR) Volans: [C:+1] "LGTM, thx." [software/spicerack] - https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701) (owner: Arnaudb)
[09:01:06] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
[09:01:20] <wikibugs>	 (CR) Ayounsi: "It's live on Netbox-next, ready for prod." [puppet] - https://gerrit.wikimedia.org/r/1079934 (https://phabricator.wikimedia.org/T354169) (owner: Ayounsi)
[09:01:29] <wikibugs>	 (PS14) Stevemunene: Setup DPE Ceph alerts [alerts] - https://gerrit.wikimedia.org/r/1076460 (https://phabricator.wikimedia.org/T369583)
[09:02:49] <wikibugs>	 (PS3) Clément Goubert: kubernetes: codfw refresh [puppet] - https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170)
[09:03:01] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[09:03:15] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[09:03:16] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[09:03:29] <wikibugs>	 (PS3) Clément Goubert: kubernetes: codfw expansion [puppet] - https://gerrit.wikimedia.org/r/1079240 (https://phabricator.wikimedia.org/T376665)
[09:03:33] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[09:03:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69756 and previous config saved to /var/cache/conftool/dbconfig/20241014-090340-ladsgroup.json
[09:04:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69757 and previous config saved to /var/cache/conftool/dbconfig/20241014-090403-arnaudb.json
[09:04:40] <wikibugs>	 (03PS2) 10Clément Goubert: kubernetes: eqiad refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079241 (https://phabricator.wikimedia.org/T376185)
[09:05:20] <wikibugs>	 (03CR) 10Btullis: Setup DPE Ceph alerts (033 comments) [alerts] - 10https://gerrit.wikimedia.org/r/1076460 (https://phabricator.wikimedia.org/T369583) (owner: 10Stevemunene)
[09:05:51] <wikibugs>	 (03PS2) 10Clément Goubert: kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307)
[09:06:35] <wikibugs>	 (03CR) 10Ayounsi: "Thanks ! I think I have a slight preference for the aggregate to keep things more tidy as less prefixes are advertised to WMCS upstreams (" [homer/public] - 10https://gerrit.wikimedia.org/r/1079288 (https://phabricator.wikimedia.org/T245495) (owner: 10Cathal Mooney)
[09:06:44] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb)
[09:07:27] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1079922 (https://phabricator.wikimedia.org/T377104) (owner: 10Brouberol)
[09:07:37] <wikibugs>	 (03PS5) 10Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712)
[09:08:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69758 and previous config saved to /var/cache/conftool/dbconfig/20241014-090810-arnaudb.json
[09:08:14] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[09:08:43] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+1 C:03+2] role::alerting_host: add web interface for requestctl [puppet] - 10https://gerrit.wikimedia.org/r/1078985 (https://phabricator.wikimedia.org/T371782) (owner: 10Giuseppe Lavagetto)
[09:08:56] <wikibugs>	 (03PS2) 10Clément Goubert: kubestage: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079257 (https://phabricator.wikimedia.org/T376171)
[09:09:49] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
[09:10:39] <wikibugs>	 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Disk (sdt) failed on ms-be1075 - https://phabricator.wikimedia.org/T377109 (10MatthewVernon) 03NEW
[09:10:50] <wikibugs>	 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Disk (sdt) failed on ms-be1075 - https://phabricator.wikimedia.org/T377109#10224916 (10MatthewVernon) p:05Triage→03High
[09:10:53] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] Netbox: enable prefix validator in prod [puppet] - 10https://gerrit.wikimedia.org/r/1079934 (https://phabricator.wikimedia.org/T354169) (owner: 10Ayounsi)
[09:10:55] <wikibugs>	 (03CR) 10David Caro: [V:03+1 C:03+2] "The only reason is that api.toolforge.org is easier to remember for users, being an intentional user entry point, more than an internal se" [puppet] - 10https://gerrit.wikimedia.org/r/1078986 (https://phabricator.wikimedia.org/T362066) (owner: 10David Caro)
[09:12:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:14:03] <wikibugs>	 (03PS1) 10Clément Goubert: mc-gp: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079941 (https://phabricator.wikimedia.org/T376968)
[09:14:22] <wikibugs>	 (03PS1) 10Michael Große: refactor(tests): don't use per-method coverage annotation [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079923
[09:15:21] <wikibugs>	 (03PS2) 10Michael Große: Clear LinkRecommendation suggestions on page save [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079915 (https://phabricator.wikimedia.org/T364341)
[09:15:28] <wikibugs>	 (03PS2) 10Michael Große: Run fixLinkRecommendationData even when disabled in CC [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079925 (https://phabricator.wikimedia.org/T373176)
[09:16:10] <wikibugs>	 (03CR) 10Kevin Bazira: [C:03+1] "LGTM!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079932 (https://phabricator.wikimedia.org/T363336) (owner: 10Ilias Sarantopoulos)
[09:17:02] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+2] ml-services: enable multiprocessing for enwiki-damaging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079932 (https://phabricator.wikimedia.org/T363336) (owner: 10Ilias Sarantopoulos)
[09:17:07] <wikibugs>	 (03CR) 10Brouberol: [V:03+1 C:03+2] ceph.backup.s3_local: fix typo in systemd timer command [puppet] - 10https://gerrit.wikimedia.org/r/1079922 (https://phabricator.wikimedia.org/T377104) (owner: 10Brouberol)
[09:17:21] <wikibugs>	 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations, 10Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10224935 (10elukey)
[09:18:07] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: enable multiprocessing for enwiki-damaging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079932 (https://phabricator.wikimedia.org/T363336) (owner: 10Ilias Sarantopoulos)
[09:18:33] <wikibugs>	 (03PS1) 10Clément Goubert: mc-gp: eqiad refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079942 (https://phabricator.wikimedia.org/T376186)
[09:19:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69759 and previous config saved to /var/cache/conftool/dbconfig/20241014-091911-arnaudb.json
[09:21:04] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[09:21:06] <wikibugs>	 (03CR) 10Ladsgroup: "I gave some notes." [puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan)
[09:22:00] <wikibugs>	 (03CR) 10JMeybohm: "This change should not be required rn. With the correct host header set, you should be able to access mw-api via localhost:6501, keeping t" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079326 (owner: 10Jforrester)
[09:23:05] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubestage: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079257 (https://phabricator.wikimedia.org/T376171) (owner: 10Clément Goubert)
[09:23:17] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69760 and previous config saved to /var/cache/conftool/dbconfig/20241014-092317-arnaudb.json
[09:23:41] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] Netbox: enable prefix validator in prod [puppet] - 10https://gerrit.wikimedia.org/r/1079934 (https://phabricator.wikimedia.org/T354169) (owner: 10Ayounsi)
[09:24:18] <wikibugs>	 (03PS4) 10Clément Goubert: kubernetes: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170)
[09:24:18] <wikibugs>	 (03PS4) 10Clément Goubert: kubernetes: codfw expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079240 (https://phabricator.wikimedia.org/T376665)
[09:24:19] <wikibugs>	 (03PS3) 10Clément Goubert: kubernetes: eqiad refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079241 (https://phabricator.wikimedia.org/T376185)
[09:24:19] <wikibugs>	 (03PS3) 10Clément Goubert: kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307)
[09:24:45] <wikibugs>	 (03PS3) 10Clément Goubert: kubestage: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079257 (https://phabricator.wikimedia.org/T376171)
[09:26:55] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: hiddenparma: various bugfixes [puppet] - 10https://gerrit.wikimedia.org/r/1079944
[09:27:28] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubernetes: eqiad refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079241 (https://phabricator.wikimedia.org/T376185) (owner: 10Clément Goubert)
[09:29:05] <Amir1>	 jouncebot: nowandnext
[09:29:05] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 30 minute(s)
[09:29:05] <jouncebot>	 In 0 hour(s) and 30 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1000)
[09:30:10] <wikibugs>	 (03CR) 10JMeybohm: [C:04-1] kubernetes: eqiad expansion (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307) (owner: 10Clément Goubert)
[09:32:51] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: hiddenparma: various bugfixes [puppet] - 10https://gerrit.wikimedia.org/r/1079944
[09:33:01] <wikibugs>	 (03PS4) 10Clément Goubert: kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307)
[09:33:57] <wikibugs>	 (03CR) 10JMeybohm: [C:04-1] kubernetes: codfw refresh (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170) (owner: 10Clément Goubert)
[09:34:18] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69761 and previous config saved to /var/cache/conftool/dbconfig/20241014-093418-arnaudb.json
[09:34:21] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[09:34:21] <wikibugs>	 (03PS1) 10Ladsgroup: Init config for rskwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079947 (https://phabricator.wikimedia.org/T374963)
[09:34:22] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[09:34:27] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4299/co" [puppet] - 10https://gerrit.wikimedia.org/r/1079944 (owner: 10Giuseppe Lavagetto)
[09:34:35] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[09:34:36] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[09:34:53] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[09:35:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69762 and previous config saved to /var/cache/conftool/dbconfig/20241014-093459-arnaudb.json
[09:35:02] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Init config for rskwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079947 (https://phabricator.wikimedia.org/T374963) (owner: 10Ladsgroup)
[09:35:08] <wikibugs>	 (03CR) 10CI reject: [V:04-1] kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307) (owner: 10Clément Goubert)
[09:36:30] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+1 C:03+2] hiddenparma: various bugfixes [puppet] - 10https://gerrit.wikimedia.org/r/1079944 (owner: 10Giuseppe Lavagetto)
[09:36:46] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[09:37:14] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69763 and previous config saved to /var/cache/conftool/dbconfig/20241014-093713-arnaudb.json
[09:37:28] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+1] alertmanager-irc: improve ErrorBudgetBurn SLO alert text (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1078718 (https://phabricator.wikimedia.org/T376740) (owner: 10Herron)
[09:37:50] <wikibugs>	 (03PS2) 10Ladsgroup: Init config for rskwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079947 (https://phabricator.wikimedia.org/T374963)
[09:38:08] <wikibugs>	 (03PS5) 10Clément Goubert: kubernetes: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170)
[09:38:24] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubernetes: codfw expansion (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1079240 (https://phabricator.wikimedia.org/T376665) (owner: 10Clément Goubert)
[09:38:25] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69764 and previous config saved to /var/cache/conftool/dbconfig/20241014-093824-arnaudb.json
[09:38:36] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[09:39:02] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Init config for rskwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079947 (https://phabricator.wikimedia.org/T374963) (owner: 10Ladsgroup)
[09:39:42] <wikibugs>	 (03Merged) 10jenkins-bot: Init config for rskwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079947 (https://phabricator.wikimedia.org/T374963) (owner: 10Ladsgroup)
[09:39:49] <wikibugs>	 (03CR) 10Clément Goubert: kubernetes: codfw refresh (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170) (owner: 10Clément Goubert)
[09:40:10] <wikibugs>	 (03Abandoned) 10Tiziano Fogli: curator: free up space to safely restart daemons [puppet] - 10https://gerrit.wikimedia.org/r/1064781 (https://phabricator.wikimedia.org/T371961) (owner: 10Tiziano Fogli)
[09:40:20] <wikibugs>	 (03CR) 10Clément Goubert: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307) (owner: 10Clément Goubert)
[09:41:20] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Creating rskwiki (T374963)
[09:41:24] <stashbot>	 T374963: Create Wikipedia Pannonian Rusyn - https://phabricator.wikimedia.org/T374963
[09:42:26] <wikibugs>	 (03PS5) 10Clément Goubert: kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307)
[09:43:03] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: hiddenparma: remove notifying the app from the api token files [puppet] - 10https://gerrit.wikimedia.org/r/1079952
[09:43:57] <wikibugs>	 (03CR) 10Clément Goubert: kubernetes: eqiad expansion (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307) (owner: 10Clément Goubert)
[09:44:29] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] [uawikimedia] Enable the CampaignEvents extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079518 (https://phabricator.wikimedia.org/T376695) (owner: 10Daimona Eaytoy)
[09:44:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307) (owner: 10Clément Goubert)
[09:45:22] <wikibugs>	 (03PS6) 10Clément Goubert: kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307)
[09:45:32] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+2] hiddenparma: remove notifying the app from the api token files [puppet] - 10https://gerrit.wikimedia.org/r/1079952 (owner: 10Giuseppe Lavagetto)
[09:45:48] <wikibugs>	 (03PS6) 10Clément Goubert: kubernetes: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170)
[09:45:48] <wikibugs>	 (03PS5) 10Clément Goubert: kubernetes: codfw expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079240 (https://phabricator.wikimedia.org/T376665)
[09:45:48] <wikibugs>	 (03PS4) 10Clément Goubert: kubernetes: eqiad refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079241 (https://phabricator.wikimedia.org/T376185)
[09:45:49] <wikibugs>	 (03PS7) 10Clément Goubert: kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307)
[09:45:51] <wikibugs>	 (03PS4) 10Clément Goubert: kubestage: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079257 (https://phabricator.wikimedia.org/T376171)
[09:47:54] <wikibugs>	 (03PS26) 10Arnaudb: mariadb: clone cookbook maintenance [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191)
[09:48:49] <wikibugs>	 (03PS27) 10Arnaudb: mariadb: clone cookbook maintenance [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191)
[09:49:10] <wikibugs>	 (03PS28) 10Arnaudb: mariadb: clone cookbook maintenance [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191)
[09:50:29] <wikibugs>	 (03CR) 10Phuedx: [C:03+1] "Further to @aotto@wikimedia.org and @xcollazo@wikimedia.org's +1's. All changes LGTM!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1077504 (https://phabricator.wikimedia.org/T376065) (owner: 10Kimberly Sarabia)
[09:51:08] <wikibugs>	 (03PS1) 10Ladsgroup: Add namespace translations for Tai Nüa (tdd) [core] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079954 (https://phabricator.wikimedia.org/T375421)
[09:51:23] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Add namespace translations for Tai Nüa (tdd) [core] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079954 (https://phabricator.wikimedia.org/T375421) (owner: 10Ladsgroup)
[09:52:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69765 and previous config saved to /var/cache/conftool/dbconfig/20241014-095220-arnaudb.json
[09:53:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69766 and previous config saved to /var/cache/conftool/dbconfig/20241014-095331-arnaudb.json
[09:53:34] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
[09:53:36] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[09:53:46] <wikibugs>	 (03PS2) 10Ladsgroup: mariadb: Add SLAVE MONITOR to promotheus grants [puppet] - 10https://gerrit.wikimedia.org/r/1079006
[09:53:48] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
[09:53:50] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] mariadb: Add SLAVE MONITOR to promotheus grants [puppet] - 10https://gerrit.wikimedia.org/r/1079006 (owner: 10Ladsgroup)
[09:53:53] <wikibugs>	 (03CR) 10Ladsgroup: [V:03+2 C:03+2] mariadb: Add SLAVE MONITOR to promotheus grants [puppet] - 10https://gerrit.wikimedia.org/r/1079006 (owner: 10Ladsgroup)
[09:53:55] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69767 and previous config saved to /var/cache/conftool/dbconfig/20241014-095354-arnaudb.json
[09:54:08] <wikibugs>	 (03PS1) 10JMeybohm: Merge worker_containerd values back into worker [labs/private] - 10https://gerrit.wikimedia.org/r/1079955 (https://phabricator.wikimedia.org/T362408)
[09:54:08] <wikibugs>	 (03CR) 10Ladsgroup: [V:03+2 C:03+2] "FWIW, it was on other pc hosts." [puppet] - 10https://gerrit.wikimedia.org/r/1079006 (owner: 10Ladsgroup)
[09:55:21] <hnowlan>	 jouncebot: nowandnext
[09:55:21] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 4 minute(s)
[09:55:21] <jouncebot>	 In 0 hour(s) and 4 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1000)
[09:55:26] <Amir1>	 hnowlan: I'm deploying :D
[09:57:46] <wikibugs>	 (03PS2) 10JMeybohm: Merge worker_containerd back to worker role [labs/private] - 10https://gerrit.wikimedia.org/r/1079955 (https://phabricator.wikimedia.org/T362408)
[09:59:05] <logmsgbot>	 !log oblivian@cumin2002 START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
[09:59:05] <logmsgbot>	 !log oblivian@cumin2002 END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
[09:59:36] <hnowlan>	 Amir1: cool :) 
[09:59:59] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Creating rskwiki (T374963) (duration: 18m 38s)
[09:59:59] <logmsgbot>	 !log oblivian@cumin2002 START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
[10:00:00] <logmsgbot>	 !log oblivian@cumin2002 END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
[10:00:00] <logmsgbot>	 !log eoghan@cumin2002 START - Cookbook sre.hosts.reboot-single for host lists2001.wikimedia.org
[10:00:04] <stashbot>	 T374963: Create Wikipedia Pannonian Rusyn - https://phabricator.wikimedia.org/T374963
[10:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1000)
[10:00:35] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [core] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079954 (https://phabricator.wikimedia.org/T375421) (owner: 10Ladsgroup)
[10:00:45] <akosiaris>	 !log powercycle rdb1014 T376961
[10:00:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:00:48] <stashbot>	 T376961: host rdb1014 is down - https://phabricator.wikimedia.org/T376961
[10:01:39] <wikibugs>	 (03PS29) 10Arnaudb: mariadb: clone cookbook maintenance [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191)
[10:02:33] <wikibugs>	 (03CR) 10JMeybohm: [V:03+2 C:03+2] Merge worker_containerd back to worker role [labs/private] - 10https://gerrit.wikimedia.org/r/1079955 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:02:53] <wikibugs>	 (03CR) 10Volans: mariadb: clone cookbook maintenance (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191) (owner: 10Arnaudb)
[10:03:56] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69768 and previous config saved to /var/cache/conftool/dbconfig/20241014-100356-ladsgroup.json
[10:05:12] <wikibugs>	 (03PS1) 10Ladsgroup: Init config for nrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079958 (https://phabricator.wikimedia.org/T375087)
[10:05:30] <wikibugs>	 (03CR) 10JMeybohm: [V:03+1] "PCC SUCCESS (CORE_DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4300/co" [puppet] - 10https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:06:19] <logmsgbot>	 !log eoghan@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.wikimedia.org
[10:06:42] <logmsgbot>	 !log eoghan@cumin2002 START - Cookbook sre.hosts.reboot-single for host lists1004.wikimedia.org
[10:07:28] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69769 and previous config saved to /var/cache/conftool/dbconfig/20241014-100727-arnaudb.json
[10:07:30] <wikibugs>	 (03PS30) 10Arnaudb: mariadb: clone cookbook maintenance [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191)
[10:08:20] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] kubernetes: Create profile::kubernetes::container_runtime [puppet] - 10https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:09:07] <wikibugs>	 (03CR) 10JMeybohm: [V:03+1 C:03+2] kubernetes: Create profile::kubernetes::container_runtime [puppet] - 10https://gerrit.wikimedia.org/r/1079935 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:11:57] <wikibugs>	 (03PS1) 10JMeybohm: cumin/aliases: Merge worker_containerd back to worker role [puppet] - 10https://gerrit.wikimedia.org/r/1079960 (https://phabricator.wikimedia.org/T362408)
[10:12:47] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69770 and previous config saved to /var/cache/conftool/dbconfig/20241014-101246-ladsgroup.json
[10:12:50] <stashbot>	 T375652: Wikimedia\Rdbms\DBQueryError: Error 1062: Duplicate entry '1' for key 'PRIMARY' Function: MediaWiki\CheckUser\Services\CheckUserLogService::addLogEntry - https://phabricator.wikimedia.org/T375652
[10:13:31] <logmsgbot>	 !log eoghan@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1004.wikimedia.org
[10:13:55] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69771 and previous config saved to /var/cache/conftool/dbconfig/20241014-101354-ladsgroup.json
[10:14:21] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] cumin/aliases: Merge worker_containerd back to worker role [puppet] - 10https://gerrit.wikimedia.org/r/1079960 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:15:57] <wikibugs>	 (03PS1) 10JMeybohm: Remove role kubernetes::staging::worker_containerd [labs/private] - 10https://gerrit.wikimedia.org/r/1079961 (https://phabricator.wikimedia.org/T362408)
[10:17:14] <wikibugs>	 (03CR) 10JMeybohm: [V:03+2 C:03+2] Remove role kubernetes::staging::worker_containerd [labs/private] - 10https://gerrit.wikimedia.org/r/1079961 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:17:34] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2227.codfw.wmnet
[10:17:46] <wikibugs>	 06SRE, 06serviceops: host rdb1014 is down - https://phabricator.wikimedia.org/T376961#10225247 (10akosiaris) 05Open→03Resolved a:03akosiaris The host has some history of failure per {T370633}  It is the passive failover for rdb1013, which means we have no degradation of anything right now.   Nothing...
[10:19:03] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69772 and previous config saved to /var/cache/conftool/dbconfig/20241014-101903-ladsgroup.json
[10:19:24] <wikibugs>	 (03CR) 10Volans: [C:03+1] "It now looks in a good state for starting testing it on safe instances with the test-cookbook." [cookbooks] - 10https://gerrit.wikimedia.org/r/1071155 (https://phabricator.wikimedia.org/T374191) (owner: 10Arnaudb)
[10:20:31] <jinxer-wm>	 FIRING: RedisReplicaDown: Redis replica down rdb1014:16379 redis_misc - https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_misc - https://grafana.wikimedia.org/d/000000174/redis?orgId=1&var-site=eqiad&var-job=redis_misc&var-instance=rdb1014:16379 - https://alerts.wikimedia.org/?q=alertname%3DRedisReplicaDown
[10:21:19] <wikibugs>	 (03CR) 10Clément Goubert: "Thanks for that, hope the amended docstrings are clearer." [cookbooks] - 10https://gerrit.wikimedia.org/r/912813 (https://phabricator.wikimedia.org/T335364) (owner: 10Clément Goubert)
[10:22:31] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079619 (https://phabricator.wikimedia.org/T362620) (owner: 10NMW03)
[10:22:35] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69773 and previous config saved to /var/cache/conftool/dbconfig/20241014-102234-arnaudb.json
[10:22:36] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[10:22:38] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[10:22:48] <wikibugs>	 (03PS6) 10Clément Goubert: sre.discovery.datacenter: Add failover_from action [cookbooks] - 10https://gerrit.wikimedia.org/r/912813 (https://phabricator.wikimedia.org/T335364)
[10:22:50] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[10:22:57] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69774 and previous config saved to /var/cache/conftool/dbconfig/20241014-102256-arnaudb.json
[10:24:13] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69775 and previous config saved to /var/cache/conftool/dbconfig/20241014-102412-arnaudb.json
[10:24:22] <wikibugs>	 (03Merged) 10jenkins-bot: Add namespace translations for Tai Nüa (tdd) [core] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079954 (https://phabricator.wikimedia.org/T375421) (owner: 10Ladsgroup)
[10:25:01] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1079954|Add namespace translations for Tai Nüa (tdd) (T375421)]]
[10:25:04] <stashbot>	 T375421: Prepare MessagesTdd.php for Tai Nüa Wikipedia - https://phabricator.wikimedia.org/T375421
[10:25:27] <Nemoralis>	 jouncebot: now
[10:25:27] <jouncebot>	 For the next 0 hour(s) and 34 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1000)
[10:25:29] <wikibugs>	 (03Abandoned) 10Cathal Mooney: Add orlonger to policy on announced v6 routes from cloudsw [homer/public] - 10https://gerrit.wikimedia.org/r/1079288 (https://phabricator.wikimedia.org/T245495) (owner: 10Cathal Mooney)
[10:25:33] <wikibugs>	 (03CR) 10Cathal Mooney: "ok np." [homer/public] - 10https://gerrit.wikimedia.org/r/1079288 (https://phabricator.wikimedia.org/T245495) (owner: 10Cathal Mooney)
[10:27:05] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:1079954|Add namespace translations for Tai Nüa (tdd) (T375421)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[10:27:17] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[10:27:22] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Init config for nrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079958 (https://phabricator.wikimedia.org/T375087) (owner: 10Ladsgroup)
[10:28:04] <wikibugs>	 (03Merged) 10jenkins-bot: Init config for nrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079958 (https://phabricator.wikimedia.org/T375087) (owner: 10Ladsgroup)
[10:28:22] <Nemoralis>	 jouncebot: next
[10:28:22] <jouncebot>	 In 2 hour(s) and 31 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1300)
[10:29:39] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10225275 (10phaultfinder)
[10:31:47] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1079954|Add namespace translations for Tai Nüa (tdd) (T375421)]] (duration: 06m 45s)
[10:31:50] <stashbot>	 T375421: Prepare MessagesTdd.php for Tai Nüa Wikipedia - https://phabricator.wikimedia.org/T375421
[10:32:10] <wikibugs>	 (03PS1) 10Ladsgroup: Init config for tddwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079969 (https://phabricator.wikimedia.org/T375422)
[10:33:11] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Creating nrwiki (T375087)
[10:33:15] <stashbot>	 T375087: Create Wikipedia South Ndebele - https://phabricator.wikimedia.org/T375087
[10:34:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69776 and previous config saved to /var/cache/conftool/dbconfig/20241014-103410-ladsgroup.json
[10:34:21] <wikibugs>	 (03PS1) 10JMeybohm: wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408)
[10:35:45] <logmsgbot>	 !log oblivian@cumin2002 START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
[10:35:46] <wikibugs>	 (03PS2) 10JMeybohm: wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408)
[10:35:59] <logmsgbot>	 !log oblivian@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
[10:36:29] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubernetes: eqiad expansion [puppet] - 10https://gerrit.wikimedia.org/r/1079242 (https://phabricator.wikimedia.org/T376307) (owner: 10Clément Goubert)
[10:36:36] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] kubernetes: codfw refresh [puppet] - 10https://gerrit.wikimedia.org/r/1079239 (https://phabricator.wikimedia.org/T376170) (owner: 10Clément Goubert)
[10:36:52] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[10:37:44] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Init config for tddwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079969 (https://phabricator.wikimedia.org/T375422) (owner: 10Ladsgroup)
[10:38:30] <wikibugs>	 (03Merged) 10jenkins-bot: Init config for tddwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079969 (https://phabricator.wikimedia.org/T375422) (owner: 10Ladsgroup)
[10:38:52] <wikibugs>	 (03PS17) 10Jelto: miscweb: add support to mount add confimaps [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079465 (https://phabricator.wikimedia.org/T350793)
[10:39:06] <wikibugs>	 (03PS19) 10Jelto: wikidata-query-gui: mount custom-config.json into pod [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079466 (https://phabricator.wikimedia.org/T350793)
[10:39:20] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69777 and previous config saved to /var/cache/conftool/dbconfig/20241014-103919-arnaudb.json
[10:40:06] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Creating nrwiki (T375087) (duration: 06m 54s)
[10:40:10] <stashbot>	 T375087: Create Wikipedia South Ndebele - https://phabricator.wikimedia.org/T375087
[10:42:09] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Creating tddwiki (T375422)
[10:42:12] <stashbot>	 T375422: Create Wikipedia Tai Nüa - https://phabricator.wikimedia.org/T375422
[10:43:17] <wikibugs>	 (03PS1) 10Btullis: Remove the dumps_store_load_average icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1079971 (https://phabricator.wikimedia.org/T374821)
[10:44:02] <logmsgbot>	 !log oblivian@cumin2002 START - Cookbook sre.deploy.python-code hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
[10:44:21] <logmsgbot>	 !log oblivian@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
[10:44:37] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] mw-script: Add prometheus-statsd-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078666 (https://phabricator.wikimedia.org/T376714) (owner: 10Alexandros Kosiaris)
[10:44:46] <wikibugs>	 (03PS1) 10Ladsgroup: Init config for annwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079972 (https://phabricator.wikimedia.org/T376332)
[10:45:40] <wikibugs>	 (03Merged) 10jenkins-bot: mw-script: Add prometheus-statsd-exporter [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078666 (https://phabricator.wikimedia.org/T376714) (owner: 10Alexandros Kosiaris)
[10:46:34] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Init config for annwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079972 (https://phabricator.wikimedia.org/T376332) (owner: 10Ladsgroup)
[10:47:16] <wikibugs>	 (03PS1) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[10:47:24] <wikibugs>	 (03Merged) 10jenkins-bot: Init config for annwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079972 (https://phabricator.wikimedia.org/T376332) (owner: 10Ladsgroup)
[10:47:43] <wikibugs>	 (03Restored) 10Btullis: Set a non-default mapreduce file committer algorithm for spark [puppet] - 10https://gerrit.wikimedia.org/r/975006 (https://phabricator.wikimedia.org/T351388) (owner: 10Btullis)
[10:48:02] <jynus>	 elukey: volans sending patch to fix the "Ensure legal html en.wp" alert
[10:48:04] <wikibugs>	 (03CR) 10CI reject: [V:04-1] check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[10:48:37] <elukey>	 jynus: ack thanks!
[10:48:53] <wikibugs>	 (03CR) 10Btullis: [V:03+1] "Reopening this patch, based on the comment here:" [puppet] - 10https://gerrit.wikimedia.org/r/975006 (https://phabricator.wikimedia.org/T351388) (owner: 10Btullis)
[10:48:55] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Creating tddwiki (T375422) (duration: 06m 46s)
[10:49:00] <stashbot>	 T375422: Create Wikipedia Tai Nüa - https://phabricator.wikimedia.org/T375422
[10:49:17] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69778 and previous config saved to /var/cache/conftool/dbconfig/20241014-104916-ladsgroup.json
[10:49:21] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[10:49:29] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] mw-script: Remove ci_only_release_do_not_deploy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078672 (owner: 10Alexandros Kosiaris)
[10:49:35] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[10:49:36] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mw-script: Remove ci_only_release_do_not_deploy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078672 (owner: 10Alexandros Kosiaris)
[10:49:37] <wikibugs>	 (03PS4) 10Alexandros Kosiaris: mw-script: Remove ci_only_release_do_not_deploy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078672
[10:49:42] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69779 and previous config saved to /var/cache/conftool/dbconfig/20241014-104941-ladsgroup.json
[10:51:31] <logmsgbot>	 !log mvernon@cumin1002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
[10:52:08] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Creating annwiki (T376332)
[10:52:11] <stashbot>	 T376332: Create Wikipedia Obolo - https://phabricator.wikimedia.org/T376332
[10:52:51] <wikibugs>	 (03PS2) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[10:54:10] <wikibugs>	 (03PS3) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[10:54:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69780 and previous config saved to /var/cache/conftool/dbconfig/20241014-105421-arnaudb.json
[10:54:25] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[10:54:27] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69781 and previous config saved to /var/cache/conftool/dbconfig/20241014-105426-arnaudb.json
[10:55:20] <wikibugs>	 (03PS15) 10Stevemunene: Setup DPE Ceph alerts [alerts] - 10https://gerrit.wikimedia.org/r/1076460 (https://phabricator.wikimedia.org/T369583)
[10:55:27] <logmsgbot>	 !log mvernon@cumin1002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
[10:55:40] <wikibugs>	 (03CR) 10Alexandros Kosiaris: mw-script: Remove ci_only_release_do_not_deploy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078672 (owner: 10Alexandros Kosiaris)
[10:55:58] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078672 (owner: 10Alexandros Kosiaris)
[10:56:48] <wikibugs>	 (03Merged) 10jenkins-bot: mw-script: Remove ci_only_release_do_not_deploy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078672 (owner: 10Alexandros Kosiaris)
[10:57:11] <wikibugs>	 (03PS1) 10Ladsgroup: Init config for ibawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079974 (https://phabricator.wikimedia.org/T376568)
[10:57:56] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69782 and previous config saved to /var/cache/conftool/dbconfig/20241014-105755-ladsgroup.json
[10:58:53] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Creating annwiki (T376332) (duration: 06m 45s)
[10:58:57] <stashbot>	 T376332: Create Wikipedia Obolo - https://phabricator.wikimedia.org/T376332
[10:59:06] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Init config for ibawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079974 (https://phabricator.wikimedia.org/T376568) (owner: 10Ladsgroup)
[11:00:04] <wikibugs>	 (03Merged) 10jenkins-bot: Init config for ibawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079974 (https://phabricator.wikimedia.org/T376568) (owner: 10Ladsgroup)
[11:00:26] <wikibugs>	 06SRE, 10SRE-swift-storage: The file "XXX" is in an inconsistent state within the internal storage backends - https://phabricator.wikimedia.org/T291137#10225382 (10MatthewVernon) @Yann Please open new tickets if you have a new object you want looking at, otherwise these phab tickets just become a series of loo...
[11:00:32] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me." [alerts] - 10https://gerrit.wikimedia.org/r/1076460 (https://phabricator.wikimedia.org/T369583) (owner: 10Stevemunene)
[11:00:42] <logmsgbot>	 !log eoghan@cumin2002 START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
[11:01:04] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Creating ibawiki (T376568)
[11:01:09] <stashbot>	 T376568: Create Wikipedia Iban - https://phabricator.wikimedia.org/T376568
[11:01:27] <wikibugs>	 (03CR) 10Jcrespo: [C:04-1] "Barely tested, needs checking against https://people.wikimedia.org/~jynus/ examples." [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[11:03:34] <wikibugs>	 (03PS1) 10Ladsgroup: Init config for bclwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079976 (https://phabricator.wikimedia.org/T377084)
[11:05:08] <wikibugs>	 (03CR) 10Hnowlan: php-cli: include mercurius in 8.1 image (032 comments) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) (owner: 10Hnowlan)
[11:05:38] <logmsgbot>	 !log eoghan@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
[11:06:11] <volans>	 thanks jynus 
[11:07:32] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Init config for bclwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079976 (https://phabricator.wikimedia.org/T377084) (owner: 10Ladsgroup)
[11:07:50] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Creating ibawiki (T376568) (duration: 06m 45s)
[11:07:55] <stashbot>	 T376568: Create Wikipedia Iban - https://phabricator.wikimedia.org/T376568
[11:08:24] <wikibugs>	 (03Merged) 10jenkins-bot: Init config for bclwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079976 (https://phabricator.wikimedia.org/T377084) (owner: 10Ladsgroup)
[11:09:28] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69783 and previous config saved to /var/cache/conftool/dbconfig/20241014-110927-arnaudb.json
[11:09:34] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69784 and previous config saved to /var/cache/conftool/dbconfig/20241014-110933-arnaudb.json
[11:09:35] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[11:09:37] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[11:09:49] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[11:09:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69785 and previous config saved to /var/cache/conftool/dbconfig/20241014-110956-arnaudb.json
[11:09:57] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Creating bclwikisource (T377084)
[11:10:03] <stashbot>	 T377084: Create Wikisource Central Bikol - https://phabricator.wikimedia.org/T377084
[11:12:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69786 and previous config saved to /var/cache/conftool/dbconfig/20241014-111211-arnaudb.json
[11:13:03] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69787 and previous config saved to /var/cache/conftool/dbconfig/20241014-111302-ladsgroup.json
[11:13:58] <wikibugs>	 (03PS3) 10JMeybohm: wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408)
[11:14:12] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[11:16:47] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Creating bclwikisource (T377084) (duration: 06m 49s)
[11:16:50] <stashbot>	 T377084: Create Wikisource Central Bikol - https://phabricator.wikimedia.org/T377084
[11:20:44] <wikibugs>	 (03PS1) 10Hashar: Merge tag 'v3.10.2' into wmf/stable-3.10 [software/gerrit] (wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1079981 (https://phabricator.wikimedia.org/T373897)
[11:21:42] <wikibugs>	 (03PS1) 10Cathal Mooney: Remove WMCS codfw prefix from CR aggregate conf and adjust outfilter [homer/public] - 10https://gerrit.wikimedia.org/r/1079982 (https://phabricator.wikimedia.org/T245495)
[11:24:35] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69788 and previous config saved to /var/cache/conftool/dbconfig/20241014-112434-arnaudb.json
[11:26:31] <claime>	 !log Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
[11:26:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:34] <stashbot>	 T376961: host rdb1014 is down - https://phabricator.wikimedia.org/T376961
[11:27:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69789 and previous config saved to /var/cache/conftool/dbconfig/20241014-112719-arnaudb.json
[11:28:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69790 and previous config saved to /var/cache/conftool/dbconfig/20241014-112809-ladsgroup.json
[11:29:54] <wikibugs>	 (03PS4) 10JMeybohm: wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408)
[11:30:14] <hnowlan>	 Amir1: mind if I do a quick restbase deploy in between your syncs? 
[11:30:23] <Amir1>	 hnowlan: I am done
[11:30:26] <Amir1>	 sorry I forgot to mention
[11:30:31] <jinxer-wm>	 RESOLVED: RedisReplicaDown: Redis replica down rdb1014:16379 redis_misc - https://wikitech.wikimedia.org/wiki/Redis#Cluster_redis_misc - https://grafana.wikimedia.org/d/000000174/redis?orgId=1&var-site=eqiad&var-job=redis_misc&var-instance=rdb1014:16379 - https://alerts.wikimedia.org/?q=alertname%3DRedisReplicaDown
[11:30:48] <wikibugs>	 (03PS5) 10JMeybohm: wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408)
[11:30:55] <logmsgbot>	 !log andrewtavis-wmde@deploy2002 Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
[11:30:55] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[11:31:01] <logmsgbot>	 !log andrewtavis-wmde@deploy2002 Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 08s)
[11:31:30] <wikibugs>	 (03CR) 10Stevemunene: [C:03+2] Setup DPE Ceph alerts [alerts] - 10https://gerrit.wikimedia.org/r/1076460 (https://phabricator.wikimedia.org/T369583) (owner: 10Stevemunene)
[11:31:55] <hnowlan>	 np, thanks! 
[11:32:42] <wikibugs>	 (03Merged) 10jenkins-bot: Setup DPE Ceph alerts [alerts] - 10https://gerrit.wikimedia.org/r/1076460 (https://phabricator.wikimedia.org/T369583) (owner: 10Stevemunene)
[11:33:55] <wikibugs>	 (03PS1) 10Ladsgroup: Update interwiki.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079984
[11:34:24] <logmsgbot>	 !log hnowlan@deploy2002 Started deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761)
[11:34:28] <stashbot>	 T371761: Add bdrwiki to RESTBase - https://phabricator.wikimedia.org/T371761
[11:39:42] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69791 and previous config saved to /var/cache/conftool/dbconfig/20241014-113941-arnaudb.json
[11:39:45] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[11:42:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69792 and previous config saved to /var/cache/conftool/dbconfig/20241014-114225-arnaudb.json
[11:42:40] <wikibugs>	 (03CR) 10Hashar: [C:03+2] Merge tag 'v3.10.2' into wmf/stable-3.10 [software/gerrit] (wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1079981 (https://phabricator.wikimedia.org/T373897) (owner: 10Hashar)
[11:43:17] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69793 and previous config saved to /var/cache/conftool/dbconfig/20241014-114316-ladsgroup.json
[11:43:21] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
[11:43:35] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
[11:43:42] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69794 and previous config saved to /var/cache/conftool/dbconfig/20241014-114341-ladsgroup.json
[11:44:00] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Add aux-k8s-etcd1004 in service [puppet] - 10https://gerrit.wikimedia.org/r/1079534 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[11:45:51] <Dreamy_Jazz>	 !log Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
[11:45:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:02] <logmsgbot>	 !log hnowlan@deploy2002 Finished deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761) (duration: 15m 38s)
[11:50:07] <stashbot>	 T371761: Add bdrwiki to RESTBase - https://phabricator.wikimedia.org/T371761
[11:50:54] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
[11:51:35] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Add aux-k8s-etcd1004 to the aux-k8s SRV records [dns] - 10https://gerrit.wikimedia.org/r/1079539 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[11:51:37] <volans>	 !incidents
[11:51:37] <sirenbot>	 5318 (UNACKED)  db2149 (paged)/MariaDB Replica SQL: s3 (paged)
[11:51:37] <sirenbot>	 5316 (RESOLVED)  DDoSDetected sre (netflow5002:9100 eqsin)
[11:51:38] <sirenbot>	 5317 (RESOLVED)  HaproxyUnavailable cache_upload global sre (thanos-rule)
[11:51:38] <sirenbot>	 5315 (RESOLVED)  HaproxyUnavailable cache_upload global sre (thanos-rule)
[11:51:38] <sirenbot>	 5314 (RESOLVED)  ProbeDown sre (2001:df2:e500:ed1a::2:b ip6 upload-https:443 probes/service http_upload-https_ip6 eqsin)
[11:51:38] <sirenbot>	 5313 (RESOLVED)  db2147 (paged)/MariaDB Replica Lag: s4 (paged)
[11:51:48] <volans>	 !ack 5318
[11:51:48] <sirenbot>	 5318 (ACKED)  db2149 (paged)/MariaDB Replica SQL: s3 (paged)
[11:52:15] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2194.codfw.wmnet onto db2227.codfw.wmnet
[11:52:24] <jinxer-wm>	 FIRING: SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:52:25] <volans>	 Amir1, arnaudb related to any ongoing work?
[11:52:36] <volans>	 the page for db2149
[11:52:37] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
[11:52:58] <claime>	 looks like it was being cloned?
[11:53:39] <volans>	 also no page on IRC
[11:53:44] <volans>	 that's weird
[11:53:52] <elukey>	 still haven't got a page from VO 
[11:54:11] <volans>	 :/
[11:54:14] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Also looks good to me. Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1075163 (https://phabricator.wikimedia.org/T350143) (owner: 10Hnowlan)
[11:54:26] <wikibugs>	 (03Merged) 10jenkins-bot: Merge tag 'v3.10.2' into wmf/stable-3.10 [software/gerrit] (wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1079981 (https://phabricator.wikimedia.org/T373897) (owner: 10Hashar)
[11:54:37] <Emperor>	 the database server alerts seem to come out from nagios to alerts@w.o directly.
[11:55:46] <volans>	 Emperor: so? they should page here anyway
[11:56:27] <logmsgbot>	 !log aborrero@cumin1002 START - Cookbook sre.dns.netbox
[11:56:51] <Dreamy_Jazz>	 !log Started time limited scan on enwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
[11:56:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:15] <elukey>	 so it seems to be an s3 replica
[11:57:24] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:57:33] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69796 and previous config saved to /var/cache/conftool/dbconfig/20241014-115732-arnaudb.json
[11:57:34] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[11:57:36] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[11:57:48] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[11:57:55] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69797 and previous config saved to /var/cache/conftool/dbconfig/20241014-115755-arnaudb.json
[11:57:59] <elukey>	 Slave: Index for table 'recentchanges' is corrupt; try to repair it Error_code: 1034
[11:58:08] <elukey>	 there seems to be an issue in replication afaics
[11:58:53] <elukey>	 I've never depooled a mariadb instance but I'd say this is a case for it
[11:58:56] <elukey>	 checking docs
[11:59:48] <Amir1>	 volans: it's not the clone, it's the same mariadb bug
[11:59:51] <logmsgbot>	 !log aborrero@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private adddress - aborrero@cumin1002"
[11:59:55] <logmsgbot>	 !log aborrero@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private adddress - aborrero@cumin1002"
[11:59:55] <logmsgbot>	 !log aborrero@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:00:01] <elukey>	 Amir1: o/ should I dbctl depool it?
[12:00:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69798 and previous config saved to /var/cache/conftool/dbconfig/20241014-120011-arnaudb.json
[12:00:14] <Amir1>	 elukey: no :( I have depooled two already
[12:00:29] <Amir1>	 we should fix it right away, let me do it
[12:00:29] <elukey>	 ack :(
[12:00:32] <elukey>	 <3
[12:00:36] <elukey>	 lemme know how/if I can help
[12:01:16] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Add aux-k8s-etcd1005 to the Aux k8s SRV records [dns] - 10https://gerrit.wikimedia.org/r/1079540 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[12:01:30] <Amir1>	 elukey: it should be fixed now
[12:01:44] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Add aux-k8s-etcd1005 in service [puppet] - 10https://gerrit.wikimedia.org/r/1079535 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[12:01:51] <wikibugs>	 (03PS2) 10Elukey: Add aux-k8s-etcd1005 in service [puppet] - 10https://gerrit.wikimedia.org/r/1079535 (https://phabricator.wikimedia.org/T344230)
[12:01:53] <wikibugs>	 (03CR) 10Elukey: [V:03+2 C:03+2] Add aux-k8s-etcd1005 in service [puppet] - 10https://gerrit.wikimedia.org/r/1079535 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[12:02:21] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: cloudlb2004-dev: replace cloudlb2001-dev [puppet] - 10https://gerrit.wikimedia.org/r/1079987 (https://phabricator.wikimedia.org/T377126)
[12:02:32] <volans>	 !incodents
[12:02:37] <volans>	 !incidents
[12:02:38] <sirenbot>	 5318 (RESOLVED)  db2149 (paged)/MariaDB Replica SQL: s3 (paged)
[12:02:38] <sirenbot>	 5319 (RESOLVED)  db2149 (paged)/MariaDB Replica Lag: s3 (paged)
[12:02:38] <sirenbot>	 5316 (RESOLVED)  DDoSDetected sre (netflow5002:9100 eqsin)
[12:02:38] <sirenbot>	 5317 (RESOLVED)  HaproxyUnavailable cache_upload global sre (thanos-rule)
[12:02:38] <sirenbot>	 5315 (RESOLVED)  HaproxyUnavailable cache_upload global sre (thanos-rule)
[12:02:39] <sirenbot>	 5314 (RESOLVED)  ProbeDown sre (2001:df2:e500:ed1a::2:b ip6 upload-https:443 probes/service http_upload-https_ip6 eqsin)
[12:02:39] <sirenbot>	 5313 (RESOLVED)  db2147 (paged)/MariaDB Replica Lag: s4 (paged)
[12:02:41] <elukey>	 yep 
[12:02:46] <elukey>	 Amir1: <3
[12:02:58] <volans>	 thanks Amir1!
[12:04:22] <wikibugs>	 (03CR) 10CI reject: [V:04-1] cloudlb2004-dev: replace cloudlb2001-dev [puppet] - 10https://gerrit.wikimedia.org/r/1079987 (https://phabricator.wikimedia.org/T377126) (owner: 10Arturo Borrero Gonzalez)
[12:06:55] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: cloudlb2004-dev: replace cloudlb2001-dev [puppet] - 10https://gerrit.wikimedia.org/r/1079987 (https://phabricator.wikimedia.org/T377126)
[12:08:54] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:09:48] <elukey>	 !log increase etcd k8s aux cluster from 3 -> 5 - T344230
[12:09:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:51] <stashbot>	 T344230: Get aux-k8s cluster row-redundant and with more workers - https://phabricator.wikimedia.org/T344230
[12:12:05] <wikibugs>	 (03CR) 10Ayounsi: [C:03+1] "lgtm!" [homer/public] - 10https://gerrit.wikimedia.org/r/1079982 (https://phabricator.wikimedia.org/T245495) (owner: 10Cathal Mooney)
[12:13:54] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: etcd.service on aux-k8s-etcd1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:15:18] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69799 and previous config saved to /var/cache/conftool/dbconfig/20241014-121518-arnaudb.json
[12:19:44] <wikibugs>	 (03CR) 10Hnowlan: [V:03+1 C:03+2] aqs: remove AQSv1 service components [puppet] - 10https://gerrit.wikimedia.org/r/1075163 (https://phabricator.wikimedia.org/T350143) (owner: 10Hnowlan)
[12:21:00] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079997
[12:22:01] <wikibugs>	 (03PS1) 10Elukey: Remove aux-k8x-{ctrl,worker}1001 from production [puppet] - 10https://gerrit.wikimedia.org/r/1079999 (https://phabricator.wikimedia.org/T344230)
[12:22:24] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1001.eqiad.wmnet
[12:22:30] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1001.eqiad.wmnet
[12:23:56] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.decommission for hosts aux-k8s-ctrl1001.eqiad.wmnet
[12:24:45] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] Remove WMCS codfw prefix from CR aggregate conf and adjust outfilter [homer/public] - 10https://gerrit.wikimedia.org/r/1079982 (https://phabricator.wikimedia.org/T245495) (owner: 10Cathal Mooney)
[12:26:01] <wikibugs>	 (03Merged) 10jenkins-bot: Remove WMCS codfw prefix from CR aggregate conf and adjust outfilter [homer/public] - 10https://gerrit.wikimedia.org/r/1079982 (https://phabricator.wikimedia.org/T245495) (owner: 10Cathal Mooney)
[12:28:49] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.netbox
[12:30:25] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69800 and previous config saved to /var/cache/conftool/dbconfig/20241014-123025-arnaudb.json
[12:30:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on aux-k8s-worker1001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=aux-k8s-worker1001 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[12:30:57] <hnowlan>	 !log removed all aqsv1 service components from aqs* hosts 
[12:30:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:19] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[12:32:23] <Nemoralis>	 jouncebot: next
[12:32:23] <jouncebot>	 In 0 hour(s) and 27 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1300)
[12:32:38] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[12:32:38] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:32:38] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-ctrl1001.eqiad.wmnet
[12:32:52] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.decommission for hosts aux-k8s-worker1001.eqiad.wmnet
[12:35:18] <wikibugs>	 (03PS1) 10Hashar: Gerrit 3.10.2 and rebuild plugins [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1080009 (https://phabricator.wikimedia.org/T373897)
[12:37:44] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.netbox
[12:38:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69801 and previous config saved to /var/cache/conftool/dbconfig/20241014-123853-ladsgroup.json
[12:40:59] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[12:43:14] <wikibugs>	 (03CR) 10Elukey: "Added two suggestions to use urllib parse directly, I'll stand by until you are ready for review!" [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[12:43:28] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[12:43:29] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:43:29] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-worker1001.eqiad.wmnet
[12:43:58] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69802 and previous config saved to /var/cache/conftool/dbconfig/20241014-124357-ladsgroup.json
[12:44:25] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
[12:44:38] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 12s)
[12:45:26] <wikibugs>	 (03PS2) 10Hashar: Gerrit 3.10.2 and rebuild plugins [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1080009 (https://phabricator.wikimedia.org/T373897)
[12:45:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69803 and previous config saved to /var/cache/conftool/dbconfig/20241014-124532-arnaudb.json
[12:45:34] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
[12:45:36] <stashbot>	 T367781: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781
[12:45:47] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
[12:45:52] <wikibugs>	 (03CR) 10Hashar: [C:03+2] Gerrit 3.10.2 and rebuild plugins [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1080009 (https://phabricator.wikimedia.org/T373897) (owner: 10Hashar)
[12:45:54] <wikibugs>	 (03PS1) 10Elukey: kubernetes: change the AUX etcd urls nodes [puppet] - 10https://gerrit.wikimedia.org/r/1080011 (https://phabricator.wikimedia.org/T344230)
[12:45:54] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69804 and previous config saved to /var/cache/conftool/dbconfig/20241014-124554-arnaudb.json
[12:46:25] <wikibugs>	 (03Merged) 10jenkins-bot: Gerrit 3.10.2 and rebuild plugins [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1080009 (https://phabricator.wikimedia.org/T373897) (owner: 10Hashar)
[12:47:53] <wikibugs>	 10SRE-tools, 06Data-Persistence-SRE, 06DBA, 10Spicerack: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129 (10ABran-WMF) 03NEW
[12:47:56] <wikibugs>	 (03CR) 10Ayounsi: [C:03+1] Remove aux-k8x-{ctrl,worker}1001 from production [puppet] - 10https://gerrit.wikimedia.org/r/1079999 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[12:48:00] <wikibugs>	 (03PS6) 10JMeybohm: wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408)
[12:48:08] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[12:48:51] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Remove aux-k8x-{ctrl,worker}1001 from production [puppet] - 10https://gerrit.wikimedia.org/r/1079999 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[12:48:58] <wikibugs>	 10SRE-tools, 06Data-Persistence-SRE, 06DBA, 06Infrastructure-Foundations, 10Spicerack: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129#10225683 (10ABran-WMF) p:05Triage→03Medium
[12:49:00] <wikibugs>	 10SRE-tools, 06Data-Persistence-SRE, 06DBA, 06Infrastructure-Foundations, 10Spicerack: mariadb: systemctl status accessor in mysql_legacy - https://phabricator.wikimedia.org/T377129#10225686 (10ABran-WMF)
[12:49:15] <wikibugs>	 (03CR) 10Elukey: [C:03+2] kubernetes: change the AUX etcd urls nodes [puppet] - 10https://gerrit.wikimedia.org/r/1080011 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[12:49:30] <wikibugs>	 (03PS1) 10Ayounsi: re-image: ask user about migrating to per-rack vlan/IP [cookbooks] - 10https://gerrit.wikimedia.org/r/1080012
[12:50:14] <wikibugs>	 (03PS2) 10Ayounsi: re-image: ask user about migrating to per-rack vlan/IP [cookbooks] - 10https://gerrit.wikimedia.org/r/1080012
[12:53:59] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69805 and previous config saved to /var/cache/conftool/dbconfig/20241014-125358-ladsgroup.json
[12:56:28] <wikibugs>	 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10225742 (10aborrero) >>! In T375847#10217807, @cmooney wrote: >>>! In T375847#10195673, @aborrero wrote: >> `lang=shell-session >> roo...
[12:57:21] <wikibugs>	 (03PS1) 10Elukey: Remove aux-k8s-etcd100[1,2] from the AUX client SRV records [dns] - 10https://gerrit.wikimedia.org/r/1080016 (https://phabricator.wikimedia.org/T344230)
[12:57:22] <wikibugs>	 (03PS1) 10Elukey: Remove aux-k8s-etcd1001 from the AUX cluster's SRV records [dns] - 10https://gerrit.wikimedia.org/r/1080017 (https://phabricator.wikimedia.org/T344230)
[12:57:24] <wikibugs>	 (03PS1) 10Elukey: Remove aux-k8s-etcd1002 from the AUX cluster's SRV records [dns] - 10https://gerrit.wikimedia.org/r/1080018 (https://phabricator.wikimedia.org/T344230)
[12:57:32] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079923 (owner: 10Michael Große)
[12:57:54] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079894 (owner: 10Michael Große)
[12:58:11] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079915 (https://phabricator.wikimedia.org/T364341) (owner: 10Michael Große)
[12:58:24] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, October 14 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079925 (https://phabricator.wikimedia.org/T373176) (owner: 10Michael Große)
[12:59:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69806 and previous config saved to /var/cache/conftool/dbconfig/20241014-125904-ladsgroup.json
[12:59:23] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "I’m confused, I don’t see how this is different from the change Ie3906e3b67 that had to be reverted… you say you forgot to update `wgMetaN" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079619 (https://phabricator.wikimedia.org/T362620) (owner: 10NMW03)
[13:00:04] <jouncebot>	 Lucas_WMDE, Urbanecm, awight, and TheresNoTime: That opportune time for a UTC afternoon backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1300).
[13:00:05] <jouncebot>	 Daimona, Nemoralis, and MichaelG_WMF: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:13] <Lucas_WMDE>	 o/
[13:00:16] <MichaelG_WMF>	 o/
[13:00:25] <Nemoralis>	 jouncebot: now
[13:00:25] <jouncebot>	 For the next 0 hour(s) and 59 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1300)
[13:00:25] <Daimona>	 o/
[13:00:27] <Nemoralis>	 o/
[13:00:35] <Lucas_WMDE>	 ooh, a lot of patches suddenly appeared
[13:00:46] <MichaelG_WMF>	 My changes can be deployed all together and are not testable
[13:00:56] <Lucas_WMDE>	 lots of PHP notice in logspam-watch, meh
[13:01:25] <MichaelG_WMF>	 maybe the change to the maint script is in principle, because it changes the output, but no idea how that works with deployment hosts
[13:01:38] <Lucas_WMDE>	 not sure either
[13:01:41] <MichaelG_WMF>	 @Lucas_WMDE Where can I see the PHP notices?
[13:01:42] <Lucas_WMDE>	 I guess I could scap pull on mwmaint
[13:02:00] <Lucas_WMDE>	 MichaelG_WMF: I see them in logspam-watch on mwlog2002
[13:02:08] <Lucas_WMDE>	 presumably they’re also in logstash somewhere
[13:02:14] <Lucas_WMDE>	 but maybe not in mediawiki-errors, not sure
[13:02:16] <Lucas_WMDE>	 they’re from BackupDumper
[13:02:29] <Lucas_WMDE>	 anyway, let’s start with Daimona 
[13:02:35] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079518 (https://phabricator.wikimedia.org/T376695) (owner: 10Daimona Eaytoy)
[13:02:50] <Lucas_WMDE>	 out of curiosity, why are so many wikis getting CampaignEvents enabled recently? is it a new extension? ^^
[13:03:12] <HouseOfM>	 We've just gotten a lot of traction
[13:03:18] <wikibugs>	 (03Merged) 10jenkins-bot: [uawikimedia] Enable the CampaignEvents extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079518 (https://phabricator.wikimedia.org/T376695) (owner: 10Daimona Eaytoy)
[13:03:20] <Nemoralis>	 Lucas_WMDE: I just saw your comment
[13:03:35] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1079518|[uawikimedia] Enable the CampaignEvents extension (T376695)]]
[13:03:38] <stashbot>	 T376695: Enable CampaignEvents Extension on Wikimedia Ukraine's wiki [Oct 14] - https://phabricator.wikimedia.org/T376695
[13:03:43] <Lucas_WMDE>	 Nemoralis: yeah, I was just looking at it before the window started
[13:03:45] <Nemoralis>	 I somehow thought I had forgotten it
[13:03:46] <Daimona>	 ^that. It's somewhat new but not really new
[13:03:52] <Lucas_WMDE>	 ok ^^
[13:04:05] <Nemoralis>	 then what could be the reason for the previous change not working
[13:04:13] <Lucas_WMDE>	 no idea :/
[13:04:18] <wikibugs>	 (03CR) 10NMW03: "o_O" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079619 (https://phabricator.wikimedia.org/T362620) (owner: 10NMW03)
[13:05:11] <Lucas_WMDE>	 MichaelG_WMF: I see the fwrite notices in mediawiki-errors in logstash too fwiw
[13:05:27] * Lucas_WMDE makes a task
[13:05:51] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, daimona: Backport for [[gerrit:1079518|[uawikimedia] Enable the CampaignEvents extension (T376695)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:05:54] <MichaelG_WMF>	 Lucas_WMDE: thanks, now I've also connected mwlog2002 and am looking at logspam watch
[13:05:54] <Nemoralis>	 Lucas_WMDE: maybe because the deployer didn't run the maintenance script
[13:05:57] <Nemoralis>	 I remember that
[13:06:41] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Remove aux-k8s-etcd100[1,2] from the AUX client SRV records [dns] - 10https://gerrit.wikimedia.org/r/1080016 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[13:06:41] <Nemoralis>	 it was Urbanecm iirc, he didn't run the maintenance script
[13:06:55] * urbanecm was summoned
[13:06:59] <Nemoralis>	 hi
[13:07:05] <Lucas_WMDE>	 Daimona: please test :)
[13:08:13] <urbanecm>	 Nemoralis: can you clarify which recent action of mine you are referring to?
[13:08:28] <HouseOfM>	 LGTM @Daimona
[13:08:38] <Nemoralis>	 urbanecm: we are talking about this patch
[13:08:38] <Nemoralis>	 https://gerrit.wikimedia.org/r/q/Ie3906e3b67
[13:08:47] <Daimona>	 Yep LGTM too
[13:09:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69807 and previous config saved to /var/cache/conftool/dbconfig/20241014-130904-ladsgroup.json
[13:09:15] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, daimona: Continuing with sync
[13:09:18] <Lucas_WMDE>	 ok, thanks!
[13:09:21] <Daimona>	 Thank you!
[13:09:26] <Lucas_WMDE>	 filed T377136 for the logspam-watch fwrite errors btw
[13:09:27] <stashbot>	 T377136: PHP Notice: fwrite(): write of X bytes failed with errno=32 Broken pipe - https://phabricator.wikimedia.org/T377136
[13:12:06] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Remove aux-k8s-etcd1001 from the AUX cluster's SRV records [dns] - 10https://gerrit.wikimedia.org/r/1080017 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[13:12:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:12:57] <Lucas_WMDE>	 “SUCCESS in 53m 08s” ._.
[13:13:07] <Lucas_WMDE>	 does GrowthExperiments CI usually take this long or did that build get unlucky?
[13:13:13] <Lucas_WMDE>	 anyway, let’s +2 those backports already
[13:13:25] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "start gate-and-submit ahead of deployment" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079923 (owner: 10Michael Große)
[13:13:29] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "start gate-and-submit ahead of deployment" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079894 (owner: 10Michael Große)
[13:13:35] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "start gate-and-submit ahead of deployment" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079915 (https://phabricator.wikimedia.org/T364341) (owner: 10Michael Große)
[13:13:42] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "start gate-and-submit ahead of deployment" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079925 (https://phabricator.wikimedia.org/T373176) (owner: 10Michael Große)
[13:13:54] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1079518|[uawikimedia] Enable the CampaignEvents extension (T376695)]] (duration: 10m 19s)
[13:13:59] <stashbot>	 T376695: Enable CampaignEvents Extension on Wikimedia Ukraine's wiki [Oct 14] - https://phabricator.wikimedia.org/T376695
[13:14:11] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69808 and previous config saved to /var/cache/conftool/dbconfig/20241014-131411-ladsgroup.json
[13:14:19] <MichaelG_WMF>	 Lucas_WMDE: I think that build was particularly unlucky, but it does take a long time
[13:14:23] <Lucas_WMDE>	 ok
[13:14:31] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Remove aux-k8s-etcd1002 from the AUX cluster's SRV records [dns] - 10https://gerrit.wikimedia.org/r/1080018 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[13:14:43] <Nemoralis>	 Lucas_WMDE: just checked the irc archive, yes we didn't run the maintenance script
[13:14:53] <Nemoralis>	 is it required? I don't remember
[13:14:57] <MichaelG_WMF>	 running the phpunit tests in parallel really makes a big difference for GrowthExperiments, hopefully this will be widely available soon
[13:15:27] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
[13:15:31] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
[13:15:52] <Nemoralis>	 also, I am not sure if mediawiki recognizes the different quote on namespace
[13:16:13] <Lucas_WMDE>	 Nemoralis: I just looked at the archive too (https://wm-bot.wmcloud.org/browser/index.php?start=04%2F24%2F2024&end=04%2F24%2F2024&display=%23wikimedia-operations)
[13:16:15] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
[13:16:23] <Nemoralis>	 it is Vikilug‘at => Vikilugʻat
[13:16:23] <Lucas_WMDE>	 it’s not really clear what didn’t work in the first place IMHO
[13:16:29] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
[13:16:40] <Lucas_WMDE>	 you just said something didn’t work and then it was reverted
[13:17:47] <Lucas_WMDE>	 let me try to find another config change that did the same for another wiki
[13:17:48] <Lucas_WMDE>	 for comparison
[13:18:09] <Nemoralis>	 I said "didn't work" in the sense that I saw the old namespace instead of the new one
[13:18:09] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1001.eqiad.wmnet
[13:18:32] <Nemoralis>	 you can check https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1076254
[13:18:34] <wikibugs>	 (03CR) 10Jcrespo: "Funny how the tables have turned :-) - but please use proper sql argument interpolation on query, not fstrings (provide a tuple). Will do " [cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) (owner: 10Volans)
[13:18:36] <Nemoralis>	 it is the same thing
[13:18:44] <wikibugs>	 (03CR) 10Arnaudb: [C:03+2] mariadb: add data directory accessor [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701) (owner: 10Arnaudb)
[13:19:24] <jinxer-wm>	 FIRING: SystemdUnitFailed: mwscript-cleanup.service on deploy1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:19:28] <wikibugs>	 (03PS2) 10Arnaudb: mariadb: get systemd status for instance [software/spicerack] - 10https://gerrit.wikimedia.org/r/1080019 (https://phabricator.wikimedia.org/T377129)
[13:20:19] <wikibugs>	 (03PS1) 10Elukey: Remove aux-k8s-etcd100[1,2] from production [puppet] - 10https://gerrit.wikimedia.org/r/1080022 (https://phabricator.wikimedia.org/T344230)
[13:21:18] <wikibugs>	 (03CR) 10Volans: "@jcrespo@wikimedia.org as we're running the query via ssh on the CLI we don't have a client able to perform proper interpolation. What wou" [cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) (owner: 10Volans)
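The review exchange above contrasts f-string query building with passing arguments as a tuple. A minimal sketch of the tuple style follows, assuming a client library is available. It uses Python's built-in `sqlite3` as a stand-in; the `hosts` table and the `find_host` helper are hypothetical and do not come from the cookbook under review, which, per Volans' reply, runs its query over ssh on the CLI where no such client exists.

```python
import sqlite3


def find_host(conn: sqlite3.Connection, hostname: str) -> list:
    """Look up a host by name, passing the value as a bound parameter.

    The query text contains only a placeholder; the driver binds the
    tuple of arguments, so the value is never spliced into the SQL.
    """
    cur = conn.execute("SELECT name FROM hosts WHERE name = ?", (hostname,))
    return cur.fetchall()


# Hypothetical in-memory table, used for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT)")
conn.execute("INSERT INTO hosts VALUES (?)", ("db1218",))

print(find_host(conn, "db1218"))
# A value carrying quote characters is matched literally, not parsed as SQL.
print(find_host(conn, "x' OR '1'='1"))
```

Placeholder syntax differs between drivers (`?` here, `%s` in several MySQL/MariaDB clients), so the exact form depends on the client in use.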
[13:21:55] <wikibugs>	 (03CR) 10Jcrespo: [C:04-1] "Thank you elukey, I thought the lib would only provide url parsing, not relative url resolution, but you showed me that it does. Amending." [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[13:22:40] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.netbox
[13:23:28] <wikibugs>	 (03PS5) 10Hashar: contint: define component/ci only once [puppet] - 10https://gerrit.wikimedia.org/r/1074468 (https://phabricator.wikimedia.org/T375278)
[13:23:28] <wikibugs>	 (03CR) 10Hashar: "I was waiting for the parent change to merge in order to review the catalogue compilation. The only oddity I found was `Package[jenkins]` " [puppet] - 10https://gerrit.wikimedia.org/r/1074468 (https://phabricator.wikimedia.org/T375278) (owner: 10Hashar)
[13:24:09] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69809 and previous config saved to /var/cache/conftool/dbconfig/20241014-132409-ladsgroup.json
[13:25:00] <Lucas_WMDE>	 Nemoralis: if I understand correctly, existing pages should have shown the new namespace name without the maintenance script
[13:25:09] <Lucas_WMDE>	 (though you might have to access them via ?curid=)
[13:26:04] <Nemoralis>	 yes I said "didn't work" because I saw the old namespace instead of the new one
[13:26:09] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[13:26:16] <Lucas_WMDE>	 yeah… then I don’t know what we would need to fix
[13:26:22] <Lucas_WMDE>	 since it sounds like it wasn’t just the maintenance script missing
[13:26:28] <wikibugs>	 (03PS1) 10Ammarpad: contactpages: Move stewards contactpage to MetaContactPages.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080023
[13:26:28] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[13:26:28] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:26:29] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1001.eqiad.wmnet
[13:26:51] <Nemoralis>	 the code is the same as https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1076254
[13:26:54] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1002.eqiad.wmnet
[13:26:54] <Nemoralis>	 I am not sure either
[13:26:59] <wikibugs>	 (03PS2) 10Ammarpad: contactpages: Move stewards contactpage to MetaContactPages.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080023
[13:27:21] <Lucas_WMDE>	 “(though you might have to access them via ?curid=)” – disregard that part, that’s what the namespace alias is for, nevermind
[13:27:31] <Lucas_WMDE>	 I think we have to skip this config change then :/
[13:27:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] contactpages: Move stewards contactpage to MetaContactPages.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080023 (owner: 10Ammarpad)
[13:27:49] <Nemoralis>	 ok, I will recheck it then
[13:28:22] <Lucas_WMDE>	 good luck…
[13:28:34] <wikibugs>	 (03CR) 10FNegri: alertmanager: fix WMCS template (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1077038 (https://phabricator.wikimedia.org/T375479) (owner: 10FNegri)
[13:28:45] <wikibugs>	 (03PS1) 10Elukey: admin_ng: remove ad-hoc anti-affinity rules for Calico typha in AUX [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080024 (https://phabricator.wikimedia.org/T333302)
[13:28:51] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079923 (owner: 10Michael Große)
[13:28:51] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079894 (owner: 10Michael Große)
[13:28:52] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079915 (https://phabricator.wikimedia.org/T364341) (owner: 10Michael Große)
[13:28:54] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079925 (https://phabricator.wikimedia.org/T373176) (owner: 10Michael Große)
[13:29:15] <wikibugs>	 (03PS7) 10FNegri: alertmanager: fix WMCS template [puppet] - 10https://gerrit.wikimedia.org/r/1077038 (https://phabricator.wikimedia.org/T375479)
[13:29:18] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69810 and previous config saved to /var/cache/conftool/dbconfig/20241014-132918-ladsgroup.json
[13:29:21] <wikibugs>	 (03PS3) 10Ammarpad: contactpages: Move stewards contactpage to MetaContactPages.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080023
[13:29:23] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[13:29:37] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[13:29:44] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69811 and previous config saved to /var/cache/conftool/dbconfig/20241014-132944-ladsgroup.json
[13:30:35] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: add data directory accessor [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078616 (https://phabricator.wikimedia.org/T376701) (owner: 10Arnaudb)
[13:31:26] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.netbox
[13:33:03] <wikibugs>	 (03PS4) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[13:33:36] <wikibugs>	 (03PS6) 10Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712)
[13:33:52] <wikibugs>	 (03CR) 10CI reject: [V:04-1] check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[13:34:09] <wikibugs>	 (03CR) 10Jcrespo: "used urllib.parse.urljoin()" [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[13:34:17] <wikibugs>	 (03CR) 10Arnaudb: mysql_legacy: double quote escape in run_query (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb)
[13:34:49] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[13:34:50] <wikibugs>	 (03PS5) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[13:35:04] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
[13:35:04] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:35:05] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1002.eqiad.wmnet
[13:35:35] <Amir1>	 jouncebot: nowandnext
[13:35:35] <jouncebot>	 For the next 0 hour(s) and 24 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1300)
[13:35:35] <jouncebot>	 In 1 hour(s) and 54 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1530)
[13:35:54] <wikibugs>	 10SRE-tools, 06Data-Persistence-SRE, 06Infrastructure-Foundations, 10Spicerack, 13Patch-For-Review: mysql_legacy data_directory getter - https://phabricator.wikimedia.org/T376701#10225935 (10ABran-WMF) 05In progress→03Resolved
[13:36:18] <wikibugs>	 (03PS6) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[13:36:18] <Lucas_WMDE>	 Amir1: I’m waiting for CI to finish on some backports
[13:37:00] <wikibugs>	 (03PS4) 10Ammarpad: contactpages: Move stewards contactpage to MetaContactPages.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080023
[13:37:01] <Amir1>	 thanks. Please let me know once you're done, if you can squeeze https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1079984 that'd be amazing
[13:37:09] <Amir1>	 (it should be fast)
[13:37:18] <Lucas_WMDE>	 zuul says ETA 10 minutes
[13:37:27] <Lucas_WMDE>	 I think you could try to squeeze that in now
[13:37:36] <Lucas_WMDE>	 if I ctrl+c my scap backport
[13:37:56] <Amir1>	 thanks
[13:38:04] <Lucas_WMDE>	 alright, I’ve ctrl+c’ed mine
[13:38:06] <Lucas_WMDE>	 go ahead
[13:38:10] <wikibugs>	 (03PS1) 10Brouberol: cloudnative_pg: monitor daily rclone sync of PG S3 buckets [alerts] - 10https://gerrit.wikimedia.org/r/1080027 (https://phabricator.wikimedia.org/T377112)
[13:38:14] <Lucas_WMDE>	 and the CI can just continue
[13:38:15] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Update interwiki.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079984 (owner: 10Ladsgroup)
[13:38:22] <MichaelG_WMF>	 ETA 10 minutes is not realistic, more like 20 minutes
[13:38:25] <Lucas_WMDE>	 ok
[13:38:25] <MichaelG_WMF>	 sadly
[13:38:30] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079984 (owner: 10Ladsgroup)
[13:39:15] <wikibugs>	 (03Merged) 10jenkins-bot: Update interwiki.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079984 (owner: 10Ladsgroup)
[13:39:16] <MichaelG_WMF>	 (Where are those zuul estimates coming from anyway? They seem so often way too low for some repositories)
[13:39:21] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: hiddenparma: fix whitespace in hiddenparma-default.erb [puppet] - 10https://gerrit.wikimedia.org/r/1080028
[13:39:29] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1079984|Update interwiki.php]]
[13:40:38] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4301/co" [puppet] - 10https://gerrit.wikimedia.org/r/1080028 (owner: 10Giuseppe Lavagetto)
[13:40:57] <Lucas_WMDE>	 no idea
[13:41:27] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V:03+1 C:03+2] hiddenparma: fix whitespace in hiddenparma-default.erb [puppet] - 10https://gerrit.wikimedia.org/r/1080028 (owner: 10Giuseppe Lavagetto)
[13:41:43] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:1079984|Update interwiki.php]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:41:49] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[13:44:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb)
[13:44:21] <logmsgbot>	 !log kcvelaga@deploy2002 Started deploy [airflow-dags/analytics_product@fbcf880]: T375480
[13:44:25] <stashbot>	 T375480: ETL pipeline for Automoderator monthly key metrics - https://phabricator.wikimedia.org/T375480
[13:44:47] <wikibugs>	 (03PS7) 10Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712)
[13:45:24] <logmsgbot>	 !log kcvelaga@deploy2002 Finished deploy [airflow-dags/analytics_product@fbcf880]: T375480 (duration: 01m 07s)
[13:46:29] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1079984|Update interwiki.php]] (duration: 07m 00s)
[13:47:33] <Lucas_WMDE>	 Amir1: all done?
[13:47:50] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: wikimedia.org: add requestctl [dns] - 10https://gerrit.wikimedia.org/r/1080029 (https://phabricator.wikimedia.org/T371782)
[13:48:28] <Amir1>	 yup
[13:48:35] <Lucas_WMDE>	 ok, resuming my scap then
[13:48:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079923 (owner: 10Michael Große)
[13:48:39] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079894 (owner: 10Michael Große)
[13:48:39] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079915 (https://phabricator.wikimedia.org/T364341) (owner: 10Michael Große)
[13:48:41] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079925 (https://phabricator.wikimedia.org/T373176) (owner: 10Michael Große)
[13:49:12] <wikibugs>	 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10225985 (10cmooney) We definitely want to use DHCPv6 (stateful) for address assignment.  So OpenStack is in control of what IPs are us...
[13:49:28] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C:03+2] wikimedia.org: add requestctl [dns] - 10https://gerrit.wikimedia.org/r/1080029 (https://phabricator.wikimedia.org/T371782) (owner: 10Giuseppe Lavagetto)
[13:50:47] <wikibugs>	 14SRE-Sprint-Week-Sustainability-March2023, 06serviceops, 10Sustainability (Incident Followup): Expand upon Kask/Sessionstore documentation - https://phabricator.wikimedia.org/T320398#10225997 (10hnowlan) a:05hnowlan→03None
[13:56:26] <wikibugs>	 (03PS7) 10Brouberol: Define the ceph-csi-cephfs admin_ng helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/1077878 (https://phabricator.wikimedia.org/T376406)
[13:56:26] <wikibugs>	 (03PS1) 10Brouberol: ceph-csi-cephfs: replace the ClusterRole by a list of ns-scoped Roles [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080032 (https://phabricator.wikimedia.org/T376406)
[13:58:13] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.hosts.decommission for hosts an-worker1176.eqiad.wmnet
[13:58:36] <wikibugs>	 (03Merged) 10jenkins-bot: refactor(tests): don't use per-method coverage annotation [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079923 (owner: 10Michael Große)
[13:59:16] <wikibugs>	 (03Merged) 10jenkins-bot: refactor(HomepageHooks): extract method for simpler modifyability [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079894 (owner: 10Michael Große)
[13:59:29] <Lucas_WMDE>	 almost there…
[13:59:33] <Lucas_WMDE>	 jouncebot: next
[13:59:33] <jouncebot>	 In 1 hour(s) and 30 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1530)
[13:59:39] <Lucas_WMDE>	 ok we have some time left
[14:00:16] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "This looks good to me. Thanks. This addresses the issue of the CSI plugin having privileges that are too high across the cluster. Adding e" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080032 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol)
[14:00:48] * MichaelG_WMF is twiddling their thumbs looking at the last change that is still going
[14:01:30] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Nice. Thanks." [alerts] - 10https://gerrit.wikimedia.org/r/1080027 (https://phabricator.wikimedia.org/T377112) (owner: 10Brouberol)
[14:01:38] <wikibugs>	 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: openstack: initial IPv6 support in neutron - https://phabricator.wikimedia.org/T375847#10226051 (10cmooney) So.... maybe this is normal for DHCPv6?  Re-reading the reddit post and looking at the setup on the VM it seems li...
[14:02:11] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: idp: add entry for requesctl.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/1080034 (https://phabricator.wikimedia.org/T371782)
[14:02:20] <wikibugs>	 (03Merged) 10jenkins-bot: Clear LinkRecommendation suggestions on page save [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079915 (https://phabricator.wikimedia.org/T364341) (owner: 10Michael Große)
[14:02:22] <wikibugs>	 (03Merged) 10jenkins-bot: Run fixLinkRecommendationData even when disabled in CC [extensions/GrowthExperiments] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1079925 (https://phabricator.wikimedia.org/T373176) (owner: 10Michael Große)
[14:02:29] <Lucas_WMDE>	 wheeee
[14:02:32] <MichaelG_WMF>	 \o/
[14:02:42] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1079923|refactor(tests): don't use per-method coverage annotation]], [[gerrit:1079894|refactor(HomepageHooks): extract method for simpler modifyability]], [[gerrit:1079915|Clear LinkRecommendation suggestions on page save (T364341 T372337)]], [[gerrit:1079925|Run fixLinkRecommendationData even when disabled in CC (T373176)]]
[14:02:48] <stashbot>	 T364341: [wmf.3]  Special:Homepage - high rate for dangling db records   - https://phabricator.wikimedia.org/T364341
[14:02:49] <stashbot>	 T372337: High number of dangling search index results at fr.wikipedia or it.wikipedia - https://phabricator.wikimedia.org/T372337
[14:02:49] <stashbot>	 T373176: fixLinkRecommendationData.php does not run when link-recommendation task type is disabled - https://phabricator.wikimedia.org/T373176
[14:03:14] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.dns.netbox
[14:03:33] <wikibugs>	 (03CR) 10Btullis: "Thanks elukey. This has now been addressed in: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1080032" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1077872 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol)
[14:04:41] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "I'm happy with this. We might also want to replicate this functionality in the RBD version and/or send this upstream." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1078387 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol)
[14:04:49] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 migr, lucaswerkmeister-wmde: Backport for [[gerrit:1079923|refactor(tests): don't use per-method coverage annotation]], [[gerrit:1079894|refactor(HomepageHooks): extract method for simpler modifyability]], [[gerrit:1079915|Clear LinkRecommendation suggestions on page save (T364341 T372337)]], [[gerrit:1079925|Run fixLinkRecommendationData even when disabled in CC (T373176)]] synced to
[14:04:49] <logmsgbot>	 the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:04:57] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 migr, lucaswerkmeister-wmde: Continuing with sync
[14:06:10] <MichaelG_WMF>	 nothing should change from these patches, so if the logs are fine, we can move forward I think.
[14:06:31] <MichaelG_WMF>	 (this will take a config change to toggle the feature flag for there to be a change over time)
[14:06:41] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
[14:07:06] <logmsgbot>	 !log stevemunene@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
[14:07:06] <logmsgbot>	 !log stevemunene@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:07:07] <logmsgbot>	 !log stevemunene@cumin1002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1176.eqiad.wmnet
[14:07:51] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.hosts.decommission for hosts an-worker1177.eqiad.wmnet
[14:09:31] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1079923|refactor(tests): don't use per-method coverage annotation]], [[gerrit:1079894|refactor(HomepageHooks): extract method for simpler modifyability]], [[gerrit:1079915|Clear LinkRecommendation suggestions on page save (T364341 T372337)]], [[gerrit:1079925|Run fixLinkRecommendationData even when disabled in CC (T373176)]] (duration: 0
[14:09:31] <logmsgbot>	 6m 48s)
[14:09:37] <stashbot>	 T364341: [wmf.3]  Special:Homepage - high rate for dangling db records   - https://phabricator.wikimedia.org/T364341
[14:09:37] <stashbot>	 T372337: High number of dangling search index results at fr.wikipedia or it.wikipedia - https://phabricator.wikimedia.org/T372337
[14:09:37] <stashbot>	 T373176: fixLinkRecommendationData.php does not run when link-recommendation task type is disabled - https://phabricator.wikimedia.org/T373176
[14:10:02] <Lucas_WMDE>	 !log [untruncated duration: 06m 48s]
[14:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:06] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[14:12:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:26] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] cloudnative_pg: monitor daily rclone sync of PG S3 buckets [alerts] - 10https://gerrit.wikimedia.org/r/1080027 (https://phabricator.wikimedia.org/T377112) (owner: 10Brouberol)
[14:12:27] <MichaelG_WMF>	 Lucas_WMDE: Thank you! 🙏
[14:12:35] <Lucas_WMDE>	 np :)
[14:12:51] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.dns.netbox
[14:16:12] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
[14:16:33] <logmsgbot>	 !log stevemunene@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
[14:16:33] <logmsgbot>	 !log stevemunene@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:16:34] <logmsgbot>	 !log stevemunene@cumin1002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1177.eqiad.wmnet
[14:18:35] <wikibugs>	 (03CR) 10Gmodena: [C:03+1] Renamed log fields for pipeline migration (haproxykafka) [puppet] - 10https://gerrit.wikimedia.org/r/1074414 (https://phabricator.wikimedia.org/T370668) (owner: 10Fabfur)
[14:18:40] <wikibugs>	 (03PS1) 10JMeybohm: dragonfly::dfdaemon: Enable by default when profile is included [puppet] - 10https://gerrit.wikimedia.org/r/1080038 (https://phabricator.wikimedia.org/T362408)
[14:21:46] <wikibugs>	 (03PS1) 10Michael Große: eswiki: switch clearing link recommendations to PageSaveComplete hook [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080035 (https://phabricator.wikimedia.org/T372337)
[14:21:46] <wikibugs>	 (03CR) 10Michael Große: "I'm a bit unsure about whether we should do this in one change (like here) or in two changes where the first one only adds the default, an" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080035 (https://phabricator.wikimedia.org/T372337) (owner: 10Michael Große)
[14:22:08] <wikibugs>	 (03PS1) 10Brouberol: flink-operator: deply an image with fixes for recent OpenJDK vulnerability fixes [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080039 (https://phabricator.wikimedia.org/T371874)
[14:23:44] <wikibugs>	 (03CR) 10JMeybohm: [V:03+1] "PCC SUCCESS (NOOP 7): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4302/console" [puppet] - 10https://gerrit.wikimedia.org/r/1080038 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[14:29:13] <wikibugs>	 (03CR) 10David Caro: [C:03+1] "LGTM, let's try" [puppet] - 10https://gerrit.wikimedia.org/r/1077038 (https://phabricator.wikimedia.org/T375479) (owner: 10FNegri)
[14:30:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69812 and previous config saved to /var/cache/conftool/dbconfig/20241014-143000-ladsgroup.json
[14:31:34] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] Remove the dumps_store_load_average icinga check [puppet] - 10https://gerrit.wikimedia.org/r/1079971 (https://phabricator.wikimedia.org/T374821) (owner: 10Btullis)
[14:33:51] <wikibugs>	 (03PS2) 10Brouberol: flink-operator: deply an image with fixes for recent OpenJDK vulns [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080039 (https://phabricator.wikimedia.org/T371874)
[14:34:04] <wikibugs>	 (03CR) 10Elukey: "Anything missing Janis? We are now fully row redundant :)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080024 (https://phabricator.wikimedia.org/T333302) (owner: 10Elukey)
[14:34:44] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10226204 (10phaultfinder)
[14:35:57] <wikibugs>	 (03PS1) 10JMeybohm: dragonfly::dfdaemon: Refactor docker integration [puppet] - 10https://gerrit.wikimedia.org/r/1080042 (https://phabricator.wikimedia.org/T362408)
[14:37:13] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:37:26] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] "I don't think there was anything else. 🚢" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080024 (https://phabricator.wikimedia.org/T333302) (owner: 10Elukey)
[14:38:13] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080042 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm)
[14:39:02] <wikibugs>	 (03PS1) 10Tiziano Fogli: logstash: stripping containerd prefix [puppet] - 10https://gerrit.wikimedia.org/r/1080047 (https://phabricator.wikimedia.org/T377132)
[14:39:11] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[14:40:00] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 06serviceops: Clean up the Docker Registry catalog and Swift storage from old images - https://phabricator.wikimedia.org/T375645#10226221 (10elukey) p:05Triage→03Medium
[14:40:16] <wikibugs>	 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations, 10Spicerack: Upload redfish licenses to supermicro hosts - https://phabricator.wikimedia.org/T376121#10226222 (10elukey) p:05Triage→03Medium
[14:40:57] <wikibugs>	 (03CR) 10CI reject: [V:04-1] logstash: stripping containerd prefix [puppet] - 10https://gerrit.wikimedia.org/r/1080047 (https://phabricator.wikimedia.org/T377132) (owner: 10Tiziano Fogli)
[14:41:08] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[14:41:39] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[14:42:38] <wikibugs>	 (03PS2) 10Tiziano Fogli: logstash: stripping containerd prefix [puppet] - 10https://gerrit.wikimedia.org/r/1080047 (https://phabricator.wikimedia.org/T377132)
[14:43:02] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[14:43:51] <logmsgbot>	 !log aikochou@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
[14:45:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69813 and previous config saved to /var/cache/conftool/dbconfig/20241014-144507-ladsgroup.json
[14:45:30] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations: debmonitor could provide users with cumin and/or debdeploy pre-made config/command - https://phabricator.wikimedia.org/T375475#10226275 (10joanna_borun) p:05Triage→03Low
[14:45:36] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations: debmonitor could provide users with cumin and/or debdeploy pre-made config/command - https://phabricator.wikimedia.org/T375475#10226274 (10joanna_borun) @fgiunchedi could you please expand on the use-case and problem so we can figure out best way to address it?
[14:46:50] <wikibugs>	 06SRE, 07SRE-Unowned, 06Infrastructure-Foundations: Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps - https://phabricator.wikimedia.org/T376014#10226280 (10elukey) p:05Triage→03Medium
[14:48:42] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Phase out platform-engineering POSIX group - https://phabricator.wikimedia.org/T376808#10226299 (10elukey) p:05Triage→03Medium
[14:50:11] <wikibugs>	 (03CR) 10Majavah: "yeah, let's do that please while we still can. I'll send a patch." [puppet] - 10https://gerrit.wikimedia.org/r/1078986 (https://phabricator.wikimedia.org/T362066) (owner: 10David Caro)
[14:50:14] <wikibugs>	 (03PS3) 10Tiziano Fogli: logstash: stripping containerd prefix [puppet] - 10https://gerrit.wikimedia.org/r/1080047 (https://phabricator.wikimedia.org/T377132)
[14:51:58] <wikibugs>	 06SRE-OnFire, 06Infrastructure-Foundations, 10netops, 10Sustainability (Incident Followup): Juniper: regularly run `request system configuration rescue save` - https://phabricator.wikimedia.org/T376005#10226302 (10joanna_borun) p:05Triage→03Low a:03ayounsi
[14:55:45] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: ml-services: bump kserve in langid to 0.13.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080052 (https://phabricator.wikimedia.org/T367048)
[15:00:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69814 and previous config saved to /var/cache/conftool/dbconfig/20241014-150014-ladsgroup.json
[15:00:44] <wikibugs>	 (03CR) 10Urbanecm: [C:03+1] "Both are equally acceptable! Personally, I believe it is wise to "pin" all/most variables in operations/mediawiki-config the moment you ad" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080035 (https://phabricator.wikimedia.org/T372337) (owner: 10Michael Große)
[15:02:04] <wikibugs>	 (03PS1) 10Majavah: P:toolforge::proxy: use svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1080056
[15:02:07] <wikibugs>	 (03CR) 10AikoChou: [C:03+1] ml-services: bump kserve in langid to 0.13.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080052 (https://phabricator.wikimedia.org/T367048) (owner: 10Ilias Sarantopoulos)
[15:02:13] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:03:02] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+2] ml-services: bump kserve in langid to 0.13.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080052 (https://phabricator.wikimedia.org/T367048) (owner: 10Ilias Sarantopoulos)
[15:03:28] <wikibugs>	 (03CR) 10Elukey: [C:03+2] Remove aux-k8s-etcd100[1,2] from production [puppet] - 10https://gerrit.wikimedia.org/r/1080022 (https://phabricator.wikimedia.org/T344230) (owner: 10Elukey)
[15:03:59] <wikibugs>	 (03CR) 10CI reject: [V:04-1] P:toolforge::proxy: use svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1080056 (owner: 10Majavah)
[15:04:10] <wikibugs>	 (03Merged) 10jenkins-bot: ml-services: bump kserve in langid to 0.13.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080052 (https://phabricator.wikimedia.org/T367048) (owner: 10Ilias Sarantopoulos)
[15:04:20] <wikibugs>	 (03CR) 10Elukey: [C:03+1] idp: add entry for requesctl.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1080034 (https://phabricator.wikimedia.org/T371782) (owner: 10Giuseppe Lavagetto)
[15:04:37] <wikibugs>	 (03PS2) 10Majavah: P:toolforge::proxy: use svc.toolforge.org [puppet] - 10https://gerrit.wikimedia.org/r/1080056
[15:04:48] <wikibugs>	 (03CR) 10Elukey: [C:03+2] admin_ng: remove ad-hoc anti-affinity rules for Calico typha in AUX [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080024 (https://phabricator.wikimedia.org/T333302) (owner: 10Elukey)
[15:05:30] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
[15:06:07] <logmsgbot>	 !log elukey@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
[15:07:37] <logmsgbot>	 !log elukey@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
[15:15:21] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69815 and previous config saved to /var/cache/conftool/dbconfig/20241014-151521-ladsgroup.json
[15:15:25] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[15:15:39] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
[15:15:39] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[15:15:46] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69816 and previous config saved to /var/cache/conftool/dbconfig/20241014-151546-ladsgroup.json
[15:16:02] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
[15:19:24] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: mwscript-cleanup.service on deploy1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:23:56] <wikibugs>	 (03CR) 10Volans: "forgot to git add them?" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb)
[15:30:04] <jouncebot>	 jan_drewniak: Time to snap out of that daydream and deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1530).
[15:33:28] <wikibugs>	 (03CR) 10Volans: "Nice, couple of small improvements inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1080019 (https://phabricator.wikimedia.org/T377129) (owner: 10Arnaudb)
[15:34:24] <wikibugs>	 (03PS8) 10Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712)
[15:36:16] <wikibugs>	 (03CR) 10Volans: [C:03+1] "It seems ok, give it a test before/after merging." [cookbooks] - 10https://gerrit.wikimedia.org/r/1080012 (owner: 10Ayounsi)
[15:39:13] <wikibugs>	 (03PS2) 10JMeybohm: dragonfly::dfdaemon: Refactor docker integration [puppet] - 10https://gerrit.wikimedia.org/r/1080042 (https://phabricator.wikimedia.org/T362408)
[15:43:15] <wikibugs>	 (03CR) 10Arnaudb: "I removed them and forgot to write a new set, its fixed!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb)
[15:46:39] <wikibugs>	 (03Abandoned) 10Elukey: role::docker_registry_ha::registry: add nginx monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1077933 (https://phabricator.wikimedia.org/T376285) (owner: 10Elukey)
[15:46:41] <logmsgbot>	 !log aikochou@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
[15:46:52] <wikibugs>	 (03Abandoned) 10Elukey: profile::trafficserver::backend: change timeouts for the docker registry [puppet] - 10https://gerrit.wikimedia.org/r/1075528 (https://phabricator.wikimedia.org/T242604) (owner: 10Elukey)
[15:52:49] <logmsgbot>	 !log aikochou@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
[16:03:52] <sergi0>	 !log Running `sgimeno@mwmaint2002:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
[16:03:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:13] <stashbot>	 T376461: Remove unused user property growthexperiments-tour-newimpact-discovery - https://phabricator.wikimedia.org/T376461
[16:06:40] <wikibugs>	 10ops-eqiad, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Disk (sdk) failed on moss-be1002 - https://phabricator.wikimedia.org/T377154 (10MatthewVernon) 03NEW
[16:07:03] <wikibugs>	 10ops-eqiad, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Disk (sdk) failed on moss-be1002 - https://phabricator.wikimedia.org/T377154#10226646 (10MatthewVernon) p:05Triage→03Medium
[16:16:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69817 and previous config saved to /var/cache/conftool/dbconfig/20241014-161602-ladsgroup.json
[16:16:51] <wikibugs>	 (03PS1) 10JMeybohm: containerd: Remove container log line length limit [puppet] - 10https://gerrit.wikimedia.org/r/1080071 (https://phabricator.wikimedia.org/T377132)
[16:17:19] <wikibugs>	 (03CR) 10Volans: mysql_legacy: double quote escape in run_query (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb)
[16:18:41] <wikibugs>	 (03PS1) 10Ammarpad: ContactPage: Move nlwiki contactpage config to CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080072 (https://phabricator.wikimedia.org/T142544)
[16:19:23] <wikibugs>	 (03CR) 10CI reject: [V:04-1] ContactPage: Move nlwiki contactpage config to CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080072 (https://phabricator.wikimedia.org/T142544) (owner: 10Ammarpad)
[16:23:29] <wikibugs>	 (03PS2) 10Ammarpad: ContactPage: Move nlwiki contactpage config to CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080072 (https://phabricator.wikimedia.org/T142544)
[16:31:09] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69818 and previous config saved to /var/cache/conftool/dbconfig/20241014-163109-ladsgroup.json
[16:36:03] <wikibugs>	 (03CR) 10Ammarpad: "Test plan: Check these forms work" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080072 (https://phabricator.wikimedia.org/T142544) (owner: 10Ammarpad)
[16:46:16] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69819 and previous config saved to /var/cache/conftool/dbconfig/20241014-164616-ladsgroup.json
[16:50:48] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
[16:50:51] <logmsgbot>	 !log fnegri@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
[16:51:29] <stashbot>	 T375223: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1700)
[17:00:05] <jouncebot>	 ryankemper: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T1700).
[17:01:23] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69820 and previous config saved to /var/cache/conftool/dbconfig/20241014-170123-ladsgroup.json
[17:01:28] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[17:01:42] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
[17:03:36] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Wikimedia-Portals, 13Patch-For-Review: www.wikipedia.org: prefilling the search box with the "search" URL parameter does not work - https://phabricator.wikimedia.org/T318285#10226824 (10simon04) Likely a regression of https://gerrit.wikimedia.org/r/c/operations/puppet/+/81...
[17:06:27] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[17:06:41] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[17:06:48] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69821 and previous config saved to /var/cache/conftool/dbconfig/20241014-170647-ladsgroup.json
[17:12:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:48:38] <wikibugs>	 (03CR) 10Ammarpad: "No default form should show for:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080072 (https://phabricator.wikimedia.org/T142544) (owner: 10Ammarpad)
[18:00:57] <wikibugs>	 (03PS7) 10Jcrespo: check footer legal complience: Add support for relative URLs [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789)
[18:01:50] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "Tests in this version look ok:" [puppet] - 10https://gerrit.wikimedia.org/r/1079973 (https://phabricator.wikimedia.org/T375789) (owner: 10Jcrespo)
[18:07:05] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69822 and previous config saved to /var/cache/conftool/dbconfig/20241014-180704-ladsgroup.json
[18:22:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69823 and previous config saved to /var/cache/conftool/dbconfig/20241014-182211-ladsgroup.json
[18:37:19] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69824 and previous config saved to /var/cache/conftool/dbconfig/20241014-183718-ladsgroup.json
[18:39:42] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10226992 (10phaultfinder)
[18:47:30] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8]
[18:47:43] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8] (duration: 00m 13s)
[18:52:26] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69825 and previous config saved to /var/cache/conftool/dbconfig/20241014-185225-ladsgroup.json
[18:52:31] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[18:52:44] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[18:57:29] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8]
[18:57:59] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8] (duration: 00m 29s)
[19:19:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: mwscript-cleanup.service on deploy1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:25:20] <wikibugs>	 (03PS1) 10Pppery: Configure namespaces, sitenames, and timezones for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080078
[19:26:01] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Configure namespaces, sitenames, and timezones for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080078 (owner: 10Pppery)
[19:27:03] <wikibugs>	 (03PS2) 10Pppery: Configure namespaces, sitenames, and timezones for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080078
[19:27:11] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2209 to s3 master [puppet] - 10https://gerrit.wikimedia.org/r/1080079 (https://phabricator.wikimedia.org/T377164)
[19:27:15] <wikibugs>	 (03PS1) 10Gerrit maintenance bot: wmnet: Update s3-master alias [dns] - 10https://gerrit.wikimedia.org/r/1080080 (https://phabricator.wikimedia.org/T377164)
[19:29:16] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[19:29:30] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[19:29:32] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[19:29:49] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[19:29:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69826 and previous config saved to /var/cache/conftool/dbconfig/20241014-192956-ladsgroup.json
[19:30:18] <wikibugs>	 (03PS3) 10Pppery: Configure namespaces, sitenames, and timezones for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080078 (https://phabricator.wikimedia.org/T377160)
[19:33:54] <wikibugs>	 (03PS4) 10Pppery: Configure namespaces, sitenames, and timezones for new wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080078 (https://phabricator.wikimedia.org/T377160)
[19:39:19] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69827 and previous config saved to /var/cache/conftool/dbconfig/20241014-193918-ladsgroup.json
[19:54:26] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69828 and previous config saved to /var/cache/conftool/dbconfig/20241014-195425-ladsgroup.json
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor My software never has bugs. It just develops random features. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T2000).
[20:00:05] <jouncebot>	 Pppery: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:11] <Pppery>	 here
[20:05:09] <Pppery>	 Anyone here to deploy?
[20:08:43] <TheresNoTime>	 I can :)
[20:08:44] <TheresNoTime>	 one moment
[20:09:30] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by samtar@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1078122 (https://phabricator.wikimedia.org/T249648) (owner: 10Pppery)
[20:09:33] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69829 and previous config saved to /var/cache/conftool/dbconfig/20241014-200932-ladsgroup.json
[20:10:13] <wikibugs>	 (03Merged) 10jenkins-bot: Missing.php: Redirect Scots Wiktionary to Scots Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1078122 (https://phabricator.wikimedia.org/T249648) (owner: 10Pppery)
[20:10:31] <logmsgbot>	 !log samtar@deploy2002 Started scap sync-world: Backport for [[gerrit:1078122|Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)]]
[20:10:41] <stashbot>	 T249648: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648
[20:12:49] <logmsgbot>	 !log samtar@deploy2002 samtar, pppery: Backport for [[gerrit:1078122|Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:13:00] <TheresNoTime>	 Pppery: on mwdebug for testing
[20:13:04] <Pppery>	 On it
[20:13:34] <Pppery>	 Seems to work as I want it to
[20:14:03] <logmsgbot>	 !log samtar@deploy2002 samtar, pppery: Continuing with sync
[20:14:27] <Pppery>	 Although my testing did re-reveal an unrelated bug which I already submitted a patch for
[20:15:43] <TheresNoTime>	 Pppery: that change ^ was okay to get deployed though, right?
[20:15:50] <Pppery>	 yes
[20:16:24] <Pppery>	 since the bug happens regardless of whether or not it is deployed
[20:16:33] <TheresNoTime>	 ack :)
[20:18:23] <Pppery>	 The bug being, by the way, that https://sco.wiktionary.org/wiki/w:Foo doesn't work - with the patch it goes to https://sco.wikipedia.org/wiki/Define:w:foo, and without the patch it goes to https://incubator.wikimedia.org/wiki/Wt/sco/w:foo, both of which are 404 pages hence neither is better than the other. The right destination would be https://sco.wikipedia.org/wiki/Foo, and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1075957 will fix it
[20:18:46] <logmsgbot>	 !log samtar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1078122|Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)]] (duration: 08m 14s)
[20:18:56] <stashbot>	 T249648: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648
[20:19:55] <TheresNoTime>	 deployed :) as for the follow-up, it'd be best to have a +1 rather than deploy it now, do you agree?
[20:20:00] <Pppery>	 yep
[20:20:10] <Pppery>	 which is why I haven't scheduled it for deployment
[20:20:37] <TheresNoTime>	 :)
[20:21:32] <wikibugs>	 06SRE, 06Traffic-Icebox, 10Wikimedia-Apache-configuration, 13Patch-For-Review, 10Wiki-Setup (Delete / Redirect): redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648#10227207 (10Pppery) 05Open→03Resolved a:03Pppery
[20:21:33] <TheresNoTime>	 !log UTC late backport window done
[20:21:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69830 and previous config saved to /var/cache/conftool/dbconfig/20241014-202439-ladsgroup.json
[20:24:44] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[20:24:57] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[20:25:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69831 and previous config saved to /var/cache/conftool/dbconfig/20241014-202504-ladsgroup.json
[20:25:28] <RhinosF1>	 TheresNoTime: check discord
[20:34:17] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69832 and previous config saved to /var/cache/conftool/dbconfig/20241014-203416-ladsgroup.json
[20:49:24] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69833 and previous config saved to /var/cache/conftool/dbconfig/20241014-204923-ladsgroup.json
[20:57:48] <wikibugs>	 06SRE-OnFire, 10Incident Tooling: corto: implement updating IRC topics and wikimediastatus.net - https://phabricator.wikimedia.org/T370785#10227246 (10lmata) >>! In T370785#10211164, @Eevans wrote: > Q: Should this be a part of the MVP (i.e. Day 1), or saved for a subsequent iteration?  Having this in a later...
[21:00:04] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241014T2100).
[21:04:31] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69834 and previous config saved to /var/cache/conftool/dbconfig/20241014-210430-ladsgroup.json
[21:11:30] <jinxer-wm>	 FIRING: ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:12:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:16:30] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:19:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69835 and previous config saved to /var/cache/conftool/dbconfig/20241014-211937-ladsgroup.json
[21:19:41] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[21:19:55] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[21:20:02] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69836 and previous config saved to /var/cache/conftool/dbconfig/20241014-212001-ladsgroup.json
[21:29:23] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69837 and previous config saved to /var/cache/conftool/dbconfig/20241014-212922-ladsgroup.json
[21:34:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69838 and previous config saved to /var/cache/conftool/dbconfig/20241014-213453-ladsgroup.json
[21:38:43] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[21:38:56] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[21:39:03] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P69839 and previous config saved to /var/cache/conftool/dbconfig/20241014-213902-ladsgroup.json
[21:39:06] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[21:44:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69840 and previous config saved to /var/cache/conftool/dbconfig/20241014-214429-ladsgroup.json
[21:45:06] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
[21:45:08] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
[21:45:16] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69841 and previous config saved to /var/cache/conftool/dbconfig/20241014-214515-ladsgroup.json
[21:45:19] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[21:49:59] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69842 and previous config saved to /var/cache/conftool/dbconfig/20241014-214958-ladsgroup.json
[21:51:44] <wikibugs>	 (03PS2) 10Andrea Denisse: grafana: Ensure grafana-loki service auto restarts after system updates [puppet] - 10https://gerrit.wikimedia.org/r/1080090 (https://phabricator.wikimedia.org/T377166)
[21:51:44] <wikibugs>	 (03CR) 10Andrea Denisse: [V:03+1] "PCC results: https://puppet-compiler.wmflabs.org/output/1080090/4305/" [puppet] - 10https://gerrit.wikimedia.org/r/1080090 (https://phabricator.wikimedia.org/T377166) (owner: 10Andrea Denisse)
[21:59:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69843 and previous config saved to /var/cache/conftool/dbconfig/20241014-215936-ladsgroup.json
[22:01:14] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[22:01:28] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[22:01:35] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69844 and previous config saved to /var/cache/conftool/dbconfig/20241014-220134-ladsgroup.json
[22:01:39] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[22:05:04] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69845 and previous config saved to /var/cache/conftool/dbconfig/20241014-220504-ladsgroup.json
[22:10:08] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69846 and previous config saved to /var/cache/conftool/dbconfig/20241014-221008-ladsgroup.json
[22:10:12] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[22:14:43] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69847 and previous config saved to /var/cache/conftool/dbconfig/20241014-221443-ladsgroup.json
[22:14:48] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
[22:15:01] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
[22:15:09] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69848 and previous config saved to /var/cache/conftool/dbconfig/20241014-221508-ladsgroup.json
[22:20:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69849 and previous config saved to /var/cache/conftool/dbconfig/20241014-222009-ladsgroup.json
[22:23:18] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69850 and previous config saved to /var/cache/conftool/dbconfig/20241014-222317-ladsgroup.json
[22:25:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69851 and previous config saved to /var/cache/conftool/dbconfig/20241014-222515-ladsgroup.json
[22:38:25] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69852 and previous config saved to /var/cache/conftool/dbconfig/20241014-223824-ladsgroup.json
[22:40:22] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69853 and previous config saved to /var/cache/conftool/dbconfig/20241014-224022-ladsgroup.json
[22:43:12] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69854 and previous config saved to /var/cache/conftool/dbconfig/20241014-224311-ladsgroup.json
[22:43:15] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[22:44:41] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10227342 (10phaultfinder)
[22:50:48] <wikibugs>	 (03CR) 10Krinkle: [C:03+1] Enable {{USERLANGUAGE}} on Commons and Meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079680 (https://phabricator.wikimedia.org/T4085) (owner: 10Tim Starling)
[22:52:52] <wikibugs>	 (03CR) 10Krinkle: [C:03+1] Enable {{USERLANGUAGE}} on Commons and Meta (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079680 (https://phabricator.wikimedia.org/T4085) (owner: 10Tim Starling)
[22:53:32] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69855 and previous config saved to /var/cache/conftool/dbconfig/20241014-225331-ladsgroup.json
[22:55:29] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69856 and previous config saved to /var/cache/conftool/dbconfig/20241014-225528-ladsgroup.json
[22:55:32] <stashbot>	 T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[22:58:21] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69857 and previous config saved to /var/cache/conftool/dbconfig/20241014-225818-ladsgroup.json
[22:59:42] <jinxer-wm>	 FIRING: Device rebooted: Alert for device ps1-e3-eqiad.mgmt.eqiad.wmnet - Device rebooted   - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted
[23:04:42] <jinxer-wm>	 RESOLVED: Device rebooted: Device ps1-e3-eqiad.mgmt.eqiad.wmnet recovered from Device rebooted   - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted
[23:08:38] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69858 and previous config saved to /var/cache/conftool/dbconfig/20241014-230838-ladsgroup.json
[23:08:43] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
[23:08:56] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
[23:09:03] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69859 and previous config saved to /var/cache/conftool/dbconfig/20241014-230903-ladsgroup.json
[23:09:31] <jinxer-wm>	 FIRING: Device rebooted: Alert for device ps1-e2-eqiad.mgmt.eqiad.wmnet - Device rebooted   - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted
[23:09:37] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10227354 (10phaultfinder)
[23:13:28] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69860 and previous config saved to /var/cache/conftool/dbconfig/20241014-231328-ladsgroup.json
[23:14:31] <jinxer-wm>	 RESOLVED: Device rebooted: Device ps1-e2-eqiad.mgmt.eqiad.wmnet recovered from Device rebooted   - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted
[23:17:16] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69861 and previous config saved to /var/cache/conftool/dbconfig/20241014-231715-ladsgroup.json
[23:19:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: mwscript-cleanup.service on deploy1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:24:42] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10227362 (10phaultfinder)
[23:28:35] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69862 and previous config saved to /var/cache/conftool/dbconfig/20241014-232835-ladsgroup.json
[23:28:37] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
[23:28:39] <stashbot>	 T370903: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903
[23:28:50] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
[23:28:57] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69863 and previous config saved to /var/cache/conftool/dbconfig/20241014-232857-ladsgroup.json
[23:32:23] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69864 and previous config saved to /var/cache/conftool/dbconfig/20241014-233222-ladsgroup.json
[23:38:30] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1080096
[23:38:30] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1080096 (owner: 10TrainBranchBot)
[23:47:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69865 and previous config saved to /var/cache/conftool/dbconfig/20241014-234729-ladsgroup.json
[23:49:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10227369 (10phaultfinder)
[23:53:42] <wikibugs>	 (03CR) 10Tim Starling: "You know that's 49 wikis. It should probably be discussed somewhere." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1079680 (https://phabricator.wikimedia.org/T4085) (owner: 10Tim Starling)