[00:01:48] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1366:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:02:40] <wikibugs>	 SRE-swift-storage, MediaWiki-Uploading, Patch-For-Review, User-revi: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820#9565825 (Bawolff) Gerrit patch to detect the situation where...
[00:02:46] <icinga-wm_>	 PROBLEM - PyBal backends health check on lvs4010 is CRITICAL: PYBAL CRITICAL - CRITICAL - ncredirlb_80: Servers ncredir4001.ulsfo.wmnet are marked down but pooled: ncredirlb_443: Servers ncredir4002.ulsfo.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[00:02:48] <icinga-wm_>	 PROBLEM - PyBal backends health check on lvs4008 is CRITICAL: PYBAL CRITICAL - CRITICAL - ncredirlb6_80: Servers ncredir4001.ulsfo.wmnet are marked down but pooled: ncredirlb_80: Servers ncredir4001.ulsfo.wmnet are marked down but pooled: ncredirlb_443: Servers ncredir4001.ulsfo.wmnet are marked down but pooled: ncredirlb6_443: Servers ncredir4001.ulsfo.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[00:02:57] <jinxer-wm>	 (ProbeDown) firing: (2) Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:03:10] <wikibugs>	 SRE-swift-storage, UploadWizard: Problem uploading FLAC file in Upload Wizzard to Wikimedia Commons - https://phabricator.wikimedia.org/T355610#9565833 (Bawolff) https://gerrit.wikimedia.org/r/1005632 may help with this.
[00:04:40] <jinxer-wm>	 (KubernetesRsyslogDown) resolved: rsyslog on kubernetes2057:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=kubernetes2057 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:05:25] <jinxer-wm>	 (SystemdUnitFailed) firing: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:06:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1366:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:10:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on parse1014:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:11:48] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1366:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:11:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on maps1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:12:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T357189)', diff saved to https://phabricator.wikimedia.org/P57649 and previous config saved to /var/cache/conftool/dbconfig/20240222-001210-arnaudb.json
[00:12:27] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[00:12:36] <wikibugs>	 SRE-swift-storage, UploadWizard: Problem uploading FLAC file in Upload Wizzard to Wikimedia Commons - https://phabricator.wikimedia.org/T355610#9565857 (Bawolff)
[00:12:58] <wikibugs>	 SRE-swift-storage, MediaWiki-Uploading, Patch-For-Review, User-revi: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820#9565859 (Bawolff)
[00:13:02] <jinxer-wm>	 (ProbeDown) resolved: (2) Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:13:38] <jinxer-wm>	 (ProbeDown) firing: (2) Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:13:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:14:54] <wikibugs>	 SRE-swift-storage, UploadWizard: Problem with uploading large files (2 GB) - https://phabricator.wikimedia.org/T355433#9565873 (Bawolff)
[00:14:58] <jinxer-wm>	 (ProbeDown) firing: Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:15:04] <wikibugs>	 SRE-swift-storage, MediaWiki-Uploading, Patch-For-Review, User-revi: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820#9565875 (Bawolff)
[00:16:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1366:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:17:54] <icinga-wm_>	 RECOVERY - PyBal backends health check on lvs4008 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[00:18:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:18:56] <jinxer-wm>	 (ProbeDown) firing: (2) Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:19:54] <icinga-wm_>	 RECOVERY - PyBal backends health check on lvs4010 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[00:19:58] <jinxer-wm>	 (ProbeDown) resolved: (2) Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:21:31] <jinxer-wm>	 (ProbeDown) firing: (2) Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:21:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on conf1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:21:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (5) Puppet has failed generate resources on mw1415:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:22:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1403:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:23:38] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job gerrit in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:23:38] <jinxer-wm>	 (ProbeDown) resolved: (2) Service ncredir-https:443 has failed probes (http_ncredir-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ncredir-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:25:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on parse1014:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:26:22] <wikibugs>	 (PS1) Ebernhardson: cirrus: Add script to orchestrate reindexing [deployment-charts] - https://gerrit.wikimedia.org/r/1005635 (https://phabricator.wikimedia.org/T356303)
[00:26:31] <jinxer-wm>	 (ProbeDown) resolved: (2) Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[00:27:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1403:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:28:38] <jinxer-wm>	 (JobUnavailable) resolved: (4) Reduced availability for job gerrit in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:28:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:31:43] <wikibugs>	 SRE, MW-on-K8s, Scap, serviceops, and 2 others: Scap should check errors coming from mw-on-k8s canaries during deployments - https://phabricator.wikimedia.org/T357402#9565937 (CodeReviewBot) thcipriani merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/219  Check bare metal an...
[00:32:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1426:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:33:48] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:34:02] <wikibugs>	 (PS2) Ebernhardson: cirrus: Add script to orchestrate reindexing [deployment-charts] - https://gerrit.wikimedia.org/r/1005635 (https://phabricator.wikimedia.org/T356303)
[00:37:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on mw1403:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:38:19] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1403:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:38:26] <icinga-wm_>	 RECOVERY - Host ps1-c3-codfw is UP: PING OK - Packet loss = 0%, RTA = 31.18 ms
[00:38:48] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:39:14] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1005530
[00:39:17] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1005530 (owner: TrainBranchBot)
[00:40:10] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Not receiving posts or moderation messages - https://phabricator.wikimedia.org/T358020#9565952 (Legoktm) queue runner seems to have crashed, based on https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3?orgId=1&viewPanel=2&from=now-2d&to=now  {F42031241}  trying to flag down a...
[00:42:39] <wikibugs>	 (PS1) Legoktm: Revert "admin: temporarily revoke legoktm's ssh key" [puppet] - https://gerrit.wikimedia.org/r/1005637
[00:43:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1391:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:43:48] <icinga-wm_>	 RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[00:43:48] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:43:51] <rzl>	 !log rzl@lists1001:~$ sudo systemctl restart mailman3
[00:43:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on moss-be1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:47:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1421:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:48:23] <wikibugs>	 SRE, Wikimedia-Mailing-lists, Wikimedia-Incident: Not receiving posts or moderation messages - https://phabricator.wikimedia.org/T358020#9565958 (RLazarus) Open→Resolved a:RLazarus Restarted mailman3 at 00:43, icinga alerts are cleared, and the graph in T358020#9565952 is trending down ag...
[00:48:48] <wikibugs>	 SRE, Wikimedia-Mailing-lists, Wikimedia-Incident: Not receiving posts or moderation messages - https://phabricator.wikimedia.org/T358020#9565961 (JJMC89) Looks the same as a previous #wikimedia-incident {T331626}
[00:53:18] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1391:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:57:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1421:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[00:58:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:03:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1391:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:03:37] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:04:13] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1005530 (owner: TrainBranchBot)
[01:07:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (3) Puppet has failed generate resources on mw1421:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:08:19] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1391:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:18:18] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1371:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:23:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1371:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:25:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1428:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:25:39] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[01:35:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1428:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:40:33] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:43:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on chartmuseum1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:48:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1364:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:58:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1364:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:58:33] <jinxer-wm>	 (PuppetZeroResources) resolved: (3) Puppet has failed generate resources on mw1364:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:58:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on chartmuseum1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:59:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on conf1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[01:59:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1483:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:02:24] <icinga-wm_>	 PROBLEM - Host ps1-c3-codfw is DOWN: PING CRITICAL - Packet loss = 100%
[02:03:18] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1364:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:08:58] <icinga-wm_>	 RECOVERY - Host ps1-c3-codfw is UP: PING OK - Packet loss = 0%, RTA = 32.04 ms
[02:09:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on conf1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:10:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on parse1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:14:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1483:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:19:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on maps1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:20:33] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:20:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:22:19] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:27:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on parse1013:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:30:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1007:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:34:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on registry1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:38:03] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1489:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:38:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1450:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:38:38] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:39:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mwmaint1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:45:48] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:48:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:49:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mwmaint1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:49:51] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[02:50:41] <wikibugs>	 (CR) Ssingh: "I may be completely wrong on this and it's late so apologies in advance: it seems like we have a bunch of Puppet failure after this change" [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[02:51:31] <wikibugs>	 (PS1) Ssingh: Revert "etcd: disable the diff output for client config with passwords" [puppet] - https://gerrit.wikimedia.org/r/1005482
[02:52:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1468:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:53:03] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1489:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:55:48] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1357:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:58:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1448:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[02:59:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on maps1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:02:32] <icinga-wm_>	 PROBLEM - WDQS SPARQL on wdqs2011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:02:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1468:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:06:22] <icinga-wm_>	 RECOVERY - WDQS SPARQL on wdqs2011 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 0.194 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:07:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1467:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:08:18] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on mw1448:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:13:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1413:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:13:38] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:20:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:22:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1467:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:33:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:34:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1443:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:35:48] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:42:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1484:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:43:49] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:44:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on registry1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:48:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:49:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1443:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:53:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on maps1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:53:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:55:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on puppetmaster1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:55:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:00:18] <icinga-wm_>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:00:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:01:06] <icinga-wm_>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:01:18] <icinga-wm_>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 8.941 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:01:58] <icinga-wm_>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51451 bytes in 0.060 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[04:03:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on dragonfly-supernode1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:05:25] <jinxer-wm>	 (SystemdUnitFailed) firing: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:08:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:10:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on puppetmaster1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:11:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on poolcounter1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:13:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:16:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1426:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:18:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:20:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on moss-be1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:21:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on poolcounter1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:23:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:28:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:30:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:31:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1426:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:32:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1484:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:33:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on maps1005:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:33:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1352:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:37:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:40:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1484:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:42:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:43:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on dragonfly-supernode1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:43:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1411:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:48:55] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:50:45] <wikibugs>	 (CR) JHathaway: [C: +2] Revert "etcd: disable the diff output for client config with passwords" [puppet] - https://gerrit.wikimedia.org/r/1005482 (owner: Ssingh)
[04:52:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:52:34] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:55:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1485:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:55:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:57:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:57:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on maps1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:00:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on moss-be1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:03:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1411:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:06:06] <icinga-wm_>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv6: Idle - HE, AS6939/IPv4: Idle - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:06:54] <icinga-wm_>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:07:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on maps1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:10:48] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on parse1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:11:18] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1445:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:13:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1411:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:16:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:22:12] <icinga-wm_>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 211, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:22:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:22:52] <icinga-wm_>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 121, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:25:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:25:39] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[05:26:18] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on parse1022:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:33:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:35:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:40:18] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1443:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:40:22] <icinga-wm_>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:40:26] <icinga-wm_>	 RECOVERY - BGP status on cr4-ulsfo is OK: BGP OK - up: 135, down: 4, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:41:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on irc2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:44:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:48:37] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:49:08] <icinga-wm_>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:49:44] <icinga-wm_>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:49:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:50:18] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1443:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:50:33] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1443:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:51:02] <icinga-wm_>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.247 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:51:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1445:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:51:38] <icinga-wm_>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51451 bytes in 0.100 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:51:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on irc2001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:53:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:55:18] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[05:59:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-api-int (k8s) 1.325s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[05:59:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:01:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1445:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:03:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:04:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:09:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-api-int (k8s) 1.142s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:12:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-api-int (k8s) 1.043s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:13:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:14:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:17:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-api-int (k8s) 1.037s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:18:49] <jinxer-wm>	 (PuppetZeroResources) firing: (8) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:19:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on maps1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:20:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:23:49] <jinxer-wm>	 (PuppetZeroResources) firing: (9) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:24:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:28:49] <jinxer-wm>	 (PuppetZeroResources) firing: (9) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:29:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1356:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:30:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:33:49] <jinxer-wm>	 (PuppetZeroResources) firing: (7) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:33:50] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on seaborgium:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:37:11] <wikibugs>	 (PS1) Marostegui: Revert "db2137: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005483
[06:38:49] <jinxer-wm>	 (PuppetZeroResources) firing: (7) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:38:51] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db2137: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005483 (owner: Marostegui)
[06:39:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57650 and previous config saved to /var/cache/conftool/dbconfig/20240222-063923-root.json
[06:42:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es1030 as es2 master T358080', diff saved to https://phabricator.wikimedia.org/P57651 and previous config saved to /var/cache/conftool/dbconfig/20240222-064205-marostegui.json
[06:42:11] <stashbot>	 T358080: Upgrade es2 to MariaDB 10.6 - https://phabricator.wikimedia.org/T358080
[06:42:26] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] deployment_server: Add mwscript_k8s [puppet] - https://gerrit.wikimedia.org/r/988851 (https://phabricator.wikimedia.org/T341553) (owner: RLazarus)
[06:42:47] <wikibugs>	 (PS1) Marostegui: es1033: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1005649 (https://phabricator.wikimedia.org/T358080)
[06:42:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1033 T358080', diff saved to https://phabricator.wikimedia.org/P57652 and previous config saved to /var/cache/conftool/dbconfig/20240222-064253-root.json
[06:43:50] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on seaborgium:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:44:23] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host es1033.eqiad.wmnet with OS bookworm
[06:44:29] <wikibugs>	 (CR) Marostegui: [C: +2] es1033: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1005649 (https://phabricator.wikimedia.org/T358080) (owner: Marostegui)
[06:46:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1445:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:46:32] <wikibugs>	 (PS1) Marostegui: clouddb1017: Migration to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1005650 (https://phabricator.wikimedia.org/T356838)
[06:46:44] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s3
[06:46:46] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s1
[06:47:33] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s1
[06:47:42] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s3
[06:48:07] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
[06:48:10] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
[06:48:19] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
[06:48:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:48:32] <wikibugs>	 (CR) Marostegui: "The host is already depooled:" [puppet] - https://gerrit.wikimedia.org/r/1005650 (https://phabricator.wikimedia.org/T356838) (owner: Marostegui)
[06:48:49] <jinxer-wm>	 (PuppetZeroResources) firing: (7) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:49:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:50:06] <jinxer-wm>	 (KubernetesAPINotScrapable) firing: (2) k8s@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[06:53:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1370:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:54:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57653 and previous config saved to /var/cache/conftool/dbconfig/20240222-065428-root.json
[06:55:54] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] mediawiki: Support one-off jobs [deployment-charts] - https://gerrit.wikimedia.org/r/988849 (https://phabricator.wikimedia.org/T341553) (owner: RLazarus)
[06:57:52] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
[06:58:01] <logmsgbot>	 !log marostegui@cumin1002 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
[06:58:40] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "Overall LGTM, couple comments on helmfile.yaml, but they're not in the way of merging this patch." [deployment-charts] - https://gerrit.wikimedia.org/r/988850 (https://phabricator.wikimedia.org/T341553) (owner: RLazarus)
[06:58:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on dragonfly-supernode1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:58:49] <jinxer-wm>	 (PuppetZeroResources) firing: (7) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T0700)
[07:00:04] <jouncebot>	 kormat, marostegui, Amir1, and arnaudb: How many deployers does it take to do Primary database switchover deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T0700).
[07:03:49] <jinxer-wm>	 (PuppetZeroResources) firing: (8) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:04:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:09:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57654 and previous config saved to /var/cache/conftool/dbconfig/20240222-070933-root.json
[07:09:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on parse1010:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:11:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on parse1014:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:12:20] <icinga-wm_>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:13:02] <icinga-wm_>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 212, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:13:49] <jinxer-wm>	 (PuppetZeroResources) firing: (7) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:14:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on parse1010:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:17:10] <wikibugs>	 (PS18) ArielGlenn: sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396)
[07:18:22] <wikibugs>	 (CR) CI reject: [V: -1] sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396) (owner: ArielGlenn)
[07:19:51] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1033.eqiad.wmnet with OS bookworm
[07:20:18] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:23:10] <wikibugs>	 (PS19) ArielGlenn: sql/xml dumps: add role for helper worker for wikidata full history dumps [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396)
[07:23:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:24:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57655 and previous config saved to /var/cache/conftool/dbconfig/20240222-072438-root.json
[07:24:48] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on parse1010:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:25:03] <wikibugs>	 (PS1) Marostegui: Revert "es1033: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005484
[07:25:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:25:54] <wikibugs>	 (CR) Slyngshede: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1005095 (https://phabricator.wikimedia.org/T357749) (owner: Muehlenhoff)
[07:27:17] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "es1033: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005484 (owner: Marostegui)
[07:27:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 1%: After migration', diff saved to https://phabricator.wikimedia.org/P57656 and previous config saved to /var/cache/conftool/dbconfig/20240222-072729-root.json
[07:28:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:30:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es2026 as es2 codfw master T358080', diff saved to https://phabricator.wikimedia.org/P57657 and previous config saved to /var/cache/conftool/dbconfig/20240222-073017-marostegui.json
[07:30:18] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:30:25] <stashbot>	 T358080: Upgrade es2 to MariaDB 10.6 - https://phabricator.wikimedia.org/T358080
[07:33:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:34:48] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on parse1010:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:35:19] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:35:34] <jinxer-wm>	 (PuppetZeroResources) resolved: (3) Puppet has failed generate resources on mw1400:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:35:54] <wikibugs>	 (PS1) Marostegui: wmnet: Promote es2026 to es2 master [dns] - https://gerrit.wikimedia.org/r/1005653 (https://phabricator.wikimedia.org/T358080)
[07:38:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:39:40] <wikibugs>	 (CR) Marostegui: [C: +2] wmnet: Promote es2026 to es2 master [dns] - https://gerrit.wikimedia.org/r/1005653 (https://phabricator.wikimedia.org/T358080) (owner: Marostegui)
[07:39:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57658 and previous config saved to /var/cache/conftool/dbconfig/20240222-073943-root.json
[07:39:48] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on parse1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:40:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2033 T358080', diff saved to https://phabricator.wikimedia.org/P57659 and previous config saved to /var/cache/conftool/dbconfig/20240222-074042-root.json
[07:40:48] <stashbot>	 T358080: Upgrade es2 to MariaDB 10.6 - https://phabricator.wikimedia.org/T358080
[07:42:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After migration', diff saved to https://phabricator.wikimedia.org/P57660 and previous config saved to /var/cache/conftool/dbconfig/20240222-074233-root.json
[07:43:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:48:07] <wikibugs>	 (PS1) Marostegui: s8-pager.sql: this is not needed anymore [software] - https://gerrit.wikimedia.org/r/1005681
[07:48:25] <wikibugs>	 (PS2) Marostegui: s8-pager.sql: this is not needed anymore [software] - https://gerrit.wikimedia.org/r/1005681
[07:49:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on parse1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:51:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1468:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:53:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1368:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[07:54:17] <wikibugs>	 (PS1) Marostegui: es2033: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1005682
[07:54:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57661 and previous config saved to /var/cache/conftool/dbconfig/20240222-075448-root.json
[07:54:54] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host es2033.codfw.wmnet with OS bookworm
[07:55:10] <wikibugs>	 (CR) Marostegui: [C: +2] s8-pager.sql: this is not needed anymore [software] - https://gerrit.wikimedia.org/r/1005681 (owner: Marostegui)
[07:55:31] <wikibugs>	 (CR) Marostegui: [C: +2] es2033: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1005682 (owner: Marostegui)
[07:55:42] <wikibugs>	 (Merged) jenkins-bot: s8-pager.sql: this is not needed anymore [software] - https://gerrit.wikimedia.org/r/1005681 (owner: Marostegui)
[07:57:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After migration', diff saved to https://phabricator.wikimedia.org/P57662 and previous config saved to /var/cache/conftool/dbconfig/20240222-075738-root.json
[07:58:18] <taavi>	 !log taavi@puppetmaster1002 ~ $ sudo systemctl restart apache2 # lots of 'Error 500 on SERVER: Server Error: undefined method `content' for nil:NilClass' in the logs, seems to have helped
[07:58:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:04] <jouncebot>	 Amir1 and Urbanecm: gettimeofday() says it's time for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T0800)
[08:00:04] <jouncebot>	 hoo: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:01:46] <wikibugs>	 (PS1) Hoo man: Migrate to virtual domain mapping [extensions/Cognate] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005485 (https://phabricator.wikimedia.org/T348526)
[08:01:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on mw1468:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:03:09] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by hoo@deploy2002 using scap backport" [extensions/Cognate] (wmf/1.42.0-wmf.18) - https://gerrit.wikimedia.org/r/1005467 (https://phabricator.wikimedia.org/T348526) (owner: Hoo man)
[08:03:15] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by hoo@deploy2002 using scap backport" [extensions/Cognate] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005485 (https://phabricator.wikimedia.org/T348526) (owner: Hoo man)
[08:03:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on mw1397:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:04:19] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:04:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on maps1009:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:04:49] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on parse1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:05:31] <wikibugs>	 (Merged) jenkins-bot: Migrate to virtual domain mapping [extensions/Cognate] (wmf/1.42.0-wmf.18) - https://gerrit.wikimedia.org/r/1005467 (https://phabricator.wikimedia.org/T348526) (owner: Hoo man)
[08:05:38] <wikibugs>	 (Merged) jenkins-bot: Migrate to virtual domain mapping [extensions/Cognate] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005485 (https://phabricator.wikimedia.org/T348526) (owner: Hoo man)
[08:05:41] <jinxer-wm>	 (SystemdUnitFailed) firing: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:06:30] <logmsgbot>	 !log hoo@deploy2002 Started scap: Backport for [[gerrit:1005467|Migrate to virtual domain mapping (T348526)]], [[gerrit:1005485|Migrate to virtual domain mapping (T348526)]]
[08:06:36] <stashbot>	 T348526: [COG] [TECH] Migrate Cognate to use a virtual database domain - https://phabricator.wikimedia.org/T348526
[08:08:04] <logmsgbot>	 !log hoo@deploy2002 hoo: Backport for [[gerrit:1005467|Migrate to virtual domain mapping (T348526)]], [[gerrit:1005485|Migrate to virtual domain mapping (T348526)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:09:19] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:12:23] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es2033.codfw.wmnet with reason: host reimage
[08:12:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After migration', diff saved to https://phabricator.wikimedia.org/P57663 and previous config saved to /var/cache/conftool/dbconfig/20240222-081243-root.json
[08:13:03] <logmsgbot>	 !log hoo@deploy2002 hoo: Continuing with sync
[08:14:37] <wikibugs>	 (CR) Majavah: [C: +1] clouddb1017: Migration to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1005650 (https://phabricator.wikimedia.org/T356838) (owner: Marostegui)
[08:14:49] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on parse1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:14:51] <wikibugs>	 (CR) Marostegui: [C: +2] clouddb1017: Migration to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1005650 (https://phabricator.wikimedia.org/T356838) (owner: Marostegui)
[08:14:55] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2033.codfw.wmnet with reason: host reimage
[08:16:34] <wikibugs>	 (CR) ArielGlenn: "The full diff looks good to my eyes:" [puppet] - https://gerrit.wikimedia.org/r/993659 (https://phabricator.wikimedia.org/T252396) (owner: ArielGlenn)
[08:16:55] <wikibugs>	 (PS1) Marostegui: Revert "es2033: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005686
[08:19:19] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:19:59] <wikibugs>	 (PS1) Marostegui: clouddb1016: Migrate to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1005684 (https://phabricator.wikimedia.org/T356838)
[08:20:33] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
[08:20:38] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
[08:20:54] <wikibugs>	 (CR) Marostegui: "Not yet depooled, will do it before merging" [puppet] - https://gerrit.wikimedia.org/r/1005684 (https://phabricator.wikimedia.org/T356838) (owner: Marostegui)
[08:21:15] <logmsgbot>	 !log hoo@deploy2002 Finished scap: Backport for [[gerrit:1005467|Migrate to virtual domain mapping (T348526)]], [[gerrit:1005485|Migrate to virtual domain mapping (T348526)]] (duration: 14m 44s)
[08:21:20] <stashbot>	 T348526: [COG] [TECH] Migrate Cognate to use a virtual database domain - https://phabricator.wikimedia.org/T348526
[08:23:44] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 138997
[08:24:17] <logmsgbot>	 !log ayounsi@cumin1002 END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 138997
[08:24:19] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:24:26] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 138997
[08:24:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on parse1006:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:25:09] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138997
[08:26:05] <wikibugs>	 (CR) Majavah: [C: +1] clouddb1016: Migrate to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1005684 (https://phabricator.wikimedia.org/T356838) (owner: Marostegui)
[08:27:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on parse1013:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:27:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After migration', diff saved to https://phabricator.wikimedia.org/P57664 and previous config saved to /var/cache/conftool/dbconfig/20240222-082750-root.json
[08:28:25] <wikibugs>	 (PS1) Marostegui: Revert "wmnet: Promote es2026 to es2 master" [dns] - https://gerrit.wikimedia.org/r/1005687
[08:28:25] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 18779
[08:29:08] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18779
[08:29:29] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "wmnet: Promote es2026 to es2 master" [dns] - https://gerrit.wikimedia.org/r/1005687 (owner: Marostegui)
[08:30:53] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2033.codfw.wmnet with OS bookworm
[08:30:53] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "es2033: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005686 (owner: Marostegui)
[08:31:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57665 and previous config saved to /var/cache/conftool/dbconfig/20240222-083111-root.json
[08:34:19] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:34:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on moss-be1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:42:10] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[08:42:23] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[08:42:23] <logmsgbot>	 !log jelto@cumin1002 START - Cookbook sre.hosts.reboot-single for host puppetmaster1002.eqiad.wmnet
[08:42:24] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:42:29] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:42:35] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57666 and previous config saved to /var/cache/conftool/dbconfig/20240222-084235-arnaudb.json
[08:42:45] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[08:42:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After migration', diff saved to https://phabricator.wikimedia.org/P57667 and previous config saved to /var/cache/conftool/dbconfig/20240222-084255-root.json
[08:44:34] <logmsgbot>	 !log jelto@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1002.eqiad.wmnet
[08:46:16] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57668 and previous config saved to /var/cache/conftool/dbconfig/20240222-084616-root.json
[08:52:01] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on db[2143,2195].codfw.wmnet,db1187.eqiad.wmnet with reason: Silence for reboot T356240
[08:52:06] <jayme>	 !log rolling out prometheus-rsyslog-exporter 1.0.0+git20221110-1 to wikikube nodes - T357616
[08:52:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:12] <stashbot>	 T357616: Logs from containers sometimes not visible in logstash - https://phabricator.wikimedia.org/T357616
[08:52:15] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2143,2195].codfw.wmnet,db1187.eqiad.wmnet with reason: Silence for reboot T356240
[08:53:23] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team (FY2023/2024-Q3-Q4): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9566585 (dcaro)
[08:55:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'T356240 - depooling db1187  db2143 db2195', diff saved to https://phabricator.wikimedia.org/P57669 and previous config saved to /var/cache/conftool/dbconfig/20240222-085521-arnaudb.json
[08:55:36] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.upgrade for db1180.eqiad.wmnet
[08:55:53] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.upgrade for db2143.codfw.wmnet
[08:56:06] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.upgrade for db2195.codfw.wmnet
[08:56:18] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): The python-build images regenerate wheels even when matching ones are already available - https://phabricator.wikimedia.org/T259611#9566586 (hashar) That has been solved by setting PIP_FIN...
[08:58:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After migration', diff saved to https://phabricator.wikimedia.org/P57670 and previous config saved to /var/cache/conftool/dbconfig/20240222-085800-root.json
[08:58:24] <wikibugs>	 (PS1) Muehlenhoff: Remove puppetmaster1002 from Puppet 5 for now [puppet] - https://gerrit.wikimedia.org/r/1005708
[08:59:08] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Enable $wgLocalHTTPProxy on group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1004135 (https://phabricator.wikimedia.org/T298265) (owner: Clément Goubert)
[08:59:33] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1180.eqiad.wmnet
[08:59:53] <wikibugs>	 (CR) Jelto: [C: +1] "we can try that for now until puppetmaster1002 is fixed/replaced, so lgtm" [puppet] - https://gerrit.wikimedia.org/r/1005708 (owner: Muehlenhoff)
[08:59:55] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Remove puppetmaster1002 from Puppet 5 for now [puppet] - https://gerrit.wikimedia.org/r/1005708 (owner: Muehlenhoff)
[09:00:43] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2195.codfw.wmnet
[09:01:10] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Remove puppetmaster1002 from Puppet 5 for now [puppet] - https://gerrit.wikimedia.org/r/1005708 (owner: Muehlenhoff)
[09:01:18] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2143.codfw.wmnet
[09:01:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57671 and previous config saved to /var/cache/conftool/dbconfig/20240222-090121-root.json
[09:03:39] <jayme>	 !log restart prometheus@k8s in eqiad - T343529
[09:03:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:44] <stashbot>	 T343529: Prometheus doesn't reload or alert on expired client certificates - https://phabricator.wikimedia.org/T343529
[09:03:50] <wikibugs>	 (CR) Brouberol: [C: +1] "The diff looks good. I trust you on the actual absented jobs." [puppet] - https://gerrit.wikimedia.org/r/1005565 (https://phabricator.wikimedia.org/T357419) (owner: Joal)
[09:04:19] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:06:50] <wikibugs>	 (PS3) Ayounsi: users: add jwheeler to analytics_privatedata_users [puppet] - https://gerrit.wikimedia.org/r/1004187 (https://phabricator.wikimedia.org/T357731) (owner: Hnowlan)
[09:07:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1494:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:07:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:08:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on dragonfly-supernode1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:09:19] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:09:51] <jinxer-wm>	 (KubernetesAPINotScrapable) resolved: (2) k8s@eqiad is failing to scrape the k8s api - https://phabricator.wikimedia.org/T343529 - TODO - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPINotScrapable
[09:10:26] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure: Connection errors from puppetmaster1002 to puppetdb - https://phabricator.wikimedia.org/T358187#9566631 (MoritzMuehlenhoff)
[09:10:50] <brouberol>	 ^ we're seeing quite a few puppet errors at the time, with messages such as "Error 500 on SERVER: Server Error: Could not retrieve facts for xxx.xxx.wmnet: Failed to find facts from PuppetDB at puppet:8140: undefined method `content' for nil:NilClass"
[09:11:08] <brouberol>	 Ah, I think this is related to the ticket m.oritz just created
[09:12:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on mw1494:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:13:48] <jinxer-wm>	 (PuppetZeroResources) firing: (2) Puppet has failed generate resources on dragonfly-supernode1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:14:19] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on mw1372:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:14:45] <wikibugs>	 (PS2) Ayounsi: Update brion to bvibber [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044)
[09:14:48] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on moss-be1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:15:36] <wikibugs>	 (CR) Ayounsi: Update brion to bvibber (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044) (owner: Ayounsi)
[09:16:13] <wikibugs>	 (CR) CI reject: [V: -1] Update brion to bvibber [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044) (owner: Ayounsi)
[09:16:25] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure: Connection errors from puppetmaster1002 to puppetdb - https://phabricator.wikimedia.org/T358187#9566643 (Jelto) > A restart of Apache and a reboot of puppetmaster1002 did not help.  This restarts //probably// had different effects. It seems the...
[09:16:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57672 and previous config saved to /var/cache/conftool/dbconfig/20240222-091626-root.json
[09:16:38] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1005021 (owner: Ayounsi)
[09:17:35] <wikibugs>	 (PS1) Filippo Giunchedi: sre: move PuppetZeroResources to warning [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893)
[09:17:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on mw1494:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:18:48] <wikibugs>	 (CR) Muehlenhoff: Update brion to bvibber (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044) (owner: Ayounsi)
[09:18:49] <jinxer-wm>	 (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on dragonfly-supernode1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:19:19] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on mw1364:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:20:14] <wikibugs>	 (CR) Volans: [C: +1] "Sorry for the trouble. Interesting, I wonder why PCC didn't catch it, it was run on like 55 hosts with the resource..." [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[09:22:49] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on mw1398:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:22:55] <wikibugs>	 (PS3) Ayounsi: Update brion to bvibber [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044)
[09:23:18] <wikibugs>	 (CR) Ayounsi: Update brion to bvibber (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044) (owner: Ayounsi)
[09:23:49] <wikibugs>	 (CR) Muehlenhoff: "Do you have a list of servers which failed? Might have been unrelated to this patch, but https://phabricator.wikimedia.org/T358187 ?" [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[09:25:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57673 and previous config saved to /var/cache/conftool/dbconfig/20240222-092503-arnaudb.json
[09:25:54] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[09:26:10] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2195 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57674 and previous config saved to /var/cache/conftool/dbconfig/20240222-092609-arnaudb.json
[09:29:19] <jinxer-wm>	 (PuppetZeroResources) resolved: (5) Puppet has failed generate resources on mw1364:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:29:49] <wikibugs>	 (CR) Volans: "Adding my 2 cents to the discussion" [puppet] - https://gerrit.wikimedia.org/r/1005140 (owner: Ssingh)
[09:31:31] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57675 and previous config saved to /var/cache/conftool/dbconfig/20240222-093130-root.json
[09:33:10] <wikibugs>	 (CR) Muehlenhoff: "Looks good, but before merging let's wait for feedback on https://phabricator.wikimedia.org/T358044#9562598" [puppet] - https://gerrit.wikimedia.org/r/1005441 (https://phabricator.wikimedia.org/T358044) (owner: Ayounsi)
[09:38:08] <wikibugs>	 (CR) MVernon: [C: +1] "Thanks, there was a lot of alert spam this morning!" [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[09:39:32] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] sre: move PuppetZeroResources to warning [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[09:39:36] <wikibugs>	 (CR) Jelto: "one note from todays incident: the total failure was around 7% globally (because only in eqiad and puppet 5). WidespreadPuppetFailure aler" [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[09:39:46] <wikibugs>	 (CR) Filippo Giunchedi: [V: +2 C: +2] sre: move PuppetZeroResources to warning [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[09:40:09] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57677 and previous config saved to /var/cache/conftool/dbconfig/20240222-094008-arnaudb.json
[09:41:14] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2195 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57678 and previous config saved to /var/cache/conftool/dbconfig/20240222-094114-arnaudb.json
[09:42:58] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57679 and previous config saved to /var/cache/conftool/dbconfig/20240222-094257-arnaudb.json
[09:43:04] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[09:46:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57680 and previous config saved to /var/cache/conftool/dbconfig/20240222-094635-root.json
[09:47:24] <wikibugs>	 SRE-swift-storage, Observability-Metrics, SRE Observability (FY2023/2024-Q3): Capacity planning/estimation for Thanos - https://phabricator.wikimedia.org/T357747#9566703 (fgiunchedi)
[09:48:37] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:49:07] <wikibugs>	 (CR) Volans: "the idea looks ok, minor nits inline" [cookbooks] - https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421) (owner: Cathal Mooney)
[09:49:50] <wikibugs>	 SRE-swift-storage, Observability-Metrics, SRE Observability (FY2023/2024-Q3): Capacity planning/estimation for Thanos - https://phabricator.wikimedia.org/T357747#9566716 (fgiunchedi) >>! In T357747#9562810, @MatthewVernon wrote: > I think the proposed table should look like this? >  > | # weeks | GBs...
[09:53:36] <wikibugs>	 (CR) Muehlenhoff: "Not sure if this is really an adequate replacement? For the https://phabricator.wikimedia.org/T358187 incident I don't see a WidespreadPup" [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[09:55:14] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57681 and previous config saved to /var/cache/conftool/dbconfig/20240222-095513-arnaudb.json
[09:56:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2195 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57682 and previous config saved to /var/cache/conftool/dbconfig/20240222-095619-arnaudb.json
[09:58:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P57683 and previous config saved to /var/cache/conftool/dbconfig/20240222-095804-arnaudb.json
[10:01:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57684 and previous config saved to /var/cache/conftool/dbconfig/20240222-100140-root.json
[10:02:56] <wikibugs>	 (CR) Fabfur: [V: +1] haproxy: configure extended logging (preparatory for Benthos) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005548 (https://phabricator.wikimedia.org/T358105) (owner: Fabfur)
[10:10:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57685 and previous config saved to /var/cache/conftool/dbconfig/20240222-101018-arnaudb.json
[10:11:24] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db2195 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57686 and previous config saved to /var/cache/conftool/dbconfig/20240222-101123-arnaudb.json
[10:13:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P57687 and previous config saved to /var/cache/conftool/dbconfig/20240222-101310-arnaudb.json
[10:18:58] <wikibugs>	 (CR) Filippo Giunchedi: [V: +2 C: +2] "Good point re: WidespreadPuppetFailure I have reopened to investigate https://phabricator.wikimedia.org/T357893" [alerts] - https://gerrit.wikimedia.org/r/1005712 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[10:26:37] <wikibugs>	 (PS1) JMeybohm: Don't restart rsyslog on updates, kill exporter instead [debs/prometheus-rsyslog-exporter] - https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616)
[10:26:53] <wikibugs>	 (PS2) Clément Goubert: Enable $wgLocalHTTPProxy on group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1004135 (https://phabricator.wikimedia.org/T298265)
[10:28:17] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57688 and previous config saved to /var/cache/conftool/dbconfig/20240222-102817-arnaudb.json
[10:28:19] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[10:28:23] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[10:28:33] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[10:28:47] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[10:29:00] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[10:29:07] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57689 and previous config saved to /var/cache/conftool/dbconfig/20240222-102906-arnaudb.json
[10:29:42] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add puppetised java.security config file for hardened TLS settings [puppet] - https://gerrit.wikimedia.org/r/1005095 (https://phabricator.wikimedia.org/T357749) (owner: Muehlenhoff)
[10:30:50] <wikibugs>	 (PS1) EoghanGaffney: [apt/gitlab] Add new package for Gitlab update [puppet] - https://gerrit.wikimedia.org/r/1005721 (https://phabricator.wikimedia.org/T358182)
[10:31:01] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
[10:31:05] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
[10:31:12] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1005721 (https://phabricator.wikimedia.org/T358182) (owner: EoghanGaffney)
[10:31:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57690 and previous config saved to /var/cache/conftool/dbconfig/20240222-103125-arnaudb.json
[10:31:53] <wikibugs>	 (CR) Marostegui: [C: +2] clouddb1016: Migrate to 10.6 [puppet] - https://gerrit.wikimedia.org/r/1005684 (https://phabricator.wikimedia.org/T356838) (owner: Marostegui)
[10:32:07] <wikibugs>	 (CR) EoghanGaffney: [C: +2] [apt/gitlab] Add new package for Gitlab update [puppet] - https://gerrit.wikimedia.org/r/1005721 (https://phabricator.wikimedia.org/T358182) (owner: EoghanGaffney)
[10:35:31] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
[10:35:34] <logmsgbot>	 !log marostegui@cumin1002 conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
[10:38:38] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:43:20] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good (but this change also applies to the production IDPs let's disable Puppet on idp1002/2002 before rollout)." [puppet] - https://gerrit.wikimedia.org/r/1005094 (https://phabricator.wikimedia.org/T357749) (owner: Slyngshede)
[10:43:38] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job atlas_exporter in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:46:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P57692 and previous config saved to /var/cache/conftool/dbconfig/20240222-104632-arnaudb.json
[10:46:36] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days - https://phabricator.wikimedia.org/T358198#9566926 (saper)
[10:47:39] <wikibugs>	 SRE, MW-on-K8s, Scap, serviceops, Release-Engineering-Team (Now this 🫠): Find a way to address canary releases directly - https://phabricator.wikimedia.org/T358117#9566949 (Clement_Goubert) We've talked this over, and while doing swagger checks made sense when there were just a few canaries o...
[10:48:27] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:49:41] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days - https://phabricator.wikimedia.org/T358198#9566956 (taavi)
[10:50:54] <wikibugs>	 SRE, Wikimedia-Mailing-lists, Wikimedia-Incident: Not receiving posts or moderation messages - https://phabricator.wikimedia.org/T358020#9566958 (taavi)
[10:52:40] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days - https://phabricator.wikimedia.org/T358198#9566926 (saper) p:Triage→Low
[10:52:51] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days - https://phabricator.wikimedia.org/T358198#9566984 (saper) duplicate→Open
[10:54:03] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057#9566992 (MoritzMuehlenhoff)
[10:56:18] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.3/12.4 point update - https://phabricator.wikimedia.org/T353057#9566993 (MoritzMuehlenhoff) Open→Resolved This is resolved
[10:56:38] <wikibugs>	 (PS1) Alexandros Kosiaris: ClusterConfig: Add kube-wiki-parsoid test [mediawiki-config] - https://gerrit.wikimedia.org/r/1005723 (https://phabricator.wikimedia.org/T357392)
[11:00:04] <jouncebot>	 mvolz: gettimeofday() says it's time for Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1100)
[11:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1100)
[11:01:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P57693 and previous config saved to /var/cache/conftool/dbconfig/20240222-110138-arnaudb.json
[11:03:46] <wikibugs>	 (CR) Volans: "I'm getting:" [software/netbox-deploy] - https://gerrit.wikimedia.org/r/1004192 (owner: Hashar)
[11:09:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1028 T358180', diff saved to https://phabricator.wikimedia.org/P57694 and previous config saved to /var/cache/conftool/dbconfig/20240222-110914-root.json
[11:09:20] <stashbot>	 T358180: Upgrade es3 to MariaDB 10.6 - https://phabricator.wikimedia.org/T358180
[11:09:47] <wikibugs>	 (PS1) Marostegui: es1028: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1005724 (https://phabricator.wikimedia.org/T358180)
[11:12:41] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host es1028.eqiad.wmnet with OS bookworm
[11:12:41] <wikibugs>	 (CR) Marostegui: [C: +2] es1028: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1005724 (https://phabricator.wikimedia.org/T358180) (owner: Marostegui)
[11:16:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57695 and previous config saved to /var/cache/conftool/dbconfig/20240222-111644-arnaudb.json
[11:16:46] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[11:16:50] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[11:17:00] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[11:17:07] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57696 and previous config saved to /var/cache/conftool/dbconfig/20240222-111706-arnaudb.json
[11:19:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57697 and previous config saved to /var/cache/conftool/dbconfig/20240222-111925-arnaudb.json
[11:26:35] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on es1028.eqiad.wmnet with reason: host reimage
[11:29:05] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1028.eqiad.wmnet with reason: host reimage
[11:34:33] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P57698 and previous config saved to /var/cache/conftool/dbconfig/20240222-113432-arnaudb.json
[11:42:42] <urbanecm>	 jouncebot: nowandnext
[11:42:42] <jouncebot>	 For the next 0 hour(s) and 17 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1100)
[11:42:42] <jouncebot>	 For the next 0 hour(s) and 17 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1100)
[11:42:42] <jouncebot>	 In 1 hour(s) and 17 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1300)
[11:49:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P57699 and previous config saved to /var/cache/conftool/dbconfig/20240222-114938-arnaudb.json
[11:50:03] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1028.eqiad.wmnet with OS bookworm
[11:51:24] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
[11:51:32] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
[11:51:53] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytics-privatedata-users for jwheeler - https://phabricator.wikimedia.org/T357731#9567233 (hnowlan) Open→Resolved a:hnowlan Done
[11:52:50] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure: Connection errors from puppetmaster1002 to puppetdb - https://phabricator.wikimedia.org/T358187#9567253 (cmooney) Definitely kind of strange.  IP connectivity between these hosts is ok: `lines=15 cmooney@es1031:~$ ping puppetmaster1002  PING pup...
[11:52:57] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
[11:53:05] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
[11:55:23] <logmsgbot>	 !log eoghan@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading gitlab
[12:02:37] <logmsgbot>	 !log eoghan@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading gitlab
[12:04:46] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57700 and previous config saved to /var/cache/conftool/dbconfig/20240222-120445-arnaudb.json
[12:04:48] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[12:04:54] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[12:05:12] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[12:05:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57701 and previous config saved to /var/cache/conftool/dbconfig/20240222-120518-arnaudb.json
[12:05:41] <jinxer-wm>	 (SystemdUnitFailed) firing: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:07:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57702 and previous config saved to /var/cache/conftool/dbconfig/20240222-120737-arnaudb.json
[12:22:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P57703 and previous config saved to /var/cache/conftool/dbconfig/20240222-122244-arnaudb.json
[12:27:44] <wikibugs>	 (CR) Volans: "immediate replies, I didn't checked yet the new PS" [puppet] - https://gerrit.wikimedia.org/r/1004672 (owner: Slyngshede)
[12:30:26] <wikibugs>	 SRE, SRE-Access-Requests, Data-Platform-SRE, User-ItamarWMDE: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279#9567347 (MoritzMuehlenhoff) @AndrewTavis_WMDE  Thanks! This is a long task and to make things explicit: Is the summary below...
[12:33:38] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: nova: compute: depend on the ceph config file being deployed [puppet] - https://gerrit.wikimedia.org/r/1005733 (https://phabricator.wikimedia.org/T358101)
[12:33:53] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1005733 (https://phabricator.wikimedia.org/T358101) (owner: Arturo Borrero Gonzalez)
[12:37:51] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P57704 and previous config saved to /var/cache/conftool/dbconfig/20240222-123750-arnaudb.json
[12:38:01] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: cloudvirt1034: move to modern single NIC setup [puppet] - https://gerrit.wikimedia.org/r/1005750 (https://phabricator.wikimedia.org/T319184)
[12:39:17] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: openstack: nova: compute: depend on the ceph config file being deployed [puppet] - https://gerrit.wikimedia.org/r/1005733 (https://phabricator.wikimedia.org/T358101)
[12:39:42] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1005733 (https://phabricator.wikimedia.org/T358101) (owner: Arturo Borrero Gonzalez)
[12:42:06] <wikibugs>	 SRE, SRE-Access-Requests, Data-Platform-SRE, User-ItamarWMDE: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279#9567384 (AndrewTavis_WMDE) Thanks for checking in @MoritzMuehlenhoff! A correction to one of your points:  - Membership in a...
[12:43:55] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "es1028: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1005691 (owner: Marostegui)
[12:44:01] <wikibugs>	 (CR) Majavah: [C: +1] openstack: nova: compute: depend on the ceph config file being deployed [puppet] - https://gerrit.wikimedia.org/r/1005733 (https://phabricator.wikimedia.org/T358101) (owner: Arturo Borrero Gonzalez)
[12:44:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57705 and previous config saved to /var/cache/conftool/dbconfig/20240222-124438-root.json
[12:45:26] <logmsgbot>	 !log eoghan@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading gitlab
[12:47:06] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: nova: compute: depend on the ceph config file being deployed [puppet] - 10https://gerrit.wikimedia.org/r/1005733 (https://phabricator.wikimedia.org/T358101) (owner: 10Arturo Borrero Gonzalez)
[12:52:32] <logmsgbot>	 !log eoghan@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading gitlab
[12:52:57] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57706 and previous config saved to /var/cache/conftool/dbconfig/20240222-125257-arnaudb.json
[12:52:59] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[12:53:04] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[12:53:13] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[12:53:20] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57707 and previous config saved to /var/cache/conftool/dbconfig/20240222-125319-arnaudb.json
[12:55:07] <wikibugs>	 (03CR) 10Volans: "Great! Final nits and it's ready!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[12:55:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57708 and previous config saved to /var/cache/conftool/dbconfig/20240222-125538-arnaudb.json
[12:55:52] <wikibugs>	 (03PS1) 10Filippo Giunchedi: sre: fix WidespreadPuppetFailure logic for no resources [alerts] - 10https://gerrit.wikimedia.org/r/1005752 (https://phabricator.wikimedia.org/T357893)
[12:56:14] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Data-Platform-SRE, 10User-ItamarWMDE: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279#9567420 (10MoritzMuehlenhoff) >>! In T356279#9567384, @AndrewTavis_WMDE wrote: > Thanks for checking in @MoritzMuehlenhoff! A...
[12:57:19] <wikibugs>	 (03PS9) 10Ayounsi: Netbox: add generic function to execute a Netbox script [software/spicerack] - 10https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152)
[12:57:31] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove goransm from analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/1005753 (https://phabricator.wikimedia.org/T356279)
[12:58:29] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Data-Platform-SRE, 10Patch-For-Review, 10User-ItamarWMDE: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279#9567427 (10AndrewTavis_WMDE) Thank you for the help with this, @MoritzMuehlenhoff! Please also add in @M...
[12:59:42] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: cloudvirt1034: move to modern single NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/1005750 (https://phabricator.wikimedia.org/T319184)
[12:59:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57709 and previous config saved to /var/cache/conftool/dbconfig/20240222-125943-root.json
[13:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1300)
[13:00:21] <wikibugs>	 (03CR) 10Majavah: [C: 03+1] cloudvirt1034: move to modern single NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/1005750 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[13:00:35] <wikibugs>	 (03CR) 10Brouberol: [C: 03+1] "Nicely done!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005495 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:01:34] <wikibugs>	 (03CR) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421) (owner: 10Cathal Mooney)
[13:01:40] <logmsgbot>	 !log aborrero@cumin1002 START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
[13:01:51] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10cloud-services-team, 10netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567450 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS...
[13:02:11] <logmsgbot>	 !log eoghan@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading gitlab
[13:02:30] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudvirt1034: move to modern single NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/1005750 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[13:03:01] <Emperor>	 !log ms-eqiad set ACL {"read-only":["mw:backup"]} T269108
[13:03:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:06] <stashbot>	 T269108: Create a read-only swift identity for backup taking - https://phabricator.wikimedia.org/T269108
[13:03:25] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove cumin1001 from Puppet [puppet] - 10https://gerrit.wikimedia.org/r/1005755 (https://phabricator.wikimedia.org/T353419)
[13:03:25] <jinxer-wm>	 (SystemdUnitFailed) resolved: (2) send_tile_invalidations.service on maps1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:03:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: "I see what you are getting at here, and in general I agree if we could get less disruptive upgrades that would be great. Though tbh it see" [debs/prometheus-rsyslog-exporter] - 10https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616) (owner: 10JMeybohm)
[13:05:02] <Emperor>	 !log ms-codfw set ACL {"read-only":["mw:backup"]} T269108
[13:05:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:08] <wikibugs>	 (03PS16) 10Ayounsi: Netbox module: add get/set for primary IPs and access vlan [software/spicerack] - 10https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152)
[13:05:20] <wikibugs>	 (03CR) 10Ayounsi: Netbox module: add get/set for primary IPs and access vlan (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[13:06:36] <wikibugs>	 10SRE, 10MW-on-K8s, 10RESTBase, 10serviceops: Migrate restbase from mwapi-async to mw-api-int - https://phabricator.wikimedia.org/T358213#9567469 (10Clement_Goubert)
[13:06:48] <wikibugs>	 (03PS2) 10Filippo Giunchedi: thanos: fix bucket-query tools import [puppet] - 10https://gerrit.wikimedia.org/r/1005442 (https://phabricator.wikimedia.org/T351927)
[13:07:50] <wikibugs>	 10SRE, 10MW-on-K8s, 10RESTBase, 10serviceops: Migrate restbase from mwapi-async to mw-api-int - https://phabricator.wikimedia.org/T358213#9567482 (10Clement_Goubert) 05Open→03In progress p:05Triage→03Medium
[13:08:00] <wikibugs>	 10SRE, 10MW-on-K8s, 10Traffic, 10serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120#9567484 (10Clement_Goubert)
[13:10:22] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] grafana: provision thanos-downsample datasources [puppet] - 10https://gerrit.wikimedia.org/r/1004680 (owner: 10Filippo Giunchedi)
[13:10:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P57710 and previous config saved to /var/cache/conftool/dbconfig/20240222-131045-arnaudb.json
[13:12:43] <godog>	 jouncebot: next
[13:12:43] <jouncebot>	 In 0 hour(s) and 47 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1400)
[13:13:04] <wikibugs>	 10SRE, 10MW-on-K8s, 10serviceops: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9567499 (10hnowlan)
[13:13:11] <godog>	 !log bounce grafana to apply new datasources
[13:13:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:30] <wikibugs>	 (03CR) 10Volans: "Nice! Almost ready" [software/spicerack] - 10https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[13:13:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove goransm from analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/1005753 (https://phabricator.wikimedia.org/T356279) (owner: 10Muehlenhoff)
[13:14:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57711 and previous config saved to /var/cache/conftool/dbconfig/20240222-131448-root.json
[13:16:38] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Data-Platform-SRE, 10Patch-For-Review, 10User-ItamarWMDE: Remove production data access for former WMDE staff member goransm - https://phabricator.wikimedia.org/T356279#9567502 (10MoritzMuehlenhoff) 05Stalled→03Open a:03MoritzMuehlenhoff
[13:16:44] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Add an nginx reverse proxy to superset to help with serving static assets [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005495 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:17:11] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "Ship it!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[13:17:33] <wikibugs>	 (03Merged) 10jenkins-bot: Add an nginx reverse proxy to superset to help with serving static assets [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005495 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:17:52] <wikibugs>	 10SRE, 10SRE-swift-storage, 10Data-Persistence-Backup: Create a read-only swift identity for backup taking - https://phabricator.wikimedia.org/T269108#9567508 (10MatthewVernon) @jcrespo can you try now, please?  I constructed the appropriate URL thus: ` matthew@tsk:~/puppet$ python3 Python 3.9.2 (default, Fe...
[13:18:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: p75 latency high: eqiad mw-api-int (k8s) 1.068s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:18:18] <logmsgbot>	 !log aborrero@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
[13:18:57] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] thanos: fix bucket-query tools import [puppet] - 10https://gerrit.wikimedia.org/r/1005442 (https://phabricator.wikimedia.org/T351927) (owner: 10Filippo Giunchedi)
[13:19:11] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Migrate Spicerack logs from cumin1001 to cumin1002? - https://phabricator.wikimedia.org/T353523#9567511 (10Volans) a:03Volans
[13:20:20] <wikibugs>	 (03PS2) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[13:20:50] <logmsgbot>	 !log aborrero@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
[13:20:58] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:21:02] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1005743 (https://phabricator.wikimedia.org/T350694) (owner: 10Slyngshede)
[13:21:06] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] C:puppetmaster::monitoring Disable Icinga merge check. [puppet] - 10https://gerrit.wikimedia.org/r/1005743 (https://phabricator.wikimedia.org/T350694) (owner: 10Slyngshede)
[13:23:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: p75 latency high: eqiad mw-api-int (k8s) 1.068s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:23:38] <wikibugs>	 (03PS3) 10Hashar: Change build image user from root to nobody [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/1004192
[13:25:53] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P57712 and previous config saved to /var/cache/conftool/dbconfig/20240222-132551-arnaudb.json
[13:25:56] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[13:28:12] <wikibugs>	 (03PS2) 10Slyngshede: C:puppetmaster::monitoring Disable Icinga merge check. [puppet] - 10https://gerrit.wikimedia.org/r/1005743 (https://phabricator.wikimedia.org/T350694)
[13:29:29] <wikibugs>	 (03PS1) 10Filippo Giunchedi: sre: deploy pki alerts to eqiad/codfw only [alerts] - 10https://gerrit.wikimedia.org/r/1005758 (https://phabricator.wikimedia.org/T354255)
[13:29:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57713 and previous config saved to /var/cache/conftool/dbconfig/20240222-132953-root.json
[13:30:39] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1430/co" [puppet] - 10https://gerrit.wikimedia.org/r/1005743 (https://phabricator.wikimedia.org/T350694) (owner: 10Slyngshede)
[13:30:42] <wikibugs>	 (03PS1) 10Btullis: Update the image tags for superset to reflect the actual tags [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005759 (https://phabricator.wikimedia.org/T357890)
[13:31:24] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] C:puppetmaster::monitoring Disable Icinga merge check. [puppet] - 10https://gerrit.wikimedia.org/r/1005743 (https://phabricator.wikimedia.org/T350694) (owner: 10Slyngshede)
[13:31:38] <wikibugs>	 (03CR) 10Brouberol: [C: 03+1] Update the image tags for superset to reflect the actual tags [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005759 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:32:26] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Update the image tags for superset to reflect the actual tags [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005759 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:32:32] <eoghan>	 We'll be restarting gitlab for an update in approximately 1 hour. This should last less than 5 minutes.
[13:33:43] <wikibugs>	 (03Merged) 10jenkins-bot: Update the image tags for superset to reflect the actual tags [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005759 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:34:05] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:38:45] <wikibugs>	 (03PS1) 10Btullis: Fix errant bracket in nginx configmap [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005761 (https://phabricator.wikimedia.org/T357890)
[13:39:55] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Fix errant bracket in nginx configmap [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005761 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:40:17] <logmsgbot>	 !log aborrero@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1034
[13:40:35] <logmsgbot>	 !log aborrero@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1034
[13:41:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57714 and previous config saved to /var/cache/conftool/dbconfig/20240222-134059-arnaudb.json
[13:41:00] <wikibugs>	 (03Merged) 10jenkins-bot: Fix errant bracket in nginx configmap [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005761 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:41:02] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
[13:41:05] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[13:41:15] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
[13:41:19] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:41:21] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57715 and previous config saved to /var/cache/conftool/dbconfig/20240222-134120-arnaudb.json
[13:42:23] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10cloud-services-team, 10netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567605 (10aborrero)
[13:43:40] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57716 and previous config saved to /var/cache/conftool/dbconfig/20240222-134340-arnaudb.json
[13:44:56] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:44:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57717 and previous config saved to /var/cache/conftool/dbconfig/20240222-134458-root.json
[13:45:10] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:45:50] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:46:38] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:46:54] <logmsgbot>	 !log aborrero@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
[13:47:10] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10cloud-services-team, 10netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9567631 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS book...
[13:48:05] <wikibugs>	 10SRE, 10SRE-swift-storage, 10Data-Persistence-Backup: Create a read-only swift identity for backup taking - https://phabricator.wikimedia.org/T269108#9567633 (10jcrespo) Thank you a lot, as I mentioned in private, I will try to run the automatic downloads back again with the new user, if it works we will be...
[13:48:37] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:48:38] <wikibugs>	 (03PS1) 10Btullis: Bump superset chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005762 (https://phabricator.wikimedia.org/T357890)
[13:48:55] <wikibugs>	 (03PS6) 10Slyngshede: C:prometheus::process_exporter Add a simplistic process exporter. [puppet] - 10https://gerrit.wikimedia.org/r/1004672
[13:49:06] <wikibugs>	 (03CR) 10Slyngshede: C:prometheus::process_exporter Add a simplistic process exporter. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1004672 (owner: 10Slyngshede)
[13:49:49] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Bump superset chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005762 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:50:42] <wikibugs>	 (03Merged) 10jenkins-bot: Bump superset chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005762 (https://phabricator.wikimedia.org/T357890) (owner: 10Btullis)
[13:51:04] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:51:23] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:51:26] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:51:31] <wikibugs>	 (03PS10) 10Ayounsi: Netbox: add generic function to execute a Netbox script [software/spicerack] - 10https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152)
[13:51:42] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:51:58] <wikibugs>	 (03CR) 10Ayounsi: Netbox: add generic function to execute a Netbox script (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[13:52:08] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:52:17] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] C:prometheus::process_exporter Add a simplistic process exporter. [puppet] - 10https://gerrit.wikimedia.org/r/1004672 (owner: 10Slyngshede)
[13:52:18] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:52:30] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:52:40] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:52:48] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:52:57] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[13:53:09] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[13:54:39] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Control IPv6 RA generation on core routers - https://phabricator.wikimedia.org/T358220#9567655 (10cmooney) p:05Triage→03Low
[13:55:15] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Control IPv6 RA generation on core routers - https://phabricator.wikimedia.org/T358220#9567676 (10cmooney)
[13:55:21] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9567677 (10cmooney)
[13:57:51] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: nova: compute: drop version matrix split [puppet] - 10https://gerrit.wikimedia.org/r/1005763
[13:57:53] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: nova: compute: extend dependency on ceph.conf [puppet] - 10https://gerrit.wikimedia.org/r/1005764 (https://phabricator.wikimedia.org/T358101)
[13:57:57] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Netbox: add generic function to execute a Netbox script [software/spicerack] - 10https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[13:58:47] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P57718 and previous config saved to /var/cache/conftool/dbconfig/20240222-135846-arnaudb.json
[13:58:55] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1005764 (https://phabricator.wikimedia.org/T358101) (owner: 10Arturo Borrero Gonzalez)
[13:59:03] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] openstack: nova: compute: drop version matrix split [puppet] - 10https://gerrit.wikimedia.org/r/1005763 (owner: 10Arturo Borrero Gonzalez)
[13:59:14] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] openstack: nova: compute: extend dependency on ceph.conf [puppet] - 10https://gerrit.wikimedia.org/r/1005764 (https://phabricator.wikimedia.org/T358101) (owner: 10Arturo Borrero Gonzalez)
[14:00:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57719 and previous config saved to /var/cache/conftool/dbconfig/20240222-140003-root.json
[14:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1400)
[14:00:04] <jouncebot>	 claime: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:26] <wikibugs>	 10SRE, 10ops-codfw, 10ops-eqiad: Decommission Arelion's eqiad-codfw 10G link - https://phabricator.wikimedia.org/T353424#9567694 (10RobH)
[14:00:34] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: nova: compute: drop version matrix split [puppet] - 10https://gerrit.wikimedia.org/r/1005763
[14:00:36] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: nova: compute: extend dependency on ceph.conf [puppet] - 10https://gerrit.wikimedia.org/r/1005764 (https://phabricator.wikimedia.org/T358101)
[14:01:08] <wikibugs>	 (03CR) 10Slyngshede: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1004672 (owner: 10Slyngshede)
[14:01:10] <Lucas_WMDE>	 I’m in a meeting but can deploy later if nobody else is around
[14:01:52] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1005730 (https://phabricator.wikimedia.org/T357392) (owner: 10Alexandros Kosiaris)
[14:02:06] <wikibugs>	 10SRE, 10ops-codfw, 10ops-eqiad: Decommission Arelion's eqiad-codfw 10G link - https://phabricator.wikimedia.org/T353424#9567706 (10RobH) 05Open→03Stalled Both disconnects are currently pending with the vendors.  EQ's has a ticket submitted directly where CyrusOne is via our account reps.  Updates to bot...
[14:02:59] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1005764 (https://phabricator.wikimedia.org/T358101) (owner: 10Arturo Borrero Gonzalez)
[14:03:28] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[14:04:34] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1005731 (https://phabricator.wikimedia.org/T331613) (owner: 10Muehlenhoff)
[14:04:44] <wikibugs>	 (03CR) 10Volans: "LGTM, one detail/question" [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421) (owner: 10Cathal Mooney)
[14:05:19] <taavi>	 claime: will you self-deploy or do you need someone else to do that?
[14:11:20] <icinga-wm_>	 PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect - HE, AS6939/IPv4: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:13:51] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Setup cumin1002 and eventually decom cumin1001 - https://phabricator.wikimedia.org/T353419#9567739 (10Volans)
[14:13:53] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P57720 and previous config saved to /var/cache/conftool/dbconfig/20240222-141353-arnaudb.json
[14:14:06] <icinga-wm_>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 93 probes of 798 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:14:14] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Migrate Spicerack logs from cumin1001 to cumin1002? - https://phabricator.wikimedia.org/T353523#9567737 (10Volans) 05Open→03Resolved I've copied the logs in `/var/log/{cumin,debdeploy,spicerack}` on `cumin1001` to `/var/log/cumin1001` on `cumin1002` and `cumin2002` usin...
[14:15:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57721 and previous config saved to /var/cache/conftool/dbconfig/20240222-141508-root.json
[14:17:56] <wikibugs>	 (PS1) Volans: validators: dcim.device fix asset tag regex [software/netbox-extras] - https://gerrit.wikimedia.org/r/1005768 (https://phabricator.wikimedia.org/T356633)
[14:18:56] <wikibugs>	 ops-eqiad, DC-Ops: asset tag typos - audit and correct - https://phabricator.wikimedia.org/T358223#9567756 (RobH) p:Triage→Medium
[14:19:03] <wikibugs>	 ops-eqiad, DC-Ops: asset tag typos - audit and correct - https://phabricator.wikimedia.org/T358223#9567756 (RobH)
[14:19:06] <icinga-wm_>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 33 probes of 798 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:19:10] <wikibugs>	 ops-eqiad, DC-Ops: asset tag typos - audit and correct - https://phabricator.wikimedia.org/T358223#9567756 (RobH)
[14:19:57] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769
[14:20:38] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769
[14:22:00] <wikibugs>	 (PS1) Alexandros Kosiaris: mw-parsoid: Add parsoid.discovery.wmnet in cert SANs [deployment-charts] - https://gerrit.wikimedia.org/r/1005770 (https://phabricator.wikimedia.org/T357392)
[14:24:24] <wikibugs>	 (CR) CI reject: [V: -1] profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769 (owner: Arturo Borrero Gonzalez)
[14:25:18] <claime>	 taavi: I'll self deploy
[14:25:37] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Figure out next steps for cergen in Puppet setup - https://phabricator.wikimedia.org/T357750#9567796 (MoritzMuehlenhoff)
[14:26:28] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769
[14:27:01] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by cgoubert@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1004135 (https://phabricator.wikimedia.org/T298265) (owner: Clément Goubert)
[14:27:44] <wikibugs>	 (Merged) jenkins-bot: Enable $wgLocalHTTPProxy on group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1004135 (https://phabricator.wikimedia.org/T298265) (owner: Clément Goubert)
[14:27:54] <claime>	 (sorry I got called on the phone...)
[14:28:09] <logmsgbot>	 !log cgoubert@deploy2002 Started scap: Backport for [[gerrit:1004135|Enable $wgLocalHTTPProxy on group1 wikis (T298265)]]
[14:28:15] <stashbot>	 T298265: Have internal MediaWiki to MediaWiki HTTP requests use an envoyproxy on appservers - https://phabricator.wikimedia.org/T298265
[14:28:19] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769
[14:29:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57722 and previous config saved to /var/cache/conftool/dbconfig/20240222-142859-arnaudb.json
[14:29:03] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[14:29:07] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[14:29:16] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[14:29:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1203 (T357189)', diff saved to https://phabricator.wikimedia.org/P57723 and previous config saved to /var/cache/conftool/dbconfig/20240222-142921-arnaudb.json
[14:29:41] <wikibugs>	 (CR) CI reject: [V: -1] profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769 (owner: Arturo Borrero Gonzalez)
[14:29:43] <logmsgbot>	 !log cgoubert@deploy2002 cgoubert: Backport for [[gerrit:1004135|Enable $wgLocalHTTPProxy on group1 wikis (T298265)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:30:03] <wikibugs>	 (PS5) Arturo Borrero Gonzalez: profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769
[14:30:30] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure, Patch-For-Review, Puppet (Puppet 7.0): Figure out next steps for cergen in Puppet setup - https://phabricator.wikimedia.org/T357750#9567826 (MoritzMuehlenhoff)
[14:31:42] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T357189)', diff saved to https://phabricator.wikimedia.org/P57724 and previous config saved to /var/cache/conftool/dbconfig/20240222-143141-arnaudb.json
[14:31:47] <wikibugs>	 (CR) Slyngshede: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1005769 (owner: Arturo Borrero Gonzalez)
[14:34:29] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] profile_openstack_base_nova_compute_service_spec: unbreak tests [puppet] - https://gerrit.wikimedia.org/r/1005769 (owner: Arturo Borrero Gonzalez)
[14:37:40] <logmsgbot>	 !log cgoubert@deploy2002 cgoubert: Continuing with sync
[14:40:50] <wikibugs>	 SRE-swift-storage, MediaWiki-Uploading, Patch-For-Review, User-revi: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820#9567876 (MatthewVernon) Here's the relevant logs, sorted by t...
[14:41:08] <wikibugs>	 (PS2) Alexandros Kosiaris: mw-parsoid: Add parsoid.discovery.wmnet in cert SANs [deployment-charts] - https://gerrit.wikimedia.org/r/1005770 (https://phabricator.wikimedia.org/T357392)
[14:41:44] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Make apt2002 a repository server [puppet] - https://gerrit.wikimedia.org/r/1005731 (https://phabricator.wikimedia.org/T331613) (owner: Muehlenhoff)
[14:44:08] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bullseye
[14:44:14] <wikibugs>	 SRE, ops-eqiad, DC-Ops: Q#:rack/setup/install an-redacteddb1001 - https://phabricator.wikimedia.org/T355571#9567887 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host an-redacteddb1001.eqiad.wmnet with OS bullseye
[14:44:18] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-redacteddb1001.eqiad.wmnet with OS bullseye
[14:44:24] <wikibugs>	 SRE, ops-eqiad, DC-Ops: Q#:rack/setup/install an-redacteddb1001 - https://phabricator.wikimedia.org/T355571#9567888 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host an-redacteddb1001.eqiad.wmnet with OS bullseye executed with errors: - an-redacteddb1001 (...
[14:45:56] <logmsgbot>	 !log cgoubert@deploy2002 Finished scap: Backport for [[gerrit:1004135|Enable $wgLocalHTTPProxy on group1 wikis (T298265)]] (duration: 17m 46s)
[14:46:02] <stashbot>	 T298265: Have internal MediaWiki to MediaWiki HTTP requests use an envoyproxy on appservers - https://phabricator.wikimedia.org/T298265
[14:46:49] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P57725 and previous config saved to /var/cache/conftool/dbconfig/20240222-144648-arnaudb.json
[14:47:22] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Remove useless fixtures from services [deployment-charts] - https://gerrit.wikimedia.org/r/1002917 (owner: Giuseppe Lavagetto)
[14:47:37] <wikibugs>	 (PS1) Cathal Mooney: Change name of dhcp_relay var and use it to control CR IPv6 RAs also [homer/public] - https://gerrit.wikimedia.org/r/1005772 (https://phabricator.wikimedia.org/T358220)
[14:48:39] <wikibugs>	 (PS2) Cathal Mooney: Change name of dhcp_relay var and use it to control CR IPv6 RAs also [homer/public] - https://gerrit.wikimedia.org/r/1005772 (https://phabricator.wikimedia.org/T358220)
[14:49:52] <wikibugs>	 (Merged) jenkins-bot: Remove useless fixtures from services [deployment-charts] - https://gerrit.wikimedia.org/r/1002917 (owner: Giuseppe Lavagetto)
[14:50:07] <wikibugs>	 (CR) JMeybohm: "To be fair: The current solution also does not work with multiple instances." [debs/prometheus-rsyslog-exporter] - https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616) (owner: JMeybohm)
[14:50:36] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] mw-parsoid: Add parsoid.discovery.wmnet in cert SANs [deployment-charts] - https://gerrit.wikimedia.org/r/1005770 (https://phabricator.wikimedia.org/T357392) (owner: Alexandros Kosiaris)
[14:50:52] <wikibugs>	 (CR) JMeybohm: "Like without the typo... 😊" [debs/prometheus-rsyslog-exporter] - https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616) (owner: JMeybohm)
[14:51:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 1%: After recloning', diff saved to https://phabricator.wikimedia.org/P57726 and previous config saved to /var/cache/conftool/dbconfig/20240222-145120-root.json
[14:51:25] <wikibugs>	 (PS3) Cathal Mooney: Change name of dhcp_relay var and use it to control CR IPv6 RAs also [homer/public] - https://gerrit.wikimedia.org/r/1005772 (https://phabricator.wikimedia.org/T358220)
[14:51:46] <wikibugs>	 (Merged) jenkins-bot: mw-parsoid: Add parsoid.discovery.wmnet in cert SANs [deployment-charts] - https://gerrit.wikimedia.org/r/1005770 (https://phabricator.wikimedia.org/T357392) (owner: Alexandros Kosiaris)
[14:54:25] <wikibugs>	 (PS4) Cathal Mooney: Change name of dhcp_relay var and use it to control CR IPv6 RAs also [homer/public] - https://gerrit.wikimedia.org/r/1005772 (https://phabricator.wikimedia.org/T358220)
[14:54:32] <wikibugs>	 (CR) Ssingh: "Looking at my history, parse1017, parse1018, clouddumps1002. In some of these, you can see the task you mentioned above but some just had " [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[14:55:08] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
[14:55:18] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
[14:55:35] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
[14:55:42] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
[14:57:32] <wikibugs>	 (CR) Muehlenhoff: "Thanks! clouddumps1002 uses Puppet 7, so this confirms that the issue was in fact unrelated to https://phabricator.wikimedia.org/T358187." [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[14:59:39] <wikibugs>	 SRE, Infrastructure-Foundations, Puppet-Infrastructure: Connection errors from puppetmaster1002 to puppetdb - https://phabricator.wikimedia.org/T358187#9567991 (ssingh) When I looked at this last night as the alerts were coming in, I noticed that some hosts were not reporting the connection failure b...
[15:01:06] <logmsgbot>	 !log eoghan@cumin1002 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading gitlab
[15:01:55] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P57727 and previous config saved to /var/cache/conftool/dbconfig/20240222-150154-arnaudb.json
[15:03:38] <wikibugs>	 (PS1) Hnowlan: kubernetes: move all remaining eligible jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005776 (https://phabricator.wikimedia.org/T354791)
[15:04:56] <wikibugs>	 (PS2) Hnowlan: kubernetes: move all remaining eligible eqiad jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005776 (https://phabricator.wikimedia.org/T354791)
[15:06:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P57728 and previous config saved to /var/cache/conftool/dbconfig/20240222-150626-root.json
[15:11:07] <wikibugs>	 SRE, Content-Transform-Team, MW-on-K8s, Traffic, and 3 others: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392#9568026 (akosiaris)
[15:11:18] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] etherpad: make exporter and blackbox checks configurable [puppet] - https://gerrit.wikimedia.org/r/1005458 (https://phabricator.wikimedia.org/T316421) (owner: Jelto)
[15:12:45] <akosiaris>	 !log Bump weight of old parsoid hosts from 10 to 110. This is a noop right now but will make the calculations spelled out in T357392 possible later.
[15:12:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:50] <logmsgbot>	 !log akosiaris@cumin1002 conftool action : set/weight=110; selector: service=parsoid-php,name=(pars.*|mw.*)
[15:12:51] <stashbot>	 T357392: Create parsoid mediawiki deployment and migrate parsoid-php.discovery.wmnet traffic to it - https://phabricator.wikimedia.org/T357392
[15:13:21] <logmsgbot>	 !log akosiaris@cumin1002 conftool action : set/weight=1; selector: service=parsoid-php,name=kubernetes.*
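The two conftool actions above select hosts by a `name=` pattern: `(pars.*|mw.*)` for the old parsoid hosts and `kubernetes.*` for the new ones. A minimal sketch of how such patterns split a host list, assuming the pattern is a regular expression that must match the whole name (the `matches` helper is hypothetical, and the sketch ignores the `service=parsoid-php` part of the selector). The sample host names are taken from elsewhere in this log.

```python
import re


def matches(selector_pattern: str, hostname: str) -> bool:
    """Return True if the pattern matches the entire hostname.

    Assumption: whole-name matching, modelled here with re.fullmatch.
    """
    return re.fullmatch(selector_pattern, hostname) is not None


# Patterns copied from the two conftool log lines.
old_hosts_pattern = r"(pars.*|mw.*)"
k8s_hosts_pattern = r"kubernetes.*"

# Host names that appear elsewhere in this log, used only as samples.
names = ["parse1017", "mw1458.eqiad.wmnet", "kubernetes1015.eqiad.wmnet"]

selected_old = [n for n in names if matches(old_hosts_pattern, n)]
selected_k8s = [n for n in names if matches(k8s_hosts_pattern, n)]

print("weight=110:", selected_old)
print("weight=1:  ", selected_k8s)
```

Under that assumption the two selectors are disjoint, so the weight of 110 and the weight of 1 never land on the same host.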
[15:13:52] <wikibugs>	 (CR) Filippo Giunchedi: [C: -1] "The current solution would work with multiple instances because typically they are partof/bindsto (or the equivalent) to rsyslog.service, " [debs/prometheus-rsyslog-exporter] - https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616) (owner: JMeybohm)
[15:14:22] <wikibugs>	 (CR) Ayounsi: [C: +1] validators: dcim.device fix asset tag regex [software/netbox-extras] - https://gerrit.wikimedia.org/r/1005768 (https://phabricator.wikimedia.org/T356633) (owner: Volans)
[15:14:49] <wikibugs>	 (PS11) Ayounsi: Netbox: add generic function to execute a Netbox script [software/spicerack] - https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152)
[15:15:11] <akosiaris>	 !log T357392 pool 46 kubernetes hosts of parsoid-php with a weight of 1. Since the 42 parse hosts are at weight 110, that means 1% goes to mw-parsoid deployment, aka mw-on-k8s
[15:15:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
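The "1%" in akosiaris's 15:15:11 message can be checked with a short calculation. This is a sketch only: it assumes traffic is split in proportion to total pooled weight and that every host is pooled and healthy, and `traffic_share` is a hypothetical helper, not part of conftool or PyBal.

```python
from fractions import Fraction


def traffic_share(new_hosts: int, new_weight: int,
                  old_hosts: int, old_weight: int) -> Fraction:
    """Fraction of requests reaching the new hosts under weight-proportional balancing."""
    new_total = new_hosts * new_weight
    old_total = old_hosts * old_weight
    return Fraction(new_total, new_total + old_total)


# Figures from the log message: 46 kubernetes hosts at weight 1,
# 42 parse hosts at weight 110.
share = traffic_share(new_hosts=46, new_weight=1, old_hosts=42, old_weight=110)
print(f"{float(share):.2%}")  # prints 0.99%
```

The share is 46 / (46 + 42 × 110) = 46 / 4666, just under 1%, which is consistent with the figure quoted in the message.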
[15:15:18] <wikibugs>	 (CR) Volans: [C: +2] validators: dcim.device fix asset tag regex [software/netbox-extras] - https://gerrit.wikimedia.org/r/1005768 (https://phabricator.wikimedia.org/T356633) (owner: Volans)
[15:15:32] <logmsgbot>	 !log akosiaris@cumin1002 conftool action : set/pooled=yes; selector: service=parsoid-php,name=kubernetes.*
[15:15:54] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
[15:17:01] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T357189)', diff saved to https://phabricator.wikimedia.org/P57729 and previous config saved to /var/cache/conftool/dbconfig/20240222-151701-arnaudb.json
[15:17:03] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
[15:17:07] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[15:17:27] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
[15:17:28] <wikibugs>	 (Merged) jenkins-bot: validators: dcim.device fix asset tag regex [software/netbox-extras] - https://gerrit.wikimedia.org/r/1005768 (https://phabricator.wikimedia.org/T356633) (owner: Volans)
[15:17:34] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57730 and previous config saved to /var/cache/conftool/dbconfig/20240222-151733-arnaudb.json
[15:17:50] <wikibugs>	 (CR) Jelto: [V: +1 C: +2] etherpad: make exporter and blackbox checks configurable [puppet] - https://gerrit.wikimedia.org/r/1005458 (https://phabricator.wikimedia.org/T316421) (owner: Jelto)
[15:18:10] <icinga-wm_>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - parsoid-php_443: Servers kubernetes1015.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:18:38] <icinga-wm_>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - parsoid-php_443: Servers kubernetes1015.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:18:50] <icinga-wm_>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - parsoid-php_443: Servers kubernetes2007.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:18:50] <icinga-wm_>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - parsoid-php_443: Servers kubernetes2007.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:19:53] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57731 and previous config saved to /var/cache/conftool/dbconfig/20240222-151952-arnaudb.json
[15:20:59] <wikibugs>	 (CR) Ebernhardson: "dc" [deployment-charts] - https://gerrit.wikimedia.org/r/1005635 (https://phabricator.wikimedia.org/T356303) (owner: Ebernhardson)
[15:21:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P57732 and previous config saved to /var/cache/conftool/dbconfig/20240222-152131-root.json
[15:25:14] <wikibugs>	 (CR) JHathaway: [C: +2] "I don't see any past puppet failures on clouddumps1002, so I think this issue was caused by https://phabricator.wikimedia.org/T358187, not by th" [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[15:25:33] <wikibugs>	 (PS1) Hnowlan: kubernetes: move all remaining eligible codfw jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005786 (https://phabricator.wikimedia.org/T354791)
[15:26:00] <wikibugs>	 (CR) JHathaway: [C: -1] "I don't think this is needed, as the cause was, https://phabricator.wikimedia.org/T358187" [puppet] - https://gerrit.wikimedia.org/r/1005482 (owner: Ssingh)
[15:26:51] <wikibugs>	 (CR) CI reject: [V: -1] kubernetes: move all remaining eligible codfw jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005786 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[15:27:16] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
[15:27:23] <moritzm>	 !log installing glib2.0 security updates on bullseye
[15:27:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:26] <wikibugs>	 (CR) Ssingh: "You are right, sorry, it was https://puppetboard.wikimedia.org/report/cloudcephosd1029.eqiad.wmnet/bccf677965e65173ff9d2b307ffd5f69785c0cb" [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[15:28:44] <wikibugs>	 (Abandoned) Ssingh: Revert "etcd: disable the diff output for client config with passwords" [puppet] - https://gerrit.wikimedia.org/r/1005482 (owner: Ssingh)
[15:28:48] <wikibugs>	 (CR) Volans: [C: +1] "Ship it!" [software/spicerack] - https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152) (owner: Ayounsi)
[15:30:04] <wikibugs>	 (CR) JHathaway: [C: +2] "no problem at all, the error message was definitely confusing!" [puppet] - https://gerrit.wikimedia.org/r/1003112 (https://phabricator.wikimedia.org/T356459) (owner: JHathaway)
[15:32:06] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
[15:33:14] <wikibugs>	 (CR) Ayounsi: [C: +2] Netbox: add generic function to execute a Netbox script [software/spicerack] - https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152) (owner: Ayounsi)
[15:33:38] <wikibugs>	 (PS2) Hnowlan: kubernetes: move all remaining eligible codfw jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005786 (https://phabricator.wikimedia.org/T354791)
[15:34:43] <wikibugs>	 (CR) Herron: [C: +1] sre: fix WidespreadPuppetFailure logic for no resources [alerts] - https://gerrit.wikimedia.org/r/1005752 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[15:35:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P57733 and previous config saved to /var/cache/conftool/dbconfig/20240222-153459-arnaudb.json
[15:36:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P57734 and previous config saved to /var/cache/conftool/dbconfig/20240222-153636-root.json
[15:37:35] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9568184 (cmooney)
[15:38:41] <wikibugs>	 SRE, ops-codfw, DBA, Infrastructure-Foundations, netops: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874#9568181 (cmooney) Open→Resolved a:cmooney Closing this, thanks all for the help!
[15:39:08] <logmsgbot>	 !log aqu@deploy2002 Started deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster
[15:39:18] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bullseye 11.9 point update - https://phabricator.wikimedia.org/T357144#9568189 (MoritzMuehlenhoff)
[15:39:25] <logmsgbot>	 !log aqu@deploy2002 Finished deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster (duration: 00m 16s)
[15:40:20] <wikibugs>	 (Merged) jenkins-bot: Netbox: add generic function to execute a Netbox script [software/spicerack] - https://gerrit.wikimedia.org/r/979121 (https://phabricator.wikimedia.org/T350152) (owner: Ayounsi)
[15:40:36] <wikibugs>	 (CR) Jelto: [C: +1] "lgtm, but the expression for missing resource fires on 3% and agent failures on 10%. I can not tell if that's intended and agent failures " [alerts] - https://gerrit.wikimedia.org/r/1005752 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[15:42:29] <wikibugs>	 (PS1) Bking: rdf-streaming-updater: raise storage alert threshold [alerts] - https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685)
[15:43:40] <wikibugs>	 (CR) CI reject: [V: -1] rdf-streaming-updater: raise storage alert threshold [alerts] - https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685) (owner: Bking)
[15:44:59] <wikibugs>	 (PS2) Bking: rdf-streaming-updater: raise storage alert threshold [alerts] - https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685)
[15:46:06] <wikibugs>	 (CR) CI reject: [V: -1] rdf-streaming-updater: raise storage alert threshold [alerts] - https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685) (owner: Bking)
[15:46:19] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2031-2032].codfw.wmnet with reason: T355868
[15:46:22] <wikibugs>	 (CR) Clément Goubert: [C: +1] kubernetes: move all remaining eligible eqiad jobrunners to k8s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005776 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[15:46:25] <stashbot>	 T355868: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868
[15:46:49] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2031-2032].codfw.wmnet with reason: T355868
[15:48:11] <wikibugs>	 (CR) JMeybohm: "agreed, thanks anyways" [debs/prometheus-rsyslog-exporter] - https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616) (owner: JMeybohm)
[15:48:16] <wikibugs>	 (Abandoned) JMeybohm: Don't restart rsyslog on updates, kill exporter instead [debs/prometheus-rsyslog-exporter] - https://gerrit.wikimedia.org/r/1005718 (https://phabricator.wikimedia.org/T357616) (owner: JMeybohm)
[15:48:19] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b2-codfw.mgmt with reason: prepping for server uplink migration codfw rack b2
[15:48:38] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b2-codfw.mgmt with reason: prepping for server uplink migration codfw rack b2
[15:48:44] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, Infrastructure-Foundations, netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568216 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=93a3c441-2097-4840-a202-5694f260c1b5...
[15:50:04] <wikibugs>	 (CR) Clément Goubert: [C: -1] kubernetes: move all remaining eligible codfw jobrunners to k8s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005786 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[15:50:06] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P57735 and previous config saved to /var/cache/conftool/dbconfig/20240222-155005-arnaudb.json
[15:51:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P57736 and previous config saved to /var/cache/conftool/dbconfig/20240222-155141-root.json
[15:53:37] <Emperor>	 !log depool thanos-fe2002 T355868
[15:53:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:53:43] <stashbot>	 T355868: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868
[15:54:02] <Emperor>	 !log depool codfs-mw T355868
[15:54:05] <logmsgbot>	 !log mvernon@cumin2002 conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
[15:54:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:32] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Migrating servers in codfw rack B2 to lsw1-b2-codfw
[15:55:43] <wikibugs>	 (PS3) Hnowlan: kubernetes: move all remaining eligible codfw jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005786 (https://phabricator.wikimedia.org/T354791)
[15:56:07] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Migrating servers in codfw rack B2 to lsw1-b2-codfw
[15:56:13] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, Infrastructure-Foundations, netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568300 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=90864fe1-6d91-45db-a2a5-2bb22463c114...
[15:56:37] <wikibugs>	 (CR) Hnowlan: kubernetes: move all remaining eligible codfw jobrunners to k8s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005786 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[15:57:20] <hnowlan>	 !log depooling mw[1458,1467-1468,1483-1485,1494].eqiad.wmnet in advance of reimaging 
[15:57:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:54] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye
[16:00:21] <topranks>	 !log Commencing network maintenance migrating servers to new switch codfw rack B2 T355868
[16:00:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:42] <stashbot>	 T355868: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868
[16:04:19] <logmsgbot>	 !log volans@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
[16:05:12] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57737 and previous config saved to /var/cache/conftool/dbconfig/20240222-160512-arnaudb.json
[16:05:14] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
[16:05:17] <logmsgbot>	 !log volans@cumin1002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1001.eqiad.wmnet
[16:05:18] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[16:05:28] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
[16:05:35] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57738 and previous config saved to /var/cache/conftool/dbconfig/20240222-160534-arnaudb.json
[16:05:41] <jinxer-wm>	 (SystemdUnitFailed) firing: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:06:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P57739 and previous config saved to /var/cache/conftool/dbconfig/20240222-160646-root.json
[16:07:54] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57740 and previous config saved to /var/cache/conftool/dbconfig/20240222-160753-arnaudb.json
[16:08:15] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, Infrastructure-Foundations, netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568400 (cmooney) All hosts moved successfully and back responding to pings.
[16:09:01] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days - https://phabricator.wikimedia.org/T358198#9568404 (matmarex)
[16:09:04] <wikibugs>	 SRE, Wikimedia-Mailing-lists, Wikimedia-Incident: Not receiving posts or moderation messages - https://phabricator.wikimedia.org/T358020#9568405 (matmarex)
[16:09:35] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days - https://phabricator.wikimedia.org/T358198#9566926 (matmarex) (presumably reopened by accident because Phabricator doesn't detect edit conflicts)
[16:10:24] <Emperor>	 !log repool thanos-fe2002 T355868
[16:10:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:30] <stashbot>	 T355868: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868
[16:10:56] <dancy>	 jouncebot nowandnext
[16:10:57] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 49 minute(s)
[16:10:57] <jouncebot>	 In 0 hour(s) and 49 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1700)
[16:11:02] <Emperor>	 !log repool codfs-mw T355868
[16:11:07] <logmsgbot>	 !log mvernon@cumin2002 conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
[16:11:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:45] <Emperor>	 sigh, at least my typos were consistent 🤦
[16:12:58] <wikibugs>	 (PS2) Filippo Giunchedi: sre: fix WidespreadPuppetFailure logic for no resources [alerts] - https://gerrit.wikimedia.org/r/1005752 (https://phabricator.wikimedia.org/T357893)
[16:13:00] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, Infrastructure-Foundations, netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568428 (MatthewVernon) Swift is back OK, thanks.
[16:13:12] <wikibugs>	 (CR) Filippo Giunchedi: "Good point, I've moved both to 3% instead" [alerts] - https://gerrit.wikimedia.org/r/1005752 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[16:16:09] <logmsgbot>	 !log volans@cumin1002 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
[16:18:43] <wikibugs>	 (PS3) Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[16:19:35] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
[16:21:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P57741 and previous config saved to /var/cache/conftool/dbconfig/20240222-162151-root.json
[16:22:08] <wikibugs>	 (PS1) Filippo Giunchedi: jaeger: bump collector and query resources [deployment-charts] - https://gerrit.wikimedia.org/r/1005794 (https://phabricator.wikimedia.org/T358152)
[16:22:41] <wikibugs>	 (CR) CI reject: [V: -1] WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421) (owner: Cathal Mooney)
[16:22:50] <logmsgbot>	 !log volans@cumin1002 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
[16:23:00] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P57742 and previous config saved to /var/cache/conftool/dbconfig/20240222-162300-arnaudb.json
[16:25:28] <logmsgbot>	 !log dancy@deploy2002 Installing scap version "4.66.0" for 458 hosts
[16:25:31] <wikibugs>	 (CR) CDanis: [C: +1] jaeger: bump collector and query resources [deployment-charts] - https://gerrit.wikimedia.org/r/1005794 (https://phabricator.wikimedia.org/T358152) (owner: Filippo Giunchedi)
[16:26:26] <logmsgbot>	 !log dancy@deploy2002 Installation of scap version "4.66.0" completed for 458 hosts
[16:28:07] <logmsgbot>	 !log dancy@deploy2002 Started scap: testing T357402
[16:28:12] <stashbot>	 T357402: Scap should check errors coming from mw-on-k8s canaries during deployments - https://phabricator.wikimedia.org/T357402
[16:28:40] <logmsgbot>	 !log fabfur@cumin2002 START - Cookbook sre.hosts.remove-downtime for cp[2031-2032].codfw.wmnet
[16:28:42] <logmsgbot>	 !log fabfur@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[2031-2032].codfw.wmnet
[16:30:51] <logmsgbot>	 !log fabfur@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=(cdn|ats-be)
[16:30:57] <logmsgbot>	 !log fabfur@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=(cdn|ats-be)
[16:32:56] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568564 (cmooney) p:Triage→Medium
[16:33:38] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568586 (cmooney)
[16:33:47] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, Infrastructure-Foundations, netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868#9568588 (Fabfur) cp2031 and cp2032 are ok and repooled
[16:36:18] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[16:36:25] <logmsgbot>	 !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[16:38:06] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P57743 and previous config saved to /var/cache/conftool/dbconfig/20240222-163806-arnaudb.json
[16:39:09] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568622 (cmooney) All interfaces on asw-a-codfw are set to 'disabled' apart from the uplinks to ssw's, and no mac's learnt on SSW side so proceeding to delete those links...
[16:42:12] <logmsgbot>	 !log akosiaris@cumin1002 conftool action : set/pooled=inactive; selector: service=parsoid-php,name=kubernetes.*
[16:43:04] <logmsgbot>	 !log dancy@deploy2002 sync-world aborted: testing T357402 (duration: 14m 57s)
[16:43:10] <stashbot>	 T357402: Scap should check errors coming from mw-on-k8s canaries during deployments - https://phabricator.wikimedia.org/T357402
[16:43:43] <wikibugs>	 (CR) Hnowlan: [C: +1] conftool: Remove thumbor [puppet] - https://gerrit.wikimedia.org/r/1005728 (owner: Alexandros Kosiaris)
[16:45:29] <logmsgbot>	 !log dancy@deploy2002 Started scap: testing T357402 again
[16:49:59] <wikibugs>	 (PS1) Cathal Mooney: Remove definition/config for codfw ssw's ESI-LAG to asw-a-codfw [homer/public] - https://gerrit.wikimedia.org/r/1005799 (https://phabricator.wikimedia.org/T358244)
[16:52:39] <icinga-wm_>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:53:13] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57744 and previous config saved to /var/cache/conftool/dbconfig/20240222-165312-arnaudb.json
[16:53:15] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[16:53:21] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[16:53:23] <icinga-wm_>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:53:28] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[16:53:37] <icinga-wm_>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:53:42] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[16:53:55] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[16:54:02] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db1226 (T357189)', diff saved to https://phabricator.wikimedia.org/P57745 and previous config saved to /var/cache/conftool/dbconfig/20240222-165401-arnaudb.json
[16:54:27] <logmsgbot>	 !log dancy@deploy2002 Finished scap: testing T357402 again (duration: 08m 58s)
[16:54:35] <stashbot>	 T357402: Scap should check errors coming from mw-on-k8s canaries during deployments - https://phabricator.wikimedia.org/T357402
[16:56:04] <topranks>	 !log disabling link from asw-a-codfw vc to ssw1-a1-codfw and ssw1-a8-codfw T355544
[16:56:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:56:09] <stashbot>	 T355544: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544
[16:56:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T357189)', diff saved to https://phabricator.wikimedia.org/P57746 and previous config saved to /var/cache/conftool/dbconfig/20240222-165619-arnaudb.json
[16:56:42] <wikibugs>	 SRE, MW-on-K8s, Scap, serviceops, Release-Engineering-Team (Now this 🫠): Scap should check errors coming from mw-on-k8s canaries during deployments - https://phabricator.wikimedia.org/T357402#9568746 (dancy) Open→Resolved
[16:56:50] <wikibugs>	 SRE, MW-on-K8s, Traffic, serviceops, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#9568747 (dancy)
[16:57:12] <wikibugs>	 (PS3) Bking: rdf-streaming-updater: raise storage alert threshold [alerts] - https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685)
[16:57:48] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Remove legacy codfw vc switches from synced hiera data after netbox status change - cmooney@cumin1002 - T355544"
[16:58:39] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Remove legacy codfw vc switches from synced hiera data after netbox status change - cmooney@cumin1002 - T355544"
[16:58:47] <wikibugs>	 (CR) Cathal Mooney: [C: +2] Remove definition/config for codfw ssw's ESI-LAG to asw-a-codfw [homer/public] - https://gerrit.wikimedia.org/r/1005799 (https://phabricator.wikimedia.org/T358244) (owner: Cathal Mooney)
[16:58:57] <wikibugs>	 (CR) DLynch: [C: +1] DiscussionTools: Remove no-op config [mediawiki-config] - https://gerrit.wikimedia.org/r/1004749 (owner: Esanders)
[17:00:04] <jouncebot>	 jhathaway and rzl: gettimeofday() says it's time for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1700)
[17:00:04] <jouncebot>	 dancy: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[17:00:46] <dancy>	 o/
[17:00:56] <wikibugs>	 (Merged) jenkins-bot: Remove definition/config for codfw ssw's ESI-LAG to asw-a-codfw [homer/public] - https://gerrit.wikimedia.org/r/1005799 (https://phabricator.wikimedia.org/T358244) (owner: Cathal Mooney)
[17:00:58] <rzl>	 dancy: hi! looking
[17:01:07] <jhathaway>	 looking as well
[17:01:38] <rzl>	 jhathaway: all yours if you'd like it :)
[17:01:48] <jhathaway>	 sure
[17:02:00] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops, Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568765 (cmooney) Ok I've removed the configuration for the ESI-LAG between the codfw spine switches and asw-a-codfw both sides now.  DC-Ops you can...
[17:02:18] <wikibugs>	 (CR) JHathaway: [C: +2] logstash_checker.py: Exit 10 if over error threshold [puppet] - https://gerrit.wikimedia.org/r/1005610 (https://phabricator.wikimedia.org/T144033) (owner: Ahmon Dancy)
[17:02:39] <jhathaway>	 dancy: any special steps you want, other than merging in?
[17:02:51] <wikibugs>	 SRE, ops-codfw, Infrastructure-Foundations, netops, Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9568799 (cmooney)
[17:03:10] <dancy>	 Just merging is fine
[17:03:13] <dancy>	 Thanks!
[17:03:20] <jhathaway>	 great, done
[17:05:33] <topranks>	 !log disabling IPv6 RAs for private1-a-codfw vlan on codfw core routers T355544
[17:05:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:05:50] <stashbot>	 T355544: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544
[17:11:26] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P57747 and previous config saved to /var/cache/conftool/dbconfig/20240222-171125-arnaudb.json
[17:11:32] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] "Not very Pythonic to have well behaved exit codes instead of just spewing confusing exception traces all over the place." [puppet] - https://gerrit.wikimedia.org/r/1005610 (https://phabricator.wikimedia.org/T144033) (owner: Ahmon Dancy)
[17:17:24] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence-Backup: Create a read-only swift identity for backup taking - https://phabricator.wikimedia.org/T269108#9568895 (MatthewVernon) a:MatthewVernon
[17:25:00] <icinga-wm_>	 RECOVERY - BGP status on cr2-eqord is OK: BGP OK - up: 198, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:25:56] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[17:26:22] <wikibugs>	 (CR) Hnowlan: [C: +2] kubernetes: move all remaining eligible eqiad jobrunners to k8s [puppet] - https://gerrit.wikimedia.org/r/1005776 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[17:26:32] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P57748 and previous config saved to /var/cache/conftool/dbconfig/20240222-172632-arnaudb.json
[17:30:43] <wikibugs>	 (CR) Hnowlan: [C: +2] kubernetes: move all remaining eligible eqiad jobrunners to k8s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1005776 (https://phabricator.wikimedia.org/T354791) (owner: Hnowlan)
[17:31:58] <icinga-wm_>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:34:37] <wikibugs>	 SRE, SRE-swift-storage, Data-Persistence-Backup: Create a read-only swift identity for backup taking - https://phabricator.wikimedia.org/T269108#9569041 (jcrespo) In progress→Resolved It took some time to confirm it live, because the number of new deleted files don't grow as fast as the "late...
[17:35:39] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye
[17:36:38] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[17:38:05] <wikibugs>	 SRE-OnFire, SRE-Sprint-Week-Sustainability-March2023, Cassandra, Data-Persistence, Sustainability (Incident Followup): Document best-practice for hinted-handoff - https://phabricator.wikimedia.org/T315517#9569072 (Eevans) p:Triage→Low
[17:39:13] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[17:39:28] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1458.eqiad.wmnet with OS bullseye
[17:39:30] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1467.eqiad.wmnet with OS bullseye
[17:40:42] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
[17:41:07] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1468.eqiad.wmnet with OS bullseye
[17:41:22] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569088 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1468.eqiad.wmnet with OS bullseye
[17:41:23] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1483.eqiad.wmnet with OS bullseye
[17:41:30] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1484.eqiad.wmnet with OS bullseye
[17:41:32] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1485.eqiad.wmnet with OS bullseye
[17:41:34] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569090 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1483.eqiad.wmnet with OS bullseye
[17:41:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T357189)', diff saved to https://phabricator.wikimedia.org/P57749 and previous config saved to /var/cache/conftool/dbconfig/20240222-174138-arnaudb.json
[17:41:40] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[17:41:42] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1484.eqiad.wmnet with OS bullseye
[17:41:44] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw1494.eqiad.wmnet with OS bullseye
[17:41:54] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
[17:41:55] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[17:41:56] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569092 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1485.eqiad.wmnet with OS bullseye
[17:42:02] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569096 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host mw1494.eqiad.wmnet with OS bullseye
[17:42:06] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
[17:42:19] <wikibugs>	 (CR) CDanis: [C: +2] jaeger: bump collector and query resources [deployment-charts] - https://gerrit.wikimedia.org/r/1005794 (https://phabricator.wikimedia.org/T358152) (owner: Filippo Giunchedi)
[17:42:22] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
[17:42:38] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
[17:42:51] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
[17:43:06] <wikibugs>	 (Merged) jenkins-bot: jaeger: bump collector and query resources [deployment-charts] - https://gerrit.wikimedia.org/r/1005794 (https://phabricator.wikimedia.org/T358152) (owner: Filippo Giunchedi)
[17:43:07] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
[17:43:19] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
[17:43:22] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
[17:43:29] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2152 (T357189)', diff saved to https://phabricator.wikimedia.org/P57750 and previous config saved to /var/cache/conftool/dbconfig/20240222-174328-arnaudb.json
[17:43:47] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[17:43:49] <wikibugs>	 SRE, Traffic: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569109 (cmooney) p:Triage→Low
[17:43:50] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[17:43:58] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[17:44:02] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[17:44:18] <wikibugs>	 SRE, Cassandra, Data-Persistence: Migrate Cassandra to Java 11 - https://phabricator.wikimedia.org/T350567#9569125 (Eevans) p:Triage→Medium
[17:44:26] <wikibugs>	 (CR) BCornwall: "Looks good to me, see inline." [puppet] - https://gerrit.wikimedia.org/r/1005548 (https://phabricator.wikimedia.org/T358105) (owner: Fabfur)
[17:44:41] <wikibugs>	 SRE, Traffic: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569128 (cmooney)
[17:44:50] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T357189)', diff saved to https://phabricator.wikimedia.org/P57751 and previous config saved to /var/cache/conftool/dbconfig/20240222-174449-arnaudb.json
[17:44:50] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[17:45:10] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[17:45:16] <jinxer-wm>	 (AppserversUnreachable) firing: Appserver unavailable for cluster jobrunner at eqiad - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=eqiad&var-cluster=jobrunner - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable
[17:46:05] <hnowlan>	 ^ me, not an actual problem. acked 
[17:48:38] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:51:55] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
[17:52:08] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1458.eqiad.wmnet with reason: host reimage
[17:52:33] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1467.eqiad.wmnet with reason: host reimage
[17:54:02] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1468.eqiad.wmnet with reason: host reimage
[17:54:19] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
[17:54:35] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
[17:54:47] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1458.eqiad.wmnet with reason: host reimage
[17:54:48] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
[17:54:59] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1494.eqiad.wmnet with reason: host reimage
[17:57:23] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
[17:59:09] <wikibugs>	 (PS1) BryanDavis: developer-portal: Bump container to 2024-02-22-164056-production [deployment-charts] - https://gerrit.wikimedia.org/r/1005809 (https://phabricator.wikimedia.org/T280500)
[17:59:29] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1468.eqiad.wmnet with reason: host reimage
[17:59:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P57752 and previous config saved to /var/cache/conftool/dbconfig/20240222-175956-arnaudb.json
[18:00:05] <jouncebot>	 bd808: I, the Bot under the Fountain, call upon thee, The Deployer, to do Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1800).
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1800)
[18:00:44] <wikibugs>	 (CR) BryanDavis: [C: +2] developer-portal: Bump container to 2024-02-22-164056-production [deployment-charts] - https://gerrit.wikimedia.org/r/1005809 (https://phabricator.wikimedia.org/T280500) (owner: BryanDavis)
[18:01:39] <wikibugs>	 (Merged) jenkins-bot: developer-portal: Bump container to 2024-02-22-164056-production [deployment-charts] - https://gerrit.wikimedia.org/r/1005809 (https://phabricator.wikimedia.org/T280500) (owner: BryanDavis)
[18:01:49] <wikibugs>	 SRE, Traffic: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569247 (cmooney) So I'm realising the RAs are how the LVS is determining the attached v6 subnet and creating the auto-assigned eui-64 addresses on each vlan interface....
[18:01:58] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1494.eqiad.wmnet with reason: host reimage
[18:02:45] <jinxer-wm>	 (AppserversUnreachable) resolved: Appserver unavailable for cluster jobrunner at eqiad - https://wikitech.wikimedia.org/wiki/Application_servers - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&var-site=eqiad&var-cluster=jobrunner - https://alerts.wikimedia.org/?q=alertname%3DAppserversUnreachable
[18:02:47] <logmsgbot>	 !log bd808@deploy2002 helmfile [staging] START helmfile.d/services/developer-portal: apply
[18:03:07] <logmsgbot>	 !log bd808@deploy2002 helmfile [staging] DONE helmfile.d/services/developer-portal: apply
[18:03:11] <logmsgbot>	 !log hnowlan@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host mw2384.codfw.wmnet with OS bullseye
[18:03:21] <wikibugs>	 SRE, Continuous-Integration-Infrastructure, vm-requests: Ganeti VM for contint migration - https://phabricator.wikimedia.org/T358237#9569250 (MoritzMuehlenhoff) What do you want to use as the host name, something like zuul1001?
[18:03:25] <jinxer-wm>	 (SystemdUnitFailed) firing: ferm.service on mw1457:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:03:38] <logmsgbot>	 !log bd808@deploy2002 helmfile [eqiad] START helmfile.d/services/developer-portal: apply
[18:03:43] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
[18:04:04] <logmsgbot>	 !log bd808@deploy2002 helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
[18:04:11] <logmsgbot>	 !log bd808@deploy2002 helmfile [codfw] START helmfile.d/services/developer-portal: apply
[18:04:34] <logmsgbot>	 !log bd808@deploy2002 helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
[18:04:49] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
[18:07:49] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1467.eqiad.wmnet with reason: host reimage
[18:11:21] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
[18:12:01] <wikibugs>	 SRE, Traffic: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569280 (cmooney) >>! In T358260#9569247, @cmooney wrote: > I notice there is a //"net.ipv6.conf.<interface>.accept_ra_defrtr"// which from what I can tell will not add a...
[18:12:35] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1458.eqiad.wmnet with OS bullseye
[18:13:47] <wikibugs>	 (PS17) Ayounsi: Netbox module: add get/set for primary IPs and access vlan [software/spicerack] - https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152)
[18:14:29] <wikibugs>	 SRE-tools, DC-Ops, Infrastructure-Foundations, Patch-For-Review: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#9569312 (Volans) I've tested that the cookbook works fine with the existing cached firmwares on the cumin...
[18:14:49] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1483.eqiad.wmnet with OS bullseye
[18:14:55] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152) (owner: Ayounsi)
[18:15:02] <wikibugs>	 SRE, MW-on-K8s, serviceops, Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569314 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1483.eqiad.wmnet with OS bullseye completed: - mw1483 (**PASS**)...
[18:15:03] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P57753 and previous config saved to /var/cache/conftool/dbconfig/20240222-181502-arnaudb.json
[18:17:01] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1468.eqiad.wmnet with OS bullseye
[18:17:12] <wikibugs>	 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569317 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1468.eqiad.wmnet with OS bullseye completed: - mw1468 (**PASS**)...
[18:18:57] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: host reimage
[18:19:50] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Do we need to generate aggregates for LVS service IP ranges? - https://phabricator.wikimedia.org/T350354#9569320 (cmooney) Open→Resolved a:cmooney >>! In T350354#9312533, @BBlack wrote: > I don't suspect it serves any real purpose at present, unles...
[18:21:03] <icinga-wm_>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1457 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[18:21:47] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2384.codfw.wmnet with reason: host reimage
[18:22:17] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1484.eqiad.wmnet with OS bullseye
[18:22:26] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
[18:22:30] <wikibugs>	 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569350 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1484.eqiad.wmnet with OS bullseye completed: - mw1484 (**PASS**)...
[18:22:31] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
[18:24:43] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1494.eqiad.wmnet with OS bullseye
[18:24:56] <wikibugs>	 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569356 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1494.eqiad.wmnet with OS bullseye completed: - mw1494 (**WARN**)...
[18:25:19] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1467.eqiad.wmnet with OS bullseye
[18:28:44] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1485.eqiad.wmnet with OS bullseye
[18:28:58] <wikibugs>	 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791#9569364 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host mw1485.eqiad.wmnet with OS bullseye completed: - mw1485 (**PASS**)...
[18:30:03] <wikibugs>	 (03PS5) 10Cathal Mooney: Change name of dhcp_relay var and use it to control CR IPv6 RAs also [homer/public] - 10https://gerrit.wikimedia.org/r/1005772 (https://phabricator.wikimedia.org/T358220)
[18:30:09] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T357189)', diff saved to https://phabricator.wikimedia.org/P57755 and previous config saved to /var/cache/conftool/dbconfig/20240222-183009-arnaudb.json
[18:30:11] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
[18:30:20] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[18:30:24] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
[18:30:31] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2154 (T357189)', diff saved to https://phabricator.wikimedia.org/P57756 and previous config saved to /var/cache/conftool/dbconfig/20240222-183030-arnaudb.json
[18:31:48] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.reimage for host mw2385.codfw.wmnet with OS bullseye
[18:32:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T357189)', diff saved to https://phabricator.wikimedia.org/P57757 and previous config saved to /var/cache/conftool/dbconfig/20240222-183251-arnaudb.json
[18:40:39] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[18:40:56] <wikibugs>	 (03PS1) 10Ssingh: Revert "conftool: introduce schema and host file for dnsboxes" [puppet] - 10https://gerrit.wikimedia.org/r/1005693
[18:42:39] <wikibugs>	 (03CR) 10Ssingh: "After some more discussion, bblack and I have decided to revert this custom schema. This schema was necessitated mostly by our requirement" [puppet] - 10https://gerrit.wikimedia.org/r/1005693 (owner: 10Ssingh)
[18:44:57] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2384.codfw.wmnet with OS bullseye
[18:45:18] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Netbox module: add get/set for primary IPs and access vlan [software/spicerack] - 10https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[18:46:53] <logmsgbot>	 !log hnowlan@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2385.codfw.wmnet with reason: host reimage
[18:47:58] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P57758 and previous config saved to /var/cache/conftool/dbconfig/20240222-184757-arnaudb.json
[18:49:43] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2385.codfw.wmnet with reason: host reimage
[18:50:39] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (4) Elasticsearch instance elastic2057-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[18:52:17] <wikibugs>	 10SRE, 10Traffic: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569458 (10cmooney)
[18:53:04] <wikibugs>	 (03PS1) 10Ssingh: Revert "tests: add schema for dnsbox" [software/conftool] - 10https://gerrit.wikimedia.org/r/1005694
[18:53:26] <wikibugs>	 (03Merged) 10jenkins-bot: Netbox module: add get/set for primary IPs and access vlan [software/spicerack] - 10https://gerrit.wikimedia.org/r/979040 (https://phabricator.wikimedia.org/T350152) (owner: 10Ayounsi)
[18:56:08] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Revert "tests: add schema for dnsbox" [software/conftool] - 10https://gerrit.wikimedia.org/r/1005694 (owner: 10Ssingh)
[18:56:59] <wikibugs>	 (03CR) 10Ssingh: "13:55:06 ERROR: InvocationError for command /src/.tox/py38-style/bin/black --config black.toml --check --diff . (exited with code 1)" [software/conftool] - 10https://gerrit.wikimedia.org/r/1005694 (owner: 10Ssingh)
[18:59:23] <wikibugs>	 (03PS1) 10CDanis: WIP [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005819
[18:59:28] <wikibugs>	 (03PS1) 10Jdlrobson: Change font-size "Small" label to "Standard" [extensions/MobileFrontend] (wmf/1.42.0-wmf.19) - 10https://gerrit.wikimedia.org/r/1005695 (https://phabricator.wikimedia.org/T358074)
[19:00:04] <jouncebot>	 jeena and brennen: I, the Bot under the Fountain, call upon thee, The Deployer, to do MediaWiki train - Utc-7 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1900).
[19:00:29] <jeena>	 o/
[19:01:11] <brennen>	 o/
[19:03:04] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P57759 and previous config saved to /var/cache/conftool/dbconfig/20240222-190304-arnaudb.json
[19:04:03] <wikibugs>	 10SRE, 10Traffic: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260#9569509 (10cmooney) FWIW this was the test I ran on one of our bookworm hosts.  Starting with primary interface down, and vlan interface which is built on it also down, plu...
[19:05:40] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) firing: (3) Elasticsearch instance elastic2063-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[19:05:55] <inflatador>	 ^^ looking into these Elastic alerts
[19:06:02] <wikibugs>	 (03PS1) 10TrainBranchBot: group2 wikis to 1.42.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1005820 (https://phabricator.wikimedia.org/T354437)
[19:06:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group2 wikis to 1.42.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1005820 (https://phabricator.wikimedia.org/T354437) (owner: 10TrainBranchBot)
[19:06:40] <wikibugs>	 (03PS2) 10CDanis: jaeger: also give cpu res/limit to oauth2-proxy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005819 (https://phabricator.wikimedia.org/T358152)
[19:06:58] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] jaeger: also give cpu res/limit to oauth2-proxy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005819 (https://phabricator.wikimedia.org/T358152) (owner: 10CDanis)
[19:07:03] <wikibugs>	 (03Merged) 10jenkins-bot: group2 wikis to 1.42.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1005820 (https://phabricator.wikimedia.org/T354437) (owner: 10TrainBranchBot)
[19:07:54] <wikibugs>	 (03Merged) 10jenkins-bot: jaeger: also give cpu res/limit to oauth2-proxy [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005819 (https://phabricator.wikimedia.org/T358152) (owner: 10CDanis)
[19:14:57] <logmsgbot>	 !log hnowlan@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2385.codfw.wmnet with OS bullseye
[19:15:16] <wikibugs>	 (03PS1) 10CDanis: jaeger: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005822
[19:15:39] <jinxer-wm>	 (CirrusSearchNodeIndexingNotIncreasing) resolved: (2) Elasticsearch instance elastic2063-production-search-codfw is not indexing - https://wikitech.wikimedia.org/wiki/Search#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[19:16:07] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] jaeger: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005822 (owner: 10CDanis)
[19:17:45] <wikibugs>	 (03PS2) 10CDanis: jaeger: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005822
[19:17:58] <wikibugs>	 (03PS1) 10Dbrant: Add verbiage for Account Vanishing contact page. [extensions/WikimediaMessages] (wmf/1.42.0-wmf.19) - 10https://gerrit.wikimedia.org/r/1005696 (https://phabricator.wikimedia.org/T343536)
[19:18:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T357189)', diff saved to https://phabricator.wikimedia.org/P57760 and previous config saved to /var/cache/conftool/dbconfig/20240222-191810-arnaudb.json
[19:18:13] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
[19:18:18] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[19:18:27] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
[19:18:32] <logmsgbot>	 !log jhuneidi@deploy2002 rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.19  refs T354437
[19:18:34] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57761 and previous config saved to /var/cache/conftool/dbconfig/20240222-191834-arnaudb.json
[19:18:37] <stashbot>	 T354437: 1.42.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T354437
[19:19:07] <wikibugs>	 SRE, ops-codfw, serviceops: Issues reimaging servers in codfw - https://phabricator.wikimedia.org/T358001#9569554 (hnowlan) Open→Resolved a:hnowlan >>! In T358001#9563665, @Jhancock.wm wrote: > @hnowlan I've replaced the network cable on both of these. These are both connected to a 1G swi...
[19:19:25] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] jaeger: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005822 (owner: 10CDanis)
[19:20:13] <wikibugs>	 (03Merged) 10jenkins-bot: jaeger: bump chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1005822 (owner: 10CDanis)
[19:20:38] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] rdf-streaming-updater: raise storage alert threshold [alerts] - 10https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685) (owner: 10Bking)
[19:20:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57762 and previous config saved to /var/cache/conftool/dbconfig/20240222-192055-arnaudb.json
[19:21:41] <wikibugs>	 (03PS1) 10Dbrant: testwiki: Allow modifying email in account vanishing contact form. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1005824 (https://phabricator.wikimedia.org/T343536)
[19:21:44] <wikibugs>	 (03Merged) 10jenkins-bot: rdf-streaming-updater: raise storage alert threshold [alerts] - 10https://gerrit.wikimedia.org/r/1005791 (https://phabricator.wikimedia.org/T348685) (owner: 10Bking)
[19:22:56] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[19:23:39] <logmsgbot>	 !log cdanis@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
[19:27:29] <logmsgbot>	 !log robh@cumin2002 START - Cookbook sre.dns.netbox
[19:29:26] <logmsgbot>	 !log robh@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup incorrect asset tags - robh@cumin2002"
[19:30:18] <logmsgbot>	 !log robh@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup incorrect asset tags - robh@cumin2002"
[19:30:18] <logmsgbot>	 !log robh@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:33:46] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests, 10Phabricator, 10Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9569601 (10bvibber) >>! In T358044#9562210, @Bugreporter wrote: >>Too late now, but Phabricator accounts are easy enough to rename and...
[19:35:36] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests, 10Phabricator, 10Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9569603 (10bvibber) >>! In T358044#9562598, @MoritzMuehlenhoff wrote: > @bvibber Renaming the user name for SSH access will leave file...
[19:36:00] <wikibugs>	 (03PS4) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[19:36:02] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P57763 and previous config saved to /var/cache/conftool/dbconfig/20240222-193601-arnaudb.json
[19:40:16] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
[19:40:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421) (owner: 10Cathal Mooney)
[19:45:11] <icinga-wm_>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[19:46:14] <James_F>	 jouncebot: nowandnext
[19:46:14] <jouncebot>	 For the next 1 hour(s) and 13 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T1900)
[19:46:14] <jouncebot>	 In 1 hour(s) and 13 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T2100)
[19:46:53] <wikibugs>	 (03PS4) 10Jforrester: wikifunctions: Upgrade orchestrator from 2024-01-09-190638 to 2024-01-18-182456 [deployment-charts] - 10https://gerrit.wikimedia.org/r/992756 (https://phabricator.wikimedia.org/T278596)
[19:47:56] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests, 10Phabricator, 10Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9569624 (10bvibber) Gerrit lets me connect but won't let me push updates to a patchset:  ` % git review remote:  remote: Processing ch...
[19:48:02] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] wikifunctions: Upgrade orchestrator from 2024-01-09-190638 to 2024-01-18-182456 [deployment-charts] - 10https://gerrit.wikimedia.org/r/992756 (https://phabricator.wikimedia.org/T278596) (owner: 10Jforrester)
[19:49:00] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Upgrade orchestrator from 2024-01-09-190638 to 2024-01-18-182456 [deployment-charts] - 10https://gerrit.wikimedia.org/r/992756 (https://phabricator.wikimedia.org/T278596) (owner: 10Jforrester)
[19:49:34] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[19:50:04] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[19:50:10] <wikibugs>	 (CR) Fabfur: [V: +1] haproxy: configure extended logging (preparatory for Benthos) (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1005548 (https://phabricator.wikimedia.org/T358105) (owner: Fabfur)
[19:50:35] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[19:51:08] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P57764 and previous config saved to /var/cache/conftool/dbconfig/20240222-195108-arnaudb.json
[19:52:03] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[19:52:18] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[19:52:27] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
[19:53:03] <icinga-wm_>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[19:53:06] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests, 10Phabricator, 10Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9569661 (10taavi) Only members of https://gerrit.wikimedia.org/r/admin/groups/2021f25e7515187a81d51f8fe14dd6f25617cce0 can amend chang...
[19:53:26] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[19:53:37] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests, 10Phabricator, 10Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9569668 (10bvibber) Thx!
[19:53:43] <wikibugs>	 (03PS1) 10CDobbins: admin: update data.yaml for cdobbins [puppet] - 10https://gerrit.wikimedia.org/r/1005828
[19:53:59] <wikibugs>	 (03PS2) 10Jforrester: wikifunctions: Upgrade orchestrator from 2024-01-18-182456 to 2024-02-12-155846 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1002624 (https://phabricator.wikimedia.org/T296937)
[19:54:08] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] wikifunctions: Upgrade orchestrator from 2024-01-18-182456 to 2024-02-12-155846 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1002624 (https://phabricator.wikimedia.org/T296937) (owner: 10Jforrester)
[19:55:07] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Upgrade orchestrator from 2024-01-18-182456 to 2024-02-12-155846 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1002624 (https://phabricator.wikimedia.org/T296937) (owner: 10Jforrester)
[19:55:17] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
[19:55:20] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] admin: update data.yaml for cdobbins [puppet] - 10https://gerrit.wikimedia.org/r/1005828 (owner: 10CDobbins)
[19:56:10] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[19:56:49] <logmsgbot>	 !log jforrester@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[19:57:27] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[19:58:45] <logmsgbot>	 !log jforrester@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[19:58:53] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[20:00:37] <logmsgbot>	 !log jforrester@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[20:00:46] <wikibugs>	 SRE, Data-Platform-SRE: Update maxmind download to pull databases from new url - https://phabricator.wikimedia.org/T358268#9569715 (Gehel) p:Triage→High
[20:02:42] <wikibugs>	 (03Abandoned) 10CDobbins: admin: update data.yaml for cdobbins [puppet] - 10https://gerrit.wikimedia.org/r/1005828 (owner: 10CDobbins)
[20:03:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (2) ferm.service on mw1457:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:03:42] <wikibugs>	 (03PS2) 10Jforrester: wikifunctions: Upgrade evaluators from 2024-01-18-182630 to 2024-02-12-160222 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1002625 (https://phabricator.wikimedia.org/T287978)
[20:05:21] <wikibugs>	 (03PS1) 10CDobbins: admin: update data.yaml for cdobbins [puppet] - 10https://gerrit.wikimedia.org/r/1005830
[20:05:41] <jinxer-wm>	 (SystemdUnitFailed) firing: ncmonitor.service on ncmonitor1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:06:14] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57765 and previous config saved to /var/cache/conftool/dbconfig/20240222-200614-arnaudb.json
[20:06:16] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
[20:06:17] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye
[20:06:22] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[20:06:30] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
[20:06:38] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2162 (T357189)', diff saved to https://phabricator.wikimedia.org/P57766 and previous config saved to /var/cache/conftool/dbconfig/20240222-200636-arnaudb.json
[20:07:44] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1005830 (owner: 10CDobbins)
[20:08:22] <wikibugs>	 (03CR) 10CDobbins: [C: 03+2] admin: update data.yaml for cdobbins [puppet] - 10https://gerrit.wikimedia.org/r/1005122 (owner: 10CDobbins)
[20:08:59] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2162 (T357189)', diff saved to https://phabricator.wikimedia.org/P57767 and previous config saved to /var/cache/conftool/dbconfig/20240222-200858-arnaudb.json
[20:12:03] <wikibugs>	 (03CR) 10CDobbins: [C: 03+2] admin: update data.yaml for cdobbins [puppet] - 10https://gerrit.wikimedia.org/r/1005830 (owner: 10CDobbins)
[20:17:48] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968#9569913 (10darthmon_wmde)
[20:19:49] <wikibugs>	 SRE, ops-eqiad, DC-Ops, SRE Observability: Q#:rack/setup/install logging-hd100[123] - https://phabricator.wikimedia.org/T355700#9569914 (Jclark-ctr) a:Jclark-ctr
[20:19:58] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968#9569915 (darthmon_wmde) Invalid→Open I am very sorry - this ticket got out of my sight and I completely forgot about it. Could we pick it up anew, please?  I just added...
[20:21:11] <icinga-wm_>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1494 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:23:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (3) ferm.service on mw1457:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:24:05] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P57768 and previous config saved to /var/cache/conftool/dbconfig/20240222-202404-arnaudb.json
[20:35:43] <wikibugs>	 (03PS5) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[20:36:31] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests, 10Phabricator, 10Patch-For-Review: Migrate dev user accounts for bvibber - https://phabricator.wikimedia.org/T358044#9569964 (10Peachey88) >>! In T358044#9569601, @bvibber wrote: > That's probably the way to go then, if it'll keep assignments intact w...
[20:39:08] <wikibugs>	 (03PS6) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[20:39:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P57769 and previous config saved to /var/cache/conftool/dbconfig/20240222-203911-arnaudb.json
[20:41:06] <wikibugs>	 (03PS7) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[20:41:17] <icinga-wm_>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2384 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:44:31] <wikibugs>	 10SRE-swift-storage, 10MediaWiki-Uploading, 10MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), 10User-revi: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820#9569981 (10Bawolff) >>! In T200820#956...
[20:44:41] <wikibugs>	 (03PS8) 10Cathal Mooney: WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[20:45:39] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
[20:49:06] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP: adjust reimage cookbook to clear switch caches for vms too [cookbooks] - 10https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421) (owner: 10Cathal Mooney)
[20:53:25] <jinxer-wm>	 (SystemdUnitFailed) firing: (4) ferm.service on mw1457:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:54:18] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2162 (T357189)', diff saved to https://phabricator.wikimedia.org/P57770 and previous config saved to /var/cache/conftool/dbconfig/20240222-205417-arnaudb.json
[20:54:20] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
[20:54:24] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[20:54:34] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
[20:54:38] <wikibugs>	 10SRE, 10MW-on-K8s, 10Scap, 10serviceops, 10Release-Engineering-Team (Now this 🫠): Find a way to address canary releases directly - https://phabricator.wikimedia.org/T358117#9570020 (10thcipriani) >>! In T358117#9566949, @Clement_Goubert wrote: > We've talked this over, and while doing swagger checks mad...
[20:54:41] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2163 (T357189)', diff saved to https://phabricator.wikimedia.org/P57771 and previous config saved to /var/cache/conftool/dbconfig/20240222-205440-arnaudb.json
[20:55:01] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10SRE Observability: Q#:rack/setup/install logging-hd100[123] - https://phabricator.wikimedia.org/T355700#9570023 (10VRiley-WMF)
[20:57:02] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T357189)', diff saved to https://phabricator.wikimedia.org/P57772 and previous config saved to /var/cache/conftool/dbconfig/20240222-205701-arnaudb.json
[20:57:27] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: OwO what's this, a deployment window?? UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240222T2100). nyaa~
[21:00:05] <jouncebot>	 jan_drewniak, dbrant, and bawolff: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:14] <bawolff>	 \o/
[21:00:33] <jan_drewniak>	 o/
[21:00:42] <dbrant>	 o/
[21:01:10] <cjming>	 i can deploy
[21:01:24] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
[21:02:10] <cjming>	 hi jan_drewniak :) i'll start with yours
[21:02:29] <jan_drewniak>	 hi cjming! thanks
[21:02:38] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by cjming@deploy2002 using scap backport" [extensions/MobileFrontend] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005695 (https://phabricator.wikimedia.org/T358074) (owner: Jdlrobson)
[21:06:28] <wikibugs>	 SRE, Data-Platform-SRE: Update maxmind download to pull databases from new url - https://phabricator.wikimedia.org/T358268#9570086 (Dwisehaupt) We are tracking this from the fr-tech side in T358043. No impact on your work, just adding for full knowledge.
[21:06:36] <cjming>	 hi dbrant :) i'll get your backport going too bec CI
[21:06:59] <dbrant>	 thx cjming
[21:07:02] <wikibugs>	 (CR) Clare Ming: [C: +2] Add verbiage for Account Vanishing contact page. [extensions/WikimediaMessages] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005696 (https://phabricator.wikimedia.org/T343536) (owner: Dbrant)
[21:07:49] <cjming>	 bawolff: can you prep your backport patch and add it to the cal? 
[21:08:24] <bawolff>	 oh right, sorry its been a super long time since I've done this. Just a moment
[21:08:46] <cjming>	 np!
[21:09:35] <wikibugs>	 (PS1) Brian Wolff: Improve chunked upload jobs and abort assemble job if already in progress [core] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005698 (https://phabricator.wikimedia.org/T200820)
[21:10:18] <icinga-wm_>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2385 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:10:27] <bawolff>	 cjming: calendar updated
[21:10:36] <cjming>	 ty
[21:11:00] * bawolff used to just doing config patches.
[21:12:09] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P57773 and previous config saved to /var/cache/conftool/dbconfig/20240222-211208-arnaudb.json
[21:12:18] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
[21:17:48] <wikibugs>	 (PS9) Cathal Mooney: Adjust reimage cookbook to clear switch caches for vms too [cookbooks] - https://gerrit.wikimedia.org/r/1005573 (https://phabricator.wikimedia.org/T306421)
[21:21:25] <wikibugs>	 (Merged) jenkins-bot: Change font-size "Small" label to "Standard" [extensions/MobileFrontend] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005695 (https://phabricator.wikimedia.org/T358074) (owner: Jdlrobson)
[21:21:38] <logmsgbot>	 !log cjming@deploy2002 Started scap: Backport for [[gerrit:1005695|Change font-size "Small" label to "Standard" (T358074)]]
[21:21:47] <stashbot>	 T358074: Mobile labels are incorrect for font size - https://phabricator.wikimedia.org/T358074
[21:22:40] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Persistence: Q3:rack/setup/install es10[35-40] - https://phabricator.wikimedia.org/T355269#9570125 (VRiley-WMF)
[21:27:06] <wikibugs>	 (Merged) jenkins-bot: Add verbiage for Account Vanishing contact page. [extensions/WikimediaMessages] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005696 (https://phabricator.wikimedia.org/T343536) (owner: Dbrant)
[21:27:15] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P57774 and previous config saved to /var/cache/conftool/dbconfig/20240222-212715-arnaudb.json
[21:27:50] <cjming>	 ahoy - if any SREs are around -- maybe i'm being impatient but i don't recall in recent memory getting changes out to test servers taking so long -- seems stuck on "K8s images build/push output redirected to /home/cjming/scap-image-build-and-push-log" in my terminal
[21:28:06] <wikibugs>	 (CR) Andrea Denisse: [C: +1] "LGTM, thank you!" [alerts] - https://gerrit.wikimedia.org/r/1005752 (https://phabricator.wikimedia.org/T357893) (owner: Filippo Giunchedi)
[21:28:40] <cjming>	 doh - nvm - just started going again
[21:29:16] <cjming>	 but it does seem pokey fwiw
[21:29:21] <taavi>	 cjming: backporting i18n changes is always super slow
[21:29:39] <cjming>	 ah - gtk - thx
[21:29:47] <dbrant>	 (apologies)
[21:29:55] <cjming>	 lol - nw
[21:35:51] <wikibugs>	 (PS1) Jforrester: wikifunctions: Upgrade orchestrator from 2024-02-12-155846 to 2024-02-22-165335 [deployment-charts] - https://gerrit.wikimedia.org/r/1005843 (https://phabricator.wikimedia.org/T335695)
[21:35:54] <logmsgbot>	 !log cjming@deploy2002 cjming and jdlrobson: Backport for [[gerrit:1005695|Change font-size "Small" label to "Standard" (T358074)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:35:57] <cjming>	 jan_drewniak: up on mwdebug - lmk when to sync
[21:36:06] <stashbot>	 T358074: Mobile labels are incorrect for font size - https://phabricator.wikimedia.org/T358074
[21:38:06] <cjming>	 jan_drewniak: shall i sync?
[21:39:11] <jan_drewniak>	 hey cjming looks good to sync
[21:39:25] <logmsgbot>	 !log cjming@deploy2002 cjming and jdlrobson: Continuing with sync
[21:42:22] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T357189)', diff saved to https://phabricator.wikimedia.org/P57775 and previous config saved to /var/cache/conftool/dbconfig/20240222-214221-arnaudb.json
[21:42:25] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
[21:42:28] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[21:42:49] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
[21:42:50] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[21:43:04] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[21:43:11] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2164 (T357189)', diff saved to https://phabricator.wikimedia.org/P57776 and previous config saved to /var/cache/conftool/dbconfig/20240222-214310-arnaudb.json
[21:47:33] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T357189)', diff saved to https://phabricator.wikimedia.org/P57777 and previous config saved to /var/cache/conftool/dbconfig/20240222-214732-arnaudb.json
[21:47:38] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[21:48:38] <jinxer-wm>	 (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:50:46] <logmsgbot>	 !log cjming@deploy2002 Finished scap: Backport for [[gerrit:1005695|Change font-size "Small" label to "Standard" (T358074)]] (duration: 29m 07s)
[21:50:51] <stashbot>	 T358074: Mobile labels are incorrect for font size - https://phabricator.wikimedia.org/T358074
[21:51:05] <logmsgbot>	 !log cjming@deploy2002 Started scap: Backport for [[gerrit:1005696|Add verbiage for Account Vanishing contact page. (T343536)]]
[21:51:11] <stashbot>	 T343536: [M] Create v1 of Special:Contact page for account vanish requests - https://phabricator.wikimedia.org/T343536
[21:51:13] <wikibugs>	 SRE, ops-eqiad, DC-Ops, SRE Observability: Q#:rack/setup/install logging-hd100[123] - https://phabricator.wikimedia.org/T355700#9570268 (VRiley-WMF) logging-hd1001 Rack A7 U 32  logging-hd1002 Rack B7 U 21  logging-hd1003 Rack D7 U26
[21:51:28] <wikibugs>	 (CR) Clare Ming: [C: +2] Improve chunked upload jobs and abort assemble job if already in progress [core] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005698 (https://phabricator.wikimedia.org/T200820) (owner: Brian Wolff)
[21:52:06] <cjming>	 jan_drewniak: should be live!
[21:53:09] <jan_drewniak>	 awesome, thanks!
[21:53:27] <cjming>	 yw!
[21:54:15] <cjming>	 dbrant: getting your 1st patch out to test servers - just waiting like before - your next patch should go quick
[21:54:50] <cjming>	 bawolff: went ahead and +2'd your backport in the meantime
[21:56:12] <bawolff>	 👍
[22:02:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P57778 and previous config saved to /var/cache/conftool/dbconfig/20240222-220238-arnaudb.json
[22:05:44] <logmsgbot>	 !log cjming@deploy2002 dbrant and cjming: Backport for [[gerrit:1005696|Add verbiage for Account Vanishing contact page. (T343536)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:05:48] <cjming>	 dbrant: wanna test? lmk if i should sync
[22:06:02] <stashbot>	 T343536: [M] Create v1 of Special:Contact page for account vanish requests - https://phabricator.wikimedia.org/T343536
[22:06:03] <dbrant>	 cjming: all good!
[22:06:11] <cjming>	 cool - syncing
[22:06:14] <logmsgbot>	 !log cjming@deploy2002 dbrant and cjming: Continuing with sync
[22:09:25] <wikibugs>	 (Merged) jenkins-bot: Improve chunked upload jobs and abort assemble job if already in progress [core] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005698 (https://phabricator.wikimedia.org/T200820) (owner: Brian Wolff)
[22:16:25] <dbrant>	 seeing it live.
[22:17:18] <cjming>	 cool ! just waiting for php restarts
[22:17:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P57779 and previous config saved to /var/cache/conftool/dbconfig/20240222-221745-arnaudb.json
[22:18:32] <cjming>	 thanks for your patience dbrant, bawolff - we're going over but since it's thurs, i'll finish the queue - the rest should go quick
[22:18:52] <bawolff>	 ty :)
[22:18:53] <logmsgbot>	 !log cjming@deploy2002 Finished scap: Backport for [[gerrit:1005696|Add verbiage for Account Vanishing contact page. (T343536)]] (duration: 27m 47s)
[22:18:59] <stashbot>	 T343536: [M] Create v1 of Special:Contact page for account vanish requests - https://phabricator.wikimedia.org/T343536
[22:19:13] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by cjming@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1005824 (https://phabricator.wikimedia.org/T343536) (owner: Dbrant)
[22:19:55] <wikibugs>	 (Merged) jenkins-bot: testwiki: Allow modifying email in account vanishing contact form. [mediawiki-config] - https://gerrit.wikimedia.org/r/1005824 (https://phabricator.wikimedia.org/T343536) (owner: Dbrant)
[22:20:21] <logmsgbot>	 !log cjming@deploy2002 Started scap: Backport for [[gerrit:1005824|testwiki: Allow modifying email in account vanishing contact form. (T343536)]]
[22:21:47] <logmsgbot>	 !log cjming@deploy2002 cjming and dbrant: Backport for [[gerrit:1005824|testwiki: Allow modifying email in account vanishing contact form. (T343536)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:21:53] <cjming>	 dbrant: 1st patch should be live everywhere, 2nd patch on test servers if you want to check
[22:22:11] <dbrant>	 cjming: looks good
[22:22:16] <logmsgbot>	 !log cjming@deploy2002 cjming and dbrant: Continuing with sync
[22:30:19] <logmsgbot>	 !log cjming@deploy2002 Finished scap: Backport for [[gerrit:1005824|testwiki: Allow modifying email in account vanishing contact form. (T343536)]] (duration: 09m 58s)
[22:30:30] <stashbot>	 T343536: [M] Create v1 of Special:Contact page for account vanish requests - https://phabricator.wikimedia.org/T343536
[22:30:38] <logmsgbot>	 !log cjming@deploy2002 Started scap: Backport for [[gerrit:1005698|Improve chunked upload jobs and abort assemble job if already in progress (T200820)]]
[22:30:44] <stashbot>	 T200820: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820
[22:30:51] <cjming>	 dbrant: 2nd patch should be live!
[22:31:14] <dbrant>	 confirmed. thanks so much cjming
[22:31:22] <cjming>	 yw!
[22:31:50] <cjming>	 bawolff: i'm assuming your patch isn't really testable - should i just go ahead and sync?
[22:32:02] <logmsgbot>	 !log cjming@deploy2002 bawolff and cjming: Backport for [[gerrit:1005698|Improve chunked upload jobs and abort assemble job if already in progress (T200820)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[22:32:06] <bawolff>	 correct, it mostly applies to the job queue
[22:32:14] <cjming>	 cool - syncing then
[22:32:16] <logmsgbot>	 !log cjming@deploy2002 bawolff and cjming: Continuing with sync
[22:32:18] <bawolff>	 only after you upload a multi-GB file
[22:32:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T357189)', diff saved to https://phabricator.wikimedia.org/P57780 and previous config saved to /var/cache/conftool/dbconfig/20240222-223251-arnaudb.json
[22:32:54] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
[22:32:59] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[22:33:08] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
[22:33:15] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57781 and previous config saved to /var/cache/conftool/dbconfig/20240222-223314-arnaudb.json
[22:35:36] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57782 and previous config saved to /var/cache/conftool/dbconfig/20240222-223536-arnaudb.json
[22:36:17] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Persistence: Q3:rack/setup/install es10[35-40] - https://phabricator.wikimedia.org/T355269#9570449 (VRiley-WMF) es1035 WMF10710 FPZGC14 E 5 U 18 CableID 20220092  es1036 WMF10711 DPZGC14 E 6 U 18 CableID 20220057  es1037 WMF10712 CPZGC14 E 7 U 18 CableID 20220096  es1...
[22:40:24] <logmsgbot>	 !log cjming@deploy2002 Finished scap: Backport for [[gerrit:1005698|Improve chunked upload jobs and abort assemble job if already in progress (T200820)]] (duration: 09m 46s)
[22:40:27] <cjming>	 bawolff: should be live!
[22:40:30] <stashbot>	 T200820: FAILED: stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/a/ac/15xi9btm14os.u9p1dr.1208681.webm.0". - https://phabricator.wikimedia.org/T200820
[22:40:45] <bawolff>	 ty
[22:40:58] <cjming>	 yw
[22:41:00] <cjming>	 !log end of UTC late backport window
[22:41:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:50:43] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P57783 and previous config saved to /var/cache/conftool/dbconfig/20240222-225042-arnaudb.json
[23:01:13] <wikibugs>	 (CR) Eevans: [C: +2] restbase: provision restbase1035-{a,b,c} (new) [puppet] - https://gerrit.wikimedia.org/r/1005591 (https://phabricator.wikimedia.org/T354560) (owner: Eevans)
[23:05:49] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P57784 and previous config saved to /var/cache/conftool/dbconfig/20240222-230549-arnaudb.json
[23:10:08] <wikibugs>	 SRE-OnFire, Incident Tooling: introducing corto internal incident response workflow automation - https://phabricator.wikimedia.org/T356790#9570525 (jhathaway)
[23:20:56] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57785 and previous config saved to /var/cache/conftool/dbconfig/20240222-232056-arnaudb.json
[23:20:58] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
[23:21:02] <stashbot>	 T357189: Drop iwl_prefix_from_title from iwlinks - https://phabricator.wikimedia.org/T357189
[23:21:09] <wikibugs>	 SRE-swift-storage, Commons, UploadWizard: Incomplete files uploaded (10 MB interruption) - https://phabricator.wikimedia.org/T350917#9570554 (Bawolff) >>! In T350917#9358944, @MatthewVernon wrote: > Picking a recent failure: > ` > mvernon@cumin1001:~$ sudo cumin -x --force --no-progress --no-color -o...
[23:21:12] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
[23:21:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depooling db2167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57786 and previous config saved to /var/cache/conftool/dbconfig/20240222-232118-arnaudb.json
[23:21:20] <wikibugs>	 (PS1) Tim Starling: OCR: Add HTTP proxy config [extensions/Wikisource] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005700 (https://phabricator.wikimedia.org/T357857)
[23:21:51] <wikibugs>	 (CR) Tim Starling: [C: +2] OCR: Add HTTP proxy config [extensions/Wikisource] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005700 (https://phabricator.wikimedia.org/T357857) (owner: Tim Starling)
[23:22:21] <wikibugs>	 (PS2) Tim Starling: CommonSettings: Set $wgWikisourceHttpProxy [mediawiki-config] - https://gerrit.wikimedia.org/r/1005434 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:22:41] <wikibugs>	 (CR) Tim Starling: [C: +2] CommonSettings: Set $wgWikisourceHttpProxy [mediawiki-config] - https://gerrit.wikimedia.org/r/1005434 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:23:39] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57787 and previous config saved to /var/cache/conftool/dbconfig/20240222-232338-arnaudb.json
[23:27:51] <wikibugs>	 (PS1) Zabe: block: Pass wikiId to DatabaseBlock::getId in DatabaseBlockStore [core] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005701 (https://phabricator.wikimedia.org/T358208)
[23:27:58] <zabe>	 jouncebot: nowandnext
[23:27:58] <jouncebot>	 No deployments scheduled for the next 7 hour(s) and 32 minute(s)
[23:27:58] <jouncebot>	 In 7 hour(s) and 32 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240223T0700)
[23:28:05] <wikibugs>	 (CR) Zabe: [C: +2] block: Pass wikiId to DatabaseBlock::getId in DatabaseBlockStore [core] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005701 (https://phabricator.wikimedia.org/T358208) (owner: Zabe)
[23:30:37] <wikibugs>	 (CR) Tim Starling: [C: +2] "..." [mediawiki-config] - https://gerrit.wikimedia.org/r/1005434 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:32:05] <zabe>	 TimStarling: could you ping me when you are done with backporting?
[23:34:31] <icinga-wm_>	 PROBLEM - cassandra-a SSL 10.64.0.130:7000 on restbase1035 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[23:35:32] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1035.eqiad.wmnet with reason: Bootstrapping — T354560
[23:35:37] <TimStarling>	 yes
[23:35:38] <stashbot>	 T354560: Provision new RESTBase cluster nodes: restbase10[34-42] - https://phabricator.wikimedia.org/T354560
[23:35:45] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1035.eqiad.wmnet with reason: Bootstrapping — T354560
[23:36:28] <TimStarling>	 CI taking ages -- apparently it needs to run extension tests from 53 extensions in order to merge this change
[23:38:45] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P57788 and previous config saved to /var/cache/conftool/dbconfig/20240222-233845-arnaudb.json
[23:39:18] <TimStarling>	 not sure why gate checks didn't run on the config patch but the main test build looks the same so I'm probably going to hit the submit button
[23:40:27] <wikibugs>	 (Merged) jenkins-bot: OCR: Add HTTP proxy config [extensions/Wikisource] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005700 (https://phabricator.wikimedia.org/T357857) (owner: Tim Starling)
[23:43:16] <wikibugs>	 (CR) Tim Starling: [C: +2] InitializeSettings: Add Wikisource logging channel to prod and labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1005435 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:46:05] <wikibugs>	 (Merged) jenkins-bot: block: Pass wikiId to DatabaseBlock::getId in DatabaseBlockStore [core] (wmf/1.42.0-wmf.19) - https://gerrit.wikimedia.org/r/1005701 (https://phabricator.wikimedia.org/T358208) (owner: Zabe)
[23:47:07] <wikibugs>	 (CR) Tim Starling: InitializeSettings: Add Wikisource logging channel to prod and labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1005435 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:47:12] <wikibugs>	 (PS2) Tim Starling: InitializeSettings: Add Wikisource logging channel to prod and labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1005435 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:48:32] <wikibugs>	 (CR) Tim Starling: [C: +2] InitializeSettings: Add Wikisource logging channel to prod and labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1005435 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:49:18] <wikibugs>	 (Merged) jenkins-bot: InitializeSettings: Add Wikisource logging channel to prod and labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1005435 (https://phabricator.wikimedia.org/T357857) (owner: Samwilson)
[23:49:55] <logmsgbot>	 !log tstarling@deploy2002 Started scap: (no justification provided)
[23:53:52] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P57789 and previous config saved to /var/cache/conftool/dbconfig/20240222-235351-arnaudb.json
[23:57:18] <wikibugs>	 SRE-swift-storage, Commons, UploadWizard: Incomplete files uploaded (10 MB interruption) - https://phabricator.wikimedia.org/T350917#9570643 (Bawolff) More recent example is File:Delft_Van_Miereveltlaan_7.jpg (aka 1aqdj7jclxmc.afcsnk.1553787.jpg aka 1aqdj6vvp0a8.a3pwzj.1553787.jpg  ) which is cut off...
[23:59:23] <icinga-wm_>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[23:59:36] <logmsgbot>	 !log tstarling@deploy2002 Finished scap: (no justification provided) (duration: 09m 40s)
[23:59:46] <wikibugs>	 SRE-swift-storage, Commons, UploadWizard: Incomplete files uploaded - chunked upload drops last chunk. - https://phabricator.wikimedia.org/T350917#9570654 (Bawolff)
[23:59:53] <TimStarling>	 ^zabe