[00:05:37] <icinga-wm>	 PROBLEM - Check systemd state on grafana1002 is CRITICAL: CRITICAL - degraded: The following units failed: grafana-ldap-users-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:12:57] <wikibugs>	 SRE, Observability-Logging, Wikimedia-Logstash, observability, serviceops: ensure httpd error logs from "misc apps" (krypton) end up in logstash - https://phabricator.wikimedia.org/T216090 (lmata) Thank you for the update this will help for backlog priorities setting.
[00:24:37] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:26:33] <icinga-wm>	 RECOVERY - SSH on ms-fe2008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:32:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[00:39:43] <icinga-wm>	 PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:42:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[00:47:37] <icinga-wm>	 PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:48:39] <icinga-wm>	 PROBLEM - Check systemd state on logstash1026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:50:25] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=sidekiq site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:55:11] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:35:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[01:40:59] <icinga-wm>	 RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:45:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[02:07:33] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.38.0-wmf.18 [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754606
[02:07:35] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.38.0-wmf.18 [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754606 (owner: TrainBranchBot)
[02:13:03] <wikibugs>	 SRE-swift-storage, MW-on-K8s, Shellbox, serviceops, and 2 others: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (tstarling) > 52 seconds in Shellbox\Client::computeHmac over 3 calls, I guess all signatures for the remote shellbox calls  I benchmarked the SHA-256 HMAC...
[02:23:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[02:23:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:24:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] Branch commit for wmf/1.38.0-wmf.18 [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754606 (owner: TrainBranchBot)
[02:26:39] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.38.0-wmf.18 [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754606 (owner: TrainBranchBot)
[02:26:57] <icinga-wm>	 PROBLEM - SSH on mw2254.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:29:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[02:29:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[02:29:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:33:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[02:36:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[02:36:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:41:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[02:41:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:43:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[02:48:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[02:48:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[02:48:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:48:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:54:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[02:54:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:08:39] <icinga-wm>	 PROBLEM - SSH on mw2257.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:09:47] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:13:29] <wikibugs>	 SRE, Analytics, Data-Engineering, Event-Platform, Sustainability (Incident Followup): Pool eventgate-main in both datacenters (active/active) - https://phabricator.wikimedia.org/T296699 (Ottomata) Open→Resolved a:Ottomata Yup should be!
[03:28:55] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[03:39:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[04:11:01] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:12:04] <wikibugs>	 (PS1) 4nn1l2: commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/754612 (https://phabricator.wikimedia.org/T299247)
[04:29:33] <icinga-wm>	 RECOVERY - SSH on mw2254.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:30:11] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:35:07] <wikibugs>	 (PS1) 4nn1l2: azwiki: Add draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/754613 (https://phabricator.wikimedia.org/T299332)
[04:37:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[04:42:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[05:32:05] <wikibugs>	 (PS1) KartikMistry: Update apertium to 2022-01-18-052631-production [deployment-charts] - https://gerrit.wikimedia.org/r/754614 (https://phabricator.wikimedia.org/T218184)
[05:35:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[05:41:28] * kart_ deploying Apertium service..
[05:42:25] <wikibugs>	 (CR) KartikMistry: [C: +2] Update apertium to 2022-01-18-052631-production [deployment-charts] - https://gerrit.wikimedia.org/r/754614 (https://phabricator.wikimedia.org/T218184) (owner: KartikMistry)
[05:45:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[05:45:55] <wikibugs>	 (Merged) jenkins-bot: Update apertium to 2022-01-18-052631-production [deployment-charts] - https://gerrit.wikimedia.org/r/754614 (https://phabricator.wikimedia.org/T218184) (owner: KartikMistry)
[05:46:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove watchlist group from s3 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P18764 and previous config saved to /var/cache/conftool/dbconfig/20220118-054659-marostegui.json
[05:47:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:04] <stashbot>	 T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127
[05:48:04] <wikibugs>	 (PS1) Marostegui: Revert "dbproxy1017: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/754590
[05:49:02] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] START helmfile.d/services/apertium: apply on staging
[05:49:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:04] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/apertium: apply on production
[05:49:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:10] <wikibugs>	 (PS1) Marostegui: Revert "dbproxy1015: Reimage to Bullseye" [puppet] - https://gerrit.wikimedia.org/r/754591
[05:49:36] <logmsgbot>	 !log kartik@deploy1002 helmfile [staging] DONE helmfile.d/services/apertium: sync on staging
[05:49:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:50] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "dbproxy1017: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/754590 (owner: Marostegui)
[05:49:59] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "dbproxy1015: Reimage to Bullseye" [puppet] - https://gerrit.wikimedia.org/r/754591 (owner: Marostegui)
[05:51:40] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] START helmfile.d/services/apertium: apply on production
[05:51:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:43] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/apertium: apply on staging
[05:51:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:20] <logmsgbot>	 !log kartik@deploy1002 helmfile [codfw] DONE helmfile.d/services/apertium: sync on production
[05:53:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:01] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/apertium: apply on production
[05:54:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:04] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
[05:54:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:38] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] START helmfile.d/services/apertium: apply on production
[05:54:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:40] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/apertium: apply on staging
[05:54:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:01] <kart_>	 Uh oh. Too much logging?
[05:56:19] <logmsgbot>	 !log kartik@deploy1002 helmfile [eqiad] DONE helmfile.d/services/apertium: sync on production
[05:56:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:36] <wikibugs>	 (PS1) Marostegui: pc1014: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/754784 (https://phabricator.wikimedia.org/T299046)
[05:57:56] <kart_>	 Looks OK then! Apertium service is on Bullseye now!
[05:58:51] <kart_>	 !log Update apertium to 2022-01-18-052631-production (T218184, T202276, T218184, T270061, T248653, T248293, T248812, T248654)
[05:59:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:02] <stashbot>	 T248293: Update apertium-af-nl package - https://phabricator.wikimedia.org/T248293
[05:59:03] <stashbot>	 T202276: Package apertium-pol-szl (Polish-Silesian) - https://phabricator.wikimedia.org/T202276
[05:59:03] <stashbot>	 T248812: Update apertium-bel-rus package - https://phabricator.wikimedia.org/T248812
[05:59:03] <stashbot>	 T270061: Update apertium-ita-srd (Italian-Sardinian) - https://phabricator.wikimedia.org/T270061
[05:59:03] <stashbot>	 T218184: Update apertium-nno-nob, apertium-swe-dan, apertium-swe-nor and apertium-dan-nor packages - https://phabricator.wikimedia.org/T218184
[05:59:04] <stashbot>	 T248653: Update apertium-id-ms to 0.1.2 - https://phabricator.wikimedia.org/T248653
[05:59:04] <stashbot>	 T248654: Update apertium-ca-it to 0.2.1 - https://phabricator.wikimedia.org/T248654
[05:59:43] <wikibugs>	 (CR) Marostegui: [C: +2] pc1014: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/754784 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[06:00:58] <wikibugs>	 (PS1) Marostegui: pc1014: Move it to pc2 [puppet] - https://gerrit.wikimedia.org/r/754805 (https://phabricator.wikimedia.org/T299046)
[06:01:47] <wikibugs>	 (CR) Marostegui: [C: +2] pc1014: Move it to pc2 [puppet] - https://gerrit.wikimedia.org/r/754805 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[06:02:08] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
[06:02:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:07] <wikibugs>	 (PS1) Marostegui: realm.pp: Add ipinfo_ip_changes to private tables [puppet] - https://gerrit.wikimedia.org/r/754806 (https://phabricator.wikimedia.org/T297696)
[06:13:23] <logmsgbot>	 !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
[06:13:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:09] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
[06:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:07] <logmsgbot>	 !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1014.eqiad.wmnet with OS bullseye
[06:34:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[06:44:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[06:46:54] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on pc1014 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.48.89. Check system logs on 10.64.48.89 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T299376 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:47:02] <wikibugs>	 SRE, ops-eqiad: Degraded RAID on pc1014 - https://phabricator.wikimedia.org/T299376 (ops-monitoring-bot)
[06:47:40] <wikibugs>	 SRE, ops-eqiad: Degraded RAID on pc1014 - https://phabricator.wikimedia.org/T299376 (Marostegui) Open→Declined The host is being reimaged
[06:52:25] <wikibugs>	 (PS1) Giuseppe Lavagetto: service::monitor: do not include services with just probes [puppet] - https://gerrit.wikimedia.org/r/754808
[06:53:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] service::monitor: do not include services with just probes [puppet] - https://gerrit.wikimedia.org/r/754808 (owner: Giuseppe Lavagetto)
[06:54:42] <wikibugs>	 (PS2) Giuseppe Lavagetto: service::monitor: do not include services with just probes [puppet] - https://gerrit.wikimedia.org/r/754808
[06:57:45] <wikibugs>	 (PS3) Giuseppe Lavagetto: service::monitor: do not include services with just probes [puppet] - https://gerrit.wikimedia.org/r/754808
[07:01:07] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] service::monitor: do not include services with just probes [puppet] - https://gerrit.wikimedia.org/r/754808 (owner: Giuseppe Lavagetto)
[07:09:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host pc1014.eqiad.wmnet with OS bullseye
[07:09:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:13] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster2001 is CRITICAL: instance=10.192.48.10 verb={CREATE,UPDATE} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[07:27:33] <icinga-wm>	 PROBLEM - SSH on restbase2010.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:29:35] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubestagemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[07:32:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[07:36:10] <wikibugs>	 (PS15) Elukey: kafka: add check to test the Broker's TLS port [puppet] - https://gerrit.wikimedia.org/r/753738
[07:36:35] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bullseye
[07:36:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:58] <wikibugs>	 (PS16) Elukey: kafka: add check to test the Broker's TLS port [puppet] - https://gerrit.wikimedia.org/r/753738
[07:38:21] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33287/console" [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[07:42:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[07:47:50] <wikibugs>	 (CR) Elukey: [V: +1] "elukey@alert1001:/usr/lib/nagios/plugins$ ./check_ssl -H kafka-main1001.eqiad.wmnet -p 9093 -w 30 -c 30" [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[07:48:56] <wikibugs>	 (CR) Ladsgroup: [C: +1] realm.pp: Add ipinfo_ip_changes to private tables [puppet] - https://gerrit.wikimedia.org/r/754806 (https://phabricator.wikimedia.org/T297696) (owner: Marostegui)
[07:49:18] <wikibugs>	 (CR) Marostegui: [C: +2] realm.pp: Add ipinfo_ip_changes to private tables [puppet] - https://gerrit.wikimedia.org/r/754806 (https://phabricator.wikimedia.org/T297696) (owner: Marostegui)
[07:53:09] <wikibugs>	 (PS17) Elukey: kafka: add check to test the Broker's TLS port [puppet] - https://gerrit.wikimedia.org/r/753738
[07:53:51] <wikibugs>	 (PS18) Elukey: kafka: add check to test the Broker's TLS port [puppet] - https://gerrit.wikimedia.org/r/753738
[07:54:21] <wikibugs>	 (CR) Elukey: "Adding Filippo for the nagios part :)" [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[07:54:33] <wikibugs>	 (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33289/console" [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[07:57:12] <wikibugs>	 (CR) Elukey: [V: +1] kafka: add check to test the Broker's TLS port (1 comment) [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[08:04:25] <wikibugs>	 (CR) Ladsgroup: [C: +2] Drop 'inline-media-caption' lint requests [extensions/Linter] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754144 (https://phabricator.wikimedia.org/T297443) (owner: Subramanya Sastry)
[08:05:33] <Amir1>	 I'll be deploying several backports
[08:07:12] <wikibugs>	 (Merged) jenkins-bot: Drop 'inline-media-caption' lint requests [extensions/Linter] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754144 (https://phabricator.wikimedia.org/T297443) (owner: Subramanya Sastry)
[08:09:00] <icinga-wm>	 RECOVERY - HTTPS-wmfusercontent on phab.wmfusercontent.org is OK: SSL OK - Certificate *.wikipedia.org valid until 2022-04-11 07:59:19 +0000 (expires in 82 days) https://phabricator.wikimedia.org/tag/phabricator/
[08:10:26] <wikibugs>	 (CR) Ladsgroup: [C: +2] "This change is ready for review." [extensions/ProofreadPage] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754598 (https://phabricator.wikimedia.org/T292300) (owner: Ladsgroup)
[08:12:30] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[08:12:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:50] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.17/extensions/Linter/includes/RecordLintJob.php: Backport: [[gerrit:754144|Drop 'inline-media-caption' lint requests (T297443 T299302)]] (duration: 00m 52s)
[08:12:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:55] <stashbot>	 T299302: Linter jobs are running slowly - https://phabricator.wikimedia.org/T299302
[08:12:55] <stashbot>	 T297443: Add a linter category for inline images with captions - https://phabricator.wikimedia.org/T297443
[08:13:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[08:13:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[08:13:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[08:17:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:46] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Enable ganeti 2.16 in eqiad [puppet] - https://gerrit.wikimedia.org/r/754540 (https://phabricator.wikimedia.org/T296721) (owner: Muehlenhoff)
[08:20:35] <Amir1>	 !log cleaning up commons linter errors T298782
[08:20:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:38] <stashbot>	 T298782: Linter seems to be not cleaning up after page deletion - https://phabricator.wikimedia.org/T298782
[08:21:10] <wikibugs>	 (PS1) Marostegui: ProductionServices.php: Promote pc1014 to pc2 master [mediawiki-config] - https://gerrit.wikimedia.org/r/754864 (https://phabricator.wikimedia.org/T299046)
[08:23:00] <wikibugs>	 (PS1) Marostegui: mariadb: Promote pc1014 to pc2 master [puppet] - https://gerrit.wikimedia.org/r/754865 (https://phabricator.wikimedia.org/T299046)
[08:24:30] <wikibugs>	 (CR) Ladsgroup: [C: +1] ProductionServices.php: Promote pc1014 to pc2 master [mediawiki-config] - https://gerrit.wikimedia.org/r/754864 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[08:28:18] <wikibugs>	 (CR) Marostegui: [C: +2] ProductionServices.php: Promote pc1014 to pc2 master [mediawiki-config] - https://gerrit.wikimedia.org/r/754864 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[08:28:24] <wikibugs>	 (Merged) jenkins-bot: Use fillParserOutputInternal instead of getParserOutput. [extensions/ProofreadPage] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754598 (https://phabricator.wikimedia.org/T292300) (owner: Ladsgroup)
[08:28:34] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Promote pc1014 to pc2 master [puppet] - https://gerrit.wikimedia.org/r/754865 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[08:29:01] <wikibugs>	 (Merged) jenkins-bot: ProductionServices.php: Promote pc1014 to pc2 master [mediawiki-config] - https://gerrit.wikimedia.org/r/754864 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[08:29:20] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "Overall lgtm, but we need to work on the regexes." [puppet] - https://gerrit.wikimedia.org/r/724049 (https://phabricator.wikimedia.org/T205361) (owner: Majavah)
[08:30:28] <logmsgbot>	 !log marostegui@deploy1002 Synchronized wmf-config/ProductionServices.php: Promote pc1014 to master in pc2 T299046 (duration: 00m 51s)
[08:30:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:32] <stashbot>	 T299046: Upgrade parsercache infra to Bullseye - https://phabricator.wikimedia.org/T299046
[08:32:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[08:32:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:57] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host pc1012.eqiad.wmnet with OS bullseye
[08:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:50] <wikibugs>	 (CR) Matthias Mullie: [C: +1] "Ready to be deployed!" [extensions/MediaSearch] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/753487 (owner: Cparle)
[08:36:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[08:36:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[08:36:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[08:37:45] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.17/extensions/ProofreadPage/includes/Page/PageContentHandler.php: Backport: [[gerrit:754598|Use fillParserOutputInternal instead of getParserOutput. (T292300)]] (duration: 00m 51s)
[08:37:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:48] <stashbot>	 T292300: Eliminate unnecessary duplicate parses - https://phabricator.wikimedia.org/T292300
[08:38:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[08:38:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:31] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
[08:42:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2001.codfw.wmnet with reason: reinstallation
[08:42:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[08:43:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:17] <wikibugs>	 (PS1) Ladsgroup: watcheditem: Try getting the cached version in resetNotificationTimestamp [core] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754599
[08:43:24] <wikibugs>	 (CR) Ladsgroup: [C: +2] watcheditem: Try getting the cached version in resetNotificationTimestamp [core] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754599 (owner: Ladsgroup)
[08:44:24] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] kafka: add check to test the Broker's TLS port [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[08:46:31] <wikibugs>	 SRE-swift-storage, MW-on-K8s, Shellbox, serviceops, and 2 others: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (Joe) >>! In T292322#7627149, @tstarling wrote: >> 52 seconds in Shellbox\Client::computeHmac over 3 calls, I guess all signatures for the remote shellbox...
[08:46:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[08:47:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[08:47:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[08:47:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:48:28] <wikibugs>	 SRE, Analytics-Radar: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (akosiaris) This is starting to show up rather frequently, so I am wondering whether it is starting to consume enough time to warrant solving it somehow. Finding the race might prove...
[08:49:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[08:49:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:06] <wikibugs>	 (CR) Elukey: [V: +1 C: +2] "Going to test this!" [puppet] - https://gerrit.wikimedia.org/r/753738 (owner: Elukey)
[08:52:17] <wikibugs>	 (PS1) Marostegui: Revert "ProductionServices.php: Promote pc1014 to pc2 master" [mediawiki-config] - https://gerrit.wikimedia.org/r/754600
[08:52:29] <wikibugs>	 (PS1) Marostegui: Revert "mariadb: Promote pc1014 to pc2 master" [puppet] - https://gerrit.wikimedia.org/r/754601
[08:55:06] <wikibugs>	 (CR) Ladsgroup: [C: +1] Revert "ProductionServices.php: Promote pc1014 to pc2 master" [mediawiki-config] - https://gerrit.wikimedia.org/r/754600 (owner: Marostegui)
[08:55:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.renew-cert for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
[08:55:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:27] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for build2001.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
[08:55:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:58] <taavi>	 Amir1: ping me when done backporting, please?
[08:56:04] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics clients for mfossati - https://phabricator.wikimedia.org/T299343 (Jelto)
[08:56:15] <Amir1>	 sure
[08:56:53] <wikibugs>	 (PS2) Volans: redfish: improve support for DRY-RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/749852
[08:57:29] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1012.eqiad.wmnet with OS bullseye
[08:57:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:55] <wikibugs>	 (Merged) jenkins-bot: watcheditem: Try getting the cached version in resetNotificationTimestamp [core] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754599 (owner: Ladsgroup)
[09:04:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[09:04:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[09:05:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[09:05:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:40] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.17/includes/watcheditem/WatchedItemStore.php: Backport: [[gerrit:754599|watcheditem: Try getting the cached version in resetNotificationTimestamp]] (duration: 00m 51s)
[09:06:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:21] <wikibugs>	 (CR) Volans: "addressed comment" [software/spicerack] - https://gerrit.wikimedia.org/r/749852 (owner: Volans)
[09:09:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[09:09:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:08] <wikibugs>	 (PS1) Ladsgroup: page: Use MainObjectStash instead of 'db-replicated' cache [core] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754602 (https://phabricator.wikimedia.org/T272512)
[09:10:28] <wikibugs>	 (CR) Ladsgroup: [C: +2] page: Use MainObjectStash instead of 'db-replicated' cache [core] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754602 (https://phabricator.wikimedia.org/T272512) (owner: Ladsgroup)
[09:11:59] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Migrate eqiad Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296721 (MoritzMuehlenhoff)
[09:14:24] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=sidekiq site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:15:11] <wikibugs>	 (CR) Ladsgroup: [C: +2] Disable "inline-media-caption" category [extensions/Linter] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754145 (https://phabricator.wikimedia.org/T297443) (owner: Subramanya Sastry)
[09:16:10] <wikibugs>	 (CR) Volans: "post merge nit inline" [cookbooks] - https://gerrit.wikimedia.org/r/751228 (https://phabricator.wikimedia.org/T239814) (owner: Ladsgroup)
[09:18:54] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:28:08] <wikibugs>	 (Merged) jenkins-bot: page: Use MainObjectStash instead of 'db-replicated' cache [core] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754602 (https://phabricator.wikimedia.org/T272512) (owner: Ladsgroup)
[09:28:13] <wikibugs>	 (Merged) jenkins-bot: Disable "inline-media-caption" category [extensions/Linter] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754145 (https://phabricator.wikimedia.org/T297443) (owner: Subramanya Sastry)
[09:31:16] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.17/extensions/Linter/extension.json: Backport: [[gerrit:754145|Disable "inline-media-caption" category (T297443)]] (duration: 00m 51s)
[09:31:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:20] <stashbot>	 T297443: Add a linter category for inline images with captions - https://phabricator.wikimedia.org/T297443
[09:32:41] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.17/includes: Backport: [[gerrit:754602|page: Use MainObjectStash instead of 'db-replicated' cache (T272512)]] (duration: 00m 56s)
[09:32:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:45] <stashbot>	 T272512: Apply outstanding schema changes for "objectcache" tables in production (exptime, flags, modtoken) - https://phabricator.wikimedia.org/T272512
[09:33:18] <icinga-wm>	 PROBLEM - Kafka broker TLS certificate validity on kafka-test1009 is CRITICAL: SSL CRITICAL - Certificate kafka-test1009.eqiad.wmnet valid until 2022-01-29 19:27:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[09:33:51] <Amir1>	 taavi: the floor is yours
[09:33:53] <taavi>	 thanks
[09:34:03] <wikibugs>	 (PS2) Majavah: Enable temporary global user groups on production [mediawiki-config] - https://gerrit.wikimedia.org/r/752344 (https://phabricator.wikimedia.org/T153815)
[09:34:14] <wikibugs>	 (CR) Majavah: [C: +2] Enable temporary global user groups on production [mediawiki-config] - https://gerrit.wikimedia.org/r/752344 (https://phabricator.wikimedia.org/T153815) (owner: Majavah)
[09:34:22] <wikibugs>	 (PS1) Joal: Add an-test-cord1001 to analytics rsync allow list [puppet] - https://gerrit.wikimedia.org/r/754869
[09:34:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[09:34:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:00] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:35:15] <wikibugs>	 (Merged) jenkins-bot: Enable temporary global user groups on production [mediawiki-config] - https://gerrit.wikimedia.org/r/752344 (https://phabricator.wikimedia.org/T153815) (owner: Majavah)
[09:35:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[09:35:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[09:35:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[09:36:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[09:36:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:26] <wikibugs>	 (CR) Btullis: [C: +2] Add an-test-cord1001 to analytics rsync allow list [puppet] - https://gerrit.wikimedia.org/r/754869 (owner: Joal)
[09:38:12] <icinga-wm>	 PROBLEM - Kafka broker TLS certificate validity on kafka-test1007 is CRITICAL: SSL CRITICAL - Certificate kafka-test1007.eqiad.wmnet valid until 2022-01-29 19:16:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[09:41:03] <logmsgbot>	 !log taavi@deploy1002 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:752344|Enable temporary global user groups on production (T153815)]] (duration: 00m 51s)
[09:41:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:06] <stashbot>	 T153815: Allow global groups to be assigned temporarily (expire) - https://phabricator.wikimedia.org/T153815
[09:41:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[09:41:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:24] <zabe>	 taavi: hey, would you at some point be willing to run the maintenance script for fixing the wrong entries in the globalblocks table?
[09:44:37] <taavi>	 zabe: sure, thanks for reminding me
[09:45:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[09:45:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[09:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[09:45:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:57] <elukey>	 the kafka-test tls broker etc.. are my fault, new alert
[09:47:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[09:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:01] <moritzm>	 !log installing ganeti 2.16.0-1~bpo9+1+wmf1 on ganeti/eqiad servers T296721
[09:50:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:05] <stashbot>	 T296721: Migrate eqiad Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296721
[09:50:12] <icinga-wm>	 PROBLEM - Kafka broker TLS certificate validity on kafka-test1010 is CRITICAL: SSL CRITICAL - Certificate kafka-test1010.eqiad.wmnet valid until 2022-01-29 19:09:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[09:50:23] <taavi>	 !log mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "QuiteUnusual" "MarcGarver" # T298707
[09:50:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:27] <stashbot>	 T298707: "InvalidArgumentException: Blocker must be a local user" from GlobalBlocking - https://phabricator.wikimedia.org/T298707
[09:50:31] <taavi>	 zabe: ^ was that the only case?
[09:52:29] <zabe>	 as far as I know
[09:52:46] <icinga-wm>	 PROBLEM - Kafka broker TLS certificate validity on kafka-test1008 is CRITICAL: SSL CRITICAL - Certificate kafka-test1008.eqiad.wmnet valid until 2022-01-29 19:19:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[09:57:05] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Migrate eqiad Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296721 (MoritzMuehlenhoff)
[09:57:45] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "ProductionServices.php: Promote pc1014 to pc2 master" [mediawiki-config] - https://gerrit.wikimedia.org/r/754600 (owner: Marostegui)
[09:57:49] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "mariadb: Promote pc1014 to pc2 master" [puppet] - https://gerrit.wikimedia.org/r/754601 (owner: Marostegui)
[09:57:52] <wikibugs>	 (PS1) Elukey: nagios_common: update check_ssl_kafka warning/critical values [puppet] - https://gerrit.wikimedia.org/r/754870
[09:58:32] <wikibugs>	 (Merged) jenkins-bot: Revert "ProductionServices.php: Promote pc1014 to pc2 master" [mediawiki-config] - https://gerrit.wikimedia.org/r/754600 (owner: Marostegui)
[09:59:14] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 269 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:59:39] <logmsgbot>	 !log marostegui@deploy1002 Synchronized wmf-config/ProductionServices.php: Revert: Promote pc1014 to master in pc2 T299046 (duration: 00m 50s)
[09:59:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:43] <stashbot>	 T299046: Upgrade parsercache infra to Bullseye - https://phabricator.wikimedia.org/T299046
[10:00:45] <marostegui>	 !log Move pc1014 to pc3 T299046
[10:00:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:29] <moritzm>	 !log running gnt-cluster renew-crypto --new-cluster-certificate --new-rapi-certificate --new-spice-certificate for ganeti/eqiad cluster 
[10:01:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:02:21] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[10:02:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[10:03:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[10:03:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:33] <wikibugs>	 (PS1) Marostegui: pc1014: Move to pc3 [puppet] - https://gerrit.wikimedia.org/r/754871 (https://phabricator.wikimedia.org/T299046)
[10:04:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[10:04:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:53] <wikibugs>	 (CR) Elukey: [C: +2] nagios_common: update check_ssl_kafka warning/critical values [puppet] - https://gerrit.wikimedia.org/r/754870 (owner: Elukey)
[10:07:08] <icinga-wm>	 PROBLEM - HTTPS Ganeti RAPI eqiad on ganeti1009 is CRITICAL: connect to address ganeti01.svc.eqiad.wmnet and port 5080: Connection refused https://www.mediawiki.org/wiki/Ganeti%23RAPI_daemon
[10:07:22] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[10:07:40] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[10:07:50] <icinga-wm>	 PROBLEM - ganeti-mond running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond https://wikitech.wikimedia.org/wiki/Ganeti
[10:08:58] <moritzm>	 ^ these are expected, the daemons are stopped while the certs are regenerated/distributed, will recover soon
[10:09:40] <icinga-wm>	 RECOVERY - ganeti-noded running on ganeti1009 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[10:11:58] <icinga-wm>	 PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_ganeti_eqiad_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:13:40] <icinga-wm>	 PROBLEM - Check unit status of netbox_ganeti_eqiad_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_eqiad_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:14:02] <wikibugs>	 (CR) Marostegui: [C: +2] pc1014: Move to pc3 [puppet] - https://gerrit.wikimedia.org/r/754871 (https://phabricator.wikimedia.org/T299046) (owner: Marostegui)
[10:16:00] <icinga-wm>	 RECOVERY - SSH on mw2257.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:16:30] <icinga-wm>	 RECOVERY - HTTPS Ganeti RAPI eqiad on ganeti1009 is OK: HTTP OK: Status line output matched 401 - 309 bytes in 0.014 second response time https://www.mediawiki.org/wiki/Ganeti%23RAPI_daemon
[10:17:02] <icinga-wm>	 RECOVERY - ganeti-confd running on ganeti1009 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[10:17:12] <icinga-wm>	 RECOVERY - ganeti-mond running on ganeti1009 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-mond https://wikitech.wikimedia.org/wiki/Ganeti
[10:20:53] <wikibugs>	 (PS2) Ayounsi: LibreNMS report, only log_info devices with no IP [software/netbox-extras] - https://gerrit.wikimedia.org/r/753731
[10:21:07] <wikibugs>	 (PS1) Volans: sre.mysql.upgrade: various improvements [cookbooks] - https://gerrit.wikimedia.org/r/754872 (https://phabricator.wikimedia.org/T239814)
[10:21:59] <icinga-wm>	 RECOVERY - Kafka broker TLS certificate validity on kafka-test1010 is OK: SSL OK - Certificate kafka-test1010.eqiad.wmnet valid until 2022-01-29 19:09:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[10:22:23] <icinga-wm>	 RECOVERY - Kafka broker TLS certificate validity on kafka-test1008 is OK: SSL OK - Certificate kafka-test1008.eqiad.wmnet valid until 2022-01-29 19:19:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[10:22:46] <wikibugs>	 (PS1) Filippo Giunchedi: wmnet: move reads to graphite1004 [dns] - https://gerrit.wikimedia.org/r/754874 (https://phabricator.wikimedia.org/T299383)
[10:22:48] <wikibugs>	 (PS1) Filippo Giunchedi: wmnet: move writes to graphite1004 [dns] - https://gerrit.wikimedia.org/r/754875 (https://phabricator.wikimedia.org/T299383)
[10:23:39] <icinga-wm>	 RECOVERY - Check unit status of netbox_ganeti_eqiad_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_eqiad_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:29:07] <wikibugs>	 (PS2) Volans: sre.mysql.upgrade: various improvements [cookbooks] - https://gerrit.wikimedia.org/r/754872 (https://phabricator.wikimedia.org/T239814)
[10:29:23] <wikibugs>	 (PS1) Filippo Giunchedi: Revert "graphite: check graphite2003 metrics" [puppet] - https://gerrit.wikimedia.org/r/754876 (https://phabricator.wikimedia.org/T299383)
[10:29:25] <wikibugs>	 (PS1) Filippo Giunchedi: Revert "profile: move statsd writes to graphite2003" [puppet] - https://gerrit.wikimedia.org/r/754877 (https://phabricator.wikimedia.org/T299383)
[10:30:04] <wikibugs>	 (PS1) Marostegui: db1117: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/754878 (https://phabricator.wikimedia.org/T299344)
[10:30:53] <wikibugs>	 (CR) Marostegui: [C: +2] db1117: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/754878 (https://phabricator.wikimedia.org/T299344) (owner: Marostegui)
[10:31:09] <wikibugs>	 (CR) Volans: "DO NOT MERGE AS IS" [cookbooks] - https://gerrit.wikimedia.org/r/754872 (https://phabricator.wikimedia.org/T239814) (owner: Volans)
[10:31:18] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.reimage for host db1117.eqiad.wmnet with OS bullseye
[10:31:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:54] <wikibugs>	 (PS1) Filippo Giunchedi: Revert "ProductionServices: use graphite2003 for statsd" [mediawiki-config] - https://gerrit.wikimedia.org/r/754879 (https://phabricator.wikimedia.org/T299383)
[10:32:34] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, let's try this way and we can re-evaluate later" [software/netbox-extras] - https://gerrit.wikimedia.org/r/753731 (owner: Ayounsi)
[10:32:53] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1015 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
[10:32:56] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Migrate eqiad Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296721 (MoritzMuehlenhoff)
[10:33:01] <marostegui>	 haproxy alerts are expected
[10:33:21] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1017 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
[10:33:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[10:34:31] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [software/spicerack] - https://gerrit.wikimedia.org/r/749852 (owner: Volans)
[10:35:27] <wikibugs>	 (CR) Ayounsi: [C: +2] LibreNMS report, only log_info devices with no IP [software/netbox-extras] - https://gerrit.wikimedia.org/r/753731 (owner: Ayounsi)
[10:35:37] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1013 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
[10:35:48] <wikibugs>	 (CR) Ayounsi: [C: +2] LibreNMS report, only log_info devices with no IP (2 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/753731 (owner: Ayounsi)
[10:36:10] <wikibugs>	 (Merged) jenkins-bot: LibreNMS report, only log_info devices with no IP [software/netbox-extras] - https://gerrit.wikimedia.org/r/753731 (owner: Ayounsi)
[10:36:27] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1015 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Marostegui known https://wikitech.wikimedia.org/wiki/HAProxy
[10:36:27] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1017 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Marostegui known https://wikitech.wikimedia.org/wiki/HAProxy
[10:37:14] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1013 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Marostegui known https://wikitech.wikimedia.org/wiki/HAProxy
[10:37:39] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1013 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Marostegui known https://wikitech.wikimedia.org/wiki/HAProxy
[10:37:55] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1020 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
[10:40:33] <wikibugs>	 (CR) Volans: [C: +2] redfish: improve support for DRY-RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/749852 (owner: Volans)
[10:40:57] <icinga-wm>	 RECOVERY - Kafka broker TLS certificate validity on kafka-test1009 is OK: SSL OK - Certificate kafka-test1009.eqiad.wmnet valid until 2022-01-29 19:27:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[10:40:57] <icinga-wm>	 RECOVERY - Kafka broker TLS certificate validity on kafka-test1007 is OK: SSL OK - Certificate kafka-test1007.eqiad.wmnet valid until 2022-01-29 19:16:00 +0000 (expires in 11 days) https://wikitech.wikimedia.org/wiki/Kafka/Administration%23Renew_TLS_certificate
[10:41:23] <wikibugs>	 (PS1) Ayounsi: Atlas exporter: add probes and traceroute mesurements [puppet] - https://gerrit.wikimedia.org/r/754880 (https://phabricator.wikimedia.org/T251156)
[10:41:29] <icinga-wm>	 ACKNOWLEDGEMENT - haproxy failover on dbproxy1020 is CRITICAL: CRITICAL check_failover servers up 1 down 1: Marostegui known https://wikitech.wikimedia.org/wiki/HAProxy
[10:43:46] <icinga-wm>	 RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:43:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[10:44:43] <wikibugs>	 SRE, Traffic, Patch-For-Review, Performance-Team (Radar), User-ema: Package and deploy Varnish 6.0.9 - https://phabricator.wikimedia.org/T298758 (MMandere) We,ve analyzed `cp3052` and `cp3053`  (text and upload nodes respectively) and compared the following resources  * Cache hits * Request R...
[10:44:45] <wikibugs>	 (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/pcc-worker1001/33290/alert1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/753000 (owner: Ayounsi)
[10:45:42] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1016 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy
[10:46:06] <wikibugs>	 (CR) Ayounsi: [C: +2] Add msw2-eqiad to monitoring [puppet] - https://gerrit.wikimedia.org/r/753000 (owner: Ayounsi)
[10:46:24] <moritzm>	 !log gnt-cluster upgrade --to 2.16  for ganeti/eqiad cluster
[10:46:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:55] <wikibugs>	 (Merged) jenkins-bot: redfish: improve support for DRY-RUN mode [software/spicerack] - https://gerrit.wikimedia.org/r/749852 (owner: Volans)
[10:47:08] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: use / as miscweb health check [puppet] - https://gerrit.wikimedia.org/r/754881 (https://phabricator.wikimedia.org/T291946)
[10:49:23] <wikibugs>	 (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/pcc-worker1002/33291/netmon1002.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/754880 (https://phabricator.wikimedia.org/T251156) (owner: Ayounsi)
[10:49:24] <icinga-wm>	 PROBLEM - ganeti-wconfd running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 114 (gnt-masterd), command name ganeti-wconfd https://wikitech.wikimedia.org/wiki/Ganeti
[10:50:46] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1012 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[10:50:52] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1016 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[10:52:04] <moritzm>	 ^ these are expected, the daemons are stopped during the update to 2.16, will recover soon
[10:52:14] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[10:53:20] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1013 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[10:53:52] <wikibugs>	 (PS1) Kormat: switchover: Drop tendril support. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754882 (https://phabricator.wikimedia.org/T297605)
[10:53:54] <icinga-wm>	 RECOVERY - ganeti-confd running on ganeti1012 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[10:55:26] <icinga-wm>	 PROBLEM - Check unit status of netbox_ganeti_eqiad_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_eqiad_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:55:43] <wikibugs>	 (PS1) Ayounsi: LibreNMS report only count devices with no IP [software/netbox-extras] - https://gerrit.wikimedia.org/r/754883
[10:55:54] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2021 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:56:06] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1117.eqiad.wmnet with OS bullseye
[10:56:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:26] <wikibugs>	 (CR) Kormat: [C: +2] switchover: Drop tendril support. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754882 (https://phabricator.wikimedia.org/T297605) (owner: Kormat)
[10:57:20] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1012 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[10:57:32] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1012 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[10:58:10] <icinga-wm>	 PROBLEM - HTTPS Ganeti RAPI eqiad on ganeti1009 is CRITICAL: connect to address ganeti01.svc.eqiad.wmnet and port 5080: Connection refused https://www.mediawiki.org/wiki/Ganeti%23RAPI_daemon
[10:58:20] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[10:58:49] <wikibugs>	 (Merged) jenkins-bot: switchover: Drop tendril support. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754882 (https://phabricator.wikimedia.org/T297605) (owner: Kormat)
[10:59:01] <wikibugs>	 (PS1) Marostegui: Revert "db1117: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/754604
[10:59:58] <icinga-wm>	 RECOVERY - ganeti-noded running on ganeti1012 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[11:00:08] <wikibugs>	 (PS1) Alexandros Kosiaris: ttyS0-115200: Add a comment about this being VM specific [puppet] - https://gerrit.wikimedia.org/r/754884
[11:00:10] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1015 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[11:00:12] <icinga-wm>	 RECOVERY - ganeti-confd running on ganeti1012 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:00:35] <wikibugs>	 (PS1) Elukey: role::pki::multirootca: add dedicated profile for ml-serve k8s [puppet] - https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976)
[11:00:38] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:01:40] <icinga-wm>	 RECOVERY - ganeti-wconfd running on ganeti1009 is OK: PROCS OK: 1 process with UID = 114 (gnt-masterd), command name ganeti-wconfd https://wikitech.wikimedia.org/wiki/Ganeti
[11:02:06] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "db1117: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/754604 (owner: Marostegui)
[11:02:10] <icinga-wm>	 RECOVERY - ganeti-confd running on ganeti1009 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:02:18] <icinga-wm>	 RECOVERY - ganeti-noded running on ganeti1009 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[11:02:32] <icinga-wm>	 RECOVERY - HTTPS Ganeti RAPI eqiad on ganeti1009 is OK: HTTP OK: Status line output matched 401 - 309 bytes in 0.019 second response time https://www.mediawiki.org/wiki/Ganeti%23RAPI_daemon
[11:02:36] <wikibugs>	 (PS1) Elukey: profile::pki::multirootca: add fake profile credentials for ml-serve [labs/private] - https://gerrit.wikimedia.org/r/754887
[11:02:47] <wikibugs>	 (PS1) Volans: requests: add support for conn/read timeouts [software/pywmflib] - https://gerrit.wikimedia.org/r/754888
[11:02:52] <wikibugs>	 (CR) Elukey: [V: +2 C: +2] profile::pki::multirootca: add fake profile credentials for ml-serve [labs/private] - https://gerrit.wikimedia.org/r/754887 (owner: Elukey)
[11:04:45] <wikibugs>	 (CR) Volans: LibreNMS report only count devices with no IP (1 comment) [software/netbox-extras] - https://gerrit.wikimedia.org/r/754883 (owner: Ayounsi)
[11:06:10] <icinga-wm>	 RECOVERY - Check unit status of netbox_ganeti_eqiad_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_eqiad_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:06:28] <mmandere>	 !log start rolling upgrade to varnish 6.0.9 T298758
[11:06:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:32] <stashbot>	 T298758: Package and deploy Varnish 6.0.9 - https://phabricator.wikimedia.org/T298758
[11:06:54] <moritzm>	 !log running gnt-cluster renew-crypto --new-node-certificates for ganeti/eqiad cluster following 2.16 update
[11:06:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:02] <wikibugs>	 SRE, Analytics-Radar: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (akosiaris) There is indeed a race condition between `networking.service` and `ifup@ens5.service`. Checked on a couple of VMs that did not exhibit this problem as well as some that di...
[11:07:09] <wikibugs>	 (PS3) Jcrespo: mediabackups: Backup s7 media files at codfw [puppet] - https://gerrit.wikimedia.org/r/754025 (https://phabricator.wikimedia.org/T262668)
[11:08:04] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1010 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[11:08:04] <icinga-wm>	 PROBLEM - ganeti-mond running on ganeti1022 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-mond https://wikitech.wikimedia.org/wiki/Ganeti
[11:09:23] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/754880 (https://phabricator.wikimedia.org/T251156) (owner: Ayounsi)
[11:09:48] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 1 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:09:56] <icinga-wm>	 RECOVERY - ganeti-noded running on ganeti1010 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[11:10:16] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1011 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:10:58] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1009 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:11:12] <icinga-wm>	 PROBLEM - Check systemd state on build2001 is CRITICAL: CRITICAL - degraded: The following units failed: ifup@ens13.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:11:26] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1017 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[11:11:28] <icinga-wm>	 PROBLEM - HTTPS Ganeti RAPI eqiad on ganeti1009 is CRITICAL: connect to address ganeti01.svc.eqiad.wmnet and port 5080: Connection refused https://www.mediawiki.org/wiki/Ganeti%23RAPI_daemon
[11:11:56] <icinga-wm>	 RECOVERY - ganeti-mond running on ganeti1022 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-mond https://wikitech.wikimedia.org/wiki/Ganeti
[11:11:57] <wikibugs>	 (CR) Elukey: "I added https://gerrit.wikimedia.org/r/c/labs/private/+/754887 but I am currently getting an error from pcc:" [puppet] - https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[11:12:16] <icinga-wm>	 RECOVERY - ganeti-confd running on ganeti1011 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:13:37] <wikibugs>	 (PS2) Ayounsi: LibreNMS report only count devices with no IP [software/netbox-extras] - https://gerrit.wikimedia.org/r/754883
[11:14:01] <wikibugs>	 (CR) Ayounsi: "Thanks!" [software/netbox-extras] - https://gerrit.wikimedia.org/r/754883 (owner: Ayounsi)
[11:14:48] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1020 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy
[11:14:51] <wikibugs>	 (CR) Ayounsi: [C: +2] LibreNMS report only count devices with no IP [software/netbox-extras] - https://gerrit.wikimedia.org/r/754883 (owner: Ayounsi)
[11:15:18] <icinga-wm>	 RECOVERY - ganeti-confd running on ganeti1009 is OK: PROCS OK: 1 process with UID = 113 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[11:15:46] <icinga-wm>	 RECOVERY - HTTPS Ganeti RAPI eqiad on ganeti1009 is OK: HTTP OK: Status line output matched 401 - 309 bytes in 0.021 second response time https://www.mediawiki.org/wiki/Ganeti%23RAPI_daemon
[11:16:48] <wikibugs>	 (PS1) Elukey: helmfile.d: deploy cert-manager for ml-serve nodes [deployment-charts] - https://gerrit.wikimedia.org/r/754890 (https://phabricator.wikimedia.org/T298976)
[11:17:38] <wikibugs>	 SRE, Analytics-Radar: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (MoritzMuehlenhoff) >>! In T273026#7627740, @akosiaris wrote: > * Get rid of ifupdown and /etc/network/interfaces and get a proper and modern network interface manager. See T234207. T...
[11:18:41] <wikibugs>	 (PS2) Jbond: P:installserver::proxy: switch access logs to syslog [puppet] - https://gerrit.wikimedia.org/r/754520 (https://phabricator.wikimedia.org/T298087)
[11:19:18] <icinga-wm>	 PROBLEM - Disk space on deploy1002 is CRITICAL: DISK CRITICAL - /run/docker/netns/663e8ee211ef is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=deploy1002&var-datasource=eqiad+prometheus/ops
[11:19:42] <wikibugs>	 SRE, Infrastructure-Foundations: Migrate codfw Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296622 (MoritzMuehlenhoff)
[11:20:01] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Migrate eqiad Ganeti cluster to Buster - https://phabricator.wikimedia.org/T296721 (MoritzMuehlenhoff)
[11:22:12] <wikibugs>	 (PS1) Ayounsi: Add grafana-worldmap-panel [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/754892 (https://phabricator.wikimedia.org/T251184)
[11:22:56] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for Sérgio Lopes - https://phabricator.wikimedia.org/T299353 (dr0ptp4kt) Approved.
[11:24:53] <wikibugs>	 (PS2) Ayounsi: Add grafana-worldmap-panel [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/754892 (https://phabricator.wikimedia.org/T251184)
[11:27:34] <wikibugs>	 SRE, Analytics-Radar: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (akosiaris) >>! In T273026#7627758, @MoritzMuehlenhoff wrote: >>>! In T273026#7627740, @akosiaris wrote: >> * Get rid of ifupdown and /etc/network/interfaces and get a proper and mode...
[11:28:15] <Amir1>	 !log mwscript findBadBlobs.php --wiki=dewiki --revisions 5730218 --mark "T299387"
[11:28:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:19] <stashbot>	 T299387: Bad revision in German Wikipedia - https://phabricator.wikimedia.org/T299387
[11:29:28] <icinga-wm>	 RECOVERY - SSH on restbase2010.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[11:35:31] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM, in terms of cardinality / metric load on Prometheus let's see what happens! Might need to revert but we'll worry about that later" [puppet] - https://gerrit.wikimedia.org/r/754880 (https://phabricator.wikimedia.org/T251156) (owner: Ayounsi)
[11:35:55] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:38:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[11:38:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[11:38:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:00] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[11:39:01] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[11:39:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:06] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[11:39:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:08] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[11:39:09] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[11:39:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:12] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[11:39:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1165 (T285149)', diff saved to https://phabricator.wikimedia.org/P18766 and previous config saved to /var/cache/conftool/dbconfig/20220118-113916-marostegui.json
[11:39:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:39:20] <stashbot>	 T285149: Schema change for dropping rev_page_id index - https://phabricator.wikimedia.org/T285149
[11:40:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T285149)', diff saved to https://phabricator.wikimedia.org/P18767 and previous config saved to /var/cache/conftool/dbconfig/20220118-114024-marostegui.json
[11:40:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:57] <wikibugs>	 (PS1) Muehlenhoff: scap: No longer install dependencies via Puppet [puppet] - https://gerrit.wikimedia.org/r/754894 (https://phabricator.wikimedia.org/T298463)
[11:44:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[11:46:04] <hashar>	 !log Rolled back Quibble 1.3.0 jobs due to php configuration files with at least releng/quibble-buster73:1.3.0 # T299389
[11:46:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:46:07] <stashbot>	 T299389: Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389
[11:48:30] <wikibugs>	 (PS2) Muehlenhoff: scap: No longer install dependencies via Puppet [puppet] - https://gerrit.wikimedia.org/r/754894 (https://phabricator.wikimedia.org/T298463)
[11:50:41] <wikibugs>	 (PS1) Kosta Harlan: Monitoring: Add '.Save' to distinguish from '.Click' events [extensions/GrowthExperiments] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754605 (https://phabricator.wikimedia.org/T286366)
[11:50:44] <wikibugs>	 (PS1) Giuseppe Lavagetto: mediawiki-httpd: add and configure mod_remoteip [docker-images/production-images] - https://gerrit.wikimedia.org/r/754897 (https://phabricator.wikimedia.org/T297613)
[11:52:55] <wikibugs>	 (CR) Jbond: [C: -1] "you will first need to create the CA on the root server and upload the public certificate, see:" [puppet] - https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[11:53:08] <wikibugs>	 (PS1) Kosta Harlan: Monitoring: Add '.Save' to distinguish from '.Click' events [extensions/GrowthExperiments] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754906 (https://phabricator.wikimedia.org/T286366)
[11:55:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18768 and previous config saved to /var/cache/conftool/dbconfig/20220118-115529-marostegui.json
[11:55:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:05] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: Your horoscope predicts another unfortunate UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1200).
[12:00:05] <jouncebot>	 kostajh, subbu[m], and nn1l2: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:11] <Lucas_WMDE>	 o/
[12:00:19] <kostajh>	 hi
[12:00:26] <Guest9639>	 hi
[12:00:34] <Lucas_WMDE>	 eight changes, tsk tsk tsk ;)
[12:00:38] * urbanecm waves but can't deploy
[12:00:42] <kostajh>	 We may need to wait for the job updates that hashar is doing
[12:00:57] <taavi>	 I'm also present-ish but would prefer not deploying
[12:01:04] <Lucas_WMDE>	 I can deploy
[12:01:21] * kostajh offers Lucas_WMDE a cookie
[12:01:26] <nn1l2>	 hi
[12:02:08] <wikibugs>	 (PS2) Jbond: nfs-mounts: Used to store facts between all nodes [puppet] - https://gerrit.wikimedia.org/r/754509 (https://phabricator.wikimedia.org/T299390)
[12:02:15] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] Post-edit dialog: Reload page upon dialog closing for structured tasks [extensions/GrowthExperiments] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754129 (https://phabricator.wikimedia.org/T299188) (owner: Kosta Harlan)
[12:02:22] <Lucas_WMDE>	 let’s start with kostajh
[12:04:27] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] fawiki: Add flow-delete right to eliminators (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/753969 (https://phabricator.wikimedia.org/T299223) (owner: 4nn1l2)
[12:06:30] <wikibugs>	 (PS1) Vgutierrez: cache::envoy: Set upstream idle timeout [puppet] - https://gerrit.wikimedia.org/r/754901 (https://phabricator.wikimedia.org/T271421)
[12:06:36] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/754612 (https://phabricator.wikimedia.org/T299247) (owner: 4nn1l2)
[12:06:51] <Lucas_WMDE>	 deploying the commonswiki change while waiting for GrowthExperiments CI
[12:06:58] <wikibugs>	 (PS1) Giuseppe Lavagetto: Add bullseye build [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/754902
[12:07:00] <wikibugs>	 (PS1) Giuseppe Lavagetto: Add build2001 as a target [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/754903
[12:07:04] <Lucas_WMDE>	 nn1l2: is that change testable without actually uploading a file to Commons?
[12:07:18] <nn1l2>	 let me upload a file
[12:07:39] <wikibugs>	 (CR) Vgutierrez: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33294/console" [puppet] - https://gerrit.wikimedia.org/r/754901 (https://phabricator.wikimedia.org/T271421) (owner: Vgutierrez)
[12:07:45] <wikibugs>	 (Merged) jenkins-bot: commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/754612 (https://phabricator.wikimedia.org/T299247) (owner: 4nn1l2)
[12:08:15] <Lucas_WMDE>	 nn1l2: alright, the change should be on mwdebug1001 now
[12:08:21] <Lucas_WMDE>	 please let me know if it works
[12:10:09] <wikibugs>	 (CR) Vgutierrez: [V: +1 C: +2] cache::envoy: Set upstream idle timeout [puppet] - https://gerrit.wikimedia.org/r/754901 (https://phabricator.wikimedia.org/T271421) (owner: Vgutierrez)
[12:10:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P18769 and previous config saved to /var/cache/conftool/dbconfig/20220118-121034-marostegui.json
[12:10:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:11:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:08] <nn1l2>	 There was a problem during the HTTP request: 432
[12:12:16] <nn1l2>	 test failed
[12:12:30] <nn1l2>	 I couldn't upload it
[12:13:03] <wikibugs>	 SRE, SRE-Access-Requests: NRodriguez uses the same SSH key(s) in WMCS and production - https://phabricator.wikimedia.org/T299336 (Jelto) p:Triage→Medium a:NRodriguez
[12:13:30] <wikibugs>	 (CR) Jbond: [C: +2] nfs-mounts: Used to store facts between all nodes [puppet] - https://gerrit.wikimedia.org/r/754509 (https://phabricator.wikimedia.org/T299390) (owner: Jbond)
[12:13:45] <Lucas_WMDE>	 o_O 432 doesn’t seem to be a known HTTP error code
[12:13:59] * Lucas_WMDE peeks at logstash
[12:14:37] <Lucas_WMDE>	 nn1l2: did you use the WikimediaDebug extension?
[12:14:46] <nn1l2>	 yes
[12:15:07] <Lucas_WMDE>	 weird, I only see one log event in the mwdebug logstash dashboard
[12:15:12] <Lucas_WMDE>	 usually there are more events even for regular pageviews
[12:15:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:15:43] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:15:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:47] <nn1l2>	 this is a new error
[12:15:49] <Lucas_WMDE>	 hm, the one event is for commons Special:Upload though, so that is probably your request
[12:15:57] <nn1l2>	 I have not seen it before
[12:16:10] <Lucas_WMDE>	 can you try again, maybe?
[12:16:55] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:16:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:17:32] <nn1l2>	 still the same weird error: There was a problem during the HTTP request: 432
[12:18:08] <Lucas_WMDE>	 very weird
[12:18:57] <Lucas_WMDE>	 I think I’ll sync this change anyways
[12:19:07] <Lucas_WMDE>	 it seems unlikely that this error is due to the addition
[12:19:08] <nn1l2>	 okay
[12:19:15] <Lucas_WMDE>	 I feel like it’s more likely that mwdebug1001 has errors in general
[12:19:21] <Lucas_WMDE>	 hm, maybe you could try mwdebug1002?
[12:19:27] <nn1l2>	 yes
[12:20:24] <Lucas_WMDE>	 kostajh: your other backport (add .save to distinguish…) failed gate-and-submit on master, can you retry that?
[12:20:37] <Lucas_WMDE>	 I’d prefer not to merge the backport before it’s successfully gone into master
[12:20:54] <Lucas_WMDE>	 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/754893
[12:23:30] <kostajh>	 Lucas_WMDE: yeah I +2'ed already
[12:23:40] <Lucas_WMDE>	 ok thanks
[12:25:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T285149)', diff saved to https://phabricator.wikimedia.org/P18770 and previous config saved to /var/cache/conftool/dbconfig/20220118-122538-marostegui.json
[12:25:40] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[12:25:42] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[12:25:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:43] <stashbot>	 T285149: Schema change for dropping rev_page_id index - https://phabricator.wikimedia.org/T285149
[12:25:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T285149)', diff saved to https://phabricator.wikimedia.org/P18771 and previous config saved to /var/cache/conftool/dbconfig/20220118-122546-marostegui.json
[12:25:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:25:55] <Lucas_WMDE>	 nn1l2: any news?
[12:25:56] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Superset for Margeigh Novotny - https://phabricator.wikimedia.org/T299072 (Jelto) p:Triage→Medium @MNovotny_WMF this access request needs some more information before we can proceed.  The mentioned `data access` is a bit broad. Please clarify what da...
[12:26:33] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Prod-Kubernetes, and 3 others: decommission kubestage100[12]-eqiad - https://phabricator.wikimedia.org/T299142 (Aklapper)
[12:26:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T285149)', diff saved to https://phabricator.wikimedia.org/P18772 and previous config saved to /var/cache/conftool/dbconfig/20220118-122654-marostegui.json
[12:26:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:13] <moritzm>	 !log imported docker-report bullseye rebuild to apt.wikimedia.org T298463
[12:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:16] <stashbot>	 T298463: Setup a new build host based on bullseye - https://phabricator.wikimedia.org/T298463
[12:27:51] <nn1l2>	 It says "Copy uploads are not available from this domain."
[12:28:00] <nn1l2>	 test failed again
[12:28:04] <Lucas_WMDE>	 ok, that makes sense at least
[12:28:07] <Lucas_WMDE>	 I’ll just sync it
[12:28:11] <Lucas_WMDE>	 and then you can try it without debug
[12:29:30] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754612|commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist (T299247)]] (duration: 00m 51s)
[12:29:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:34] <stashbot>	 T299247: Add peerj.com to Commons wgCopyUploadsDomains whitelist - https://phabricator.wikimedia.org/T299247
[12:29:42] <wikibugs>	 (Merged) jenkins-bot: Post-edit dialog: Reload page upon dialog closing for structured tasks [extensions/GrowthExperiments] (wmf/1.38.0-wmf.17) - https://gerrit.wikimedia.org/r/754129 (https://phabricator.wikimedia.org/T299188) (owner: Kosta Harlan)
[12:30:01] <Lucas_WMDE>	 alright let’s do the first GrowthExperiments backport
[12:30:09] <Lucas_WMDE>	 nn1l2: please let me know if it works now in the meantime
[12:30:32] <nn1l2>	 it live now?
[12:30:43] <nn1l2>	 it's live now?
[12:31:00] <Lucas_WMDE>	 it should be on all wikis, yes
[12:31:16] <Lucas_WMDE>	 kostajh: the PostEdit JS change should be on mwdebug1001 now, can you test it?
[12:31:23] <kostajh>	 Yes looking
[12:32:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:32:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:33:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:33:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:17] <nn1l2>	 It still fails
[12:33:29] <nn1l2>	 with the same weird error: There was a problem during the HTTP request: 432 
[12:33:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[12:34:04] <Lucas_WMDE>	 weird
[12:34:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:34:27] <Lucas_WMDE>	 I guess that could mean that peerj.com responds with HTTP 432 to the request MediaWiki makes?
[12:34:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:36] <kostajh>	 Lucas_WMDE: looks good, we can sync it
[12:34:42] <Lucas_WMDE>	 ok thanks
[12:35:32] <Lucas_WMDE>	 syncing
[12:36:11] <Lucas_WMDE>	 nn1l2: looks like someone on StackOverflow got the same error https://stackoverflow.com/questions/70718078/download-pdf-from-peerj
[12:36:18] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: [[gerrit:754129|Post-edit dialog: Reload page upon dialog closing for structured tasks (T299188)]] (duration: 00m 51s)
[12:36:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:36:22] <stashbot>	 T299188: [testwiki-wmf.17] Add image - post-edit dialog option "Close and edit this article again" displays "Suggestions are no longer available on this article" - https://phabricator.wikimedia.org/T299188
[12:36:39] <Lucas_WMDE>	 sounds like PeerJ are just blocking requests without the right Referer/User-Agent/who-knows-which-one request header, with an unassigned HTTP error code?
[12:37:13] <Lucas_WMDE>	 which is a shame but ultimately their problem I’d say
[12:37:24] <Lucas_WMDE>	 and we should probably remove the domain again until they go and make their website friendlier
[12:37:30] <nn1l2>	 so we should revert?
[12:37:44] <Lucas_WMDE>	 I think so yeah
[12:37:56] <Lucas_WMDE>	 unless this request came from PeerJ people or we have some contacts there?
[12:38:09] <Lucas_WMDE>	 so that there would be a reasonable chance of getting this resolved on their end
[12:38:22] <nn1l2>	 no, it was just a wikipedian
[12:38:48] <nn1l2>	 I decline the phab request
[12:38:59] <Lucas_WMDE>	 thanks, and please paste the error there
[12:39:09] <Lucas_WMDE>	 revert doesn’t have to be in this window, let’s see how it plays out
[12:39:26] <nn1l2>	 that's good
[12:39:35] <nn1l2>	 let us wait at least 24 hours
[12:40:00] <wikibugs>	 (PS1) Jbond: O:puppet_compiler: mount yaml dir [puppet] - https://gerrit.wikimedia.org/r/754904
[12:40:12] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): azwiki: Add draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/754613 (https://phabricator.wikimedia.org/T299332) (owner: 4nn1l2)
[12:40:39] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33295/console" [puppet] - https://gerrit.wikimedia.org/r/754904 (owner: Jbond)
[12:40:57] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] O:puppet_compiler: mount yaml dir [puppet] - https://gerrit.wikimedia.org/r/754904 (owner: Jbond)
[12:41:41] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] azwiki: Add draft namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/754613 (https://phabricator.wikimedia.org/T299332) (owner: 4nn1l2)
[12:41:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18773 and previous config saved to /var/cache/conftool/dbconfig/20220118-124159-marostegui.json
[12:42:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:42:27] <wikibugs>	 (03Merged) 10jenkins-bot: azwiki: Add draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754613 (https://phabricator.wikimedia.org/T299332) (owner: 104nn1l2)
[12:43:48] <Lucas_WMDE>	 nn1l2: meanwhile, the azwiki draft namespace should be on mwdebug1001, can you test it?
[12:43:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[12:43:55] <Lucas_WMDE>	 (I’m checking it as well)
[12:44:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:01] <nn1l2>	 LGTM
[12:45:15] <Lucas_WMDE>	 alright
[12:45:50] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:45:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:45:51] <Lucas_WMDE>	 syncing
[12:45:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:20] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Monitoring: Add '.Save' to distinguish from '.Click' events [extensions/GrowthExperiments] (wmf/1.38.0-wmf.17) - 10https://gerrit.wikimedia.org/r/754605 (https://phabricator.wikimedia.org/T286366) (owner: 10Kosta Harlan)
[12:46:25] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Monitoring: Add '.Save' to distinguish from '.Click' events [extensions/GrowthExperiments] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754906 (https://phabricator.wikimedia.org/T286366) (owner: 10Kosta Harlan)
[12:46:37] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754613|azwiki: Add draft namespace (T299332)]] (duration: 00m 51s)
[12:46:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:40] <stashbot>	 T299332: Add draft namespace on Azerbaijani Wikipedia - https://phabricator.wikimedia.org/T299332
[12:47:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:47:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:12] <Lucas_WMDE>	 nn1l2: if you want to update https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/753969 (or convince me the comment shouldn’t be split :P) we can deploy that as well
[12:47:20] <Lucas_WMDE>	 while waiting for the gate-and-submit in GrowthExperiments
[12:47:52] <Lucas_WMDE>	 subbu[m]: are you around btw?
[12:48:11] <nn1l2>	 give a min and I'll upload a patch
[12:48:20] <Lucas_WMDE>	 ok
[12:48:24] <Lucas_WMDE>	 hm, subbu’s changes were already merged
[12:48:27] * Lucas_WMDE checks SAL
[12:48:52] <Lucas_WMDE>	 ok, already deployed this morning
[12:49:00] <Lucas_WMDE>	 nothing to do there I guess 🤷
[12:50:16] <Lucas_WMDE>	 (I added a comment to the PeerJ task btw)
[12:50:49] <wikibugs>	 (03PS3) 104nn1l2: fawiki: Add flow-delete right to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753969 (https://phabricator.wikimedia.org/T299223)
[12:51:08] <nn1l2>	 uploaded
[12:51:17] <Lucas_WMDE>	 thx, lgtm
[12:51:22] <Lucas_WMDE>	 waiting for CI there
[12:51:39] <wikibugs>	 (03PS1) 10Jbond: O:puppet_compiler: mount yaml dir [puppet] - 10https://gerrit.wikimedia.org/r/754927
[12:52:00] <moritzm>	 !log installing ghostscript security updates for stretch
[12:52:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:03] <wikibugs>	 (03CR) 104nn1l2: fawiki: Add flow-delete right to eliminators (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753969 (https://phabricator.wikimedia.org/T299223) (owner: 104nn1l2)
[12:52:08] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33296/console" [puppet] - 10https://gerrit.wikimedia.org/r/754927 (owner: 10Jbond)
[12:52:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:52:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:20] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] O:puppet_compiler: mount yaml dir [puppet] - 10https://gerrit.wikimedia.org/r/754927 (owner: 10Jbond)
[12:53:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:53:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:53:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:19] <wikibugs>	 (03PS4) 10Lucas Werkmeister (WMDE): fawiki: Add flow-delete right to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753969 (https://phabricator.wikimedia.org/T299223) (owner: 104nn1l2)
[12:54:23] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] fawiki: Add flow-delete right to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753969 (https://phabricator.wikimedia.org/T299223) (owner: 104nn1l2)
[12:54:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[12:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:01] <kostajh>	 Lucas_WMDE: it finished merging into master
[12:56:11] <Lucas_WMDE>	 ack, thanks
[12:56:22] <Lucas_WMDE>	 (I already +2ed the backports on the assumption it wouldn’t fail again)
[12:56:26] <wikibugs>	 (03Merged) 10jenkins-bot: fawiki: Add flow-delete right to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753969 (https://phabricator.wikimedia.org/T299223) (owner: 104nn1l2)
[12:56:28] <Lucas_WMDE>	 Zuul says 9 more minutes
[12:56:51] <Lucas_WMDE>	 nn1l2: fawiki change is on mwdebug1001, can you test it?
[12:57:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P18774 and previous config saved to /var/cache/conftool/dbconfig/20220118-125703-marostegui.json
[12:57:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:45] <nn1l2>	 Good to go
[12:57:47] <Lucas_WMDE>	 https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%98%D9%87:%D8%A7%D8%AE%D8%AA%DB%8C%D8%A7%D8%B1%D8%A7%D8%AA_%DA%AF%D8%B1%D9%88%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1%DB%8C&uselang=en looks correct to me
[12:57:49] <Lucas_WMDE>	 ok
[12:58:18] <Lucas_WMDE>	 syncing
[12:58:42] <Lucas_WMDE>	 jouncebot: next
[12:58:42] <jouncebot>	 In 3 hour(s) and 1 minute(s): CI server restart (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1600)
[12:58:57] <Lucas_WMDE>	 ok, we’ll overrun the window a bit for the last GrowthExperiments backports but should be okay
[12:59:06] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:753969|fawiki: Add flow-delete right to eliminators (T299223)]] (duration: 00m 51s)
[12:59:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:10] <stashbot>	 T299223: Add flow-delete right to eliminators on fawiki - https://phabricator.wikimedia.org/T299223
[12:59:32] <wikibugs>	 (03PS1) 10Ayounsi: Update requirements [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/754929
[12:59:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[12:59:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:25] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/754929 (owner: 10Ayounsi)
[13:01:41] <wikibugs>	 (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Update requirements [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/754929 (owner: 10Ayounsi)
[13:02:10] <Lucas_WMDE>	 only waiting for those gate-and-submit builds now
[13:02:22] <Lucas_WMDE>	 (is the correct plural gate-and-submits or gates-and-submit 🤔)
[13:02:54] <logmsgbot>	 !log ayounsi@deploy1002 Started deploy [homer/deploy@0f02386]: update requirements
[13:02:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:09] <kostajh>	 no idea
[13:03:26] <Lucas_WMDE>	 on second thought gates-and-submit sounds like a microsoft joke from the early 2000s
[13:04:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[13:04:05] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[13:04:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:22] <logmsgbot>	 !log ayounsi@deploy1002 Finished deploy [homer/deploy@0f02386]: update requirements (duration: 01m 27s)
[13:04:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[13:05:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:17] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
[13:05:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:06] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update requirements - ayounsi@cumin1001
[13:06:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:50] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: changeprop/api-gateway: use the common_images data structure [deployment-charts] - 10https://gerrit.wikimedia.org/r/730559 (https://phabricator.wikimedia.org/T291530)
[13:12:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T285149)', diff saved to https://phabricator.wikimedia.org/P18775 and previous config saved to /var/cache/conftool/dbconfig/20220118-131208-marostegui.json
[13:12:10] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[13:12:11] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[13:12:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:13] <stashbot>	 T285149: Schema change for dropping rev_page_id index - https://phabricator.wikimedia.org/T285149
[13:12:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1168 (T285149)', diff saved to https://phabricator.wikimedia.org/P18776 and previous config saved to /var/cache/conftool/dbconfig/20220118-131215-marostegui.json
[13:12:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:05] <moritzm>	 !log installing python-babel security updates on buster
[13:14:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:47] <Lucas_WMDE>	 kostajh: looks like the backports are about to merge
[13:17:32] <wikibugs>	 (03Merged) 10jenkins-bot: Monitoring: Add '.Save' to distinguish from '.Click' events [extensions/GrowthExperiments] (wmf/1.38.0-wmf.17) - 10https://gerrit.wikimedia.org/r/754605 (https://phabricator.wikimedia.org/T286366) (owner: 10Kosta Harlan)
[13:17:35] <wikibugs>	 (03Merged) 10jenkins-bot: Monitoring: Add '.Save' to distinguish from '.Click' events [extensions/GrowthExperiments] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754906 (https://phabricator.wikimedia.org/T286366) (owner: 10Kosta Harlan)
[13:17:37] <Lucas_WMDE>	 I’m guessing the wmf.18 one won’t really be testable on its own
[13:17:40] <kostajh>	 good prediction
[13:17:42] <Lucas_WMDE>	 but wmf.17 should be okay hopefully
[13:17:45] <kostajh>	 yeah nothing to test for wmf.18
[13:18:02] <kostajh>	 I can try a test for wmf.17, sure
[13:18:41] <Lucas_WMDE>	 ok, wmf.17 should be on mwdebug1001 now
[13:20:17] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[13:20:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:20] <kostajh>	 Lucas_WMDE: thanks, looking
[13:20:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T285149)', diff saved to https://phabricator.wikimedia.org/P18777 and previous config saved to /var/cache/conftool/dbconfig/20220118-132026-marostegui.json
[13:20:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:33] <stashbot>	 T285149: Schema change for dropping rev_page_id index - https://phabricator.wikimedia.org/T285149
[13:21:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[13:21:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[13:21:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[13:22:42] <kostajh>	 Lucas_WMDE: it works
[13:22:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:49] <Lucas_WMDE>	 ack
[13:24:39] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized php-1.38.0-wmf.17/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:754605|Monitoring: Add '.Save' to distinguish from '.Click' events (T286366)]] (duration: 00m 54s)
[13:24:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:43] <stashbot>	 T286366: Implement product key performance indicator monitoring for Growth features in Grafana - https://phabricator.wikimedia.org/T286366
[13:25:08] <Lucas_WMDE>	 hrm, php-1.38.0-wmf.18 does not exist yet
[13:25:17] <Lucas_WMDE>	 on deploy1002
[13:25:40] <Lucas_WMDE>	 I think that means it’s okay to leave it alone and it’ll make it into the train automatically
[13:26:02] <Lucas_WMDE>	 but pinging the conductors just in case
[13:26:23] <Lucas_WMDE>	 jeena, twentyafterfour: I merged the wmf.18 backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/754906 before wmf.18 existed on deploy1002, I hope that’s okay
[13:26:59] <Lucas_WMDE>	 !log UTC morning backport window done
[13:27:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[13:27:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:51] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[13:28:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[13:28:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:34] <kostajh>	 thank you Lucas_WMDE 
[13:29:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[13:30:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:04] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Add grafana-worldmap-panel [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/754892 (https://phabricator.wikimedia.org/T251184) (owner: 10Ayounsi)
[13:32:13] <jinxer-wm>	 (IcingaOverload) firing: Checks are taking long to execute on alert2001:9245  - https://grafana.wikimedia.org/d/rsCfQfuZz/icinga - https://alerts.wikimedia.org
[13:32:55] <Lucas_WMDE>	 np :)
[13:33:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[13:34:45] <hashar>	 ^ eek
[13:35:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18778 and previous config saved to /var/cache/conftool/dbconfig/20220118-133531-marostegui.json
[13:35:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:35] <hashar>	 looks like a spike every hour or so
[13:37:13] <jinxer-wm>	 (IcingaOverload) resolved: Checks are taking long to execute on alert2001:9245  - https://grafana.wikimedia.org/d/rsCfQfuZz/icinga - https://alerts.wikimedia.org
[13:40:24] <wikibugs>	 (03CR) 10Ayounsi: [V: 03+2 C: 03+2] Add grafana-worldmap-panel [debs/grafana-plugins] - 10https://gerrit.wikimedia.org/r/754892 (https://phabricator.wikimedia.org/T251184) (owner: 10Ayounsi)
[13:43:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[13:45:43] <twentyafterfour>	 Lucas_WMDE: I think that's fine 
[13:46:16] <XioNoX>	 !log add grafana-plugins 0.3 (with worldmap plugin) to reprepro
[13:46:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:04] <wikibugs>	 (03PS1) 10Kormat: Prepare for 0.8 release. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/754935
[13:47:32] <wikibugs>	 (03PS2) 10Kormat: Prepare for 0.8 release. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/754935 (https://phabricator.wikimedia.org/T297605)
[13:49:19] <wikibugs>	 (03CR) 10Elukey: role::pki::multirootca: add dedicated profile for ml-serve k8s (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: 10Elukey)
[13:50:25] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] Prepare for 0.8 release. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/754935 (https://phabricator.wikimedia.org/T297605) (owner: 10Kormat)
[13:50:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P18779 and previous config saved to /var/cache/conftool/dbconfig/20220118-135036-marostegui.json
[13:50:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:00] <wikibugs>	 (03Merged) 10jenkins-bot: Prepare for 0.8 release. [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/754935 (https://phabricator.wikimedia.org/T297605) (owner: 10Kormat)
[13:53:37] <wikibugs>	 (03PS1) 10Vgutierrez: cache::envoy: Decrease upstream idle_timeout to 30s [puppet] - 10https://gerrit.wikimedia.org/r/754938 (https://phabricator.wikimedia.org/T271421)
[13:55:05] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] cache::envoy: Decrease upstream idle_timeout to 30s [puppet] - 10https://gerrit.wikimedia.org/r/754938 (https://phabricator.wikimedia.org/T271421) (owner: 10Vgutierrez)
[13:55:30] <XioNoX>	 !log update grafana-plugins on grafana hosts - T251184
[13:55:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:55:33] <stashbot>	 T251184: Add Grafana worldmap panel - https://phabricator.wikimedia.org/T251184
[13:58:40] <wikibugs>	 (03PS2) 10JMeybohm: Update codfw kubernetes master to a full node [puppet] - 10https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967)
[14:05:12] <wikibugs>	 (03CR) 10Ottomata: P:installserver::proxy: Add domain whitelist to proxy (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/753029 (https://phabricator.wikimedia.org/T298087) (owner: 10Jbond)
[14:05:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T285149)', diff saved to https://phabricator.wikimedia.org/P18780 and previous config saved to /var/cache/conftool/dbconfig/20220118-140540-marostegui.json
[14:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:51] <stashbot>	 T285149: Schema change for dropping rev_page_id index - https://phabricator.wikimedia.org/T285149
[14:06:38] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Nick Ray - https://phabricator.wikimedia.org/T299186 (10Ottomata) Approved
[14:06:50] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 04-1] "PCC SUCCESS (NOOP 1 DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33298/console" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: 10Elukey)
[14:07:56] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: 10Elukey)
[14:10:06] <moritzm>	 !log installing vim security updates on stretch
[14:10:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:13] <wikibugs>	 (03PS1) 10JMeybohm: Add keys needed for k8s node profile to master nodes [labs/private] - 10https://gerrit.wikimedia.org/r/754943
[14:14:42] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+2 C: 03+2] Add keys needed for k8s node profile to master nodes [labs/private] - 10https://gerrit.wikimedia.org/r/754943 (owner: 10JMeybohm)
[14:15:01] <wikibugs>	 (03PS3) 10JMeybohm: Update codfw kubernetes master to a full node [puppet] - 10https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967)
[14:16:45] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: wmcs: factorize common arguments [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/754473
[14:16:47] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: wmcs: toolforge: grid: introduce cookbook to repool a node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/754555 (https://phabricator.wikimedia.org/T298948)
[14:16:49] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: toolforge: grid: introduce cookbook to verify basic grid health [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/754944 (https://phabricator.wikimedia.org/T298948)
[14:18:19] <wikibugs>	 10SRE-swift-storage, 10Data-Engineering, 10Data-Engineering-Kanban: Deploy research_poc Swift credidentials to Hadoop - https://phabricator.wikimedia.org/T296945 (10Ottomata) Hm, perhaps, although I'm not sure where.  This is sort of a one off.  We'd love to have more first class support for exporting to swi...
[14:18:28] <wikibugs>	 10SRE-swift-storage, 10Data-Engineering, 10Data-Engineering-Kanban: Deploy research_poc Swift credidentials to Hadoop - https://phabricator.wikimedia.org/T296945 (10Ottomata) 05Open→03Resolved
[14:18:32] <wikibugs>	 10SRE-swift-storage: Storage request for datasets published by research team - https://phabricator.wikimedia.org/T294380 (10Ottomata)
[14:19:04] <wikibugs>	 (03PS1) 10JMeybohm: Add kubestagemaster2001 to k8s_staging iBGP config [homer/public] - 10https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967)
[14:19:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add kubestagemaster2001 to k8s_staging iBGP config [homer/public] - 10https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967) (owner: 10JMeybohm)
[14:20:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wmcs: toolforge: grid: introduce cookbook to verify basic grid health [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/754944 (https://phabricator.wikimedia.org/T298948) (owner: 10Arturo Borrero Gonzalez)
[14:21:16] <wikibugs>	 (03PS2) 10JMeybohm: Add kubestagemaster2001 to k8s_staging iBGP config [homer/public] - 10https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967)
[14:25:54] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Superset for Margeigh Novotny - https://phabricator.wikimedia.org/T299072 (10cmooney) @Jelto thanks for picking this up.  I disucssed briefly with Margeigh on Slack and she confirmed she needs access to dashboards with private data in Superset.  So I believe w...
[14:27:21] <wikibugs>	 (03PS1) 10Jbond: hieradata pcc: add deployment prep [puppet] - 10https://gerrit.wikimedia.org/r/754948
[14:27:54] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] hieradata pcc: add deployment prep [puppet] - 10https://gerrit.wikimedia.org/r/754948 (owner: 10Jbond)
[14:28:01] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "Minor pedantic nit, otherwise +1" [homer/public] - 10https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967) (owner: 10JMeybohm)
[14:28:05] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics clients for mfossati - https://phabricator.wikimedia.org/T299343 (10Ottomata) Approved!
[14:28:48] <moritzm>	 !log installing xorg-server security updates on stretch
[14:28:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:14] <wikibugs>	 (03CR) 10Ottomata: ":)" [puppet] - 10https://gerrit.wikimedia.org/r/753738 (owner: 10Elukey)
[14:29:25] <wikibugs>	 (03PS4) 10JMeybohm: Update codfw kubernetes master to a full node [puppet] - 10https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967)
[14:30:41] <wikibugs>	 (03PS1) 10Jbond: O:uppetmaster::standalone: Add upload_facts parameter [puppet] - 10https://gerrit.wikimedia.org/r/754949
[14:31:42] <moritzm>	 !log installing rsync security updates on stretch
[14:31:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:06] <kormat>	 !log uploaded wmfmariadbpy 0.8 to apt.wm.o
[14:33:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:18] <kormat>	 !log Deploying wmfmariadbpy 0.8 T299406
[14:33:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:21] <stashbot>	 T299406: Deploy wmfmariadbpy 0.8 - https://phabricator.wikimedia.org/T299406
[14:33:26] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33301/console" [puppet] - 10https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967) (owner: 10JMeybohm)
[14:33:33] <wikibugs>	 10SRE, 10Data-Engineering: Allow kafka brokers to reload the TLS keystore - https://phabricator.wikimedia.org/T299409 (10elukey)
[14:33:35] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] O:uppetmaster::standalone: Add upload_facts parameter [puppet] - 10https://gerrit.wikimedia.org/r/754949 (owner: 10Jbond)
[14:34:53] <wikibugs>	 (03PS3) 10JMeybohm: Add kubestagemaster2001 to k8s_staging eBGP config [homer/public] - 10https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967)
[14:35:58] <wikibugs>	 (03CR) 10JMeybohm: Add kubestagemaster2001 to k8s_staging eBGP config (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967) (owner: 10JMeybohm)
[14:36:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[14:41:32] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Nick Ray - https://phabricator.wikimedia.org/T299186 (10Jelto)
[14:46:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
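(Editor's aside: hashar's 13:35 remark that the alert "looks like a spike every hour or so" can be checked against the log itself. The sketch below is one possible way to do that, assuming the line layout seen in this log; the helper names `firing_times` and `gaps_in_minutes` are hypothetical, the sample lines are shortened copies of entries above, and all timestamps are assumed to fall on the same day.)

```python
import re
from datetime import datetime

# Matches lines such as "[13:33:55] <jinxer-wm>\t (AlertName) firing: ..."
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] <jinxer-wm>\s+\((\w+)\) firing:")


def firing_times(lines, alert):
    """Collect the timestamps at which the named alert started firing."""
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == alert:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Minutes between consecutive firings (no handling of midnight wrap)."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


sample = [
    "[13:33:55] <jinxer-wm>\t (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors",
    "[13:43:55] <jinxer-wm>\t (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors",
    "[14:36:55] <jinxer-wm>\t (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors",
]

gaps = gaps_in_minutes(firing_times(sample, "LogstashIndexingFailures"))
print(gaps)  # [63.0]
```

The two firings in the sample are 63 minutes apart, which is in line with the "every hour or so" estimate; running the same functions over a full day's log would give a better picture.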
[14:49:52] <wikibugs>	 (03PS4) 10Eigyan: [wmf-config] Deploy the cawiki test safety survey to production. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753543 (https://phabricator.wikimedia.org/T296657)
[14:55:12] <wikibugs>	 (03PS1) 10Jelto: admin: Shell account and analytics-privatedata-users for nray [puppet] - 10https://gerrit.wikimedia.org/r/754954 (https://phabricator.wikimedia.org/T299186)
[14:55:18] <wikibugs>	 (03CR) 10JMeybohm: role::pki::multirootca: add dedicated profile for ml-serve k8s (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: 10Elukey)
[14:56:50] <wikibugs>	 (03CR) 10Elukey: role::pki::multirootca: add dedicated profile for ml-serve k8s (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: 10Elukey)
[14:57:09] <hashar>	 jouncebot: now
[14:57:09] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 2 minute(s)
[14:57:12] <hashar>	 jouncebot: next
[14:57:12] <jouncebot>	 In 1 hour(s) and 2 minute(s): CI server restart (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1600)
[14:58:17] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics clients for mfossati - https://phabricator.wikimedia.org/T299343 (Jelto)
[14:58:27] <wikibugs>	 (CR) JMeybohm: [C: +1] helmfile.d: deploy cert-manager for ml-serve nodes [deployment-charts] - https://gerrit.wikimedia.org/r/754890 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[15:02:19] <wikibugs>	 (PS1) Jelto: admin: Shell account and analytics-privatedata-users for mfossati [puppet] - https://gerrit.wikimedia.org/r/754955 (https://phabricator.wikimedia.org/T299343)
[15:04:12] <wikibugs>	 (PS1) Kormat: dbutil: read_section_ports_list() bug when path not supplied [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754956
[15:06:20] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/754957
[15:06:46] <wikibugs>	 (PS1) Jbond: C:cfssl:signer: update default expiry [puppet] - https://gerrit.wikimedia.org/r/754958
[15:07:28] <wikibugs>	 (CR) Jbond: [V: +1 C: +1] role::pki::multirootca: add dedicated profile for ml-serve k8s (1 comment) [puppet] - https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[15:07:30] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33302/console" [puppet] - https://gerrit.wikimedia.org/r/754958 (owner: Jbond)
[15:09:01] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] C:cfssl:signer: update default expiry [puppet] - https://gerrit.wikimedia.org/r/754958 (owner: Jbond)
[15:09:54] <wikibugs>	 (CR) Elukey: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967) (owner: JMeybohm)
[15:12:28] <wikibugs>	 (PS1) AOkoth: kuberenetes: disable mwautopull timer [puppet] - https://gerrit.wikimedia.org/r/754960 (https://phabricator.wikimedia.org/T288345)
[15:14:50] <wikibugs>	 (CR) AOkoth: "https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33303/console" [puppet] - https://gerrit.wikimedia.org/r/754960 (https://phabricator.wikimedia.org/T288345) (owner: AOkoth)
[15:15:06] <wikibugs>	 (CR) Klausman: [C: +1] dbutil: read_section_ports_list() bug when path not supplied [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754956 (owner: Kormat)
[15:16:17] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] Add kubestagemaster2001 to k8s_staging eBGP config [homer/public] - https://gerrit.wikimedia.org/r/754945 (https://phabricator.wikimedia.org/T290967) (owner: JMeybohm)
[15:18:53] <wikibugs>	 (CR) Kormat: [C: +2] dbutil: read_section_ports_list() bug when path not supplied [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754956 (owner: Kormat)
[15:20:52] <wikibugs>	 (CR) JMeybohm: kuberenetes: disable mwautopull timer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/754960 (https://phabricator.wikimedia.org/T288345) (owner: AOkoth)
[15:21:36] <wikibugs>	 (Merged) jenkins-bot: dbutil: read_section_ports_list() bug when path not supplied [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/754956 (owner: Kormat)
[15:25:10] <wikibugs>	 (CR) Elukey: [C: +1] "LGTM, I didn't spot anything strange. Couple of notes from IRC:" [puppet] - https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967) (owner: JMeybohm)
[15:27:50] <wikibugs>	 (CR) Jbond: "looks good but see nit" [puppet] - https://gerrit.wikimedia.org/r/754954 (https://phabricator.wikimedia.org/T299186) (owner: Jelto)
[15:29:36] <wikibugs>	 (CR) Elukey: [C: +2] role::pki::multirootca: add dedicated profile for ml-serve k8s [puppet] - https://gerrit.wikimedia.org/r/754885 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[15:32:29] <wikibugs>	 (CR) Jbond: [C: -1] "-1 see comment i think this will need `krb: present`" [puppet] - https://gerrit.wikimedia.org/r/754955 (https://phabricator.wikimedia.org/T299343) (owner: Jelto)
[15:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[15:35:04] <wikibugs>	 (CR) Elukey: [C: +2] helmfile.d: deploy cert-manager for ml-serve nodes [deployment-charts] - https://gerrit.wikimedia.org/r/754890 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[15:35:11] <godog>	 !log regenerate kartotherian certs via cergen - T297604
[15:35:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:15] <stashbot>	 T297604: cergen should include the cert's name in SAN too - https://phabricator.wikimedia.org/T297604
[15:35:17] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=cfssl site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:36:31] <icinga-wm>	 PROBLEM - Check systemd state on pki1001 is CRITICAL: CRITICAL - degraded: The following units failed: cfssl-multirootca.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:37:39] <icinga-wm>	 PROBLEM - Check systemd state on pki2001 is CRITICAL: CRITICAL - degraded: The following units failed: cfssl-multirootca.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:37:42] <wikibugs>	 (PS5) JMeybohm: Update codfw kubernetes master to a full node [puppet] - https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967)
[15:39:27] <wikibugs>	 (CR) JMeybohm: Update codfw kubernetes master to a full node (1 comment) [puppet] - https://gerrit.wikimedia.org/r/754556 (https://phabricator.wikimedia.org/T290967) (owner: JMeybohm)
[15:40:34] <wikibugs>	 (CR) Eevans: [C: +1] partman: use reuse profiles on all restbase hosts [puppet] - https://gerrit.wikimedia.org/r/753986 (https://phabricator.wikimedia.org/T295375) (owner: Hnowlan)
[15:43:18] <wikibugs>	 (CR) Ahmon Dancy: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/754894 (https://phabricator.wikimedia.org/T298463) (owner: Muehlenhoff)
[15:44:21] <wikibugs>	 (CR) Muehlenhoff: [C: +2] scap: No longer install dependencies via Puppet [puppet] - https://gerrit.wikimedia.org/r/754894 (https://phabricator.wikimedia.org/T298463) (owner: Muehlenhoff)
[15:44:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[15:45:13] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
[15:45:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:16] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 02s)
[15:45:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:27] <wikibugs>	 (PS2) Jelto: admin: Shell account and analytics-privatedata-users for mfossati [puppet] - https://gerrit.wikimedia.org/r/754955 (https://phabricator.wikimedia.org/T299343)
[15:45:49] <wikibugs>	 (PS1) Filippo Giunchedi: ssl: update kartotherian cert [puppet] - https://gerrit.wikimedia.org/r/754968 (https://phabricator.wikimedia.org/T297604)
[15:46:55] <wikibugs>	 (CR) Jelto: admin: Shell account and analytics-privatedata-users for mfossati (1 comment) [puppet] - https://gerrit.wikimedia.org/r/754955 (https://phabricator.wikimedia.org/T299343) (owner: Jelto)
[15:47:21] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] ssl: update kartotherian cert [puppet] - https://gerrit.wikimedia.org/r/754968 (https://phabricator.wikimedia.org/T297604) (owner: Filippo Giunchedi)
[15:47:43] <andrewbogott>	 !log resizing the wikitech-static host for T298052
[15:47:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:56] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/754902 (owner: Giuseppe Lavagetto)
[15:48:09] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/754903 (owner: Giuseppe Lavagetto)
[15:50:01] <logmsgbot>	 !log aqu@deploy1002 Started deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided)
[15:50:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:10] <logmsgbot>	 !log aqu@deploy1002 Finished deploy [airflow-dags/analytics-test@3ad07a0]: (no justification provided) (duration: 00m 09s)
[15:50:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:20] <moritzm>	 !log installing libssh2 security updates on stretch
[15:54:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:49] <godog>	 !log update kartotherian certs on maps hosts and roll-reload nginx - T297604
[15:54:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:52] <stashbot>	 T297604: cergen should include the cert's name in SAN too - https://phabricator.wikimedia.org/T297604
[15:55:16] <wikibugs>	 (PS1) MMandere: cumin: Add cache::upload_envoy to cp aliases [puppet] - https://gerrit.wikimedia.org/r/754975 (https://phabricator.wikimedia.org/T271421)
[15:56:54] <wikibugs>	 SRE, ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) @hashar let me know when this is offline so i can take over
[15:57:18] <wikibugs>	 (PS2) Jbond: P:rsyslog: add squid to the list of programs sent to central log [puppet] - https://gerrit.wikimedia.org/r/754521 (https://phabricator.wikimedia.org/T298087)
[15:57:25] <wikibugs>	 (CR) jerkins-bot: [V: -1] cumin: Add cache::upload_envoy to cp aliases [puppet] - https://gerrit.wikimedia.org/r/754975 (https://phabricator.wikimedia.org/T271421) (owner: MMandere)
[15:57:39] <wikibugs>	 (CR) Jbond: "no need for mtail as there is already squid exporter, as such i think this is it?" [puppet] - https://gerrit.wikimedia.org/r/754521 (https://phabricator.wikimedia.org/T298087) (owner: Jbond)
[15:59:20] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:rsyslog: add squid to the list of programs sent to central log [puppet] - https://gerrit.wikimedia.org/r/754521 (https://phabricator.wikimedia.org/T298087) (owner: Jbond)
[15:59:52] <hashar>	 !log Shutting down CI for maintenance on contint2001  # T283582
[15:59:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:56] <stashbot>	 T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582
[16:00:04] <jouncebot>	 papaul and hashar: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) CI server restart deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1600).
[16:01:02] <wikibugs>	 (CR) Hnowlan: [C: +2] partman: use reuse profiles on all restbase hosts [puppet] - https://gerrit.wikimedia.org/r/753986 (https://phabricator.wikimedia.org/T295375) (owner: Hnowlan)
[16:02:35] <wikibugs>	 SRE, ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar) @Papaul the machine is shutting down. I am on IRC if you want t...
[16:03:07] <moritzm>	 !log installing xen security updates on buster (client-side libraries)
[16:03:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:47] <icinga-wm>	 PROBLEM - Host contint2001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:04:35] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations, Patch-For-Review, User-jbond: Hieradata yaml style checking - https://phabricator.wikimedia.org/T236954 (jhathaway) > From: @Joe  > I'm generally not a big fan of reformatting patches, because of how hard they make to reconstruct git history. However,...
[16:07:00] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: wmcs: toolforge: grid: introduce cookbook to verify basic grid health [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/754944 (https://phabricator.wikimedia.org/T298948)
[16:07:38] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
[16:07:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:56] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
[16:09:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:40] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
[16:10:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:44] <logmsgbot>	 !log hnowlan@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
[16:11:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:42] <wikibugs>	 (PS1) Elukey: helmfile.d: add 'cert-manager' namespace to ml-serve [deployment-charts] - https://gerrit.wikimedia.org/r/754981 (https://phabricator.wikimedia.org/T298976)
[16:13:21] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
[16:13:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:37] <logmsgbot>	 !log hnowlan@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
[16:14:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:34] <wikibugs>	 (PS6) DCausse: blazegraph: prometheus exporter may bypass nginx [puppet] - https://gerrit.wikimedia.org/r/754523
[16:21:49] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
[16:21:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:35] <wikibugs>	 (CR) Herron: [C: +1] hieradata: use / as miscweb health check [puppet] - https://gerrit.wikimedia.org/r/754881 (https://phabricator.wikimedia.org/T291946) (owner: Filippo Giunchedi)
[16:23:51] <wikibugs>	 (CR) Herron: [C: +1] Revert "ProductionServices: use graphite2003 for statsd" [mediawiki-config] - https://gerrit.wikimedia.org/r/754879 (https://phabricator.wikimedia.org/T299383) (owner: Filippo Giunchedi)
[16:23:59] <wikibugs>	 (CR) Herron: [C: +1] wmnet: move reads to graphite1004 [dns] - https://gerrit.wikimedia.org/r/754874 (https://phabricator.wikimedia.org/T299383) (owner: Filippo Giunchedi)
[16:24:07] <wikibugs>	 (CR) Herron: [C: +1] wmnet: move writes to graphite1004 [dns] - https://gerrit.wikimedia.org/r/754875 (https://phabricator.wikimedia.org/T299383) (owner: Filippo Giunchedi)
[16:24:15] <wikibugs>	 (CR) Herron: [C: +1] Revert "graphite: check graphite2003 metrics" [puppet] - https://gerrit.wikimedia.org/r/754876 (https://phabricator.wikimedia.org/T299383) (owner: Filippo Giunchedi)
[16:24:31] <wikibugs>	 (CR) Herron: [C: +1] Revert "profile: move statsd writes to graphite2003" [puppet] - https://gerrit.wikimedia.org/r/754877 (https://phabricator.wikimedia.org/T299383) (owner: Filippo Giunchedi)
[16:32:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[16:37:06] <wikibugs>	 ops-codfw: Possible cable issue on restbase2010 management interface - https://phabricator.wikimedia.org/T299426 (hnowlan)
[16:41:22] <wikibugs>	 (CR) Herron: "This would work to send squid syslog messages to the kafka logging (logstash) pipeline, but wouldn't affect centrallog as the description " [puppet] - https://gerrit.wikimedia.org/r/754521 (https://phabricator.wikimedia.org/T298087) (owner: Jbond)
[16:42:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[16:43:00] <wikibugs>	 ops-codfw, Lift-Wing: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (klausman)
[16:43:06] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations, Patch-For-Review, User-jbond: Hieradata yaml style checking - https://phabricator.wikimedia.org/T236954 (jhathaway) >>! In T236954#7625612, @jbond wrote: > Thanks for the work on this looks really good, in relation to linting vs automatic formatting i...
[16:44:02] <urbanecm>	 jouncebot: nowandnext
[16:44:02] <jouncebot>	 For the next 0 hour(s) and 15 minute(s): CI server restart (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1600)
[16:44:02] <jouncebot>	 In 0 hour(s) and 15 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1700)
[16:45:47] <logmsgbot>	 !log klausman@cumin2001 START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
[16:45:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:52] <icinga-wm>	 RECOVERY - Host contint2001 is UP: PING OK - Packet loss = 0%, RTA = 31.66 ms
[16:45:55] <wikibugs>	 SRE, ops-codfw, Lift-Wing: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (ops-monitoring-bot) Host rebooted by klausman@cumin2001 with reason: Reboot to clear ECC state in dmesg
[16:46:53] <wikibugs>	 SRE, ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) reset IDRAC, upgrade BIOS and IDRAC.
[16:47:07] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations, Patch-For-Review, User-jbond: Hieradata yaml style checking - https://phabricator.wikimedia.org/T236954 (jhathaway) >>! In T236954#7626447, @fgiunchedi wrote: > 100% agreed on consistency, I like the general idea and wanted to say +1 on not removing bl...
[16:47:13] <wikibugs>	 (CR) SBassett: [C: -1] doc.wikimedia.org CSP: Allow XHR requests to Wikipedia and Wikidata (1 comment) [puppet] - https://gerrit.wikimedia.org/r/754048 (https://phabricator.wikimedia.org/T285570) (owner: Catrope)
[16:47:17] <wikibugs>	 (PS1) Btullis: Deploy the dev version of cassandra to aqs1010.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/754988 (https://phabricator.wikimedia.org/T298516)
[16:47:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] Deploy the dev version of cassandra to aqs1010.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/754988 (https://phabricator.wikimedia.org/T298516) (owner: Btullis)
[16:47:43] <logmsgbot>	 !log hnowlan@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2010.codfw.wmnet with OS buster
[16:47:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:05] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
[16:48:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:48:20] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on contint2001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[16:48:25] <jinxer-wm>	 (LogstashIndexingFailures) firing: (2) Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[16:48:26] <icinga-wm>	 PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:48:40] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[16:49:28] <logmsgbot>	 !log hnowlan@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2010.codfw.wmnet with OS buster
[16:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:52] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM.  Key matches the one Nick sent me over WMF Slack." [puppet] - https://gerrit.wikimedia.org/r/754954 (https://phabricator.wikimedia.org/T299186) (owner: Jelto)
[16:51:21] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Nick Ray - https://phabricator.wikimedia.org/T299186 (cmooney) Nick has sent me his key over Slack (responding to query from last week).  I can confirm it matches the one in the Gerrit patch.
[16:51:41] <wikibugs>	 (CR) Btullis: [V: +1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33304/console" [puppet] - https://gerrit.wikimedia.org/r/754988 (https://phabricator.wikimedia.org/T298516) (owner: Btullis)
[16:52:20] <hashar>	 !log contint2001: restarted ferm service
[16:52:21] <wikibugs>	 (CR) JHathaway: Hieradata: format yaml with vinyl (4 comments) [puppet] - https://gerrit.wikimedia.org/r/754114 (https://phabricator.wikimedia.org/T236954) (owner: JHathaway)
[16:52:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:56] <icinga-wm>	 RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:53:01] <logmsgbot>	 !log hnowlan@cumin1001 START - Cookbook sre.hosts.reimage for host restbase2010.codfw.wmnet with OS buster
[16:53:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:14] <logmsgbot>	 !log klausman@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
[16:53:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:34] <wikibugs>	 SRE, ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar) Open→Resolved a:Papaul I have restarted ferm.  Zuul...
[16:56:58] <wikibugs>	 SRE, ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) @hashar no problem you can close the task once all is back onli...
[16:57:17] <wikibugs>	 (CR) Elukey: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/754981 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[16:58:34] <wikibugs>	 SRE, ops-codfw, Lift-Wing: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (klausman) `root@ml-serve2001:/sys/devices/system/edac/mc# grep .  mc*/*count mc0/ce_count:0 mc0/ce_noinfo_count:0 mc0/ue_count:0 mc0/ue_noinfo_count:0 mc1/ce_count:0 mc1/ce_noinfo_coun...
[17:00:04] <jouncebot>	 jbond and rzl: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:03:09] <wikibugs>	 SRE, ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:07:21] <wikibugs>	 SRE, ops-codfw: Possible cable issue on restbase2010 management interface - https://phabricator.wikimedia.org/T299426 (Papaul) @hnowlan looks like an IDRAC reset and firmware upgrade too on this server will fix the issue   PE R430 purchased in 2016
[17:08:22] <wikibugs>	 (CR) Klausman: [C: +1] helmfile.d: add 'cert-manager' namespace to ml-serve [deployment-charts] - https://gerrit.wikimedia.org/r/754981 (https://phabricator.wikimedia.org/T298976) (owner: Elukey)
[17:09:06] <wikibugs>	 (CR) DCausse: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/754523 (owner: DCausse)
[17:09:45] <wikibugs>	 SRE, ops-codfw, Lift-Wing: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (Papaul) confirmed all green in IDRAC
[17:12:46] <urbanecm>	 hashar: hi, is CI supposed to be fine at this point?
[17:14:01] <wikibugs>	 (PS2) Andrew Bogott: cloud-vps nfsclient: switch to using the VM-hosted scratch NFS server [puppet] - https://gerrit.wikimedia.org/r/754043 (https://phabricator.wikimedia.org/T291405)
[17:14:04] <wikibugs>	 (PS1) Andrew Bogott: wmcs nfsclient: remove a long-absented mount [puppet] - https://gerrit.wikimedia.org/r/754991
[17:15:45] <wikibugs>	 (PS1) Ppchelko: First pass on creating config-schema.yaml [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754910
[17:16:14] <wikibugs>	 (PS1) Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754911
[17:16:28] <wikibugs>	 (PS1) Joal: Reset druid load jobs for network_flows_internal [puppet] - https://gerrit.wikimedia.org/r/754994 (https://phabricator.wikimedia.org/T263277)
[17:16:33] <wikibugs>	 (PS2) Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754911
[17:16:37] <moritzm>	 !log installing gmp security updates
[17:16:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:38] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on contint2001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[17:23:59] <wikibugs>	 (PS3) Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - https://gerrit.wikimedia.org/r/754911
[17:25:37] <wikibugs>	 (PS1) Cwhite: logstash: gitlab: rename service field prior to populating object [puppet] - https://gerrit.wikimedia.org/r/754995
[17:26:02] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: wmcs: toolforge: grid: introduce cookbook to verify basic grid health [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/754944 (https://phabricator.wikimedia.org/T298948)
[17:32:35] <Lucas_WMDE>	 urbanecm: according to hashar’s email, “the maintenance is complete”, but you’re not the only one having issues :/
[17:32:48] <urbanecm>	 i can't get jenkins to run anything
[17:32:55] <hashar>	 maybe something is still broken
[17:32:56] <icinga-wm>	 PROBLEM - puppet last run on mx1001 is CRITICAL: CRITICAL: Puppet last ran 5 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[17:32:57] <urbanecm>	 and https://integration.wikimedia.org/zuul/ is empty
[17:33:03] <Lucas_WMDE>	 yeah I think so
[17:33:14] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6982 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[17:33:16] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: -1] "I dislike this approach. We need better code/config decoupling." [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/754944 (https://phabricator.wikimedia.org/T298948) (owner: Arturo Borrero Gonzalez)
[17:33:33] <hashar>	 I swear I have seen changes being tested and jobs being triggered
[17:34:06] <dancy>	 A couple of jobs processed but then everything stopped.
[17:34:14] <hashar>	 hmm
[17:34:31] <hashar>	 indeed the zuul scheduler is idle
[17:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[17:37:38] <hashar>	 !log restarted zuul on contint2001
[17:37:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:50] <hashar>	 Jan 18 17:37:32 contint2001 zuul-server[20572]: /srv/deployment/zuul/venv/local/lib/python2.7/site-packages/paramiko/client.py:685: UserWarning: Unknown ssh-rsa host key for [gerrit.wikimedia.org]:29418: dce9687b991b27d0f9fdce6a2ebf92e1
[17:37:52] <hashar>	 ...
[17:38:06] <hashar>	 oh joy
[17:38:23] <dancy>	 ew.
[17:38:42] <hashar>	 I have no idea how Gerrit ssh host key might have changed
[17:38:58] <hashar>	 or the zuul user no longer knows about it
[17:39:14] <dancy>	 if it really did change then I should get a complaint when I try to clone a repo.
[17:39:15] <dancy>	 <testing>
[17:39:16] <icinga-wm>	 RECOVERY - puppet last run on mx1001 is OK: OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[17:39:34] <dancy>	 No complaint received.
[17:40:21] <Lucas_WMDE>	 not seeing a changed gerrit host key here either
[17:40:24] <hashar>	 the scheduler runs as zuul  the list of known hosts is in /var/lib/zuul/.ssh/known_hosts  last touched in July 2020
[17:40:39] <hashar>	 maybe that is a false warning ;)
[17:40:58] <hashar>	 zuul-serv 20572 zuul   17u  IPv6             222452       0t0      TCP contint2001.wikimedia.org:54516->gerrit.wikimedia.org:29418 (ESTABLISHED)
[17:41:33] <wikibugs>	 (CR) Dzahn: "I can also add the /healthz file. I just did not care so far because I knew it only checks if the port is open. hmm" [puppet] - https://gerrit.wikimedia.org/r/754881 (https://phabricator.wikimedia.org/T291946) (owner: Filippo Giunchedi)
[17:44:00] <hashar>	 *sigh*
[17:44:01] <dancy>	 hashar: It looks happier now.
[17:44:16] <hashar>	 maybe it failed to connect to gerrit
[17:44:26] <hashar>	 or the connection dropped when I restarted ferm
[17:44:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[17:45:22] <wikibugs>	 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (10RobH)
[17:45:48] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (10RobH)
[17:46:06] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache200[1-3] - https://phabricator.wikimedia.org/T299433 (10RobH) a:03Papaul
[17:46:08] <hashar>	 so I think the sequence is: zuul restarted fine, connected to gerrit and processed changes
[17:46:40] <hashar>	 I then restarted ferm (firewall stuff) at 16:51:50 UTC
[17:47:01] <hashar>	 which might have killed the zuul ---(ssh:29418)---> gerrit  connection
[17:47:04] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7102 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[17:47:06] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6974 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[17:47:13] <wikibugs>	 (03CR) 10Eevans: [C: 03+1] "LGTM; When are you planning to push this out?" [puppet] - 10https://gerrit.wikimedia.org/r/754988 (https://phabricator.wikimedia.org/T298516) (owner: 10Btullis)
[17:47:32] <wikibugs>	 10SRE, 10Observability-Metrics, 10observability, 10Graphite: PHP statsd client doesn't support tagging metrics - https://phabricator.wikimedia.org/T225721 (10lmata)
[17:47:51] <wikibugs>	 10SRE, 10Observability-Metrics, 10WMF-Legal, 10observability, and 2 others: Add license statement to Grafana dashboards - https://phabricator.wikimedia.org/T214819 (10lmata)
[17:48:01] <hashar>	 and possibly flushed all the gearman functions, which RhinosF1 noticed at some point (job results reported as NOT_REGISTERED)
[17:52:27] <hashar>	 dancy: yeah looks better thx!
[17:52:30] <hashar>	 Lucas_WMDE: should be good now
[17:52:38] <Lucas_WMDE>	 yup, thanks!
[17:54:18] <dancy>	 hashar: I assume the connection from zuul to gerrit is to receive the events stream?   
[17:54:36] <hashar>	 correct
[17:55:00] <hashar>	 then if the firewall killed the connection I would expect zuul to notice that and attempt to reconnect
[17:55:55] <dancy>	 if zuul never sends anything down the connection (other than TCP ACKs), it'll never find out
[17:56:01] <dancy>	 it'll just think there are no events.
[17:56:17] <dancy>	 TCP keepalives might help recognize the broken connection sooner.
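A minimal sketch of what enabling TCP keepalives on a client socket can look like, so that a silently dropped, otherwise idle connection is eventually reported as an error instead of looking like "no events". This is not Zuul's actual code. The function name and the timing values are illustrative assumptions, and the three tuning options are Linux-specific, so they are guarded.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle connection and fail it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between successive unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Number of unanswered probes before the connection is declared dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With these illustrative values, a dead peer would be noticed after roughly idle + interval * probes seconds. For an SSH stream opened through paramiko, `Transport.set_keepalive(interval)` is the SSH-level equivalent; whether Zuul's Gerrit connection sets either one is not established in this discussion.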
[17:56:53] <dancy>	 That said, I'm surprised that connection tracking didn't keep the traffic flowing.  
[17:57:06] <dcausse>	 !log restarting blazegraph on wdqs1007 (jvm stuck for 13hours)
[17:57:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:15] <dancy>	 But I haven't looked at the ferm rules closely to see how they're arranged.
[17:57:17] <hashar>	 maybe it is something entirely different
[17:57:57] <icinga-wm>	 PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs1007 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[17:59:07] <icinga-wm>	 RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs1007 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[18:00:04] <jouncebot>	 chrisalbon and accraze: That opportune time is upon us again. Time for a Services – Graphoid / ORES deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1800).
[18:01:36] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+1] blazegraph: prometheus exporter may bypass nginx [puppet] - 10https://gerrit.wikimedia.org/r/754523 (owner: 10DCausse)
[18:03:11] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10RobH)
[18:03:46] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10RobH)
[18:04:09] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] blazegraph: prometheus exporter may bypass nginx [puppet] - 10https://gerrit.wikimedia.org/r/754523 (owner: 10DCausse)
[18:04:15] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: Q3:(Need By: TBD) rack/setup/install ml-cache100[1-3] - https://phabricator.wikimedia.org/T299435 (10RobH) a:03Jclark-ctr
[18:04:29] <hashar>	 looks like it is fine, away idling again
[18:04:33] <hashar>	 I am away idling again
[18:04:40] <wikibugs>	 (03CR) 10Accraze: [C: 03+1] helmfile.d: add 'cert-manager' namespace to ml-serve [deployment-charts] - 10https://gerrit.wikimedia.org/r/754981 (https://phabricator.wikimedia.org/T298976) (owner: 10Elukey)
[18:04:44] <dancy>	 Thanks hashar!
[18:05:04] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/754954 (https://phabricator.wikimedia.org/T299186) (owner: 10Jelto)
[18:05:13] <wikibugs>	 (03CR) 10Ppchelko: "recheck" [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754910 (owner: 10Ppchelko)
[18:05:23] <wikibugs>	 (03CR) 10Ppchelko: "recheck" [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[18:06:08] <wikibugs>	 (03PS1) 10Majavah: Drop CentralAuthUserMerge log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754998 (https://phabricator.wikimedia.org/T216089)
[18:07:21] <wikibugs>	 (03PS2) 10Majavah: Drop CentralAuthUserMerge log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754998 (https://phabricator.wikimedia.org/T216089)
[18:09:07] <wikibugs>	 (03PS1) 10Majavah: Disable UserMerge [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754999 (https://phabricator.wikimedia.org/T216089)
[18:10:05] <urbanecm>	 jouncebot: nowandnext
[18:10:05] <jouncebot>	 For the next 0 hour(s) and 49 minute(s): Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1800)
[18:10:05] <jouncebot>	 In 0 hour(s) and 49 minute(s): Pre MediaWiki train break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1900)
[18:10:11] <wikibugs>	 (03PS2) 10Urbanecm: pwnwiki: Deploy Growth features to newcomers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754504 (https://phabricator.wikimedia.org/T298115)
[18:10:21] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] pwnwiki: Deploy Growth features to newcomers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754504 (https://phabricator.wikimedia.org/T298115) (owner: 10Urbanecm)
[18:11:35] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6981 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:11:51] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team: (Need By: TBD) rack/setup/install ml-serve200[5-8] - https://phabricator.wikimedia.org/T294945 (10Papaul)
[18:13:46] <wikibugs>	 (03PS1) 10Jeena Huneidi: testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001
[18:13:51] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[18:14:26] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[18:14:49] <urbanecm>	 jeena: I'm sorry, I thought the train was in 2 hours and +2'ed a config change of my own
[18:15:00] <urbanecm>	 happy to wait though
[18:15:23] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7337 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:15:50] <RhinosF1>	 hashar: CI is still broken
[18:16:00] <wikibugs>	 (03Merged) 10jenkins-bot: pwnwiki: Deploy Growth features to newcomers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754504 (https://phabricator.wikimedia.org/T298115) (owner: 10Urbanecm)
[18:16:06] <hashar>	 damn
[18:16:07] <urbanecm>	 not fully at least
[18:16:10] <RhinosF1>	 I'm getting errors on sonar and fresnel builds
[18:16:27] <hashar>	 links?
[18:16:29] <urbanecm>	 but the config patch took more than i'd want it to, and the failure at jeena's patch is weird too
[18:16:34] <RhinosF1>	 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/754909/7
[18:16:43] <RhinosF1>	 hashar: pretty sure sonar / fresnel isn't me
[18:16:43] <urbanecm>	 https://integration.wikimedia.org/ci/job/operations-mw-config-php72-composer-lint-docker/19990/console
[18:16:53] <hashar>	 jeena: I will clean the agent disks
[18:16:57] <urbanecm>	 "No space left on device"
[18:16:59] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[18:17:16] <hashar>	 pruning
[18:17:30] <hashar>	 I looked last friday at rebuilding the fleet of agents to have more disk space
[18:17:32] * urbanecm goes to sync his patch that got merged
[18:17:40] <jeena>	 thanks hashar. I canceled the merge anyway
[18:17:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[18:17:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:41] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Machine-Learning-Team: (Need By: TBD) rack/setup/install ml-staging200[12] - https://phabricator.wikimedia.org/T294946 (10Papaul)
[18:18:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[18:18:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[18:18:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:37] <wikibugs>	 (03CR) 10Jforrester: [C: 03+1] Disable UserMerge [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754999 (https://phabricator.wikimedia.org/T216089) (owner: 10Majavah)
[18:20:03] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 0ff5874469b717cba38ed7cff0669754517a3553: pwnwiki: Deploy Growth features to newcomers (T298115) (duration: 02m 14s)
[18:20:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:07] <stashbot>	 T298115: Deploy Growth features at pwn.wikipedia.org - https://phabricator.wikimedia.org/T298115
[18:20:08] * urbanecm done with deployment
[18:20:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[18:20:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:11] <urbanecm>	 jeena: ^˘
[18:20:22] <wikibugs>	 (03PS2) 10Ppchelko: First pass on creating config-schema.yaml [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754910
[18:20:42] <wikibugs>	 (03PS4) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[18:20:57] <hashar>	 RhinosF1: jeena I have cleaned the CI agents
[18:21:17] <wikibugs>	 (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[18:21:45] <RhinosF1>	 hashar: let's see how many errors I cause this time
[18:21:47] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7095 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:22:45] <jeena>	 urbanecm: sorry I didn't see your earlier messages!
[18:23:04] <urbanecm>	 i saw you cancelled your merge, so i just finished what i wanted to do
[18:23:46] <jeena>	 I should have checked before trying to deploy to testwikis
[18:23:55] <jeena>	 train is on pause atm though
[18:24:08] <RhinosF1>	 hashar: only me caused errors at the moment
[18:25:07] <jeena>	 but yeah the train window is indeed at the time you thought, we just usually deploy to testwikis ahead of that
[18:25:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[18:25:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:41] <wikibugs>	 (03CR) 10Ebernhardson: sre.wdqs.data-reload: few fixes and cleanups (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/753426 (owner: 10DCausse)
[18:26:12] <urbanecm>	 jeena: i see, didn't know that. Thanks for explaining. Anyway, I'm done now :).
[18:26:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[18:26:26] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[18:26:27] <icinga-wm>	 RECOVERY - Cassandra instance data free space on restbase2011 is OK: DISK OK - free space: /srv/cassandra/instance-data 11433 MB (32% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:26:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:56] <jeena>	 Thanks! :)
[18:27:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[18:27:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:31:10] <wikibugs>	 10Puppet, 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review, 10User-jbond: Hieradata yaml style checking - https://phabricator.wikimedia.org/T236954 (10colewhite) Thanks for looking into this!  Automatic formatting would be great as long as the output is human-oriented.  >>! In T236954#7624944, @jh...
[18:31:11] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7360 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:32:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[18:35:57] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7068 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:38:10] <wikibugs>	 (03CR) 10Cwhite: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[18:40:41] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7147 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:41:27] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] logstash: gitlab: rename service field prior to populating object [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[18:42:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[18:42:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[18:44:43] <wikibugs>	 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10RobH)
[18:45:29] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10RobH)
[18:46:41] <wikibugs>	 (03CR) 10SBassett: [C: 03+1] doc.wikimedia.org CSP: Allow XHR requests to Wikipedia and Wikidata (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/754048 (https://phabricator.wikimedia.org/T285570) (owner: 10Catrope)
[18:46:59] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, 10RESTBase: Q3:(Need By: TBD) rack/setup/install restbase-dev200[123].codfw.wmnet - https://phabricator.wikimedia.org/T299437 (10RobH) a:03Papaul
[18:52:29] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7290 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:52:51] <wikibugs>	 (03CR) 10Cwhite: "CI seems to be complaining about something different." [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[18:53:09] <wikibugs>	 (03PS1) 10Jgiannelos: proton: Bump image to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/755005
[18:58:18] <wikibugs>	 (03CR) 10Jgiannelos: [C: 03+2] proton: Bump image to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/755005 (owner: 10Jgiannelos)
[18:59:33] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7331 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[18:59:33] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6897 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:00:05] <jouncebot>	 Deploy window Pre MediaWiki train break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T1900)
[19:02:10] <wikibugs>	 (03Merged) 10jenkins-bot: proton: Bump image to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/755005 (owner: 10Jgiannelos)
[19:05:21] <wikibugs>	 (03CR) 1020after4: [C: 03+2] testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[19:05:44] <wikibugs>	 (03CR) 1020after4: testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[19:06:29] <twentyafterfour>	 whoops didn't notice jeena had removed +2
[19:06:54] <jeena>	 twentyafterfour: yes, paused the deployment due to an UBN
[19:08:09] <twentyafterfour>	 apologies, I won't deploy anything 
[19:08:25] <wikibugs>	 (03PS5) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[19:08:29] <jeena>	 np, thanks though!
[19:09:01] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6818 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:11:25] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7328 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:14:46] <wikibugs>	 (03PS1) 10Ottomata: Prep for releasing ~wmf6 [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/755008
[19:15:07] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Prep for releasing ~wmf6 [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/755008 (owner: 10Ottomata)
[19:16:07] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7334 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:20:51] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7304 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:21:46] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+1] wcqs: set QUERY_SERVICE env name with wcqs/wdqs [puppet] - 10https://gerrit.wikimedia.org/r/753973 (owner: 10DCausse)
[19:22:04] <wikibugs>	 (03PS6) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[19:23:15] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7250 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:25:35] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7362 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:30:58] <wikibugs>	 (03CR) 10Herron: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[19:32:09] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on restbase2010 is CRITICAL: cluster=restbase device={sde,sdf,sdg,sdh,sdi,sdj} instance=restbase2010 job=node site=codfw https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=restbase2010&var-datasource=codfw+prometheus/ops
[19:32:11] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7142 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:32:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[19:35:14] <wikibugs>	 (03PS1) 10Jdlrobson: SkinTemplate: Set template context in buildPersonalUrls() [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754912 (https://phabricator.wikimedia.org/T299352)
[19:35:30] <wikibugs>	 (03CR) 10Herron: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[19:36:39] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7163 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:37:51] <wikibugs>	 (03PS1) 10Jdlrobson: Don't run Vector hook when menu absent from page [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754913 (https://phabricator.wikimedia.org/T289619)
[19:38:56] <wikibugs>	 (03PS1) 10Jdlrobson: Restore icons to user links dropdown [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/755012 (https://phabricator.wikimedia.org/T289619)
[19:42:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[19:43:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[19:48:31] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7069 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:52:14] <wikibugs>	 10SRE, 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (10hashar)
[19:52:50] <wikibugs>	 10SRE, 10ops-codfw, 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (10hashar) CI had to be restarted after the machine went up due to some od...
[19:52:56] <wikibugs>	 10SRE, 10ops-codfw, 10Continuous-Integration-Infrastructure, 10serviceops-radar, 10Release-Engineering-Team (Radar): contint2001.mgmt disappeared from Icinga - https://phabricator.wikimedia.org/T298861 (10hashar) 05Stalled→03Resolved a:03jbond The DRAC on contint2001.wikimedia.org has been upgraded...
[19:54:47] <wikibugs>	 (03PS7) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[19:54:59] <wikibugs>	 (03PS8) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[19:55:35] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6792 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[19:58:19] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] SkinTemplate: Set template context in buildPersonalUrls() [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754912 (https://phabricator.wikimedia.org/T299352) (owner: 10Jdlrobson)
[19:59:20] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] Don't run Vector hook when menu absent from page [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754913 (https://phabricator.wikimedia.org/T289619) (owner: 10Jdlrobson)
[19:59:24] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] Restore icons to user links dropdown [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/755012 (https://phabricator.wikimedia.org/T289619) (owner: 10Jdlrobson)
[20:00:05] <jouncebot>	 jeena and twentyafterfour: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220118T2000).
[20:01:26] <jeena>	 deploy will commence after merging some backports to the wmf.18 branch
[20:01:49] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mediabackups: Backup s7 media files at codfw [puppet] - 10https://gerrit.wikimedia.org/r/754025 (https://phabricator.wikimedia.org/T262668) (owner: 10Jcrespo)
[20:04:02] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] logstash: gitlab: rename service field prior to populating object [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[20:04:22] <wikibugs>	 (03PS2) 10Cwhite: logstash: gitlab: rename service field prior to populating object [puppet] - 10https://gerrit.wikimedia.org/r/754995
[20:04:28] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] logstash: gitlab: rename service field prior to populating object [puppet] - 10https://gerrit.wikimedia.org/r/754995 (owner: 10Cwhite)
[20:12:13] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6962 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:16:14] <wikibugs>	 (03CR) 10Herron: [C: 03+2] remove references to centrallog2001 [homer/public] - 10https://gerrit.wikimedia.org/r/754028 (https://phabricator.wikimedia.org/T298994) (owner: 10Herron)
[20:16:55] <icinga-wm>	 RECOVERY - Cassandra instance data free space on restbase2012 is OK: DISK OK - free space: /srv/cassandra/instance-data 11801 MB (33% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:17:30] <wikibugs>	 (03Merged) 10jenkins-bot: remove references to centrallog2001 [homer/public] - 10https://gerrit.wikimedia.org/r/754028 (https://phabricator.wikimedia.org/T298994) (owner: 10Herron)
[20:18:31] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: (Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10RobH)
[20:19:11] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: (Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10RobH)
[20:19:38] <wikibugs>	 (03Merged) 10jenkins-bot: SkinTemplate: Set template context in buildPersonalUrls() [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754912 (https://phabricator.wikimedia.org/T299352) (owner: 10Jdlrobson)
[20:20:00] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Dumps-Generation: (Need By: TBD) rack/setup/install dumpsdata100[67] - https://phabricator.wikimedia.org/T299443 (10RobH) p:05Medium→03High Setting to high priority as this is the test bed order for the PERC H750 controller blocking other orders via T297913.  Getting at...
[20:20:46] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] "backport" [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/755012 (https://phabricator.wikimedia.org/T289619) (owner: 10Jdlrobson)
[20:21:31] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6652 MB (18% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:21:39] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] "backport" [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754913 (https://phabricator.wikimedia.org/T289619) (owner: 10Jdlrobson)
[20:23:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[20:24:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:38] <wikibugs>	 (03CR) 10Daniel Kinzler: "Looks good to me! I'd like to hear from Timo though." [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[20:24:56] <wikibugs>	 (03CR) 10Herron: "hmm, seeing "Unable to connect" errors while tying to apply this via homer https://phabricator.wikimedia.org/P18785" [homer/public] - 10https://gerrit.wikimedia.org/r/754028 (https://phabricator.wikimedia.org/T298994) (owner: 10Herron)
[20:24:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[20:24:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[20:24:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:25:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[20:25:59] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 6758 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:25:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:10] <wikibugs>	 (03CR) 10Daniel Kinzler: [C: 03+1] "Good to go as an experiment. The new file isn't used anywhere (except for the experimental follow up patch). So this should be safe." [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754910 (owner: 10Ppchelko)
[20:29:29] <wikibugs>	 (03PS1) 104nn1l2: Revert "commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754914
[20:30:23] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7143 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:32:37] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7334 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:38:24] <wikibugs>	 (03PS9) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[20:38:32] <wikibugs>	 (03CR) 10Ppchelko: Benchmark loading DefaultSettings from YAML (032 comments) [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[20:38:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[20:38:56] <wikibugs>	 (03Merged) 10jenkins-bot: Don't run Vector hook when menu absent from page [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754913 (https://phabricator.wikimedia.org/T289619) (owner: 10Jdlrobson)
[20:39:17] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7183 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:39:57] <wikibugs>	 (03Merged) 10jenkins-bot: Restore icons to user links dropdown [skins/Vector] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/755012 (https://phabricator.wikimedia.org/T289619) (owner: 10Jdlrobson)
[20:40:15] <icinga-wm>	 PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.0103 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[20:40:51] <wikibugs>	 (03PS14) 10Brennen Bearnes: gitlab-runner: restrict docker images and services [puppet] - 10https://gerrit.wikimedia.org/r/724472 (https://phabricator.wikimedia.org/T291978)
[20:41:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[20:41:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[20:42:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[20:42:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:42:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] gitlab-runner: restrict docker images and services [puppet] - 10https://gerrit.wikimedia.org/r/724472 (https://phabricator.wikimedia.org/T291978) (owner: 10Brennen Bearnes)
[20:43:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[20:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:05] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7199 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:47:14] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[20:48:00] <wikibugs>	 (03Merged) 10jenkins-bot: testwikis wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755001 (owner: 10Jeena Huneidi)
[20:48:15] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[20:48:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:29] <wikibugs>	 (03PS1) 104nn1l2: fawiki: Exempt userspaces from being indexed by search engines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755018 (https://phabricator.wikimedia.org/T299363)
[20:48:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[20:49:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[20:49:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[20:49:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[20:50:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:54] <logmsgbot>	 !log jhuneidi@deploy1002 Started scap: testwikis to 1.38.0-wmf.18 refs T293959
[20:50:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:58] <stashbot>	 T293959: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959
[20:55:31] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7016 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:55:31] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7111 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[20:57:08] <wikibugs>	 (03CR) 10Krinkle: Benchmark loading DefaultSettings from YAML (034 comments) [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[20:57:13] <wikibugs>	 (03PS1) 10Ottomata: Use conda-environment.yaml for repeatable env builds [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/755021
[20:57:59] <wikibugs>	 (03PS2) 10Ottomata: Use conda-environment.yaml for repeatable env builds [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/755021
[20:59:30] <wikibugs>	 (03PS3) 10Ottomata: Use conda-environment.yaml for repeatable env builds [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/755021
[20:59:58] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Use conda-environment.yaml for repeatable env builds [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/755021 (owner: 10Ottomata)
[21:00:01] <icinga-wm>	 PROBLEM - Disk space on deneb is CRITICAL: DISK CRITICAL - free space: / 10882 MB (4% inode=63%): /tmp 10882 MB (4% inode=63%): /var/tmp 10882 MB (4% inode=63%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=deneb&var-datasource=codfw+prometheus/ops
[21:04:57] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7121 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[21:05:59] <wikibugs>	 (03PS15) 10Brennen Bearnes: gitlab-runner: restrict docker images and services [puppet] - 10https://gerrit.wikimedia.org/r/724472 (https://phabricator.wikimedia.org/T291978)
[21:07:49] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] gitlab-runner: restrict docker images and services [puppet] - 10https://gerrit.wikimedia.org/r/724472 (https://phabricator.wikimedia.org/T291978) (owner: 10Brennen Bearnes)
[21:14:01] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on cumin1001 is CRITICAL: CRITICAL: the following (5) node(s) change every puppet run: miscweb1002, build2001, wdqs1010, labstore1006, labstore1007 https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[21:14:27] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2011 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7235 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[21:23:28] <wikibugs>	 (03PS10) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[21:23:47] <wikibugs>	 (03CR) 10Ppchelko: Benchmark loading DefaultSettings from YAML (034 comments) [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[21:24:28] <wikibugs>	 (03PS11) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[21:26:21] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7074 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[21:29:26] <logmsgbot>	 !log jhuneidi@deploy1002 Finished scap: testwikis to 1.38.0-wmf.18 refs T293959 (duration: 38m 31s)
[21:29:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:29] <stashbot>	 T293959: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959
[21:30:49] <wikibugs>	 (03PS1) 10Hashar: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264)
[21:34:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) (owner: 10Hashar)
[21:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[21:35:27] <wikibugs>	 (03PS1) 104nn1l2: azwiki: change alias Q to QA for the draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755026 (https://phabricator.wikimedia.org/T299332)
[21:35:54] <wikibugs>	 (03PS2) 10Hashar: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264)
[21:37:41] <jeena>	 deploying to group0 shortly
[21:37:47] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) (owner: 10Hashar)
[21:39:32] <wikibugs>	 (03PS1) 10Jeena Huneidi: group0 wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755027
[21:39:38] <wikibugs>	 (03CR) 10Jeena Huneidi: [C: 03+2] group0 wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755027 (owner: 10Jeena Huneidi)
[21:40:05] <wikibugs>	 (03PS2) 104nn1l2: azwiki: Change alias Q to QA for the draft namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755026 (https://phabricator.wikimedia.org/T299332)
[21:40:32] <wikibugs>	 (03Merged) 10jenkins-bot: group0 wikis to 1.38.0-wmf.18  refs T293959 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755027 (owner: 10Jeena Huneidi)
[21:41:08] <wikibugs>	 (03PS1) 10Hashar: Update Gerrit to 3.3.9 [software/gerrit] (deploy/wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T299451)
[21:42:26] <logmsgbot>	 !log jhuneidi@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.18  refs T293959
[21:42:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[21:42:30] <stashbot>	 T293959: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959
[21:44:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[21:46:40] <wikibugs>	 (03PS3) 10Hashar: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264)
[21:53:48] <wikibugs>	 (03PS2) 10Ryan Kemper: elasticsearch: decom elastic10[32-47] (step 4) [puppet] - 10https://gerrit.wikimedia.org/r/736119 (https://phabricator.wikimedia.org/T294805)
[21:55:05] <wikibugs>	 (03PS2) 10Ryan Kemper: elasticsearch: hiera for new eqiad nodes (step 1) [puppet] - 10https://gerrit.wikimedia.org/r/736116 (https://phabricator.wikimedia.org/T294805)
[21:55:07] <wikibugs>	 (03PS2) 10Ryan Kemper: elasticsearch: activate role (step 2) [puppet] - 10https://gerrit.wikimedia.org/r/736117 (https://phabricator.wikimedia.org/T294805)
[21:55:09] <wikibugs>	 (03PS2) 10Ryan Kemper: elasticsearch: new master config (step 3) [puppet] - 10https://gerrit.wikimedia.org/r/736118 (https://phabricator.wikimedia.org/T294805)
[21:55:11] <wikibugs>	 (03PS3) 10Ryan Kemper: elasticsearch: decom elastic10[32-47] (step 4) [puppet] - 10https://gerrit.wikimedia.org/r/736119 (https://phabricator.wikimedia.org/T294805)
[21:57:42] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti10[29|3(012)] - https://phabricator.wikimedia.org/T299459 (10RobH)
[21:57:48] <wikibugs>	 (03PS4) 10Ryan Kemper: elasticsearch: decom elastic10[32-47] (step 4) [puppet] - 10https://gerrit.wikimedia.org/r/736119 (https://phabricator.wikimedia.org/T294805)
[21:57:55] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti10[29|3(012)] - https://phabricator.wikimedia.org/T299459 (10RobH)
[21:58:21] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar), 10User-ema: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106 (10Krinkle)
[21:59:36] <wikibugs>	 (03CR) 10Ryan Kemper: elasticsearch: decom elastic10[32-47] (step 4) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/736119 (https://phabricator.wikimedia.org/T294805) (owner: 10Ryan Kemper)
[21:59:57] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar), 10User-ema: Package and deploy Varnish 6.0.9 - https://phabricator.wikimedia.org/T298758 (10Krinkle)
[22:07:22] <wikibugs>	 (03CR) 10Brennen Bearnes: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/724472 (https://phabricator.wikimedia.org/T291978) (owner: 10Brennen Bearnes)
[22:07:26] <wikibugs>	 (03PS12) 10Ppchelko: Benchmark loading DefaultSettings from YAML [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911
[22:08:54] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be10[68-71] - https://phabricator.wikimedia.org/T299462 (10RobH)
[22:09:27] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be10[68-71] - https://phabricator.wikimedia.org/T299462 (10RobH)
[22:10:11] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be10[68-71] - https://phabricator.wikimedia.org/T299462 (10RobH) a:03Jclark-ctr
[22:12:16] <wikibugs>	 (03PS2) 10Hashar: Update Gerrit to 3.3.9 + plugins [software/gerrit] (deploy/wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T240264)
[22:20:45] <wikibugs>	 (03CR) 10Ryan Kemper: elasticsearch: activate role (step 2) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/736117 (https://phabricator.wikimedia.org/T294805) (owner: 10Ryan Kemper)
[22:27:13] <jinxer-wm>	 (IcingaOverload) firing: Checks are taking long to execute on alert2001:9245  - https://grafana.wikimedia.org/d/rsCfQfuZz/icinga - https://alerts.wikimedia.org
[22:29:51] <wikibugs>	 (03PS1) 10Cwhite: bump patch version to update plugins [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/755033
[22:31:19] <wikibugs>	 (03CR) 10Ppchelko: [C: 04-2] "After discussing with Tim, this should go into mediawiki-config repo." [core] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754911 (owner: 10Ppchelko)
[22:34:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[22:36:03] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Enable wikis to customize the syntax used for replies [extensions/DiscussionTools] (wmf/1.38.0-wmf.17) - 10https://gerrit.wikimedia.org/r/754915 (https://phabricator.wikimedia.org/T259864)
[22:36:05] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: add optional document_type parameter to es output config [puppet] - 10https://gerrit.wikimedia.org/r/747634 (https://phabricator.wikimedia.org/T297239) (owner: 10Cwhite)
[22:36:23] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Ensure the marker appears in a reasonable place when replying with a bullet [extensions/DiscussionTools] (wmf/1.38.0-wmf.17) - 10https://gerrit.wikimedia.org/r/754916 (https://phabricator.wikimedia.org/T259864)
[22:37:13] <jinxer-wm>	 (IcingaOverload) resolved: Checks are taking long to execute on alert2001:9245  - https://grafana.wikimedia.org/d/rsCfQfuZz/icinga - https://alerts.wikimedia.org
[22:37:23] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: add optional document_type parameter to es output config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/747634 (https://phabricator.wikimedia.org/T297239) (owner: 10Cwhite)
[22:37:49] <wikibugs>	 (03PS3) 10Bartosz Dziewoński: DiscussionTools: Use bullet indentation on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/753192 (https://phabricator.wikimedia.org/T259864)
[22:40:41] <wikibugs>	 (03PS5) 10Cwhite: logstash: add opensearch output config definition [puppet] - 10https://gerrit.wikimedia.org/r/727624 (https://phabricator.wikimedia.org/T288618)
[22:43:06] <wikibugs>	 (03PS6) 10Cwhite: logstash: add opensearch output config definition [puppet] - 10https://gerrit.wikimedia.org/r/727624 (https://phabricator.wikimedia.org/T288618)
[22:43:49] <wikibugs>	 (03PS7) 10Cwhite: logstash: add opensearch output config definition [puppet] - 10https://gerrit.wikimedia.org/r/727624 (https://phabricator.wikimedia.org/T288618)
[22:44:25] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7324 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[22:44:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[22:45:38] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: add opensearch output config definition [puppet] - 10https://gerrit.wikimedia.org/r/727624 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite)
[22:46:29] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on cumin2001 is CRITICAL: CRITICAL: the following (5) node(s) change every puppet run: build2001, labstore1007, wdqs1010, labstore1006, miscweb1002 https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[22:50:07] <sbassett>	 Hey all - I'd like to deploy a security patch for T298434 to wmf.18 and wmf.17 now.  Let me know if I shouldn't...
[22:51:31] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7347 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[22:52:49] <wikibugs>	 (03PS1) 10Clare Ming: Update config for pilot wikis: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755038 (https://phabricator.wikimedia.org/T298519)
[22:56:17] <icinga-wm>	 RECOVERY - Cassandra instance data free space on restbase2011 is OK: DISK OK - free space: /srv/cassandra/instance-data 12032 MB (34% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[22:57:20] <sbassett>	 !log Deployed security patch for T298434 to 1.38.0-wmf.17
[22:57:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:59:38] <sbassett>	 !log Deployed security patch for T298434 to 1.38.0-wmf.18
[22:59:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:01:05] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7061 MB (19% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[23:05:12] <wikibugs>	 (03PS1) 10Zabe: Don't use array keys for OOUI [extensions/AbuseFilter] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754917 (https://phabricator.wikimedia.org/T299463)
[23:07:55] <icinga-wm>	 PROBLEM - MariaDB Replica IO: x1 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2096.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:08:03] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s2 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2104.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:10:05] <jynus>	 I need to rebalance db2101, it gets too loaded at peak backup time
[23:14:21] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: (Need By: TBD) rack/setup/install stat1009 - https://phabricator.wikimedia.org/T299466 (10RobH)
[23:14:48] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering: (Need By: TBD) rack/setup/install stat1009 - https://phabricator.wikimedia.org/T299466 (10RobH)
[23:17:48] <wikibugs>	 10SRE, 10DC-Ops: Confirm support of PERC 750 raid controller - https://phabricator.wikimedia.org/T297913 (10RobH)
[23:20:10] <wikibugs>	 (03PS1) 10Cwhite: beta-logs: use opensearch output plugin [puppet] - 10https://gerrit.wikimedia.org/r/755040 (https://phabricator.wikimedia.org/T299168)
[23:21:59] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] beta-logs: use opensearch output plugin [puppet] - 10https://gerrit.wikimedia.org/r/755040 (https://phabricator.wikimedia.org/T299168) (owner: 10Cwhite)
[23:22:05] <icinga-wm>	 PROBLEM - MariaDB Replica IO: s5 on db2101 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2123.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:23:12] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] beta-logs: use opensearch output plugin [puppet] - 10https://gerrit.wikimedia.org/r/755040 (https://phabricator.wikimedia.org/T299168) (owner: 10Cwhite)
[23:23:31] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s2 on db2101 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 449.43 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:24:27] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s5 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:24:31] <icinga-wm>	 RECOVERY - MariaDB Replica IO: x1 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:24:39] <icinga-wm>	 RECOVERY - MariaDB Replica IO: s2 on db2101 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:24:45] <icinga-wm>	 PROBLEM - Cassandra instance data free space on restbase2012 is CRITICAL: DISK CRITICAL - free space: /srv/cassandra/instance-data 7401 MB (20% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[23:25:51] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s2 on db2101 is OK: OK slave_sql_lag Replication lag: 1.48 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:29:29] <icinga-wm>	 RECOVERY - Cassandra instance data free space on restbase2012 is OK: DISK OK - free space: /srv/cassandra/instance-data 12466 MB (35% inode=99%): https://wikitech.wikimedia.org/wiki/RESTBase%23instance-data
[23:33:55] <jinxer-wm>	 (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[23:34:03] <wikibugs>	 (03CR) 10Clare Ming: "I have a question out to Olga confirming that the language alert in the sidebar should be enabled for pilot wikis." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/755038 (https://phabricator.wikimedia.org/T298519) (owner: 10Clare Ming)
[23:34:50] <wikibugs>	 (03PS1) 10Cwhite: prepare for logstash 7.16.3 [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/755041 (https://phabricator.wikimedia.org/T299168)
[23:36:14] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-codfw, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be20[66-69] - https://phabricator.wikimedia.org/T299468 (10RobH)
[23:36:54] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-codfw, 10DC-Ops: Q3:(Need By: TBD) rack/setup/install ms-be20[66-69] - https://phabricator.wikimedia.org/T299468 (10RobH)
[23:39:04] <wikibugs>	 (03PS1) 10Zabe: Don't use array keys for OOUI in AbuseFilterViewDiff [extensions/AbuseFilter] (wmf/1.38.0-wmf.18) - 10https://gerrit.wikimedia.org/r/754918 (https://phabricator.wikimedia.org/T299463)
[23:39:15] <wikibugs>	 (03PS1) 10Cwhite: builder: add opensearch1 pbuilder hooks for logstash-plugins update [puppet] - 10https://gerrit.wikimedia.org/r/755043 (https://phabricator.wikimedia.org/T299168)
[23:41:45] <wikibugs>	 (03PS2) 104nn1l2: Revert "commonswiki: Add peerj.com to wgCopyUploadsDomains whitelist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/754914
[23:43:55] <jinxer-wm>	 (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org
[23:57:01] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Kubernetes: (Need By: TBD) rack/setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T299470 (10RobH)
[23:58:02] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10serviceops, 10Kubernetes: (Need By: TBD) rack/setup/install kubernetes20[19|2(012)] - https://phabricator.wikimedia.org/T299470 (10RobH)