[00:03:54] <icinga-wm>	 PROBLEM - Check systemd state on maps1009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:02] <icinga-wm>	 PROBLEM - Check systemd state on alert1001 is CRITICAL: CRITICAL - degraded: The following units failed: dispatch-ldap-users-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:46] <icinga-wm>	 PROBLEM - Check systemd state on maps2009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:00] <wikibugs>	 (PS1) Bartosz Dziewoński: Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/879158 (https://phabricator.wikimedia.org/T321955)
[00:07:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:09:09] <wikibugs>	 (PS1) Bartosz Dziewoński: Add "Page Frame" to DiscussionTools beta feature on partner wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/879159 (https://phabricator.wikimedia.org/T317907)
[00:09:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json
[00:16:48] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[00:18:14] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[00:20:24] <wikibugs>	 SRE-OnFire, SRE Observability (FY2022/2023-Q3): implementing an incident response workflow automation tool for SRE - https://phabricator.wikimedia.org/T308467 (lmata)
[00:24:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
[00:24:38] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
[00:24:41] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[00:24:51] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
[00:24:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
[00:26:42] <wikibugs>	 (PS1) Bartosz Dziewoński: Enable visual enhancements on all talk namespaces [extensions/DiscussionTools] (wmf/1.40.0-wmf.18) - https://gerrit.wikimedia.org/r/879103 (https://phabricator.wikimedia.org/T325417)
[00:27:00] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for appserver on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:27:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
[00:27:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:33:44] <icinga-wm>	 PROBLEM - Check systemd state on logstash1023 is CRITICAL: CRITICAL - degraded: The following units failed: opensearch_2@production-elk7-eqiad.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:34:02] <wikibugs>	 (PS1) Ebernhardson: cirrus: Divert requests with x-public-cloud set to a dedicated pool counter [mediawiki-config] - https://gerrit.wikimedia.org/r/879161 (https://phabricator.wikimedia.org/T326757)
[00:35:20] <icinga-wm>	 RECOVERY - Check systemd state on logstash1023 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:37:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:42:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
[00:42:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:44:34] <wikibugs>	 (CR) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[00:47:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:50:38] <wikibugs>	 (CR) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[00:52:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:57:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
[00:57:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:59:55] <wikibugs>	 (CR) Cwhite: opensearch: make upgrade-phatality.sh stricter (6 comments) [puppet] - https://gerrit.wikimedia.org/r/849631 (https://phabricator.wikimedia.org/T304440) (owner: Hashar)
[01:12:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
[01:12:43] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
[01:12:45] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[01:12:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:12:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
[01:13:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
[01:15:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
[01:17:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:27:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:28:02] <icinga-wm>	 PROBLEM - Check systemd state on rpki1001 is CRITICAL: CRITICAL - degraded: The following units failed: node-bgpalerter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:30:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
[01:33:46] <icinga-wm>	 PROBLEM - Check systemd state on gitlab2002 is CRITICAL: CRITICAL - degraded: The following units failed: backup-restore.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:37:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:46] <jinxer-wm>	 (JobUnavailable) firing: (7) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:45:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
[01:47:46] <jinxer-wm>	 (JobUnavailable) firing: (7) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:57:12] <wikibugs>	 (CR) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[01:57:46] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:00:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
[02:00:50] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[02:02:46] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:15:15] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[02:16:23] <wikibugs>	 Puppet, SRE, Data-Services, Infrastructure-Foundations, cloud-services-team (Kanban): clouddumps1002: ferm is being started on every puppet run - https://phabricator.wikimedia.org/T323324 (Andrew) >>! In T323324#8517754, @Dzahn wrote: > @Andrew Is it not maybe 65.19.157.35 ?  Because that is...
[02:17:46] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:22:46] <jinxer-wm>	 (JobUnavailable) firing: (7) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:24:25] <wikibugs>	 SRE, Observability-Alerting, WMF-Legal, WikimediaMessages, and 2 others: Find the right procedure to update wiki footers (was en.wikibooks.org has changed legal footer) - https://phabricator.wikimedia.org/T317169 (Slaporte) This looks good. Thank you!
[02:27:46] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job gitaly in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:47:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:50:46] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
[02:51:10] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
[02:51:34] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[02:51:47] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[02:51:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
[02:51:57] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[02:52:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:54:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
[02:57:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:09:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
[03:24:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
[03:32:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:39:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
[03:39:39] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[03:39:41] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[03:39:52] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[03:39:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
[03:42:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
[03:52:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:57:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
[04:02:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:07:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:12:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
[04:12:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:22:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:27:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
[04:27:43] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[04:27:45] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[04:27:45] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
[04:27:47] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[04:27:51] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[04:27:52] <wikibugs>	 SRE, Parsoid, vm-requests: <site>: <number of> VMs requested for <service> - https://phabricator.wikimedia.org/T326775 (1313)
[04:27:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
[04:30:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
[04:32:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:42:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:45:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
[04:57:03] <wikibugs>	 (PS32) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[04:59:18] <wikibugs>	 (CR) CI reject: [V: -1] Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[04:59:52] <wikibugs>	 (CR) Raymond Ndibe: "Hello Bryan, I made changes to maintain_dbusers.py and maintain_dbusers.pp in attempt to address your review. Can you verify that what I d" [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[05:00:04] <wikibugs>	 (PS33) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[05:00:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
[05:02:18] <wikibugs>	 (CR) CI reject: [V: -1] Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040) (owner: Raymond Ndibe)
[05:15:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
[05:15:41] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
[05:15:44] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[05:15:55] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
[05:16:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
[05:18:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
[05:33:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
[05:48:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
[06:03:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
[06:03:45] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[06:03:48] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[06:03:58] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
[06:04:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
[06:06:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
[06:15:15] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[06:21:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
[06:32:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:36:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
[06:37:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:51:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
[06:51:49] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
[06:51:52] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[06:52:02] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
[06:52:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
[06:52:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:54:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
[06:57:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T0700)
[07:00:04] <jouncebot>	 kormat, marostegui, and Amir1: Dear deployers, time to do the Primary database switchover deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T0700).
[07:02:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:07:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:09:05] <wikibugs>	 SRE, Performance-Team, Traffic, Performance Issue: en.wiki slow to respond when editing, and occasionally throws an error with Chrome search shortcuts, or blocked because missing HTTPS - https://phabricator.wikimedia.org/T326496 (larissagaulia) Open→Resolved a:larissagaulia
[07:09:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
[07:22:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:23:12] <wikibugs>	 (PS1) Slyngshede: C:idm::deployment Add TLS termination. [puppet] - https://gerrit.wikimedia.org/r/879182
[07:24:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
[07:26:45] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39101/console" [puppet] - https://gerrit.wikimedia.org/r/879182 (owner: Slyngshede)
[07:28:28] <wikibugs>	 (PS2) Slyngshede: C:idm::deployment Add TLS termination. [puppet] - https://gerrit.wikimedia.org/r/879182
[07:29:27] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39102/console" [puppet] - https://gerrit.wikimedia.org/r/879182 (owner: Slyngshede)
[07:35:32] <wikibugs>	 (CR) Slyngshede: "I still need to figure out how to do the CNAME in DNS." [puppet] - https://gerrit.wikimedia.org/r/879182 (owner: Slyngshede)
[07:38:51] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.debug for Netbox circuit ID 112
[07:39:19] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
[07:39:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
[07:39:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[07:39:53] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[07:40:04] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
[07:40:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
[07:40:39] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'configure' for AS: 37002
[07:41:11] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
[07:42:07] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'configure' for AS: 9584
[07:42:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
[07:43:27] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
[07:50:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:52:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:53:58] <icinga-wm>	 PROBLEM - Check systemd state on rpki2002 is CRITICAL: CRITICAL - degraded: The following units failed: node-bgpalerter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:55:03] <jinxer-wm>	 (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:56:14] <wikibugs>	 (CR) Muehlenhoff: [C: +2] package_builder::pbuilder_hook: Manage the hook directory with Puppet [puppet] - https://gerrit.wikimedia.org/r/878879 (owner: Muehlenhoff)
[07:57:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
[07:59:47] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
[07:59:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[08:00:04] <jouncebot>	 Amir1, apergos, and jnuche: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC morning backport and config training deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T0800).
[08:00:34] <apergos>	 morning! there are no trainees signed up today and no patches scheduled for the window. if any self deployers want to get something done, now's the time. 
[08:02:15] <wikibugs>	 (PS1) Ayounsi: Depool esams for network maintenance [dns] - https://gerrit.wikimedia.org/r/879268 (https://phabricator.wikimedia.org/T316532)
[08:02:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:04:41] <wikibugs>	 SRE, SRE-OnFire, Infrastructure-Foundations, netops, and 2 others: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi) On ulsfo: > I had no issues in the lab going from 14.1X53-D54.1 (It was the only available in the lab) to 19.1. (the closest version available on t...
[08:05:59] <wikibugs>	 (PS1) Ayounsi: Add untrusted/customer/parked prefixes to bgpalerter [puppet] - https://gerrit.wikimedia.org/r/879269 (https://phabricator.wikimedia.org/T230600)
[08:06:50] <wikibugs>	 (PS2) Ayounsi: Add untrusted/customer/parked prefixes to bgpalerter [puppet] - https://gerrit.wikimedia.org/r/879269 (https://phabricator.wikimedia.org/T230600)
[08:09:51] <wikibugs>	 (CR) Ayounsi: [V: +1] "https://puppet-compiler.wmflabs.org/output/879269/39103/rpki1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/879269 (https://phabricator.wikimedia.org/T230600) (owner: Ayounsi)
[08:12:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
[08:16:41] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
[08:17:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:17:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
[08:17:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:17:47] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
[08:17:50] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
[08:22:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:27:49] <wikibugs>	 (CR) Jcrespo: [C: +2] "Deploying after getting legal's ok- we can later tune the keywords if needed." [puppet] - https://gerrit.wikimedia.org/r/878010 (https://phabricator.wikimedia.org/T317169) (owner: Jcrespo)
[08:27:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
[08:27:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[08:28:04] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[08:28:07] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
[08:28:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
[08:31:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
[08:34:25] <wikibugs>	 (PS1) Effie Mouzeli: Revert "P:memcached::memkeys: install memkeys only if on buster" [puppet] - https://gerrit.wikimedia.org/r/879303
[08:34:25] <icinga-wm>	 RECOVERY - Check systemd state on people2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:36:51] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
[08:36:59] <wikibugs>	 (PS2) Effie Mouzeli: Revert "P:memcached::memkeys: install memkeys only if on buster" [puppet] - https://gerrit.wikimedia.org/r/879303
[08:40:43] <wikibugs>	 (PS1) Filippo Giunchedi: systemd: send ::syslog output to remote destination [puppet] - https://gerrit.wikimedia.org/r/879272 (https://phabricator.wikimedia.org/T325806)
[08:41:13] <wikibugs>	 (CR) Effie Mouzeli: [C: +2] Revert "P:memcached::memkeys: install memkeys only if on buster" [puppet] - https://gerrit.wikimedia.org/r/879303 (owner: Effie Mouzeli)
[08:41:27] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[08:42:37] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[08:42:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:42:51] <wikibugs>	 (CR) CI reject: [V: -1] systemd: send ::syslog output to remote destination [puppet] - https://gerrit.wikimedia.org/r/879272 (https://phabricator.wikimedia.org/T325806) (owner: Filippo Giunchedi)
[08:42:54] <wikibugs>	 (PS1) Muehlenhoff: Add bast5003 [puppet] - https://gerrit.wikimedia.org/r/879273 (https://phabricator.wikimedia.org/T324974)
[08:43:52] <wikibugs>	 (PS2) Filippo Giunchedi: systemd: send ::syslog output to remote destination [puppet] - https://gerrit.wikimedia.org/r/879272 (https://phabricator.wikimedia.org/T325806)
[08:46:25] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add bast5003 [puppet] - https://gerrit.wikimedia.org/r/879273 (https://phabricator.wikimedia.org/T324974) (owner: Muehlenhoff)
[08:46:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
[08:48:39] <wikibugs>	 (CR) Ayounsi: [C: +2] Depool esams for network maintenance [dns] - https://gerrit.wikimedia.org/r/879268 (https://phabricator.wikimedia.org/T316532) (owner: Ayounsi)
[08:48:45] <wikibugs>	 (CR) Filippo Giunchedi: "This is the list of users of systemd::syslog as of today, for reference:" [puppet] - https://gerrit.wikimedia.org/r/879272 (https://phabricator.wikimedia.org/T325806) (owner: Filippo Giunchedi)
[08:49:09] <zabe>	 !log deployed updated patch for T311337
[08:49:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host ping1003.eqiad.wmnet
[08:50:12] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[08:50:29] <wikibugs>	 Puppet, SRE, Data-Services, Infrastructure-Foundations, cloud-services-team (Kanban): clouddumps1002: ferm is being started on every puppet run - https://phabricator.wikimedia.org/T323324 (taavi) can you try running `ferm-status` with `--verbose`?
[08:50:38] <XioNoX>	 !log depool esams for network maintenance - T316532
[08:50:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:41] <stashbot>	 T316532: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532
[08:52:24] <wikibugs>	 (PS1) Majavah: hieradata: add wmcs-roots to clouddumps servers [puppet] - https://gerrit.wikimedia.org/r/879274
[08:53:58] <logmsgbot>	 !log phedenskog@deploy1002 Started deploy [performance/navtiming@172cc22]: (no justification provided)
[08:54:16] <logmsgbot>	 !log phedenskog@deploy1002 Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 17s)
[08:54:19] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
[08:54:54] <logmsgbot>	 !log phedenskog@deploy1002 Started deploy [performance/navtiming@172cc22]: (no justification provided)
[08:55:17] <logmsgbot>	 !log phedenskog@deploy1002 Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 22s)
[08:55:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
[08:55:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:55:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
[08:55:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
[09:00:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
[09:01:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
[09:02:21] <wikibugs>	 (PS1) KartikMistry: testwiki: Use Parsoid in Mediawiki Core for Content Translation [mediawiki-config] - https://gerrit.wikimedia.org/r/879276 (https://phabricator.wikimedia.org/T323667)
[09:08:48] <wikibugs>	 (PS1) Daniel Kinzler: Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. [mediawiki-config] - https://gerrit.wikimedia.org/r/879277
[09:11:17] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
[09:13:21] <wikibugs>	 SRE, serviceops, User-Elukey: Test memsniff as possible replacement of memkeys - https://phabricator.wikimedia.org/T228970 (jijiki) For the time being, we have packaged memkeys for bullseye so not to block T293216
[09:13:32] <wikibugs>	 (PS3) Thiemo Kreuz (WMDE): Deprecate the EnableMapFrame feature flag [mediawiki-config] - https://gerrit.wikimedia.org/r/875463 (https://phabricator.wikimedia.org/T326288) (owner: Awight)
[09:13:40] <wikibugs>	 (CR) CI reject: [V: -1] Deprecate the EnableMapFrame feature flag [mediawiki-config] - https://gerrit.wikimedia.org/r/875463 (https://phabricator.wikimedia.org/T326288) (owner: Awight)
[09:14:04] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to datacenter-ops for Jennifer Hancock - https://phabricator.wikimedia.org/T326649 (Jelto) >>! In T326649#8513274, @Papaul wrote: > @Jelto thanks for the reply i have already her SSH-key and I will personally be adding her to the group onc...
[09:16:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
[09:16:57] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[09:16:59] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[09:17:10] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
[09:17:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
[09:17:24] <wikibugs>	 (CR) Jelto: admin: Add Jennifer Hancock to the datacenter-ops group (1 comment) [puppet] - https://gerrit.wikimedia.org/r/878171 (https://phabricator.wikimedia.org/T326649) (owner: Papaul)
[09:19:16] <wikibugs>	 (PS1) Ilias Sarantopoulos: ml-services: multi-processing changes for articlequality and drafttopic [deployment-charts] - https://gerrit.wikimedia.org/r/879279 (https://phabricator.wikimedia.org/T323624)
[09:19:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
[09:20:23] <wikibugs>	 (PS2) Ilias Sarantopoulos: ml-services: multi-processing changes drafttopic (load-testing) [deployment-charts] - https://gerrit.wikimedia.org/r/879279 (https://phabricator.wikimedia.org/T323624)
[09:21:39] <icinga-wm>	 RECOVERY - Check systemd state on gitlab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:22:36] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[09:24:54] <logmsgbot>	 !log jiji@cumin1001 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
[09:24:58] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
[09:25:12] <logmsgbot>	 !log jiji@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
[09:26:05] <wikibugs>	 (PS1) Jcrespo: icinga: Add BeautifulSoap4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169)
[09:26:26] <wikibugs>	 (CR) CI reject: [V: -1] icinga: Add BeautifulSoap4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169) (owner: Jcrespo)
[09:27:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:28:06] <wikibugs>	 (CR) MVernon: [C: +2] hiera: move swift credentials into common [labs/private] - https://gerrit.wikimedia.org/r/868718 (https://phabricator.wikimedia.org/T162123) (owner: MVernon)
[09:28:10] <wikibugs>	 (CR) MVernon: [V: +2 C: +2] hiera: move swift credentials into common [labs/private] - https://gerrit.wikimedia.org/r/868718 (https://phabricator.wikimedia.org/T162123) (owner: MVernon)
[09:28:23] <wikibugs>	 (PS2) Jcrespo: icinga: Add BeautifulSoap4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169)
[09:28:59] <wikibugs>	 (CR) Klausman: [C: +2] ml-services: multi-processing changes drafttopic (load-testing) [deployment-charts] - https://gerrit.wikimedia.org/r/879279 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[09:29:49] <wikibugs>	 (CR) Jcrespo: "I know you are out, but I am rather confident about this patch, so feel free to post-merge review" [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169) (owner: Jcrespo)
[09:29:57] <wikibugs>	 (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/868721 (https://phabricator.wikimedia.org/T162123) (owner: MVernon)
[09:31:03] <wikibugs>	 (PS3) Jcrespo: icinga: Add BeautifulSoap4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169)
[09:31:07] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
[09:31:32] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
[09:31:36] <wikibugs>	 SRE, observability: service implementation tracking: arclamp1001.eqiad.wmnet - https://phabricator.wikimedia.org/T319434 (MoritzMuehlenhoff) p:Triage→Medium
[09:31:36] <wikibugs>	 (PS4) Jcrespo: icinga: Add BeautifulSoup4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169)
[09:31:41] <wikibugs>	 SRE, SRE-OnFire, Infrastructure-Foundations, netops, and 2 others: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=43642849-a893-44f6-961e-0bb82f3a9b4e) set by ayounsi@cumin1001 for 2:00:00 on 36 host(s) an...
[09:31:43] <icinga-wm>	 RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:32:46] <jinxer-wm>	 (JobUnavailable) firing: (6) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:34:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
[09:34:47] <wikibugs>	 (Merged) jenkins-bot: ml-services: multi-processing changes drafttopic (load-testing) [deployment-charts] - https://gerrit.wikimedia.org/r/879279 (https://phabricator.wikimedia.org/T323624) (owner: Ilias Sarantopoulos)
[09:34:59] <wikibugs>	 (PS5) Jcrespo: icinga: Add BeautifulSoup4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169)
[09:35:22] <wikibugs>	 SRE, ops-esams, DC-Ops, Infrastructure-Foundations, decommission-hardware: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (ayounsi) That's a bit embarrassing, but the box came back from the dead... https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=ripe...
[09:36:51] <wikibugs>	 (CR) MVernon: [C: +2] swift: move accounts_keys to common hiera global_account_keys [puppet] - https://gerrit.wikimedia.org/r/868721 (https://phabricator.wikimedia.org/T162123) (owner: MVernon)
[09:37:56] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
[09:38:57] <wikibugs>	 (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/output/879280/39105/alert1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169) (owner: Jcrespo)
[09:39:18] <wikibugs>	 (CR) Jcrespo: [C: +2] icinga: Add BeautifulSoup4 python dependency for check_legal [puppet] - https://gerrit.wikimedia.org/r/879280 (https://phabricator.wikimedia.org/T317169) (owner: Jcrespo)
[09:39:36] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
[09:39:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[09:41:59] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
[09:42:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
[09:42:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:42:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
[09:43:01] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
[09:45:53] <icinga-wm>	 PROBLEM - puppet last run on gitlab2002 is CRITICAL: CRITICAL: Puppet last ran 8 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[09:46:25] <icinga-wm>	 RECOVERY - Ensure legal html en.wp on en.wikipedia.org is OK: All legal html excerpts are present for https://en.wikipedia.org/wiki/Main_Page (desktop site): copyright, terms, privacy, trademark https://phabricator.wikimedia.org/project/members/28/
[09:46:28] <XioNoX>	 !log redirect ns2 to authdns1001 - T316532
[09:46:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:31] <stashbot>	 T316532: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532
[09:47:14] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: pcmwiki T310879
[09:47:14] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[09:47:17] <stashbot>	 T310879: Prepare and check storage layer for pcmwiki - https://phabricator.wikimedia.org/T310879
[09:47:46] <jinxer-wm>	 (JobUnavailable) firing: (7) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:47:59] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on maps1009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service Effie Mouzeli It is ok as we are retiring some components https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:47:59] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on maps2009 is CRITICAL: CRITICAL - degraded: The following units failed: planet_sync_tile_generation-gis.service Effie Mouzeli It is ok as we are retiring some components https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:48:11] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
[09:49:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
[09:50:01] <logmsgbot>	 !log cgoubert@cumin1001 START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
[09:50:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
[09:50:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[09:53:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
[09:54:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
[09:54:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:54:43] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
[09:54:45] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
[09:56:18] <logmsgbot>	 !log cgoubert@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
[09:56:38] <wikibugs>	 (PS1) Alexandros Kosiaris: wikifunctions: Add AppArmor profile usage [deployment-charts] - https://gerrit.wikimedia.org/r/879282 (https://phabricator.wikimedia.org/T326785)
[09:56:51] <wikibugs>	 SRE, Observability-Alerting, WMF-Legal, WikimediaMessages, and 2 others: Find the right procedure to update wiki footers (was en.wikibooks.org has changed legal footer) - https://phabricator.wikimedia.org/T317169 (jcrespo) 👍 {F36153930}
[09:58:12] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[09:58:28] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/879269 (https://phabricator.wikimedia.org/T230600) (owner: Ayounsi)
[09:59:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
[10:00:38] <wikibugs>	 (PS1) MVernon: hiera: remove swift accounts_keys [labs/private] - https://gerrit.wikimedia.org/r/879283 (https://phabricator.wikimedia.org/T162123)
[10:01:12] <XioNoX>	 !log reboot asw2-esams for upgrade - T316532
[10:01:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:16] <stashbot>	 T316532: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532
[10:01:54] <wikibugs>	 (CR) MVernon: "Now we've deployed the global swift credential hiera entry, remove the per-site ones to reduce confusion in future :-)" [labs/private] - https://gerrit.wikimedia.org/r/879283 (https://phabricator.wikimedia.org/T162123) (owner: MVernon)
[10:04:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
[10:04:58] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[10:05:01] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[10:05:11] <icinga-wm>	 PROBLEM - VRRP status on cr3-esams is CRITICAL: VRRP CRITICAL - 3 inconsistent interfaces, 0 misconfigured interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
[10:05:11] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[10:05:25] <icinga-wm>	 PROBLEM - BGP status on cr3-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Connect - Anycast, AS64605/IPv4: Connect - Anycast, AS64600/IPv4: Connect - PyBal, AS64605/IPv4: Connect - Anycast, AS64605/IPv4: Connect - Anycast, AS64605/IPv4: Connect - Anycast, AS64605/IPv6: Connect - Anycast, AS64605/IPv6: Connect - Anycast, AS64600/IPv4: Connect - PyBal, AS64600/IPv4: Connect - PyBal, AS64605/IPv6: Connect - Anycast, AS64605/IPv6: Connect -
[10:05:25] <icinga-wm>	 , AS64605/IPv4: Connect - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:05:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[10:05:31] <XioNoX>	 expected ^
[10:05:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
[10:05:47] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 76, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:05:57] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[10:05:59] <icinga-wm>	 PROBLEM - Router interfaces on cr3-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 57, down: 4, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:06:10] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[10:06:11] <icinga-wm>	 PROBLEM - BFD status on cr3-esams is CRITICAL: CRIT: Down: 10 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[10:06:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
[10:06:21] <icinga-wm>	 PROBLEM - Router interfaces on cr3-esams is CRITICAL: CRITICAL: host 91.198.174.245, interfaces up: 70, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:06:37] <icinga-wm>	 RECOVERY - puppet last run on gitlab2002 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:07:46] <jinxer-wm>	 (JobUnavailable) firing: (7) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:08:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
[10:11:29] <icinga-wm>	 PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_ganeti_esams_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:12:27] <icinga-wm>	 RECOVERY - VRRP status on cr3-esams is OK: VRRP OK - 0 misconfigured interfaces, 0 inconsistent interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
[10:12:43] <icinga-wm>	 PROBLEM - Host 2620:0:862:1:91:198:174:62 is DOWN: PING CRITICAL - Packet loss = 100%
[10:12:43] <icinga-wm>	 PROBLEM - Host 2620:0:862:1:91:198:174:61 is DOWN: PING CRITICAL - Packet loss = 100%
[10:12:46] <jinxer-wm>	 (JobUnavailable) firing: (29) Reduced availability for job bird in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:12:49] <jinxer-wm>	 (ProbeDown) firing: (14) Service ncredir-https:443 has failed probes (http_ncredir-https_ip6)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:13:13] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 90, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:13:25] <icinga-wm>	 RECOVERY - Router interfaces on cr3-knams is OK: OK: host 91.198.174.246, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:13:41] <icinga-wm>	 RECOVERY - BFD status on cr3-esams is OK: OK: UP: 19 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[10:13:49] <icinga-wm>	 RECOVERY - Router interfaces on cr3-esams is OK: OK: host 91.198.174.245, interfaces up: 84, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:13:57] <icinga-wm>	 RECOVERY - Host 2620:0:862:1:91:198:174:61 is UP: PING OK - Packet loss = 0%, RTA = 81.02 ms
[10:14:03] <icinga-wm>	 RECOVERY - Host 2620:0:862:1:91:198:174:62 is UP: PING OK - Packet loss = 0%, RTA = 81.04 ms
[10:14:23] <icinga-wm>	 RECOVERY - BGP status on cr3-esams is OK: BGP OK - up: 20, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:15:15] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[10:15:26] <wikibugs>	 SRE, SRE-OnFire, Infrastructure-Foundations, netops, and 2 others: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi)
[10:15:31] <icinga-wm>	 PROBLEM - Check systemd state on rpki2002 is CRITICAL: CRITICAL - degraded: The following units failed: node-bgpalerter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:15:51] <wikibugs>	 SRE, SRE-OnFire, Infrastructure-Foundations, netops, and 2 others: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi) 10min downtime, everything went smooth.
[10:16:43] <wikibugs>	 SRE, SRE-OnFire, Infrastructure-Foundations, netops, and 2 others: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532 (ayounsi)
[10:17:46] <jinxer-wm>	 (JobUnavailable) firing: (30) Reduced availability for job bird in ops@esams - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:17:49] <jinxer-wm>	 (ProbeDown) resolved: (14) Service ncredir-https:443 has failed probes (http_ncredir-https_ip6)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:18:37] <wikibugs>	 (PS1) Muehlenhoff: Add ping[123]003 [puppet] - https://gerrit.wikimedia.org/r/879284 (https://phabricator.wikimedia.org/T273509)
[10:19:17] <icinga-wm>	 PROBLEM - Check unit status of netbox_ganeti_esams_sync on netbox1002 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_esams_sync https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[10:21:30] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/879272 (https://phabricator.wikimedia.org/T325806) (owner: Filippo Giunchedi)
[10:23:13] <wikibugs>	 (PS2) Muehlenhoff: Add ping[123]003 [puppet] - https://gerrit.wikimedia.org/r/879284 (https://phabricator.wikimedia.org/T273509)
[10:23:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
[10:24:13] <icinga-wm>	 RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:24:35] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add ping[123]003 [puppet] - https://gerrit.wikimedia.org/r/879284 (https://phabricator.wikimedia.org/T273509) (owner: Muehlenhoff)
[10:24:58] <XioNoX>	 !log rollback redirect ns2 to authdns1001 - T316532
[10:25:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:02] <stashbot>	 T316532: Upgrade POPs asw to Junos 21 - https://phabricator.wikimedia.org/T316532
[10:25:30] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
[10:29:03] <wikibugs>	 (PS1) JMeybohm: admin_ng RBAC: If-guard additional permissions [deployment-charts] - https://gerrit.wikimedia.org/r/879285
[10:29:34] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
[10:29:49] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/879182 (owner: Slyngshede)
[10:29:51] <icinga-wm>	 RECOVERY - Check unit status of netbox_ganeti_esams_sync on netbox1002 is OK: OK: Status of the systemd unit netbox_ganeti_esams_sync https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[10:31:40] <wikibugs>	 (PS1) Jelto: gitlab: start restore job later on replicas [puppet] - https://gerrit.wikimedia.org/r/879406 (https://phabricator.wikimedia.org/T326315)
[10:34:20] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good!" [puppet] - https://gerrit.wikimedia.org/r/868732 (https://phabricator.wikimedia.org/T323483) (owner: FNegri)
[10:38:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
[10:38:56] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'email' for AS: 8932
[10:39:45] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
[10:39:54] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'email' for AS: 8674
[10:39:55] <wikibugs>	 (CR) Phedenskog: "This change is ready for review." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/875887 (https://phabricator.wikimedia.org/T321398) (owner: Phedenskog)
[10:40:47] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
[10:41:08] <wikibugs>	 (CR) Phedenskog: "We plan to test out the other metrics with fewer labels so we gonna wait with adding those now, lets try with the CPU benchmark first." [puppet] - https://gerrit.wikimedia.org/r/875887 (https://phabricator.wikimedia.org/T321398) (owner: Phedenskog)
[10:41:31] <logmsgbot>	 !log hashar@deploy1002 Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
[10:41:46] <logmsgbot>	 !log hashar@deploy1002 Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
[10:42:21] <wikibugs>	 SRE, Observability-Alerting, WMF-Legal, WikimediaMessages, and 2 others: Find the right procedure to update wiki footers (was en.wikibooks.org has changed legal footer) - https://phabricator.wikimedia.org/T317169 (jcrespo) a:jcrespo @Xaosflux So in the end, no change of procedure is needed fo...
[10:44:07] <wikibugs>	 (PS32) Jbond: P:installserver::proxy: Add global whitelist and list mappings [puppet] - https://gerrit.wikimedia.org/r/753029 (https://phabricator.wikimedia.org/T300977)
[10:44:09] <wikibugs>	 (PS1) Jbond: base::cache: drop wikimediafoundation.org from wikimedia_domains [puppet] - https://gerrit.wikimedia.org/r/879409 (https://phabricator.wikimedia.org/T300977)
[10:49:51] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.hosts.remove-downtime for 36 hosts
[10:50:03] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
[10:51:19] <wikibugs>	 (CR) FNegri: [C: +2] Make sure cloud_cumin public key is evaluated [puppet] - https://gerrit.wikimedia.org/r/868732 (https://phabricator.wikimedia.org/T323483) (owner: FNegri)
[10:53:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
[10:54:01] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[10:54:02] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[10:54:24] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
[10:54:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
[10:54:32] <wikibugs>	 (PS12) FNegri: Make sure cloud_cumin public key is evaluated [puppet] - https://gerrit.wikimedia.org/r/868732 (https://phabricator.wikimedia.org/T323483)
[10:56:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
[10:57:54] <wikibugs>	 (CR) Volans: [C: +1] "just rebased" [puppet] - https://gerrit.wikimedia.org/r/868732 (https://phabricator.wikimedia.org/T323483) (owner: FNegri)
[10:58:29] <wikibugs>	 (CR) FNegri: [C: +2] Make sure cloud_cumin public key is evaluated [puppet] - https://gerrit.wikimedia.org/r/868732 (https://phabricator.wikimedia.org/T323483) (owner: FNegri)
[11:00:04] <jouncebot>	 mvolz: OwO what's this, a deployment window?? Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1100). nyaa~
[11:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1100)
[11:00:36] <wikibugs>	 (CR) Ayounsi: [V: +1 C: +2] Add untrusted/customer/parked prefixes to bgpalerter [puppet] - https://gerrit.wikimedia.org/r/879269 (https://phabricator.wikimedia.org/T230600) (owner: Ayounsi)
[11:04:11] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me!" [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: Jbond)
[11:11:17] <icinga-wm>	 PROBLEM - Check systemd state on rpki2002 is CRITICAL: CRITICAL - degraded: The following units failed: node-bgpalerter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:11:22] <zabe>	 !log mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Defender" "Elton" # T298707
[11:11:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:26] <stashbot>	 T298707: "InvalidArgumentException: Blocker must be a local user" from GlobalBlocking - https://phabricator.wikimedia.org/T298707
[11:11:55] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'email' for AS: 3302
[11:11:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
[11:12:55] <logmsgbot>	 !log ayounsi@cumin1001 END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
[11:13:00] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'email' for AS: 3303
[11:14:20] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
[11:14:51] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.peering with action 'email' for AS: 25885
[11:15:05] <wikibugs>	 (PS1) Urbanecm: throttle: Add new rule for cswiki course [mediawiki-config] - https://gerrit.wikimedia.org/r/879412 (https://phabricator.wikimedia.org/T326792)
[11:15:13] <urbanecm>	 jouncebot: nowandnext
[11:15:13] <jouncebot>	 For the next 0 hour(s) and 44 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1100)
[11:15:13] <jouncebot>	 For the next 0 hour(s) and 44 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1100)
[11:15:13] <jouncebot>	 In 2 hour(s) and 44 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1400)
[11:15:13] <jouncebot>	 In 2 hour(s) and 44 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1400)
[11:15:30] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
[11:15:57] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/879412 (https://phabricator.wikimedia.org/T326792) (owner: Urbanecm)
[11:16:41] <wikibugs>	 (Merged) jenkins-bot: throttle: Add new rule for cswiki course [mediawiki-config] - https://gerrit.wikimedia.org/r/879412 (https://phabricator.wikimedia.org/T326792) (owner: Urbanecm)
[11:17:06] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:879412|throttle: Add new rule for cswiki course (T326792)]]
[11:17:09] <stashbot>	 T326792: Request a throttle lift for a cswiki wiki course – 2023-01-12 - https://phabricator.wikimedia.org/T326792
[11:17:11] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:17:12] <wikibugs>	 SRE, MW-on-K8s, observability, serviceops: Logging options for apache httpd in k8s - https://phabricator.wikimedia.org/T265876 (Clement_Goubert) >>! In T265876#8512672, @Joe wrote: > We have now the logs in kafka, and thus should also be ingested in logstash, and create a dashboard. >  > Once tha...
[11:21:35] <wikibugs>	 (PS33) Jbond: P:installserver::proxy: Add global whitelist and list mappings [puppet] - https://gerrit.wikimedia.org/r/753029 (https://phabricator.wikimedia.org/T300977)
[11:24:37] <wikibugs>	 SRE, MW-on-K8s, SRE Observability, serviceops: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (Clement_Goubert)
[11:24:53] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:879412|throttle: Add new rule for cswiki course (T326792)]] (duration: 07m 47s)
[11:24:57] <stashbot>	 T326792: Request a throttle lift for a cswiki wiki course – 2023-01-12 - https://phabricator.wikimedia.org/T326792
[11:25:02] <wikibugs>	 SRE, MW-on-K8s, SRE Observability, serviceops: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (Clement_Goubert) Open→In progress p:Triage→Medium
[11:25:09] <wikibugs>	 SRE, MW-on-K8s, SRE Observability, serviceops, Patch-For-Review: Make logging work for mediawiki in k8s - https://phabricator.wikimedia.org/T288851 (Clement_Goubert)
[11:26:02] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM! I'm ok to merge this as-is if it looks good to you" [puppet] - https://gerrit.wikimedia.org/r/875887 (https://phabricator.wikimedia.org/T321398) (owner: Phedenskog)
[11:26:54] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM!" [labs/private] - https://gerrit.wikimedia.org/r/879283 (https://phabricator.wikimedia.org/T162123) (owner: MVernon)
[11:27:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
[11:27:11] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[11:29:28] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [codfw] START helmfile.d/services/thumbor: sync
[11:30:03] <wikibugs>	 SRE, MW-on-K8s, SRE Observability, serviceops: Ingest php-slowlog in logstash - https://phabricator.wikimedia.org/T326794 (Clement_Goubert) The retention of the kafka topic is currently the default 7 days. This will be reduced once logstash ingestion is setup.
[11:34:56] <wikibugs>	 (PS11) Jbond: environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[11:34:58] <wikibugs>	 (PS1) Jbond: P:environment: roll out no proxy config to all hosts [puppet] - https://gerrit.wikimedia.org/r/879418 (https://phabricator.wikimedia.org/T300977)
[11:35:17] <wikibugs>	 (PS12) Jbond: environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[11:36:14] <wikibugs>	 (CR) Jbond: environment: add no_proxy config directly to environment (1 comment) [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: Jbond)
[11:36:25] <wikibugs>	 (PS1) Majavah: admin: remove duplicate users from ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/879420
[11:36:27] <wikibugs>	 (PS1) Majavah: admin: add a test to prevent duplicates in users/ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/879421
[11:37:11] <wikibugs>	 (CR) CI reject: [V: -1] environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: Jbond)
[11:37:15] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
[11:37:40] <wikibugs>	 (PS13) Jbond: environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[11:39:24] <wikibugs>	 (PS2) KartikMistry: testwiki: Use Parsoid in Mediawiki Core for Content Translation [mediawiki-config] - https://gerrit.wikimedia.org/r/879276 (https://phabricator.wikimedia.org/T323667)
[11:39:26] <wikibugs>	 (CR) CI reject: [V: -1] environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: Jbond)
[11:39:39] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [codfw] DONE helmfile.d/services/thumbor: sync
[11:40:18] <wikibugs>	 (CR) Clément Goubert: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/879417 (https://phabricator.wikimedia.org/T326794) (owner: Clément Goubert)
[11:40:37] <wikibugs>	 (CR) Clément Goubert: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39106/console" [puppet] - https://gerrit.wikimedia.org/r/879417 (https://phabricator.wikimedia.org/T326794) (owner: Clément Goubert)
[11:41:59] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [codfw] START helmfile.d/services/thumbor: sync
[11:42:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
[11:42:14] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
[11:42:16] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[11:42:27] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
[11:42:44] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[11:42:44] <wikibugs>	 (PS14) Jbond: environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[11:42:57] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[11:43:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
[11:45:25] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39110/console" [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: Jbond)
[11:45:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
[11:45:31] <wikibugs>	 (PS15) Jbond: environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[11:49:18] <wikibugs>	 (PS16) Jbond: environment: add no_proxy config directly to environment [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[11:50:45] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39112/console" [puppet] - https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: Jbond)
[11:52:21] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [codfw] DONE helmfile.d/services/thumbor: sync
[11:52:23] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:54:37] <XioNoX>	 !log re-seating cr2-esams fpc0 linecard  - T318783
[11:54:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:41] <stashbot>	 T318783: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783
[13:31:22] <wikibugs>	 (CR) Jbond: [C: -1] "lgtm but a few minor minor issues" [puppet] - https://gerrit.wikimedia.org/r/879051 (https://phabricator.wikimedia.org/T311385) (owner: Ayounsi)
[13:36:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
[13:39:45] <wikibugs>	 (PS1) Slyngshede: CNAME for idm-test [dns] - https://gerrit.wikimedia.org/r/879522
[13:45:51] <wikibugs>	 SRE, SRE-OnFire, Release-Engineering-Team, serviceops-collab, Sustainability: Remove old scap repositories from deploy1002 - https://phabricator.wikimedia.org/T309162 (hashar) The issue we had was to compare the state of the repositories between the two deployment servers. One of them had som...
[13:49:47] <wikibugs>	 (PS7) Acamicamacaraca: Allow administrators to revoke autopatroller rights on sh.WP [mediawiki-config] - https://gerrit.wikimedia.org/r/871272 (https://phabricator.wikimedia.org/T325938)
[13:50:45] <icinga-wm>	 PROBLEM - Host mc2040 is DOWN: PING CRITICAL - Packet loss = 100%
[13:51:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
[13:53:21] <logmsgbot>	 !log jelto@cumin1001 START - Cookbook sre.gitlab.upgrade
[13:58:06] <wikibugs>	 10SRE, 10Observability-Alerting, 10WMF-Legal, 10WikimediaMessages, and 2 others: Find the right procedure to update wiki footers (was en.wikibooks.org has changed legal footer) - https://phabricator.wikimedia.org/T317169 (10jcrespo) Documentation/runbook: https://wikitech.wikimedia.org/wiki/Check_legal_html
[14:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1400)
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: My dear minions, it's time we take the moon! Just kidding. Time for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1400).
[14:00:05] <jouncebot>	 Aca and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:15] <Aca>	 Hey! I’d like to confirm that I’m present here regarding the deployment of patch 871272 (Allow administrators to revoke autopatroller rights on sh.WP).
[14:00:43] <taavi>	 o/ I can deploy today
[14:00:46] <MatmaRex>	 hi
[14:01:06] <taavi>	 Aca: do you have the x-wikimedia-debug browser extension installed?
[14:01:26] <Aca>	 Yep. Should I open it and which server should I select?
[14:01:34] <MatmaRex>	 (my backport has no obvious effect that can be tested, it only affects some logging)
[14:01:53] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] "backport" [extensions/DiscussionTools] (wmf/1.40.0-wmf.18) - 10https://gerrit.wikimedia.org/r/879101 (owner: 10Bartosz Dziewoński)
[14:02:09] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/871272 (https://phabricator.wikimedia.org/T325938) (owner: 10Acamicamacaraca)
[14:03:01] <wikibugs>	 (03Merged) 10jenkins-bot: Allow administrators to revoke autopatroller rights on sh.WP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/871272 (https://phabricator.wikimedia.org/T325938) (owner: 10Acamicamacaraca)
[14:03:27] <logmsgbot>	 !log taavi@deploy1002 Started scap: Backport for [[gerrit:871272|Allow administrators to revoke autopatroller rights on sh.WP (T325938)]]
[14:03:31] <stashbot>	 T325938: Change the configuration for revoking some rights on sh.WP - https://phabricator.wikimedia.org/T325938
[14:04:33] <taavi>	 Aca: i'll let you know when your patch can be tested, but when it's available you can pick any of the mwdebug servers
[14:04:42] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] environment: add no_proxy config directly to environment [puppet] - 10https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: 10Jbond)
[14:05:15] <logmsgbot>	 !log taavi@deploy1002 taavi and aleksandar: Backport for [[gerrit:871272|Allow administrators to revoke autopatroller rights on sh.WP (T325938)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
[14:05:29] <taavi>	 Aca: your patch is now available for testing
[14:05:54] <Aca>	 Okie. I would also like to ask if I should select any of the additional options (XHGui, Verbose)
[14:06:37] <taavi>	 no, just select a server and set the enabled switch to 'ON'
[14:06:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
[14:06:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[14:06:53] <Aca>	 Alright
[14:06:54] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[14:06:54] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[14:06:57] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
[14:07:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
[14:07:33] <wikibugs>	 (03Merged) 10jenkins-bot: Track callers of parseRevisionParsoidHtml. [extensions/DiscussionTools] (wmf/1.40.0-wmf.18) - 10https://gerrit.wikimedia.org/r/879101 (owner: 10Bartosz Dziewoński)
[14:09:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
[14:09:28] <wikibugs>	 (03PS1) 10Volans: cumin: set version during Debian build [software/cumin] - 10https://gerrit.wikimedia.org/r/879546
[14:10:22] <taavi>	 Aca: hey, is it working? do you need help with anything?
[14:10:37] <icinga-wm>	 RECOVERY - BGP status on cr3-ulsfo is OK: BGP OK - up: 89, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:10:43] <Aca>	 Everything looks correct. I also checked the special page User group rights, and everything seems to have been updated accordingly.
[14:10:56] <wikibugs>	 (03PS17) 10Jbond: environment: add no_proxy config directly to environment [puppet] - 10https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315)
[14:10:59] <taavi>	 great! pushing the changes to all the servers
[14:11:13] <taavi>	 you can turn off the x-wikimedia-debug extension now, if you didn't already
[14:11:31] <Aca>	 Done
[14:12:46] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39115/console" [puppet] - 10https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: 10Jbond)
[14:12:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] environment: add no_proxy config directly to environment (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: 10Jbond)
[14:13:01] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] environment: add no_proxy config directly to environment [puppet] - 10https://gerrit.wikimedia.org/r/878884 (https://phabricator.wikimedia.org/T278315) (owner: 10Jbond)
[14:15:15] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[14:16:58] <logmsgbot>	 !log taavi@deploy1002 Finished scap: Backport for [[gerrit:871272|Allow administrators to revoke autopatroller rights on sh.WP (T325938)]] (duration: 13m 30s)
[14:17:02] <stashbot>	 T325938: Change the configuration for revoking some rights on sh.WP - https://phabricator.wikimedia.org/T325938
[14:17:13] <taavi>	 Aca: your patch is now live
[14:17:18] <taavi>	 MatmaRex: yours is up next
[14:17:31] <Aca>	 Awesome. Thank you!
[14:17:34] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[14:17:46] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:17:55] <taavi>	 MatmaRex: do you still want to test the patch on a debug server or should I sync directly
[14:17:56] <taavi>	 yw
[14:18:12] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
[14:18:26] <MatmaRex>	 taavi: up to you
[14:18:40] <MatmaRex>	 taavi: i can verify that it doesn't break normal functionality, i guess
[14:18:49] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:18:52] <taavi>	 i would prefer that if it's not too hard
[14:18:54] <logmsgbot>	 !log taavi@deploy1002 Started scap: Backport for [[gerrit:879101|Track callers of parseRevisionParsoidHtml.]]
[14:19:13] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:19:27] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:20:33] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
[14:20:39] <logmsgbot>	 !log taavi@deploy1002 taavi and matmarex: Backport for [[gerrit:879101|Track callers of parseRevisionParsoidHtml.]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
[14:20:40] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
[14:20:52] <taavi>	 MatmaRex: pulled to the test servers
[14:21:06] <MatmaRex>	 looking
[14:22:24] <MatmaRex>	 taavi: seems good
[14:22:25] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 9.699 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:22:31] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Mon 20 Feb 2023 05:31:14 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:22:39] <taavi>	 thanks! syncing
[14:23:27] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49420 bytes in 0.155 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:24:19] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
[14:24:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
[14:26:18] <logmsgbot>	 !log jelto@cumin1001 END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
[14:28:28] <logmsgbot>	 !log taavi@deploy1002 Finished scap: Backport for [[gerrit:879101|Track callers of parseRevisionParsoidHtml.]] (duration: 09m 34s)
[14:28:41] <taavi>	 MatmaRex: done!
[14:28:47] <taavi>	 anyone have anything else to deploy?
[14:28:49] <MatmaRex>	 thanks taavi
[14:33:59] <taavi>	 !log UTC afternoon backports done
[14:34:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:55] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
[14:37:20] <moritzm>	 !log installing sqlite3 security updates on buster
[14:37:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:44] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
[14:39:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
[14:42:22] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: guwwikiquote T321288
[14:42:22] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[14:42:25] <stashbot>	 T321288: Prepare and check storage layer for guwwikiquote - https://phabricator.wikimedia.org/T321288
[14:44:22] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
[14:49:51] <icinga-wm>	 RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 467, down: 3, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:50:21] <moritzm>	 !log installing postgresql-11 security updates on puppetdb1002
[14:50:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
[14:53:31] <wikibugs>	 (03PS1) 10Effie Mouzeli: hieradata: disable maps tile_generation timers for planet import [puppet] - 10https://gerrit.wikimedia.org/r/879556 (https://phabricator.wikimedia.org/T314472)
[14:54:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
[14:54:43] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[14:54:45] <stashbot>	 T321391: Add new column cu_log.cul_reason_id and cu_log.cul_reason_plaintext_id to wmf wikis - https://phabricator.wikimedia.org/T321391
[14:54:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[14:56:28] <wikibugs>	 (03CR) 10Jgiannelos: [C: 03+1] hieradata: disable maps tile_generation timers for planet import [puppet] - 10https://gerrit.wikimedia.org/r/879556 (https://phabricator.wikimedia.org/T314472) (owner: 10Effie Mouzeli)
[14:58:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
[15:01:55] <wikibugs>	 (03CR) 10Effie Mouzeli: ipsec: remove ipsec role and the strongswan module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/875897 (owner: 10Effie Mouzeli)
[15:02:14] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "Agreed! LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/879520 (https://phabricator.wikimedia.org/T299125) (owner: 10MVernon)
[15:02:16] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] hieradata: disable maps tile_generation timers for planet import [puppet] - 10https://gerrit.wikimedia.org/r/879556 (https://phabricator.wikimedia.org/T314472) (owner: 10Effie Mouzeli)
[15:04:16] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] "LGTM! Let's coordinate the deployment when we merge this." [puppet] - 10https://gerrit.wikimedia.org/r/878201 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[15:05:15] <logmsgbot>	 !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
[15:05:50] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/875897 (owner: 10Effie Mouzeli)
[15:06:14] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] ipsec: remove ipsec role and the strongswan module [puppet] - 10https://gerrit.wikimedia.org/r/875897 (owner: 10Effie Mouzeli)
[15:06:22] <wikibugs>	 (03PS4) 10Effie Mouzeli: ipsec: remove ipsec role and the strongswan module [puppet] - 10https://gerrit.wikimedia.org/r/875897
[15:06:30] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
[15:06:50] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Update README [deployment-charts] - 10https://gerrit.wikimedia.org/r/879557
[15:10:19] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
[15:11:14] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[15:11:58] <logmsgbot>	 !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
[15:15:30] <wikibugs>	 10SRE, 10Traffic: Remove IPSec/Strongswan from Puppet repository - https://phabricator.wikimedia.org/T326745 (10BBlack) https://gerrit.wikimedia.org/r/c/operations/puppet/+/875897/ ! (apparently someone was already working on this!)
[15:15:44] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/879522 (owner: 10Slyngshede)
[15:18:55] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [software/cumin] - 10https://gerrit.wikimedia.org/r/879546 (owner: 10Volans)
[15:18:58] <wikibugs>	 (03CR) 10Svantje Lilienthal: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879559 (https://phabricator.wikimedia.org/T326317) (owner: 10Svantje Lilienthal)
[15:20:56] <wikibugs>	 10Puppet, 10SRE, 10Data-Services, 10Infrastructure-Foundations, 10cloud-services-team (Kanban): clouddumps1002: ferm is being started on every puppet run - https://phabricator.wikimedia.org/T323324 (10Andrew) >>! In T323324#8518687, @taavi wrote: > can you try running `ferm-status` with `--verbose`?  Yea...
[15:24:19] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:25:45] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 49420 bytes in 0.132 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:27:59] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[15:28:47] <effie>	 !log Planet import in codfw (on maps2009) started at 15:26 UTC - T314472
[15:28:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:51] <stashbot>	 T314472: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472
[15:29:33] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[15:29:50] <wikibugs>	 (03PS1) 10Stang: etwikiquote: Switch logo variant back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879561 (https://phabricator.wikimedia.org/T313698)
[15:31:42] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [dns] - 10https://gerrit.wikimedia.org/r/879522 (owner: 10Slyngshede)
[15:34:31] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
[15:34:43] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] cumin: set version during Debian build [software/cumin] - 10https://gerrit.wikimedia.org/r/879546 (owner: 10Volans)
[15:34:44] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
[15:35:07] <wikibugs>	 (03CR) 10Volans: [C: 03+2] cumin: set version during Debian build [software/cumin] - 10https://gerrit.wikimedia.org/r/879546 (owner: 10Volans)
[15:35:48] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to datacenter-ops for Jennifer Hancock - https://phabricator.wikimedia.org/T326649 (10Papaul)
[15:35:50] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
[15:36:06] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: shnwikibooks T321256
[15:36:06] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[15:36:09] <stashbot>	 T321256: Prepare and check storage layer for shnwikibooks - https://phabricator.wikimedia.org/T321256
[15:38:49] <wikibugs>	 (03CR) 10Hokwelum: "Hello, looks like bd808 and raymond-ndibe are still listed as wmcs-roots but are no longer on the team. Perhaps, they could be taken off t" [puppet] - 10https://gerrit.wikimedia.org/r/879274 (owner: 10Majavah)
[15:41:05] <wikibugs>	 (03PS2) 10Jbond: P:environment: roll out no proxy config to all hosts [puppet] - 10https://gerrit.wikimedia.org/r/879418 (https://phabricator.wikimedia.org/T300977)
[15:42:01] <wikibugs>	 (03Merged) 10jenkins-bot: cumin: set version during Debian build [software/cumin] - 10https://gerrit.wikimedia.org/r/879546 (owner: 10Volans)
[15:42:21] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] P:environment: roll out no proxy config to all hosts [puppet] - 10https://gerrit.wikimedia.org/r/879418 (https://phabricator.wikimedia.org/T300977) (owner: 10Jbond)
[15:44:08] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[15:46:12] <wikibugs>	 (03CR) 10Herron: [C: 03+1] systemd: send ::syslog output to remote destination [puppet] - 10https://gerrit.wikimedia.org/r/879272 (https://phabricator.wikimedia.org/T325806) (owner: 10Filippo Giunchedi)
[15:46:24] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/weight=8; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
[15:47:47] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
[15:52:54] <wikibugs>	 (03PS1) 10Marostegui: install_server: Adjust new mariadb hosts [puppet] - 10https://gerrit.wikimedia.org/r/879563
[15:53:42] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] install_server: Adjust new mariadb hosts [puppet] - 10https://gerrit.wikimedia.org/r/879563 (owner: 10Marostegui)
[16:03:57] <wikibugs>	 (03PS1) 10Jbond: ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484)
[16:04:28] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484) (owner: 10Jbond)
[16:05:02] <wikibugs>	 (03PS2) 10Jbond: ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484)
[16:05:33] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484) (owner: 10Jbond)
[16:06:03] <wikibugs>	 (03PS3) 10Jbond: ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484)
[16:08:51] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: bjnwiktionary T312214
[16:08:51] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[16:08:57] <stashbot>	 T312214: Prepare and check storage layer for bjnwiktionary - https://phabricator.wikimedia.org/T312214
[16:09:59] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39116/console" [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484) (owner: 10Jbond)
[16:10:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:10:40] <wikibugs>	 (03PS3) 10Jbond: P:environment: roll out no proxy config to all hosts [puppet] - 10https://gerrit.wikimedia.org/r/879418 (https://phabricator.wikimedia.org/T300977)
[16:13:12] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484) (owner: 10Jbond)
[16:14:01] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[16:14:04] <wikibugs>	 (03PS4) 10Jbond: ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484)
[16:14:13] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] ssh: add new match_config parameter [puppet] - 10https://gerrit.wikimedia.org/r/879586 (https://phabricator.wikimedia.org/T323484) (owner: 10Jbond)
[16:14:15] <wikibugs>	 (03PS2) 10Vlad.shapik: WIP: Update Thumbor repository according to the latest changes [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/876229 (https://phabricator.wikimedia.org/T325811)
[16:14:43] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Enable kartographer external data parse time fetch for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879559 (https://phabricator.wikimedia.org/T326317) (owner: 10Svantje Lilienthal)
[16:14:45] <wikibugs>	 (03PS3) 10Vlad.shapik: WIP: Update Thumbor repository according to the latest changes [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/876229 (https://phabricator.wikimedia.org/T325811)
[16:15:03] <jinxer-wm>	 (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:15:06] <wikibugs>	 (03PS4) 10Vlad.shapik: WIP: Update Thumbor repository according to the latest changes [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/876229 (https://phabricator.wikimedia.org/T325811)
[16:17:05] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[16:18:34] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP: Update Thumbor repository according to the latest changes [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/876229 (https://phabricator.wikimedia.org/T325811) (owner: 10Vlad.shapik)
[16:18:36] <wikibugs>	 (03PS1) 10Zabe: Stop writing to cul_user and cul_user_text on a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879590 (https://phabricator.wikimedia.org/T233004)
[16:18:41] <wikibugs>	 (03PS1) 10Zabe: Start writing to rev_comment_id on group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879591 (https://phabricator.wikimedia.org/T299954)
[16:19:42] <zabe>	 jouncebot, nowandnext
[16:19:42] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 40 minute(s)
[16:19:42] <jouncebot>	 In 0 hour(s) and 40 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1700)
[16:20:11] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Stop writing to cul_user and cul_user_text on a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879590 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[16:20:17] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[16:20:24] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Start writing to rev_comment_id on group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879591 (https://phabricator.wikimedia.org/T299954) (owner: 10Zabe)
[16:21:03] <wikibugs>	 (03Merged) 10jenkins-bot: Stop writing to cul_user and cul_user_text on a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879590 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[16:21:12] <wikibugs>	 (03Merged) 10jenkins-bot: Start writing to rev_comment_id on group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879591 (https://phabricator.wikimedia.org/T299954) (owner: 10Zabe)
[16:21:46] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:879590|Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591|Start writing to rev_comment_id on group1 wikis (T299954)]]
[16:21:51] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[16:21:51] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[16:23:31] <logmsgbot>	 !log zabe@deploy1002 zabe and zabe: Backport for [[gerrit:879590|Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591|Start writing to rev_comment_id on group1 wikis (T299954)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
[16:24:39] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats_lowlatency.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:28:17] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[16:31:35] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:879590|Stop writing to cul_user and cul_user_text on a few wikis (T233004)]], [[gerrit:879591|Start writing to rev_comment_id on group1 wikis (T299954)]] (duration: 09m 49s)
[16:31:40] <stashbot>	 T299954: Write code for handing write and read of rev_comment_id - https://phabricator.wikimedia.org/T299954
[16:31:41] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[16:34:05] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
[16:36:10] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] cirrus: Divert requests with x-public-cloud set to a dedicated pool counter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879161 (https://phabricator.wikimedia.org/T326757) (owner: 10Ebernhardson)
[16:37:29] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:41:09] <wikibugs>	 (03CR) 10Dzahn: "ACK, thank you Alex! I will do so to clean up." [puppet] - 10https://gerrit.wikimedia.org/r/685914 (https://phabricator.wikimedia.org/T280718) (owner: 10Dzahn)
[16:41:17] <wikibugs>	 (03Abandoned) 10Dzahn: thumbor/mwmaint: add periodic job to pull fc-list file [puppet] - 10https://gerrit.wikimedia.org/r/685914 (https://phabricator.wikimedia.org/T280718) (owner: 10Dzahn)
[16:43:58] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[16:46:39] <jinxer-wm>	 (HelmReleaseBadStatus) firing: Helm release thumbor/main on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=thumbor - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[16:47:46] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[16:48:07] <hnowlan>	 neat alert! that was my doing, fixing
[16:48:33] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[16:48:36] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
[16:49:39] <wikibugs>	 (03PS1) 10Ryan Kemper: [WIP] wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064)
[16:51:39] <jinxer-wm>	 (HelmReleaseBadStatus) resolved: Helm release thumbor/main on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=thumbor - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[16:54:15] <wikibugs>	 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic: ulsfo refresh scheduling - https://phabricator.wikimedia.org/T317249 (10RobH) 05Open→03Resolved
[16:54:19] <wikibugs>	 (03PS1) 10Stang: nlwiki: Add block right to checkuser group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879600 (https://phabricator.wikimedia.org/T326355)
[16:54:33] <wikibugs>	 10SRE, 10ops-ulsfo, 10decommission-hardware: ulsfo unified decom task - https://phabricator.wikimedia.org/T321596 (10RobH) 05Open→03Resolved
[16:54:38] <wikibugs>	 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic, 10Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (10RobH)
[16:54:52] <wikibugs>	 10SRE, 10ops-ulsfo, 10DC-Ops: ulsfo next visit checklist - https://phabricator.wikimedia.org/T322861 (10RobH) 05Open→03Resolved
[16:55:13] <wikibugs>	 10SRE, 10ops-ulsfo, 10Traffic, 10decommission-hardware: decommission dns4001 - https://phabricator.wikimedia.org/T319215 (10RobH) 05Open→03Resolved this host is now gone
[16:57:26] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
[16:57:32] <wikibugs>	 10SRE, 10ops-codfw: codfw:test new Supermicro server - https://phabricator.wikimedia.org/T322578 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host sretest2002.codfw.wmnet with OS bullseye
[16:58:24] <wikibugs>	 (03PS5) 10Vlad.shapik: WIP: Update Thumbor repository according to the latest changes [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/876229 (https://phabricator.wikimedia.org/T325811)
[16:59:10] <wikibugs>	 (03PS6) 10Vlad.shapik: WIP: Update Thumbor repository according to the latest changes [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/876229 (https://phabricator.wikimedia.org/T325811)
[16:59:17] <wikibugs>	 (03PS1) 10Jcrespo: icinga:Update legal check to link to wikitech and add legal contact [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169)
[17:00:05] <jouncebot>	 jbond and rzl: It is that lovely time of the day again! You are hereby commanded to deploy Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:01:28] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "Thank you! LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:03:16] <wikibugs>	 (03CR) 10Dzahn: "thanks for your work on this, Jaime" [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:03:53] <wikibugs>	 (03PS1) 10Jbond: ssh: update match_config data structure [puppet] - 10https://gerrit.wikimedia.org/r/879602
[17:04:38] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] icinga:Update legal check to link to wikitech and add legal contact [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:04:44] <wikibugs>	 (03PS1) 10Ebernhardson: looksLikeAutomation: Allow flagging requests from arbitrary headers [extensions/CirrusSearch] (wmf/1.40.0-wmf.18) - 10https://gerrit.wikimedia.org/r/879571 (https://phabricator.wikimedia.org/T326757)
[17:05:19] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] ssh: update match_config data structure [puppet] - 10https://gerrit.wikimedia.org/r/879602 (owner: 10Jbond)
[17:05:39] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] icinga:Update legal check to link to wikitech and add legal contact [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:06:39] <wikibugs>	 (03PS2) 10Jbond: ssh: update match_config data structure [puppet] - 10https://gerrit.wikimedia.org/r/879602
[17:08:16] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39118/console" [puppet] - 10https://gerrit.wikimedia.org/r/879602 (owner: 10Jbond)
[17:08:35] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "after changes to contacts/contactgroups it's usually a good idea to run a "sudo icinga -v /etc/icinga/icinga.cfg" on the server (alert1001)" [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:08:38] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: aswikiquote T321294
[17:08:38] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[17:08:41] <stashbot>	 T321294: Prepare and check storage layer for aswikiquote - https://phabricator.wikimedia.org/T321294
[17:09:20] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] icinga:Update legal check to link to wikitech and add legal contact (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:09:57] <wikibugs>	 (03PS1) 10Jbond: ssh::server: add validate_cmd to sshd_config [puppet] - 10https://gerrit.wikimedia.org/r/879605
[17:12:40] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] icinga:Update legal check to link to wikitech and add legal contact (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:13:29] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] "Looks ok :-)" [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:14:52] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] icinga:Update legal check to link to wikitech and add legal contact (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:16:12] <wikibugs>	 (03PS1) 10Ryan Kemper: [WIP] wdqs: use pre-computed wdqs recording rules [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/879606 (https://phabricator.wikimedia.org/T323064)
[17:16:21] <wikibugs>	 (03PS2) 10Jbond: ssh::server: add validate_cmd to sshd_config [puppet] - 10https://gerrit.wikimedia.org/r/879605
[17:17:10] <wikibugs>	 (03PS2) 10Ryan Kemper: [WIP] wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064)
[17:17:34] <wikibugs>	 10SRE, 10Observability-Alerting, 10WMF-Legal, 10WikimediaMessages, and 2 others: Find the right procedure to update wiki footers (was en.wikibooks.org has changed legal footer) - https://phabricator.wikimedia.org/T317169 (10jcrespo) 05Open→03Resolved So the updated alarm has been deployed. Now the tick...
[17:18:29] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] icinga:Update legal check to link to wikitech and add legal contact (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/879601 (https://phabricator.wikimedia.org/T317169) (owner: 10Jcrespo)
[17:22:16] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] gitlab: start restore job later on replicas [puppet] - 10https://gerrit.wikimedia.org/r/879406 (https://phabricator.wikimedia.org/T326315) (owner: 10Jelto)
[17:32:10] <wikibugs>	 (03CR) 10Eevans: [C: 03+1] swift: disable swifrepl timer job [puppet] - 10https://gerrit.wikimedia.org/r/879520 (https://phabricator.wikimedia.org/T299125) (owner: 10MVernon)
[17:32:19] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v4.2.0 [software/cumin] - 10https://gerrit.wikimedia.org/r/879612
[17:42:17] <wikibugs>	 (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v4.2.0 [software/cumin] - 10https://gerrit.wikimedia.org/r/879612 (owner: 10Volans)
[17:42:51] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (10RobH)
[17:44:25] <wikibugs>	 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic, 10Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (10ssingh)
[17:45:21] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (10ssingh) 05Open→03Resolved I think we can close this task and mark it as resolved. The original purpose for which this was required has now been met and fut...
[17:45:27] <mutante>	 !log powercycling mc2040 via mgmt console
[17:45:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:43] <icinga-wm>	 RECOVERY - Host mc2040 is UP: PING OK - Packet loss = 0%, RTA = 31.61 ms
[17:49:09] <wikibugs>	 (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v4.2.0 [software/cumin] - 10https://gerrit.wikimedia.org/r/879612 (owner: 10Volans)
[17:54:26] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
[17:54:31] <wikibugs>	 10SRE, 10ops-codfw: codfw:test new Supermicro server - https://phabricator.wikimedia.org/T322578 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host sretest2002.codfw.wmnet with OS bullseye executed with errors: - sretest2002 (**FAIL**)   - Downtimed on Icinga/Alert...
[17:58:53] <icinga-wm>	 RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:00:04] <jouncebot>	 bd808: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Technical Engagement weekly deploy (Toolhub, Developer portal, Striker) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1800).
[18:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1800)
[18:03:17] <wikibugs>	 10SRE, 10serviceops: Memcached, mcrouter in MediaWiki on Kubernetes - https://phabricator.wikimedia.org/T277711 (10jijiki) a:05Joe→03jijiki
[18:03:24] <wikibugs>	 10ops-eqiad, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10jijiki)
[18:07:27] <wikibugs>	 (03PS1) 10Ottomata: flink-kubernetes-operator - allow flink-app pods to talk to k8s API [deployment-charts] - 10https://gerrit.wikimedia.org/r/879618 (https://phabricator.wikimedia.org/T324576)
[18:08:10] <wikibugs>	 10ops-eqiad, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Papaul) @jijiki mc2040 that is codfw not eqiad
[18:08:20] <wikibugs>	 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Dzahn)
[18:08:50] <wikibugs>	 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Clement_Goubert) From what I understand, you can work on it any time, and we don't need to depool it. We may want to downtime it before y'all work on it.
[18:15:15] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[18:16:05] <wikibugs>	 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Jhancock.wm) @Clement_Goubert can you downtime the server? Please let me know when I can work on the server.
[18:17:11] <wikibugs>	 (03PS1) 10Volans: Upstream release v4.2.0 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/879620
[18:17:46] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:18:44] <logmsgbot>	 !log cgoubert@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
[18:19:09] <logmsgbot>	 !log cgoubert@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
[18:19:14] <wikibugs>	 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=4016a17a-817d-4d48-be1d-b36713ff2632) set by cgoubert@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reaso...
[18:19:29] <wikibugs>	 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Clement_Goubert) @Jhancock.wm Done.
[18:35:15] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:35:46] <mutante>	 !log stat1007 - systemctl reset-failed  - clears Icinga alerts
[18:35:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:57] <wikibugs>	 (03CR) 10Herron: "nice! please see a few comments inline" [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[18:36:40] <mutante>	 !log stat1008 - systemctl reset-failed  - clears Icinga alerts from failed things of the past
[18:36:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:15] <icinga-wm>	 RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:38:32] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: Decommission mc20[19-27] and mc20[29-37] - https://phabricator.wikimedia.org/T313733 (10jijiki) a:05Jclark-ctr→03None
[18:40:13] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Jhancock.wm) 05Open→03Resolved a:03Jhancock.wm @Clement_Goubert Thank you.   We powered down and swapped the A2 and B2 DIMM to see if the error carries over. as of right now w...
[18:40:40] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: Decommission mc20[19-27] and mc20[29-37] - https://phabricator.wikimedia.org/T313733 (10Papaul) a:03Jhancock.wm
[18:41:36] <wikibugs>	 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (10RobH)
[18:41:51] <wikibugs>	 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (10RobH)
[18:43:00] <wikibugs>	 10SRE-swift-storage, 10serviceops: serviceops implementation tracking for ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326847 (10RobH)
[18:44:34] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: Decommission mc20[19-27] and mc20[29-37] - https://phabricator.wikimedia.org/T313733 (10Papaul) Hello, can someone please confirm that those servers are ready for decom, since they are all active in Netbox. Thanks
[18:45:41] <wikibugs>	 10SRE-swift-storage, 10ops-codfw, 10DC-Ops: Q3:rack/setup/install ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326848 (10RobH)
[18:46:17] <wikibugs>	 10SRE-swift-storage, 10ops-codfw, 10DC-Ops: Q3:rack/setup/install ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326848 (10RobH)
[18:47:17] <wikibugs>	 10SRE-swift-storage, 10serviceops: serviceops implementation tracking for ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326849 (10RobH)
[19:00:05] <jouncebot>	 jeena and dduvall: Time to snap out of that daydream and deploy MediaWiki train - Utc-7 Version. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T1900).
[19:02:45] <wikibugs>	 (03PS1) 10TrainBranchBot: all wikis to 1.40.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879622 (https://phabricator.wikimedia.org/T325581)
[19:02:47] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] all wikis to 1.40.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879622 (https://phabricator.wikimedia.org/T325581) (owner: 10TrainBranchBot)
[19:03:06] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: haproxy: work on systemd unit hardening (cp hosts) - https://phabricator.wikimedia.org/T323944 (10ssingh) The hardened haproxy unit has been running for a while on traffic-cache-bullseye.traffic.eqiad1.wikimedia.cloud without any issues. Pending any further comments or i...
[19:03:23] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.40.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879622 (https://phabricator.wikimedia.org/T325581) (owner: 10TrainBranchBot)
[19:09:49] <wikibugs>	 (03PS1) 10JHathaway: facter block_devices support containers [puppet] - 10https://gerrit.wikimedia.org/r/879624
[19:10:17] <wikibugs>	 (03CR) 10JHathaway: "kindly review!" [puppet] - 10https://gerrit.wikimedia.org/r/879624 (owner: 10JHathaway)
[19:11:09] <logmsgbot>	 !log jhuneidi@deploy1002 rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18  refs T325581
[19:11:13] <icinga-wm>	 PROBLEM - SSH on stat1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[19:11:13] <stashbot>	 T325581: 1.40.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T325581
[19:12:30] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: daniel_zahn singtel maintenance https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:12:30] <icinga-wm>	 ACKNOWLEDGEMENT - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: daniel_zahn singtel maintenance https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:14:39] <icinga-wm>	 PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-xcollazo-singleuser-conda-analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:15:55] <icinga-wm>	 RECOVERY - SSH on stat1004 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[19:16:15] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:23:58] <wikibugs>	 (03CR) 10BCornwall: [V: 03+1 C: 03+2] varnish: Template out thread pool settings [puppet] - 10https://gerrit.wikimedia.org/r/878201 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[19:52:42] <wikibugs>	 (03PS1) 10Marostegui: instances.yaml: Add db1176 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/879630 (https://phabricator.wikimedia.org/T326116)
[19:53:30] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] instances.yaml: Add db1176 to dbctl [puppet] - 10https://gerrit.wikimedia.org/r/879630 (https://phabricator.wikimedia.org/T326116) (owner: 10Marostegui)
[19:55:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled T326116', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
[19:55:20] <stashbot>	 T326116: Package and test MariaDB 11 - https://phabricator.wikimedia.org/T326116
[19:56:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
[19:59:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
[20:06:09] <wikibugs>	 10SRE: Number of mw swift objects in eqiad greater than codfw - https://phabricator.wikimedia.org/T326857 (10andrea.denisse)
[20:06:24] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-codfw, 10DC-Ops: Q3:rack/setup/install ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326848 (10RobH)
[20:06:38] <wikibugs>	 10SRE-swift-storage, 10serviceops: serviceops implementation tracking for ms-fe2013 - ms-fe2014, thanos-fe2004 - https://phabricator.wikimedia.org/T326849 (10RobH) 05Open→03Invalid actually data persistence; this was a miscategorization
[20:07:05] <wikibugs>	 10SRE: Number of mw swift objects in eqiad greater than codfw - https://phabricator.wikimedia.org/T326857 (10andrea.denisse)
[20:07:22] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops, 10serviceops: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (10RobH)
[20:07:36] <wikibugs>	 10SRE-swift-storage, 10serviceops: serviceops implementation tracking for ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326847 (10RobH) 05Open→03Invalid invalid: this is actually data persistence; I had it mislabeled
[20:07:40] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10DC-Ops: Q3:rack/setup/install ms-fe1013 - ms-fe1014, thanos-fe1004 - https://phabricator.wikimedia.org/T326846 (10RobH)
[20:08:08] <brett>	 !log Setting thread_pool_max for varnish-frontend to 12000
[20:08:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:50] <icinga-wm>	 ACKNOWLEDGEMENT - Number of mw swift objects in eqiad greater than codfw on alert1001 is CRITICAL: account=mw-media class=thumb Andrea Denisse https://phabricator.wikimedia.org/T326857 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?var-DC=eqiad
[20:11:53] <wikibugs>	 10SRE, 10Traffic: Remove IPSec/Strongswan from Puppet repository - https://phabricator.wikimedia.org/T326745 (10BCornwall) 05Open→03Resolved It's merged, so I guess this can be closed. :)
[20:17:46] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_presto in analytics@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:19:24] <wikibugs>	 10SRE-OnFire, 10SRE Observability (FY2022/2023-Q3): implementing an incident response workflow automation tool for SRE - https://phabricator.wikimedia.org/T308467 (10lmata)
[20:21:34] <wikibugs>	 10SRE-OnFire, 10SRE Observability (FY2022/2023-Q3): implementing an incident response workflow automation tool for SRE - https://phabricator.wikimedia.org/T308467 (10lmata)
[20:33:58] <wikibugs>	 (03PS3) 10Ryan Kemper: [WIP] wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064)
[20:36:05] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] [WIP] wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[20:36:29] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[20:37:31] <wikibugs>	 (03PS12) 10Ryan Kemper: wdqs-data-reload: use NFS for data reloads [cookbooks] - 10https://gerrit.wikimedia.org/r/876217 (https://phabricator.wikimedia.org/T323096) (owner: 10Bking)
[20:37:45] <wikibugs>	 (03PS13) 10Ryan Kemper: wdqs: use NFS for data reloads [cookbooks] - 10https://gerrit.wikimedia.org/r/876217 (https://phabricator.wikimedia.org/T323096) (owner: 10Bking)
[20:38:01] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[20:39:19] <wikibugs>	 (03Abandoned) 10Ryan Kemper: elastic: decom elastic2035 [puppet] - 10https://gerrit.wikimedia.org/r/759637 (https://phabricator.wikimedia.org/T316729) (owner: 10Ryan Kemper)
[20:39:31] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.070 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[20:39:37] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[20:49:07] <wikibugs>	 (03PS4) 10Ryan Kemper: team-search-platform: relax kafka burrow check [alerts] - 10https://gerrit.wikimedia.org/r/868234 (owner: 10DCausse)
[20:49:47] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+1] wdqs: use NFS for data reloads [cookbooks] - 10https://gerrit.wikimedia.org/r/876217 (https://phabricator.wikimedia.org/T323096) (owner: 10Bking)
[20:49:51] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] wdqs: use NFS for data reloads [cookbooks] - 10https://gerrit.wikimedia.org/r/876217 (https://phabricator.wikimedia.org/T323096) (owner: 10Bking)
[20:55:03] <wikibugs>	 (03PS1) 10Herron: grafana: stop testing home.json dashboard [puppet] - 10https://gerrit.wikimedia.org/r/879644
[20:56:06] <wikibugs>	 (03CR) 10Herron: [C: 03+2] grafana: stop testing home.json dashboard [puppet] - 10https://gerrit.wikimedia.org/r/879644 (owner: 10Herron)
[20:56:37] <wikibugs>	 (03PS4) 10Herron: [WIP] wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[21:00:04] <jouncebot>	 brennen and TheresNoTime: That opportune time is upon us again. Time for a UTC late backport and config training deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230112T2100).
[21:00:05] <jouncebot>	 samwilson, koi, and ebernhardson: A patch you scheduled for UTC late backport and config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:01:06] <samwilson>	 brennen TheresNoTime hello I'm present.
[21:01:09] <wikibugs>	 (03PS1) 10Ottomata: Add dummy an-launcher1002.eqiad.wmnet/analytics-platform-eng keytab [labs/private] - 10https://gerrit.wikimedia.org/r/879646 (https://phabricator.wikimedia.org/T326827)
[21:01:17] <ebernhardson>	 \o
[21:01:48] <thcipriani>	 ahoy all!
[21:01:54] <wikibugs>	 (03CR) 10Herron: [C: 03+2] "followed this up with I94b7c3400a7d493e30b5ab03504d08cbc3aca8a3 since CI was failing Grafana changes with 'FileNotFoundError: [Errno 2] No" [puppet] - 10https://gerrit.wikimedia.org/r/871290 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[21:02:35] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+1] team-search-platform: relax kafka burrow check [alerts] - 10https://gerrit.wikimedia.org/r/868234 (owner: 10DCausse)
[21:02:37] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] team-search-platform: relax kafka burrow check [alerts] - 10https://gerrit.wikimedia.org/r/868234 (owner: 10DCausse)
[21:02:41] <thcipriani>	 I can be deploy tribute today
[21:03:27] <samwilson>	 thanks
[21:03:46] <wikibugs>	 (03Merged) 10jenkins-bot: team-search-platform: relax kafka burrow check [alerts] - 10https://gerrit.wikimedia.org/r/868234 (owner: 10DCausse)
[21:04:10] <wikibugs>	 (03PS6) 10Thcipriani: Remove Beta Feature for Realtime Preview and enable on plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/868816 (https://phabricator.wikimedia.org/T323033) (owner: 10Samwilson)
[21:04:32] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by thcipriani@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/868816 (https://phabricator.wikimedia.org/T323033) (owner: 10Samwilson)
[21:04:40] <wikibugs>	 (03PS1) 10Ottomata: Add analytics-platform-eng-admins and system user keytab to an-launcher1002 [puppet] - 10https://gerrit.wikimedia.org/r/879648 (https://phabricator.wikimedia.org/T326827)
[21:05:03] <thcipriani>	 ebernhardson: any particular order for yours?
[21:05:19] <wikibugs>	 (03Merged) 10jenkins-bot: Remove Beta Feature for Realtime Preview and enable on plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/868816 (https://phabricator.wikimedia.org/T323033) (owner: 10Samwilson)
[21:05:35] <logmsgbot>	 !log thcipriani@deploy1002 Started scap: Backport for [[gerrit:868816|Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]]
[21:05:37] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Add dummy an-launcher1002.eqiad.wmnet/analytics-platform-eng keytab [labs/private] - 10https://gerrit.wikimedia.org/r/879646 (https://phabricator.wikimedia.org/T326827) (owner: 10Ottomata)
[21:05:39] <ebernhardson>	 thcipriani: shouldn't matter, although the first config patch doesn't do anything until the wmf.18 patch is deployed
[21:05:39] <stashbot>	 T323033: Graduate Realtime Preview feature from Beta to being available for everyone - https://phabricator.wikimedia.org/T323033
[21:06:04] <thcipriani>	 no koi no stang :\
[21:06:13] <cirno>	 o/
[21:06:23] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+2] looksLikeAutomation: Allow flagging requests from arbitrary headers [extensions/CirrusSearch] (wmf/1.40.0-wmf.18) - 10https://gerrit.wikimedia.org/r/879571 (https://phabricator.wikimedia.org/T326757) (owner: 10Ebernhardson)
[21:06:31] <cirno>	 sorry for the delay, I had the sound muted 0 0
[21:07:03] <wikibugs>	 (03CR) 10Ottomata: "This will grant some sudo perms on an-launcher1002:" [puppet] - 10https://gerrit.wikimedia.org/r/879648 (https://phabricator.wikimedia.org/T326827) (owner: 10Ottomata)
[21:07:04] <thcipriani>	 oh hey cirno 
[21:07:08] <logmsgbot>	 !log thcipriani@deploy1002 thcipriani and samwilson: Backport for [[gerrit:868816|Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
[21:07:24] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39120/console" [puppet] - 10https://gerrit.wikimedia.org/r/879648 (https://phabricator.wikimedia.org/T326827) (owner: 10Ottomata)
[21:07:31] <samwilson>	 testing now
[21:07:35] <thcipriani>	 samwilson: your change should be on mwdebug, ch...cool :)
[21:09:32] <samwilson>	 thcipriani: hehe :) I like the new message, and it's now on all debug servers? cool. And yep, tested and all looks grand. Am happy for it to proceed.
[21:10:07] <thcipriani>	 samwilson: great, going live everywhere now, thanks for checking :)
[21:10:16] <thcipriani>	 and, yeah, all debug servers
[21:16:19] <logmsgbot>	 !log thcipriani@deploy1002 Finished scap: Backport for [[gerrit:868816|Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)]] (duration: 10m 43s)
[21:16:23] <stashbot>	 T323033: Graduate Realtime Preview feature from Beta to being available for everyone - https://phabricator.wikimedia.org/T323033
[21:16:31] <thcipriani>	 samwilson: ^ should be live now
[21:16:42] <samwilson>	 thanks! checking.
[21:16:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by thcipriani@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879561 (https://phabricator.wikimedia.org/T313698) (owner: 10Stang)
[21:17:08] <thcipriani>	 ^ cirno getting your first one staged now
[21:17:38] <wikibugs>	 (03Merged) 10jenkins-bot: etwikiquote: Switch logo variant back [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879561 (https://phabricator.wikimedia.org/T313698) (owner: 10Stang)
[21:17:53] <logmsgbot>	 !log thcipriani@deploy1002 Started scap: Backport for [[gerrit:879561|etwikiquote: Switch logo variant back (T313698)]]
[21:17:57] <stashbot>	 T313698: Requesting temporary logo change for et.wikiquote.org - https://phabricator.wikimedia.org/T313698
[21:19:24] <samwilson>	 thcipriani: Tested everywhere and all is well. Thanks!
[21:19:28] <logmsgbot>	 !log thcipriani@deploy1002 thcipriani and stang: Backport for [[gerrit:879561|etwikiquote: Switch logo variant back (T313698)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
[21:19:41] <thcipriani>	 samwilson: nice, thanks for checking :)
[21:19:53] <thcipriani>	 cirno: your first change is live on mwdebug, check please
[21:19:56] <cirno>	 looking
[21:20:09] <cirno>	 it works
[21:21:18] <thcipriani>	 cool going live
[21:21:37] <wikibugs>	 (03Merged) 10jenkins-bot: looksLikeAutomation: Allow flagging requests from arbitrary headers [extensions/CirrusSearch] (wmf/1.40.0-wmf.18) - 10https://gerrit.wikimedia.org/r/879571 (https://phabricator.wikimedia.org/T326757) (owner: 10Ebernhardson)
[21:21:56] <thcipriani>	 ^ ebernhardson we'll get your wmf.18 one next since it just merged
[21:22:22] <ebernhardson>	 kk
[21:23:27] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[21:24:57] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[21:27:19] <logmsgbot>	 !log thcipriani@deploy1002 Finished scap: Backport for [[gerrit:879561|etwikiquote: Switch logo variant back (T313698)]] (duration: 09m 25s)
[21:27:22] <stashbot>	 T313698: Requesting temporary logo change for et.wikiquote.org - https://phabricator.wikimedia.org/T313698
[21:27:54] <thcipriani>	 ^ cirno first one should be live, I'm going to do a quick mediawiki backport, then hop back to your config patches
[21:28:07] <cirno>	 got it
[21:28:37] <wikibugs>	 10Puppet, 10SRE, 10Data-Services, 10Infrastructure-Foundations, 10cloud-services-team (Kanban): clouddumps1002: ferm is being started on every puppet run - https://phabricator.wikimedia.org/T323324 (10Dzahn) >>! In T323324#8518330, @Andrew wrote: >>>! In T323324#8517754, @Dzahn wrote: >> @Andrew Is it no...
[21:28:37] <logmsgbot>	 !log thcipriani@deploy1002 Started scap: Backport for [[gerrit:879571|looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]]
[21:28:41] <stashbot>	 T326757: Investigate doubling of full_text search query rate since jan 1, 2023 - https://phabricator.wikimedia.org/T326757
[21:30:21] <logmsgbot>	 !log thcipriani@deploy1002 thcipriani and ebernhardson: Backport for [[gerrit:879571|looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
[21:30:43] <thcipriani>	 ^ ebernhardson the wmf.18 change should be live on mwdebug, check please
[21:31:12] <ebernhardson>	 thcipriani: basic tests look to work (it does nothing until configured)
[21:31:21] <ebernhardson>	 meaning nothing appears broken :)
[21:31:36] <thcipriani>	 :)
[21:31:40] <thcipriani>	 sounds good, going live
[21:33:17] <thcipriani>	 cirno: I'm going to skip 876196 for the time being since it requires table creation. I'd like someone who is more familiar with how we do that nowadays to take a look at that :) But I'll get 879600 done after this.
[21:33:46] <cirno>	 ok, I'll reschedule this patch
[21:34:55] <thcipriani>	 <3
[21:36:28] <wikibugs>	 10SRE, 10SRE-OnFire, 10Release-Engineering-Team, 10serviceops-collab, 10Sustainability: Remove old scap repositories from deploy1002 - https://phabricator.wikimedia.org/T309162 (10Dzahn) >>! In T309162#8519453, @hashar wrote: >  That is what this task is about:  remove repos from the deployment servers w...
[21:37:47] <logmsgbot>	 !log thcipriani@deploy1002 Finished scap: Backport for [[gerrit:879571|looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)]] (duration: 09m 10s)
[21:37:51] <stashbot>	 T326757: Investigate doubling of full_text search query rate since jan 1, 2023 - https://phabricator.wikimedia.org/T326757
[21:38:09] <thcipriani>	 ^ ebernhardson we'll get the rest of yours done all together
[21:38:27] <wikibugs>	 (03PS2) 10Thcipriani: nlwiki: Add block right to checkuser group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879600 (https://phabricator.wikimedia.org/T326355) (owner: 10Stang)
[21:38:45] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by thcipriani@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879600 (https://phabricator.wikimedia.org/T326355) (owner: 10Stang)
[21:38:57] <thcipriani>	 (after this one :))
[21:39:30] <wikibugs>	 (03Merged) 10jenkins-bot: nlwiki: Add block right to checkuser group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879600 (https://phabricator.wikimedia.org/T326355) (owner: 10Stang)
[21:39:42] <logmsgbot>	 !log thcipriani@deploy1002 Started scap: Backport for [[gerrit:879600|nlwiki: Add block right to checkuser group (T326355)]]
[21:39:46] <stashbot>	 T326355: Assign block rights to the checkuser group on nl.wikipedia.org - https://phabricator.wikimedia.org/T326355
[21:39:46] <ebernhardson>	 kk
[21:41:19] <logmsgbot>	 !log thcipriani@deploy1002 thcipriani and stang: Backport for [[gerrit:879600|nlwiki: Add block right to checkuser group (T326355)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
[21:41:40] <thcipriani>	 ^ cirno last one, check please (if you're able)
[21:42:25] <cirno>	 thcipriani, I checked https://nl.wikipedia.org/wiki/Special:Listgrouprights and LGTM
[21:42:44] <thcipriani>	 cool, thanks, going live :)
[21:45:06] <wikibugs>	 (03PS5) 10Ryan Kemper: wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064)
[21:45:15] <wikibugs>	 (03CR) 10Ryan Kemper: wdqs: add recording rule for req success ratio (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[21:45:37] <wikibugs>	 (03PS6) 10Ryan Kemper: wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064)
[21:47:09] <wikibugs>	 (03PS1) 10Dreamy Jazz: Start writing to cul_comment_id and cul_comment_plaintext_id on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879652 (https://phabricator.wikimedia.org/T233004)
[21:48:47] <logmsgbot>	 !log thcipriani@deploy1002 Finished scap: Backport for [[gerrit:879600|nlwiki: Add block right to checkuser group (T326355)]] (duration: 09m 04s)
[21:48:51] <stashbot>	 T326355: Assign block rights to the checkuser group on nl.wikipedia.org - https://phabricator.wikimedia.org/T326355
[21:49:05] <thcipriani>	 ^ cirno should be live now
[21:49:22] <wikibugs>	 (03PS2) 10Thcipriani: cirrus: Divert requests with x-public-cloud set to a dedicated pool counter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879161 (https://phabricator.wikimedia.org/T326757) (owner: 10Ebernhardson)
[21:49:25] <cirno>	 thanks!
[21:49:49] <wikibugs>	 (03PS2) 10Thcipriani: cirrus: Disable incoming link counting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/862343 (https://phabricator.wikimedia.org/T317023) (owner: 10Ebernhardson)
[21:50:37] <wikibugs>	 10SRE, 10SRE-OnFire, 10Release-Engineering-Team, 10serviceops-collab, 10Sustainability: Remove old scap repositories from deploy1002 - https://phabricator.wikimedia.org/T309162 (10Krinkle) >>! In T309162#7958771, @Dzahn wrote: > Top ten oldest repos by modifiation time, oldest first: >  > ` > May 30  201...
[21:51:06] <thcipriani>	 ebernhardson: I think these two patches are going to merge conflict :)
[21:51:24] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+2] cirrus: Divert requests with x-public-cloud set to a dedicated pool counter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879161 (https://phabricator.wikimedia.org/T326757) (owner: 10Ebernhardson)
[21:52:06] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus: Divert requests with x-public-cloud set to a dedicated pool counter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879161 (https://phabricator.wikimedia.org/T326757) (owner: 10Ebernhardson)
[21:52:10] <wikibugs>	 (03PS2) 10Dreamy Jazz: Start writing to cul_reason_id and cul_reason_plaintext_id on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879652 (https://phabricator.wikimedia.org/T233004)
[21:52:16] <wikibugs>	 (03PS3) 10Dreamy Jazz: Start writing to cul_reason_id and cul_reason_plaintext_id on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879652 (https://phabricator.wikimedia.org/T233004)
[21:52:52] <thcipriani>	 ebernhardson: yeep, merge conflict, could you update https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/862343 for me?
[21:53:00] <ebernhardson>	 sure, sec
[21:54:05] <ryankemper>	 ebernhardson: thcipriani: fwiw https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/879161/2/wmf-config/InitialiseSettings.php is slightly non-alphabetical order, might want to fix that while fixing the conflict
[21:54:08] <wikibugs>	 (03PS3) 10Ebernhardson: cirrus: Disable incoming link counting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/862343 (https://phabricator.wikimedia.org/T317023)
[21:54:32] <wikibugs>	 (03PS4) 10Dreamy Jazz: Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879652 (https://phabricator.wikimedia.org/T233004)
[21:54:59] <ryankemper>	 well, I guess I'm assuming it should be alphabetical, I'm not actually sure :D
[21:55:20] <ebernhardson>	 ryankemper: sadly, it only happens to look alphabetical in that little snippet, there isn't a particular ordering except new things at the end of the other cirrus things
[21:55:23] <thcipriani>	 :D
[21:55:33] <ryankemper>	 checks out :P
[21:55:36] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Q2:rack/setup/install puppetdb1003 - https://phabricator.wikimedia.org/T317892 (10Papaul) 05Open→03Resolved
[21:56:01] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by thcipriani@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/862343 (https://phabricator.wikimedia.org/T317023) (owner: 10Ebernhardson)
[21:56:20] <ebernhardson>	 thcipriani: the incoming links one isn't testable, it only executes on the job runners
[21:56:45] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus: Disable incoming link counting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/862343 (https://phabricator.wikimedia.org/T317023) (owner: 10Ebernhardson)
[21:56:49] <zabe>	 !log run populateCucComment.php on testwiki # T233004
[21:56:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:52] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[21:56:58] <logmsgbot>	 !log thcipriani@deploy1002 Started scap: Backport for [[gerrit:879161|cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343|cirrus: Disable incoming link counting (T317023)]]
[21:57:02] <stashbot>	 T317023: Investigate moving incoming_links computation to a batch job - https://phabricator.wikimedia.org/T317023
[21:57:03] <stashbot>	 T326757: Investigate doubling of full_text search query rate since jan 1, 2023 - https://phabricator.wikimedia.org/T326757
[21:58:05] <logmsgbot>	 !log krinkle@deploy1002 Installing scap version "4.32.0" for 1 hosts
[21:58:15] <logmsgbot>	 !log krinkle@deploy1002 Installation of scap version "4.32.0" completed for 1 hosts
[21:58:33] <logmsgbot>	 !log thcipriani@deploy1002 thcipriani and ebernhardson: Backport for [[gerrit:879161|cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343|cirrus: Disable incoming link counting (T317023)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
[21:58:46] <logmsgbot>	 !log krinkle@deploy1002 Installing scap version "4.32.0" for 1 hosts
[21:58:56] <logmsgbot>	 !log krinkle@deploy1002 Installation of scap version "4.32.0" completed for 1 hosts
[21:59:16] <thcipriani>	 ^ ebernhardson both of the configs are live on mwdebug
[21:59:29] <thcipriani>	 check please :)
[21:59:34] <Krinkle>	 !log krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref T326668
[21:59:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:37] <stashbot>	 T326668: Scap fails on debian bullseye targets - https://phabricator.wikimedia.org/T326668
[21:59:42] <logmsgbot>	 !log krinkle@deploy1002 Started deploy [performance/navtiming@172cc22]: (no justification provided)
[21:59:51] <logmsgbot>	 !log krinkle@deploy1002 Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
[22:00:09] <ebernhardson>	 thcipriani: everything looks reasonable
[22:00:17] <thcipriani>	 cool, going live
[22:01:40] <wikibugs>	 (03PS1) 10Dreamy Jazz: Start writing to cul_reason_[plaintext]_id on group0 and group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879653 (https://phabricator.wikimedia.org/T233004)
[22:02:00] <wikibugs>	 (03PS5) 10Dreamy Jazz: Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879652 (https://phabricator.wikimedia.org/T233004)
[22:02:07] <wikibugs>	 (03PS2) 10Dreamy Jazz: Start writing to cul_reason_[plaintext]_id on group0 and group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879653 (https://phabricator.wikimedia.org/T233004)
[22:06:22] <logmsgbot>	 !log thcipriani@deploy1002 Finished scap: Backport for [[gerrit:879161|cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757)]], [[gerrit:862343|cirrus: Disable incoming link counting (T317023)]] (duration: 09m 23s)
[22:06:27] <stashbot>	 T317023: Investigate moving incoming_links computation to a batch job - https://phabricator.wikimedia.org/T317023
[22:06:27] <stashbot>	 T326757: Investigate doubling of full_text search query rate since jan 1, 2023 - https://phabricator.wikimedia.org/T326757
[22:06:34] <thcipriani>	 ^ ebernhardson alright, all should be live now
[22:07:04] <ebernhardson>	 thcipriani: thanks! already seeing them working in dashboards
[22:07:09] <thcipriani>	 nice :)
[22:07:13] <thcipriani>	 !log end UTC late backport
[22:07:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:54] <zabe>	 !log start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # T233004
[22:08:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:57] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[22:15:15] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[22:31:45] <icinga-wm>	 PROBLEM - Check systemd state on people2002 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_rsync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:38:51] <mutante>	 ^ yea, that server should not have the auto restart service..
[22:39:00] <mutante>	 it doesn't have rsync on it
[22:41:04] <wikibugs>	 (03CR) 10Herron: "LGTM overall, please see minor syntax issue inline" [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[22:45:06] <wikibugs>	 (03CR) 10Herron: "annnnd one more thing 😇" [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[22:45:19] <mutante>	 !log people2002 - apt-get remove --purge rsync
[22:45:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:46:26] <mutante>	 no, it's different, the rsync package is installed... but someone or something deleted the config
[22:46:36] <mutante>	 letting puppet recreate
[22:54:23] <wikibugs>	 10SRE: rsync server on people2002 - https://phabricator.wikimedia.org/T326888 (10Dzahn)
[22:55:01] <sbassett>	 Hey all - was going to deploy some changes to PrivateSettings.php - let me know if I shouldn’t for any reason.
[22:55:15] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "this worked after merge but today it's like the config for rsyncd has been wiped - https://phabricator.wikimedia.org/T326888" [puppet] - 10https://gerrit.wikimedia.org/r/875806 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[22:56:19] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on people2002 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_rsync.service daniel_zahn https://phabricator.wikimedia.org/T326888 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:57:17] <wikibugs>	 10SRE: rsync server on people2002 - https://phabricator.wikimedia.org/T326888 (10Dzahn) restart service was only added recently, but I had tested it and it did not have that problem a couple days ago.  https://gerrit.wikimedia.org/r/875806
[22:58:37] <wikibugs>	 10SRE, 10SRE-OnFire, 10Release-Engineering-Team, 10serviceops-collab, 10Sustainability: Remove old scap repositories from deploy1002 - https://phabricator.wikimedia.org/T309162 (10Dzahn) I don't remember exactly but most likely find -mtime, yea. ACK!
[23:06:59] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[23:08:29] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[23:08:34] <sbassett>	 !log Deployed (temporary) security mitigations for T326691
[23:08:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:08:46] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Dzahn) @Jhancock.wm Thanks for the super quick turnaround. That was fast, wow.   someone needs to follow-up, for example do we set the status back to active in netbox, does it have...
[23:10:34] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: hw troubleshooting: DIMM_B2 for mc2040.codfw.wmnet - https://phabricator.wikimedia.org/T326834 (10Dzahn) set back to active in netbox
[23:10:38] <logmsgbot>	 !log ebernhardson@deploy1002 Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
[23:13:10] <logmsgbot>	 !log ebernhardson@deploy1002 Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
[23:19:21] <wikibugs>	 (03PS1) 10Jdlrobson: English Wikipedia uses Vector 2022 skin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879659
[23:22:47] <wikibugs>	 (03PS1) 10BCornwall: prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723)
[23:22:59] <wikibugs>	 (03PS1) 10Jdlrobson: [Just in case] Disable thumbnails on English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879661
[23:23:09] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[23:24:23] <wikibugs>	 (03PS2) 10BCornwall: prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723)
[23:24:43] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[23:25:47] <wikibugs>	 (03PS3) 10BCornwall: prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723)
[23:26:08] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[23:26:39] <wikibugs>	 (03CR) 10Jdlrobson: [C: 04-1] "FYI" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879661 (owner: 10Jdlrobson)
[23:29:55] <wikibugs>	 (03PS1) 10Jdlrobson: Remove redundant block for search descriptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/879664 (https://phabricator.wikimedia.org/T324859)
[23:30:34] <wikibugs>	 (03PS4) 10BCornwall: prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723)
[23:31:18] <wikibugs>	 (03PS7) 10Ryan Kemper: wdqs: add recording rule for req success ratio [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064)
[23:32:40] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[23:32:52] <wikibugs>	 (03CR) 10Ryan Kemper: "done! thanks for catching those" [puppet] - 10https://gerrit.wikimedia.org/r/879599 (https://phabricator.wikimedia.org/T323064) (owner: 10Ryan Kemper)
[23:38:25] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to datacenter-ops for Jennifer Hancock - https://phabricator.wikimedia.org/T326649 (10Papaul)
[23:40:29] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to datacenter-ops for Jennifer Hancock - https://phabricator.wikimedia.org/T326649 (10Papaul) 05Open→03Resolved I tested this today with @Jhancock.wm all is working. We can close the task. Thanks @Jelto @Dzahn
[23:41:44] <wikibugs>	 (03PS5) 10BCornwall: prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723)
[23:43:46] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)
[23:44:17] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
[23:46:52] <wikibugs>	 (03PS6) 10BCornwall: prometheus: Generate varnish params file [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723)
[23:47:05] <brett>	 Don't mind me....
[23:50:08] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
[23:53:06] <zabe>	 !log start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # T233004
[23:53:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:10] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[23:53:51] <wikibugs>	 (03CR) 10BCornwall: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39121/console" [puppet] - 10https://gerrit.wikimedia.org/r/879660 (https://phabricator.wikimedia.org/T323723) (owner: 10BCornwall)