[00:43:47] <icinga-wm>	 RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:12:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST nodes) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:17:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST nodes) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:37:45] <jinxer-wm>	 (JobUnavailable) firing: (5) Reduced availability for job redis_gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:42:45] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job nginx in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:52:45] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:07:45] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:17:45] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:22:45] <jinxer-wm>	 (JobUnavailable) resolved: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:11:47] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:12:57] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:16:49] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48974 bytes in 0.062 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:17:43] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.242 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:56:11] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[03:58:11] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[04:41:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST services) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:46:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST services) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[05:23:25] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 187 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:25:27] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 8 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:42:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2127.codfw.wmnet with reason: Maintenance
[05:43:08] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2127.codfw.wmnet with reason: Maintenance
[06:10:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[06:10:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1123.eqiad.wmnet with reason: Maintenance
[06:19:21] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1121.eqiad.wmnet with reason: Maintenance
[06:19:45] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1121.eqiad.wmnet with reason: Maintenance
[06:19:47] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[06:20:02] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[06:20:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1121 (T321126)', diff saved to https://phabricator.wikimedia.org/P41258 and previous config saved to /var/cache/conftool/dbconfig/20221128-062008-marostegui.json
[06:20:14] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[06:25:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121 (T321126)', diff saved to https://phabricator.wikimedia.org/P41259 and previous config saved to /var/cache/conftool/dbconfig/20221128-062516-marostegui.json
[06:25:23] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[06:34:17] <wikibugs>	 (CR) Marostegui: [C: +1] mariadb: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860913 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[06:34:34] <wikibugs>	 (CR) Marostegui: [C: +1] mariadb::proxy: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860911 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[06:36:56] <wikibugs>	 SRE, ops-codfw, DBA: db2174 lost power - https://phabricator.wikimedia.org/T323512 (Marostegui) MySQL is now off again, so @Papaul you can do the test whenever you can.
[06:37:40] <wikibugs>	 (PS1) Marostegui: control-mysql-5.7: We won't use 5.7 [software] - https://gerrit.wikimedia.org/r/861193
[06:38:17] <wikibugs>	 (CR) Marostegui: [C: +2] control-mysql-5.7: We won't use 5.7 [software] - https://gerrit.wikimedia.org/r/861193 (owner: Marostegui)
[06:38:50] <wikibugs>	 (Merged) jenkins-bot: control-mysql-5.7: We won't use 5.7 [software] - https://gerrit.wikimedia.org/r/861193 (owner: Marostegui)
[06:40:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P41260 and previous config saved to /var/cache/conftool/dbconfig/20221128-064022-marostegui.json
[06:42:21] <wikibugs>	 (PS2) Kosta Harlan: GrowthExperiments: Start newimpact experiment [mediawiki-config] - https://gerrit.wikimedia.org/r/860867 (https://phabricator.wikimedia.org/T323526)
[06:42:40] <wikibugs>	 (PS3) Kosta Harlan: GrowthExperiments: Start oldimpact experiment [mediawiki-config] - https://gerrit.wikimedia.org/r/860867 (https://phabricator.wikimedia.org/T323526)
[06:43:43] <wikibugs>	 (PS4) Kosta Harlan: GrowthExperiments: Start oldimpact experiment [mediawiki-config] - https://gerrit.wikimedia.org/r/860867 (https://phabricator.wikimedia.org/T323526)
[06:55:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P41261 and previous config saved to /var/cache/conftool/dbconfig/20221128-065529-marostegui.json
[07:00:15] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 111 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:02:17] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 9 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:08:45] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2022-11-28-053412-production [deployment-charts] - https://gerrit.wikimedia.org/r/861195 (https://phabricator.wikimedia.org/T323825)
[07:10:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1121 (T321126)', diff saved to https://phabricator.wikimedia.org/P41262 and previous config saved to /var/cache/conftool/dbconfig/20221128-071035-marostegui.json
[07:10:37] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[07:10:42] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[07:10:51] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[07:10:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1141 (T321126)', diff saved to https://phabricator.wikimedia.org/P41263 and previous config saved to /var/cache/conftool/dbconfig/20221128-071057-marostegui.json
[07:13:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141 (T321126)', diff saved to https://phabricator.wikimedia.org/P41264 and previous config saved to /var/cache/conftool/dbconfig/20221128-071306-marostegui.json
[07:28:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P41265 and previous config saved to /var/cache/conftool/dbconfig/20221128-072813-marostegui.json
[07:31:01] <wikibugs>	 (CR) ArielGlenn: [C: +1] dumps/distribution: add more data types to parameters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/852260 (owner: Dzahn)
[07:31:11] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Remove the parsoid chart [deployment-charts] - https://gerrit.wikimedia.org/r/860703 (owner: Giuseppe Lavagetto)
[07:35:37] <wikibugs>	 (Merged) jenkins-bot: Remove the parsoid chart [deployment-charts] - https://gerrit.wikimedia.org/r/860703 (owner: Giuseppe Lavagetto)
[07:36:41] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 117 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:37:38] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] miscweb: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860510 (owner: Giuseppe Lavagetto)
[07:38:43] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 12 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:42:41] <wikibugs>	 (Merged) jenkins-bot: miscweb: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860510 (owner: Giuseppe Lavagetto)
[07:43:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P41266 and previous config saved to /var/cache/conftool/dbconfig/20221128-074319-marostegui.json
[07:43:34] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] recommendation-api: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860511 (owner: Giuseppe Lavagetto)
[07:47:58] <wikibugs>	 (Merged) jenkins-bot: recommendation-api: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860511 (owner: Giuseppe Lavagetto)
[07:53:43] <wikibugs>	 (PS2) Muehlenhoff: graphite: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860910 (https://phabricator.wikimedia.org/T308013)
[07:58:12] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/860945 (https://phabricator.wikimedia.org/T322670) (owner: Andrea Denisse)
[07:58:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1141 (T321126)', diff saved to https://phabricator.wikimedia.org/P41267 and previous config saved to /var/cache/conftool/dbconfig/20221128-075826-marostegui.json
[07:58:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[07:58:34] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[07:58:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1142.eqiad.wmnet with reason: Maintenance
[07:58:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1142 (T321126)', diff saved to https://phabricator.wikimedia.org/P41268 and previous config saved to /var/cache/conftool/dbconfig/20221128-075847-marostegui.json
[07:59:00] <wikibugs>	 (CR) Muehlenhoff: [C: +2] graphite: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860910 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[08:00:04] <jouncebot>	 Amir1 and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T0800).
[08:00:05] <jouncebot>	 kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:37] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 208 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:00:41] * kart_ is here and will self deploy..
[08:00:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T321126)', diff saved to https://phabricator.wikimedia.org/P41269 and previous config saved to /var/cache/conftool/dbconfig/20221128-080057-marostegui.json
[08:02:39] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 10 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:02:39] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ml-serve1005 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[08:02:40] <wikibugs>	 (PS2) KartikMistry: Content Translation: Reverse MT threshold for Japanese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/860701 (https://phabricator.wikimedia.org/T323721)
[08:03:41] <wikibugs>	 (CR) Jelto: [C: +2] gitlab_runner: make one Shared Runner canary [puppet] - https://gerrit.wikimedia.org/r/858188 (owner: Jelto)
[08:04:06] <moritzm>	 !log rebalance Ganeti group C/codfw following reboots
[08:04:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:00] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by kartik@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/860701 (https://phabricator.wikimedia.org/T323721) (owner: KartikMistry)
[08:06:16] <wikibugs>	 (Merged) jenkins-bot: Content Translation: Reverse MT threshold for Japanese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/860701 (https://phabricator.wikimedia.org/T323721) (owner: KartikMistry)
[08:07:50] <logmsgbot>	 !log kartik@deploy1002 Backport cancelled.
[08:08:27] <kart_>	 James_F: "There were unexpected commits pulled from origin for /srv/mediawiki-staging." Did you forget to deploy something?
[08:09:03] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Make ganeti2032 a Ganeti node [puppet] - https://gerrit.wikimedia.org/r/860873 (https://phabricator.wikimedia.org/T313856) (owner: Muehlenhoff)
[08:09:07] <wikibugs>	 (PS1) TrainBranchBot: Revert "Content Translation: Reverse MT threshold for Japanese Wikipedia" [mediawiki-config] - https://gerrit.wikimedia.org/r/861341
[08:09:09] <wikibugs>	 (CR) TrainBranchBot: "kartik@deploy1002 created a revert of this change as I8f4434220bd2d53947fd2eaab55fe47d80e36f8a" [mediawiki-config] - https://gerrit.wikimedia.org/r/860701 (https://phabricator.wikimedia.org/T323721) (owner: KartikMistry)
[08:09:30] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by kartik@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/861341 (owner: TrainBranchBot)
[08:09:40] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/recommendation-api: apply
[08:09:59] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
[08:10:15] <wikibugs>	 (Merged) jenkins-bot: Revert "Content Translation: Reverse MT threshold for Japanese Wikipedia" [mediawiki-config] - https://gerrit.wikimedia.org/r/861341 (owner: TrainBranchBot)
[08:10:18] <wikibugs>	 (PS3) Slyngshede: Allow multiple server connections to be defined. [software/bitu-ldap] - https://gerrit.wikimedia.org/r/860857
[08:10:27] <logmsgbot>	 !log kartik@deploy1002 Started scap: Backport for [[gerrit:861341|Revert "Content Translation: Reverse MT threshold for Japanese Wikipedia"]]
[08:10:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[08:11:32] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/recommendation-api: apply
[08:11:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[08:11:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[08:11:57] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
[08:12:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[08:15:03] <kart_>	 Not sure - but scap revert seems stuck at, `08:10:29 K8s images build/push output redirected to /home/kartik/scap-image-build-and-push-log`
[08:15:20] <wikibugs>	 (CR) Slyngshede: Allow multiple server connections to be defined. (1 comment) [software/bitu-ldap] - https://gerrit.wikimedia.org/r/860857 (owner: Slyngshede)
[08:16:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P41270 and previous config saved to /var/cache/conftool/dbconfig/20221128-081603-marostegui.json
[08:16:57] <logmsgbot>	 !log kartik@deploy1002 kartik and trainbranchbot: Backport for [[gerrit:861341|Revert "Content Translation: Reverse MT threshold for Japanese Wikipedia"]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
[08:18:03] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[08:19:05] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
[08:19:30] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
[08:21:28] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/miscweb: apply
[08:21:39] <logmsgbot>	 !log kartik@deploy1002 Finished scap: Backport for [[gerrit:861341|Revert "Content Translation: Reverse MT threshold for Japanese Wikipedia"]] (duration: 11m 12s)
[08:21:44] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[08:21:52] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[08:22:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[08:22:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[08:22:20] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[08:24:58] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[08:25:25] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[08:25:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[08:26:02] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38449/console" [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[08:30:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[08:31:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P41271 and previous config saved to /var/cache/conftool/dbconfig/20221128-083110-marostegui.json
[08:32:43] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ml-serve1005 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[08:35:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[08:35:01] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[08:35:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
[08:35:38] <wikibugs>	 (PS2) Slyngshede: WIP C:ldap::client::utils Rewrite add-ldap-group [puppet] - https://gerrit.wikimedia.org/r/860568
[08:37:13] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38450/console" [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[08:37:42] <wikibugs>	 (CR) CI reject: [V: -1] WIP C:ldap::client::utils Rewrite add-ldap-group [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[08:39:00] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[08:42:15] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[08:43:35] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[08:43:50] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
[08:44:39] <wikibugs>	 (PS3) Slyngshede: WIP C:ldap::client::utils Rewrite add-ldap-group [puppet] - https://gerrit.wikimedia.org/r/860568
[08:46:16] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38451/console" [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[08:46:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1142 (T321126)', diff saved to https://phabricator.wikimedia.org/P41272 and previous config saved to /var/cache/conftool/dbconfig/20221128-084616-marostegui.json
[08:46:18] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[08:46:24] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[08:46:31] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[08:46:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1143 (T321126)', diff saved to https://phabricator.wikimedia.org/P41273 and previous config saved to /var/cache/conftool/dbconfig/20221128-084637-marostegui.json
[08:51:43] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 119 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:55:07] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 12 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:58:19] <wikibugs>	 (CR) David Caro: [C: +1] "LGTM feel free to ignore the nits" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/860915 (owner: Arturo Borrero Gonzalez)
[09:00:23] <wikibugs>	 SRE, SRE-OnFire, Product-Infrastructure-Team-Backlog, Maps (Kartotherian), Sustainability (Incident Followup): Kartotherian/Maps outage followups, 2020-10-29 - https://phabricator.wikimedia.org/T266807 (Marostegui) @lmata ping
[09:03:14] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662 (Marostegui) @Volans do you want to keep this open?
[09:04:45] <wikibugs>	 SRE, SRE-OnFire (FY2021/2022-Q3), Data-Engineering, Event-Platform Value Stream, and 2 others: Incident: 2022-03-4 Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (Marostegui) @lmata what should we do with this follow up task?
[09:05:01] <wikibugs>	 SRE-swift-storage, Commons: File not found: /v1/AUTH_mw/wikipedia-commons-local-public.9e/9/9e/Christopher_Wilbrand.jpg - https://phabricator.wikimedia.org/T304788 (Marostegui)
[09:05:37] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662 (Volans) @Marostegui Good question, I'm not aware of other occurrences of the same issue, so it can probably be closed. @fgiunchedi any thoughts?
[09:06:07] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:06:29] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:06:49] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bullseye 11.4 point update - https://phabricator.wikimedia.org/T312637 (Marostegui) @MoritzMuehlenhoff is this all done?
[09:07:23] <wikibugs>	 SRE, Phabricator, Traffic, Wikimedia-Incident: Phabricator was logging out users repeatedly (2022-08-26) - https://phabricator.wikimedia.org/T316337 (Marostegui) What should we do with this task? Anything left?
[09:07:57] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48975 bytes in 0.133 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:08:18] <wikibugs>	 SRE, Traffic, affects-Kiwix-and-openZIM: HTTP 500 against api.php?action=parse API on tr.wikipedia.org - https://phabricator.wikimedia.org/T317011 (Marostegui) Open→Resolved a:Marostegui I am going to tentatively close this as fixed per T317011#8212217. Please reopen if it is not the case.
[09:08:19] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.242 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[09:08:24] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti2032.codfw.wmnet to cluster codfw and group B
[09:09:05] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] wmcs: openstack: common: allow to list servers with extra information (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/860915 (owner: Arturo Borrero Gonzalez)
[09:09:08] <wikibugs>	 SRE, Traffic, affects-Kiwix-and-openZIM: HTTP 500 against api.php?action=parse API on tr.wikipedia.org - https://phabricator.wikimedia.org/T317011 (Marostegui) a:Marostegui→None
[09:09:40] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bullseye 11.4 point update - https://phabricator.wikimedia.org/T312637 (Marostegui) Open→Resolved a:MoritzMuehlenhoff I am assuming {T317416} takes over, so closing this
[09:12:19] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[09:12:26] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.4 point update - https://phabricator.wikimedia.org/T312637 (10MoritzMuehlenhoff) 05Resolved→03Open Actually, the openssh update is still TBD, reopening until I have completed that one.
[09:12:32] <moritzm>	 !log rebalance Ganeti group A/eqiad T311687
[09:12:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:39] <stashbot>	 T311687: Upgrade ganeti/eqiad to Bullseye - https://phabricator.wikimedia.org/T311687
[09:14:13] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[09:14:57] <icinga-wm>	 RECOVERY - Ganeti memory on ganeti1023 is OK: OK Memory 73% used https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[09:15:17] <wikibugs>	 (03CR) 10FNegri: harbor: ensure that it's started (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[09:15:46] <wikibugs>	 (03PS4) 10Slyngshede: WIP C:ldap::client::utils Rewrite add-ldap-group [puppet] - 10https://gerrit.wikimedia.org/r/860568
[09:16:21] <wikibugs>	 (03Abandoned) 10Awight: Send PostgreSQL logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/853941 (https://phabricator.wikimedia.org/T321887) (owner: 10Awight)
[09:16:33] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Agreed, I'm not aware of further occurrences. I'll be BOLD and resolve the task, thank you!
[09:18:25] <wikibugs>	 10SRE, 10ops-ulsfo, 10Infrastructure-Foundations: Degraded RAID on ganeti4006 - https://phabricator.wikimedia.org/T321863 (10Marostegui) 05Open→03Resolved a:03Marostegui The RAID is actually ok ` Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] md0 : active raid1 sd...
[09:18:44] <wikibugs>	 10SRE, 10Phabricator, 10Traffic, 10Wikimedia-Incident: Phabricator was logging out users repeatedly (2022-08-26) - https://phabricator.wikimedia.org/T316337 (10jcrespo) As soon as I finish the wikitech description I intend to resolve it.
[09:18:51] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38452/console" [puppet] - 10https://gerrit.wikimedia.org/r/860568 (owner: 10Slyngshede)
[09:19:16] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on ganeti2013 - https://phabricator.wikimedia.org/T323222 (10Marostegui) a:03Papaul
[09:20:18] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: lists.wikimedia.org returning 500's - https://phabricator.wikimedia.org/T323448 (10Marostegui) 05Open→03Resolved Works for me too now. Going to resolve it for now. Please reopen if you run into this again. Thanks for reporting
[09:20:30] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] P:openstack::designate: remove separate profile for firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/854539 (owner: 10Majavah)
[09:22:58] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] graphite: mirror traffic to graphite1005 [puppet] - 10https://gerrit.wikimedia.org/r/860521 (https://phabricator.wikimedia.org/T318903) (owner: 10Filippo Giunchedi)
[09:23:03] <wikibugs>	 (03PS2) 10Filippo Giunchedi: graphite: mirror traffic to graphite1005 [puppet] - 10https://gerrit.wikimedia.org/r/860521 (https://phabricator.wikimedia.org/T318903)
[09:26:27] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: enalbe multi-processing for ml-staging revscoring-editquality-goodfaith model [deployment-charts] - 10https://gerrit.wikimedia.org/r/861345 (https://phabricator.wikimedia.org/T323624)
[09:27:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] New upstream release [debs/thanos] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/860846 (https://phabricator.wikimedia.org/T303154) (owner: 10Filippo Giunchedi)
[09:29:46] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] P:mediawiki::maintenance: CampaignEvents periodic [puppet] - 10https://gerrit.wikimedia.org/r/858346 (https://phabricator.wikimedia.org/T320403) (owner: 10Clément Goubert)
[09:30:31] <wikibugs>	 (03CR) 10Clément Goubert: [C: 03+2] P:mediawiki::maintenance: CampaignEvents periodic [puppet] - 10https://gerrit.wikimedia.org/r/858346 (https://phabricator.wikimedia.org/T320403) (owner: 10Clément Goubert)
[09:31:15] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "I'm a bit confused about this." [puppet] - 10https://gerrit.wikimedia.org/r/854875 (owner: 10Majavah)
[09:34:16] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] "LGTM, feel free to ignore the nits" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860924 (owner: 10Arturo Borrero Gonzalez)
[09:35:21] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 212 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:36:25] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 8 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:37:45] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] "Going ahead, let me know feedback post-review too" [alerts] - 10https://gerrit.wikimedia.org/r/860609 (owner: 10Filippo Giunchedi)
[09:38:03] <wikibugs>	 (03CR) 10David Caro: harbor: ensure that it's started (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[09:40:24] <wikibugs>	 10SRE, 10Phabricator, 10Traffic, 10Wikimedia-Incident: Phabricator was logging out users repeatedly (2022-08-26) - https://phabricator.wikimedia.org/T316337 (10jcrespo)
[09:40:36] <wikibugs>	 10SRE, 10Traffic: strip non session cookies before cache lookup in ATS - https://phabricator.wikimedia.org/T316338 (10jcrespo)
[09:40:40] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] hiera: unify ulsfo LVS configuration [puppet] - 10https://gerrit.wikimedia.org/r/860930 (https://phabricator.wikimedia.org/T317247) (owner: 10Ssingh)
[09:40:44] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: openstack: inventory: add support to network information (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860924 (owner: 10Arturo Borrero Gonzalez)
[09:40:52] <wikibugs>	 10SRE, 10Phabricator, 10Traffic, 10Wikimedia-Incident: Phabricator was logging out users repeatedly (2022-08-26) - https://phabricator.wikimedia.org/T316337 (10jcrespo) 05Open→03Resolved a:03Vgutierrez @hashar @Vgutierrez Please review my summary of the incident at: https://wikitech.wikimedia.org/wik...
[09:41:00] <wikibugs>	 10SRE, 10SRE-OnFire (FY2021/2022-Q3), 10Data-Engineering, 10Event-Platform Value Stream, and 2 others: Incident: 2022-03-4 Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (10BTullis) I'm not sure that there's much more to do, is there? Fro...
[09:41:12] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "Valentin and I clarified this is the first phase, the next step will be to remove port 80 / plain HTTP entirely later on." [puppet] - 10https://gerrit.wikimedia.org/r/859986 (https://phabricator.wikimedia.org/T238720) (owner: 10Vgutierrez)
[09:41:36] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] gerrit: Reject non-tls requests with a 403 [puppet] - 10https://gerrit.wikimedia.org/r/859986 (https://phabricator.wikimedia.org/T238720) (owner: 10Vgutierrez)
[09:43:11] <wikibugs>	 (03CR) 10Ayounsi: Add function to int_automation to validate QFX5120 port blocks (031 comment) [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/812376 (https://phabricator.wikimedia.org/T303529) (owner: 10Cathal Mooney)
[09:43:18] <wikibugs>	 (03PS5) 10Slyngshede: C:ldap::client::utils Rewrite add-ldap-group [puppet] - 10https://gerrit.wikimedia.org/r/860568
[09:43:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add role_contacts for mwlog [puppet] - 10https://gerrit.wikimedia.org/r/860886 (owner: 10Muehlenhoff)
[09:44:18] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Enable profile::auto_restarts::service for envoyproxy on Grafana [puppet] - 10https://gerrit.wikimedia.org/r/860576 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[09:45:05] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 109 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:45:28] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Deprecate and disable port 80 for one-off sites under canonical domains - https://phabricator.wikimedia.org/T238720 (10Vgutierrez)
[09:45:34] <wikibugs>	 (03PS2) 10Muehlenhoff: zookeeper: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/860907 (https://phabricator.wikimedia.org/T308013)
[09:46:31] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 11 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:46:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T321126)', diff saved to https://phabricator.wikimedia.org/P41274 and previous config saved to /var/cache/conftool/dbconfig/20221128-094654-marostegui.json
[09:47:01] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[09:48:03] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Deprecate and disable port 80 for one-off sites under canonical domains - https://phabricator.wikimedia.org/T238720 (10Vgutierrez)
[09:48:45] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Deprecate and disable port 80 for one-off sites under canonical domains - https://phabricator.wikimedia.org/T238720 (10Vgutierrez)
[09:51:52] <wikibugs>	 10SRE, 10ops-ulsfo, 10Infrastructure-Foundations: Degraded RAID on ganeti4006 - https://phabricator.wikimedia.org/T321863 (10MoritzMuehlenhoff) This was some alert spam during initial setup; this is one of the new servers in ulsfo.
[09:53:13] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Deprecate and disable port 80 for one-off sites under canonical domains - https://phabricator.wikimedia.org/T238720 (10hashar) For Gerrit, I have made the announcement on [[ https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/WGIDWKB4YN3DM7K...
[09:55:54] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Provide an option menu when booting via PXE - https://phabricator.wikimedia.org/T191018 (10LSobanski) Clinic Duty drive-by tagging.
[09:56:19] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Provide a pxe-bootable rescue image - https://phabricator.wikimedia.org/T78135 (10LSobanski) Clinic Duty drive-by tagging.
[09:56:57] <wikibugs>	 10SRE, 10DC-Ops, 10Tracking-Neverending: Hardware Automation Workflow - Overall Tracking - https://phabricator.wikimedia.org/T116063 (10LSobanski) Clinic Duty drive-by tagging.
[09:57:08] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] wmcs: openstack: inventory: add support to network information (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860924 (owner: 10Arturo Borrero Gonzalez)
[10:02:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P41275 and previous config saved to /var/cache/conftool/dbconfig/20221128-100200-marostegui.json
[10:07:59] <wikibugs>	 (03PS1) 10Elukey: knative: import new upstream version 1.7.2 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/861349 (https://phabricator.wikimedia.org/T323793)
[10:09:31] <wikibugs>	 (03PS3) 10David Caro: harbor: remove support for <bullseye [puppet] - 10https://gerrit.wikimedia.org/r/860623 (https://phabricator.wikimedia.org/T267616)
[10:09:33] <wikibugs>	 (03PS3) 10David Caro: harbor: remove unused harbor::db module/role [puppet] - 10https://gerrit.wikimedia.org/r/860627 (https://phabricator.wikimedia.org/T267616)
[10:09:35] <wikibugs>	 (03PS8) 10David Caro: toolforge harbor: update certs with acmechief [puppet] - 10https://gerrit.wikimedia.org/r/728629 (https://phabricator.wikimedia.org/T267616) (owner: 10Bstorm)
[10:09:37] <wikibugs>	 (03PS2) 10David Caro: harbor: ensure that it's started [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616)
[10:14:37] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:16:05] <icinga-wm>	 PROBLEM - SSH on mw1326.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:17:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P41276 and previous config saved to /var/cache/conftool/dbconfig/20221128-101706-marostegui.json
[10:20:28] <wikibugs>	 (03PS14) 10Clément Goubert: opentelemetry-collector: Basic install [puppet] - 10https://gerrit.wikimedia.org/r/856931
[10:20:52] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: cloudvirt1043: move to modern NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/861350 (https://phabricator.wikimedia.org/T319184)
[10:21:49] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] cloudvirt1043: move to modern NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/861350 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[10:22:38] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38453/console" [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:23:51] <wikibugs>	 (03CR) 10Clément Goubert: opentelemetry-collector: Basic install [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:24:17] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] "PCC OK, see above." [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:28:35] <wikibugs>	 (03PS1) 10Muehlenhoff: buster updates [puppet] - 10https://gerrit.wikimedia.org/r/861351
[10:29:55] <wikibugs>	 (03PS1) 10JMeybohm: Rewrite as kubernetes operator/controller [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861352 (https://phabricator.wikimedia.org/T323706)
[10:29:57] <wikibugs>	 (03PS1) 10JMeybohm: update vendor [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861353 (https://phabricator.wikimedia.org/T323706)
[10:30:13] <wikibugs>	 (03CR) 10Jgiannelos: api-gateway: expose restbase /api/ endpoint (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/852165 (https://phabricator.wikimedia.org/T322152) (owner: 10Hnowlan)
[10:31:38] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
[10:31:48] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudvirt1043.eqiad.wmnet with O...
[10:32:10] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudvirt1043: move to modern NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/861350 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[10:32:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1143 (T321126)', diff saved to https://phabricator.wikimedia.org/P41277 and previous config saved to /var/cache/conftool/dbconfig/20221128-103213-marostegui.json
[10:32:15] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:32:19] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[10:32:28] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:32:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41278 and previous config saved to /var/cache/conftool/dbconfig/20221128-103234-marostegui.json
[10:32:42] <wikibugs>	 (03CR) 10Muehlenhoff: opentelemetry-collector: Basic install (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:33:55] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] dcops: switch mgmt down alerts to open tasks [alerts] - 10https://gerrit.wikimedia.org/r/860525 (https://phabricator.wikimedia.org/T310266) (owner: 10Filippo Giunchedi)
[10:33:59] <wikibugs>	 (03PS3) 10Filippo Giunchedi: dcops: switch mgmt down alerts to open tasks [alerts] - 10https://gerrit.wikimedia.org/r/860525 (https://phabricator.wikimedia.org/T310266)
[10:34:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41279 and previous config saved to /var/cache/conftool/dbconfig/20221128-103444-marostegui.json
[10:35:00] <wikibugs>	 (03PS2) 10JMeybohm: Rewrite as kubernetes operator/controller [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861352 (https://phabricator.wikimedia.org/T323706)
[10:35:02] <wikibugs>	 (03PS2) 10JMeybohm: update vendor [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861353 (https://phabricator.wikimedia.org/T323706)
[10:35:59] <wikibugs>	 (03PS15) 10Clément Goubert: opentelemetry-collector: Basic install [puppet] - 10https://gerrit.wikimedia.org/r/856931
[10:36:36] <wikibugs>	 (03CR) 10JMeybohm: "All the yaml in config/ is auto generated by the operator-sdk" [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861352 (https://phabricator.wikimedia.org/T323706) (owner: 10JMeybohm)
[10:38:10] <wikibugs>	 (03PS16) 10Clément Goubert: opentelemetry-collector: Basic install [puppet] - 10https://gerrit.wikimedia.org/r/856931
[10:39:05] <wikibugs>	 (03PS21) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114
[10:39:09] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38455/console" [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:39:46] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] opentelemetry-collector: Basic install (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:40:06] <wikibugs>	 (03PS3) 10David Caro: harbor: ensure that it's started [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616)
[10:41:18] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] harbor: ensure that it's started [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[10:46:47] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "I don't have any insight on the content of the service YAML config, but the puppetisation part looks good" [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:48:33] <logmsgbot>	 !log aborrero@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1043.eqiad.wmnet with OS bullseye
[10:48:42] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bu...
[10:48:59] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
[10:49:09] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudvirt1043.eqiad.wmnet with O...
[10:49:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P41280 and previous config saved to /var/cache/conftool/dbconfig/20221128-104950-marostegui.json
[10:51:58] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] opentelemetry-collector: Basic install (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:52:30] <wikibugs>	 (03PS4) 10David Caro: harbor: ensure that it's started [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616)
[10:53:18] <wikibugs>	 (03CR) 10David Caro: harbor: ensure that it's started (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[10:54:11] <wikibugs>	 (03PS17) 10Clément Goubert: opentelemetry-collector: Basic install [puppet] - 10https://gerrit.wikimedia.org/r/856931
[10:54:50] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] p::metricsinfra:haproxy: rename some vars to reflect intent [puppet] - 10https://gerrit.wikimedia.org/r/831036 (owner: 10David Caro)
[10:55:15] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38456/console" [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[10:56:59] <wikibugs>	 (03CR) 10FNegri: harbor: ensure that it's started (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[10:58:08] <wikibugs>	 (03PS4) 10David Caro: Remove support for overriding LDAP client stack [puppet] - 10https://gerrit.wikimedia.org/r/826536 (owner: 10Majavah)
[10:58:43] <wikibugs>	 (03CR) 10David Caro: Remove support for overriding LDAP client stack (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/826536 (owner: 10Majavah)
[10:59:40] <wikibugs>	 (03PS1) 10Filippo Giunchedi: wmnet: move read traffic to graphite1005 [dns] - 10https://gerrit.wikimedia.org/r/861356 (https://phabricator.wikimedia.org/T318903)
[10:59:42] <wikibugs>	 (03PS1) 10Filippo Giunchedi: wmnet: move writes to graphite1005 [dns] - 10https://gerrit.wikimedia.org/r/861357 (https://phabricator.wikimedia.org/T318903)
[10:59:47] <wikibugs>	 (03PS2) 10Filippo Giunchedi: hieradata: pool graphite1005 for reads [puppet] - 10https://gerrit.wikimedia.org/r/860522 (https://phabricator.wikimedia.org/T318903)
[10:59:49] <wikibugs>	 (03PS1) 10Filippo Giunchedi: graphite: move alerts to graphite1005 [puppet] - 10https://gerrit.wikimedia.org/r/861358 (https://phabricator.wikimedia.org/T318903)
[10:59:51] <wikibugs>	 (03PS1) 10Filippo Giunchedi: stats: failover writes to graphite1005 [puppet] - 10https://gerrit.wikimedia.org/r/861359 (https://phabricator.wikimedia.org/T318903)
[11:02:18] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
[11:02:28] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] api-gateway: expose restbase /api/ endpoint [deployment-charts] - 10https://gerrit.wikimedia.org/r/852165 (https://phabricator.wikimedia.org/T322152) (owner: 10Hnowlan)
[11:03:07] <icinga-wm>	 PROBLEM - SSH on mw1320.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:03:17] <wikibugs>	 (03PS1) 10Filippo Giunchedi: ProductionServices: move to graphite1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861361 (https://phabricator.wikimedia.org/T318903)
[11:03:48] <wikibugs>	 (03PS3) 10JMeybohm: Rewrite as kubernetes operator/controller [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861352 (https://phabricator.wikimedia.org/T323706)
[11:03:50] <wikibugs>	 (03PS3) 10JMeybohm: update vendor [software/helm-state-metrics] - 10https://gerrit.wikimedia.org/r/861353 (https://phabricator.wikimedia.org/T323706)
[11:04:03] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] ProductionServices: move to graphite1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861361 (https://phabricator.wikimedia.org/T318903) (owner: 10Filippo Giunchedi)
[11:04:43] <wikibugs>	 (03CR) 10David Caro: harbor: ensure that it's started (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[11:04:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P41281 and previous config saved to /var/cache/conftool/dbconfig/20221128-110456-marostegui.json
[11:05:20] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[11:05:50] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
[11:06:27] <wikibugs>	 (03PS22) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114
[11:07:00] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs (038 comments) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114 (owner: 10Arturo Borrero Gonzalez)
[11:07:42] <wikibugs>	 (03Merged) 10jenkins-bot: api-gateway: expose restbase /api/ endpoint [deployment-charts] - 10https://gerrit.wikimedia.org/r/852165 (https://phabricator.wikimedia.org/T322152) (owner: 10Hnowlan)
[11:07:44] <wikibugs>	 (03CR) 10David Caro: harbor: ensure that it's started (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: 10David Caro)
[11:10:47] <wikibugs>	 (03PS2) 10Filippo Giunchedi: ProductionServices: move to graphite1005 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861361 (https://phabricator.wikimedia.org/T318903)
[11:12:12] <icinga-wm>	 PROBLEM - SSH on db1120.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:13:29] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "Happy to see this code gone. I originally introduced it some time ago, and have tried to remove it a few times already. It was never the r" [puppet] - 10https://gerrit.wikimedia.org/r/826536 (owner: 10Majavah)
[11:14:29] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs (032 comments) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114 (owner: 10Arturo Borrero Gonzalez)
[11:14:41] <wikibugs>	 (03CR) 10Phedenskog: [C: 04-1] "I don't have privileges to abandon this, but we should since we will not use WebPageTest + we wouldn't use the non open source version on " [puppet] - 10https://gerrit.wikimedia.org/r/633202 (https://phabricator.wikimedia.org/T262962) (owner: 10Dave Pifke)
[11:15:26] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1 C: 03+2] opentelemetry-collector: Basic install [puppet] - 10https://gerrit.wikimedia.org/r/856931 (owner: 10Clément Goubert)
[11:16:59] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2032.codfw.wmnet to cluster codfw and group B
[11:20:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41282 and previous config saved to /var/cache/conftool/dbconfig/20221128-112003-marostegui.json
[11:20:05] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:20:11] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[11:20:19] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[11:20:23] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[11:20:47] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[11:20:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41283 and previous config saved to /var/cache/conftool/dbconfig/20221128-112053-marostegui.json
[11:21:54] <wikibugs>	 (03PS1) 10Clément Goubert: opentelemetry::collector: Fix service ensure [puppet] - 10https://gerrit.wikimedia.org/r/861362 (https://phabricator.wikimedia.org/T320565)
[11:22:41] <wikibugs>	 (03PS2) 10Clément Goubert: opentelemetry::collector: Fix service ensure [puppet] - 10https://gerrit.wikimedia.org/r/861362 (https://phabricator.wikimedia.org/T320565)
[11:23:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41284 and previous config saved to /var/cache/conftool/dbconfig/20221128-112302-marostegui.json
[11:23:41] <wikibugs>	 (CR) Clément Goubert: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38457/console" [puppet] - https://gerrit.wikimedia.org/r/861362 (https://phabricator.wikimedia.org/T320565) (owner: Clément Goubert)
[11:25:03] <wikibugs>	 (CR) Clément Goubert: [V: +1 C: +2] opentelemetry::collector: Fix service ensure [puppet] - https://gerrit.wikimedia.org/r/861362 (https://phabricator.wikimedia.org/T320565) (owner: Clément Goubert)
[11:26:57] <wikibugs>	 (CR) Muehlenhoff: [C: +2] zookeeper: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860907 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[11:29:54] <wikibugs>	 (PS2) Muehlenhoff: ceph: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860908 (https://phabricator.wikimedia.org/T308013)
[11:29:56] <wikibugs>	 (PS1) Clément Goubert: opentelemetry::collector: Fix config template [puppet] - https://gerrit.wikimedia.org/r/861364 (https://phabricator.wikimedia.org/T320565)
[11:30:41] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
[11:30:50] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1043.eqiad.wmnet with OS bu...
[11:30:59] <wikibugs>	 (CR) FNegri: [C: +1] "I think if it works on toolsbeta-harbor-1 it's good enough for now, and we'll probably migrate this to k8s sooner or later." [puppet] - https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: David Caro)
[11:31:09] <wikibugs>	 (CR) Clément Goubert: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38458/console" [puppet] - https://gerrit.wikimedia.org/r/861364 (https://phabricator.wikimedia.org/T320565) (owner: Clément Goubert)
[11:31:36] <wikibugs>	 (CR) Clément Goubert: [V: +1 C: +2] opentelemetry::collector: Fix config template [puppet] - https://gerrit.wikimedia.org/r/861364 (https://phabricator.wikimedia.org/T320565) (owner: Clément Goubert)
[11:32:57] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:33:16] <wikibugs>	 (CR) Muehlenhoff: [C: +2] ceph: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860908 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[11:38:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P41285 and previous config saved to /var/cache/conftool/dbconfig/20221128-113809-marostegui.json
[11:51:01] <wikibugs>	 (CR) Muehlenhoff: [C: +2] buster updates [puppet] - https://gerrit.wikimedia.org/r/861351 (owner: Muehlenhoff)
[11:51:11] <wikibugs>	 (PS1) Hnowlan: thumbor: new release [deployment-charts] - https://gerrit.wikimedia.org/r/861367 (https://phabricator.wikimedia.org/T323775)
[11:53:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P41286 and previous config saved to /var/cache/conftool/dbconfig/20221128-115316-marostegui.json
[11:55:50] <wikibugs>	 (PS1) Stevemunene: Add an-presto1006 to presto cluster [puppet] - https://gerrit.wikimedia.org/r/861368 (https://phabricator.wikimedia.org/T323783)
[11:57:27] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[11:59:23] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[12:07:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
[12:07:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2105.codfw.wmnet with reason: Maintenance
[12:07:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2105 (T323827)', diff saved to https://phabricator.wikimedia.org/P41287 and previous config saved to /var/cache/conftool/dbconfig/20221128-120727-ladsgroup.json
[12:07:33] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[12:08:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41288 and previous config saved to /var/cache/conftool/dbconfig/20221128-120822-marostegui.json
[12:08:24] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[12:08:28] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[12:08:37] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[12:08:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T321126)', diff saved to https://phabricator.wikimedia.org/P41289 and previous config saved to /var/cache/conftool/dbconfig/20221128-120843-marostegui.json
[12:09:28] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] similar-users: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860706 (owner: Giuseppe Lavagetto)
[12:10:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T321126)', diff saved to https://phabricator.wikimedia.org/P41290 and previous config saved to /var/cache/conftool/dbconfig/20221128-121052-marostegui.json
[12:13:41] <wikibugs>	 (CR) Hnowlan: [C: +2] thumbor: new release [deployment-charts] - https://gerrit.wikimedia.org/r/861367 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[12:14:07] <wikibugs>	 (Merged) jenkins-bot: similar-users: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860706 (owner: Giuseppe Lavagetto)
[12:14:39] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] termbox: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860708 (owner: Giuseppe Lavagetto)
[12:17:15] <icinga-wm>	 RECOVERY - SSH on mw1326.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:18:02] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[12:18:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[12:18:21] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/similar-users: apply
[12:18:22] <wikibugs>	 (Merged) jenkins-bot: thumbor: new release [deployment-charts] - https://gerrit.wikimedia.org/r/861367 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[12:18:29] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/similar-users: apply
[12:18:40] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2097.codfw.wmnet with reason: Maintenance
[12:18:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2097.codfw.wmnet with reason: Maintenance
[12:19:47] <wikibugs>	 (Merged) jenkins-bot: termbox: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860708 (owner: Giuseppe Lavagetto)
[12:20:44] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/similar-users: apply
[12:21:19] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] START helmfile.d/services/thumbor: sync
[12:22:11] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] DONE helmfile.d/services/thumbor: sync
[12:22:28] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/similar-users: apply
[12:26:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P41291 and previous config saved to /var/cache/conftool/dbconfig/20221128-122559-marostegui.json
[12:30:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[12:31:01] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[12:31:46] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
[12:32:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
[12:32:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2109 (T323907)', diff saved to https://phabricator.wikimedia.org/P41292 and previous config saved to /var/cache/conftool/dbconfig/20221128-123206-ladsgroup.json
[12:32:12] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[12:32:31] <wikibugs>	 (PS1) Hnowlan: thumbor: Correct paths for 3d2png and tinyrgb [deployment-charts] - https://gerrit.wikimedia.org/r/861383 (https://phabricator.wikimedia.org/T323775)
[12:32:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[12:32:45] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1105.eqiad.wmnet with reason: Maintenance
[12:32:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
[12:32:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1105:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41293 and previous config saved to /var/cache/conftool/dbconfig/20221128-123251-ladsgroup.json
[12:33:01] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[12:33:05] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2104.codfw.wmnet with reason: Maintenance
[12:33:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P41294 and previous config saved to /var/cache/conftool/dbconfig/20221128-123312-ladsgroup.json
[12:33:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2104 (T323827)', diff saved to https://phabricator.wikimedia.org/P41295 and previous config saved to /var/cache/conftool/dbconfig/20221128-123317-ladsgroup.json
[12:33:21] <wikibugs>	 (CR) David Caro: harbor: ensure that it's started (1 comment) [puppet] - https://gerrit.wikimedia.org/r/860896 (https://phabricator.wikimedia.org/T267616) (owner: David Caro)
[12:35:46] <wikibugs>	 SRE, SRE-OnFire (FY2021/2022-Q3), Data-Engineering, Event-Platform Value Stream, and 2 others: Incident: 2022-03-4 Banner sampling leading to a relatively wide site outage (mostly esams) - https://phabricator.wikimedia.org/T303036 (lmata) Open→Resolved a:lmata Thank you @BTullis for T...
[12:36:53] <wikibugs>	 (CR) David Caro: [C: +1] "The blocker is gone, thanks!" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114 (owner: Arturo Borrero Gonzalez)
[12:37:11] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/similar-users: apply
[12:37:50] <wikibugs>	 (CR) Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114 (owner: Arturo Borrero Gonzalez)
[12:38:44] <wikibugs>	 (PS23) Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114
[12:38:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105 (T323827)', diff saved to https://phabricator.wikimedia.org/P41296 and previous config saved to /var/cache/conftool/dbconfig/20221128-123845-ladsgroup.json
[12:38:52] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[12:38:55] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:38:58] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
[12:39:07] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:39:49] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:40:19] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:40:29] <wikibugs>	 (PS24) Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114
[12:40:37] <icinga-wm>	 PROBLEM - Router interfaces on cr2-drmrs is CRITICAL: CRITICAL: host 185.15.58.129, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:40:55] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
[12:41:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P41297 and previous config saved to /var/cache/conftool/dbconfig/20221128-124105-marostegui.json
[12:41:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2104 (T323827)', diff saved to https://phabricator.wikimedia.org/P41298 and previous config saved to /var/cache/conftool/dbconfig/20221128-124125-ladsgroup.json
[12:44:13] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
[12:44:22] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
[12:45:21] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
[12:45:34] <wikibugs>	 SRE, Thumbor, Thumbor Migration, serviceops, and 2 others: tinyrgb is distributed via puppet - https://phabricator.wikimedia.org/T323775 (MoritzMuehlenhoff) There's also a fourth option that comes to my mind:  Debian already ships various ICC profiles, in two separate packages: https://tracker.de...
[12:46:35] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
[12:47:10] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
[12:50:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41299 and previous config saved to /var/cache/conftool/dbconfig/20221128-125056-ladsgroup.json
[12:51:03] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[12:51:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[12:51:37] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[12:51:39] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:51:44] <wikibugs>	 (PS1) Slyngshede: ldap:management rewrite modify-mfa to use Bitu. [puppet] - https://gerrit.wikimedia.org/r/861385
[12:51:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[12:51:55] <wikibugs>	 (CR) Muehlenhoff: C:ldap::client::utils Rewrite add-ldap-group (1 comment) [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[12:52:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T323907)', diff saved to https://phabricator.wikimedia.org/P41300 and previous config saved to /var/cache/conftool/dbconfig/20221128-125200-ladsgroup.json
[12:52:07] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[12:53:52] <wikibugs>	 (CR) CI reject: [V: -1] ldap:management rewrite modify-mfa to use Bitu. [puppet] - https://gerrit.wikimedia.org/r/861385 (owner: Slyngshede)
[12:53:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P41301 and previous config saved to /var/cache/conftool/dbconfig/20221128-125351-ladsgroup.json
[12:54:39] <icinga-wm>	 RECOVERY - Router interfaces on cr2-drmrs is OK: OK: host 185.15.58.129, interfaces up: 61, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:55:26] <wikibugs>	 (PS6) Slyngshede: C:ldap::client::utils Rewrite add-ldap-group [puppet] - https://gerrit.wikimedia.org/r/860568
[12:56:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T321126)', diff saved to https://phabricator.wikimedia.org/P41302 and previous config saved to /var/cache/conftool/dbconfig/20221128-125612-marostegui.json
[12:56:14] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[12:56:19] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[12:56:24] <wikibugs>	 (CR) Slyngshede: C:ldap::client::utils Rewrite add-ldap-group (1 comment) [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[12:56:28] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[12:56:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P41303 and previous config saved to /var/cache/conftool/dbconfig/20221128-125632-ladsgroup.json
[12:59:35] <icinga-wm>	 PROBLEM - OSPF status on cr2-drmrs is CRITICAL: OSPFv2: 2/4 UP : OSPFv3: 2/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:00:36] <wikibugs>	 (PS25) Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114
[13:04:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T323907)', diff saved to https://phabricator.wikimedia.org/P41304 and previous config saved to /var/cache/conftool/dbconfig/20221128-130443-ladsgroup.json
[13:04:50] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[13:04:55] <icinga-wm>	 RECOVERY - SSH on mw1320.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:06:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P41305 and previous config saved to /var/cache/conftool/dbconfig/20221128-130603-ladsgroup.json
[13:06:22] <wikibugs>	 (PS2) Slyngshede: ldap:management rewrite modify-mfa to use Bitu. [puppet] - https://gerrit.wikimedia.org/r/861385
[13:06:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T321126)', diff saved to https://phabricator.wikimedia.org/P41306 and previous config saved to /var/cache/conftool/dbconfig/20221128-130642-marostegui.json
[13:06:49] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[13:08:36] <wikibugs>	 (CR) CI reject: [V: -1] ldap:management rewrite modify-mfa to use Bitu. [puppet] - https://gerrit.wikimedia.org/r/861385 (owner: Slyngshede)
[13:08:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P41307 and previous config saved to /var/cache/conftool/dbconfig/20221128-130858-ladsgroup.json
[13:09:43] <wikibugs>	 (PS3) Slyngshede: ldap:management rewrite modify-mfa to use Bitu. [puppet] - https://gerrit.wikimedia.org/r/861385
[13:10:03] <wikibugs>	 (CR) Muehlenhoff: C:ldap::client::utils Rewrite add-ldap-group (1 comment) [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[13:11:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P41308 and previous config saved to /var/cache/conftool/dbconfig/20221128-131138-ladsgroup.json
[13:14:05] <icinga-wm>	 RECOVERY - SSH on db1120.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:14:29] <wikibugs>	 (PS7) Slyngshede: C:ldap::client::utils Rewrite add-ldap-group [puppet] - https://gerrit.wikimedia.org/r/860568
[13:16:27] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[13:16:34] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[13:18:41] <godog>	 !log upgrade thanos on thanos-fe2001 - T303154
[13:18:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:18:47] <stashbot>	 T303154: Upgrade Thanos to latest version - https://phabricator.wikimedia.org/T303154
[13:19:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P41309 and previous config saved to /var/cache/conftool/dbconfig/20221128-131949-ladsgroup.json
[13:20:12] <moritzm>	 !log rebalance Ganeti group B/codfw following reboots
[13:20:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P41310 and previous config saved to /var/cache/conftool/dbconfig/20221128-132109-ladsgroup.json
[13:21:12] <wikibugs>	 SRE, ops-codfw, DC-Ops, Infrastructure-Foundations: Q1:rack/setup/install ganeti203[12] - https://phabricator.wikimedia.org/T313856 (MoritzMuehlenhoff)
[13:21:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P41311 and previous config saved to /var/cache/conftool/dbconfig/20221128-132149-marostegui.json
[13:21:51] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: ganeti203[12] implementation tracking - https://phabricator.wikimedia.org/T313857 (MoritzMuehlenhoff) Open→Resolved ganeti2031 and ganeti2032 have been added to the codfw Ganeti cluster.
[13:21:55] <godog>	 !log upgrade thanos on thanos-fe2* - T303154
[13:22:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105 (T323827)', diff saved to https://phabricator.wikimedia.org/P41312 and previous config saved to /var/cache/conftool/dbconfig/20221128-132404-ladsgroup.json
[13:24:07] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2109.codfw.wmnet with reason: Maintenance
[13:24:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2109.codfw.wmnet with reason: Maintenance
[13:24:13] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[13:24:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2109 (T323827)', diff saved to https://phabricator.wikimedia.org/P41313 and previous config saved to /var/cache/conftool/dbconfig/20221128-132415-ladsgroup.json
[13:24:52] <godog>	 !log upgrade thanos on prometheus2* - T303154
[13:24:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:58] <stashbot>	 T303154: Upgrade Thanos to latest version - https://phabricator.wikimedia.org/T303154
[13:26:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2104 (T323827)', diff saved to https://phabricator.wikimedia.org/P41314 and previous config saved to /var/cache/conftool/dbconfig/20221128-132645-ladsgroup.json
[13:26:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2125.codfw.wmnet with reason: Maintenance
[13:27:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2125.codfw.wmnet with reason: Maintenance
[13:27:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2125 (T323827)', diff saved to https://phabricator.wikimedia.org/P41315 and previous config saved to /var/cache/conftool/dbconfig/20221128-132706-ladsgroup.json
[13:27:18] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[13:27:23] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[13:27:44] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[13:27:51] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[13:31:07] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[13:32:04] <logmsgbot>	 !log filippo@cumin1001 conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=eqiad
[13:34:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P41316 and previous config saved to /var/cache/conftool/dbconfig/20221128-133456-ladsgroup.json
[13:36:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41317 and previous config saved to /var/cache/conftool/dbconfig/20221128-133615-ladsgroup.json
[13:36:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
[13:36:22] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[13:36:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1122.eqiad.wmnet with reason: Maintenance
[13:36:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1122 (T323827)', diff saved to https://phabricator.wikimedia.org/P41318 and previous config saved to /var/cache/conftool/dbconfig/20221128-133648-ladsgroup.json
[13:36:51] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/860909 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[13:36:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P41319 and previous config saved to /var/cache/conftool/dbconfig/20221128-133655-marostegui.json
[13:38:30] <wikibugs>	 (CR) Btullis: "Can you do a PCC run please, before we merge this?" [puppet] - https://gerrit.wikimedia.org/r/861368 (https://phabricator.wikimedia.org/T323783) (owner: Stevemunene)
[13:40:43] <wikibugs>	 SRE, SRE-OnFire, Product-Infrastructure-Team-Backlog, Maps (Kartotherian), Sustainability (Incident Followup): Kartotherian/Maps outage followups, 2020-10-29 - https://phabricator.wikimedia.org/T266807 (lmata)  @Marostegui: Thank you for following up, I missed your earlier ping.  Reading  T26...
[13:41:22] <wikibugs>	 (CR) Klausman: Add basic rate-limit capabilities to ML clusters (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/860925 (https://phabricator.wikimedia.org/T300259) (owner: Elukey)
[13:42:39] <wikibugs>	 (CR) Klausman: [C: +1] knative: import new upstream version 1.7.2 [docker-images/production-images] - https://gerrit.wikimedia.org/r/861349 (https://phabricator.wikimedia.org/T323793) (owner: Elukey)
[13:45:56] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[13:46:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2125 (T323827)', diff saved to https://phabricator.wikimedia.org/P41320 and previous config saved to /var/cache/conftool/dbconfig/20221128-134635-ladsgroup.json
[13:46:41] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[13:47:38] <godog>	 !log restart grafana-server on grafana1002
[13:47:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:10] <godog>	 sorry for the brief disruption ^
[13:49:22] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[13:50:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T323907)', diff saved to https://phabricator.wikimedia.org/P41321 and previous config saved to /var/cache/conftool/dbconfig/20221128-135002-ladsgroup.json
[13:50:04] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[13:50:06] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[13:50:08] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[13:51:34] <moritzm>	 !log rebalance Ganeti group C/eqiad T311687
[13:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:40] <stashbot>	 T311687: Upgrade ganeti/eqiad to Bullseye - https://phabricator.wikimedia.org/T311687
[13:52:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T321126)', diff saved to https://phabricator.wikimedia.org/P41322 and previous config saved to /var/cache/conftool/dbconfig/20221128-135202-marostegui.json
[13:52:04] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[13:52:08] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[13:52:17] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[13:52:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1149 (T321126)', diff saved to https://phabricator.wikimedia.org/P41323 and previous config saved to /var/cache/conftool/dbconfig/20221128-135223-marostegui.json
[13:53:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109 (T323827)', diff saved to https://phabricator.wikimedia.org/P41324 and previous config saved to /var/cache/conftool/dbconfig/20221128-135349-ladsgroup.json
[13:53:56] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[13:54:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T321126)', diff saved to https://phabricator.wikimedia.org/P41325 and previous config saved to /var/cache/conftool/dbconfig/20221128-135433-marostegui.json
[14:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, and awight: #bothumor I � Unicode. All rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T1400).
[14:00:04] <jouncebot>	 cirno: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:01:15] <Lucas_WMDE>	 I’m still having lunch, if nobody else is around I can deploy later in the window (cirno feel free to ping me in, idk, 30 minutes?)
[14:01:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P41326 and previous config saved to /var/cache/conftool/dbconfig/20221128-140141-ladsgroup.json
[14:04:13] <wikibugs>	 (CR) Jbond: "LGTM comments inline" [puppet] - https://gerrit.wikimedia.org/r/861385 (owner: Slyngshede)
[14:06:16] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2050.codfw.wmnet with OS bullseye
[14:06:23] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[14:07:34] <wikibugs>	 (PS2) Jaime Nuche: create group for Release Engineering members [puppet] - https://gerrit.wikimedia.org/r/860836
[14:07:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:08:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P41327 and previous config saved to /var/cache/conftool/dbconfig/20221128-140855-ladsgroup.json
[14:09:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P41328 and previous config saved to /var/cache/conftool/dbconfig/20221128-140939-marostegui.json
[14:09:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[14:10:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[14:10:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1157 (T323907)', diff saved to https://phabricator.wikimedia.org/P41329 and previous config saved to /var/cache/conftool/dbconfig/20221128-141016-ladsgroup.json
[14:10:23] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[14:10:29] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmf for abartov - https://phabricator.wikimedia.org/T323911 (Marostegui) p:Triage→Medium @Asaf from what I can see you are already part of the wmf LDAP group. Not sure if you need something else apart - @Ottomata is there anything else required to access...
[14:10:51] <cirno>	 o/
[14:11:04] <cirno>	 Lucas_WMDE: sorry I missed the ping
[14:12:05] <wikibugs>	 (CR) Muehlenhoff: ldap:management rewrite modify-mfa to use Bitu. (6 comments) [puppet] - https://gerrit.wikimedia.org/r/861385 (owner: Slyngshede)
[14:12:13] <wikibugs>	 (PS2) Matthias Mullie: Add mediawiki.searchpreview schema [mediawiki-config] - https://gerrit.wikimedia.org/r/845518 (https://phabricator.wikimedia.org/T321069)
[14:12:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:13:05] <wikibugs>	 (PS3) Matthias Mullie: Add mediawiki.searchpreview schema [mediawiki-config] - https://gerrit.wikimedia.org/r/845518 (https://phabricator.wikimedia.org/T321069)
[14:15:51] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Multiple RAID battery failures on hadoop worker hosts - https://phabricator.wikimedia.org/T318659 (Jclark-ctr) @BTullis  we just received batteries.  When would work best for you I would like to do them this week if...
[14:16:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P41330 and previous config saved to /var/cache/conftool/dbconfig/20221128-141648-ladsgroup.json
[14:19:09] <wikibugs>	 (CR) Elukey: Add basic rate-limit capabilities to ML clusters (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/860925 (https://phabricator.wikimedia.org/T300259) (owner: Elukey)
[14:19:48] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Multiple RAID battery failures on hadoop worker hosts - https://phabricator.wikimedia.org/T318659 (BTullis) >>! In T318659#8424520, @Jclark-ctr wrote: > @BTullis  we just received batteries.  When would work best for...
[14:20:10] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmf for abartov - https://phabricator.wikimedia.org/T323911 (MoritzMuehlenhoff) If Asaf needs Superset access to private tables he needs to be added to the analytics-privatedata-users group, but without an SSH key, see https://wikitech.wikimedia.org/wiki/Analyti...
[14:20:48] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Multiple RAID battery failures on hadoop worker hosts - https://phabricator.wikimedia.org/T318659 (Jclark-ctr) Yea I am on site right now Let me know when they are ready for me
[14:21:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T323907)', diff saved to https://phabricator.wikimedia.org/P41331 and previous config saved to /var/cache/conftool/dbconfig/20221128-142107-ladsgroup.json
[14:21:14] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[14:22:11] * Lucas_WMDE back
[14:24:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P41332 and previous config saved to /var/cache/conftool/dbconfig/20221128-142402-ladsgroup.json
[14:24:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P41333 and previous config saved to /var/cache/conftool/dbconfig/20221128-142446-marostegui.json
[14:25:38] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/860974 (https://phabricator.wikimedia.org/T323734) (owner: Stang)
[14:25:45] <Lucas_WMDE>	 cirno: ^
[14:25:49] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[14:25:58] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[14:26:00] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmf for abartov - https://phabricator.wikimedia.org/T323911 (Marostegui) @Ottomata and @odimitrijevic could you approve the access to `analytics-privatedata-users` @Asaf could you get your manager (Simona, per namely) to approve this too? I don't see them on ph...
[14:26:23] <wikibugs>	 (Merged) jenkins-bot: wikidatawiki: Add ne language logo variant [mediawiki-config] - https://gerrit.wikimedia.org/r/860974 (https://phabricator.wikimedia.org/T323734) (owner: Stang)
[14:26:27] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[14:26:36] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:860974|wikidatawiki: Add ne language logo variant (T323734)]]
[14:26:42] <stashbot>	 T323734: Move language-specific logos from Commons.css to logos.php at wikidatawiki - https://phabricator.wikimedia.org/T323734
[14:27:36] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and stang: Backport for [[gerrit:860974|wikidatawiki: Add ne language logo variant (T323734)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
[14:27:43] <Lucas_WMDE>	 cirno: please test
[14:28:03] <Lucas_WMDE>	 https://www.wikidata.org/?uselang=ne on mwdebug looks good to me (after a force-reload)
[14:28:09] <cirno>	 Lucas_WMDE: tested with ?uselang=ne and it looks good to me
[14:28:14] <Lucas_WMDE>	 yay, thanks
[14:28:20] <Lucas_WMDE>	 syncing
[14:28:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:29:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:29:54] <Lucas_WMDE>	 trwikimedia change also looks good to me on Gerrit, I’ll deploy that afterwards
[14:30:59] <wikibugs>	 (CR) Herron: [C: +1] hieradata: pool graphite1005 for reads [puppet] - https://gerrit.wikimedia.org/r/860522 (https://phabricator.wikimedia.org/T318903) (owner: Filippo Giunchedi)
[14:31:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2125 (T323827)', diff saved to https://phabricator.wikimedia.org/P41334 and previous config saved to /var/cache/conftool/dbconfig/20221128-143154-ladsgroup.json
[14:31:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2126.codfw.wmnet with reason: Maintenance
[14:32:01] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[14:32:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2126.codfw.wmnet with reason: Maintenance
[14:32:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on db2095.codfw.wmnet with reason: Maintenance
[14:32:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db2095.codfw.wmnet with reason: Maintenance
[14:32:25] <wikibugs>	 (PS2) Lucas Werkmeister (WMDE): trwikimedia: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/860975 (https://phabricator.wikimedia.org/T323850) (owner: Stang)
[14:32:28] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:860974|wikidatawiki: Add ne language logo variant (T323734)]] (duration: 05m 52s)
[14:32:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2126 (T323827)', diff saved to https://phabricator.wikimedia.org/P41335 and previous config saved to /var/cache/conftool/dbconfig/20221128-143231-ladsgroup.json
[14:32:35] <stashbot>	 T323734: Move language-specific logos from Commons.css to logos.php at wikidatawiki - https://phabricator.wikimedia.org/T323734
[14:33:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:33:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[14:33:32] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/860975 (https://phabricator.wikimedia.org/T323850) (owner: Stang)
[14:33:47] <Lucas_WMDE>	 now we’ll find out if `scap backport` automatically purges the PNGs or if I still need to do that manually
[14:33:48] <wikibugs>	 (CR) Ssingh: [V: +1 C: +2] hiera: unify ulsfo LVS configuration [puppet] - https://gerrit.wikimedia.org/r/860930 (https://phabricator.wikimedia.org/T317247) (owner: Ssingh)
[14:33:53] <Lucas_WMDE>	 (I suspect the latter, but I’m ready to be surprised ;) )
[14:34:16] <wikibugs>	 (Merged) jenkins-bot: trwikimedia: Update logo [mediawiki-config] - https://gerrit.wikimedia.org/r/860975 (https://phabricator.wikimedia.org/T323850) (owner: Stang)
[14:34:28] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:860975|trwikimedia: Update logo (T323850)]]
[14:34:34] <stashbot>	 T323850: Change the logo of Wikimedia Turkey on tr.wikimedia.org - https://phabricator.wikimedia.org/T323850
[14:35:22] <moritzm>	 !log rebalance Ganeti group D/eqiad T311687
[14:35:25] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and stang: Backport for [[gerrit:860975|trwikimedia: Update logo (T323850)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
[14:35:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:27] <stashbot>	 T311687: Upgrade ganeti/eqiad to Bullseye - https://phabricator.wikimedia.org/T311687
[14:35:38] <Lucas_WMDE>	 cirno: please test
[14:35:48] <Lucas_WMDE>	 (looks good on my end, I think)
[14:35:52] <cirno>	 Lucas_WMDE: looks good to me
[14:36:03] <Lucas_WMDE>	 syncing
[14:36:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P41336 and previous config saved to /var/cache/conftool/dbconfig/20221128-143613-ladsgroup.json
[14:36:24] <wikibugs>	 SRE, ops-eqiad, DBA, DC-Ops: Q3:rack/setup/install db1206 - https://phabricator.wikimedia.org/T322256 (Jclark-ctr) db1206  B1 U36 Port 26  Cableid 3285
[14:36:36] <wikibugs>	 SRE, ops-eqiad, DBA, DC-Ops: Q3:rack/setup/install db1206 - https://phabricator.wikimedia.org/T322256 (Jclark-ctr)
[14:36:51] <wikibugs>	 SRE, ops-eqiad, DBA, DC-Ops: Q3:rack/setup/install db1206 - https://phabricator.wikimedia.org/T322256 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[14:36:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[14:37:03] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[14:39:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109 (T323827)', diff saved to https://phabricator.wikimedia.org/P41337 and previous config saved to /var/cache/conftool/dbconfig/20221128-143908-ladsgroup.json
[14:39:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2139.codfw.wmnet with reason: Maintenance
[14:39:16] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[14:39:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2139.codfw.wmnet with reason: Maintenance
[14:39:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T321126)', diff saved to https://phabricator.wikimedia.org/P41338 and previous config saved to /var/cache/conftool/dbconfig/20221128-143952-marostegui.json
[14:39:53] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:860975|trwikimedia: Update logo (T323850)]] (duration: 05m 24s)
[14:39:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[14:39:58] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[14:40:04] <stashbot>	 T323850: Change the logo of Wikimedia Turkey on tr.wikimedia.org - https://phabricator.wikimedia.org/T323850
[14:40:08] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[14:40:10] <Lucas_WMDE>	 looks like it needs to be purged manually, one sec
[14:40:10] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1160.eqiad.wmnet with reason: Maintenance
[14:40:23] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1160.eqiad.wmnet with reason: Maintenance
[14:40:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1160 (T321126)', diff saved to https://phabricator.wikimedia.org/P41339 and previous config saved to /var/cache/conftool/dbconfig/20221128-144029-marostegui.json
[14:40:42] <wikibugs>	 (CR) FNegri: [C: +1] harbor: remove support for <bullseye [puppet] - https://gerrit.wikimedia.org/r/860623 (https://phabricator.wikimedia.org/T267616) (owner: David Caro)
[14:40:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2126 (T323827)', diff saved to https://phabricator.wikimedia.org/P41340 and previous config saved to /var/cache/conftool/dbconfig/20221128-144050-ladsgroup.json
[14:41:11] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/trwikimedia%s.png\n' '' '-1.5x' '-2x' | mwscript purgeList.php # T323850
[14:41:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:35] <Lucas_WMDE>	 anything else to deploy?
[14:41:57] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:42:21] <cirno>	 that's all from me
[14:42:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160 (T321126)', diff saved to https://phabricator.wikimedia.org/P41341 and previous config saved to /var/cache/conftool/dbconfig/20221128-144239-marostegui.json
[14:42:53] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[14:42:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:37] <wikibugs>	 (CR) David Caro: [C: +1] "👍" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114 (owner: Arturo Borrero Gonzalez)
[14:44:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122 (T323827)', diff saved to https://phabricator.wikimedia.org/P41342 and previous config saved to /var/cache/conftool/dbconfig/20221128-144435-ladsgroup.json
[14:44:42] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[14:44:46] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:44:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[14:45:45] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[14:48:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:51:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P41343 and previous config saved to /var/cache/conftool/dbconfig/20221128-145120-ladsgroup.json
[14:52:52] <wikibugs>	 (PS1) Elukey: WIP - Upgrade knative to 1.7.2 [deployment-charts] - https://gerrit.wikimedia.org/r/861395 (https://phabricator.wikimedia.org/T323793)
[14:53:38] <wikibugs>	 (CR) CI reject: [V: -1] WIP - Upgrade knative to 1.7.2 [deployment-charts] - https://gerrit.wikimedia.org/r/861395 (https://phabricator.wikimedia.org/T323793) (owner: Elukey)
[14:55:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P41344 and previous config saved to /var/cache/conftool/dbconfig/20221128-145556-ladsgroup.json
[14:57:12] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[14:57:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P41345 and previous config saved to /var/cache/conftool/dbconfig/20221128-145745-marostegui.json
[14:57:54] <wikibugs>	 (CR) Muehlenhoff: create group for Release Engineering members (1 comment) [puppet] - https://gerrit.wikimedia.org/r/860836 (owner: Jaime Nuche)
[14:58:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:59:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P41346 and previous config saved to /var/cache/conftool/dbconfig/20221128-145942-ladsgroup.json
[15:00:46] <wikibugs>	 (CR) Hnowlan: [C: +2] thumbor: Correct paths for 3d2png and tinyrgb [deployment-charts] - https://gerrit.wikimedia.org/r/861383 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[15:06:05] <wikibugs>	 (Merged) jenkins-bot: thumbor: Correct paths for 3d2png and tinyrgb [deployment-charts] - https://gerrit.wikimedia.org/r/861383 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[15:06:16] <jinxer-wm>	 (ThanosSidecarBucketOperationsFailed) firing: (3) Thanos Sidecar bucket operations are failing - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarBucketOperationsFailed
[15:06:20] <wikibugs>	 SRE, Epic: Encrypt all the things - https://phabricator.wikimedia.org/T111653 (LSobanski) Open→Resolved a:LSobanski The remaining two open action items are assigned to specific teams and the value of this task is limited so I'm resolving it.
[15:06:24] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2149.codfw.wmnet with reason: Maintenance
[15:06:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T323907)', diff saved to https://phabricator.wikimedia.org/P41347 and previous config saved to /var/cache/conftool/dbconfig/20221128-150626-ladsgroup.json
[15:06:28] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[15:06:35] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[15:06:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2149.codfw.wmnet with reason: Maintenance
[15:06:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[15:06:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2149 (T323827)', diff saved to https://phabricator.wikimedia.org/P41348 and previous config saved to /var/cache/conftool/dbconfig/20221128-150643-ladsgroup.json
[15:06:54] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[15:06:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T323907)', diff saved to https://phabricator.wikimedia.org/P41349 and previous config saved to /var/cache/conftool/dbconfig/20221128-150654-ladsgroup.json
[15:07:57] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[15:09:49] <wikibugs>	 SRE, Infrastructure-Foundations: Ferm should log errors when failing to create all configured rules - https://phabricator.wikimedia.org/T237020 (LSobanski)
[15:10:45] <wikibugs>	 SRE, SRE Observability: Important nagios-nrpe-server errors not showing up in unit journal - https://phabricator.wikimedia.org/T237236 (LSobanski)
[15:11:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P41350 and previous config saved to /var/cache/conftool/dbconfig/20221128-151103-ladsgroup.json
[15:11:16] <jinxer-wm>	 (ThanosSidecarBucketOperationsFailed) firing: (10) Thanos Sidecar bucket operations are failing - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarBucketOperationsFailed
[15:12:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P41351 and previous config saved to /var/cache/conftool/dbconfig/20221128-151252-marostegui.json
[15:12:58] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:13:06] <icinga-wm>	 PROBLEM - BGP status on cr2-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:13:24] <logmsgbot>	 !log jbond@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2050.codfw.wmnet with OS bullseye
[15:13:29] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[15:14:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P41352 and previous config saved to /var/cache/conftool/dbconfig/20221128-151448-ladsgroup.json
[15:15:44] <wikibugs>	 (PS1) Filippo Giunchedi: Add thanos-web.svc and discovery [dns] - https://gerrit.wikimedia.org/r/861396 (https://phabricator.wikimedia.org/T323913)
[15:16:15] <godog>	 looking into the thanos alert
[15:17:40] <wikibugs>	 SRE, Thumbor, Thumbor Migration, serviceops, and 2 others: tinyrgb is distributed via puppet - https://phabricator.wikimedia.org/T323775 (Joe) The most obvious thing to me is to include the file in the thumbor docker image. It's ok to have a small binary that doesn't change much in it.
[15:18:32] <wikibugs>	 (PS3) Giuseppe Lavagetto: thumbor: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860711
[15:19:24] <wikibugs>	 (PS1) Dbrant: Enable shared Reading Lists landing page on all wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269)
[15:23:05] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859114 (owner: Arturo Borrero Gonzalez)
[15:23:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:25:33] <wikibugs>	 SRE, Cloud-Services, wikitech.wikimedia.org: Determine whether wikitech should really depend on production search cluster - https://phabricator.wikimedia.org/T110987 (LSobanski) silver.wikimedia.org seems to be long gone and the arguments in the task so far don't make me feel strongly about setting u...
[15:25:48] <wikibugs>	 SRE, Cloud-Services, wikitech.wikimedia.org: Determine whether wikitech should really depend on production search cluster - https://phabricator.wikimedia.org/T110987 (LSobanski) Open→Resolved a:LSobanski
[15:26:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2126 (T323827)', diff saved to https://phabricator.wikimedia.org/P41353 and previous config saved to /var/cache/conftool/dbconfig/20221128-152609-ladsgroup.json
[15:26:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2138.codfw.wmnet with reason: Maintenance
[15:26:16] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[15:26:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2138.codfw.wmnet with reason: Maintenance
[15:26:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2138:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41354 and previous config saved to /var/cache/conftool/dbconfig/20221128-152631-ladsgroup.json
[15:27:27] <wikibugs>	 SRE, Traffic, affects-Kiwix-and-openZIM: HTTP 500 against api.php?action=parse API on tr.wikipedia.org - https://phabricator.wikimedia.org/T317011 (Kelson) The reported bug seems indeed to have "vanished". Thank you for the good work.
[15:27:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1160 (T321126)', diff saved to https://phabricator.wikimedia.org/P41355 and previous config saved to /var/cache/conftool/dbconfig/20221128-152758-marostegui.json
[15:28:00] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1190.eqiad.wmnet with reason: Maintenance
[15:28:05] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[15:28:14] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1190.eqiad.wmnet with reason: Maintenance
[15:28:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1190 (T321126)', diff saved to https://phabricator.wikimedia.org/P41356 and previous config saved to /var/cache/conftool/dbconfig/20221128-152820-marostegui.json
[15:28:44] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] thumbor: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860711 (owner: Giuseppe Lavagetto)
[15:29:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1122 (T323827)', diff saved to https://phabricator.wikimedia.org/P41357 and previous config saved to /var/cache/conftool/dbconfig/20221128-152955-ladsgroup.json
[15:29:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[15:30:10] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1129.eqiad.wmnet with reason: Maintenance
[15:30:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1129 (T323827)', diff saved to https://phabricator.wikimedia.org/P41358 and previous config saved to /var/cache/conftool/dbconfig/20221128-153016-ladsgroup.json
[15:30:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T321126)', diff saved to https://phabricator.wikimedia.org/P41359 and previous config saved to /var/cache/conftool/dbconfig/20221128-153029-marostegui.json
[15:32:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:33:45] <godog>	 !log revert back to thanos 0.21 - T303154
[15:33:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:51] <stashbot>	 T303154: Upgrade Thanos to latest version - https://phabricator.wikimedia.org/T303154
[15:34:07] <wikibugs>	 (Merged) jenkins-bot: thumbor: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/860711 (owner: Giuseppe Lavagetto)
[15:34:34] <logmsgbot>	 !log filippo@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=eqiad
[15:34:57] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity for Hghani - https://phabricator.wikimedia.org/T322145 (Ottomata) Hi, this sounds like an issue with your ssh config and your ssh key.  If your key is configured correctly, ssh should not prompt you for a passw...
[15:35:51] <wikibugs>	 (CR) MSantos: [C: +1] maps: remove Cassandra and Tilerator service [puppet] - https://gerrit.wikimedia.org/r/860634 (https://phabricator.wikimedia.org/T298246) (owner: Hnowlan)
[15:35:51] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Wenjun Fan - https://phabricator.wikimedia.org/T319056 (Ottomata) > Wenjun's access is ssh-less access to analytics-privatedata-users group, right? If so, to remove their public key from the task description Correct.
[15:36:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T323827)', diff saved to https://phabricator.wikimedia.org/P41360 and previous config saved to /var/cache/conftool/dbconfig/20221128-153628-ladsgroup.json
[15:36:35] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[15:37:21] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/thumbor: apply
[15:37:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:38:10] <wikibugs>	 (PS1) Elukey: knative-serving: improve chart's dependencies [deployment-charts] - https://gerrit.wikimedia.org/r/861399 (https://phabricator.wikimedia.org/T303279)
[15:38:23] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/thumbor: apply
[15:38:56] <wikibugs>	 (PS2) Elukey: knative-serving: improve chart's dependencies [deployment-charts] - https://gerrit.wikimedia.org/r/861399 (https://phabricator.wikimedia.org/T303279)
[15:38:58] <wikibugs>	 (CR) CI reject: [V: -1] knative-serving: improve chart's dependencies [deployment-charts] - https://gerrit.wikimedia.org/r/861399 (https://phabricator.wikimedia.org/T303279) (owner: Elukey)
[15:39:04] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/thumbor: apply
[15:39:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T323827)', diff saved to https://phabricator.wikimedia.org/P41361 and previous config saved to /var/cache/conftool/dbconfig/20221128-153916-ladsgroup.json
[15:39:32] <wikibugs>	 (PS1) Klausman: (WIP) API GW: add config for addtional LW inference services [deployment-charts] - https://gerrit.wikimedia.org/r/861401 (https://phabricator.wikimedia.org/T323916)
[15:41:01] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-cinder-backup-manager: allow for less frequent backups [puppet] - https://gerrit.wikimedia.org/r/858659 (https://phabricator.wikimedia.org/T306200) (owner: Andrew Bogott)
[15:41:13] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/thumbor: apply
[15:41:25] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/thumbor: apply
[15:42:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T323907)', diff saved to https://phabricator.wikimedia.org/P41362 and previous config saved to /var/cache/conftool/dbconfig/20221128-154234-ladsgroup.json
[15:42:41] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[15:43:49] <jinxer-wm>	 (ThanosSidecarBucketOperationsFailed) resolved: (10) Thanos Sidecar bucket operations are failing - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar - https://alerts.wikimedia.org/?q=alertname%3DThanosSidecarBucketOperationsFailed
[15:44:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41363 and previous config saved to /var/cache/conftool/dbconfig/20221128-154404-ladsgroup.json
[15:44:11] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[15:44:55] <wikibugs>	 (CR) Jbond: "lgtm some minor nits/comments inline" [puppet] - https://gerrit.wikimedia.org/r/860568 (owner: Slyngshede)
[15:45:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P41364 and previous config saved to /var/cache/conftool/dbconfig/20221128-154536-marostegui.json
[15:46:34] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmf for abartov - https://phabricator.wikimedia.org/T323911 (Ottomata) Approve!
[15:46:39] <wikibugs>	 (PS1) Muehlenhoff: Update partman config for maps [puppet] - https://gerrit.wikimedia.org/r/861405
[15:50:54] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:51:02] <wikibugs>	 ops-codfw, serviceops: codfw: ManagementSSHDown for ores2009 and thumbor2004 - https://phabricator.wikimedia.org/T323925 (Papaul)
[15:51:14] <wikibugs>	 ops-codfw, serviceops: codfw: ManagementSSHDown for ores2009 and thumbor2004 - https://phabricator.wikimedia.org/T323925 (Papaul) p:Triage→High
[15:51:30] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:51:34] <wikibugs>	 SRE, ops-codfw: Degraded RAID on ganeti2013 - https://phabricator.wikimedia.org/T323222 (Papaul) p:Triage→Medium
[15:51:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P41365 and previous config saved to /var/cache/conftool/dbconfig/20221128-155135-ladsgroup.json
[15:52:46] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[15:52:54] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48976 bytes in 9.029 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:52:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:53:22] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.237 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[15:53:34] <wikibugs>	 SRE, ops-codfw: Degraded RAID on ganeti2013 - https://phabricator.wikimedia.org/T323222 (Papaul) @MoritzMuehlenhoff unfortunately this server is out of warranty.
[15:53:54] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[15:54:01] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[15:54:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P41366 and previous config saved to /var/cache/conftool/dbconfig/20221128-155423-ladsgroup.json
[15:56:59] <wikibugs>	 (PS1) Muehlenhoff: Set role_contacts for failoid to SRE IF [puppet] - https://gerrit.wikimedia.org/r/861409
[15:57:41] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P41367 and previous config saved to /var/cache/conftool/dbconfig/20221128-155740-ladsgroup.json
[15:58:26] <wikibugs>	 (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/861409 (owner: Muehlenhoff)
[15:59:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P41368 and previous config saved to /var/cache/conftool/dbconfig/20221128-155910-ladsgroup.json
[15:59:12] <wikibugs>	 (PS1) Filippo Giunchedi: conftool: add thanos-web service [puppet] - https://gerrit.wikimedia.org/r/861411 (https://phabricator.wikimedia.org/T323913)
[15:59:18] <wikibugs>	 (PS1) Filippo Giunchedi: thanos: add thanos-web to catalog and frontend [puppet] - https://gerrit.wikimedia.org/r/861412 (https://phabricator.wikimedia.org/T323913)
[16:00:40] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[16:00:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P41369 and previous config saved to /var/cache/conftool/dbconfig/20221128-160042-marostegui.json
[16:00:45] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Dasm - https://phabricator.wikimedia.org/T322591 (Htriedman) @andrea.denisse that is correct! 2023-06-30 is the expiry date for @dasm
[16:00:47] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[16:01:08] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[16:01:19] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[16:01:50] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
[16:02:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:03:57] <wikibugs>	 (PS3) Elukey: knative-serving: improve chart's dependencies [deployment-charts] - https://gerrit.wikimedia.org/r/861399 (https://phabricator.wikimedia.org/T303279)
[16:04:48] <wikibugs>	 (PS1) PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/860591
[16:06:25] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[16:06:35] <wikibugs>	 SRE, ops-codfw, DBA: db2174 lost power - https://phabricator.wikimedia.org/T323512 (Papaul) I tested the HW on the server all looking good. The only error i had was error-code 2000-0251 which is not a big issue see link below for more information on error-code. I think the task can be closed. Thanks....
[16:06:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P41370 and previous config saved to /var/cache/conftool/dbconfig/20221128-160641-ladsgroup.json
[16:08:37] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Multiple RAID battery failures on hadoop worker hosts - https://phabricator.wikimedia.org/T318659 (Jclark-ctr)
[16:08:47] <wikibugs>	 SRE, ops-codfw, DBA: db2174 lost power - https://phabricator.wikimedia.org/T323512 (Marostegui) Thank you Papaul, I will get this host back to the load balancer and then close the task.
[16:09:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P41371 and previous config saved to /var/cache/conftool/dbconfig/20221128-160929-ladsgroup.json
[16:12:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P41372 and previous config saved to /var/cache/conftool/dbconfig/20221128-161247-ladsgroup.json
[16:12:52] <wikibugs>	 (PS2) Muehlenhoff: Set role_contacts for failoid to SRE IF [puppet] - https://gerrit.wikimedia.org/r/861409
[16:14:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P41373 and previous config saved to /var/cache/conftool/dbconfig/20221128-161417-ladsgroup.json
[16:15:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1190 (T321126)', diff saved to https://phabricator.wikimedia.org/P41374 and previous config saved to /var/cache/conftool/dbconfig/20221128-161549-marostegui.json
[16:15:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1199.eqiad.wmnet with reason: Maintenance
[16:16:04] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1199.eqiad.wmnet with reason: Maintenance
[16:16:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1199 (T321126)', diff saved to https://phabricator.wikimedia.org/P41375 and previous config saved to /var/cache/conftool/dbconfig/20221128-161610-marostegui.json
[16:16:56] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[16:18:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T321126)', diff saved to https://phabricator.wikimedia.org/P41376 and previous config saved to /var/cache/conftool/dbconfig/20221128-161820-marostegui.json
[16:19:02] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[16:21:02] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[16:21:40] <wikibugs>	 (PS1) Jbond: swift_disks: update for new partioning schema [puppet] - https://gerrit.wikimedia.org/r/861424
[16:21:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T323827)', diff saved to https://phabricator.wikimedia.org/P41377 and previous config saved to /var/cache/conftool/dbconfig/20221128-162148-ladsgroup.json
[16:21:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2156.codfw.wmnet with reason: Maintenance
[16:21:55] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[16:22:07] <wikibugs>	 (CR) Jbond: [C: +2] swift_disks: update for new partioning schema [puppet] - https://gerrit.wikimedia.org/r/861424 (owner: Jbond)
[16:22:25] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2156.codfw.wmnet with reason: Maintenance
[16:22:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on db2094.codfw.wmnet with reason: Maintenance
[16:22:32] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[16:22:33] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] swift_disks: update for new partioning schema [puppet] - https://gerrit.wikimedia.org/r/861424 (owner: Jbond)
[16:22:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db2094.codfw.wmnet with reason: Maintenance
[16:22:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2156 (T323827)', diff saved to https://phabricator.wikimedia.org/P41378 and previous config saved to /var/cache/conftool/dbconfig/20221128-162246-ladsgroup.json
[16:24:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1129 (T323827)', diff saved to https://phabricator.wikimedia.org/P41379 and previous config saved to /var/cache/conftool/dbconfig/20221128-162436-ladsgroup.json
[16:24:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[16:24:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1139.eqiad.wmnet with reason: Maintenance
[16:25:13] <logmsgbot>	 !log jbond@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2050.codfw.wmnet with OS bullseye
[16:25:20] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[16:26:17] <wikibugs>	 SRE-OnFire, Gerrit, serviceops-collab, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), and 2 others: gerrit1001 running out of space on / - https://phabricator.wikimedia.org/T323262 (LSobanski) a:LSobanski
[16:27:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T323907)', diff saved to https://phabricator.wikimedia.org/P41380 and previous config saved to /var/cache/conftool/dbconfig/20221128-162753-ladsgroup.json
[16:27:55] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[16:28:03] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[16:28:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[16:28:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T323907)', diff saved to https://phabricator.wikimedia.org/P41381 and previous config saved to /var/cache/conftool/dbconfig/20221128-162815-ladsgroup.json
[16:29:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41382 and previous config saved to /var/cache/conftool/dbconfig/20221128-162923-ladsgroup.json
[16:29:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2148.codfw.wmnet with reason: Maintenance
[16:29:30] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[16:29:39] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2148.codfw.wmnet with reason: Maintenance
[16:29:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2148 (T323827)', diff saved to https://phabricator.wikimedia.org/P41383 and previous config saved to /var/cache/conftool/dbconfig/20221128-162945-ladsgroup.json
[16:29:50] <wikibugs>	 (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/861426 (https://phabricator.wikimedia.org/T128546)
[16:30:04] <jouncebot>	 jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T1630).
[16:32:37] <wikibugs>	 (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/861426 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[16:33:19] <wikibugs>	 (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/861426 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[16:33:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P41384 and previous config saved to /var/cache/conftool/dbconfig/20221128-163328-marostegui.json
[16:34:20] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[16:34:27] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[16:37:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[16:37:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:38:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[16:38:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[16:38:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41385 and previous config saved to /var/cache/conftool/dbconfig/20221128-163850-ladsgroup.json
[16:39:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T323827)', diff saved to https://phabricator.wikimedia.org/P41386 and previous config saved to /var/cache/conftool/dbconfig/20221128-163859-ladsgroup.json
[16:39:32] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Set role_contacts for failoid to SRE IF [puppet] - https://gerrit.wikimedia.org/r/861409 (owner: Muehlenhoff)
[16:39:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[16:39:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[16:39:46] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[16:39:52] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:856611| Bumping portals to master (T128546)]] (duration: 04m 33s)
[16:40:17] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[16:43:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[16:44:21] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:856611| Bumping portals to master (T128546)]] (duration: 04m 28s)
[16:46:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T323827)', diff saved to https://phabricator.wikimedia.org/P41387 and previous config saved to /var/cache/conftool/dbconfig/20221128-164646-ladsgroup.json
[16:46:58] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[16:47:09] <wikibugs>	 (PS2) Jbond: install_server: migrate ms-bs_simple top GPT [puppet] - https://gerrit.wikimedia.org/r/860581 (https://phabricator.wikimedia.org/T308677)
[16:47:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:48:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P41388 and previous config saved to /var/cache/conftool/dbconfig/20221128-164834-marostegui.json
[16:48:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[16:52:48] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[16:52:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[16:53:51] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[16:53:57] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[16:54:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P41389 and previous config saved to /var/cache/conftool/dbconfig/20221128-165406-ladsgroup.json
[16:54:36] <wikibugs>	 (PS3) Jbond: install_server: migrate ms-bs_simple top GPT [puppet] - https://gerrit.wikimedia.org/r/860581 (https://phabricator.wikimedia.org/T308677)
[16:55:09] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[16:55:17] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[16:56:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[16:56:54] <wikibugs>	 (CR) Andrew Bogott: [C: +2] neutron.conf: remove allow_overlapping_ips config flag [puppet] - https://gerrit.wikimedia.org/r/858646 (https://phabricator.wikimedia.org/T323319) (owner: Andrew Bogott)
[16:56:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41390 and previous config saved to /var/cache/conftool/dbconfig/20221128-165654-ladsgroup.json
[16:57:01] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[16:57:26] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Set service_token_roles for services that use Keystone [puppet] - https://gerrit.wikimedia.org/r/858647 (https://phabricator.wikimedia.org/T323319) (owner: Andrew Bogott)
[16:57:39] <wikibugs>	 (PS8) Andrew Bogott: Set service_token_roles for services that use Keystone [puppet] - https://gerrit.wikimedia.org/r/858647 (https://phabricator.wikimedia.org/T323319)
[16:58:40] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:59:17] <wikibugs>	 (03PS2) 10Andrew Bogott: glance: use memcached for token caching [puppet] - 10https://gerrit.wikimedia.org/r/858651 (https://phabricator.wikimedia.org/T323319)
[17:01:01] <wikibugs>	 (03PS1) 10Marostegui: control-mariadb-client-10.4-bullseye: Back to 10.4.26 [software] - 10https://gerrit.wikimedia.org/r/861428 (https://phabricator.wikimedia.org/T323928)
[17:01:36] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] control-mariadb-client-10.4-bullseye: Back to 10.4.26 [software] - 10https://gerrit.wikimedia.org/r/861428 (https://phabricator.wikimedia.org/T323928) (owner: 10Marostegui)
[17:01:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P41391 and previous config saved to /var/cache/conftool/dbconfig/20221128-170153-ladsgroup.json
[17:02:06] <wikibugs>	 (03Merged) 10jenkins-bot: control-mariadb-client-10.4-bullseye: Back to 10.4.26 [software] - 10https://gerrit.wikimedia.org/r/861428 (https://phabricator.wikimedia.org/T323928) (owner: 10Marostegui)
[17:02:51] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] glance: use memcached for token caching [puppet] - 10https://gerrit.wikimedia.org/r/858651 (https://phabricator.wikimedia.org/T323319) (owner: 10Andrew Bogott)
[17:03:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1199 (T321126)', diff saved to https://phabricator.wikimedia.org/P41392 and previous config saved to /var/cache/conftool/dbconfig/20221128-170340-marostegui.json
[17:03:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[17:03:49] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[17:03:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[17:03:58] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2099.codfw.wmnet with reason: Maintenance
[17:04:12] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2099.codfw.wmnet with reason: Maintenance
[17:04:19] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2106.codfw.wmnet with reason: Maintenance
[17:04:32] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2106.codfw.wmnet with reason: Maintenance
[17:04:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2106 (T321126)', diff saved to https://phabricator.wikimedia.org/P41393 and previous config saved to /var/cache/conftool/dbconfig/20221128-170438-marostegui.json
[17:05:04] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Patch cinder volume_type api to allow non-uuid project ids. [puppet] - 10https://gerrit.wikimedia.org/r/857073 (https://phabricator.wikimedia.org/T301949) (owner: 10Andrew Bogott)
[17:06:00] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] trove: remove network_label_regex [puppet] - 10https://gerrit.wikimedia.org/r/858655 (https://phabricator.wikimedia.org/T323319) (owner: 10Andrew Bogott)
[17:06:12] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cinder.conf: lock_path to oslo_concurrency [puppet] - 10https://gerrit.wikimedia.org/r/858653 (https://phabricator.wikimedia.org/T323319) (owner: 10Andrew Bogott)
[17:06:38] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cinder: remove default quota settings [puppet] - 10https://gerrit.wikimedia.org/r/858654 (https://phabricator.wikimedia.org/T323319) (owner: 10Andrew Bogott)
[17:06:51] <wikibugs>	 (03PS2) 10Andrew Bogott: cinder.conf: lock_path to oslo_concurrency [puppet] - 10https://gerrit.wikimedia.org/r/858653 (https://phabricator.wikimedia.org/T323319)
[17:06:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2106 (T321126)', diff saved to https://phabricator.wikimedia.org/P41394 and previous config saved to /var/cache/conftool/dbconfig/20221128-170651-marostegui.json
[17:09:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P41395 and previous config saved to /var/cache/conftool/dbconfig/20221128-170912-ladsgroup.json
[17:09:52] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:12:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P41396 and previous config saved to /var/cache/conftool/dbconfig/20221128-171200-ladsgroup.json
[17:13:38] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[17:13:49] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 0:15:00 on mc-wf2001.codfw.wmnet with reason: Kernel upgrade
[17:14:03] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mc-wf2001.codfw.wmnet with reason: Kernel upgrade
[17:14:10] <logmsgbot>	 !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime for 0:15:00 on mc-wf2002.codfw.wmnet with reason: Kernel upgrade
[17:14:23] <logmsgbot>	 !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mc-wf2002.codfw.wmnet with reason: Kernel upgrade
[17:15:03] <wikibugs>	 (03PS2) 10Elukey: Add basic rate-limit capabilities to ML clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/860925 (https://phabricator.wikimedia.org/T300259)
[17:15:45] <wikibugs>	 (03PS1) 10Jbond: wmflib: update xfs partitions to 4/5 after conversion to GPT [puppet] - 10https://gerrit.wikimedia.org/r/861429
[17:15:47] <wikibugs>	 (03CR) 10Elukey: Add basic rate-limit capabilities to ML clusters (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/860925 (https://phabricator.wikimedia.org/T300259) (owner: 10Elukey)
[17:16:58] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] install_server: migrate ms-bs_simple top GPT [puppet] - 10https://gerrit.wikimedia.org/r/860581 (https://phabricator.wikimedia.org/T308677) (owner: 10Jbond)
[17:17:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P41397 and previous config saved to /var/cache/conftool/dbconfig/20221128-171659-ladsgroup.json
[17:17:07] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[17:17:56] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] wmflib: update xfs partitions to 4/5 after conversion to GPT [puppet] - 10https://gerrit.wikimedia.org/r/861429 (owner: 10Jbond)
[17:19:03] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] wmflib: update xfs partitions to 4/5 after conversion to GPT [puppet] - 10https://gerrit.wikimedia.org/r/861429 (owner: 10Jbond)
[17:19:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T323907)', diff saved to https://phabricator.wikimedia.org/P41398 and previous config saved to /var/cache/conftool/dbconfig/20221128-171911-ladsgroup.json
[17:19:20] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[17:20:50] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 80, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[17:20:52] <logmsgbot>	 !log jbond@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2050.codfw.wmnet with OS bullseye
[17:20:59] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[17:21:24] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48975 bytes in 0.358 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:21:25] <wikibugs>	 (03PS1) 10Andrew Bogott: cinder: update volume_type_access.py.patch to resemble upstream patch [puppet] - 10https://gerrit.wikimedia.org/r/861430 (https://phabricator.wikimedia.org/T301949)
[17:21:34] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[17:21:43] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[17:21:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P41399 and previous config saved to /var/cache/conftool/dbconfig/20221128-172157-marostegui.json
[17:22:12] <wikibugs>	 (03PS2) 10Andrew Bogott: trove: remove network_label_regex [puppet] - 10https://gerrit.wikimedia.org/r/858655 (https://phabricator.wikimedia.org/T323319)
[17:22:25] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cinder: update volume_type_access.py.patch to resemble upstream patch [puppet] - 10https://gerrit.wikimedia.org/r/861430 (https://phabricator.wikimedia.org/T301949) (owner: 10Andrew Bogott)
[17:22:33] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): 3d2png failing in Kubernetes - https://phabricator.wikimedia.org/T323936 (10hnowlan)
[17:22:47] <wikibugs>	 (03PS2) 10Andrew Bogott: cinder: remove default quota settings [puppet] - 10https://gerrit.wikimedia.org/r/858654 (https://phabricator.wikimedia.org/T323319)
[17:22:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:23:40] <wikibugs>	 (03PS1) 10Sohom Datta: Enable limited width on plwikisource MAIN namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861431 (https://phabricator.wikimedia.org/T323185)
[17:23:52] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.293 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:24:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T323827)', diff saved to https://phabricator.wikimedia.org/P41400 and previous config saved to /var/cache/conftool/dbconfig/20221128-172419-ladsgroup.json
[17:24:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2177.codfw.wmnet with reason: Maintenance
[17:24:28] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[17:24:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2177.codfw.wmnet with reason: Maintenance
[17:24:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2177 (T323827)', diff saved to https://phabricator.wikimedia.org/P41401 and previous config saved to /var/cache/conftool/dbconfig/20221128-172442-ladsgroup.json
[17:26:55] <wikibugs>	 (03CR) 10Sohom Datta: "I'll be free on Nov 30th/Dec 1st during the morning backport, but feel free to deploy before that as well if required 😊" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861431 (https://phabricator.wikimedia.org/T323185) (owner: 10Sohom Datta)
[17:27:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P41402 and previous config saved to /var/cache/conftool/dbconfig/20221128-172707-ladsgroup.json
[17:27:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:29:40] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: libs: openstack: fix host_list regex [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/861432
[17:31:35] <jnuche>	 jouncebot: noandnext
[17:31:45] <jnuche>	 jouncebot: nowandnext
[17:31:46] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 28 minute(s)
[17:31:46] <jouncebot>	 In 0 hour(s) and 28 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T1800)
[17:32:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T323827)', diff saved to https://phabricator.wikimedia.org/P41403 and previous config saved to /var/cache/conftool/dbconfig/20221128-173206-ladsgroup.json
[17:32:08] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2170.codfw.wmnet with reason: Maintenance
[17:32:15] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[17:32:21] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2170.codfw.wmnet with reason: Maintenance
[17:32:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2170:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41404 and previous config saved to /var/cache/conftool/dbconfig/20221128-173227-ladsgroup.json
[17:34:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P41405 and previous config saved to /var/cache/conftool/dbconfig/20221128-173418-ladsgroup.json
[17:35:26] <logmsgbot>	 !log jnuche@deploy1002 Installing scap version "4.29.2" for 558 hosts
[17:35:53] <logmsgbot>	 !log jnuche@deploy1002 Installation of scap version "4.29.2" completed for 558 hosts
[17:36:30] <wikibugs>	 10SRE, 10Wikibase Product Platform, 10Wikimedia-Apache-configuration, 10serviceops: Incorrect handling of ETags taking precedence over timestamps in conditional requests - https://phabricator.wikimedia.org/T320241 (10jijiki) @Silvan_WMDE sorry for not replying sooner, I will take a look at this when I find...
[17:36:45] <wikibugs>	 10SRE, 10Wikibase Product Platform, 10Wikimedia-Apache-configuration, 10serviceops: Incorrect handling of ETags taking precedence over timestamps in conditional requests - https://phabricator.wikimedia.org/T320241 (10jijiki) a:03jijiki
[17:37:02] <wikibugs>	 10SRE, 10Observability-Alerting: Important nagios-nrpe-server errors not showing up in unit journal - https://phabricator.wikimedia.org/T237236 (10lmata)
[17:37:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P41406 and previous config saved to /var/cache/conftool/dbconfig/20221128-173704-marostegui.json
[17:38:20] <wikibugs>	 (03CR) 10David Caro: "Got a question there" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/861432 (owner: 10Arturo Borrero Gonzalez)
[17:39:46] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[17:40:19] <wikibugs>	 (03PS1) 10Andrew Bogott: nova: don't specify AvailabilityZoneFilter [puppet] - 10https://gerrit.wikimedia.org/r/861433 (https://phabricator.wikimedia.org/T323319)
[17:42:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41407 and previous config saved to /var/cache/conftool/dbconfig/20221128-174213-ladsgroup.json
[17:42:16] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[17:42:21] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[17:42:53] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] nova: don't specify AvailabilityZoneFilter [puppet] - 10https://gerrit.wikimedia.org/r/861433 (https://phabricator.wikimedia.org/T323319) (owner: 10Andrew Bogott)
[17:43:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1156.eqiad.wmnet with reason: Maintenance
[17:43:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:43:11] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[17:43:18] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[17:43:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1156 (T323827)', diff saved to https://phabricator.wikimedia.org/P41408 and previous config saved to /var/cache/conftool/dbconfig/20221128-174324-ladsgroup.json
[17:43:26] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_dispersion_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:49:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P41409 and previous config saved to /var/cache/conftool/dbconfig/20221128-174925-ladsgroup.json
[17:49:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41410 and previous config saved to /var/cache/conftool/dbconfig/20221128-174951-ladsgroup.json
[17:49:58] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[17:50:47] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): 3d2png failing in Kubernetes - https://phabricator.wikimedia.org/T323936 (10hnowlan)
[17:52:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2106 (T321126)', diff saved to https://phabricator.wikimedia.org/P41411 and previous config saved to /var/cache/conftool/dbconfig/20221128-175210-marostegui.json
[17:52:13] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2110.codfw.wmnet with reason: Maintenance
[17:52:19] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[17:52:26] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2110.codfw.wmnet with reason: Maintenance
[17:52:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2110 (T321126)', diff saved to https://phabricator.wikimedia.org/P41412 and previous config saved to /var/cache/conftool/dbconfig/20221128-175232-marostegui.json
[17:54:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2110 (T321126)', diff saved to https://phabricator.wikimedia.org/P41413 and previous config saved to /var/cache/conftool/dbconfig/20221128-175445-marostegui.json
[17:54:57] <wikibugs>	 (03CR) 10Majavah: P:openstack: explicit rules for haproxy backend traffic POC (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/854875 (owner: 10Majavah)
[17:54:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T323827)', diff saved to https://phabricator.wikimedia.org/P41414 and previous config saved to /var/cache/conftool/dbconfig/20221128-175458-ladsgroup.json
[17:55:04] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[17:55:58] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:56:39] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: wmcs: libs: openstack: fix host_list regex (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/861432 (owner: 10Arturo Borrero Gonzalez)
[17:57:34] <wikibugs>	 (03CR) 10Jdlrobson: Enable shared Reading Lists landing page on all wikis. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269) (owner: 10Dbrant)
[17:59:56] <wikibugs>	 (03PS2) 10Elukey: knative: import new upstream version 1.7.2 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/861349 (https://phabricator.wikimedia.org/T323793)
[18:00:05] <jouncebot>	 ryankemper: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T1800).
[18:00:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T323827)', diff saved to https://phabricator.wikimedia.org/P41415 and previous config saved to /var/cache/conftool/dbconfig/20221128-180015-ladsgroup.json
[18:00:23] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[18:00:42] <wikibugs>	 (03CR) 10Klausman: [C: 03+1] knative-serving: improve chart's dependencies [deployment-charts] - 10https://gerrit.wikimedia.org/r/861399 (https://phabricator.wikimedia.org/T303279) (owner: 10Elukey)
[18:00:49] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2050.codfw.wmnet with OS bullseye
[18:00:56] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[18:01:22] <wikibugs>	 (03CR) 10Elukey: "Fixed a little issue with build dependency tracking and added two new docker images, related to new daemons that we'll need to run with 1." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/861349 (https://phabricator.wikimedia.org/T323793) (owner: 10Elukey)
[18:01:45] <wikibugs>	 (03PS2) 10Dbrant: Enable shared Reading Lists landing page on all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269)
[18:03:43] <wikibugs>	 (03CR) 10Dbrant: Enable shared Reading Lists landing page on all wikis. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269) (owner: 10Dbrant)
[18:04:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T323907)', diff saved to https://phabricator.wikimedia.org/P41417 and previous config saved to /var/cache/conftool/dbconfig/20221128-180431-ladsgroup.json
[18:04:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[18:04:38] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[18:04:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
[18:04:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1179 (T323907)', diff saved to https://phabricator.wikimedia.org/P41418 and previous config saved to /var/cache/conftool/dbconfig/20221128-180452-ladsgroup.json
[18:04:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P41419 and previous config saved to /var/cache/conftool/dbconfig/20221128-180458-ladsgroup.json
[18:05:47] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] Enable shared Reading Lists landing page on all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269) (owner: 10Dbrant)
[18:05:49] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1 C: 03+2] "thank you both :)" [puppet] - 10https://gerrit.wikimedia.org/r/852260 (owner: 10Dzahn)
[18:07:52] <wikibugs>	 (03CR) 10David Caro: wmcs: libs: openstack: fix host_list regex (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/861432 (owner: 10Arturo Borrero Gonzalez)
[18:08:32] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "I approve this for the files that I've authored here. Arturo is likely the author of anything that I'm not." [puppet] - 10https://gerrit.wikimedia.org/r/860903 (https://phabricator.wikimedia.org/T308013) (owner: 10Muehlenhoff)
[18:08:51] <wikibugs>	 (03CR) 10Dzahn: [V: 03+1 C: 03+2] "noop confirmed on clouddumps1002, dumpsdata1003" [puppet] - 10https://gerrit.wikimedia.org/r/852260 (owner: 10Dzahn)
[18:08:54] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: openstack: lib: ensure_canary: fix changelist calculation [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/861438
[18:08:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (PATCH events) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[18:09:07] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] openstack/codfw1dev: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/860903 (https://phabricator.wikimedia.org/T308013) (owner: 10Muehlenhoff)
[18:09:17] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] Retire obsolete cloudvirt Partman recipes [puppet] - 10https://gerrit.wikimedia.org/r/859431 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff)
[18:09:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P41420 and previous config saved to /var/cache/conftool/dbconfig/20221128-180951-marostegui.json
[18:10:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P41421 and previous config saved to /var/cache/conftool/dbconfig/20221128-181004-ladsgroup.json
[18:13:17] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] "patch looks good! Feel free to backport whenever is convenient!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/861431 (https://phabricator.wikimedia.org/T323185) (owner: 10Sohom Datta)
[18:15:15] <MatmaRex>	 anyone around who would like to check on a maintenance script for me? https://phabricator.wikimedia.org/T315510#8392683
[18:15:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P41423 and previous config saved to /var/cache/conftool/dbconfig/20221128-181522-ladsgroup.json
[18:15:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T323907)', diff saved to https://phabricator.wikimedia.org/P41424 and previous config saved to /var/cache/conftool/dbconfig/20221128-181541-ladsgroup.json
[18:15:49] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[18:16:51] <mutante>	 MatmaRex: i'll look
[18:17:49] <wikibugs>	 (03CR) 10Andrea Denisse: "Hello, the expiry date for the users' access is confirmed." [puppet] - 10https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591) (owner: 10Andrea Denisse)
[18:18:09] <wikibugs>	 (03CR) 10Andrea Denisse: [C: 03+2] admin: Add missing email for dpujol. [puppet] - 10https://gerrit.wikimedia.org/r/860945 (https://phabricator.wikimedia.org/T322670) (owner: 10Andrea Denisse)
[18:18:41] <mutante>	 MatmaRex: the log file in taavi's home says it started "afwikibooks" and then ends
[18:18:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (PATCH events) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[18:19:06] <mutante>	 "maintenance job" is normally an actual job running on actual mwmaint. this is a manually started command on the deployment server.
[18:19:31] <taavi>	 mutante: I realized that about a second after the !log, and moved it to mwmaint1002
[18:19:54] <mutante>	 taavi: ah, gotcha! :)
[18:19:59] <taavi>	 MatmaRex: it's in enwikinews now, says 'Processed 89300 (updated 32726) of 2829596 rows'
[18:19:59] <MatmaRex>	 mutante: that doesn't seem right, a few days ago folks told me it made it to commonswiki
[18:20:02] <MatmaRex>	 oh
[18:20:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P41425 and previous config saved to /var/cache/conftool/dbconfig/20221128-182004-ladsgroup.json
[18:20:14] <mutante>	 well, there you go then :)
[18:20:22] <MatmaRex>	 okay, thanks!
[18:20:26] <mutante>	 I did not know the "START" part either
[18:20:29] <wikibugs>	 (03PS1) 10Ssingh: cp5002, cp5007: decommission hosts (eqsin hardware refresh) [puppet] - 10https://gerrit.wikimedia.org/r/861439 (https://phabricator.wikimedia.org/T323830)
[18:20:31] <wikibugs>	 (03PS1) 10Ssingh: cp5003, cp5008: decommission hosts (eqsin hardware refresh) [puppet] - 10https://gerrit.wikimedia.org/r/861440 (https://phabricator.wikimedia.org/T323830)
[18:20:33] <wikibugs>	 (03PS1) 10Ssingh: cp5004, cp5009: decommission hosts (eqsin hardware refresh) [puppet] - 10https://gerrit.wikimedia.org/r/861441 (https://phabricator.wikimedia.org/T323830)
[18:20:35] <wikibugs>	 (03PS1) 10Ssingh: cp5005, cp5010: decommission hosts (eqsin hardware refresh) [puppet] - 10https://gerrit.wikimedia.org/r/861442 (https://phabricator.wikimedia.org/T323830)
[18:20:37] <wikibugs>	 (03PS1) 10Ssingh: cp5006: decommission host (eqsin hardware refresh) [puppet] - 10https://gerrit.wikimedia.org/r/861443 (https://phabricator.wikimedia.org/T323830)
[18:23:43] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): 3d2png failing in Kubernetes - https://phabricator.wikimedia.org/T323936 (10hnowlan)
[18:24:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P41426 and previous config saved to /var/cache/conftool/dbconfig/20221128-182458-marostegui.json
[18:25:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P41427 and previous config saved to /var/cache/conftool/dbconfig/20221128-182511-ladsgroup.json
[18:30:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P41428 and previous config saved to /var/cache/conftool/dbconfig/20221128-183028-ladsgroup.json
[18:30:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P41429 and previous config saved to /var/cache/conftool/dbconfig/20221128-183048-ladsgroup.json
[18:33:55] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Wenjun Fan - https://phabricator.wikimedia.org/T319056 (andrea.denisse)
[18:35:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41430 and previous config saved to /var/cache/conftool/dbconfig/20221128-183511-ladsgroup.json
[18:35:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2175.codfw.wmnet with reason: Maintenance
[18:35:18] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[18:35:26] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2175.codfw.wmnet with reason: Maintenance
[18:35:33] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2175 (T323827)', diff saved to https://phabricator.wikimedia.org/P41431 and previous config saved to /var/cache/conftool/dbconfig/20221128-183532-ladsgroup.json
[18:35:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[18:36:13] <wikibugs>	 (PS4) Jbond: convrt-ssds: update cookbook to reimage ms-be with new partition schema [cookbooks] - https://gerrit.wikimedia.org/r/859470 (https://phabricator.wikimedia.org/T308677)
[18:37:03] <wikibugs>	 (PS4) Jbond: swift: move ms-be2050 to new naming schema [puppet] - https://gerrit.wikimedia.org/r/859592 (https://phabricator.wikimedia.org/T308677)
[18:38:00] <wikibugs>	 (CR) Jbond: "This should be ready to go now.  i think it would be good to add this node back in and make sure everything works as expected before progr" [puppet] - https://gerrit.wikimedia.org/r/859592 (https://phabricator.wikimedia.org/T308677) (owner: Jbond)
[18:38:02] <wikibugs>	 (CR) CI reject: [V: -1] convrt-ssds: update cookbook to reimage ms-be with new partition schema [cookbooks] - https://gerrit.wikimedia.org/r/859470 (https://phabricator.wikimedia.org/T308677) (owner: Jbond)
[18:40:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2110 (T321126)', diff saved to https://phabricator.wikimedia.org/P41432 and previous config saved to /var/cache/conftool/dbconfig/20221128-184004-marostegui.json
[18:40:06] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2119.codfw.wmnet with reason: Maintenance
[18:40:11] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[18:40:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T323827)', diff saved to https://phabricator.wikimedia.org/P41433 and previous config saved to /var/cache/conftool/dbconfig/20221128-184017-ladsgroup.json
[18:40:19] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2119.codfw.wmnet with reason: Maintenance
[18:40:24] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[18:40:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2119 (T321126)', diff saved to https://phabricator.wikimedia.org/P41434 and previous config saved to /var/cache/conftool/dbconfig/20221128-184025-marostegui.json
[18:42:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2119 (T321126)', diff saved to https://phabricator.wikimedia.org/P41435 and previous config saved to /var/cache/conftool/dbconfig/20221128-184238-marostegui.json
[18:42:49] <icinga-wm>	 PROBLEM - Host db2101.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:43:21] <wikibugs>	 SRE, LDAP-Access-Requests, Security-Team: Add Kelton Hurd to wmf ldap group - https://phabricator.wikimedia.org/T323941 (sbassett)
[18:43:45] <logmsgbot>	 !log ebernhardson@deploy1002 Started deploy [wikimedia/discovery/analytics@276aa70]: relax slas for subgraph and incoming links
[18:45:09] <wikibugs>	 SRE, LDAP-Access-Requests, Security-Team: Add Kelton Hurd to wmf ldap group - https://phabricator.wikimedia.org/T323941 (sbassett) @KHurd-WMF - Please create a wikitech username and shell account via https://wikitech.wikimedia.org/w/index.php?title=Special:CreateAccount
[18:45:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T323827)', diff saved to https://phabricator.wikimedia.org/P41436 and previous config saved to /var/cache/conftool/dbconfig/20221128-184535-ladsgroup.json
[18:45:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[18:45:41] <wikibugs>	 (PS5) Jbond: convrt-ssds: update cookbook to reimage ms-be with new partition schema [cookbooks] - https://gerrit.wikimedia.org/r/859470 (https://phabricator.wikimedia.org/T308677)
[18:45:42] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[18:45:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[18:45:55] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P41437 and previous config saved to /var/cache/conftool/dbconfig/20221128-184554-ladsgroup.json
[18:46:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41438 and previous config saved to /var/cache/conftool/dbconfig/20221128-184603-ladsgroup.json
[18:46:19] <logmsgbot>	 !log ebernhardson@deploy1002 Finished deploy [wikimedia/discovery/analytics@276aa70]: relax slas for subgraph and incoming links (duration: 02m 34s)
[18:48:18] <wikibugs>	 SRE, SRE-Access-Requests, Security-Team: Add Kelton Hurd to deployment and analytics-privatedata-users groups - https://phabricator.wikimedia.org/T323943 (sbassett)
[18:48:53] <icinga-wm>	 RECOVERY - Host db2101.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.58 ms
[18:50:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[18:54:21] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T323827)', diff saved to https://phabricator.wikimedia.org/P41439 and previous config saved to /var/cache/conftool/dbconfig/20221128-185420-ladsgroup.json
[18:54:28] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[18:57:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P41440 and previous config saved to /var/cache/conftool/dbconfig/20221128-185745-marostegui.json
[19:01:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1179 (T323907)', diff saved to https://phabricator.wikimedia.org/P41441 and previous config saved to /var/cache/conftool/dbconfig/20221128-190101-ladsgroup.json
[19:01:03] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[19:01:08] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[19:01:16] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[19:01:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1189 (T323907)', diff saved to https://phabricator.wikimedia.org/P41442 and previous config saved to /var/cache/conftool/dbconfig/20221128-190122-ladsgroup.json
[19:01:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41443 and previous config saved to /var/cache/conftool/dbconfig/20221128-190122-ladsgroup.json
[19:01:35] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[19:01:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:06:22] <wikibugs>	 (PS1) Ssingh: P:cache::haproxy: harden systemd unit [puppet] - https://gerrit.wikimedia.org/r/861445 (https://phabricator.wikimedia.org/T323944)
[19:07:32] <wikibugs>	 (CR) Ssingh: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38459/console" [puppet] - https://gerrit.wikimedia.org/r/861445 (https://phabricator.wikimedia.org/T323944) (owner: Ssingh)
[19:09:27] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P41444 and previous config saved to /var/cache/conftool/dbconfig/20221128-190927-ladsgroup.json
[19:11:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:12:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T323907)', diff saved to https://phabricator.wikimedia.org/P41445 and previous config saved to /var/cache/conftool/dbconfig/20221128-191211-ladsgroup.json
[19:12:18] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[19:12:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P41446 and previous config saved to /var/cache/conftool/dbconfig/20221128-191251-marostegui.json
[19:16:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P41447 and previous config saved to /var/cache/conftool/dbconfig/20221128-191629-ladsgroup.json
[19:17:34] <wikibugs>	 (PS1) Ottomata: beta - set message_key_fields on stream rc0.mediawiki.page_change [mediawiki-config] - https://gerrit.wikimedia.org/r/861446 (https://phabricator.wikimedia.org/T318846)
[19:18:59] <wikibugs>	 (CR) Ottomata: [C: +2] beta - set message_key_fields on stream rc0.mediawiki.page_change [mediawiki-config] - https://gerrit.wikimedia.org/r/861446 (https://phabricator.wikimedia.org/T318846) (owner: Ottomata)
[19:23:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[19:24:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P41448 and previous config saved to /var/cache/conftool/dbconfig/20221128-192433-ladsgroup.json
[19:24:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[19:24:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[19:25:33] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[19:25:47] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.decommission for hosts cp[5002,5007].eqsin.wmnet
[19:27:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P41449 and previous config saved to /var/cache/conftool/dbconfig/20221128-192718-ladsgroup.json
[19:27:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2119 (T321126)', diff saved to https://phabricator.wikimedia.org/P41450 and previous config saved to /var/cache/conftool/dbconfig/20221128-192758-marostegui.json
[19:28:00] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2136.codfw.wmnet with reason: Maintenance
[19:28:04] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[19:28:24] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2136.codfw.wmnet with reason: Maintenance
[19:28:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2136 (T321126)', diff saved to https://phabricator.wikimedia.org/P41451 and previous config saved to /var/cache/conftool/dbconfig/20221128-192830-marostegui.json
[19:30:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136 (T321126)', diff saved to https://phabricator.wikimedia.org/P41452 and previous config saved to /var/cache/conftool/dbconfig/20221128-193043-marostegui.json
[19:31:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P41453 and previous config saved to /var/cache/conftool/dbconfig/20221128-193135-ladsgroup.json
[19:31:47] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.dns.netbox
[19:31:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:34:20] <wikibugs>	 (CR) Bking: [C: +2] mjolnir msearch: Reduce allowed concurrency [puppet] - https://gerrit.wikimedia.org/r/860129 (https://phabricator.wikimedia.org/T318575) (owner: Ebernhardson)
[19:35:06] <sukhe>	 herron: hi
[19:35:15] <sukhe>	 I have some pending thanos changes in the dns cookbook
[19:35:26] <sukhe>	 cwhite: or herron ^
[19:35:42] <herron>	 I think those were godog's from earlier, let me find the task
[19:36:19] <sukhe>	 ok!
[19:37:05] <volans>	 sukhe: if it's for the svc zonefile then merge it away
[19:37:07] <volans>	 it's a noop in prod
[19:37:23] <sukhe>	 ok thanks volans!
[19:37:25] <wikibugs>	 (PS2) Dzahn: phabricator: Add SPDX headers [puppet] - https://gerrit.wikimedia.org/r/860905 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[19:37:38] <sukhe>	 this diff feature is nice
[19:37:41] <sukhe>	 very much appreciated
[19:38:13] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5002,5007].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[19:39:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T323827)', diff saved to https://phabricator.wikimedia.org/P41454 and previous config saved to /var/cache/conftool/dbconfig/20221128-193940-ladsgroup.json
[19:39:47] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[19:41:28] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5002,5007].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[19:41:29] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:41:29] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5002,5007].eqsin.wmnet
[19:41:35] <wikibugs>	 SRE, ops-eqsin, DC-Ops, Traffic, Patch-For-Review: Q2:rack/setup/install/decom eqsin: unified decommission task - https://phabricator.wikimedia.org/T323830 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `cp[5002,5007].eqsin.wmnet` - cp5002.eqsin.w...
[19:41:54] <wikibugs>	 (CR) Ssingh: [C: +2] cp5002, cp5007: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861439 (https://phabricator.wikimedia.org/T323830) (owner: Ssingh)
[19:41:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:42:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P41455 and previous config saved to /var/cache/conftool/dbconfig/20221128-194224-ladsgroup.json
[19:44:04] <wikibugs>	 (PS1) Ottomata: rc0.mediawiki.page_change stream - produce with keyed message [mediawiki-config] - https://gerrit.wikimedia.org/r/861448 (https://phabricator.wikimedia.org/T318846)
[19:44:13] <wikibugs>	 SRE, ops-eqsin, DC-Ops, Traffic, Patch-For-Review: Q2:rack/setup/install/decom eqsin: unified decommission task - https://phabricator.wikimedia.org/T323830 (ssingh)
[19:45:02] <wikibugs>	 (PS2) Ottomata: rc0.mediawiki.page_change stream - produce with keyed message [mediawiki-config] - https://gerrit.wikimedia.org/r/861448 (https://phabricator.wikimedia.org/T318846)
[19:45:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P41456 and previous config saved to /var/cache/conftool/dbconfig/20221128-194551-marostegui.json
[19:46:10] <wikibugs>	 (CR) Ottomata: [C: +2] rc0.mediawiki.page_change stream - produce with keyed message [mediawiki-config] - https://gerrit.wikimedia.org/r/861448 (https://phabricator.wikimedia.org/T318846) (owner: Ottomata)
[19:46:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T323827)', diff saved to https://phabricator.wikimedia.org/P41457 and previous config saved to /var/cache/conftool/dbconfig/20221128-194642-ladsgroup.json
[19:46:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[19:46:50] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[19:46:55] <wikibugs>	 (Merged) jenkins-bot: rc0.mediawiki.page_change stream - produce with keyed message [mediawiki-config] - https://gerrit.wikimedia.org/r/861448 (https://phabricator.wikimedia.org/T318846) (owner: Ottomata)
[19:46:57] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[19:47:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1182 (T323827)', diff saved to https://phabricator.wikimedia.org/P41458 and previous config saved to /var/cache/conftool/dbconfig/20221128-194703-ladsgroup.json
[19:47:07] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5016 is CRITICAL: reload-vcl failed to run since 0h, 4 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:47:15] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5008 is CRITICAL: reload-vcl failed to run since 0h, 4 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:47:17] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5009 is CRITICAL: reload-vcl failed to run since 0h, 4 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:47:36] <sukhe>	 interesting
[19:47:37] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5014 is CRITICAL: reload-vcl failed to run since 0h, 4 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:47:54] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5004 is CRITICAL: reload-vcl failed to run since 0h, 4 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:48:07] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5015 is CRITICAL: reload-vcl failed to run since 0h, 5 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:48:15] <sukhe>	 looking
[19:48:29] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5003 is CRITICAL: reload-vcl failed to run since 0h, 5 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:48:31] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5013 is CRITICAL: reload-vcl failed to run since 0h, 5 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:48:41] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5011 is CRITICAL: reload-vcl failed to run since 0h, 5 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:48:45] <icinga-wm>	 RECOVERY - Uncommitted DNS changes in Netbox on netbox1002 is OK: Netbox has zero uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[19:48:53] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5006 is CRITICAL: reload-vcl failed to run since 0h, 5 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[19:49:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST virtualservices) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:50:06] <sukhe>	 bblack: ^
[19:50:21] <sukhe>	 the confd vcl based reload thing is back, probably stemming from the depool of cp5002 and 5007!
[19:50:33] <sukhe>	 A:cp-eqsin echo OK? :)
[19:50:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[19:51:33] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:53:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[19:53:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[19:54:42] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[19:57:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T323907)', diff saved to https://phabricator.wikimedia.org/P41459 and previous config saved to /var/cache/conftool/dbconfig/20221128-195731-ladsgroup.json
[19:57:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[19:57:38] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[19:57:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
[19:57:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1198 (T323907)', diff saved to https://phabricator.wikimedia.org/P41460 and previous config saved to /var/cache/conftool/dbconfig/20221128-195753-ladsgroup.json
[20:00:57] <logmsgbot>	 !log bblack@cumin1001 conftool action : set/pooled=no; selector: name=cp5028.eqsin.wmnet,service=ats-be
[20:00:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P41461 and previous config saved to /var/cache/conftool/dbconfig/20221128-200058-marostegui.json
[20:01:09] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5003 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[20:01:20] <logmsgbot>	 !log bblack@cumin1001 conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=ats-be
[20:04:59] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[20:04:59] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet,service=ats-be
[20:05:10] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=ats-be
[20:05:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T323827)', diff saved to https://phabricator.wikimedia.org/P41462 and previous config saved to /var/cache/conftool/dbconfig/20221128-200522-ladsgroup.json
[20:05:29] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[20:08:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T323907)', diff saved to https://phabricator.wikimedia.org/P41463 and previous config saved to /var/cache/conftool/dbconfig/20221128-200838-ladsgroup.json
[20:08:45] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[20:10:57] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:11:11] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5004 is OK: reload-vcl successfully ran 0h, 9 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[20:13:27] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5008 is OK: reload-vcl successfully ran 0h, 7 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[20:13:42] <wikibugs>	 (PS1) Ottomata: eventgate - bump version to get keyed message support [deployment-charts] - https://gerrit.wikimedia.org/r/861451 (https://phabricator.wikimedia.org/T318846)
[20:14:07] <wikibugs>	 (CR) Ottomata: [C: +2] eventgate - bump version to get keyed message support [deployment-charts] - https://gerrit.wikimedia.org/r/861451 (https://phabricator.wikimedia.org/T318846) (owner: Ottomata)
[20:14:58] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[20:15:48] <wikibugs>	 (PS1) Kghbln: Add ProWiki feed [puppet] - https://gerrit.wikimedia.org/r/861452
[20:16:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136 (T321126)', diff saved to https://phabricator.wikimedia.org/P41464 and previous config saved to /var/cache/conftool/dbconfig/20221128-201604-marostegui.json
[20:16:06] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2137.codfw.wmnet with reason: Maintenance
[20:16:13] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[20:16:30] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2137.codfw.wmnet with reason: Maintenance
[20:16:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2137:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41465 and previous config saved to /var/cache/conftool/dbconfig/20221128-201636-marostegui.json
[20:18:19] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5016 is OK: reload-vcl successfully ran 0h, 12 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[20:18:32] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
[20:18:34] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
[20:18:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41466 and previous config saved to /var/cache/conftool/dbconfig/20221128-201849-marostegui.json
[20:19:16] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
[20:20:10] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
[20:20:16] <wikibugs>	 (CR) Kghbln: "Hi Daniel, according to https://www.mediawiki.org/wiki/Git/Reviewers#operations/puppet you are reviewing the planet. Will be great to get " [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[20:20:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P41467 and previous config saved to /var/cache/conftool/dbconfig/20221128-202029-ladsgroup.json
[20:21:16] <wikibugs>	 (PS2) Dzahn: planet: Add ProWiki feed [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[20:21:29] <icinga-wm>	 PROBLEM - Confd vcl based reload on cp5012 is CRITICAL: reload-vcl failed to run since 0h, 16 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[20:21:59] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
[20:22:56] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
[20:23:05] <wikibugs>	 SRE, Data Pipelines, Data-Engineering-Planning, Traffic-Icebox: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (EChetty)
[20:23:37] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
[20:23:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P41468 and previous config saved to /var/cache/conftool/dbconfig/20221128-202345-ladsgroup.json
[20:24:14] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
[20:24:19] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
[20:24:26] <wikibugs>	 (CR) Dzahn: [C: +2] "Sure, no problem. per https://wikiindex.org/Jeroen_De_Dauw" [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[20:25:14] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
[20:25:21] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
[20:26:13] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
[20:26:33] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
[20:27:04] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
[20:27:13] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
[20:27:23] <wikibugs>	 (CR) Dzahn: [C: +1] "lgtm. there might be a few more minor authors. not sure where we draw the line sometimes" [puppet] - https://gerrit.wikimedia.org/r/860905 (https://phabricator.wikimedia.org/T308013) (owner: Muehlenhoff)
[20:28:03] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
[20:28:11] <wikibugs>	 (CR) Dzahn: [C: +2] Enable profile::auto_restarts::service for Envoy on planet [puppet] - https://gerrit.wikimedia.org/r/860560 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[20:28:16] <wikibugs>	 (CR) Kghbln: planet: Add ProWiki feed (1 comment) [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[20:28:27] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
[20:29:12] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
[20:29:49] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] START helmfile.d/services/eventgate-main: apply
[20:30:16] <logmsgbot>	 !log otto@deploy1002 helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
[20:30:29] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] START helmfile.d/services/eventgate-main: apply
[20:31:00] <wikibugs>	 (CR) Dzahn: [C: +2] planet: Add ProWiki feed (1 comment) [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[20:31:16] <logmsgbot>	 !log otto@deploy1002 helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
[20:31:31] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
[20:32:15] <logmsgbot>	 !log otto@deploy1002 helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
[20:32:50] <wikibugs>	 (CR) Dzahn: [C: +2] "service and timer was created on planet1002. I also tested manually starting it." [puppet] - https://gerrit.wikimedia.org/r/860560 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[20:33:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P41469 and previous config saved to /var/cache/conftool/dbconfig/20221128-203356-marostegui.json
[20:35:36] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P41470 and previous config saved to /var/cache/conftool/dbconfig/20221128-203535-ladsgroup.json
[20:38:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P41471 and previous config saved to /var/cache/conftool/dbconfig/20221128-203851-ladsgroup.json
[20:42:16] <wikibugs>	 (PS1) Ottomata: Revert portals to commit 2177e33bdb9db87b01be886161419d604134e0b6 [mediawiki-config] - https://gerrit.wikimedia.org/r/861455
[20:43:41] <wikibugs>	 (CR) Ottomata: [C: +2] Revert portals to commit 2177e33bdb9db87b01be886161419d604134e0b6 [mediawiki-config] - https://gerrit.wikimedia.org/r/861455 (owner: Ottomata)
[20:44:38] <wikibugs>	 (CR) Kghbln: planet: Add ProWiki feed (1 comment) [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[20:44:43] <wikibugs>	 (Merged) jenkins-bot: Revert portals to commit 2177e33bdb9db87b01be886161419d604134e0b6 [mediawiki-config] - https://gerrit.wikimedia.org/r/861455 (owner: Ottomata)
[20:48:53] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[20:49:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P41472 and previous config saved to /var/cache/conftool/dbconfig/20221128-204902-marostegui.json
[20:50:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[20:50:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T323827)', diff saved to https://phabricator.wikimedia.org/P41473 and previous config saved to /var/cache/conftool/dbconfig/20221128-205041-ladsgroup.json
[20:50:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1188.eqiad.wmnet with reason: Maintenance
[20:50:47] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[20:50:48] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[20:50:57] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1188.eqiad.wmnet with reason: Maintenance
[20:51:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1188 (T323827)', diff saved to https://phabricator.wikimedia.org/P41474 and previous config saved to /var/cache/conftool/dbconfig/20221128-205103-ladsgroup.json
[20:51:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[20:51:24] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[20:52:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[20:53:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T323907)', diff saved to https://phabricator.wikimedia.org/P41475 and previous config saved to /var/cache/conftool/dbconfig/20221128-205358-ladsgroup.json
[20:54:00] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[20:54:02] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[20:54:04] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[20:55:56] <wikibugs>	 (CR) Andrea Denisse: "Hi, I've implemented your suggestions and the PCC results look good to me: https://puppet-compiler.wmflabs.org/output/854951/38447/" [puppet] - https://gerrit.wikimedia.org/r/854951 (https://phabricator.wikimedia.org/T315523) (owner: Andrea Denisse)
[20:59:45] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5003.eqsin.wmnet,service=ats-tls
[20:59:46] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5003.eqsin.wmnet,service=ats-be
[20:59:46] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5003.eqsin.wmnet,service=varnish-fe
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, and kindrobot: That opportune time is upon us again. Time for a UTC late backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T2100).
[21:00:05] <jouncebot>	 dbrant: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:07] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5006 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:01:03] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5014 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:01:17] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5003.eqsin.wmnet with reason: downtimed, to be depooled
[21:01:31] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5013 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:01:33] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5003.eqsin.wmnet with reason: downtimed, to be depooled
[21:01:51] <wikibugs>	 (PS1) BBlack: p::phabricator::main: remove unused $cache_nodes [puppet] - https://gerrit.wikimedia.org/r/861460
[21:02:04] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet,service=ats-tls
[21:02:04] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet,service=ats-be
[21:02:05] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5008.eqsin.wmnet,service=varnish-fe
[21:02:27] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5015 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:02:28] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5008.eqsin.wmnet with reason: downtimed, to be depooled
[21:02:43] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5008.eqsin.wmnet with reason: downtimed, to be depooled
[21:02:45] <wikibugs>	 (CR) Ssingh: [C: +2] cp5003, cp5008: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861440 (https://phabricator.wikimedia.org/T323830) (owner: Ssingh)
[21:03:09] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /_info (retrieve service info) is CRITICAL: Test retrieve service info returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[21:03:09] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5012 is OK: reload-vcl successfully ran 0h, 0 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:03:36] <wikibugs>	 (PS2) Ssingh: cp5003, cp5008: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861440 (https://phabricator.wikimedia.org/T323830)
[21:04:07] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5011 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:04:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41476 and previous config saved to /var/cache/conftool/dbconfig/20221128-210408-marostegui.json
[21:04:10] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2138.codfw.wmnet with reason: Maintenance
[21:04:13] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2138.codfw.wmnet with reason: Maintenance
[21:04:15] <icinga-wm>	 RECOVERY - Confd vcl based reload on cp5009 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[21:04:15] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[21:04:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2138:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41477 and previous config saved to /var/cache/conftool/dbconfig/20221128-210419-marostegui.json
[21:04:59] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:05:09] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[21:06:08] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host arclamp1001.eqiad.wmnet with OS bullseye
[21:06:12] <dbrant>	 o/ deployers around?
[21:06:12] <wikibugs>	 SRE, ops-eqiad, DC-Ops, serviceops: Q2:rack/setup/install arclamp1001.eqiad.wmnet - https://phabricator.wikimedia.org/T319433 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host arclamp1001.eqiad.wmnet with OS bullseye
[21:06:30] <cjming>	 I can deploy
[21:06:32] <wikibugs>	 (CR) Dzahn: [C: +1] "aha, lgtm, I don't see anything using it. compiler shows it's just removing the parameter and values though: https://puppet-compiler.wmfla" [puppet] - https://gerrit.wikimedia.org/r/861460 (owner: BBlack)
[21:06:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41478 and previous config saved to /var/cache/conftool/dbconfig/20221128-210632-marostegui.json
[21:06:57] <wikibugs>	 (PS3) Clare Ming: Enable shared Reading Lists landing page on all wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269) (owner: Dbrant)
[21:08:29] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by cjming@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269) (owner: Dbrant)
[21:09:04] <wikibugs>	 (CR) Dzahn: [C: +2] "Oh, it's a great change, thanks!:) The topic branch was just a little detail for me but it's nice to have them grouped. The "planet: " pre" [puppet] - https://gerrit.wikimedia.org/r/861452 (owner: Kghbln)
[21:09:13] <wikibugs>	 (Merged) jenkins-bot: Enable shared Reading Lists landing page on all wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/861397 (https://phabricator.wikimedia.org/T313269) (owner: Dbrant)
[21:09:27] <logmsgbot>	 !log cjming@deploy1002 Started scap: Backport for [[gerrit:861397|Enable shared Reading Lists landing page on all wikis. (T313269)]]
[21:09:33] <stashbot>	 T313269: Shareable Reading Lists - https://phabricator.wikimedia.org/T313269
[21:10:26] <wikibugs>	 (PS2) BBlack: p::phabricator::main: remove unused $cache_nodes [puppet] - https://gerrit.wikimedia.org/r/861460 (https://phabricator.wikimedia.org/T270185)
[21:10:27] <logmsgbot>	 !log cjming@deploy1002 cjming and dbrant: Backport for [[gerrit:861397|Enable shared Reading Lists landing page on all wikis. (T313269)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
[21:10:50] <cjming>	 dbrant: up on test servers if you'd like to verify
[21:11:17] <dbrant>	 cjming: confirmed! looks good
[21:11:38] <cjming>	 cool - syncing
[21:11:59] <wikibugs>	 (CR) BBlack: p::phabricator::main: remove unused $cache_nodes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/861460 (https://phabricator.wikimedia.org/T270185) (owner: BBlack)
[21:12:36] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.decommission for hosts cp[5003,5008].eqsin.wmnet
[21:12:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[21:13:38] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[21:13:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[21:13:48] <wikibugs>	 (CR) BBlack: [C: +2] p::phabricator::main: remove unused $cache_nodes [puppet] - https://gerrit.wikimedia.org/r/861460 (https://phabricator.wikimedia.org/T270185) (owner: BBlack)
[21:14:44] <wikibugs>	 (PS1) MSantos: wikifeeds: bump to 2022-11-28-160349-production [deployment-charts] - https://gerrit.wikimedia.org/r/861461
[21:14:59] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:15:49] <logmsgbot>	 !log cjming@deploy1002 Finished scap: Backport for [[gerrit:861397|Enable shared Reading Lists landing page on all wikis. (T313269)]] (duration: 06m 22s)
[21:15:56] <stashbot>	 T313269: Shareable Reading Lists - https://phabricator.wikimedia.org/T313269
[21:15:59] <cjming>	 dbrant: live!
[21:16:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[21:16:23] <dbrant>	 cjming: excellent, many thanks!
[21:16:45] <icinga-wm>	 PROBLEM - Check systemd state on thanos-fe1001 is CRITICAL: CRITICAL - degraded: The following units failed: swift_ring_manager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:16:45] <cjming>	 so welcome!
[21:17:15] <cjming>	 I'll hang out for a bit longer before closing the backport window
[21:18:31] <wikibugs>	 (PS11) Andrea Denisse: netmon: Open LibreNMS port for netmon2002. [puppet] - https://gerrit.wikimedia.org/r/854951 (https://phabricator.wikimedia.org/T315523)
[21:18:33] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.dns.netbox
[21:19:31] <icinga-wm>	 PROBLEM - Check systemd state on kubernetes2011 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:20:17] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2011 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:20:49] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5003,5008].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[21:21:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P41479 and previous config saved to /var/cache/conftool/dbconfig/20221128-212138-marostegui.json
[21:23:05] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5003,5008].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[21:23:05] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:23:06] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[5003,5008].eqsin.wmnet
[21:23:14] <wikibugs>	 SRE, ops-eqsin, DC-Ops, Traffic, Patch-For-Review: Q2:rack/setup/install/decom eqsin: unified decommission task - https://phabricator.wikimedia.org/T323830 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `cp[5003,5008].eqsin.wmnet` - cp5003.eqsin.w...
[21:25:12] <wikibugs>	 SRE, ops-eqsin, DC-Ops, Traffic, Patch-For-Review: Q2:rack/setup/install/decom eqsin: unified decommission task - https://phabricator.wikimedia.org/T323830 (ssingh)
[21:27:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T323827)', diff saved to https://phabricator.wikimedia.org/P41480 and previous config saved to /var/cache/conftool/dbconfig/20221128-212702-ladsgroup.json
[21:27:23] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[21:27:39] <wikibugs>	 (PS1) BBlack: docker_registry_ha: remove unused cache::nodes ref [puppet] - https://gerrit.wikimedia.org/r/861463 (https://phabricator.wikimedia.org/T256762)
[21:29:58] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:33:33] <cjming>	 !log end of UTC late backport window
[21:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P41481 and previous config saved to /var/cache/conftool/dbconfig/20221128-213645-marostegui.json
[21:39:59] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (LIST secrets) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:42:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P41482 and previous config saved to /var/cache/conftool/dbconfig/20221128-214208-ladsgroup.json
[21:44:32] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:44:36] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5004.eqsin.wmnet,service=ats-tls
[21:44:36] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5004.eqsin.wmnet,service=ats-be
[21:44:36] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5004.eqsin.wmnet,service=varnish-fe
[21:44:37] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5009.eqsin.wmnet,service=ats-tls
[21:44:37] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5009.eqsin.wmnet,service=ats-be
[21:44:37] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5009.eqsin.wmnet,service=varnish-fe
[21:46:05] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5004,5009].eqsin.wmnet with reason: downtimed, to be depooled
[21:46:22] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5004,5009].eqsin.wmnet with reason: downtimed, to be depooled
[21:47:49] <wikibugs>	 (PS2) Ssingh: cp5004, cp5009: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861441 (https://phabricator.wikimedia.org/T323830)
[21:48:38] <wikibugs>	 (CR) Ssingh: [C: +2] cp5004, cp5009: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861441 (https://phabricator.wikimedia.org/T323830) (owner: Ssingh)
[21:50:52] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2011 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:51:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T321126)', diff saved to https://phabricator.wikimedia.org/P41483 and previous config saved to /var/cache/conftool/dbconfig/20221128-215151-marostegui.json
[21:51:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2139.codfw.wmnet with reason: Maintenance
[21:51:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2139.codfw.wmnet with reason: Maintenance
[21:51:59] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[21:52:03] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2147.codfw.wmnet with reason: Maintenance
[21:52:17] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2147.codfw.wmnet with reason: Maintenance
[21:52:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2147 (T321126)', diff saved to https://phabricator.wikimedia.org/P41484 and previous config saved to /var/cache/conftool/dbconfig/20221128-215223-marostegui.json
[21:54:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T321126)', diff saved to https://phabricator.wikimedia.org/P41485 and previous config saved to /var/cache/conftool/dbconfig/20221128-215435-marostegui.json
[21:55:04] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.decommission for hosts cp[5004,5009].eqsin.wmnet
[21:57:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P41486 and previous config saved to /var/cache/conftool/dbconfig/20221128-215715-ladsgroup.json
[21:59:59] <jinxer-wm>	 (KubernetesAPILatency) firing: (3) High Kubernetes API latency (LIST virtualservices) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[22:00:01] <brennen>	 !log phabricator: phab1001 -> phab1004 migration starting soon; downtime expected (T280597)
[22:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: May I have your attention please! Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T2200)
[22:00:05] <jouncebot>	 mutante and brennen: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Phabricator migration to phab1004 . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221128T2200).
[22:00:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:08] <stashbot>	 T280597: move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597
[22:00:40] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on phab1001.eqiad.wmnet with reason: T322250
[22:00:46] <stashbot>	 T322250: decom phab2001 (service owner) - https://phabricator.wikimedia.org/T322250
[22:00:56] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on phab1001.eqiad.wmnet with reason: T322250
[22:03:28] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.dns.netbox
[22:05:07] <wikibugs>	 (PS2) Ssingh: cp5005, cp5010: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861442 (https://phabricator.wikimedia.org/T323830)
[22:06:03] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5004,5009].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[22:07:19] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5004,5009].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[22:07:19] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:07:19] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cp[5004,5009].eqsin.wmnet
[22:08:43] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host arclamp1001.eqiad.wmnet with OS bullseye
[22:09:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-220944-marostegui.json
[22:11:48] <icinga-wm>	 RECOVERY - Check systemd state on thanos-fe1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:12:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T323827)', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-221221-ladsgroup.json
[22:12:23] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
[22:12:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1197.eqiad.wmnet with reason: Maintenance
[22:12:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1197 (T323827)', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-221242-ladsgroup.json
[22:15:43] <wikibugs>	 (PS1) JHathaway: postfix::mx: vrts password [labs/private] - https://gerrit.wikimedia.org/r/861487
[22:18:38] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1043 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:20:52] <wikibugs>	 (PS12) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[22:21:39] <wikibugs>	 (CR) JHathaway: [C: +2] postfix::mx: vrts password [labs/private] - https://gerrit.wikimedia.org/r/861487 (owner: JHathaway)
[22:21:41] <wikibugs>	 (CR) JHathaway: [V: +2 C: +2] postfix::mx: vrts password [labs/private] - https://gerrit.wikimedia.org/r/861487 (owner: JHathaway)
[22:23:54] <wikibugs>	 (CR) CI reject: [V: -1] Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[22:24:11] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] "Discussed during migration window." [puppet] - https://gerrit.wikimedia.org/r/859145 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[22:24:25] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: set mysql master port for eqiad [puppet] - https://gerrit.wikimedia.org/r/859145 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[22:24:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-222450-marostegui.json
[22:25:24] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5005.eqsin.wmnet,service=ats-tls
[22:25:24] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5005.eqsin.wmnet,service=ats-be
[22:25:25] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5005.eqsin.wmnet,service=varnish-fe
[22:25:25] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5010.eqsin.wmnet,service=ats-tls
[22:25:25] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5010.eqsin.wmnet,service=ats-be
[22:25:26] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5010.eqsin.wmnet,service=varnish-fe
[22:26:06] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp[5005,5010].eqsin.wmnet with reason: downtimed, to be depooled
[22:26:23] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp[5005,5010].eqsin.wmnet with reason: downtimed, to be depooled
[22:26:29] <wikibugs>	 (CR) Ssingh: [C: +2] cp5005, cp5010: decommission hosts (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861442 (https://phabricator.wikimedia.org/T323830) (owner: Ssingh)
[22:26:57] <sukhe>	 jhathaway: merging your labs/private change!
[22:27:08] <jhathaway>	 sukhe: thanks
[22:27:08] <sukhe>	 er no, not labs/private but
[22:27:11] <sukhe>	 ok
[22:30:41] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] Revert "Revert "hieradata: switch active Phabricator server to phab1004"" [puppet] - https://gerrit.wikimedia.org/r/860031 (owner: Dzahn)
[22:31:22] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/output/860031/38461/phab1004.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/860031 (owner: Dzahn)
[22:31:28] <wikibugs>	 (PS2) Dzahn: Revert "Revert "hieradata: switch active Phabricator server to phab1004"" [puppet] - https://gerrit.wikimedia.org/r/860031
[22:32:13] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.decommission for hosts cp[5005,5010].eqsin.wmnet
[22:36:28] <wikibugs>	 (PS2) Ssingh: cp5006: decommission host (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861443 (https://phabricator.wikimedia.org/T323830)
[22:37:43] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.dns.netbox
[22:39:32] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5005,5010].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[22:39:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T321126)', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-223956-marostegui.json
[22:39:58] <jinxer-wm>	 (KubernetesAPILatency) firing: (4) High Kubernetes API latency (LIST certificates) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[22:39:59] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2155.codfw.wmnet with reason: Maintenance
[22:40:12] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2155.codfw.wmnet with reason: Maintenance
[22:40:14] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2095.codfw.wmnet with reason: Maintenance
[22:40:16] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2095.codfw.wmnet with reason: Maintenance
[22:40:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2155 (T321126)', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-224022-marostegui.json
[22:41:19] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[5005,5010].eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[22:41:20] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:41:20] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cp[5005,5010].eqsin.wmnet
[22:42:00] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet,service=ats-tls
[22:42:01] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet,service=ats-be
[22:42:01] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet,service=varnish-fe
[22:42:33] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp5006.eqsin.wmnet with reason: downtimed, to be depooled
[22:42:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T321126)', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-224235-marostegui.json
[22:42:48] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp5006.eqsin.wmnet with reason: downtimed, to be depooled
[22:42:51] <wikibugs>	 (CR) Ssingh: [C: +2] cp5006: decommission host (eqsin hardware refresh) [puppet] - https://gerrit.wikimedia.org/r/861443 (https://phabricator.wikimedia.org/T323830) (owner: Ssingh)
[22:44:19] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1043 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:47:06] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.decommission for hosts cp5006.eqsin.wmnet
[22:50:07] <wikibugs>	 (PS13) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[22:50:41] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
[22:50:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
[22:51:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2105 (T323907)', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-225101-ladsgroup.json
[22:52:08] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.dns.netbox
[22:53:17] <logmsgbot>	 !log brennen@deploy1002 Started deploy [phabricator/deployment@f68dc24]: deploy config changes for phab1001 -> phab1004 (T280597)
[22:53:28] <wikibugs>	 (CR) CI reject: [V: -1] Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[22:54:10] <logmsgbot>	 !log brennen@deploy1002 Finished deploy [phabricator/deployment@f68dc24]: deploy config changes for phab1001 -> phab1004 (T280597) (duration: 00m 52s)
[22:54:46] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5006.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[22:54:53] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] phabricator: let phd run on phab1004 [puppet] - https://gerrit.wikimedia.org/r/859628 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[22:55:01] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: let phd run on phab1004 [puppet] - https://gerrit.wikimedia.org/r/859628 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[22:56:33] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5006.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
[22:56:34] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:56:34] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cp5006.eqsin.wmnet
[22:57:03] <jinxer-wm>	 (ProbeDown) firing: Service centrallog1001:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1001:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:57:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to  and previous config saved to /var/cache/conftool/dbconfig/20221128-225741-marostegui.json
[22:58:09] <icinga-wm>	 PROBLEM - Check systemd state on ms-be1062 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:58:46] <wikibugs>	 (PS2) Dzahn: Revert "Revert "phabricator: switch from phab1001 to phab1004, discovery and SPF"" [dns] - https://gerrit.wikimedia.org/r/860032
[22:59:47] <wikibugs>	 (PS14) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[22:59:58] <jinxer-wm>	 (KubernetesAPILatency) firing: (5) High Kubernetes API latency (LIST certificates) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[23:00:14] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] Revert "Revert "phabricator: switch from phab1001 to phab1004, discovery and SPF"" [dns] - https://gerrit.wikimedia.org/r/860032 (owner: Dzahn)
[23:00:17] <wikibugs>	 (CR) Dzahn: [C: +2] Revert "Revert "phabricator: switch from phab1001 to phab1004, discovery and SPF"" [dns] - https://gerrit.wikimedia.org/r/860032 (owner: Dzahn)
[23:02:03] <jinxer-wm>	 (ProbeDown) resolved: Service centrallog1001:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1001:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[23:03:24] <wikibugs>	 (CR) CI reject: [V: -1] Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[23:05:10] <wikibugs>	 (PS15) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[23:09:52] <wikibugs>	 (PS1) Dzahn: phabricator: quote mysql port numbers [puppet] - https://gerrit.wikimedia.org/r/861489 (https://phabricator.wikimedia.org/T280597)
[23:09:58] <jinxer-wm>	 (KubernetesAPILatency) firing: (5) High Kubernetes API latency (LIST certificates) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[23:11:44] <wikibugs>	 (PS2) Dzahn: phabricator: quote mysql port numbers [puppet] - https://gerrit.wikimedia.org/r/861489 (https://phabricator.wikimedia.org/T280597)
[23:12:12] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] phabricator: quote mysql port numbers [puppet] - https://gerrit.wikimedia.org/r/861489 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:12:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:12:23] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: quote mysql port numbers [puppet] - https://gerrit.wikimedia.org/r/861489 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:12:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[23:12:43] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] phabricator: quote mysql port numbers [puppet] - https://gerrit.wikimedia.org/r/861489 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:12:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P41487 and previous config saved to /var/cache/conftool/dbconfig/20221128-231247-marostegui.json
[23:12:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P41488 and previous config saved to /var/cache/conftool/dbconfig/20221128-231258-ladsgroup.json
[23:13:05] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827
[23:14:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[23:14:14] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
[23:14:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:14:20] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[23:14:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1112 (T322618)', diff saved to https://phabricator.wikimedia.org/P41489 and previous config saved to /var/cache/conftool/dbconfig/20221128-231426-ladsgroup.json
[23:14:35] <stashbot>	 T322618: Fix renamed indexes of flaggedrevs_tracking table in production - https://phabricator.wikimedia.org/T322618
[23:15:21] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1062 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[23:15:29] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[23:15:42] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[23:15:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3317 (T323907)', diff saved to https://phabricator.wikimedia.org/P41490 and previous config saved to /var/cache/conftool/dbconfig/20221128-231548-ladsgroup.json
[23:15:55] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[23:16:17] <wikibugs>	 (PS1) Dzahn: phabricator: change db ports to strings in tools class [puppet] - https://gerrit.wikimedia.org/r/861490 (https://phabricator.wikimedia.org/T280597)
[23:16:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105 (T323907)', diff saved to https://phabricator.wikimedia.org/P41491 and previous config saved to /var/cache/conftool/dbconfig/20221128-231623-ladsgroup.json
[23:16:46] <wikibugs>	 ops-codfw: ManagementSSHDown - https://phabricator.wikimedia.org/T323960 (phaultfinder)
[23:16:58] <wikibugs>	 (CR) Dzahn: [C: +2] phabricator: change db ports to strings in tools class [puppet] - https://gerrit.wikimedia.org/r/861490 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:17:33] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] phabricator: change db ports to strings in tools class [puppet] - https://gerrit.wikimedia.org/r/861490 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:17:51] <wikibugs>	 ops-eqiad: ManagementSSHDown - https://phabricator.wikimedia.org/T323961 (phaultfinder)
[23:18:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112 (T322618)', diff saved to https://phabricator.wikimedia.org/P41492 and previous config saved to /var/cache/conftool/dbconfig/20221128-231821-ladsgroup.json
[23:19:59] <jinxer-wm>	 (KubernetesAPILatency) firing: (7) High Kubernetes API latency (LIST certificates) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[23:20:12] <wikibugs>	 (PS1) Dzahn: phabricator: switch mysql slave port for logmail to string [puppet] - https://gerrit.wikimedia.org/r/861491 (https://phabricator.wikimedia.org/T280597)
[23:20:45] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] phabricator: switch mysql slave port for logmail to string [puppet] - https://gerrit.wikimedia.org/r/861491 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:22:26] <logmsgbot>	 !log brennen@deploy1002 Started deploy [phabricator/deployment@f68dc24]: deploy config changes for mysql-port-as-string (T280597)
[23:22:33] <stashbot>	 T280597: move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597
[23:23:18] <wikibugs>	 SRE, ops-eqsin, DC-Ops, Traffic, Patch-For-Review: Q2:rack/setup/install/decom eqsin: unified decommission task - https://phabricator.wikimedia.org/T323830 (ssingh)
[23:23:21] <logmsgbot>	 !log brennen@deploy1002 Finished deploy [phabricator/deployment@f68dc24]: deploy config changes for mysql-port-as-string (T280597) (duration: 00m 55s)
[23:24:11] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1062 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:27:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T321126)', diff saved to https://phabricator.wikimedia.org/P41493 and previous config saved to /var/cache/conftool/dbconfig/20221128-232754-marostegui.json
[23:27:56] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2172.codfw.wmnet with reason: Maintenance
[23:28:02] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[23:28:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P41494 and previous config saved to /var/cache/conftool/dbconfig/20221128-232805-ladsgroup.json
[23:28:09] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2172.codfw.wmnet with reason: Maintenance
[23:28:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2172 (T321126)', diff saved to https://phabricator.wikimedia.org/P41495 and previous config saved to /var/cache/conftool/dbconfig/20221128-232815-marostegui.json
[23:30:09] <wikibugs>	 (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/859631 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:30:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T321126)', diff saved to https://phabricator.wikimedia.org/P41496 and previous config saved to /var/cache/conftool/dbconfig/20221128-233028-marostegui.json
[23:30:40] <wikibugs>	 (CR) CI reject: [V: -1] phabricator: move some more settings from host file to common [puppet] - https://gerrit.wikimedia.org/r/859631 (https://phabricator.wikimedia.org/T280597) (owner: Dzahn)
[23:31:07] <wikibugs>	 (PS2) Dzahn: phabricator: move some more settings from host file to common [puppet] - https://gerrit.wikimedia.org/r/859631 (https://phabricator.wikimedia.org/T280597)
[23:31:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P41497 and previous config saved to /var/cache/conftool/dbconfig/20221128-233130-ladsgroup.json
[23:32:10] <logmsgbot>	 !log ebernhardson@deploy1002 Started deploy [search/mjolnir/deploy@d361052]: msearch_daemon: Remove cluster selection/load monitor
[23:32:53] <wikibugs>	 (PS3) Dzahn: mariadb: remove phab1001 from production-m3 grants [puppet] - https://gerrit.wikimedia.org/r/858419 (https://phabricator.wikimedia.org/T323418)
[23:32:58] <wikibugs>	 (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/858419 (https://phabricator.wikimedia.org/T323418) (owner: Dzahn)
[23:33:02] <logmsgbot>	 !log ebernhardson@deploy1002 Finished deploy [search/mjolnir/deploy@d361052]: msearch_daemon: Remove cluster selection/load monitor (duration: 00m 51s)
[23:33:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P41498 and previous config saved to /var/cache/conftool/dbconfig/20221128-233328-ladsgroup.json
[23:33:48] <wikibugs>	 (CR) BBlack: "PCC says no system changes, just expected unused parameter data removal:" [puppet] - https://gerrit.wikimedia.org/r/861463 (https://phabricator.wikimedia.org/T256762) (owner: BBlack)
[23:41:40] <wikibugs>	 (PS1) RLazarus: httpbb: Replace URL for metawiki test [puppet] - https://gerrit.wikimedia.org/r/861497 (https://phabricator.wikimedia.org/T323707)
[23:41:54] <wikibugs>	 (PS1) Dzahn: phabricator: disable phd running on phab1001 [puppet] - https://gerrit.wikimedia.org/r/861498 (https://phabricator.wikimedia.org/T323418)
[23:42:22] <wikibugs>	 (CR) Brennen Bearnes: [C: +1] phabricator: disable phd running on phab1001 [puppet] - https://gerrit.wikimedia.org/r/861498 (https://phabricator.wikimedia.org/T323418) (owner: Dzahn)
[23:42:32] <wikibugs>	 (CR) Dzahn: [V: +2 C: +2] phabricator: disable phd running on phab1001 [puppet] - https://gerrit.wikimedia.org/r/861498 (https://phabricator.wikimedia.org/T323418) (owner: Dzahn)
[23:43:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P41499 and previous config saved to /var/cache/conftool/dbconfig/20221128-234311-ladsgroup.json
[23:45:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P41500 and previous config saved to /var/cache/conftool/dbconfig/20221128-234535-marostegui.json
[23:46:21] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1062 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[23:46:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P41501 and previous config saved to /var/cache/conftool/dbconfig/20221128-234636-ladsgroup.json
[23:48:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P41502 and previous config saved to /var/cache/conftool/dbconfig/20221128-234834-ladsgroup.json
[23:52:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T323907)', diff saved to https://phabricator.wikimedia.org/P41503 and previous config saved to /var/cache/conftool/dbconfig/20221128-235223-ladsgroup.json
[23:52:30] <stashbot>	 T323907: Make fr_user unsigned - https://phabricator.wikimedia.org/T323907
[23:58:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T323827)', diff saved to https://phabricator.wikimedia.org/P41504 and previous config saved to /var/cache/conftool/dbconfig/20221128-235817-ladsgroup.json
[23:58:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[23:58:22] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[23:58:25] <stashbot>	 T323827: Finish timestamp schema changes in flaggedrevs - https://phabricator.wikimedia.org/T323827