[00:03:38] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (andrea.denisse) Hello @Gehel, do you approve @Dcausse access to the `analytics-admins` group?
[00:05:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P40880 and previous config saved to /var/cache/conftool/dbconfig/20221124-000543-marostegui.json
[00:09:59] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:14:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P40881 and previous config saved to /var/cache/conftool/dbconfig/20221124-001435-ladsgroup.json
[00:14:59] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:19:39] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[00:20:07] <icinga-wm>	 PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:20:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P40882 and previous config saved to /var/cache/conftool/dbconfig/20221124-002050-marostegui.json
[00:23:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:29:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P40883 and previous config saved to /var/cache/conftool/dbconfig/20221124-002941-ladsgroup.json
[00:30:17] <icinga-wm>	 RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:35:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T321126)', diff saved to https://phabricator.wikimedia.org/P40884 and previous config saved to /var/cache/conftool/dbconfig/20221124-003556-marostegui.json
[00:35:58] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
[00:36:03] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[00:36:12] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
[00:36:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2168:3317 (T321126)', diff saved to https://phabricator.wikimedia.org/P40885 and previous config saved to /var/cache/conftool/dbconfig/20221124-003618-marostegui.json
[00:36:54] <wikibugs>	 (CR) Cwhite: [C: +1] Remove graphite2003 [puppet] - https://gerrit.wikimedia.org/r/860071 (https://phabricator.wikimedia.org/T323718) (owner: Filippo Giunchedi)
[00:38:51] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321126)', diff saved to https://phabricator.wikimedia.org/P40886 and previous config saved to /var/cache/conftool/dbconfig/20221124-003850-marostegui.json
[00:38:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:39:18] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[00:39:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
[00:39:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[00:40:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[00:40:06] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1165 (T323214)', diff saved to https://phabricator.wikimedia.org/P40887 and previous config saved to /var/cache/conftool/dbconfig/20221124-004006-ladsgroup.json
[00:40:12] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[00:43:47] <icinga-wm>	 RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:44:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2158 (T323214)', diff saved to https://phabricator.wikimedia.org/P40888 and previous config saved to /var/cache/conftool/dbconfig/20221124-004448-ladsgroup.json
[00:44:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
[00:45:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
[00:45:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2169:3316 (T323214)', diff saved to https://phabricator.wikimedia.org/P40889 and previous config saved to /var/cache/conftool/dbconfig/20221124-004510-ladsgroup.json
[00:45:16] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[00:50:41] <icinga-wm>	 PROBLEM - SSH on mw1329.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:53:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P40890 and previous config saved to /var/cache/conftool/dbconfig/20221124-005357-marostegui.json
[00:54:58] <wikibugs>	 (PS1) Andrea Denisse: admin: add dasm to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591)
[00:55:35] <wikibugs>	 (PS2) Andrea Denisse: admin: add dasm to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591)
[00:55:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:55:59] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[00:58:01] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[00:59:21] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Dasm - https://phabricator.wikimedia.org/T322591 (andrea.denisse) Hi @Htriedman and @Jcross, could you please help me to confirm that the expiry date for @dasm's access is on the 2023-06-30? :)
[01:00:34] <wikibugs>	 (CR) Andrea Denisse: "Hello, could you please review my patch?" [puppet] - https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591) (owner: Andrea Denisse)
[01:00:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:08:29] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Wenjun Fan - https://phabricator.wikimedia.org/T319056 (andrea.denisse) Hi @Ottomata, I just want to double check with you, Wenjun's access is ssh-less access to analytics-privatedata-users group, right? If so, to remove thei...
[01:09:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P40891 and previous config saved to /var/cache/conftool/dbconfig/20221124-010903-marostegui.json
[01:09:41] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Wenjun Fan - https://phabricator.wikimedia.org/T319056 (andrea.denisse) Hi @XenoRyet, do you approve ssh-less access to the `analytics-privatedata-users` group for Wenjun Fan?
[01:10:59] <wikibugs>	 (CR) RLazarus: [C: +1] "Mostly LGTM, see below. :) You might want to wait until your expiry_date question is answered on Phab, but feel free to edit and merge wit" [puppet] - https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591) (owner: Andrea Denisse)
[01:11:13] <rzl>	 denisse|m: ^ hope you don't mind the drive-by :)
[01:13:50] <wikibugs>	 (PS3) Andrea Denisse: admin: add dasm to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591)
[01:17:40] <wikibugs>	 (CR) Andrea Denisse: admin: add dasm to analytics-privatedata-users (3 comments) [puppet] - https://gerrit.wikimedia.org/r/860132 (https://phabricator.wikimedia.org/T322591) (owner: Andrea Denisse)
[01:18:03] <denisse|m>	 rzl: On the contrary, thank you!! :D
[01:24:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321126)', diff saved to https://phabricator.wikimedia.org/P40892 and previous config saved to /var/cache/conftool/dbconfig/20221124-012409-marostegui.json
[01:24:12] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2169.codfw.wmnet with reason: Maintenance
[01:24:14] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2169.codfw.wmnet with reason: Maintenance
[01:24:17] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[01:24:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2169:3317 (T321126)', diff saved to https://phabricator.wikimedia.org/P40893 and previous config saved to /var/cache/conftool/dbconfig/20221124-012420-marostegui.json
[01:26:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T321126)', diff saved to https://phabricator.wikimedia.org/P40894 and previous config saved to /var/cache/conftool/dbconfig/20221124-012652-marostegui.json
[01:37:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job redis_gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:41:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P40895 and previous config saved to /var/cache/conftool/dbconfig/20221124-014158-marostegui.json
[01:42:45] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job nginx in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:44:45] <wikibugs>	 (PS1) RLazarus: httpbb: Bump the timeout for meta:List_of_Wikipedias, at least for now [puppet] - https://gerrit.wikimedia.org/r/860136 (https://phabricator.wikimedia.org/T323707)
[01:49:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T323214)', diff saved to https://phabricator.wikimedia.org/P40896 and previous config saved to /var/cache/conftool/dbconfig/20221124-014908-ladsgroup.json
[01:49:15] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[01:51:29] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[01:51:35] <icinga-wm>	 RECOVERY - SSH on mw1329.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:52:27] <wikibugs>	 (CR) RLazarus: [C: +2] httpbb: Bump the timeout for meta:List_of_Wikipedias, at least for now [puppet] - https://gerrit.wikimedia.org/r/860136 (https://phabricator.wikimedia.org/T323707) (owner: RLazarus)
[01:52:45] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:55:31] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[01:57:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P40897 and previous config saved to /var/cache/conftool/dbconfig/20221124-015705-marostegui.json
[02:04:15] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P40898 and previous config saved to /var/cache/conftool/dbconfig/20221124-020415-ladsgroup.json
[02:07:45] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:12:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T321126)', diff saved to https://phabricator.wikimedia.org/P40899 and previous config saved to /var/cache/conftool/dbconfig/20221124-021211-marostegui.json
[02:12:13] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2182.codfw.wmnet with reason: Maintenance
[02:12:18] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[02:12:27] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Maintenance
[02:12:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2182 (T321126)', diff saved to https://phabricator.wikimedia.org/P40900 and previous config saved to /var/cache/conftool/dbconfig/20221124-021233-marostegui.json
[02:15:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T321126)', diff saved to https://phabricator.wikimedia.org/P40901 and previous config saved to /var/cache/conftool/dbconfig/20221124-021505-marostegui.json
[02:17:45] <jinxer-wm>	 (JobUnavailable) resolved: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:19:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P40902 and previous config saved to /var/cache/conftool/dbconfig/20221124-021921-ladsgroup.json
[02:23:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T323214)', diff saved to https://phabricator.wikimedia.org/P40903 and previous config saved to /var/cache/conftool/dbconfig/20221124-022309-ladsgroup.json
[02:23:16] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[02:30:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P40904 and previous config saved to /var/cache/conftool/dbconfig/20221124-023011-marostegui.json
[02:34:28] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1165 (T323214)', diff saved to https://phabricator.wikimedia.org/P40905 and previous config saved to /var/cache/conftool/dbconfig/20221124-023428-ladsgroup.json
[02:34:30] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[02:34:34] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[02:34:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
[02:35:01] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1168 (T323214)', diff saved to https://phabricator.wikimedia.org/P40906 and previous config saved to /var/cache/conftool/dbconfig/20221124-023500-ladsgroup.json
[02:38:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P40907 and previous config saved to /var/cache/conftool/dbconfig/20221124-023816-ladsgroup.json
[02:40:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:45:03] <jinxer-wm>	 (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:45:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P40908 and previous config saved to /var/cache/conftool/dbconfig/20221124-024518-marostegui.json
[02:53:23] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P40909 and previous config saved to /var/cache/conftool/dbconfig/20221124-025322-ladsgroup.json
[03:00:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T321126)', diff saved to https://phabricator.wikimedia.org/P40910 and previous config saved to /var/cache/conftool/dbconfig/20221124-030025-marostegui.json
[03:00:32] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[03:08:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T323214)', diff saved to https://phabricator.wikimedia.org/P40911 and previous config saved to /var/cache/conftool/dbconfig/20221124-030829-ladsgroup.json
[03:08:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
[03:08:36] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[03:08:55] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
[03:09:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2171:3316 (T323214)', diff saved to https://phabricator.wikimedia.org/P40912 and previous config saved to /var/cache/conftool/dbconfig/20221124-030901-ladsgroup.json
[03:19:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:39:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:42:17] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T323214)', diff saved to https://phabricator.wikimedia.org/P40913 and previous config saved to /var/cache/conftool/dbconfig/20221124-034217-ladsgroup.json
[03:42:23] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[03:55:28] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:57:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P40914 and previous config saved to /var/cache/conftool/dbconfig/20221124-035723-ladsgroup.json
[04:02:43] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:12:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P40915 and previous config saved to /var/cache/conftool/dbconfig/20221124-041230-ladsgroup.json
[04:27:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1168 (T323214)', diff saved to https://phabricator.wikimedia.org/P40916 and previous config saved to /var/cache/conftool/dbconfig/20221124-042736-ladsgroup.json
[04:27:38] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[04:27:43] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[04:27:51] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
[04:27:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1180 (T323214)', diff saved to https://phabricator.wikimedia.org/P40917 and previous config saved to /var/cache/conftool/dbconfig/20221124-042757-ladsgroup.json
[04:42:49] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T323214)', diff saved to https://phabricator.wikimedia.org/P40918 and previous config saved to /var/cache/conftool/dbconfig/20221124-044249-ladsgroup.json
[04:42:56] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[04:57:56] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P40919 and previous config saved to /var/cache/conftool/dbconfig/20221124-045755-ladsgroup.json
[05:13:02] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P40920 and previous config saved to /var/cache/conftool/dbconfig/20221124-051301-ladsgroup.json
[05:16:07] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[05:17:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T323214)', diff saved to https://phabricator.wikimedia.org/P40921 and previous config saved to /var/cache/conftool/dbconfig/20221124-051749-ladsgroup.json
[05:17:56] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[05:28:08] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T323214)', diff saved to https://phabricator.wikimedia.org/P40922 and previous config saved to /var/cache/conftool/dbconfig/20221124-052808-ladsgroup.json
[05:28:10] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
[05:28:15] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[05:28:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
[05:28:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2180 (T323214)', diff saved to https://phabricator.wikimedia.org/P40923 and previous config saved to /var/cache/conftool/dbconfig/20221124-052830-ladsgroup.json
[05:32:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P40924 and previous config saved to /var/cache/conftool/dbconfig/20221124-053256-ladsgroup.json
[05:48:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P40925 and previous config saved to /var/cache/conftool/dbconfig/20221124-054802-ladsgroup.json
[06:03:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1180 (T323214)', diff saved to https://phabricator.wikimedia.org/P40926 and previous config saved to /var/cache/conftool/dbconfig/20221124-060309-ladsgroup.json
[06:03:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[06:03:16] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[06:03:24] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
[06:03:31] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1187 (T323214)', diff saved to https://phabricator.wikimedia.org/P40927 and previous config saved to /var/cache/conftool/dbconfig/20221124-060330-ladsgroup.json
[06:06:15] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T323117
[06:06:21] <stashbot>	 T323117: Switchover s7 master (db1181 -> db1136) - https://phabricator.wikimedia.org/T323117
[06:06:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T323117
[06:07:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set db1136 with weight 0 T323117', diff saved to https://phabricator.wikimedia.org/P40928 and previous config saved to /var/cache/conftool/dbconfig/20221124-060742-ladsgroup.json
[06:21:24] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (Gehel) I approve!
[06:28:55] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is CRITICAL: 249 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:29:52] <wikibugs>	 (PS2) Giuseppe Lavagetto: citoid: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/859487
[06:30:59] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:50:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T323214)', diff saved to https://phabricator.wikimedia.org/P40929 and previous config saved to /var/cache/conftool/dbconfig/20221124-065057-ladsgroup.json
[06:51:04] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[06:52:22] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] citoid: convert to modules [deployment-charts] - https://gerrit.wikimedia.org/r/859487 (owner: Giuseppe Lavagetto)
[06:56:12] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2118.codfw.wmnet with reason: Maintenance
[06:56:14] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2118.codfw.wmnet with reason: Maintenance
[06:56:47] <wikibugs>	 (03Merged) 10jenkins-bot: citoid: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859487 (owner: 10Giuseppe Lavagetto)
[06:58:16] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: db2174 lost power - https://phabricator.wikimedia.org/T323512 (10Marostegui) Given that it is a public holiday in the US and Papaul won't be onsite till Monday, I am starting replication so the host doesn't get behind that many days. I will stop it again on Monday.
[06:59:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T323214)', diff saved to https://phabricator.wikimedia.org/P40930 and previous config saved to /var/cache/conftool/dbconfig/20221124-065956-ladsgroup.json
[07:00:05] <jouncebot>	 kormat, marostegui, and Amir1: (Dis)respected human, time to deploy Primary database switchover (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T0700). Please do the needful.
[07:00:06] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[07:00:14] <Amir1>	 moin
[07:00:17] <Amir1>	 starting
[07:01:11] <wikibugs>	 (03PS2) 10Ladsgroup: mariadb: Promote db1136 to s7 master [puppet] - 10https://gerrit.wikimedia.org/r/856498 (https://phabricator.wikimedia.org/T323117) (owner: 10Gerrit maintenance bot)
[07:01:15] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] mariadb: Promote db1136 to s7 master [puppet] - 10https://gerrit.wikimedia.org/r/856498 (https://phabricator.wikimedia.org/T323117) (owner: 10Gerrit maintenance bot)
[07:01:18] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] mariadb: Promote db1136 to s7 master [puppet] - 10https://gerrit.wikimedia.org/r/856498 (https://phabricator.wikimedia.org/T323117) (owner: 10Gerrit maintenance bot)
[07:02:04] <Amir1>	 !log Starting s7 eqiad failover from db1181 to db1136 - T323117
[07:02:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:10] <stashbot>	 T323117: Switchover s7 master (db1181 -> db1136) - https://phabricator.wikimedia.org/T323117
[07:02:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T323117', diff saved to https://phabricator.wikimedia.org/P40931 and previous config saved to /var/cache/conftool/dbconfig/20221124-070215-ladsgroup.json
[07:02:32] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:02:35] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:02:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Promote db1136 to s7 primary and set section read-write T323117', diff saved to https://phabricator.wikimedia.org/P40932 and previous config saved to /var/cache/conftool/dbconfig/20221124-070250-ladsgroup.json
[07:02:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[07:03:17] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[07:03:20] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[07:04:17] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[07:04:26] <wikibugs>	 (03PS2) 10Ladsgroup: wmnet: Update s7-master alias [dns] - 10https://gerrit.wikimedia.org/r/856499 (https://phabricator.wikimedia.org/T323117) (owner: 10Gerrit maintenance bot)
[07:04:31] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1099.eqiad.wmnet with reason: Maintenance
[07:04:35] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] wmnet: Update s7-master alias [dns] - 10https://gerrit.wikimedia.org/r/856499 (https://phabricator.wikimedia.org/T323117) (owner: 10Gerrit maintenance bot)
[07:04:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1099:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P40933 and previous config saved to /var/cache/conftool/dbconfig/20221124-070437-marostegui.json
[07:04:43] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[07:05:00] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/citoid: apply
[07:05:16] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[07:05:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db1181 T323117', diff saved to https://phabricator.wikimedia.org/P40934 and previous config saved to /var/cache/conftool/dbconfig/20221124-070546-ladsgroup.json
[07:05:47] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[07:06:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P40935 and previous config saved to /var/cache/conftool/dbconfig/20221124-070603-ladsgroup.json
[07:06:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P40936 and previous config saved to /var/cache/conftool/dbconfig/20221124-070645-marostegui.json
[07:06:48] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:06:50] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:07:45] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[07:07:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[07:08:08] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[07:09:21] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/citoid: apply
[07:09:46] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[07:12:55] <icinga-wm>	 RECOVERY - swift eqiad object availability low on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=8&fullscreen&orgId=1&var-DC=eqiad
[07:14:44] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[07:14:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
[07:15:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P40938 and previous config saved to /var/cache/conftool/dbconfig/20221124-071504-ladsgroup.json
[07:15:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:15:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[07:21:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P40939 and previous config saved to /var/cache/conftool/dbconfig/20221124-072110-ladsgroup.json
[07:21:46] <wikibugs>	 (03PS2) 10Stang: wikidatawiki: Add language-specific logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/860117 (https://phabricator.wikimedia.org/T323734)
[07:21:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P40940 and previous config saved to /var/cache/conftool/dbconfig/20221124-072152-marostegui.json
[07:28:36] <wikibugs>	 (03PS1) 10Volans: ulsfo mgmt: remove missing netbox include [dns] - 10https://gerrit.wikimedia.org/r/860474
[07:30:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P40941 and previous config saved to /var/cache/conftool/dbconfig/20221124-073011-ladsgroup.json
[07:30:35] <logmsgbot>	 !log phedenskog@deploy1002 Started deploy [performance/navtiming@e421904]: (no justification provided)
[07:30:40] <wikibugs>	 (03CR) 10Volans: [C: 03+2] ulsfo mgmt: remove missing netbox include [dns] - 10https://gerrit.wikimedia.org/r/860474 (owner: 10Volans)
[07:30:44] <logmsgbot>	 !log phedenskog@deploy1002 Finished deploy [performance/navtiming@e421904]: (no justification provided) (duration: 00m 08s)
[07:35:30] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: cxserver: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859488
[07:36:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1187 (T323214)', diff saved to https://phabricator.wikimedia.org/P40942 and previous config saved to /var/cache/conftool/dbconfig/20221124-073616-ladsgroup.json
[07:36:18] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
[07:36:23] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[07:36:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
[07:36:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1201 (T323214)', diff saved to https://phabricator.wikimedia.org/P40943 and previous config saved to /var/cache/conftool/dbconfig/20221124-073637-ladsgroup.json
[07:36:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P40944 and previous config saved to /var/cache/conftool/dbconfig/20221124-073658-marostegui.json
[07:41:47] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[07:42:59] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[07:45:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2180 (T323214)', diff saved to https://phabricator.wikimedia.org/P40945 and previous config saved to /var/cache/conftool/dbconfig/20221124-074517-ladsgroup.json
[07:45:24] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[07:52:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P40946 and previous config saved to /var/cache/conftool/dbconfig/20221124-075205-marostegui.json
[07:52:07] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[07:52:12] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[07:52:20] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[07:52:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P40947 and previous config saved to /var/cache/conftool/dbconfig/20221124-075226-marostegui.json
[07:54:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P40948 and previous config saved to /var/cache/conftool/dbconfig/20221124-075434-marostegui.json
[07:57:02] <wikibugs>	 (03PS1) 10Marostegui: control-mariadb-client-10.5: Delete file [software] - 10https://gerrit.wikimedia.org/r/860477
[07:57:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] control-mariadb-client-10.5: Delete file [software] - 10https://gerrit.wikimedia.org/r/860477 (owner: 10Marostegui)
[07:58:20] <wikibugs>	 (03Merged) 10jenkins-bot: control-mariadb-client-10.5: Delete file [software] - 10https://gerrit.wikimedia.org/r/860477 (owner: 10Marostegui)
[08:00:05] <jouncebot>	 Amir1, apergos, and jnuche: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport and config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T0800).
[08:00:16] <apergos>	 morning! there are no trainees signed up this morning and no patches scheduled for deployment in the window.
[08:00:38] <apergos>	 and this means.... you guessed it... see you next time!  and have a happy holiday, folks in the U.S.
[08:04:43] <moritzm>	 !log rebalance Ganeti group A/codfw following reboots
[08:04:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P40949 and previous config saved to /var/cache/conftool/dbconfig/20221124-080941-marostegui.json
[08:13:48] <moritzm>	 !log installing tomcat9 security updates
[08:13:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:38] <wikibugs>	 (03PS24) 10Jelto: sre.gitlab.upgrade: add cookbook to upgrade GitLab version [cookbooks] - 10https://gerrit.wikimedia.org/r/858999 (https://phabricator.wikimedia.org/T323569)
[08:24:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P40950 and previous config saved to /var/cache/conftool/dbconfig/20221124-082447-marostegui.json
[08:24:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T323214)', diff saved to https://phabricator.wikimedia.org/P40951 and previous config saved to /var/cache/conftool/dbconfig/20221124-082458-ladsgroup.json
[08:25:04] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[08:26:59] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] sre.gitlab.upgrade: add cookbook to upgrade GitLab version [cookbooks] - 10https://gerrit.wikimedia.org/r/858999 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[08:30:31] <wikibugs>	 (03PS25) 10Jelto: sre.gitlab.upgrade: add cookbook to upgrade GitLab version [cookbooks] - 10https://gerrit.wikimedia.org/r/858999 (https://phabricator.wikimedia.org/T323569)
[08:34:42] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] sre.gitlab.upgrade: add cookbook to upgrade GitLab version [cookbooks] - 10https://gerrit.wikimedia.org/r/858999 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[08:39:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P40952 and previous config saved to /var/cache/conftool/dbconfig/20221124-083954-marostegui.json
[08:39:56] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1104.eqiad.wmnet with reason: Maintenance
[08:40:01] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[08:40:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P40953 and previous config saved to /var/cache/conftool/dbconfig/20221124-084004-ladsgroup.json
[08:40:09] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1104.eqiad.wmnet with reason: Maintenance
[08:40:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1104 (T321126)', diff saved to https://phabricator.wikimedia.org/P40954 and previous config saved to /var/cache/conftool/dbconfig/20221124-084015-marostegui.json
[08:41:24] <Kizule>	 Hello, I'm unsure where I should ask this, so feel free to correct me. :)
[08:41:33] <Kizule>	 I'm unable to extract the archive file from https://dumps.wikimedia.org/wikidatawiki/20221120/
[08:41:41] <Kizule>	 ubuntu@ip-172-31-41-196:~$ tar xf wikidatawiki-20221120-pages-articles.xml.bz2
[08:41:41] <Kizule>	 tar (child): lbzip2: Cannot exec: No such file or directory
[08:41:42] <Kizule>	 tar (child): Error is not recoverable: exiting now
[08:41:42] <Kizule>	 tar: Child returned status 2
[08:41:43] <Kizule>	 tar: Error is not recoverable: exiting now
[08:42:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104 (T321126)', diff saved to https://phabricator.wikimedia.org/P40955 and previous config saved to /var/cache/conftool/dbconfig/20221124-084223-marostegui.json
[08:46:07] <Kizule>	 Never mind, got it. This solved the issue for me: https://svennd.be/lbzip2-cannot-exec-no-such-file-or-directory/
[08:55:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P40956 and previous config saved to /var/cache/conftool/dbconfig/20221124-085511-ladsgroup.json
[08:57:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P40957 and previous config saved to /var/cache/conftool/dbconfig/20221124-085729-marostegui.json
[09:03:36] <wikibugs>	 (03CR) 10Jelto: "This change is ready for review." (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/858999 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[09:10:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1201 (T323214)', diff saved to https://phabricator.wikimedia.org/P40958 and previous config saved to /var/cache/conftool/dbconfig/20221124-091017-ladsgroup.json
[09:10:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[09:10:25] <stashbot>	 T323214: Fix unsigned drifts in flaggedrevs caused by 4c0b3c7b9b0 - https://phabricator.wikimedia.org/T323214
[09:10:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[09:11:21] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] cxserver: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859488 (owner: 10Giuseppe Lavagetto)
[09:12:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P40959 and previous config saved to /var/cache/conftool/dbconfig/20221124-091236-marostegui.json
[09:15:54] <wikibugs>	 (03Merged) 10jenkins-bot: cxserver: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859488 (owner: 10Giuseppe Lavagetto)
[09:17:16] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] Remove graphite2003 [puppet] - 10https://gerrit.wikimedia.org/r/860071 (https://phabricator.wikimedia.org/T323718) (owner: 10Filippo Giunchedi)
[09:17:21] <wikibugs>	 (03PS2) 10Filippo Giunchedi: Remove graphite2003 [puppet] - 10https://gerrit.wikimedia.org/r/860071 (https://phabricator.wikimedia.org/T323718)
[09:20:30] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/cxserver: apply
[09:22:04] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[09:23:36] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[09:24:55] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[09:26:04] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[09:26:45] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[09:27:14] <wikibugs>	 (03PS1) 10Slyngshede: Allow configuration from json file. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508
[09:27:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1104 (T321126)', diff saved to https://phabricator.wikimedia.org/P40960 and previous config saved to /var/cache/conftool/dbconfig/20221124-092742-marostegui.json
[09:27:44] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
[09:27:49] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[09:27:58] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1111.eqiad.wmnet with reason: Maintenance
[09:28:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1111 (T321126)', diff saved to https://phabricator.wikimedia.org/P40961 and previous config saved to /var/cache/conftool/dbconfig/20221124-092804-marostegui.json
[09:28:24] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: datahub: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859489
[09:28:32] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: developer-portal: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859490
[09:29:11] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] datahub: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859489 (owner: 10Giuseppe Lavagetto)
[09:29:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111 (T321126)', diff saved to https://phabricator.wikimedia.org/P40962 and previous config saved to /var/cache/conftool/dbconfig/20221124-092912-marostegui.json
[09:29:17] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] developer-portal: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859490 (owner: 10Giuseppe Lavagetto)
[09:30:24] <wikibugs>	 (03PS2) 10Jaime Nuche: k8s builder: allow deployers to sudo update-mediawiki-tools-release [puppet] - 10https://gerrit.wikimedia.org/r/860121 (https://phabricator.wikimedia.org/T323735) (owner: 10Brennen Bearnes)
[09:33:33] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.hosts.decommission for hosts graphite2003.codfw.wmnet
[09:35:35] <wikibugs>	 (03CR) 10Jaime Nuche: "Thanks Brennen. I've moved the sudo privilege to the K8s builder profile, which is where we were already granting some of these permission" [puppet] - 10https://gerrit.wikimedia.org/r/860121 (https://phabricator.wikimedia.org/T323735) (owner: 10Brennen Bearnes)
[09:38:17] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.dns.netbox
[09:40:25] <logmsgbot>	 !log filippo@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
[09:41:11] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: datahub: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859489
[09:41:13] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: developer-portal: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859490
[09:41:54] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
[09:41:54] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:41:55] <logmsgbot>	 !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts graphite2003.codfw.wmnet
[09:42:10] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "Nice addition! Some issues, questions and comments inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/858999 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[09:42:31] <wikibugs>	 10ops-codfw, 10decommission-hardware, 10User-fgiunchedi: decommission graphite2003.codfw.wmnet - https://phabricator.wikimedia.org/T323718 (10fgiunchedi)
[09:42:48] <wikibugs>	 10ops-codfw, 10decommission-hardware, 10User-fgiunchedi: decommission graphite2003.codfw.wmnet - https://phabricator.wikimedia.org/T323718 (10fgiunchedi) @Papaul host is ready for decom
[09:44:04] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: mathoid: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860509
[09:44:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P40963 and previous config saved to /var/cache/conftool/dbconfig/20221124-094418-marostegui.json
[09:44:52] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: miscweb: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860510
[09:45:37] <wikibugs>	 (03PS2) 10Slyngshede: Allow configuration from json file. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508
[09:46:44] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: recommendation-api: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860511
[09:47:25] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: eventstreams: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860512
[09:48:06] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: wikifeeds: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860513
[09:48:38] <wikibugs>	 (03CR) 10Jaime Nuche: "PCC compilation looks healthy: https://puppet-compiler.wmflabs.org/output/860121/38418/deploy1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/860121 (https://phabricator.wikimedia.org/T323735) (owner: 10Brennen Bearnes)
[09:50:39] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] k8s builder: allow deployers to sudo update-mediawiki-tools-release [puppet] - 10https://gerrit.wikimedia.org/r/860121 (https://phabricator.wikimedia.org/T323735) (owner: 10Brennen Bearnes)
[09:52:45] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Data-Engineering: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (10BTullis) a:03BTullis
[09:54:46] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.dns.netbox
[09:57:51] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removed AAAA entry for clouddb1013 - dcaro@cumin1001"
[09:58:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:58:54] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[09:59:11] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removed AAAA entry for clouddb1013 - dcaro@cumin1001"
[09:59:11] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:59:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P40964 and previous config saved to /var/cache/conftool/dbconfig/20221124-095925-marostegui.json
[10:03:37] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: wikifeeds: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860515
[10:04:36] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: kask: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860515
[10:05:41] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: changeprop: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860517
[10:06:00] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: eventgate: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860518
[10:07:14] <wikibugs>	 (03CR) 10Muehlenhoff: Fix typing to allow Python 3.7 support. (031 comment) [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/858457 (owner: 10Slyngshede)
[10:07:56] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: linkrecommendation: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860519
[10:08:14] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: mobileapps: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860520
[10:09:11] <wikibugs>	 (03PS1) 10Filippo Giunchedi: graphite: mirror traffic to graphite1005 [puppet] - 10https://gerrit.wikimedia.org/r/860521 (https://phabricator.wikimedia.org/T318903)
[10:09:13] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: pool graphite1005 for reads [puppet] - 10https://gerrit.wikimedia.org/r/860522 (https://phabricator.wikimedia.org/T318903)
[10:14:27] <wikibugs>	 (03CR) 10Muehlenhoff: Allow configuration from json file. (032 comments) [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508 (owner: 10Slyngshede)
[10:14:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1111 (T321126)', diff saved to https://phabricator.wikimedia.org/P40965 and previous config saved to /var/cache/conftool/dbconfig/20221124-101431-marostegui.json
[10:14:33] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
[10:14:38] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[10:14:46] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1114.eqiad.wmnet with reason: Maintenance
[10:14:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1114 (T321126)', diff saved to https://phabricator.wikimedia.org/P40966 and previous config saved to /var/cache/conftool/dbconfig/20221124-101452-marostegui.json
[10:16:43] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.dns.netbox
[10:16:53] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] datahub: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859489 (owner: 10Giuseppe Lavagetto)
[10:17:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114 (T321126)', diff saved to https://phabricator.wikimedia.org/P40967 and previous config saved to /var/cache/conftool/dbconfig/20221124-101701-marostegui.json
[10:17:55] <wikibugs>	 (03PS1) 10Btullis: Add dcausse and gmodena to analytics-admins [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280)
[10:19:13] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[10:19:24] <logmsgbot>	 !log dcaro@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removed AAAA entry for all clouddbs - dcaro@cumin1001"
[10:20:45] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removed AAAA entry for all clouddbs - dcaro@cumin1001"
[10:20:45] <logmsgbot>	 !log dcaro@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:21:27] <wikibugs>	 (03Merged) 10jenkins-bot: datahub: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859489 (owner: 10Giuseppe Lavagetto)
[10:23:49] <logmsgbot>	 !log cmooney@cumin1001 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[10:23:52] <logmsgbot>	 !log cmooney@cumin1001 START - Cookbook sre.dns.netbox
[10:24:16] <wikibugs>	 (03PS2) 10Btullis: Add dcausse and gmodena to analytics-admins [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280)
[10:25:12] <logmsgbot>	 !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:26:05] <icinga-wm>	 RECOVERY - Ensure mysql credential creation for tools users is running on labstore1004 is OK: OK - maintain-dbusers is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[10:27:16] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38419/console" [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280) (owner: 10Btullis)
[10:29:18] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38420/console" [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280) (owner: 10Btullis)
[10:32:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P40968 and previous config saved to /var/cache/conftool/dbconfig/20221124-103207-marostegui.json
[10:32:10] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] developer-portal: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859490 (owner: 10Giuseppe Lavagetto)
[10:32:47] <wikibugs>	 (03PS1) 10Filippo Giunchedi: dcops: switch mgmt down alerts to open tasks [alerts] - 10https://gerrit.wikimedia.org/r/860525 (https://phabricator.wikimedia.org/T310266)
[10:33:00] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] hadoop: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/840144 (https://phabricator.wikimedia.org/T308013) (owner: 10Muehlenhoff)
[10:37:05] <wikibugs>	 (03Merged) 10jenkins-bot: developer-portal: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/859490 (owner: 10Giuseppe Lavagetto)
[10:41:30] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] spicerack: add monitoring for sre.puppet.netbox-sync (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860019 (owner: 10Jbond)
[10:41:54] <akosiaris>	 !log reboot rdb1010, rdb1012, rdb2008, rdb2010 for kernel upgrades. All are redis replicas, there should be no impact.
[10:41:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:33] <icinga-wm>	 PROBLEM - Host rdb2010 is DOWN: PING CRITICAL - Packet loss = 100%
[10:44:53] <icinga-wm>	 RECOVERY - Host rdb2010 is UP: PING OK - Packet loss = 0%, RTA = 33.10 ms
[10:47:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P40969 and previous config saved to /var/cache/conftool/dbconfig/20221124-104714-marostegui.json
[10:51:33] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: datahub: convert subcharts to modules too [deployment-charts] - 10https://gerrit.wikimedia.org/r/860530
[10:51:39] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "Minor question, but overall LGTM" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552) (owner: 10Clément Goubert)
[10:52:53] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] datahub: convert subcharts to modules too [deployment-charts] - 10https://gerrit.wikimedia.org/r/860530 (owner: 10Giuseppe Lavagetto)
[10:57:09] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: datahub: convert subcharts to modules too [deployment-charts] - 10https://gerrit.wikimedia.org/r/860530
[10:57:13] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] dumps/distribution: add more data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/852260 (owner: 10Dzahn)
[11:00:04] <jouncebot>	 mvolz: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Citoid / Zotero . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1100).
[11:02:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1114 (T321126)', diff saved to https://phabricator.wikimedia.org/P40970 and previous config saved to /var/cache/conftool/dbconfig/20221124-110220-marostegui.json
[11:02:22] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[11:02:28] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[11:02:36] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1116.eqiad.wmnet with reason: Maintenance
[11:02:38] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[11:02:39] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280) (owner: 10Btullis)
[11:02:52] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1126.eqiad.wmnet with reason: Maintenance
[11:02:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1126 (T321126)', diff saved to https://phabricator.wikimedia.org/P40971 and previous config saved to /var/cache/conftool/dbconfig/20221124-110258-marostegui.json
[11:04:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T321126)', diff saved to https://phabricator.wikimedia.org/P40972 and previous config saved to /var/cache/conftool/dbconfig/20221124-110405-marostegui.json
[11:05:00] <wikibugs>	 (03PS1) 10Jbond: Revert "pki: move root common settings to profile" [puppet] - 10https://gerrit.wikimedia.org/r/860488
[11:05:32] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] datahub: convert subcharts to modules too [deployment-charts] - 10https://gerrit.wikimedia.org/r/860530 (owner: 10Giuseppe Lavagetto)
[11:05:57] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Revert "pki: move root common settings to profile" [puppet] - 10https://gerrit.wikimedia.org/r/860488 (owner: 10Jbond)
[11:06:20] <wikibugs>	 (03CR) 10Btullis: [V: 03+1] "For future reference: the linked ticket was created by @ottomata and so his approval was inferred from this fact. https://phabricator.wiki" [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280) (owner: 10Btullis)
[11:06:26] <wikibugs>	 (03CR) 10Btullis: [V: 03+1 C: 03+2] Add dcausse and gmodena to analytics-admins [puppet] - 10https://gerrit.wikimedia.org/r/860523 (https://phabricator.wikimedia.org/T323280) (owner: 10Btullis)
[11:07:25] <wikibugs>	 (03PS2) 10Jbond: Revert "pki: move root common settings to profile" [puppet] - 10https://gerrit.wikimedia.org/r/860488
[11:07:42] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] Revert "pki: move root common settings to profile" [puppet] - 10https://gerrit.wikimedia.org/r/860488 (owner: 10Jbond)
[11:10:12] <wikibugs>	 (03Merged) 10jenkins-bot: datahub: convert subcharts to modules too [deployment-charts] - 10https://gerrit.wikimedia.org/r/860530 (owner: 10Giuseppe Lavagetto)
[11:10:40] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] systemd::timer::job: update documentation and fix minor lint issues [puppet] - 10https://gerrit.wikimedia.org/r/860074 (owner: 10Jbond)
[11:10:43] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] systemd::timer::job: add monitoring_url to unit file [puppet] - 10https://gerrit.wikimedia.org/r/860075 (owner: 10Jbond)
[11:11:04] <wikibugs>	 (03PS4) 10Jbond: spicerack: add monitoring for sre.puppet.netbox-sync [puppet] - 10https://gerrit.wikimedia.org/r/860019
[11:14:06] <wikibugs>	 10SRE, 10SRE-Access-Requests, 10Data-Engineering, 10Patch-For-Review: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (10BTullis) 05Open→03Resolved @dcausse, @gmodena - Welcome to the `analytics-admins` group!  Please take suitable care with your...
[11:16:19] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:18:09] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[11:18:15] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[11:19:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P40973 and previous config saved to /var/cache/conftool/dbconfig/20221124-111912-marostegui.json
[11:22:36] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[11:25:37] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
[11:28:50] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:28:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag  - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:31:18] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[11:31:25] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[11:31:39] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[11:31:44] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[11:33:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag  - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:34:11] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: datahub: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/860534
[11:34:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P40974 and previous config saved to /var/cache/conftool/dbconfig/20221124-113418-marostegui.json
[11:34:28] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] datahub: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/860534 (owner: 10Giuseppe Lavagetto)
[11:39:13] <wikibugs>	 (03Merged) 10jenkins-bot: datahub: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/860534 (owner: 10Giuseppe Lavagetto)
[11:39:33] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:40:04] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:43:06] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[11:43:12] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[11:44:37] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[11:45:38] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[11:46:35] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/datahub: apply on main
[11:48:12] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
[11:49:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1126 (T321126)', diff saved to https://phabricator.wikimedia.org/P40976 and previous config saved to /var/cache/conftool/dbconfig/20221124-114925-marostegui.json
[11:49:27] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[11:49:32] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[11:49:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[11:49:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[11:49:58] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[11:50:02] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[11:50:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1167 (T321126)', diff saved to https://phabricator.wikimedia.org/P40977 and previous config saved to /var/cache/conftool/dbconfig/20221124-115004-marostegui.json
[11:51:04] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/datahub: apply on main
[11:52:41] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: cloudvirt1044: move to modern NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/860535 (https://phabricator.wikimedia.org/T319184)
[11:52:44] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
[11:56:01] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10jbond) >>! In T308677#8417608, @MoritzMuehlenhoff wrote: >>>! In T308677#8346...
[11:57:58] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] cloudvirt1044: move to modern NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/860535 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[11:58:15] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10jbond)  > This seems to be a more generic issue with partman creating the sow...
[11:59:02] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
[11:59:12] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudvirt1044.eqiad.wmnet with O...
[11:59:26] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudvirt1044: move to modern NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/860535 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[12:05:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T321126)', diff saved to https://phabricator.wikimedia.org/P40978 and previous config saved to /var/cache/conftool/dbconfig/20221124-120514-marostegui.json
[12:05:21] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[12:07:57] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[12:08:04] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[12:08:35] * jbond overly optimistic this is the one that will work 
[12:09:13] * volans crossing fingers ;)
[12:12:24] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
[12:12:48] * jbond thanks vol.ans 
[12:13:16] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10aborrero)
[12:15:16] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
[12:17:34] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[12:17:40] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[12:18:20] * jbond :( naughty d-i why have you decided you no longer have drivers for the controller :@ !
[12:18:27] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[12:18:33] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[12:18:49] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:20:13] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[12:20:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P40979 and previous config saved to /var/cache/conftool/dbconfig/20221124-122020-marostegui.json
[12:22:52] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on idp-test1002.wikimedia.org with reason: Testing some changes, service will be down from time to time
[12:23:08] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on idp-test1002.wikimedia.org with reason: Testing some changes, service will be down from time to time
[12:24:04] <wikibugs>	 (03PS3) 10Slyngshede: Allow configuration from json file. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508
[12:24:13] <wikibugs>	 (03CR) 10Slyngshede: Allow configuration from json file. (032 comments) [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508 (owner: 10Slyngshede)
[12:25:16] <wikibugs>	 (03PS4) 10Slyngshede: Allow configuration from json file. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508
[12:30:55] <wikibugs>	 (03PS5) 10Slyngshede: Allow configuration from json file. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508
[12:30:59] <wikibugs>	 (03PS2) 10Volans: tox.ini: drop support for python3.7/3.8 [cookbooks] - 10https://gerrit.wikimedia.org/r/850038 (owner: 10Jbond)
[12:32:25] <wikibugs>	 (03CR) 10Volans: [C: 03+2] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/850038 (owner: 10Jbond)
[12:33:46] <wikibugs>	 (03CR) 10Muehlenhoff: Enable profile::auto_restarts::service for virtlogd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/859980 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[12:35:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P40980 and previous config saved to /var/cache/conftool/dbconfig/20221124-123527-marostegui.json
[12:37:22] <wikibugs>	 (03PS2) 10Slyngshede: Fix typing to allow Python 3.7 support. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/858457
[12:37:52] <wikibugs>	 (03CR) 10Slyngshede: Fix typing to allow Python 3.7 support. (031 comment) [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/858457 (owner: 10Slyngshede)
[12:38:04] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bullseye
[12:38:14] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1044.eqiad.wmnet with OS bu...
[12:42:10] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[12:42:16] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[12:42:50] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[12:42:56] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[12:46:06] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:47:26] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/858457 (owner: 10Slyngshede)
[12:47:59] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+2] Fix typing to allow Python 3.7 support. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/858457 (owner: 10Slyngshede)
[12:48:01] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+2 C: 03+2] Fix typing to allow Python 3.7 support. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/858457 (owner: 10Slyngshede)
[12:48:31] <wikibugs>	 (03PS6) 10Slyngshede: Allow configuration from json file. [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508
[12:50:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T321126)', diff saved to https://phabricator.wikimedia.org/P40981 and previous config saved to /var/cache/conftool/dbconfig/20221124-125033-marostegui.json
[12:50:35] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[12:50:41] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[12:50:49] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[12:50:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[12:51:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[12:51:07] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+1] "lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/860517 (owner: 10Giuseppe Lavagetto)
[12:51:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1172 (T321126)', diff saved to https://phabricator.wikimedia.org/P40982 and previous config saved to /var/cache/conftool/dbconfig/20221124-125111-marostegui.json
[12:52:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T321126)', diff saved to https://phabricator.wikimedia.org/P40983 and previous config saved to /var/cache/conftool/dbconfig/20221124-125218-marostegui.json
[12:56:40] <wikibugs>	 (03PS11) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114
[12:57:01] <wikibugs>	 (03PS1) 10Muehlenhoff: Migrate service definitions to CasRegisteredService [puppet] - 10https://gerrit.wikimedia.org/r/860551 (https://phabricator.wikimedia.org/T311235)
[12:57:28] <wikibugs>	 (03PS2) 10Jbond: install_server: fix config for ms-be dynamic partition [puppet] - 10https://gerrit.wikimedia.org/r/860114
[13:01:14] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[13:02:48] <moritzm>	 !log installing glibc security updates on buster
[13:02:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:04:06] <wikibugs>	 (03PS4) 10Stang: zhwiki: Revert 20 years logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/858709 (https://phabricator.wikimedia.org/T320859)
[13:04:18] <jnuche>	 jouncebot: next
[13:04:18] <jouncebot>	 In 0 hour(s) and 55 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1400)
[13:04:18] <jouncebot>	 In 0 hour(s) and 55 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1400)
[13:04:45] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[13:07:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P40984 and previous config saved to /var/cache/conftool/dbconfig/20221124-130725-marostegui.json
[13:07:29] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] Enable profile::auto_restarts::service for virtlogd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/859980 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[13:09:08] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
[13:09:35] <jnuche>	 Lucas_WMDE, urbanecm: I see there's a couple of patches in the upcoming backport window, backports yesterday were affected by https://phabricator.wikimedia.org/T323735
[13:09:54] <jnuche>	 the problem should be fixed now, but I'll be around in case it happens again
[13:10:16] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
[13:11:41] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
[13:12:50] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
[13:13:48] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[13:16:03] <wikibugs>	 (03PS1) 10Muehlenhoff: sre.misc-clusters.roll-restart-reboot-eventschemas: Also restart envoyproxy [cookbooks] - 10https://gerrit.wikimedia.org/r/860556
[13:16:58] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on an-worker1090 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Btullis T318659 - Added more downtime, but replacement batteries are on their way https://wikitech.wikimedia.org/wiki/MegaCli%23
[13:16:58] <icinga-wm>	 ng
[13:18:16] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: proxy: resolve home directory in the puppet ca path [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860557
[13:18:47] <wikibugs>	 (03CR) 10Jbond: "lgtm, will ping the task when cloud is updated.  We could possibly also use the cas_version fact if that ends up taking longer for some re" [puppet] - 10https://gerrit.wikimedia.org/r/860551 (https://phabricator.wikimedia.org/T311235) (owner: 10Muehlenhoff)
[13:20:31] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] "LGTM" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860557 (owner: 10Arturo Borrero Gonzalez)
[13:21:16] <wikibugs>	 (03CR) 10Muehlenhoff: Migrate service definitions to CasRegisteredService (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/860551 (https://phabricator.wikimedia.org/T311235) (owner: 10Muehlenhoff)
[13:22:18] <wikibugs>	 (03CR) 10FNegri: [C: 03+1] wmcs: proxy: resolve home directory in the puppet ca path [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860557 (owner: 10Arturo Borrero Gonzalez)
[13:22:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P40985 and previous config saved to /var/cache/conftool/dbconfig/20221124-132231-marostegui.json
[13:22:35] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2050.codfw.wmnet with OS bullseye
[13:22:41] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[13:22:46] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] wmcs: proxy: resolve home directory in the puppet ca path [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860557 (owner: 10Arturo Borrero Gonzalez)
[13:22:50] <wikibugs>	 (03PS5) 10Stang: zhwiki: Revert 20 years logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/858709 (https://phabricator.wikimedia.org/T320859)
[13:26:25] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs: proxy: resolve home directory in the puppet ca path [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860557 (owner: 10Arturo Borrero Gonzalez)
[13:28:33] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for Envoy on planet [puppet] - 10https://gerrit.wikimedia.org/r/860560 (https://phabricator.wikimedia.org/T135991)
[13:30:20] <moritzm>	 !log restarting slapd on serpens/seaborgium
[13:30:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:47] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[13:30:54] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[13:37:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T321126)', diff saved to https://phabricator.wikimedia.org/P40986 and previous config saved to /var/cache/conftool/dbconfig/20221124-133738-marostegui.json
[13:37:40] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[13:37:44] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[13:37:48] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[13:37:51] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[13:37:53] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[13:38:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1177 (T321126)', diff saved to https://phabricator.wikimedia.org/P40987 and previous config saved to /var/cache/conftool/dbconfig/20221124-133759-marostegui.json
[13:38:43] <logmsgbot>	 !log btullis@cumin1001 Added views for new wiki: igwiktionary T314645
[13:38:43] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[13:38:49] <stashbot>	 T314645: Prepare and check storage layer for igwiktionary - https://phabricator.wikimedia.org/T314645
[13:39:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T321126)', diff saved to https://phabricator.wikimedia.org/P40988 and previous config saved to /var/cache/conftool/dbconfig/20221124-133907-marostegui.json
[13:39:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Enable profile::auto_restarts::service for virtlogd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/859980 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[13:43:45] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[13:43:48] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: proxy: use port [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860563
[13:43:52] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[13:52:41] <wikibugs>	 (03PS1) 10Muehlenhoff: Add a cookbook to restart/reboot ncredir nodes [cookbooks] - 10https://gerrit.wikimedia.org/r/860564
[13:53:51] <btullis>	 !log Removed unused and expiring kafka_jumbo certificates. T323697
[13:53:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:58] <stashbot>	 T323697: Update kafka-jumbo certificates - https://phabricator.wikimedia.org/T323697
[13:54:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P40989 and previous config saved to /var/cache/conftool/dbconfig/20221124-135413-marostegui.json
[13:55:23] <wikibugs>	 (03PS2) 10Filippo Giunchedi: dcops: switch mgmt down alerts to open tasks [alerts] - 10https://gerrit.wikimedia.org/r/860525 (https://phabricator.wikimedia.org/T310266)
[13:59:19] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/developer-portal: apply
[13:59:45] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/developer-portal: apply
[14:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1400)
[14:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, and awight: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1400).
[14:00:05] <jouncebot>	 cirno: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:39] <cirno>	 o/
[14:03:38] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] changeprop: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860517 (owner: 10Giuseppe Lavagetto)
[14:05:15] <wikibugs>	 (03CR) 10Clément Goubert: Add a new production image for otelcol (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552) (owner: 10Clément Goubert)
[14:06:00] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] kask: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860515 (owner: 10Giuseppe Lavagetto)
[14:08:09] <wikibugs>	 (03Merged) 10jenkins-bot: changeprop: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860517 (owner: 10Giuseppe Lavagetto)
[14:09:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P40990 and previous config saved to /var/cache/conftool/dbconfig/20221124-140920-marostegui.json
[14:10:11] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] install_server: fix config for ms-be dynamic partition [puppet] - 10https://gerrit.wikimedia.org/r/860114 (owner: 10Jbond)
[14:10:33] <wikibugs>	 (03Merged) 10jenkins-bot: kask: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860515 (owner: 10Giuseppe Lavagetto)
[14:10:45] <jbond>	 moritzm: ok to merge yours
[14:11:17] <moritzm>	 sorry, yes please go ahead
[14:11:31] <jbond>	 done
[14:11:53] <logmsgbot>	 !log jbond@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2050.codfw.wmnet with OS bullseye
[14:11:59] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[14:13:14] <Lucas_WMDE>	 o/
[14:13:23] <Lucas_WMDE>	 looks like nobody’s doing the backport window yet?
[14:13:32] <Lucas_WMDE>	 in which case I can deploy
[14:13:46] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2050.codfw.wmnet with OS bullseye
[14:13:53] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jbond...
[14:14:06] <Lucas_WMDE>	 cirno: new nickname? ^^
[14:14:21] <cirno>	 :P
[14:15:10] <wikibugs>	 (03PS1) 10Slyngshede: WIP C:ldap::client::utils Rewrite add-ldap-group [puppet] - 10https://gerrit.wikimedia.org/r/860568
[14:15:56] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "Mostly a doubt about the chosen UID. Otherwise lgtm." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552) (owner: 10Clément Goubert)
[14:15:58] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/860556 (owner: 10Muehlenhoff)
[14:16:26] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for prometheus-ipmi-exporter [puppet] - 10https://gerrit.wikimedia.org/r/860569 (https://phabricator.wikimedia.org/T135991)
[14:17:30] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by lucaswerkmeister-wmde@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/860117 (https://phabricator.wikimedia.org/T323734) (owner: 10Stang)
[14:18:11] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[14:18:50] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[14:19:12] <wikibugs>	 (03Merged) 10jenkins-bot: wikidatawiki: Add language-specific logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/860117 (https://phabricator.wikimedia.org/T323734) (owner: 10Stang)
[14:19:32] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [[gerrit:860117|wikidatawiki: Add language-specific logos (T323734)]]
[14:19:38] <stashbot>	 T323734: Move language-specific logos from Commons.css to logos.php at wikidatawiki - https://phabricator.wikimedia.org/T323734
[14:20:52] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and stang: Backport for [[gerrit:860117|wikidatawiki: Add language-specific logos (T323734)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
[14:20:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:21:17] <Lucas_WMDE>	 cirno: please test
[14:21:30] <cirno>	 looking
[14:21:32] <Lucas_WMDE>	 for my part, I see an Arabic logo on https://www.wikidata.org/wiki/Wikidata:Main_Page?uselang=ar&safemode=1 now (safemode to bypass common.css) that wasn’t there before, so that part seems to be working
[14:22:30] <cirno>	 I tested all 9 sites mentioned in this patch, all of them look fine to me
[14:22:40] <Lucas_WMDE>	 hm, the English logo gets smaller on my end
[14:22:52] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Add a new production image for otelcol (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552) (owner: 10Clément Goubert)
[14:23:21] <Lucas_WMDE>	 left is mwdebug, right without https://usercontent.irccloud-cdn.com/file/KRyOK27P/image.png
[14:23:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[14:24:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T321126)', diff saved to https://phabricator.wikimedia.org/P40991 and previous config saved to /var/cache/conftool/dbconfig/20221124-142426-marostegui.json
[14:24:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[14:24:33] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[14:24:35] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[14:24:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[14:24:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[14:24:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1178 (T321126)', diff saved to https://phabricator.wikimedia.org/P40992 and previous config saved to /var/cache/conftool/dbconfig/20221124-142447-marostegui.json
[14:24:55] <cirno>	 it is smaller, the old logo's width is larger than 135px
[14:25:10] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "Couple details, LGTM otherwise." [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552) (owner: 10Clément Goubert)
[14:25:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[14:25:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:26:08] <Lucas_WMDE>	 yeah
[14:26:28] <Lucas_WMDE>	 and nothing in the task or commit message said that making the logo smaller was supposed to be part of it :/
[14:27:35] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[14:27:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T321126)', diff saved to https://phabricator.wikimedia.org/P40993 and previous config saved to /var/cache/conftool/dbconfig/20221124-142756-marostegui.json
[14:28:31] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[14:28:52] <cirno>	 strange, on my side this does not change the appearance (the right one is mwdebug1001): https://usercontent.irccloud-cdn.com/file/cuIEoPto/image.png
[14:29:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [software/bitu-ldap] - 10https://gerrit.wikimedia.org/r/860508 (owner: 10Slyngshede)
[14:29:05] <Lucas_WMDE>	 cirno: did you force-reload as well?
[14:29:06] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[14:29:25] <cirno>	 yeah, I pressed Ctrl+Shift+R
[14:29:38] <wikibugs>	 (03PS1) 10Filippo Giunchedi: icinga: decom mgmt monitoring [puppet] - 10https://gerrit.wikimedia.org/r/860572 (https://phabricator.wikimedia.org/T310266)
[14:29:40] <wikibugs>	 (03PS1) 10Filippo Giunchedi: icinga: move mgmt_parents to icinga [puppet] - 10https://gerrit.wikimedia.org/r/860573 (https://phabricator.wikimedia.org/T310266)
[14:29:42] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: remove mgmt_contactgroups [puppet] - 10https://gerrit.wikimedia.org/r/860574 (https://phabricator.wikimedia.org/T310266)
[14:29:56] <Lucas_WMDE>	 ah, I think you’re at 150% zoom or something?
[14:30:03] <Lucas_WMDE>	 at 150% the difference vanishes on my end too
[14:30:24] <Lucas_WMDE>	 same for 200%
[14:30:28] <Lucas_WMDE>	 it only exists on 100%, it seems
[14:30:29] <cirno>	 I'm at 200% zoom
[14:30:57] <Lucas_WMDE>	 s/it only exists/the difference only exists/
[14:31:44] <logmsgbot>	 !log jbond@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[14:31:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:32:37] <Lucas_WMDE>	 I’ll sync the change now
[14:32:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] icinga: decom mgmt monitoring [puppet] - 10https://gerrit.wikimedia.org/r/860572 (https://phabricator.wikimedia.org/T310266) (owner: 10Filippo Giunchedi)
[14:33:17] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] icinga: move mgmt_parents to icinga [puppet] - 10https://gerrit.wikimedia.org/r/860573 (https://phabricator.wikimedia.org/T310266) (owner: 10Filippo Giunchedi)
[14:33:29] <cirno>	 I see, if I uncheck the `background-size` rule for 150%, it does become bigger
[14:33:38] <cirno>	 https://usercontent.irccloud-cdn.com/file/PC0BK2dv/image.png
[14:34:09] <cirno>	 thanks
[14:34:17] <Lucas_WMDE>	 I don’t see that rule, strange
[14:34:21] <Lucas_WMDE>	 ah
[14:34:25] <Lucas_WMDE>	 webkit-min-device-pixel-ratio
[14:34:33] <Lucas_WMDE>	 apple hidpi screen shenanigans I guess
[14:35:13] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2050.codfw.wmnet with reason: host reimage
[14:35:18] <Lucas_WMDE>	 oh wait, now I see the rule you mean
[14:35:21] <Lucas_WMDE>	 at 150% zoom still
[14:35:22] <Lucas_WMDE>	 yeah
[14:36:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:36:56] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:860117|wikidatawiki: Add language-specific logos (T323734)]] (duration: 17m 24s)
[14:37:02] <wikibugs>	 (03PS2) 10Filippo Giunchedi: icinga: decom mgmt monitoring [puppet] - 10https://gerrit.wikimedia.org/r/860572 (https://phabricator.wikimedia.org/T310266)
[14:37:02] <stashbot>	 T323734: Move language-specific logos from Commons.css to logos.php at wikidatawiki - https://phabricator.wikimedia.org/T323734
[14:37:04] <wikibugs>	 (03PS2) 10Filippo Giunchedi: icinga: move mgmt_parents to icinga [puppet] - 10https://gerrit.wikimedia.org/r/860573 (https://phabricator.wikimedia.org/T310266)
[14:37:06] <wikibugs>	 (03PS2) 10Filippo Giunchedi: hieradata: remove mgmt_contactgroups [puppet] - 10https://gerrit.wikimedia.org/r/860574 (https://phabricator.wikimedia.org/T310266)
[14:37:44] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] "LGTM, does this work for you? (looks like it should)" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860563 (owner: 10Arturo Borrero Gonzalez)
[14:37:55] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/wikidatawiki%s.png\n' '' '-1.5x' '-2x' | mwscript purgeList.php # T323734
[14:38:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:39:34] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[14:40:16] <Lucas_WMDE>	 cirno: in the zhwiki change, is it intentional that the non-20y SVGs also change?
[14:40:34] <Lucas_WMDE>	 I looked at wikipedia-tagline-zh-hans.svg and it even seems to change script
[14:42:17] <cirno>	 I just re-downloaded those files from Commons and compressed them, so it should not change anything
[14:43:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P40994 and previous config saved to /var/cache/conftool/dbconfig/20221124-144303-marostegui.json
[14:43:13] <Lucas_WMDE>	 if I look at `static/images/mobile/copyright/wikipedia-tagline-zh-hans.svg` on master, it’s definitely in some Chinese script (I can’t say if simplified or traditional)
[14:43:21] <Lucas_WMDE>	 oh wait
[14:43:24] <Lucas_WMDE>	 no, sorry
[14:43:27] <Lucas_WMDE>	 the script doesn’t change
[14:43:34] <Lucas_WMDE>	 eog just silently moves ahead to the next file??
[14:43:39] <Lucas_WMDE>	 so I was looking at `zh_min_nan.svg` now
[14:43:44] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for envoyproxy on Grafana [puppet] - 10https://gerrit.wikimedia.org/r/860576 (https://phabricator.wikimedia.org/T135991)
[14:43:51] <Lucas_WMDE>	 let me look again
[14:43:59] <cirno>	 0_o
[14:44:47] <Lucas_WMDE>	 okay, they are identical
[14:44:50] <Lucas_WMDE>	 eog just confused me
[14:44:55] <Lucas_WMDE>	 (gnome’s image viewer… “eye of gnome”)
[14:45:11] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] mathoid: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860509 (owner: 10Giuseppe Lavagetto)
[14:45:19] <cirno>	 (I use a vscode plugin to preview those files)
[14:45:51] <wikibugs>	 (03PS6) 10Lucas Werkmeister (WMDE): zhwiki: Revert 20 years logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/858709 (https://phabricator.wikimedia.org/T320859) (owner: 10Stang)
[14:46:39] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[14:47:01] <wikibugs>	 (03CR) 10Vgutierrez: Add a cookbook to restart/reboot ncredir nodes (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/860564 (owner: 10Muehlenhoff)
[14:47:23] <Lucas_WMDE>	 (I actually wanted to open both copies in eog, but after I opened eog on master and downloaded the change, I noticed that eog now showed something else)
[14:47:26] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+2] linkrecommendation: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860519 (owner: 10Giuseppe Lavagetto)
[14:47:45] <Lucas_WMDE>	 (so I just assumed that it was now showing the new file content, when it actually showed a different file – perhaps because Git momentarily removed the old file before creating the new version)
[14:48:41] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[14:49:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:50:45] <claime>	 !log updating package otelcol-contrib to 0.66.0 in component thirdparty/otelcol-contrib
[14:50:48] <Lucas_WMDE>	 wow zuul is very busy at the moment
[14:50:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:03] <Lucas_WMDE>	 cirno: fyi I’m waiting for the test build to pass before I +2 the zh change
[14:51:05] <cirno>	 yeah, queued for 4 minutes...
[14:51:46] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
[14:52:17] <Lucas_WMDE>	 huge depends-on chain at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/860125 apparently
[14:52:24] <Lucas_WMDE>	 (and all that for a change that’s DO NOT MERGE)
[14:52:24] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
[14:52:28] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (PATCH inferenceservices) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:52:31] <logmsgbot>	 !log jbond@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2050.codfw.wmnet with OS bullseye
[14:52:34] <Lucas_WMDE>	 ok now it’s running
[14:52:45] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jbond@cum...
[14:53:20] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/mathoid: apply
[14:53:28] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/mathoid: apply
[14:53:40] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[14:54:47] <_joe_>	 jelto: ^^
[14:54:53] <_joe_>	 seems like zuul is stuck
[14:54:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[14:55:41] <wikibugs>	 (03PS1) 10Muehlenhoff: Enable profile::auto_restarts::service for Envoy on debmonitor [puppet] - 10https://gerrit.wikimedia.org/r/860579 (https://phabricator.wikimedia.org/T135991)
[14:55:47] <_joe_>	 hashar: you as well :)
[14:55:48] <wikibugs>	 (03Merged) 10jenkins-bot: mathoid: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860509 (owner: 10Giuseppe Lavagetto)
[14:56:07] <Lucas_WMDE>	 completely stuck or just very full?
[14:56:07] <wikibugs>	 (03PS2) 10Muehlenhoff: Enable profile::auto_restarts::service for envoyproxy on Grafana [puppet] - 10https://gerrit.wikimedia.org/r/860576 (https://phabricator.wikimedia.org/T135991)
[14:56:10] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
[14:56:14] <Lucas_WMDE>	 I just saw a patch leave gate-and-submit fwiw
[14:56:27] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] zhwiki: Revert 20 years logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/858709 (https://phabricator.wikimedia.org/T320859) (owner: 10Stang)
[14:56:36] <icinga-wm>	 PROBLEM - Ganeti memory on ganeti1011 is CRITICAL: CRIT Memory 95% used. Largest process: qemu-system-x86 (30718) = 25.4% https://wikitech.wikimedia.org/wiki/Ganeti%23Memory_pressure
[14:56:37] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
[14:57:26] <_joe_>	 Lucas_WMDE: it's very slow
[14:57:28] <jinxer-wm>	 (KubernetesAPILatency) firing: (2) High Kubernetes API latency (PATCH inferenceservices) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:57:37] <_joe_>	 and I think some queues are actually stuck
[14:57:42] <Lucas_WMDE>	 ok
[14:57:47] <_joe_>	 but I don't want to start debugging zuul tbh :P
[14:58:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P40995 and previous config saved to /var/cache/conftool/dbconfig/20221124-145810-marostegui.json
[14:58:19] <moritzm>	 !log rebalance Ganeti group C/eqiad T311687
[14:58:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:24] <stashbot>	 T311687: Upgrade ganeti/eqiad to Bullseye - https://phabricator.wikimedia.org/T311687
[14:58:25] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/mathoid: apply
[14:58:29] <wikibugs>	 (03Merged) 10jenkins-bot: zhwiki: Revert 20 years logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/858709 (https://phabricator.wikimedia.org/T320859) (owner: 10Stang)
[14:59:04] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/mathoid: apply
[14:59:26] <Lucas_WMDE>	 oh damn, I +2ed instead of using scap backport
[14:59:27] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/mathoid: apply
[14:59:30] <Lucas_WMDE>	 oh well, let’s do it manually
[14:59:43] <wikibugs>	 (03Merged) 10jenkins-bot: linkrecommendation: convert to modules [deployment-charts] - 10https://gerrit.wikimedia.org/r/860519 (owner: 10Giuseppe Lavagetto)
[14:59:47] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/mathoid: apply
[15:00:02] <Lucas_WMDE>	 cirno: the change is on mwdebug1001 (and only there), can you test it?
[15:00:23] <Lucas_WMDE>	 (can’t be bothered to SSH into the other three mwdebug servers where `scap backport` would also have deployed the change for testing)
[15:00:32] <cirno>	 looking
[15:00:35] <wikibugs>	 (03PS1) 10Hnowlan: Add tinyrgb colour profile [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/860580 (https://phabricator.wikimedia.org/T233196)
[15:00:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[15:01:07] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/mathoid: apply
[15:01:19] <wikibugs>	 (03PS7) 10Clément Goubert: Add a new production image for otelcol [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552)
[15:01:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Enable profile::auto_restarts::service for envoyproxy on Grafana [puppet] - 10https://gerrit.wikimedia.org/r/860576 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[15:01:35] <cirno>	 Lucas_WMDE: tested on legacy vector and vector-2022, both look fine to me
[15:01:48] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
[15:01:49] <Lucas_WMDE>	 great, thanks
[15:02:26] <Lucas_WMDE>	 oh damn and the window is already over
[15:02:27] <Lucas_WMDE>	 jouncebot: now
[15:02:27] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 57 minute(s)
[15:02:28] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (PATCH inferenceservices) on k8s-mlserve@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:02:29] <Lucas_WMDE>	 phew
[15:02:36] <Lucas_WMDE>	 those three syncs will take a bit
[15:02:54] <Lucas_WMDE>	 I’m syncing config.yaml first (which I suspect is a prod noop), then logos.php, then static/
[15:03:13] <Lucas_WMDE>	 so that the old logos.php doesn’t reference files that are already being deleted
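The ordering reasoning above (sync config.yaml, then logos.php, then static/) can be illustrated with a small model. This is only a sketch under stated assumptions: the file names and sets below are hypothetical placeholders, not the real contents of mediawiki-config, and `simulate` is an illustrative helper rather than anything scap provides.

```python
# Hypothetical model of the three syncs; all file names are made up for illustration.
OLD_REFS = {"zh-20years.png", "zh.png"}    # files the old logos.php points at (assumed)
NEW_REFS = {"zh.png"}                      # files the new logos.php points at (assumed)
OLD_STATIC = {"zh-20years.png", "zh.png"}  # files present in static/ before the sync
NEW_STATIC = {"zh.png"}                    # files present in static/ after the sync


def broken_refs(referenced, present):
    """Return the referenced files that are missing from the deployed static/ tree."""
    return sorted(set(referenced) - set(present))


def simulate(order):
    """Apply the syncs in the given order and record missing files after each step."""
    refs, static = OLD_REFS, OLD_STATIC
    report = []
    for step in order:
        if step == "logos.php":
            refs = NEW_REFS
        elif step == "static/":
            static = NEW_STATIC
        # config.yaml is treated as a no-op here, as suspected in the log.
        report.append((step, broken_refs(refs, static)))
    return report


safe = simulate(["config.yaml", "logos.php", "static/"])
unsafe = simulate(["config.yaml", "static/", "logos.php"])

for name, report in (("safe order", safe), ("unsafe order", unsafe)):
    print(name)
    for step, missing in report:
        print(f"  after {step}: missing={missing}")
```

In this model the chosen order never leaves logos.php pointing at a deleted file, while syncing static/ first opens a window in which the old logos.php references a file that is already gone.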
[15:03:21] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
[15:03:53] <logmsgbot>	 !log isaranto@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
[15:03:55] <wikibugs>	 (03PS8) 10Clément Goubert: Add a new production image for otelcol [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552)
[15:04:06] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] START helmfile.d/services/linkrecommendation: apply
[15:04:28] <wikibugs>	 (03CR) 10Clément Goubert: Add a new production image for otelcol (033 comments) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/857672 (https://phabricator.wikimedia.org/T320552) (owner: 10Clément Goubert)
[15:04:34] <logmsgbot>	 !log oblivian@deploy1002 helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
[15:05:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[15:06:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[15:06:52] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[15:06:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[15:07:02] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized logos/config.yaml: Config: [[gerrit:858709|zhwiki: Revert 20 years logos (T320859)]] (1/3) (duration: 04m 41s)
[15:07:07] <stashbot>	 T320859: Requesting temporary logo change for zh.wikipedia.org - https://phabricator.wikimedia.org/T320859
[15:07:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[15:09:03] <icinga-wm>	 PROBLEM - SSH on mw1327.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:11:48] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/logos.php: Config: [[gerrit:858709|zhwiki: Revert 20 years logos (T320859)]] (2/3) (duration: 04m 34s)
[15:12:31] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[15:13:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[15:13:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[15:13:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T321126)', diff saved to https://phabricator.wikimedia.org/P40996 and previous config saved to /var/cache/conftool/dbconfig/20221124-151316-marostegui.json
[15:13:18] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[15:13:28] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[15:13:32] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[15:13:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1192 (T321126)', diff saved to https://phabricator.wikimedia.org/P40997 and previous config saved to /var/cache/conftool/dbconfig/20221124-151338-marostegui.json
[15:13:59] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[15:14:43] <wikibugs>	 (03PS1) 10Jbond: install_server: migrate ms-bs_simple top GPT [puppet] - 10https://gerrit.wikimedia.org/r/860581 (https://phabricator.wikimedia.org/T308677)
[15:14:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T321126)', diff saved to https://phabricator.wikimedia.org/P40998 and previous config saved to /var/cache/conftool/dbconfig/20221124-151445-marostegui.json
[15:16:54] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1002 Synchronized static/images/: Config: [[gerrit:858709|zhwiki: Revert 20 years logos (T320859)]] (3/3) (duration: 04m 43s)
[15:17:00] <stashbot>	 T320859: Requesting temporary logo change for zh.wikipedia.org - https://phabricator.wikimedia.org/T320859
[15:17:18] <Lucas_WMDE>	 !log lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-%s.svg\n' {tagline-zh{,-hans},wordmark-zh-hans} | mwscript purgeList.php # T320859
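For readers unfamiliar with bash brace expansion, the purge command above feeds three URLs to purgeList.php: printf reuses its format string once per expanded word. A minimal Python sketch of the same expansion follows; the variable names are illustrative, and the list is written out by hand rather than parsed from the brace pattern.

```python
# URL template copied from the logged command; %s is filled once per expanded word.
TEMPLATE = "https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-%s.svg"

# bash expands {tagline-zh{,-hans},wordmark-zh-hans} left to right into three words:
# the inner {,-hans} yields an empty suffix and "-hans" for the "tagline-zh" prefix.
names = ["tagline-zh" + suffix for suffix in ("", "-hans")] + ["wordmark-zh-hans"]

urls = [TEMPLATE % name for name in names]
print("\n".join(urls))
```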
[15:17:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:37] <Lucas_WMDE>	 cirno: can you quickly check that things aren’t horribly broken without mwdebug now? ^^
[15:18:09] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: proxy: use port [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860563 (owner: 10Arturo Borrero Gonzalez)
[15:18:31] <cirno>	 Lucas_WMDE: it works fine from my side
[15:18:39] <wikibugs>	 (03PS1) 10TK-999: mcrouter: Specify missing CXXFLAGS [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/860584
[15:19:02] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[15:19:08] <Lucas_WMDE>	 yay thanks
[15:19:13] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[15:19:17] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
[15:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:19:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[15:19:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[15:20:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[15:21:31] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10cloud-services-team (Kanban): WMCS Cookbook Automation Q2 tracking task - https://phabricator.wikimedia.org/T319401 (10fnegri)
[15:24:55] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
[15:25:02] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
[15:25:26] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM." [debs/mcrouter] - 10https://gerrit.wikimedia.org/r/860584 (owner: 10TK-999)
[15:25:54] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
[15:29:15] <wikibugs>	 (03PS6) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[15:29:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P40999 and previous config saved to /var/cache/conftool/dbconfig/20221124-152952-marostegui.json
[15:30:32] <SandraEbele>	 !log Started deployment of refinery as part of weekly deployment train
[15:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:39] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102 (owner: 10Effie Mouzeli)
[15:32:14] <logmsgbot>	 !log ebysans@deploy1002 Started deploy [analytics/refinery@1bfb89f]: Regular analytics weekly train [analytics/refinery@1bfb89f]
[15:32:28] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10jbond) ok with T308677#8419843 and T308677#8420119 i have now managed to succ...
[15:32:48] <jbond>	 Emperor: FYI ^^^
[15:41:13] <wikibugs>	 (03PS1) 10Muehlenhoff: Set profile::contacts::role_contacts for role analytics_cluster::coordinator::replica [puppet] - 10https://gerrit.wikimedia.org/r/860608
[15:41:20] <logmsgbot>	 !log ebysans@deploy1002 Finished deploy [analytics/refinery@1bfb89f]: Regular analytics weekly train [analytics/refinery@1bfb89f] (duration: 09m 06s)
[15:42:41] <logmsgbot>	 !log ebysans@deploy1002 Started deploy [analytics/refinery@1bfb89f] (thin): Regular analytics weekly train THIN [analytics/refinery@1bfb89f]
[15:42:45] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Set profile::contacts::role_contacts for role analytics_cluster::coordinator::replica [puppet] - 10https://gerrit.wikimedia.org/r/860608 (owner: 10Muehlenhoff)
[15:42:48] <logmsgbot>	 !log ebysans@deploy1002 Finished deploy [analytics/refinery@1bfb89f] (thin): Regular analytics weekly train THIN [analytics/refinery@1bfb89f] (duration: 00m 07s)
[15:43:15] <logmsgbot>	 !log ebysans@deploy1002 Started deploy [analytics/refinery@1bfb89f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1bfb89f]
[15:44:15] <wikibugs>	 (03PS2) 10Muehlenhoff: Set role_contacts for role analytics_cluster::coordinator::replica [puppet] - 10https://gerrit.wikimedia.org/r/860608
[15:44:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P41000 and previous config saved to /var/cache/conftool/dbconfig/20221124-154458-marostegui.json
[15:45:16] <logmsgbot>	 !log ebysans@deploy1002 Finished deploy [analytics/refinery@1bfb89f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1bfb89f] (duration: 02m 00s)
[15:45:53] <wikibugs>	 (03PS1) 10Filippo Giunchedi: o11y: more lenient logstash kafka consumer lag [alerts] - 10https://gerrit.wikimedia.org/r/860609
[15:49:26] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Thank you Andrea! I'll let you merge as needed" [puppet] - 10https://gerrit.wikimedia.org/r/854952 (https://phabricator.wikimedia.org/T322670) (owner: 10Filippo Giunchedi)
[15:49:30] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: openstack: neutron: mark several commands as safe [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860610
[15:50:31] <wikibugs>	 (03PS1) 10Ssingh: Release 0.35-2 [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/860612 (https://phabricator.wikimedia.org/T321309)
[15:50:46] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: wmcs: openstack: neutron: mark several commands as safe [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860610
[15:54:01] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[15:56:05] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[15:58:55] <icinga-wm>	 PROBLEM - SSH on mw1312.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:00:00] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] admin: add dpujol to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/854952 (https://phabricator.wikimedia.org/T322670) (owner: 10Filippo Giunchedi)
[16:00:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T321126)', diff saved to https://phabricator.wikimedia.org/P41001 and previous config saved to /var/cache/conftool/dbconfig/20221124-160005-marostegui.json
[16:00:07] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1193.eqiad.wmnet with reason: Maintenance
[16:00:12] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[16:00:20] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1193.eqiad.wmnet with reason: Maintenance
[16:00:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1193 (T321126)', diff saved to https://phabricator.wikimedia.org/P41002 and previous config saved to /var/cache/conftool/dbconfig/20221124-160026-marostegui.json
[16:02:02] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] "LGTM" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860610 (owner: 10Arturo Borrero Gonzalez)
[16:02:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1193 (T321126)', diff saved to https://phabricator.wikimedia.org/P41003 and previous config saved to /var/cache/conftool/dbconfig/20221124-160234-marostegui.json
[16:08:25] <hashar>	 _joe_: sorry I missed your ping about zuul.
[16:08:38] <hashar>	 looks like it had a spam of changes https://grafana.wikimedia.org/d/000000322/zuul-gearman?viewPanel=10&from=now-24h&to=now&orgId=1
[16:08:45] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: openstack: neutron: mark several commands as safe [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860610 (owner: 10Arturo Borrero Gonzalez)
[16:09:11] <hashar>	 it will process them eventually
[16:09:55] <icinga-wm>	 RECOVERY - SSH on mw1327.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:10:09] <wikibugs>	 (03CR) 10Vgutierrez: [C: 04-1] "we need to bump dependencies in setup.py first" [software/acme-chief] (debian) - 10https://gerrit.wikimedia.org/r/860612 (https://phabricator.wikimedia.org/T321309) (owner: 10Ssingh)
[16:11:49] <SandraEbele>	 !log killed webrequest-druid-hourly-coord for restart as part of weekly deployment train
[16:11:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:16] <SandraEbele>	 !log successfully restarted webrequest-druid-hourly-coord as part of weekly deployment train.
[16:13:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:32] <SandraEbele>	 !log killed webrequest-druid-daily-coord for restart as part of weekly deployment train.
[16:15:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:48] <wikibugs>	 (03CR) 10MVernon: [C: 03+1] "This looks good to me now, thank you so much for doing this!" [puppet] - 10https://gerrit.wikimedia.org/r/859592 (https://phabricator.wikimedia.org/T308677) (owner: 10Jbond)
[16:17:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P41004 and previous config saved to /var/cache/conftool/dbconfig/20221124-161741-marostegui.json
[16:18:19] <wikibugs>	 (03PS12) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114
[16:22:17] <SandraEbele>	 !log successfully restarted webrequest-druid-daily-coord as part of weekly deployment train.
[16:22:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:01] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): tinyrgb is distributed via puppet - https://phabricator.wikimedia.org/T323775 (10hnowlan)
[16:29:22] <wikibugs>	 10SRE-swift-storage, 10Infrastructure-Foundations, 10Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (10MatthewVernon) ms-be2050 looks good to me now, thank you :)  I think any appr...
[16:29:36] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wmcs: openstack: common: allow arbitrary flavor names [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621
[16:29:59] <wikibugs>	 (03CR) 10Daniel Kinzler: api-gateway: expose restbase /api/ endpoint (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/852165 (https://phabricator.wikimedia.org/T322152) (owner: 10Hnowlan)
[16:30:33] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): tinyrgb is distributed via puppet - https://phabricator.wikimedia.org/T323775 (10hnowlan) 05Open→03In progress
[16:30:37] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, and 2 others: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (10hnowlan)
[16:32:04] <wikibugs>	 (03PS3) 10David Caro: toolforge harbor: update certs with acmechief [puppet] - 10https://gerrit.wikimedia.org/r/728629 (https://phabricator.wikimedia.org/T267616) (owner: 10Bstorm)
[16:32:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P41006 and previous config saved to /var/cache/conftool/dbconfig/20221124-163247-marostegui.json
[16:34:37] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: wmcs: openstack: common: allow arbitrary flavor names [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621
[16:41:44] <wikibugs>	 (03PS13) 10Arturo Borrero Gonzalez: cookbooks: wmcs: cloudvirt: add cookbook to maintain canary VMs [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/859114
[16:44:55] <wikibugs>	 (03CR) 10Vgutierrez: [C: 04-1] toolforge harbor: update certs with acmechief (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/728629 (https://phabricator.wikimedia.org/T267616) (owner: 10Bstorm)
[16:45:51] <wikibugs>	 (03CR) 10David Caro: "I would avoid giving this option until is actually needed. That helps avoid creating flavors with custom names unless it's strictly necess" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621 (owner: 10Arturo Borrero Gonzalez)
[16:47:53] <wikibugs>	 (03PS1) 10David Caro: harbor: remove support for <bullseye [puppet] - 10https://gerrit.wikimedia.org/r/860623 (https://phabricator.wikimedia.org/T267616)
[16:47:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1193 (T321126)', diff saved to https://phabricator.wikimedia.org/P41008 and previous config saved to /var/cache/conftool/dbconfig/20221124-164754-marostegui.json
[16:47:56] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[16:48:01] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[16:48:09] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[16:48:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1203 (T321126)', diff saved to https://phabricator.wikimedia.org/P41009 and previous config saved to /var/cache/conftool/dbconfig/20221124-164815-marostegui.json
[16:49:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T321126)', diff saved to https://phabricator.wikimedia.org/P41010 and previous config saved to /var/cache/conftool/dbconfig/20221124-164923-marostegui.json
[16:49:56] <wikibugs>	 (03CR) 10David Caro: toolforge harbor: update certs with acmechief (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/728629 (https://phabricator.wikimedia.org/T267616) (owner: 10Bstorm)
[16:50:29] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: wmcs: openstack: common: allow arbitrary flavor names (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621 (owner: 10Arturo Borrero Gonzalez)
[16:53:47] <wikibugs>	 (03PS7) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[16:54:30] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102 (owner: 10Effie Mouzeli)
[16:55:29] <wikibugs>	 (03CR) 10David Caro: wmcs: openstack: common: allow arbitrary flavor names (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621 (owner: 10Arturo Borrero Gonzalez)
[16:55:44] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/860579 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[16:56:53] <wikibugs>	 (03CR) 10David Caro: wmcs: openstack: common: allow arbitrary flavor names (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621 (owner: 10Arturo Borrero Gonzalez)
[16:57:21] <wikibugs>	 (03PS8) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[16:58:31] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: wmcs: openstack: common: allow arbitrary flavor names (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621 (owner: 10Arturo Borrero Gonzalez)
[16:58:34] <wikibugs>	 (03Abandoned) 10Arturo Borrero Gonzalez: wmcs: openstack: common: allow arbitrary flavor names [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/860621 (owner: 10Arturo Borrero Gonzalez)
[16:58:56] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): byte/str mismatch TypeError when converting any STL file - https://phabricator.wikimedia.org/T323781 (10hnowlan)
[16:59:34] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102 (owner: 10Effie Mouzeli)
[16:59:36] <wikibugs>	 10SRE, 10Thumbor, 10Thumbor Migration, 10serviceops, 10Platform Team Workboards (Platform Engineering Reliability): byte/str mismatch TypeError when converting any STL file - https://phabricator.wikimedia.org/T323781 (10hnowlan)
[16:59:49] <icinga-wm>	 RECOVERY - SSH on mw1312.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:00:05] <jouncebot>	 jbond and rzl: #bothumor I � Unicode. All rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:00:09] <wikibugs>	 (03PS1) 10Urbanecm: GrowthExperiments: Remove non-existent variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/860624
[17:01:34] * urbanecm is going to push some cleanup live
[17:01:36] <logmsgbot>	 !log urbanecm@deploy1002 backport aborted:  (duration: 00m 01s)
[17:01:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/860624 (owner: 10Urbanecm)
[17:01:53] <wikibugs>	 (03PS1) 10Clément Goubert: C:vopsbot: Notify service on config change [puppet] - 10https://gerrit.wikimedia.org/r/860625
[17:03:02] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: Remove non-existent variables [mediawiki-config] - 10https://gerrit.wikimedia.org/r/860624 (owner: 10Urbanecm)
[17:03:15] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:860624|GrowthExperiments: Remove non-existent variables]]
[17:03:43] <wikibugs>	 (03PS2) 10Urbanecm: GrowthExperiments: Remove unused GEHomepageNewAccountVariants config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/859995 (owner: 10Kosta Harlan)
[17:04:25] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "beta-only, no-op for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/859995 (owner: 10Kosta Harlan)
[17:04:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P41011 and previous config saved to /var/cache/conftool/dbconfig/20221124-170429-marostegui.json
[17:04:58] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38427/console" [puppet] - 10https://gerrit.wikimedia.org/r/860625 (owner: 10Clément Goubert)
[17:05:35] <wikibugs>	 (03PS1) 10David Caro: harbor: remove unused harbor::db module/role [puppet] - 10https://gerrit.wikimedia.org/r/860627 (https://phabricator.wikimedia.org/T267616)
[17:05:39] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: Remove unused GEHomepageNewAccountVariants config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/859995 (owner: 10Kosta Harlan)
[17:05:44] <wikibugs>	 (03CR) 10Clément Goubert: C:vopsbot: Notify service on config change [puppet] - 10https://gerrit.wikimedia.org/r/860625 (owner: 10Clément Goubert)
[17:06:01] <wikibugs>	 (03CR) 10Clément Goubert: [V: 03+1] "PCC OK" [puppet] - 10https://gerrit.wikimedia.org/r/860625 (owner: 10Clément Goubert)
[17:07:04] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[17:07:56] <wikibugs>	 (03PS1) 10Hnowlan: thumbor: correct tinyrgb path [deployment-charts] - 10https://gerrit.wikimedia.org/r/860628 (https://phabricator.wikimedia.org/T323775)
[17:08:16] <wikibugs>	 (03PS2) 10Hnowlan: Add tinyrgb colour profile [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/860580 (https://phabricator.wikimedia.org/T323775)
[17:08:19] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[17:08:20] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[17:08:22] <wikibugs>	 (03PS9) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[17:08:41] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:860624|GrowthExperiments: Remove non-existent variables]] (duration: 05m 25s)
[17:09:05] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102 (owner: 10Effie Mouzeli)
[17:09:08] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/859995 (owner: 10Kosta Harlan)
[17:09:16] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[17:14:27] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[17:14:50] <wikibugs>	 (03PS10) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[17:15:28] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[17:15:29] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[17:15:58] <wikibugs>	 (03PS11) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[17:16:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[17:18:17] <wikibugs>	 (03PS1) 10Hnowlan: Fix TypeError when prepending string to STL files [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/860632 (https://phabricator.wikimedia.org/T323781)
[17:19:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P41012 and previous config saved to /var/cache/conftool/dbconfig/20221124-171936-marostegui.json
[17:22:25] <wikibugs>	 (03PS12) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[17:26:29] <wikibugs>	 (03PS13) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[17:30:06] <wikibugs>	 (03PS14) 10Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102
[17:32:11] <wikibugs>	 (03PS1) 10Hnowlan: maps: remove Cassandra and Tilerator service [puppet] - 10https://gerrit.wikimedia.org/r/860634 (https://phabricator.wikimedia.org/T298246)
[17:32:21] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - 10https://gerrit.wikimedia.org/r/860102 (owner: 10Effie Mouzeli)
[17:34:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T321126)', diff saved to https://phabricator.wikimedia.org/P41013 and previous config saved to /var/cache/conftool/dbconfig/20221124-173442-marostegui.json
[17:34:44] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[17:34:47] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
[17:34:49] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[17:34:49] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2098.codfw.wmnet with reason: Maintenance
[17:35:03] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2098.codfw.wmnet with reason: Maintenance
[17:35:07] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2100.codfw.wmnet with reason: Maintenance
[17:35:32] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2100.codfw.wmnet with reason: Maintenance
[17:35:36] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2152.codfw.wmnet with reason: Maintenance
[17:35:50] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2152.codfw.wmnet with reason: Maintenance
[17:35:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2152 (T321126)', diff saved to https://phabricator.wikimedia.org/P41014 and previous config saved to /var/cache/conftool/dbconfig/20221124-173556-marostegui.json
[17:37:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T321126)', diff saved to https://phabricator.wikimedia.org/P41015 and previous config saved to /var/cache/conftool/dbconfig/20221124-173706-marostegui.json
[17:37:47] <wikibugs>	 (CR) Hnowlan: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38435/console" [puppet] - https://gerrit.wikimedia.org/r/860634 (https://phabricator.wikimedia.org/T298246) (owner: Hnowlan)
[17:42:58] <wikibugs>	 (CR) Vlad.shapik: [C: +1] Fix TypeError when prepending string to STL files [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/860632 (https://phabricator.wikimedia.org/T323781) (owner: Hnowlan)
[17:52:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P41016 and previous config saved to /var/cache/conftool/dbconfig/20221124-175212-marostegui.json
[17:55:19] <wikibugs>	 (CR) Andrea Denisse: [C: +1] "LGTM, thanks!" [puppet] - https://gerrit.wikimedia.org/r/860576 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[17:59:13] <wikibugs>	 (PS15) Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - https://gerrit.wikimedia.org/r/860102
[18:00:05] <jouncebot>	 bd808: How many deployers does it take to do Technical Engagement weekly deploy (Toolhub, Developer portal, Striker) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T1800).
[18:01:41] <wikibugs>	 (PS1) MSantos: Bump proton to 2022-11-24-154643-production [deployment-charts] - https://gerrit.wikimedia.org/r/860635
[18:05:50] <wikibugs>	 (CR) Hnowlan: api-gateway: expose restbase /api/ endpoint (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/852165 (https://phabricator.wikimedia.org/T322152) (owner: Hnowlan)
[18:07:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P41017 and previous config saved to /var/cache/conftool/dbconfig/20221124-180719-marostegui.json
[18:07:41] <wikibugs>	 (CR) Effie Mouzeli: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/860634 (https://phabricator.wikimedia.org/T298246) (owner: Hnowlan)
[18:08:55] <wikibugs>	 (PS2) Hnowlan: api-gateway: expose restbase /api/ endpoint [deployment-charts] - https://gerrit.wikimedia.org/r/852165 (https://phabricator.wikimedia.org/T322152)
[18:11:48] <wikibugs>	 (CR) MSantos: [C: +2] Bump proton to 2022-11-24-154643-production [deployment-charts] - https://gerrit.wikimedia.org/r/860635 (owner: MSantos)
[18:12:18] <wikibugs>	 (CR) Hnowlan: [C: +2] Fix TypeError when prepending string to STL files [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/860632 (https://phabricator.wikimedia.org/T323781) (owner: Hnowlan)
[18:13:20] <wikibugs>	 (PS30) Raymond Ndibe: Modify maintain-dbusers.py to call the rest-api service [puppet] - https://gerrit.wikimedia.org/r/809921 (https://phabricator.wikimedia.org/T304040)
[18:15:17] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [staging] START helmfile.d/services/proton: apply
[18:16:20] <wikibugs>	 (Merged) jenkins-bot: Bump proton to 2022-11-24-154643-production [deployment-charts] - https://gerrit.wikimedia.org/r/860635 (owner: MSantos)
[18:17:24] <wikibugs>	 (Merged) jenkins-bot: Fix TypeError when prepending string to STL files [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/860632 (https://phabricator.wikimedia.org/T323781) (owner: Hnowlan)
[18:19:04] <wikibugs>	 (CR) Vlad.shapik: [C: +1] thumbor: correct tinyrgb path [deployment-charts] - https://gerrit.wikimedia.org/r/860628 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[18:19:23] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [staging] START helmfile.d/services/proton: apply
[18:20:22] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [staging] DONE helmfile.d/services/proton: apply
[18:21:20] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [codfw] START helmfile.d/services/proton: apply
[18:21:32] <wikibugs>	 (CR) Vlad.shapik: [C: +1] Add tinyrgb colour profile [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/860580 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[18:22:18] <wikibugs>	 (Abandoned) Ssingh: Release 0.35-2 [software/acme-chief] (debian) - https://gerrit.wikimedia.org/r/860612 (https://phabricator.wikimedia.org/T321309) (owner: Ssingh)
[18:22:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2152 (T321126)', diff saved to https://phabricator.wikimedia.org/P41018 and previous config saved to /var/cache/conftool/dbconfig/20221124-182225-marostegui.json
[18:22:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2154.codfw.wmnet with reason: Maintenance
[18:22:32] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[18:22:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2154.codfw.wmnet with reason: Maintenance
[18:22:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2154 (T321126)', diff saved to https://phabricator.wikimedia.org/P41019 and previous config saved to /var/cache/conftool/dbconfig/20221124-182247-marostegui.json
[18:22:58] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [codfw] DONE helmfile.d/services/proton: apply
[18:23:29] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [eqiad] START helmfile.d/services/proton: apply
[18:24:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T321126)', diff saved to https://phabricator.wikimedia.org/P41020 and previous config saved to /var/cache/conftool/dbconfig/20221124-182457-marostegui.json
[18:25:40] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [eqiad] DONE helmfile.d/services/proton: apply
[18:28:45] <wikibugs>	 (CR) Hnowlan: [C: +2] Add tinyrgb colour profile [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/860580 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[18:34:01] <icinga-wm>	 PROBLEM - NFS Share Volume Space /srv/tools on labstore1004 is CRITICAL: DISK CRITICAL - free space: /srv/tools 1256115 MB (15% inode=68%): https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage%23NFS_volume_cleanup https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1
[18:34:59] <wikibugs>	 (Merged) jenkins-bot: Add tinyrgb colour profile [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/860580 (https://phabricator.wikimedia.org/T323775) (owner: Hnowlan)
[18:37:50] <wikibugs>	 (PS6) Vlad.shapik: WP:Add ability to specify a DPI value for PDF [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/853402 (https://phabricator.wikimedia.org/T256959)
[18:40:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P41021 and previous config saved to /var/cache/conftool/dbconfig/20221124-184004-marostegui.json
[18:45:08] <wikibugs>	 (PS1) Ssingh: setup.py: update dependencies for bullseye [software/acme-chief] - https://gerrit.wikimedia.org/r/860637 (https://phabricator.wikimedia.org/T321309)
[18:50:02] <wikibugs>	 SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[18:51:09] <wikibugs>	 (CR) CI reject: [V: -1] setup.py: update dependencies for bullseye [software/acme-chief] - https://gerrit.wikimedia.org/r/860637 (https://phabricator.wikimedia.org/T321309) (owner: Ssingh)
[18:52:33] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity for Hghani - https://phabricator.wikimedia.org/T322145 (Hghani) Hi I am using a windows 10 machine and I am having trouble logging in via ssh.  When I attempt to connect to the server it prompts for password/pa...
[18:55:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P41022 and previous config saved to /var/cache/conftool/dbconfig/20221124-185510-marostegui.json
[18:55:26] <wikibugs>	 (CR) Ssingh: "13:51:05 ImportError: cannot import name 'escape' from 'jinja2' (/src/.tox/py37-tests-min/lib/python3.7/site-packages/jinja2/__init__.py)" [software/acme-chief] - https://gerrit.wikimedia.org/r/860637 (https://phabricator.wikimedia.org/T321309) (owner: Ssingh)
[18:55:43] <wikibugs>	 SRE-swift-storage, Infrastructure-Foundations, Patch-For-Review: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677 (jbond) >>! In T308677#8420280, @MatthewVernon wrote: > ms-be2050 looks good t...
[19:02:23] <wikibugs>	 (PS2) Ssingh: setup.py: update dependencies for bullseye [software/acme-chief] - https://gerrit.wikimedia.org/r/860637 (https://phabricator.wikimedia.org/T321309)
[19:03:10] <wikibugs>	 (CR) CI reject: [V: -1] setup.py: update dependencies for bullseye [software/acme-chief] - https://gerrit.wikimedia.org/r/860637 (https://phabricator.wikimedia.org/T321309) (owner: Ssingh)
[19:10:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2154 (T321126)', diff saved to https://phabricator.wikimedia.org/P41023 and previous config saved to /var/cache/conftool/dbconfig/20221124-191017-marostegui.json
[19:10:19] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2161.codfw.wmnet with reason: Maintenance
[19:10:24] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[19:10:32] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2161.codfw.wmnet with reason: Maintenance
[19:10:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2161 (T321126)', diff saved to https://phabricator.wikimedia.org/P41024 and previous config saved to /var/cache/conftool/dbconfig/20221124-191038-marostegui.json
[19:12:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2161 (T321126)', diff saved to https://phabricator.wikimedia.org/P41025 and previous config saved to /var/cache/conftool/dbconfig/20221124-191249-marostegui.json
[19:27:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P41026 and previous config saved to /var/cache/conftool/dbconfig/20221124-192755-marostegui.json
[19:43:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P41027 and previous config saved to /var/cache/conftool/dbconfig/20221124-194302-marostegui.json
[19:56:15] <wikibugs>	 (PS2) Raymond Ndibe: cookbooks: print out instructions on next step after updating the buildpack/tekton images in the local repo [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859582 (https://phabricator.wikimedia.org/T321188)
[19:58:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2161 (T321126)', diff saved to https://phabricator.wikimedia.org/P41028 and previous config saved to /var/cache/conftool/dbconfig/20221124-195808-marostegui.json
[19:58:11] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2162.codfw.wmnet with reason: Maintenance
[19:58:16] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[19:58:24] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2162.codfw.wmnet with reason: Maintenance
[19:58:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2162 (T321126)', diff saved to https://phabricator.wikimedia.org/P41029 and previous config saved to /var/cache/conftool/dbconfig/20221124-195830-marostegui.json
[20:00:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2162 (T321126)', diff saved to https://phabricator.wikimedia.org/P41030 and previous config saved to /var/cache/conftool/dbconfig/20221124-200040-marostegui.json
[20:12:22] <wikibugs>	 (PS3) Raymond Ndibe: cookbooks: print out instructions on next step after updating the buildpack/tekton images in the local repo [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859582 (https://phabricator.wikimedia.org/T321188)
[20:13:41] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.64.32.31:9042 on aqs1018 is OK: TCP OK - 0.000 second response time on 10.64.32.31 port 9042 https://phabricator.wikimedia.org/T93886
[20:15:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P41031 and previous config saved to /var/cache/conftool/dbconfig/20221124-201547-marostegui.json
[20:22:40] <wikibugs>	 (CR) Raymond Ndibe: cookbooks: print out instructions on next step after updating the buildpack/tekton images in the local repo (1 comment) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859582 (https://phabricator.wikimedia.org/T321188) (owner: Raymond Ndibe)
[20:24:26] <wikibugs>	 (PS16) Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - https://gerrit.wikimedia.org/r/860102
[20:24:44] <wikibugs>	 (CR) Raymond Ndibe: "Hello David, this needs to be +2'd by someone else. I can't merge it myself" [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/859582 (https://phabricator.wikimedia.org/T321188) (owner: Raymond Ndibe)
[20:30:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P41032 and previous config saved to /var/cache/conftool/dbconfig/20221124-203053-marostegui.json
[20:31:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[20:32:44] <wikibugs>	 (PS17) Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - https://gerrit.wikimedia.org/r/860102
[20:41:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[20:42:49] <wikibugs>	 SRE-OnFire, Gerrit, serviceops-collab, Release-Engineering-Team (GitLab III: GitLab in LA 🪃), and 2 others: gerrit1001 running out of space on / - https://phabricator.wikimedia.org/T323262 (hashar)
[20:46:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2162 (T321126)', diff saved to https://phabricator.wikimedia.org/P41033 and previous config saved to /var/cache/conftool/dbconfig/20221124-204600-marostegui.json
[20:46:02] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2163.codfw.wmnet with reason: Maintenance
[20:46:07] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[20:46:15] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2163.codfw.wmnet with reason: Maintenance
[20:46:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2163 (T321126)', diff saved to https://phabricator.wikimedia.org/P41034 and previous config saved to /var/cache/conftool/dbconfig/20221124-204621-marostegui.json
[20:48:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T321126)', diff saved to https://phabricator.wikimedia.org/P41035 and previous config saved to /var/cache/conftool/dbconfig/20221124-204832-marostegui.json
[20:49:29] <wikibugs>	 (CR) Ssingh: [C: -1] "Need to revert 3.9 or add skip_missing https://phabricator.wikimedia.org/T289222 but that's for another day." [software/acme-chief] - https://gerrit.wikimedia.org/r/860637 (https://phabricator.wikimedia.org/T321309) (owner: Ssingh)
[20:56:41] <TheresNoTime>	 jouncebot: nowandnext
[20:56:41] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 3 minute(s)
[20:56:41] <jouncebot>	 In 0 hour(s) and 3 minute(s): UTC late backport and config training (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T2100)
[20:57:04] * TheresNoTime isn't available for ^ but looks empty anyway
[20:57:14] <wikibugs>	 (PS18) Effie Mouzeli: WIP:P:mediawiki::mcrouter_wancache Profile refactoring [puppet] - https://gerrit.wikimedia.org/r/860102
[21:00:04] <jouncebot>	 brennen and TheresNoTime: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC late backport and config training deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221124T2100).
[21:02:49] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[21:03:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P41036 and previous config saved to /var/cache/conftool/dbconfig/20221124-210338-marostegui.json
[21:18:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P41037 and previous config saved to /var/cache/conftool/dbconfig/20221124-211845-marostegui.json
[21:25:01] <wikibugs>	 (PS7) Vlad.shapik: Add ability to specify a DPI value for PDF [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/853402 (https://phabricator.wikimedia.org/T256959)
[21:33:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2163 (T321126)', diff saved to https://phabricator.wikimedia.org/P41038 and previous config saved to /var/cache/conftool/dbconfig/20221124-213351-marostegui.json
[21:33:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2164.codfw.wmnet with reason: Maintenance
[21:33:59] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[21:34:07] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2164.codfw.wmnet with reason: Maintenance
[21:34:09] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2094.codfw.wmnet with reason: Maintenance
[21:34:22] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2094.codfw.wmnet with reason: Maintenance
[21:34:29] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2164 (T321126)', diff saved to https://phabricator.wikimedia.org/P41039 and previous config saved to /var/cache/conftool/dbconfig/20221124-213428-marostegui.json
[21:36:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T321126)', diff saved to https://phabricator.wikimedia.org/P41040 and previous config saved to /var/cache/conftool/dbconfig/20221124-213639-marostegui.json
[21:38:32] <wikibugs>	 (CR) Vlad.shapik: "It works as expected, and the image quality is substantially better when we specify a higher DPI." [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/853402 (https://phabricator.wikimedia.org/T256959) (owner: Vlad.shapik)
[21:51:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P41041 and previous config saved to /var/cache/conftool/dbconfig/20221124-215145-marostegui.json
[22:02:49] <tgr_>	 Oops, got confused and pushed to github instead of gerrit ( https://github.com/wikimedia/Timestamp/tree/add-sub ), will delete the branch.
[22:02:59] <tgr_>	 (logging here for lack of a better place)
[22:06:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P41042 and previous config saved to /var/cache/conftool/dbconfig/20221124-220652-marostegui.json
[22:21:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T321126)', diff saved to https://phabricator.wikimedia.org/P41043 and previous config saved to /var/cache/conftool/dbconfig/20221124-222158-marostegui.json
[22:22:00] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2166.codfw.wmnet with reason: Maintenance
[22:22:05] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[22:22:14] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2166.codfw.wmnet with reason: Maintenance
[22:22:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2166 (T321126)', diff saved to https://phabricator.wikimedia.org/P41044 and previous config saved to /var/cache/conftool/dbconfig/20221124-222220-marostegui.json
[22:24:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T321126)', diff saved to https://phabricator.wikimedia.org/P41045 and previous config saved to /var/cache/conftool/dbconfig/20221124-222430-marostegui.json
[22:39:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P41046 and previous config saved to /var/cache/conftool/dbconfig/20221124-223937-marostegui.json
[22:54:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P41047 and previous config saved to /var/cache/conftool/dbconfig/20221124-225443-marostegui.json
[22:55:29] <icinga-wm>	 PROBLEM - puppet last run on wcqs1001 is CRITICAL: CRITICAL: Puppet has been disabled for 605047 seconds, message: T321605 - bking, last run 7 days ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[23:09:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T321126)', diff saved to https://phabricator.wikimedia.org/P41048 and previous config saved to /var/cache/conftool/dbconfig/20221124-230949-marostegui.json
[23:09:52] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2167.codfw.wmnet with reason: Maintenance
[23:09:57] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[23:10:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2167.codfw.wmnet with reason: Maintenance
[23:10:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2167:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P41049 and previous config saved to /var/cache/conftool/dbconfig/20221124-231011-marostegui.json
[23:12:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P41050 and previous config saved to /var/cache/conftool/dbconfig/20221124-231221-marostegui.json
[23:14:29] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:14:31] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:15:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:15:46] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:17:17] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:17:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:22:58] <wikibugs>	 (PS2) Andrea Denisse: admin: add dpujol to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/854952 (https://phabricator.wikimedia.org/T322670) (owner: Filippo Giunchedi)
[23:23:36] <wikibugs>	 (CR) Andrea Denisse: [C: +2] admin: add dpujol to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/854952 (https://phabricator.wikimedia.org/T322670) (owner: Filippo Giunchedi)
[23:23:41] <wikibugs>	 (CR) Andrea Denisse: [V: +2 C: +2] admin: add dpujol to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/854952 (https://phabricator.wikimedia.org/T322670) (owner: Filippo Giunchedi)
[23:25:38] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for David.pujol - https://phabricator.wikimedia.org/T322670 (andrea.denisse) In progress→Resolved Hello, @David.pujol should have access now. Please let me know if there's anything else I could help w...
[23:26:49] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Dasm - https://phabricator.wikimedia.org/T322591 (andrea.denisse) Hi @Htriedman and @Jcross , friendly ping to confirm @dasm 's access expiry date. :)
[23:27:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P41051 and previous config saved to /var/cache/conftool/dbconfig/20221124-232728-marostegui.json
[23:29:32] <wikibugs>	 (CR) Andrea Denisse: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/860522 (https://phabricator.wikimedia.org/T318903) (owner: Filippo Giunchedi)
[23:29:53] <wikibugs>	 (CR) Andrea Denisse: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/860521 (https://phabricator.wikimedia.org/T318903) (owner: Filippo Giunchedi)
[23:31:00] <wikibugs>	 (CR) Andrea Denisse: o11y: more lenient logstash kafka consumer lag (1 comment) [alerts] - https://gerrit.wikimedia.org/r/860609 (owner: Filippo Giunchedi)
[23:32:17] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Wenjun Fan - https://phabricator.wikimedia.org/T319056 (andrea.denisse)
[23:33:12] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:33:15] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:33:50] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:33:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[23:36:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P41052 and previous config saved to /var/cache/conftool/dbconfig/20221124-233604-ladsgroup.json
[23:42:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P41053 and previous config saved to /var/cache/conftool/dbconfig/20221124-234234-marostegui.json
[23:51:10] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P41054 and previous config saved to /var/cache/conftool/dbconfig/20221124-235109-ladsgroup.json
[23:57:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P41055 and previous config saved to /var/cache/conftool/dbconfig/20221124-235741-marostegui.json
[23:57:43] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
[23:57:48] <stashbot>	 T321126: Add column 'cul_actor' and index cul_actor_time to cu_log on wmf wikis - https://phabricator.wikimedia.org/T321126
[23:57:57] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
[23:58:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2168:3318 (T321126)', diff saved to https://phabricator.wikimedia.org/P41056 and previous config saved to /var/cache/conftool/dbconfig/20221124-235803-marostegui.json