[00:00:49] <wikibugs>	 (CR) Dzahn: [C: +2] "parameter 'ip_families' index 0 expects a match for Enum['ip4', 'ip6'], got 'ipv4'" [puppet] - https://gerrit.wikimedia.org/r/884396 (https://phabricator.wikimedia.org/T327974) (owner: Dzahn)
[00:02:06] <wikibugs>	 (PS1) Dzahn: etherpad: fix ip_family name, ip4 not ipv4 [puppet] - https://gerrit.wikimedia.org/r/885057 (https://phabricator.wikimedia.org/T327974)
[00:02:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: (2) Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[00:02:27] <wikibugs>	 (CR) Dzahn: [C: +2] etherpad: fix ip_family name, ip4 not ipv4 [puppet] - https://gerrit.wikimedia.org/r/885057 (https://phabricator.wikimedia.org/T327974) (owner: Dzahn)
[00:04:30] <wikibugs>	 (PS1) Zabe: Set 'groupLoadsBySection' for s11 [mediawiki-config] - https://gerrit.wikimedia.org/r/885058 (https://phabricator.wikimedia.org/T326980)
[00:06:10] <logmsgbot>	 !log brett@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
[00:06:53] <wikibugs>	 (PS2) Zabe: Set 'groupLoadsBySection' for s11 [mediawiki-config] - https://gerrit.wikimedia.org/r/885058 (https://phabricator.wikimedia.org/T326980)
[00:09:15] <logmsgbot>	 !log brett@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
[00:11:22] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s3 on db1102 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 676.71 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[00:14:27] <mutante>	 !log etherpad - maintenance downtime for about 5 minutes to test monitoring 
[00:14:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:17:18] <icinga-wm>	 PROBLEM - etherpad.wikimedia.org HTTP on etherpad1003 is CRITICAL: connect to address 10.64.32.181 and port 9001: Connection refused https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org
[00:18:06] <mutante>	 well, yea, that's the icinga alert
[00:18:14] <mutante>	 but I want to know if the alertmanager alert works
[00:18:39] <mutante>	 and I don't see any of that
[00:19:05] <mutante>	 and unlike icinga you can't actually see alerts that are not alerting... so I don't know how to confirm it works
[00:19:13] <mutante>	 or doesn't work
[00:19:48] <rzl>	 I do see an etherpad alert at the top of https://alerts.wikimedia.org/
[00:20:15] <mutante>	 yea, but that's the wrong team 
[00:20:21] <mutante>	 not the one I added ..hmm
[00:20:35] <mutante>	 and where did that actually alert
[00:20:56] <mutante>	 wait.. maybe it is that one , heh
[00:20:57] <rzl>	 oh, there's a ProbeDown alert further down the page with team: serviceops-collab
[00:21:17] <rzl>	 I had to search by "etherpad" to narrow the field enough to see it, but there it is
[00:21:42] <mutante>	 rzl: yea, that's the one. and I did get email now. thanks
[00:21:53] * mutante turns Etherpad back on :)
[00:22:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: (2) Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[00:22:34] <icinga-wm>	 RECOVERY - etherpad.wikimedia.org HTTP on etherpad1003 is OK: HTTP OK: HTTP/1.1 200 OK - 6448 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org
[00:23:33] <mutante>	 rzl: now I just need to adjust my "team" actions to make it create tickets :) cool
[00:23:47] <rzl>	 nice
[00:25:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:25:59] <mutante>	 you can also easily link to just "all alerts for team X"
[00:30:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:38:09] <wikibugs>	 (PS7) Urbanecm: Allow AbuseFilter to block IPs and users on itwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) (owner: Superpes15)
[00:38:38] <wikibugs>	 (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/884333 (https://phabricator.wikimedia.org/T328194) (owner: Superpes15)
[00:40:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:42:37] <logmsgbot>	 !log brett@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS bullseye
[00:42:42] <wikibugs>	 SRE, Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin1001 for host cp5027.eqsin.wmnet with OS bullseye completed: - cp5027 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[00:45:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:50:04] <logmsgbot>	 !log brett@cumin1001 conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet
[00:50:32] <wikibugs>	 SRE, Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (BCornwall)
[00:53:59] <wikibugs>	 SRE, Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh) For posterity, the versions of the iDRAC and the NIC firmware that we are looking for for the cp hosts bullseye upgrade and that we pass to the firmware cookbook/upload on the HTTP management interface:...
[01:18:50] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: m1 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 872.65 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:22:24] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: m1 on db2160 is OK: OK slave_sql_lag Replication lag: 0.02 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:31:57] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3053.esams.wmnet']
[01:32:09] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3053.esams.wmnet']
[01:35:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:37:55] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
[01:38:01] <wikibugs>	 SRE, Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp3053.esams.wmnet with OS bullseye
[01:40:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:59:02] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
[02:02:16] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
[02:04:01] <wikibugs>	 SRE, Wikimedia-Mailing-lists: Upgrade lists.wikimedia.org to next Mailman/hyperkitty/postorius versions - https://phabricator.wikimedia.org/T286217 (Legoktm) Our current Mailman deployment is a bunch of backported and forked debs with random patches thrown on top based on what we managed to fix upstream....
[02:05:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:10:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:14:33] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s3 on db1102 is OK: OK slave_sql_lag Replication lag: 0.46 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:19:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:20:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:25:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:28:55] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS bullseye
[02:29:01] <wikibugs>	 SRE, Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp3053.esams.wmnet with OS bullseye completed: - cp3053 (**WARN**)   - Removed from Puppet and PuppetDB if present   -...
[02:35:45] <jinxer-wm>	 (JobUnavailable) firing: (4) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:43:55] <wikibugs>	 SRE, Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ssingh)
[02:43:59] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=cdn
[02:43:59] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=ats-be
[03:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T0300)
[03:06:15] <wikibugs>	 (PS1) Krinkle: multiversion: Create dblist-manage command for easy add/delete [mediawiki-config] - https://gerrit.wikimedia.org/r/885064 (https://phabricator.wikimedia.org/T308932)
[03:06:16] <wikibugs>	 (PS1) Krinkle: logos: Exclude logos/index.html from Git [mediawiki-config] - https://gerrit.wikimedia.org/r/885065
[03:06:19] <wikibugs>	 (PS1) Krinkle: multiversion: Remove getCachableMWConfig in favour of getConfigGlobals [mediawiki-config] - https://gerrit.wikimedia.org/r/885066 (https://phabricator.wikimedia.org/T308932)
[03:06:28] <wikibugs>	 (CR) CI reject: [V: -1] multiversion: Remove getCachableMWConfig in favour of getConfigGlobals [mediawiki-config] - https://gerrit.wikimedia.org/r/885066 (https://phabricator.wikimedia.org/T308932) (owner: Krinkle)
[03:07:36] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.40.0-wmf.21 [core] (wmf/1.40.0-wmf.21) - https://gerrit.wikimedia.org/r/885010 (https://phabricator.wikimedia.org/T325584)
[03:07:39] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/1.40.0-wmf.21 [core] (wmf/1.40.0-wmf.21) - https://gerrit.wikimedia.org/r/885010 (https://phabricator.wikimedia.org/T325584) (owner: TrainBranchBot)
[03:08:16] <wikibugs>	 (PS2) Krinkle: multiversion: Create dblist-manage command for easy add/delete [mediawiki-config] - https://gerrit.wikimedia.org/r/885064 (https://phabricator.wikimedia.org/T308932)
[03:08:18] <wikibugs>	 (PS2) Krinkle: logos: Exclude logos/index.html from Git [mediawiki-config] - https://gerrit.wikimedia.org/r/885065
[03:08:20] <wikibugs>	 (PS2) Krinkle: multiversion: Remove getCachableMWConfig in favour of getConfigGlobals [mediawiki-config] - https://gerrit.wikimedia.org/r/885066 (https://phabricator.wikimedia.org/T308932)
[03:24:27] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.40.0-wmf.21 [core] (wmf/1.40.0-wmf.21) - https://gerrit.wikimedia.org/r/885010 (https://phabricator.wikimedia.org/T325584) (owner: TrainBranchBot)
[03:25:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:35:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:49:29] <icinga-wm>	 RECOVERY - dump of matomo in eqiad on backupmon1001 is OK: Last dump for matomo at eqiad (db1108) taken on 2023-01-31 03:47:03 (281 MiB, +4.1 %) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[04:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T0400)
[04:01:23] <wikibugs>	 (PS1) TrainBranchBot: testwikis wikis to 1.40.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/885068 (https://phabricator.wikimedia.org/T325584)
[04:01:29] <wikibugs>	 (CR) TrainBranchBot: [C: +2] testwikis wikis to 1.40.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/885068 (https://phabricator.wikimedia.org/T325584) (owner: TrainBranchBot)
[04:02:02] <wikibugs>	 (Merged) jenkins-bot: testwikis wikis to 1.40.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/885068 (https://phabricator.wikimedia.org/T325584) (owner: TrainBranchBot)
[04:02:30] <logmsgbot>	 !log mwpresync@deploy1002 Started scap: testwikis wikis to 1.40.0-wmf.21  refs T325584
[04:02:56] <stashbot>	 T325584: 1.40.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T325584
[04:20:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:22:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[04:30:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:35:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:55:26] <logmsgbot>	 !log mwpresync@deploy1002 Finished scap: testwikis wikis to 1.40.0-wmf.21  refs T325584 (duration: 52m 56s)
[04:56:05] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[04:56:19] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
[04:57:43] <logmsgbot>	 !log mwpresync@deploy1002 Pruned MediaWiki: 1.40.0-wmf.19 (duration: 02m 15s)
[04:58:26] <wikibugs>	 SRE-OnFire, Sustainability (Incident Followup): 2023-01-10 eqsin network outage - https://phabricator.wikimedia.org/T328354 (andrea.denisse)
[05:01:35] <wikibugs>	 (CR) Ladsgroup: [C: +1] "I will deploy it a bit later today" [mediawiki-config] - https://gerrit.wikimedia.org/r/885058 (https://phabricator.wikimedia.org/T326980) (owner: Zabe)
[05:05:37] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 117 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:05:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:10:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:20:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:25:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:41:11] <wikibugs>	 (CR) Ladsgroup: [C: +1] "LGTM https://integration.wikimedia.org/ci/job/operations-mw-config-php74-composer-diffConfig-docker/1466/console" [mediawiki-config] - https://gerrit.wikimedia.org/r/885046 (https://phabricator.wikimedia.org/T299612) (owner: Sbailey)
[06:15:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:19:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:20:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:32:00] <wikibugs>	 (PS1) Muehlenhoff: Apply role::installserver to install2004 [puppet] - https://gerrit.wikimedia.org/r/885246 (https://phabricator.wikimedia.org/T327867)
[06:45:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:50:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:52:47] <marostegui>	 !log dbmaint Schema change on s8 eqiad T328373 
[06:52:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:00] <marostegui>	 !log dbmaint Schema change on s4 eqiad T328373 
[06:53:56] <marostegui>	 !log dbmaint Schema change on s6 eqiad T328373 
[06:54:53] <marostegui>	 !log dbmaint Schema change on s2 eqiad T328373 
[06:59:24] <marostegui>	 !log dbmaint Schema change on s7 eqiad T328373 
[06:59:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T0700)
[07:00:04] <jouncebot>	 kormat, marostegui, and Amir1: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T0700).
[07:00:05] <wikibugs>	 (PS1) Muehlenhoff: New stub keytabs for the new install servers [labs/private] - https://gerrit.wikimedia.org/r/885263 (https://phabricator.wikimedia.org/T327867)
[07:00:14] <Amir1>	 nothing for today
[07:02:45] <wikibugs>	 (PS1) Marostegui: db1195: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/885264 (https://phabricator.wikimedia.org/T328253)
[07:03:05] <marostegui>	 !log dbmaint Schema change on s5 eqiad T328373 
[07:03:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:03:11] <wikibugs>	 (CR) Marostegui: [C: +2] db1195: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/885264 (https://phabricator.wikimedia.org/T328253) (owner: Marostegui)
[07:04:13] <stashbot>	 T328373: Drop default value from cul_actor on wmf wikis - https://phabricator.wikimedia.org/T328373
[07:06:21] <wikibugs>	 (PS1) Marostegui: mariadb: Promote db1195 to m2 master [puppet] - https://gerrit.wikimedia.org/r/885265 (https://phabricator.wikimedia.org/T328253)
[07:06:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
[07:06:42] <stashbot>	 T328253: Switchover m2 master db1164 -> db1195 - https://phabricator.wikimedia.org/T328253
[07:06:44] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
[07:07:46] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Promote db1195 to m2 master [puppet] - https://gerrit.wikimedia.org/r/885265 (https://phabricator.wikimedia.org/T328253) (owner: Marostegui)
[07:10:23] <marostegui>	 !log Failover m2 from db1164 to db1195 - T328253
[07:10:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:15:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:16:28] <wikibugs>	 (PS1) Marostegui: db1164: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/885268 (https://phabricator.wikimedia.org/T328402)
[07:16:52] <wikibugs>	 (CR) Marostegui: [C: +2] db1164: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/885268 (https://phabricator.wikimedia.org/T328402) (owner: Marostegui)
[07:22:39] <wikibugs>	 (CR) Muehlenhoff: [V: +2 C: +2] New stub keytabs for the new install servers [labs/private] - https://gerrit.wikimedia.org/r/885263 (https://phabricator.wikimedia.org/T327867) (owner: Muehlenhoff)
[07:22:42] <marostegui>	 !log dbmaint Schema change on s1 eqiad T328373 
[07:22:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:46] <stashbot>	 T328373: Drop default value from cul_actor on wmf wikis - https://phabricator.wikimedia.org/T328373
[07:22:49] <marostegui>	 !log dbmaint Schema change on s3 eqiad T328373 
[07:22:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:47] <wikibugs>	 (PS1) Marostegui: mariadb: Move db1164 to m3 [puppet] - https://gerrit.wikimedia.org/r/885269 (https://phabricator.wikimedia.org/T328402)
[07:31:30] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Move db1164 to m3 [puppet] - https://gerrit.wikimedia.org/r/885269 (https://phabricator.wikimedia.org/T328402) (owner: Marostegui)
[07:32:04] <marostegui>	 moritzm: there are pending puppet changes from you
[07:35:10] <moritzm>	 ah, right. forgot about merging the labs-private ones, fixing that now
[07:35:17] <moritzm>	 done
[07:37:11] <marostegui>	 thanks!
[07:40:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:45:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:50:10] <wikibugs>	 (CR) Giuseppe Lavagetto: sre-mediawiki: add mean latency alerts (1 comment) [alerts] - https://gerrit.wikimedia.org/r/883502 (https://phabricator.wikimedia.org/T326544) (owner: Giuseppe Lavagetto)
[07:50:29] <wikibugs>	 (PS4) Giuseppe Lavagetto: sre-mediawiki: add mean latency alerts [alerts] - https://gerrit.wikimedia.org/r/883502 (https://phabricator.wikimedia.org/T326544)
[07:50:31] <wikibugs>	 (PS3) Giuseppe Lavagetto: sre-mediawiki: port the other prometheus-based alerts [alerts] - https://gerrit.wikimedia.org/r/883950
[07:50:33] <wikibugs>	 SRE, Infrastructure-Foundations, LDAP, Patch-For-Review: Retire ldap-corp cluster - https://phabricator.wikimedia.org/T323820 (MoritzMuehlenhoff) I've synched up with ITS, they will shut down the ldap1.corp.wikimedia.org server that we synched against next calendar year.
[07:55:09] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] sre-mediawiki: add mean latency alerts [alerts] - https://gerrit.wikimedia.org/r/883502 (https://phabricator.wikimedia.org/T326544) (owner: Giuseppe Lavagetto)
[07:56:19] <wikibugs>	 (Merged) jenkins-bot: sre-mediawiki: add mean latency alerts [alerts] - https://gerrit.wikimedia.org/r/883502 (https://phabricator.wikimedia.org/T326544) (owner: Giuseppe Lavagetto)
[07:56:38] <moritzm>	 !log installing bash bugfix updates from Bullseye point release
[07:56:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:56:44] <wikibugs>	 (CR) Jelto: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/884905 (https://phabricator.wikimedia.org/T327664) (owner: JMeybohm)
[08:00:05] <jouncebot>	 Amir1 and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T0800).
[08:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:00:16] <Amir1>	 awesome
[08:04:47] <wikibugs>	 (PS11) Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - https://gerrit.wikimedia.org/r/884881
[08:05:01] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] sre-mediawiki: port the other prometheus-based alerts [alerts] - https://gerrit.wikimedia.org/r/883950 (owner: Giuseppe Lavagetto)
[08:05:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:06:08] <wikibugs>	 (Merged) jenkins-bot: sre-mediawiki: port the other prometheus-based alerts [alerts] - https://gerrit.wikimedia.org/r/883950 (owner: Giuseppe Lavagetto)
[08:06:51] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39331/console" [puppet] - https://gerrit.wikimedia.org/r/884881 (owner: Slyngshede)
[08:10:06] <wikibugs>	 (PS4) Phedenskog: Remove unused eventlogging_RUMSpeedIndex stream [mediawiki-config] - https://gerrit.wikimedia.org/r/726854 (https://phabricator.wikimedia.org/T286700)
[08:10:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job jmx_puppetdb in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:11:19] <wikibugs>	 (PS12) Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - https://gerrit.wikimedia.org/r/884881
[08:13:22] <wikibugs>	 (CR) Slyngshede: [V: +1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39332/console" [puppet] - https://gerrit.wikimedia.org/r/884881 (owner: Slyngshede)
[08:15:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:22:17] <wikibugs>	 10SRE, 10Infrastructure Security, 10observability: Grafana: CVE-2022-39324 CVE-2022-23552 - https://phabricator.wikimedia.org/T328405 (10MoritzMuehlenhoff)
[08:22:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[08:36:56] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
[08:39:09] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
[08:41:35] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] exim: Remove leftovers of ldap-corp setup [puppet] - 10https://gerrit.wikimedia.org/r/884282 (https://phabricator.wikimedia.org/T323820) (owner: 10Muehlenhoff)
[08:42:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (PATCH inferenceservices) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlstaging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:43:39] <wikibugs>	 (03PS1) 10Ilias Sarantopoulos: ci: add pre-commit hooks [software/httpbb] - 10https://gerrit.wikimedia.org/r/885273
[08:43:52] <wikibugs>	 10SRE, 10Infrastructure Security, 10observability: Grafana: CVE-2022-39324 CVE-2022-23552 - https://phabricator.wikimedia.org/T328405 (10fgiunchedi) Upgrading SGTM, I don't see 8.5.16 on apt.grafana.org yet though:  https://apt.grafana.com/dists/stable/main/binary-amd64/Packages.gz
[08:45:29] <elukey>	 !log restore previously removed password for keystore to kafka-logging clusters
[08:45:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (PATCH inferenceservices) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlstaging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:48:49] <wikibugs>	 (03PS1) 10Zabe: Stop writing to cuc_user and cuc_user_text in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885274 (https://phabricator.wikimedia.org/T233004)
[08:49:19] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[08:50:13] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Stop writing to cuc_user and cuc_user_text in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885274 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[08:50:21] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by zabe@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885274 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[08:51:06] <wikibugs>	 (03Merged) 10jenkins-bot: Stop writing to cuc_user and cuc_user_text in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885274 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[08:51:52] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:885274|Stop writing to cuc_user and cuc_user_text in testwiki (T233004)]]
[08:51:57] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[08:53:55] <logmsgbot>	 !log zabe@deploy1002 zabe: Backport for [[gerrit:885274|Stop writing to cuc_user and cuc_user_text in testwiki (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
[08:54:19] <elukey>	 !log roll restart kafka on kafka-logging1001 to pick up new pki certs
[08:54:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:32] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
[09:00:01] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
[09:00:03] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:885274|Stop writing to cuc_user and cuc_user_text in testwiki (T233004)]] (duration: 08m 11s)
[09:00:08] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[09:00:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:01:56] <wikibugs>	 10SRE, 10Infrastructure Security, 10observability: Grafana: CVE-2022-39324 CVE-2022-23552 - https://phabricator.wikimedia.org/T328405 (10fgiunchedi) Opened an issue with upstream re: apt repo update https://github.com/grafana/grafana/issues/62544
[09:05:11] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
[09:06:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (PATCH inferenceservices) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlstaging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:07:19] <wikibugs>	 (03PS2) 10Muehlenhoff: Apply role::installserver to install2004 [puppet] - 10https://gerrit.wikimedia.org/r/885246 (https://phabricator.wikimedia.org/T327867)
[09:09:46] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (10Vgutierrez) >>! In T315676#8572237, @Jcross wrote: > Hi @BBlack and @Vgutierrez - could you please provide an update or some guidance around your expected timeline for this? Please let us...
[09:10:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:10:47] <wikibugs>	 10SRE, 10serviceops: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (10Lucas_Werkmeister_WMDE) Good to know, thanks!
[09:11:47] <wikibugs>	 10SRE, 10Wikidata, 10serviceops, 10wdwb-tech: Migrate wikibase/termbox to newer Node.js version - https://phabricator.wikimedia.org/T328295 (10Lucas_Werkmeister_WMDE)
[09:11:54] <wikibugs>	 10SRE, 10Wikidata, 10serviceops, 10wdwb-tech: Migrate wikibase/termbox to newer Node.js version - https://phabricator.wikimedia.org/T328295 (10Lucas_Werkmeister_WMDE)
[09:11:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (PATCH inferenceservices) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlstaging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:20:10] <wikibugs>	 (03PS1) 10Marostegui: db2093: Install MariaDB 10.6 [puppet] - 10https://gerrit.wikimedia.org/r/885278 (https://phabricator.wikimedia.org/T328408)
[09:20:43] <marostegui>	 !log dbmaint Install MariaDB 10.6 on db2093 (db_inventory) T328408
[09:20:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:48] <stashbot>	 T328408: Migrate db_inventory section to MariaDB 10.6 - https://phabricator.wikimedia.org/T328408
[09:20:50] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db2093: Install MariaDB 10.6 [puppet] - 10https://gerrit.wikimedia.org/r/885278 (https://phabricator.wikimedia.org/T328408) (owner: 10Marostegui)
[09:25:12] <wikibugs>	 (03PS14) 10Vgutierrez: varnish: Generate a DP subkey daily [puppet] - 10https://gerrit.wikimedia.org/r/857748 (https://phabricator.wikimedia.org/T315676)
[09:28:56] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Apply role::installserver to install2004 [puppet] - 10https://gerrit.wikimedia.org/r/885246 (https://phabricator.wikimedia.org/T327867) (owner: 10Muehlenhoff)
[09:45:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:50:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:53:42] <wikibugs>	 (03PS2) 10JMeybohm: Switch staging.svc.eqiad.wmnet to point to codfw k8s [dns] - 10https://gerrit.wikimedia.org/r/884900 (https://phabricator.wikimedia.org/T327664)
[10:00:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:03:23] <wikibugs>	 (03PS1) 10Muehlenhoff: Move webproxy.codfw.wmnet to install2004 [dns] - 10https://gerrit.wikimedia.org/r/885285 (https://phabricator.wikimedia.org/T327867)
[10:05:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:12:51] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "we could have the same in jobs-api" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868791 (owner: 10Majavah)
[10:13:45] <wikibugs>	 (03Merged) 10jenkins-bot: add unit tests for parse_quantity [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868791 (owner: 10Majavah)
[10:18:14] <jayme>	 !log switching active kubernetes staging cluster from eqiad to codfw - T327664
[10:18:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:19] <stashbot>	 T327664: Update staging-eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T327664
[10:19:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:21:01] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Switch staging.svc.eqiad.wmnet to point to codfw k8s [dns] - 10https://gerrit.wikimedia.org/r/884900 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[10:21:59] <wikibugs>	 (03CR) 10EoghanGaffney: [C: 03+2] Send rsyslog output for vrts apache logs to kafka/logstash [puppet] - 10https://gerrit.wikimedia.org/r/884909 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[10:23:22] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1 C: 03+2] Switch the active staging cluster to codfw [puppet] - 10https://gerrit.wikimedia.org/r/884905 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[10:23:28] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Drop profile::ci::kubernetes_config [puppet] - 10https://gerrit.wikimedia.org/r/884915 (owner: 10JMeybohm)
[10:27:37] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "Thanks. change LGTM. Minor stuff inline." [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868790 (https://phabricator.wikimedia.org/T277495) (owner: 10Majavah)
[10:37:48] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: icinga: remove mediawiki alerts [puppet] - 10https://gerrit.wikimedia.org/r/885288
[10:38:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:40:29] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: Fix PHP string interpolation [puppet] - 10https://gerrit.wikimedia.org/r/868528 (https://phabricator.wikimedia.org/T314096) (owner: 10Reedy)
[10:40:31] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: nagios: remove obsolete command check_all_memcached.php [puppet] - 10https://gerrit.wikimedia.org/r/885289
[10:43:27] <wikibugs>	 (03PS1) 10Jelto: sre.gitlab.upgrade: fix location of gitlab version-manifest.json [cookbooks] - 10https://gerrit.wikimedia.org/r/885291 (https://phabricator.wikimedia.org/T323569)
[10:45:55] <jinxer-wm>	 (LogstashIngestSpike) firing: Logstash rate of ingestion percent change compared to yesterday - https://phabricator.wikimedia.org/T202307 - https://grafana.wikimedia.org/d/000000561/logstash?orgId=1&panelId=2&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashIngestSpike
[10:46:38] <wikibugs>	 10SRE-OnFire, 10Maps (Kartotherian), 10Sustainability (Incident Followup), 10Technical-Debt: Kartotherian configuration should be deployable to all production envs at once - https://phabricator.wikimedia.org/T328406 (10awight)
[10:50:32] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "Ship it!" [puppet] - 10https://gerrit.wikimedia.org/r/885288 (owner: 10Giuseppe Lavagetto)
[10:50:55] <jinxer-wm>	 (LogstashIngestSpike) firing: (2) Logstash rate of ingestion percent change compared to yesterday - https://phabricator.wikimedia.org/T202307  - https://alerts.wikimedia.org/?q=alertname%3DLogstashIngestSpike
[10:52:08] <wikibugs>	 (03PS1) 10Filippo Giunchedi: sre: cosmetic-only changes for mw alerts [alerts] - 10https://gerrit.wikimedia.org/r/885293
[10:53:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:55:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:56:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[10:57:45] <logmsgbot>	 !log jayme@cumin1001 conftool action : set/pooled=true; selector: name=codfw,dnsdisc=k8s-ingress-staging
[10:57:46] <logmsgbot>	 !log jayme@cumin1001 conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=k8s-ingress-staging
[10:57:50] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39334/console" [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[10:59:28] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] sre: cosmetic-only changes for mw alerts [alerts] - 10https://gerrit.wikimedia.org/r/885293 (owner: 10Filippo Giunchedi)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1100)
[11:00:43] <wikibugs>	 (03PS2) 10Muehlenhoff: Move webproxy.codfw.wmnet to install2004 [dns] - 10https://gerrit.wikimedia.org/r/885285 (https://phabricator.wikimedia.org/T327867)
[11:00:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:00:55] <jinxer-wm>	 (LogstashIngestSpike) resolved: (2) Logstash rate of ingestion percent change compared to yesterday - https://phabricator.wikimedia.org/T202307  - https://alerts.wikimedia.org/?q=alertname%3DLogstashIngestSpike
[11:01:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad&var-datasource=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:06:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Move webproxy.codfw.wmnet to install2004 [dns] - 10https://gerrit.wikimedia.org/r/885285 (https://phabricator.wikimedia.org/T327867) (owner: 10Muehlenhoff)
[11:07:36] <wikibugs>	 (03PS31) 10Vgutierrez: Varnish analytics: support differential privacy [puppet] - 10https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: 10Isaac Johnson)
[11:07:40] <wikibugs>	 10ops-codfw: Inbound interface errors - https://phabricator.wikimedia.org/T328420 (10phaultfinder)
[11:12:37] <wikibugs>	 (03PS3) 10Ilias Sarantopoulos: feat: add json payload capability [software/httpbb] - 10https://gerrit.wikimedia.org/r/884920 (https://phabricator.wikimedia.org/T328280)
[11:14:27] <wikibugs>	 (03CR) 10Jelto: "This change is ready for review." [cookbooks] - 10https://gerrit.wikimedia.org/r/885291 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[11:21:27] <moritzm>	 !log installing bind9 security updates (client-side tools/libs only)
[11:21:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:58] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/885291 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[11:25:28] <wikibugs>	 (03PS4) 10Ilias Sarantopoulos: feat: add json payload capability [software/httpbb] - 10https://gerrit.wikimedia.org/r/884920 (https://phabricator.wikimedia.org/T328280)
[11:25:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:26:19] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/884998 (owner: 10Jbond)
[11:30:05] <wikibugs>	 (03PS1) 10EoghanGaffney: Add /var/log/mail.{log,info,err,warn} to rsyslog [puppet] - 10https://gerrit.wikimedia.org/r/885294 (https://phabricator.wikimedia.org/T321760)
[11:30:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:32:16] <wikibugs>	 (03CR) 10Volans: "did just a super quick pass, I'll redo a full pass once CI passes too" [cookbooks] - 10https://gerrit.wikimedia.org/r/884996 (owner: 10Jbond)
[11:32:50] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] reposync: switch from copy_tree to copytree [software/spicerack] - 10https://gerrit.wikimedia.org/r/884998 (owner: 10Jbond)
[11:35:15] <wikibugs>	 (03PS5) 10Ilias Sarantopoulos: feat: add json payload capability [software/httpbb] - 10https://gerrit.wikimedia.org/r/884920 (https://phabricator.wikimedia.org/T328280)
[11:36:19] <wikibugs>	 (03PS1) 10JMeybohm: Update staging-codfw to k8s 1.23 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885297 (https://phabricator.wikimedia.org/T327664)
[11:36:21] <wikibugs>	 (03PS6) 10Ilias Sarantopoulos: feat: add json payload capability [software/httpbb] - 10https://gerrit.wikimedia.org/r/884920 (https://phabricator.wikimedia.org/T328280)
[11:36:36] <wikibugs>	 (03Merged) 10jenkins-bot: reposync: switch from copy_tree to copytree [software/spicerack] - 10https://gerrit.wikimedia.org/r/884998 (owner: 10Jbond)
[11:37:34] <wikibugs>	 (03CR) 10Volans: "The change makes sense to me, but it would be nice to know that it makes sense also based on other redfish implementations." [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[11:38:53] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] sre.gitlab.upgrade: get gitlab version from API [cookbooks] - 10https://gerrit.wikimedia.org/r/885291 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[11:39:02] <wikibugs>	 (03PS5) 10Majavah: kubernetes: Apply resource changes on restart [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868790 (https://phabricator.wikimedia.org/T277495)
[11:39:15] <wikibugs>	 (03CR) 10Majavah: kubernetes: Apply resource changes on restart (032 comments) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868790 (https://phabricator.wikimedia.org/T277495) (owner: 10Majavah)
[11:39:18] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: feat: add json payload capability (033 comments) [software/httpbb] - 10https://gerrit.wikimedia.org/r/884920 (https://phabricator.wikimedia.org/T328280) (owner: 10Ilias Sarantopoulos)
[11:39:47] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] kubernetes: Apply resource changes on restart [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868790 (https://phabricator.wikimedia.org/T277495) (owner: 10Majavah)
[11:40:31] <wikibugs>	 (03CR) 10Volans: "LGTM, question and nit inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[11:40:41] <wikibugs>	 (03Merged) 10jenkins-bot: sre.gitlab.upgrade: get gitlab version from API [cookbooks] - 10https://gerrit.wikimedia.org/r/885291 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[11:40:43] <wikibugs>	 (03PS15) 10Vgutierrez: varnish: Generate a DP subkey daily [puppet] - 10https://gerrit.wikimedia.org/r/857748 (https://phabricator.wikimedia.org/T315676)
[11:41:13] <wikibugs>	 (03PS6) 10Majavah: kubernetes: Apply resource changes on restart [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868790 (https://phabricator.wikimedia.org/T277495)
[11:41:15] <wikibugs>	 (03PS13) 10Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881
[11:41:27] <wikibugs>	 (03CR) 10Volans: redfish: Move dell specific functionality to dell class (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[11:41:36] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[11:42:37] <wikibugs>	 (03PS32) 10Vgutierrez: Varnish analytics: support differential privacy [puppet] - 10https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: 10Isaac Johnson)
[11:42:54] <wikibugs>	 (03CR) 10Volans: "LGTM, some lines are reported as untested" [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[11:44:08] <wikibugs>	 (03PS33) 10Vgutierrez: Varnish analytics: support differential privacy [puppet] - 10https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: 10Isaac Johnson)
[11:44:15] <wikibugs>	 (03CR) 10Vgutierrez: Varnish analytics: support differential privacy (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: 10Isaac Johnson)
[11:45:20] <wikibugs>	 (03PS1) 10Jbond: add profile::idm::server::oidc_secret [labs/private] - 10https://gerrit.wikimedia.org/r/885300
[11:45:34] <wikibugs>	 (03PS16) 10Vgutierrez: varnish: Generate a DP subkey daily [puppet] - 10https://gerrit.wikimedia.org/r/857748 (https://phabricator.wikimedia.org/T315676)
[11:45:54] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] add profile::idm::server::oidc_secret [labs/private] - 10https://gerrit.wikimedia.org/r/885300 (owner: 10Jbond)
[11:48:11] <wikibugs>	 (03PS17) 10Vgutierrez: varnish: Generate a DP subkey daily [puppet] - 10https://gerrit.wikimedia.org/r/857748 (https://phabricator.wikimedia.org/T315676)
[11:50:44] <wikibugs>	 (03PS2) 10Jbond: rotate-snmp: convert to cookbook classes and use secrets for passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/884996
[11:50:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:50:55] <logmsgbot>	 !log jgiannelos@deploy1002 Started deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad
[11:51:31] <logmsgbot>	 !log jgiannelos@deploy1002 Finished deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
[11:54:55] <icinga-wm>	 PROBLEM - kartotherian endpoints health on maps1010 is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting
[11:54:55] <icinga-wm>	 /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /geoline?getgeojso
[11:54:55] <icinga-wm>	 {ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[11:55:07] <icinga-wm>	 PROBLEM - kartotherian endpoints health on maps1008 is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting
[11:55:07] <icinga-wm>	 /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /geoline?getgeojso
[11:55:07] <icinga-wm>	 {ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[11:55:11] <icinga-wm>	 PROBLEM - kartotherian endpoints health on maps1006 is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting
[11:55:11] <icinga-wm>	 /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /geoline?getgeojso
[11:55:11] <icinga-wm>	 {ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[11:55:39] <icinga-wm>	 PROBLEM - kartotherian endpoints health on maps1005 is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting
[11:55:39] <icinga-wm>	 /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /geoline?getgeojso
[11:55:39] <icinga-wm>	 {ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[11:55:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:55:48] <wikibugs>	 (03PS1) 10Slyngshede: Rename profile::idm::server::oidc_secret variable [labs/private] - 10https://gerrit.wikimedia.org/r/885301
[11:56:09] <icinga-wm>	 PROBLEM - kartotherian endpoints health on maps1007 is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting
[11:56:09] <icinga-wm>	 /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /geoline?getgeojso
[11:56:09] <icinga-wm>	 {ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[11:56:09] <icinga-wm>	 PROBLEM - Kartotherian LVS eqiad on kartotherian.svc.eqiad.wmnet is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 4
[11:56:10] <icinga-wm>	 cting: 200): /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /geol
[11:56:10] <icinga-wm>	 eojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geopoint?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[11:57:57] <Lucas_WMDE>	 kartotherian unhappy again? :/
[11:58:25] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] Rename profile::idm::server::oidc_secret variable [labs/private] - 10https://gerrit.wikimedia.org/r/885301 (owner: 10Slyngshede)
[11:59:02] <wikibugs>	 (03PS18) 10Vgutierrez: varnish: Generate a DP subkey daily [puppet] - 10https://gerrit.wikimedia.org/r/857748 (https://phabricator.wikimedia.org/T315676)
[11:59:41] <wikibugs>	 (03PS1) 10JMeybohm: k8s: Update staging-eqiad to kubernetes 1.23 [puppet] - 10https://gerrit.wikimedia.org/r/885302 (https://phabricator.wikimedia.org/T327664)
[12:00:32] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1 C: 03+2] Rename profile::idm::server::oidc_secret variable [labs/private] - 10https://gerrit.wikimedia.org/r/885301 (owner: 10Slyngshede)
[12:00:35] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+2 C: 03+2] Rename profile::idm::server::oidc_secret variable [labs/private] - 10https://gerrit.wikimedia.org/r/885301 (owner: 10Slyngshede)
[12:01:42] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39335/console" [puppet] - 10https://gerrit.wikimedia.org/r/885302 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[12:02:15] <wikibugs>	 (03PS2) 10JMeybohm: k8s: Update staging-eqiad to kubernetes 1.23 [puppet] - 10https://gerrit.wikimedia.org/r/885302 (https://phabricator.wikimedia.org/T327664)
[12:02:17] <wikibugs>	 (03PS1) 10JMeybohm: install_server: Update kubestagetcd1* to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/885303 (https://phabricator.wikimedia.org/T327664)
[12:04:20] <nemo-yiannis>	 Lucas_WMDE: yeah we tried to deploy what we reverted yesterday but it's only on eqiad so no production traffic is affected
[12:04:40] <nemo-yiannis>	 still doesn't look happy
[12:07:27] <Lucas_WMDE>	 ok, I see
[12:08:56] <wikibugs>	 (03Abandoned) 10Nikerabbit: Localisation updates from https://translatewiki.net. [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/883911 (owner: 10L10n-bot)
[12:11:45] <wikibugs>	 (03PS1) 10Muehlenhoff: Update cloudbastion rules for install2004 [puppet] - 10https://gerrit.wikimedia.org/r/885304 (https://phabricator.wikimedia.org/T327867)
[12:13:46] <wikibugs>	 (03PS14) 10Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881
[12:15:26] <wikibugs>	 (03PS1) 10Muehlenhoff: Stop DHCP on install2004 for now [puppet] - 10https://gerrit.wikimedia.org/r/885305
[12:15:29] <wikibugs>	 (03PS1) 10Muehlenhoff: Point to install2004 for DHCP in codfw [homer/public] - 10https://gerrit.wikimedia.org/r/885326 (https://phabricator.wikimedia.org/T327867)
[12:16:15] <wikibugs>	 10SRE, 10serviceops, 10CommRel-Specialists-Support (Jan-Mar-2023), 10Datacenter-Switchover: CommRel support for March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T328287 (10Elitre)
[12:20:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:22:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[12:23:04] <icinga-wm>	 ACKNOWLEDGEMENT - Kartotherian LVS eqiad on kartotherian.svc.eqiad.wmnet is CRITICAL: /{src}/{z}/{x}/{y}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 301 (expecting: 200): /{src}/{z}/{x}/{y}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /{src}/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected 
[12:23:04] <icinga-wm>	 04 (expecting: 200): /private-info/info.json (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 404 (expecting: 200
[12:23:04] <icinga-wm>	 ine?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geoshape?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200): /geopoint?getgeojson=1&ids={ids} (Untitled test) is CRITICAL: Test Untitled test returned the unexpected status 400 (expecting: 200) Effie Mouzeli devs are working on it https://wi
[12:23:04] <icinga-wm>	 ikimedia.org/wiki/Maps%23Kartotherian
[12:25:44] <wikibugs>	 (03PS2) 10Muehlenhoff: Stop DHCP on install2004 for now [puppet] - 10https://gerrit.wikimedia.org/r/885305
[12:25:51] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:28:17] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Stop DHCP on install2004 for now [puppet] - 10https://gerrit.wikimedia.org/r/885305 (owner: 10Muehlenhoff)
[12:34:16] <wikibugs>	 (03PS15) 10Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881
[12:36:04] <wikibugs>	 (03PS1) 10Muehlenhoff: Fix name [puppet] - 10https://gerrit.wikimedia.org/r/885331
[12:36:10] <logmsgbot>	 !log jgiannelos@deploy1002 Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
[12:36:45] <icinga-wm>	 RECOVERY - kartotherian endpoints health on maps1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[12:36:46] <logmsgbot>	 !log jgiannelos@deploy1002 Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
[12:37:06] <wikibugs>	 (03PS1) 10EoghanGaffney: Send exim mail.{log,info,warn,err} to kafka/logstash [puppet] - 10https://gerrit.wikimedia.org/r/885332 (https://phabricator.wikimedia.org/T321759)
[12:37:15] <icinga-wm>	 RECOVERY - Kartotherian LVS eqiad on kartotherian.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[12:37:15] <icinga-wm>	 RECOVERY - kartotherian endpoints health on maps1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[12:37:20] <wikibugs>	 (03PS16) 10Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881
[12:37:27] <nemo-yiannis>	 ^ FYI we reverted to the previous healthy state since we figured out the problem (csp issues, x-amples on swagger not working as expected) cc effie
[12:37:51] <icinga-wm>	 RECOVERY - kartotherian endpoints health on maps1010 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[12:38:01] <icinga-wm>	 RECOVERY - kartotherian endpoints health on maps1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[12:38:07] <icinga-wm>	 RECOVERY - kartotherian endpoints health on maps1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/kartotherian
[12:40:00] <wikibugs>	 (03PS17) 10Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881
[12:40:59] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39340/console" [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[12:43:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Fix name [puppet] - 10https://gerrit.wikimedia.org/r/885331 (owner: 10Muehlenhoff)
[12:45:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:46:34] <wikibugs>	 (03CR) 10Jelto: "looks mostly good, one comment on commit message" [puppet] - 10https://gerrit.wikimedia.org/r/885294 (https://phabricator.wikimedia.org/T321760) (owner: 10EoghanGaffney)
[12:47:37] <wikibugs>	 (03PS2) 10EoghanGaffney: Add /var/log/mail.{log,info,err,warn} to rsyslog [puppet] - 10https://gerrit.wikimedia.org/r/885294 (https://phabricator.wikimedia.org/T321759)
[12:47:58] <wikibugs>	 (03PS18) 10Jaime Nuche: jenkins: add hieradata config for Scap3-based deployments [puppet] - 10https://gerrit.wikimedia.org/r/883913 (https://phabricator.wikimedia.org/T323909)
[12:48:00] <wikibugs>	 (03PS6) 10Jaime Nuche: jenkins: use Scap3 deployment for releases instances [puppet] - 10https://gerrit.wikimedia.org/r/884887 (https://phabricator.wikimedia.org/T323909)
[12:48:03] <wikibugs>	 (03PS4) 10Jaime Nuche: jenkins: enable Scap3 deployment for active releases instance [puppet] - 10https://gerrit.wikimedia.org/r/884891 (https://phabricator.wikimedia.org/T323909)
[12:48:05] <wikibugs>	 (03PS1) 10Jaime Nuche: jenkins: remove redundant class parameter [puppet] - 10https://gerrit.wikimedia.org/r/885333
[12:49:24] <wikibugs>	 (03CR) 10Jelto: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/885294 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[12:50:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job jmx_puppetdb in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:51:57] <wikibugs>	 (03CR) 10Superpes15: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884934 (https://phabricator.wikimedia.org/T328357) (owner: 10Superpes15)
[12:54:12] <wikibugs>	 (03PS1) 10Muehlenhoff: Move next-server settings from install2003->2004 [puppet] - 10https://gerrit.wikimedia.org/r/885336 (https://phabricator.wikimedia.org/T327867)
[12:54:39] <wikibugs>	 (03PS5) 10Jaime Nuche: jenkins: enable Scap3 deployment for active releases instance [puppet] - 10https://gerrit.wikimedia.org/r/884891 (https://phabricator.wikimedia.org/T323909)
[12:55:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:57:28] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Migrate the install servers to Bullseye - https://phabricator.wikimedia.org/T327867 (10MoritzMuehlenhoff) install2004 has had the installserver role assigned and it's now acting as the web proxy for codfw. The DHCP server is currently stopped, tomorrow m...
[12:59:03] <wikibugs>	 (03CR) 10EoghanGaffney: Add /var/log/mail.{log,info,err,warn} to rsyslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/885294 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[12:59:36] <wikibugs>	 (03CR) 10Jaime Nuche: jenkins: use Scap3 deployment for releases instances (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/884887 (https://phabricator.wikimedia.org/T323909) (owner: 10Jaime Nuche)
[13:00:17] <wikibugs>	 (03CR) 10Jaime Nuche: "PCC: https://puppet-compiler.wmflabs.org/output/884887/39341/" [puppet] - 10https://gerrit.wikimedia.org/r/884887 (https://phabricator.wikimedia.org/T323909) (owner: 10Jaime Nuche)
[13:00:43] <wikibugs>	 (03CR) 10Jaime Nuche: "PCC: https://puppet-compiler.wmflabs.org/output/884891/39342/" [puppet] - 10https://gerrit.wikimedia.org/r/884891 (https://phabricator.wikimedia.org/T323909) (owner: 10Jaime Nuche)
[13:04:01] <wikibugs>	 (03PS10) 10Jbond: redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749
[13:04:03] <wikibugs>	 (03PS10) 10Jbond: redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757
[13:06:03] <wikibugs>	 (03PS1) 10Daniel Kinzler: Bump parsoid parser cache writes to 25%. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885337 (https://phabricator.wikimedia.org/T320534)
[13:06:05] <wikibugs>	 (03CR) 10Jbond: redfish: store all OOB info for later use (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[13:06:11] <wikibugs>	 (03CR) 10Jbond: redfish: Move dell specific functionality to dell class (035 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[13:09:12] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] kubernetes: Apply resource changes on restart [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/868790 (https://phabricator.wikimedia.org/T277495) (owner: 10Majavah)
[13:10:44] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "this needs manual rebase :-(" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/883261 (https://phabricator.wikimedia.org/T311918) (owner: 10Majavah)
[13:11:23] <wikibugs>	 (03PS3) 10Majavah: kubernetes: Use the shared image-config configmap [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/883261 (https://phabricator.wikimedia.org/T311918)
[13:11:37] <wikibugs>	 (03CR) 10Majavah: kubernetes: Use the shared image-config configmap (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/883261 (https://phabricator.wikimedia.org/T311918) (owner: 10Majavah)
[13:15:55] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39343/console" [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[13:23:01] <wikibugs>	 (03CR) 10Jbond: redfish: add system_manager info (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[13:37:26] <wikibugs>	 (03PS18) 10Slyngshede: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881
[13:38:38] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39344/console" [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[13:39:23] <wikibugs>	 (03PS2) 10Jbond: redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978
[13:40:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:42:01] <wikibugs>	 (03PS1) 10MSantos: mobileapps: bump to 2023-01-31-130212-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/885353
[13:42:40] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "I've mostly checked the varnish_dp_key_generator.py file and LGTM, I've left a minor nit inline." [puppet] - 10https://gerrit.wikimedia.org/r/857748 (https://phabricator.wikimedia.org/T315676) (owner: 10Vgutierrez)
[13:42:53] <wikibugs>	 10SRE, 10Commons, 10MediaWiki-File-management, 10StructuredDataOnCommons, and 3 others: Frequent "Error: 429, Too Many Requests" errors on pages with many (>50) thumbnails - https://phabricator.wikimedia.org/T266155 (10PatchDemoBot) Test wiki on [[ https://patchdemo.wmflabs.org | Patch demo ]]  by TheDJ us...
[13:45:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job jmx_puppetdb in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:45:52] <wikibugs>	 (03CR) 10Volans: "I've manually performed the steps at https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Renaming/Deleting_a_cookbook to remove the ol" [cookbooks] - 10https://gerrit.wikimedia.org/r/883228 (https://phabricator.wikimedia.org/T327783) (owner: 10Muehlenhoff)
[13:47:26] <wikibugs>	 (03PS2) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[13:48:13] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/885304 (https://phabricator.wikimedia.org/T327867) (owner: 10Muehlenhoff)
[13:50:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:51:38] <wikibugs>	 (03CR) 10MSantos: [C: 03+2] mobileapps: bump to 2023-01-31-130212-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/885353 (owner: 10MSantos)
[13:56:54] <wikibugs>	 (03Merged) 10jenkins-bot: mobileapps: bump to 2023-01-31-130212-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/885353 (owner: 10MSantos)
[13:58:42] <wikibugs>	 (03CR) 10Slyngshede: [V: 03+1] "Allow IDM to authenticate with OIDC." [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[13:58:46] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Machine-Learning-Team, 10Patch-For-Review: httpbb with HTTP POSTs and json payload - https://phabricator.wikimedia.org/T328280 (10isarantopoulos) a:03isarantopoulos
[13:59:13] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Machine-Learning-Team: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (10isarantopoulos) a:03isarantopoulos
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: OwO what's this, a deployment window?? UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1400). nyaa~
[14:00:05] <jouncebot>	 Dreamy_Jazz and duesen: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1400)
[14:00:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/885332 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[14:00:29] <Lucas_WMDE>	 I have a meeting and probably can’t deploy, sorry
[14:00:31] <urbanecm>	 i can deploy today
[14:00:38] <urbanecm>	 Dreamy_Jazz: duesen: hi, around?
[14:00:50] <duesen>	 urbanecm: hi
[14:00:51] <Dreamy_Jazz>	 \0
[14:00:57] <Lucas_WMDE>	 urbanecm: that’s good, since one of the changes needs CU rights to verify too ^^
[14:01:03] <urbanecm>	 :)
[14:01:40] <wikibugs>	 (03PS2) 10Urbanecm: Disable write old for CheckUserLog reason field for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885041 (https://phabricator.wikimedia.org/T233004) (owner: 10Dreamy Jazz)
[14:01:46] <duesen>	 urbanecm: my config change is the same as last week. This time, we go from 10% to 25%. Nothing to test. 
[14:01:47] <wikibugs>	 (03PS2) 10Urbanecm: Remove redundant definition of wgCheckUserEnableSpecialInvestigate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885051 (owner: 10Dreamy Jazz)
[14:01:50] <wikibugs>	 (03CR) 10Jbond: rotate-snmp: convert to cookbook classes and use secrets for passwords (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/884996 (owner: 10Jbond)
[14:01:53] <urbanecm>	 duesen: ack
[14:01:55] <logmsgbot>	 !log urbanecm@deploy1002 Backport cancelled.
[14:02:07] <wikibugs>	 (03PS2) 10Urbanecm: Bump parsoid parser cache writes to 25%. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885337 (https://phabricator.wikimedia.org/T320534) (owner: 10Daniel Kinzler)
[14:02:16] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885041 (https://phabricator.wikimedia.org/T233004) (owner: 10Dreamy Jazz)
[14:02:19] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885051 (owner: 10Dreamy Jazz)
[14:02:21] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by urbanecm@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885337 (https://phabricator.wikimedia.org/T320534) (owner: 10Daniel Kinzler)
[14:02:21] <urbanecm>	 let's do them all in one go then
[14:03:02] <wikibugs>	 (03Merged) 10jenkins-bot: Disable write old for CheckUserLog reason field for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885041 (https://phabricator.wikimedia.org/T233004) (owner: 10Dreamy Jazz)
[14:03:06] <wikibugs>	 (03Merged) 10jenkins-bot: Remove redundant definition of wgCheckUserEnableSpecialInvestigate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885051 (owner: 10Dreamy Jazz)
[14:03:09] <wikibugs>	 (03Merged) 10jenkins-bot: Bump parsoid parser cache writes to 25%. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885337 (https://phabricator.wikimedia.org/T320534) (owner: 10Daniel Kinzler)
[14:03:27] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Machine-Learning-Team, 10Patch-For-Review: httpbb with HTTP POSTs and json payload - https://phabricator.wikimedia.org/T328280 (10isarantopoulos) After discussing during the review with @RLazarus we went with the second approach.  In the aforementioned patch the...
[14:03:36] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]]
[14:03:37] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [homer/public] - 10https://gerrit.wikimedia.org/r/885326 (https://phabricator.wikimedia.org/T327867) (owner: 10Muehlenhoff)
[14:03:42] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[14:03:43] <stashbot>	 T320534: Put Parsoid output into the ParserCache on every edit - https://phabricator.wikimedia.org/T320534
[14:04:14] <Dreamy_Jazz>	 I will be able to test the investigate one, but for the other one of mine the test steps are:
[14:04:14] <Dreamy_Jazz>	 * Make an entry into the CheckUserLog using any non-empty reason
[14:04:15] <Dreamy_Jazz>	 * Inspect that row in the database to ensure cul_reason is the empty string
[14:04:19] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
[14:04:36] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
[14:05:08] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [staging] START helmfile.d/services/mobileapps: apply
[14:05:26] <logmsgbot>	 !log urbanecm@deploy1002 urbanecm and dreamyjazz and daniel: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
[14:05:32] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [staging] DONE helmfile.d/services/mobileapps: apply
[14:05:36] <urbanecm>	 Dreamy_Jazz: pulled to mwdebug for testing
[14:05:44] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [codfw] START helmfile.d/services/mobileapps: apply
[14:06:04] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39345/console" [puppet] - 10https://gerrit.wikimedia.org/r/885333 (owner: 10Jaime Nuche)
[14:06:18] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "LGTM and will merge as its a noop" [puppet] - 10https://gerrit.wikimedia.org/r/885333 (owner: 10Jaime Nuche)
[14:06:31] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
[14:06:58] <Dreamy_Jazz>	 Special:Investigate shows and I can run a check using it on mwdebug1001
[14:07:04] <Dreamy_Jazz>	 So that change is good
[14:07:51] <wikibugs>	 (03CR) 10Jbond: "After merging i noticed that this will cause a change on the cloud instances" [puppet] - 10https://gerrit.wikimedia.org/r/885333 (owner: 10Jaime Nuche)
[14:07:56] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] install_server: Update kubestagetcd1* to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/885303 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[14:07:57] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [eqiad] START helmfile.d/services/mobileapps: apply
[14:08:44] <logmsgbot>	 !log mbsantos@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
[14:09:03] <Dreamy_Jazz>	 The other is on testwiki so will need someone other than me to test
[14:09:12] <jayme>	 jbond: feel free to merge my change
[14:09:35] <urbanecm>	 Dreamy_Jazz: i assume i need to check if cul_reason stops being populated, and cu log works?
[14:09:46] <Dreamy_Jazz>	 Yes.
[14:09:57] <urbanecm>	 doing
[14:10:29] <Dreamy_Jazz>	 It should be the default value of a string if all things go right
[14:11:23] <Dreamy_Jazz>	 *an empty string
[14:11:38] * duesen is waiting for the metric to jump
[14:12:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[14:12:49] <urbanecm>	 it's indeed an empty string, and cul_reason_id: 6, so...looks good to me
[14:13:26] <urbanecm>	 (kind of surprised for the reason ID to be that low, but i guess "testing" is not uncommon at testwiki :D)
[14:13:38] * urbanecm is proceeding
[14:13:44] <Dreamy_Jazz>	 Yeah. I was going to say it had to be a fairly common reason
[14:13:44] <Lucas_WMDE>	 ^^
[14:14:30] <urbanecm>	 uhh. scap's full of red text.
[14:14:38] <wikibugs>	 (03CR) 10Jaime Nuche: jenkins: remove redundant class parameter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/885333 (owner: 10Jaime Nuche)
[14:15:13] <urbanecm>	 this is the text https://www.irccloud.com/pastebin/mLQSHgH0/
[14:16:16] <urbanecm>	 ...and it proceeds with rolling things back
[14:16:21] <wikibugs>	 (03PS1) 10Jbond: apereo_cas: move merge strategy to lookup_options [puppet] - 10https://gerrit.wikimedia.org/r/885356
[14:16:25] <urbanecm>	 wonderful.
[14:17:32] <urbanecm>	 can i please get some SRE help with getting past this scap sync error? ^^ seems to be about k8s config being group-readable
[14:18:26] <urbanecm>	 jayme: sukhe: akosiaris: ^ please :)
[14:19:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:19:53] <jayme>	 urbanecm: uhm...looking
[14:20:09] <jayme>	 urbanecm: what host are you on?
[14:20:09] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] (duration: 16m 33s)
[14:20:13] <urbanecm>	 jayme: deploy1002
[14:20:16] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[14:20:16] <stashbot>	 T320534: Put Parsoid output into the ParserCache on every edit - https://phabricator.wikimedia.org/T320534
[14:20:22] <urbanecm>	 trying to do a MW deployment with scap backport
[14:21:07] <duesen>	 urbanecm: confirmed, thank you!
[14:22:04] <urbanecm>	 the deployment finished appservers-wise. looks like scap's rollback only affects the k8s part.
[14:22:34] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] apereo_cas: move merge strategy to lookup_options [puppet] - 10https://gerrit.wikimedia.org/r/885356 (owner: 10Jbond)
[14:23:25] <duesen>	 Amir1: parsoid cache writes are now at 25%
[14:23:44] <Amir1>	 yup, thanks. Do you feel like reviewing this patch for the mobile clean up?
[14:25:11] <wikibugs>	 (03PS19) 10Jbond: P:IDM Configure OIDC and LDAP. [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[14:25:15] <jayme>	 urbanecm: I'd assume a temporary error, can you try again
[14:25:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:26:08] <jayme>	 maybe a race and the repo cache got corrupted during your scap run...fetching the charts works for me now 
[14:26:30] <urbanecm>	 ack, trying again.
[14:26:56] <logmsgbot>	 !log urbanecm@deploy1002 Started scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]]
[14:27:01] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[14:27:02] <stashbot>	 T320534: Put Parsoid output into the ParserCache on every edit - https://phabricator.wikimedia.org/T320534
[14:28:10] <Dreamy_Jazz>	 Do you need me to test my changes again?
[14:28:32] <urbanecm>	 Dreamy_Jazz: nope, I'm just re-running the sync to ensure it's actually synced out everywhere (incl. k8s)
[14:28:42] <logmsgbot>	 !log urbanecm@deploy1002 dreamyjazz and urbanecm and daniel: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
[14:28:46] <Dreamy_Jazz>	 Thanks.
[14:28:49] <urbanecm>	 proceeding
[14:30:12] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] P:wmcs::metricsinfra: add haproxy config for grafana [puppet] - 10https://gerrit.wikimedia.org/r/869211 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[14:30:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:30:46] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] P:wmcs::metricsinfra: add internal name for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/871291 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[14:31:07] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] P:wmcs::metricsinfra::grafana: configure data sources [puppet] - 10https://gerrit.wikimedia.org/r/871292 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[14:31:09] <jayme>	 urbanecm: the group-readable config is a red herring btw. It's just a "security warning" that helm spits out (for every command you run via helmfile)
[14:31:49] <urbanecm>	 makes sense, thanks for the explanation/help.
[14:31:53] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm some minor comments" [puppet] - 10https://gerrit.wikimedia.org/r/884881 (owner: 10Slyngshede)
[14:32:05] <urbanecm>	 it got past the k8s steps w/o any errors on the second try
[14:32:35] <wikibugs>	 (03CR) 10Jbond: jenkins: remove redundant class parameter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/885333 (owner: 10Jaime Nuche)
[14:32:37] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.ganeti.reimage for host kubestagetcd1004.eqiad.wmnet with OS bullseye
[14:33:08] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.ganeti.reimage for host kubestagetcd1005.eqiad.wmnet with OS bullseye
[14:33:28] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.ganeti.reimage for host kubestagetcd1006.eqiad.wmnet with OS bullseye
[14:33:55] <jayme>	 nice, thanks!
[14:34:19] <logmsgbot>	 !log urbanecm@deploy1002 Finished scap: Backport for [[gerrit:885041|Disable write old for CheckUserLog reason field for testwiki (T233004)]], [[gerrit:885051|Remove redundant definition of wgCheckUserEnableSpecialInvestigate]], [[gerrit:885337|Bump parsoid parser cache writes to 25%. (T320534)]] (duration: 07m 23s)
[14:34:27] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[14:34:27] <stashbot>	 T320534: Put Parsoid output into the ParserCache on every edit - https://phabricator.wikimedia.org/T320534
[14:34:30] <urbanecm>	 and it's all done now
[14:35:20] <wikibugs>	 (03PS1) 10Ottomata: mw-page-content-change-enrichment - v1.0.5 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885357 (https://phabricator.wikimedia.org/T325305)
[14:35:40] <Dreamy_Jazz>	 Thanks!
[14:35:47] <urbanecm>	 np
[14:38:25] <wikibugs>	 (03CR) 10Gmodena: [C: 03+1] "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/885357 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[14:38:44] <wikibugs>	 (03PS1) 10Dreamy Jazz: Disable write old for CheckUserLog reason on group 0 and group 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885358 (https://phabricator.wikimedia.org/T233004)
[14:39:58] <jinxer-wm>	 (KubernetesAPILatency) firing: (11) High Kubernetes API latency (LIST csinodes) on k8s-staging@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:40:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:41:18] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
[14:41:22] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
[14:41:28] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
[14:42:08] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS bullseye
[14:42:14] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp2035.codfw.wmnet with OS bullseye
[14:42:53] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] mw-page-content-change-enrichment - v1.0.5 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885357 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[14:43:38] <wikibugs>	 (03PS1) 10Dreamy Jazz: Disable write old for CheckUserLog reason everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885359 (https://phabricator.wikimedia.org/T233004)
[14:43:43] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
[14:44:22] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[14:44:25] <wikibugs>	 (03CR) 10Bking: [C: 03+1] miscweb / query_service: remove ability to list directories [puppet] - 10https://gerrit.wikimedia.org/r/883272 (https://phabricator.wikimedia.org/T324667) (owner: 10Gehel)
[14:44:26] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[14:44:30] <wikibugs>	 (03PS1) 10Stevemunene: Add authzIdentity to jaas config [deployment-charts] - 10https://gerrit.wikimedia.org/r/885360 (https://phabricator.wikimedia.org/T327884)
[14:45:29] <wikibugs>	 (03PS4) 10Superpes15: Add mobile wordmark to cswiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884934 (https://phabricator.wikimedia.org/T328357)
[14:45:31] <wikibugs>	 (03PS1) 10Ottomata: mw-page-content-change-enrichment - use correct image verison v1.0.4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885361 (https://phabricator.wikimedia.org/T327494)
[14:45:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:45:48] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] mw-page-content-change-enrichment - use correct image verison v1.0.4 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885361 (https://phabricator.wikimedia.org/T327494) (owner: 10Ottomata)
[14:46:18] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
[14:46:21] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[14:46:25] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[14:48:41] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
[14:50:35] <wikibugs>	 (03CR) 10David Caro: P:metricsinfra: add profile and role for a Grafana server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/869210 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[14:55:45] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:56:26] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1006.eqiad.wmnet with OS bullseye
[14:56:56] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1004.eqiad.wmnet with OS bullseye
[14:57:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[15:00:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[15:01:08] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
[15:01:16] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1005.eqiad.wmnet with OS bullseye
[15:02:28] <wikibugs>	 (03CR) 10Majavah: P:metricsinfra: add profile and role for a Grafana server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/869210 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:02:56] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - k8s-ingress-staging_30443: Servers kubestage1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:04:28] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
[15:09:50] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - k8s-ingress-staging_30443: Servers kubestage1003.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[15:15:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1012:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[15:15:25] <jayme>	 this is me
[15:15:32] <icinga-wm>	 PROBLEM - Check systemd state on deploy1002 is CRITICAL: CRITICAL - degraded: The following units failed: imagecatalog_record.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:15:34] <jayme>	 (pybal backend)
[15:15:34] <sukhe>	 thanks
[15:15:59] <jayme>	 sorry for mentioning it so late - had to jump into a meeting
[15:16:23] <wikibugs>	 (03PS1) 10Jbond: redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363
[15:16:24] <sukhe>	 np, I figured it was from your earlier work and also that you would have noticed it
[15:18:22] <wikibugs>	 (03CR) 10David Caro: P:metricsinfra: add profile and role for a Grafana server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/869210 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:19:41] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] k8s: Update staging-eqiad to kubernetes 1.23 [puppet] - 10https://gerrit.wikimedia.org/r/885302 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[15:19:54] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363 (owner: 10Jbond)
[15:20:58] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.ganeti.reimage for host kubestagemaster1001.eqiad.wmnet with OS bullseye
[15:21:30] <wikibugs>	 (03PS2) 10Jbond: redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363
[15:23:42] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bullseye
[15:24:16] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS bullseye
[15:24:22] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp2035.codfw.wmnet with OS bullseye completed: - cp2035 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[15:25:03] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363 (owner: 10Jbond)
[15:26:38] <wikibugs>	 (03PS2) 10Southparkfan: rsyslog: allow subject name validation [puppet] - 10https://gerrit.wikimedia.org/r/876248 (https://phabricator.wikimedia.org/T127717)
[15:26:59] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] rsyslog: allow subject name validation [puppet] - 10https://gerrit.wikimedia.org/r/876248 (https://phabricator.wikimedia.org/T127717) (owner: 10Southparkfan)
[15:27:08] <Amir1>	 jouncebot: nowandnext
[15:27:08] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 32 minute(s)
[15:27:08] <jouncebot>	 In 1 hour(s) and 32 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1700)
[15:27:14] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Set 'groupLoadsBySection' for s11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885058 (https://phabricator.wikimedia.org/T326980) (owner: 10Zabe)
[15:27:41] <wikibugs>	 (03CR) 10Southparkfan: rsyslog: allow subject name validation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/876248 (https://phabricator.wikimedia.org/T127717) (owner: 10Southparkfan)
[15:28:03] <wikibugs>	 (03CR) 10Reedy: [C: 03+1] Document the '+' pattern for specifying wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885048 (owner: 10Gergő Tisza)
[15:28:07] <wikibugs>	 (03CR) 10Majavah: P:metricsinfra: add profile and role for a Grafana server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/869210 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:28:11] <wikibugs>	 (03Merged) 10jenkins-bot: Set 'groupLoadsBySection' for s11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885058 (https://phabricator.wikimedia.org/T326980) (owner: 10Zabe)
[15:29:10] <wikibugs>	 (03PS1) 10Bking: flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885365 (https://phabricator.wikimedia.org/T304914)
[15:30:08] <logmsgbot>	 !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:885058|Set 'groupLoadsBySection' for s11 (T326980)]]
[15:30:14] <stashbot>	 T326980: PHP Notice: Undefined index: s11 - https://phabricator.wikimedia.org/T326980
[15:31:25] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
[15:32:00] <logmsgbot>	 !log ladsgroup@deploy1002 ladsgroup and zabe: Backport for [[gerrit:885058|Set 'groupLoadsBySection' for s11 (T326980)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
[15:32:33] <wikibugs>	 (03PS3) 10Southparkfan: rsyslog: allow subject name validation [puppet] - 10https://gerrit.wikimedia.org/r/876248 (https://phabricator.wikimedia.org/T127717)
[15:33:00] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Machine-Learning-Team: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (10isarantopoulos) @elukey I closed this task since your change has already been merged and deployed.
[15:33:55] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] P:metricsinfra: add profile and role for a Grafana server (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/869210 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:34:06] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] P:wmcs::metricsinfra: add haproxy config for grafana [puppet] - 10https://gerrit.wikimedia.org/r/869211 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:34:14] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] P:wmcs::metricsinfra: add internal name for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/871291 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:34:18] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] P:wmcs::metricsinfra::grafana: configure data sources [puppet] - 10https://gerrit.wikimedia.org/r/871292 (https://phabricator.wikimedia.org/T307465) (owner: 10Majavah)
[15:34:31] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
[15:34:34] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885365 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[15:35:43] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
[15:35:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:36:17] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Split Swift cookbooks (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/883228 (https://phabricator.wikimedia.org/T327783) (owner: 10Muehlenhoff)
[15:37:07] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Update cloudbastion rules for install2004 [puppet] - 10https://gerrit.wikimedia.org/r/885304 (https://phabricator.wikimedia.org/T327867) (owner: 10Muehlenhoff)
[15:38:46] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
[15:39:58] <logmsgbot>	 !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:885058|Set 'groupLoadsBySection' for s11 (T326980)]] (duration: 09m 49s)
[15:40:03] <stashbot>	 T326980: PHP Notice: Undefined index: s11 - https://phabricator.wikimedia.org/T326980
[15:40:17] <wikibugs>	 (03PS1) 10Ottomata: Define dse_kubepod_networks in network constants and in ferm defs [puppet] - 10https://gerrit.wikimedia.org/r/885366 (https://phabricator.wikimedia.org/T328447)
[15:40:45] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job jmx_puppetdb in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:42:30] <wikibugs>	 (03PS1) 10Ottomata: Allow access to kafka jumbo and test from DSE k8s [puppet] - 10https://gerrit.wikimedia.org/r/885367 (https://phabricator.wikimedia.org/T325305)
[15:43:47] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/39347/console" [puppet] - 10https://gerrit.wikimedia.org/r/885367 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[15:46:35] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ssingh)
[15:49:46] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagemaster1001.eqiad.wmnet with OS bullseye
[15:50:20] <wikibugs>	 (03CR) 10Btullis: Define dse_kubepod_networks in network constants and in ferm defs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/885366 (https://phabricator.wikimedia.org/T328447) (owner: 10Ottomata)
[15:51:41] <wikibugs>	 (03PS2) 10Ottomata: Define dse_kubepod_networks in network constants and in ferm defs [puppet] - 10https://gerrit.wikimedia.org/r/885366 (https://phabricator.wikimedia.org/T328447)
[15:52:19] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/885366 (https://phabricator.wikimedia.org/T328447) (owner: 10Ottomata)
[15:52:23] <wikibugs>	 (03CR) 10Ottomata: Define dse_kubepod_networks in network constants and in ferm defs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/885366 (https://phabricator.wikimedia.org/T328447) (owner: 10Ottomata)
[15:54:22] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bullseye
[15:54:56] <wikibugs>	 (03CR) 10Bking: [C: 03+2] flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885365 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[15:55:16] <wikibugs>	 (03CR) 10Bking: [V: 03+2 C: 03+2] flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885365 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[15:55:18] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Define dse_kubepod_networks in network constants and in ferm defs [puppet] - 10https://gerrit.wikimedia.org/r/885366 (https://phabricator.wikimedia.org/T328447) (owner: 10Ottomata)
[15:55:28] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1 C: 03+2] Allow access to kafka jumbo and test from DSE k8s [puppet] - 10https://gerrit.wikimedia.org/r/885367 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[15:55:32] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
[15:55:33] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
[15:55:40] <wikibugs>	 (03PS2) 10Ottomata: Allow access to kafka jumbo and test from DSE k8s [puppet] - 10https://gerrit.wikimedia.org/r/885367 (https://phabricator.wikimedia.org/T325305)
[15:55:44] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[15:56:31] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
[15:56:37] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5018.eqsin.wmnet with OS bullseye
[15:56:53] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
[15:57:02] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5028.eqsin.wmnet with OS bullseye
[15:57:07] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] Allow access to kafka jumbo and test from DSE k8s [puppet] - 10https://gerrit.wikimedia.org/r/885367 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[15:57:15] <ottomata>	 moritzm: am puppet-merging your 'Update cloudbastion rules for install2004' change.
[15:57:32] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[15:58:05] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Update staging-codfw to k8s 1.23 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885297 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[15:58:11] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[15:58:49] <wikibugs>	 (03PS1) 10Filippo Giunchedi: scap: shorten timeout on target bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/885369
[15:58:51] <wikibugs>	 (03PS1) 10Filippo Giunchedi: pontoon: default to not block_abuse_nets [puppet] - 10https://gerrit.wikimedia.org/r/885370
[15:58:52] <godog>	 a little gerrit spam incoming, sorry
[15:58:53] <wikibugs>	 (03PS1) 10Filippo Giunchedi: pontoon: update o11y with opensearch roles and settings [puppet] - 10https://gerrit.wikimedia.org/r/885371
[15:58:55] <wikibugs>	 (03PS1) 10Filippo Giunchedi: opensearch: move to /run/ [puppet] - 10https://gerrit.wikimedia.org/r/885372
[15:58:57] <wikibugs>	 (03PS1) 10Filippo Giunchedi: opensearch: service depends on tmpfile [puppet] - 10https://gerrit.wikimedia.org/r/885373
[16:00:05] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
[16:00:07] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
[16:00:26] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: mediawiki: adapt rsyslog parsing of slowlog to ecs 1.11 [deployment-charts] - 10https://gerrit.wikimedia.org/r/884360
[16:00:37] <moritzm>	 ottomata: ack, thx
[16:00:43] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[16:00:44] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: mediawiki: adapt rsyslog parsing of slowlog to ecs 1.11 (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/884360 (owner: 10Giuseppe Lavagetto)
[16:01:12] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[16:01:42] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
[16:01:50] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2032.codfw.wmnet with OS bullseye
[16:03:29] <wikibugs>	 (03CR) 10Filippo Giunchedi: "If this looks good I think we can port the same changes to elasticsearch too" [puppet] - 10https://gerrit.wikimedia.org/r/885373 (owner: 10Filippo Giunchedi)
[16:04:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Ditto as I56f976f65d I think this could/should be ported to elasticsearch too" [puppet] - 10https://gerrit.wikimedia.org/r/885372 (owner: 10Filippo Giunchedi)
[16:05:03] <wikibugs>	 (03Merged) 10jenkins-bot: Update staging-codfw to k8s 1.23 [deployment-charts] - 10https://gerrit.wikimedia.org/r/885297 (https://phabricator.wikimedia.org/T327664) (owner: 10JMeybohm)
[16:05:29] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[16:06:03] <wikibugs>	 (03PS2) 10Herron: logstash: remove rate of ingestion percent change compared to yesterday alert [alerts] - 10https://gerrit.wikimedia.org/r/884349 (https://phabricator.wikimedia.org/T202307)
[16:06:37] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
[16:07:37] <wikibugs>	 (03CR) 10Herron: [C: 03+2] logstash: remove rate of ingestion percent change compared to yesterday alert [alerts] - 10https://gerrit.wikimedia.org/r/884349 (https://phabricator.wikimedia.org/T202307) (owner: 10Herron)
[16:08:48] <wikibugs>	 10SRE, 10API Platform, 10GrowthExperiments-ImpactModule, 10Growth-Team (Current Sprint), 10MW-1.40-notes (1.40.0-wmf.21; 2023-01-30): UserImpact: Fetch information for more articles when calculating most-viewed-articles data ponit - https://phabricator.wikimedia.org/T324675 (10EChetty)
[16:08:51] <wikibugs>	 10SRE, 10API Platform, 10GrowthExperiments-ImpactModule, 10Growth-Team (Current Sprint), 10MW-1.40-notes (1.40.0-wmf.21; 2023-01-30): UserImpact: Fetch information for more articles when calculating most-viewed-articles data ponit - https://phabricator.wikimedia.org/T324675 (10EChetty) Maintainers of AQS...
[16:08:53] <wikibugs>	 (03Merged) 10jenkins-bot: logstash: remove rate of ingestion percent change compared to yesterday alert [alerts] - 10https://gerrit.wikimedia.org/r/884349 (https://phabricator.wikimedia.org/T202307) (owner: 10Herron)
[16:09:48] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
[16:10:51] <wikibugs>	 10SRE, 10DBA, 10Data-Engineering-Planning, 10Data-Persistence, and 10 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (10EChetty)
[16:11:05] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering-Planning: Q3:rack/setup/install an-worker11[49-56] - https://phabricator.wikimedia.org/T327295 (10EChetty)
[16:12:18] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] releases: add blackbox::check::http monitor [puppet] - 10https://gerrit.wikimedia.org/r/884392 (https://phabricator.wikimedia.org/T327975) (owner: 10Dzahn)
[16:13:46] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] setup.py: force a newer sphinx_rtd_theme [software/spicerack] - 10https://gerrit.wikimedia.org/r/883538 (owner: 10Volans)
[16:14:10] <wikibugs>	 10SRE, 10DBA, 10Data-Engineering-Planning, 10Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (10EChetty)
[16:14:11] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bullseye
[16:14:22] <wikibugs>	 (03CR) 10EoghanGaffney: [C: 03+2] Send exim mail.{log,info,warn,err} to kafka/logstash [puppet] - 10https://gerrit.wikimedia.org/r/885332 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[16:14:29] <wikibugs>	 (03PS2) 10EoghanGaffney: Send exim mail.{log,info,warn,err} to kafka/logstash [puppet] - 10https://gerrit.wikimedia.org/r/885332 (https://phabricator.wikimedia.org/T321759)
[16:14:43] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Engineering-Planning, 10Shared-Data-Infrastructure: Q3:rack/setup/install an-worker11[49-56] - https://phabricator.wikimedia.org/T327295 (10EChetty)
[16:15:08] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Also when investigating this I have come to doubt whether ExecStartPre is actually effective/needed: by default it is executed as the unit" [puppet] - 10https://gerrit.wikimedia.org/r/885373 (owner: 10Filippo Giunchedi)
[16:16:19] <wikibugs>	 (03PS1) 10Dzahn: releases: fix IP family parameter name in blackbox http check [puppet] - 10https://gerrit.wikimedia.org/r/885376
[16:16:47] <wikibugs>	 (03PS2) 10Volans: setup.py: force a newer sphinx_rtd_theme [software/spicerack] - 10https://gerrit.wikimedia.org/r/883538
[16:16:53] <wikibugs>	 (03CR) 10Volans: [C: 03+2] setup.py: force a newer sphinx_rtd_theme [software/spicerack] - 10https://gerrit.wikimedia.org/r/883538 (owner: 10Volans)
[16:17:20] <wikibugs>	 (03CR) 10Volans: [C: 03+2] setup.py: force a newer sphinx_rtd_theme [software/cumin] - 10https://gerrit.wikimedia.org/r/883540 (owner: 10Volans)
[16:17:55] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] releases: fix IP family parameter name in blackbox http check [puppet] - 10https://gerrit.wikimedia.org/r/885376 (owner: 10Dzahn)
[16:18:12] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5018.eqsin.wmnet with OS bullseye
[16:18:14] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS bullseye
[16:18:18] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5018.eqsin.wmnet with OS bullseye executed with errors: - cp5018 (**FAIL**)   - Downtimed on Icinga/Alertmanager   -...
[16:18:21] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5028.eqsin.wmnet with OS bullseye executed with errors: - cp5028 (**FAIL**)   - Downtimed on Icinga/Alertmanager   -...
[16:18:43] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
[16:18:46] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
[16:18:50] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5028.eqsin.wmnet with OS bullseye
[16:18:52] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5018.eqsin.wmnet with OS bullseye
[16:19:48] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[16:20:19] <wikibugs>	 (03PS1) 10DCausse: flink-app: do not set "taskmanager.numberOfTaskSlots" [deployment-charts] - 10https://gerrit.wikimedia.org/r/885377
[16:20:28] <wikibugs>	 (03Merged) 10jenkins-bot: setup.py: force a newer sphinx_rtd_theme [software/spicerack] - 10https://gerrit.wikimedia.org/r/883538 (owner: 10Volans)
[16:20:35] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[16:21:17] <wikibugs>	 (03CR) 10Jaime Nuche: [C: 03+1] scap: shorten timeout on target bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/885369 (owner: 10Filippo Giunchedi)
[16:22:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[16:23:52] <wikibugs>	 (03Merged) 10jenkins-bot: setup.py: force a newer sphinx_rtd_theme [software/cumin] - 10https://gerrit.wikimedia.org/r/883540 (owner: 10Volans)
[16:27:05] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] scap: shorten timeout on target bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/885369 (owner: 10Filippo Giunchedi)
[16:28:14] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bullseye
[16:29:13] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=cdn
[16:29:14] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=ats-be
[16:29:33] <zabe>	 !log zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Programs/Wikimedia Community Fund" "Grants:Programs/Wikimedia Community Fund/General Support Fund" "Zabe" --reason "per request [[:phab:T328456|T328456]]" --skip-subpages #  T328456
[16:29:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:37] <stashbot>	 T328456: Move translatable page Grants:Programs/Wikimedia Community Fund - https://phabricator.wikimedia.org/T328456
[16:35:24] <icinga-wm>	 PROBLEM - Host cp5019 is DOWN: PING CRITICAL - Packet loss = 100%
[16:35:32] <sukhe>	 er ok, downtiming it too
[16:35:52] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
[16:36:07] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
[16:37:36] <logmsgbot>	 !log bking@deploy1002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[16:37:55] <logmsgbot>	 !log bking@deploy1002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[16:38:41] <logmsgbot>	 !log bking@deploy1002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[16:38:49] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: sre: add alerting for mediawiki on k8s [alerts] - 10https://gerrit.wikimedia.org/r/797315
[16:39:57] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] sre: add alerting for mediawiki on k8s [alerts] - 10https://gerrit.wikimedia.org/r/797315 (owner: 10Giuseppe Lavagetto)
[16:40:00] <logmsgbot>	 !log bking@deploy1002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[16:41:45] <wikibugs>	 (03PS1) 10Ottomata: mw--page-content-change-enrichment - increase memory in dse k8s [deployment-charts] - 10https://gerrit.wikimedia.org/r/885382 (https://phabricator.wikimedia.org/T325305)
[16:41:50] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] START helmfile.d/services/miscweb: apply
[16:42:12] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] mw--page-content-change-enrichment - increase memory in dse k8s [deployment-charts] - 10https://gerrit.wikimedia.org/r/885382 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[16:42:37] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] conftool-data: add logstash[12]032 to kibana7 backend [puppet] - 10https://gerrit.wikimedia.org/r/881813 (owner: 10Cwhite)
[16:43:26] <logmsgbot>	 !log jayme@deploy1002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[16:43:48] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:44:12] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[16:44:42] <logmsgbot>	 !log cwhite@cumin2002 conftool action : set/weight=10; selector: name=logstash1032.eqiad.wmnet
[16:45:03] <logmsgbot>	 !log cwhite@cumin2002 conftool action : set/weight=10; selector: name=logstash2032.codfw.wmnet
[16:46:02] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] mw--page-content-change-enrichment - increase memory in dse k8s [deployment-charts] - 10https://gerrit.wikimedia.org/r/885382 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[16:46:27] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[16:46:32] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[16:48:38] <icinga-wm>	 PROBLEM - puppet last run on mw2271 is CRITICAL: CRITICAL: Puppet has been disabled for 604958 seconds, message: test - dzahn, last run 7 days ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:48:43] <wikibugs>	 (03PS1) 10Dzahn: etherpad: use correct port number for blackbox monitoring [puppet] - 10https://gerrit.wikimedia.org/r/885383 (https://phabricator.wikimedia.org/T327974)
[16:49:13] <mutante>	 oh, did I disable puppet on a random mw server and forget? fixing that
[16:49:19] <logmsgbot>	 !log brett@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2032.codfw.wmnet with OS bullseye
[16:49:23] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2032.codfw.wmnet with OS bullseye executed with errors: - cp2032 (**FAIL**)   - Downtimed on Icinga/Alertmanager   -...
[16:49:40] <mutante>	 !log mw2271 - re-enabling disabled puppet
[16:49:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:59] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
[16:50:02] <wikibugs>	 (03PS1) 10Ottomata: mw-page-content-change-enrichment - lower mem usage to match k8s limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/885384 (https://phabricator.wikimedia.org/T325305)
[16:50:32] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2032.codfw.wmnet with OS bullseye
[16:51:22] <wikibugs>	 (03CR) 10Ottomata: "Otherwise:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/885384 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[16:51:29] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] mw-page-content-change-enrichment - lower mem usage to match k8s limits [deployment-charts] - 10https://gerrit.wikimedia.org/r/885384 (https://phabricator.wikimedia.org/T325305) (owner: 10Ottomata)
[16:52:08] <logmsgbot>	 !log cwhite@deploy1002 Started deploy [releng/phatality@e0bb573]: (no justification provided)
[16:52:13] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[16:52:18] <logmsgbot>	 !log cwhite@deploy1002 Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 10s)
[16:52:21] <logmsgbot>	 !log otto@deploy1002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
[16:52:30] <logmsgbot>	 !log cwhite@deploy1002 Started deploy [releng/phatality@e0bb573]: (no justification provided)
[16:52:41] <logmsgbot>	 !log cwhite@deploy1002 Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 11s)
[16:52:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (GET pods) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:54:18] <icinga-wm>	 RECOVERY - puppet last run on mw2271 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:54:39] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
[16:54:57] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
[16:55:34] <wikibugs>	 (03PS3) 10Cwhite: logstash: clean up curator actions todo items [puppet] - 10https://gerrit.wikimedia.org/r/869251 (https://phabricator.wikimedia.org/T301760)
[16:56:13] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: clean up curator actions todo items [puppet] - 10https://gerrit.wikimedia.org/r/869251 (https://phabricator.wikimedia.org/T301760) (owner: 10Cwhite)
[16:57:45] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
[16:58:12] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: change ecs-default clean up policy to prefix [puppet] - 10https://gerrit.wikimedia.org/r/869252 (owner: 10Cwhite)
[16:58:19] <wikibugs>	 (03PS3) 10Cwhite: logstash: change ecs-default clean up policy to prefix [puppet] - 10https://gerrit.wikimedia.org/r/869252
[16:59:06] <wikibugs>	 (03CR) 10DCausse: "we want to customize this value to allow more tasks to run per pod. Did not put in values.yaml as I'm not sure if it's something the " [deployment-charts] - 10https://gerrit.wikimedia.org/r/885377 (owner: 10DCausse)
[16:59:31] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
[16:59:32] <wikibugs>	 (03PS3) 10Cwhite: logstash: change ecs-test clean up policy to prefix [puppet] - 10https://gerrit.wikimedia.org/r/869253
[17:00:04] <jouncebot>	 jbond and rzl: gettimeofday() says it's time for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1700)
[17:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:01:02] <wikibugs>	 (03PS1) 10Jelto: sre.gitlab.upgrade: remove Debian revision suffix from version check [cookbooks] - 10https://gerrit.wikimedia.org/r/885385 (https://phabricator.wikimedia.org/T323569)
[17:02:15] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: change ecs-test clean up policy to prefix [puppet] - 10https://gerrit.wikimedia.org/r/869253 (owner: 10Cwhite)
[17:02:35] <wikibugs>	 (03PS3) 10Cwhite: logstash: change w3creportingapi clean up policy to prefix [puppet] - 10https://gerrit.wikimedia.org/r/869254
[17:02:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (GET pods) on k8s-staging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-staging - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:03:11] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.dhcp for host cp5019.eqsin.wmnet
[17:03:58] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: change w3creportingapi clean up policy to prefix [puppet] - 10https://gerrit.wikimedia.org/r/869254 (owner: 10Cwhite)
[17:04:27] <wikibugs>	 (03CR) 10Jelto: [C: 03+1] "lgtm together with I97f5f27991d5cecda3fe5a2b927cade329ebeded" [puppet] - 10https://gerrit.wikimedia.org/r/885332 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[17:05:11] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] Send exim mail.{log,info,warn,err} to kafka/logstash [puppet] - 10https://gerrit.wikimedia.org/r/885332 (https://phabricator.wikimedia.org/T321759) (owner: 10EoghanGaffney)
[17:05:29] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
[17:08:36] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
[17:09:25] <icinga-wm>	 RECOVERY - Host cp5019 is UP: PING WARNING - Packet loss = 90%, RTA = 225.37 ms
[17:12:07] <icinga-wm>	 RECOVERY - Check systemd state on deploy1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:13:22] <wikibugs>	 (03CR) 10Jbond: "I think I'll rework this to use the multihttpush uri instead" [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[17:13:49] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] "Looks good, thanks." [deployment-charts] - 10https://gerrit.wikimedia.org/r/885360 (https://phabricator.wikimedia.org/T327884) (owner: 10Stevemunene)
[17:14:10] <logmsgbot>	 !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp5019.eqsin.wmnet
[17:16:28] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Machine-Learning-Team: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (10Aklapper) @isarantopoulos: Hi, this task is still open. If this task is resolved, please set the task status to `resolved`. Thanks a lot!
[17:28:41] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS bullseye
[17:28:46] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2032.codfw.wmnet with OS bullseye completed: - cp2032 (**PASS**)   - Removed from Puppet and PuppetDB if present   -...
[17:29:26] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=cdn
[17:29:26] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=ats-be
[17:29:33] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
[17:29:34] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
[17:29:53] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
[17:30:03] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "Makes sense without overcomplicating it using the debian versioning scheme" [cookbooks] - 10https://gerrit.wikimedia.org/r/885385 (https://phabricator.wikimedia.org/T323569) (owner: 10Jelto)
[17:30:08] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
[17:30:18] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS bullseye
[17:30:31] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5028.eqsin.wmnet with OS bullseye completed: - cp5028 (**PASS**)   - Removed from Puppet and PuppetDB if present   -...
[17:31:27] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ssingh)
[17:31:44] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=cdn
[17:31:44] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=ats-be
[17:33:10] <wikibugs>	 10SRE-tools, 10Infrastructure-Foundations, 10Machine-Learning-Team: httpbb doesn't support integers in the POST's body - https://phabricator.wikimedia.org/T328120 (10RLazarus) 05Open→03Resolved
[17:33:14] <logmsgbot>	 !log brett@cumin2002 conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet
[17:34:09] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
[17:34:11] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10BCornwall)
[17:34:15] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5018.eqsin.wmnet with OS bullseye completed: - cp5018 (**PASS**)   - Removed from Puppet and PuppetDB if present   -...
[17:34:52] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ssingh)
[17:34:58] <wikibugs>	 10SRE, 10Cloud-VPS, 10Infrastructure-Foundations, 10cloud-services-team, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (10dcaro)
[17:35:23] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=cdn
[17:35:24] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=ats-be
[17:36:45] <icinga-wm>	 PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[17:37:48] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1076.eqiad.wmnet
[17:38:04] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1076.eqiad.wmnet
[17:38:21] <wikibugs>	 (03PS1) 10Jdrewniak: Add cswiki to desktop-improvements group. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885391 (https://phabricator.wikimedia.org/T328154)
[17:38:44] <wikibugs>	 10SRE, 10Cloud-VPS, 10Infrastructure-Foundations, 10cloud-services-team, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (10Andrew) We have a ton of rebalancing to do for each of these switches. The C8 deadline we can meet but can we ge...
[17:38:55] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1090.eqiad.wmnet
[17:39:02] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1090.eqiad.wmnet
[17:41:49] <icinga-wm>	 RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[17:42:16] <wikibugs>	 (03PS3) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[17:45:56] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[17:46:48] <wikibugs>	 (03CR) 10Ahmon Dancy: "joe, this chart is still referenced from https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/image-suggestion-api/+/refs/hea" [deployment-charts] - 10https://gerrit.wikimedia.org/r/859541 (owner: 10Giuseppe Lavagetto)
[17:46:58] <wikibugs>	 (03PS1) 10Bking: flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885392 (https://phabricator.wikimedia.org/T304914)
[17:47:05] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885392 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[17:47:18] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.remove-downtime for cp5019.eqsin.wmnet
[17:47:19] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp5019.eqsin.wmnet
[17:50:43] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS bullseye
[17:50:49] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2034.codfw.wmnet with OS bullseye
[17:52:46] <wikibugs>	 (03PS11) 10Jbond: redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749
[17:52:47] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
[17:52:47] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
[17:52:48] <wikibugs>	 (03PS11) 10Jbond: redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757
[17:52:50] <wikibugs>	 (03PS3) 10Jbond: redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978
[17:52:52] <wikibugs>	 (03PS4) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[17:52:54] <wikibugs>	 (03PS3) 10Jbond: redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363
[17:52:56] <wikibugs>	 (03PS2) 10DCausse: flink-app: do not set "taskmanager.numberOfTaskSlots" [deployment-charts] - 10https://gerrit.wikimedia.org/r/885377
[17:53:01] <sukhe>	 !log depool cp1075.eqiad.wmnet for iDRAC firmware testing: T321309
[17:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:05] <stashbot>	 T321309: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309
[17:55:48] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.reimage for host cp5029.eqsin.wmnet with OS bullseye
[17:55:54] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cp5029.eqsin.wmnet with OS bullseye
[17:56:16] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[17:56:19] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[17:56:24] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363 (owner: 10Jbond)
[17:56:28] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[17:56:34] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[17:57:27] <wikibugs>	 (03PS1) 10Bking: flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885394 (https://phabricator.wikimedia.org/T304914)
[17:57:53] <wikibugs>	 (03Abandoned) 10Bking: flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885392 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[18:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1800)
[18:00:57] <wikibugs>	 (03PS12) 10Jbond: redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749
[18:01:00] <wikibugs>	 (03PS12) 10Jbond: redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757
[18:01:01] <wikibugs>	 (03PS4) 10Jbond: redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978
[18:01:03] <wikibugs>	 (03PS5) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[18:01:05] <wikibugs>	 (03PS4) 10Jbond: redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363
[18:01:17] <wikibugs>	 (03PS1) 10Nray: Enable ClientPreferences for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885395 (https://phabricator.wikimedia.org/T327979)
[18:04:15] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[18:04:36] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[18:04:38] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[18:04:40] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: Add simple supermicro class [software/spicerack] - 10https://gerrit.wikimedia.org/r/885363 (owner: 10Jbond)
[18:04:42] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[18:05:33] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885394 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[18:05:48] <wikibugs>	 (03PS13) 10Jbond: redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749
[18:06:01] <wikibugs>	 (03PS13) 10Jbond: redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757
[18:06:19] <wikibugs>	 (03PS5) 10Jbond: redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978
[18:06:27] <wikibugs>	 (03PS6) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[18:07:36] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[18:07:42] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[18:09:47] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
[18:09:56] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[18:10:35] <wikibugs>	 (03CR) 10Bking: [C: 03+2] flink-rdf-streaming-updater: use S3 instead of swift [deployment-charts] - 10https://gerrit.wikimedia.org/r/885394 (https://phabricator.wikimedia.org/T304914) (owner: 10Bking)
[18:10:44] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749 (owner: 10Jbond)
[18:10:49] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] redfish: store all OOB info for later use (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[18:10:54] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[18:12:56] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
[18:14:06] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757 (owner: 10Jbond)
[18:14:08] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978 (owner: 10Jbond)
[18:19:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:19:12] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
[18:19:17] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye executed with errors: - cp5020 (**FAIL**)   - Downtimed on Icinga/Alertmanager   -...
[18:19:39] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
[18:19:39] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
[18:20:03] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[18:20:09] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[18:21:19] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1075.eqiad.wmnet
[18:21:24] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1075.eqiad.wmnet
[18:22:28] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075.eqiad.wmnet']
[18:22:33] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075.eqiad.wmnet']
[18:23:20] <wikibugs>	 (03PS2) 10Sbailey: Enable Linter write namespace, tag and template for group0 and group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885046 (https://phabricator.wikimedia.org/T299612)
[18:24:09] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075']
[18:24:15] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075']
[18:25:09] <logmsgbot>	 !log bking@deploy1002 helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
[18:25:59] <logmsgbot>	 !log bking@deploy1002 helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
[18:26:37] <mutante>	 !log gitlab-prod-1001.devtools (cloud) - ip addr del 172.16.7.146/21 dev eth0 - T318521
[18:26:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:41] <stashbot>	 T318521: Migrate gitlab-test instance to bullseye - https://phabricator.wikimedia.org/T318521
[18:32:30] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS bullseye
[18:32:34] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] Enable ClientPreferences for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885395 (https://phabricator.wikimedia.org/T327979) (owner: 10Nray)
[18:32:35] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2034.codfw.wmnet with OS bullseye completed: - cp2034 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[18:34:45] <wikibugs>	 10SRE, 10ops-drmrs, 10Infrastructure-Foundations, 10netops: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (10RobH) CS0907837:    > Support, >  > We have three items for remote hands to accomplish for us on this request: >  > 1) Please pickup DEL0117661, unpackage it into our ra...
[18:42:40] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
[18:42:45] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye executed with errors: - cp5020 (**FAIL**)   - Removed from Puppet and PuppetDB if p...
[18:42:51] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[18:42:57] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[18:44:36] <mutante>	 !log gitlab-prod-1001.devtools (cloud) - rebooted VM ; ip addr del 172.16.7.146/32 dev eth0 - T318521
[18:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:44:41] <stashbot>	 T318521: Migrate gitlab-test instance to bullseye - https://phabricator.wikimedia.org/T318521
[18:44:54] <wikibugs>	 (03PS7) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[18:46:05] <wikibugs>	 (03PS14) 10Jbond: redfish: Move dell specific functionality to dell class [software/spicerack] - 10https://gerrit.wikimedia.org/r/836749
[18:46:43] <wikibugs>	 (03PS14) 10Jbond: redfish: store all OOB info for later use [software/spicerack] - 10https://gerrit.wikimedia.org/r/836757
[18:46:53] <wikibugs>	 (03PS6) 10Jbond: redfish: add system_manager info [software/spicerack] - 10https://gerrit.wikimedia.org/r/884978
[18:47:03] <wikibugs>	 (03PS8) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[18:50:36] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[18:50:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:53:50] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=cdn
[18:53:51] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
[18:53:58] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ssingh)
[18:55:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:58:04] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] "Thank you!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/885377 (owner: 10DCausse)
[19:00:05] <jouncebot>	 dancy and brennen: How many deployers does it take to do MediaWiki train - Utc-7 Version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T1900).
[19:01:32] <dancy>	 o/
[19:01:41] <dancy>	 Pressing the buttons
[19:01:58] <brennen>	 o/
[19:02:45] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 wikis to 1.40.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885406 (https://phabricator.wikimedia.org/T325584)
[19:02:46] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group0 wikis to 1.40.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885406 (https://phabricator.wikimedia.org/T325584) (owner: 10TrainBranchBot)
[19:03:34] <wikibugs>	 (03Merged) 10jenkins-bot: group0 wikis to 1.40.0-wmf.21 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885406 (https://phabricator.wikimedia.org/T325584) (owner: 10TrainBranchBot)
[19:04:46] <wikibugs>	 (03Merged) 10jenkins-bot: flink-app: do not set "taskmanager.numberOfTaskSlots" [deployment-charts] - 10https://gerrit.wikimedia.org/r/885377 (owner: 10DCausse)
[19:12:05] <logmsgbot>	 !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.21  refs T325584
[19:12:10] <stashbot>	 T325584: 1.40.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T325584
[19:15:15] <wikibugs>	 10SRE, 10DBA, 10Data-Engineering-Planning, 10Data-Persistence, and 11 others: codfw row A switches upgrade - https://phabricator.wikimedia.org/T327925 (10colewhite)
[19:15:45] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:16:08] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
[19:16:14] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye executed with errors: - cp5020 (**FAIL**)   - Removed from Puppet and PuppetDB if p...
[19:16:41] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[19:17:15] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[19:20:45] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:21:31] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS bullseye
[19:21:37] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2037.codfw.wmnet with OS bullseye
[19:26:31] <wikibugs>	 10SRE, 10ops-eqsin, 10DC-Ops, 10Traffic: eqsin hosts are not rebooting when running sre.hosts.reimage cookbook - https://phabricator.wikimedia.org/T327812 (10Papaul) on cp5029 reimage steps - start reimage cookbook on one terminal  - start console on another terminal on the console terminal the server pxe...
[19:30:13] <logmsgbot>	 !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
[19:32:20] <wikibugs>	 (03PS9) 10Jbond: redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989
[19:33:01] <wikibugs>	 10SRE, 10ops-drmrs, 10Infrastructure-Foundations, 10netops: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (10RobH) p:05Low→03Medium
[19:33:25] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
[19:35:06] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[19:35:40] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] redfish: add upload/update methods [software/spicerack] - 10https://gerrit.wikimedia.org/r/884989 (owner: 10Jbond)
[19:36:06] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[19:36:27] <wikibugs>	 10SRE, 10ops-drmrs, 10Infrastructure-Foundations, 10netops: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (10RobH)
[19:40:16] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
[19:43:21] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
[19:58:02] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[19:58:08] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[19:58:10] <wikibugs>	 10SRE, 10Traffic-Icebox: Disable TLSv1/TLSv1.1 on sites without caching layer - https://phabricator.wikimedia.org/T238518 (10BCornwall)
[19:58:14] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
[19:58:19] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye executed with errors: - cp5020 (**FAIL**)   - Removed from Puppet and PuppetDB if p...
[19:58:24] <wikibugs>	 10SRE, 10Traffic-Icebox: Disable TLSv1/TLSv1.1 on sites without caching layer - https://phabricator.wikimedia.org/T238518 (10BCornwall)
[19:59:05] <sukhe>	 !log sudo rm /etc/dhcp/automation/ttyS1-115200/cp5020.conf
[19:59:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:33] <wikibugs>	 (03CR) 10RLazarus: "I don't have any strong feelings about this, but I do want httpbb to be consistent with SRE's other Python repos." [software/httpbb] - 10https://gerrit.wikimedia.org/r/885273 (owner: 10Ilias Sarantopoulos)
[20:00:32] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[20:00:38] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[20:03:17] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS bullseye
[20:03:23] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2037.codfw.wmnet with OS bullseye completed: - cp2037 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[20:04:14] <logmsgbot>	 !log pt1979@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5029.eqsin.wmnet with OS bullseye
[20:04:21] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cp5029.eqsin.wmnet with OS bullseye completed: - cp5029 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled P...
[20:05:35] <logmsgbot>	 !log brett@cumin2002 conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet
[20:05:56] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10BCornwall)
[20:06:19] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS bullseye
[20:06:27] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2039.codfw.wmnet with OS bullseye
[20:07:21] <wikibugs>	 (03PS1) 10Slyngshede: C:apereo_cas fix memberOf to group mapping in OIDC. [puppet] - 10https://gerrit.wikimedia.org/r/885415
[20:09:06] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=cdn
[20:09:06] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=ats-be
[20:09:24] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ssingh)
[20:11:32] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS bullseye
[20:11:38] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2036.codfw.wmnet with OS bullseye
[20:12:07] <wikibugs>	 (03PS1) 10Zabe: Stop writing to cuc_user and cuc_user_text in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885416 (https://phabricator.wikimedia.org/T233004)
[20:19:45] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:22:26] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[20:25:19] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
[20:28:03] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
[20:30:04] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
[20:33:16] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
[20:37:01] <wikibugs>	 (03PS2) 10Brian Wolff: Restrict flow-edit-title to autoconfirmed on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884142 (https://phabricator.wikimedia.org/T328097)
[20:44:46] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:45:52] <zabe>	 !log start running "foreachwikiindblist s5.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
[20:45:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:45:58] <stashbot>	 T275246: Populate rev_actor and rev_comment_id - https://phabricator.wikimedia.org/T275246
[20:47:49] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS bullseye
[20:47:54] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2039.codfw.wmnet with OS bullseye completed: - cp2039 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[20:50:32] <wikibugs>	 10SRE, 10SRE-swift-storage, 10Data-Persistence, 10Thumbor Migration: Pooling thumbor-k8s causes spikes in swift 500 errors - https://phabricator.wikimedia.org/T328033 (10VirginiaPoundstone) @KOfori sre and data persistence tagged.  Thank you for your guidance.
[20:52:27] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS bullseye
[20:52:32] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2036.codfw.wmnet with OS bullseye completed: - cp2036 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[20:54:16] <wikibugs>	 (03CR) 10Ollie Shotton: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885422 (https://phabricator.wikimedia.org/T326313) (owner: 10Ollie Shotton)
[20:55:21] <wikibugs>	 10SRE, 10API Platform, 10GrowthExperiments-ImpactModule, 10Growth-Team (Current Sprint), 10MW-1.40-notes (1.40.0-wmf.21; 2023-01-30): UserImpact: Fetch information for more articles when calculating most-viewed-articles data ponit - https://phabricator.wikimedia.org/T324675 (10kostajh) >>! In T324675#857...
[20:57:15] <logmsgbot>	 !log sukhe@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
[20:57:21] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye executed with errors: - cp5020 (**FAIL**)   - Removed from Puppet and PuppetDB if p...
[20:58:44] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
[20:58:50] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye
[21:00:01] <wikibugs>	 10SRE, 10DBA, 10Data-Persistence, 10Data-Persistence-Backup, and 2 others: Data check es2020 after replication broke - https://phabricator.wikimedia.org/T327770 (10jcrespo) 05In progress→03Resolved All tables resulted ok from the check, comparing eqiad, its codfw primary and itself on the last 4million...
[21:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230131T2100).
[21:00:04] <jouncebot>	 sbailey, nray, and bawolff: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:16] <bawolff>	 Woo
[21:00:27] <nray>	 o/
[21:00:34] <sbailey>	 I am here :-) I think
[21:00:58] <kindrobot>	 I can start the deploy window, but I'll need to hand it off if it goes long.
[21:02:27] <kindrobot>	 sbailey, I'll do yours first.
[21:02:35] <sbailey>	 ok, ready
[21:03:19] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by kindrobot@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885046 (https://phabricator.wikimedia.org/T299612) (owner: 10Sbailey)
[21:03:39] <kindrobot>	 !log start UTC late backport window
[21:03:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:04] <wikibugs>	 (03Merged) 10jenkins-bot: Enable Linter write namespace, tag and template for group0 and group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885046 (https://phabricator.wikimedia.org/T299612) (owner: 10Sbailey)
[21:04:27] <logmsgbot>	 !log kindrobot@deploy1002 Started scap: Backport for [[gerrit:885046|Enable Linter write namespace, tag and template for group0 and group1 (T299612)]]
[21:04:31] <stashbot>	 T299612: Add namespace column and index to table - https://phabricator.wikimedia.org/T299612
[21:06:17] <logmsgbot>	 !log kindrobot@deploy1002 sbailey and kindrobot: Backport for [[gerrit:885046|Enable Linter write namespace, tag and template for group0 and group1 (T299612)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
[21:06:40] <kindrobot>	 sbailey: can you confirm?
[21:07:50] <sbailey>	 waiting for sync, blocked creating page on test2wiki, going to another site. This is run from a job. Should be safe as group 0 passed fine.
[21:08:29] <sbailey>	 trying meta
[21:09:15] <sbailey>	 yes can do it here, give me 1 minute
[21:09:50] <kindrobot>	 ack
[21:12:00] <sbailey>	 We are good to go
[21:12:06] <sbailey>	 working on meta
[21:12:17] <kindrobot>	 Great, thanks! Syncing...
[21:15:25] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (10Jcross) Thank you so much for the quick reply. Exciting!!
[21:15:53] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "..for now.." [puppet] - 10https://gerrit.wikimedia.org/r/885383 (https://phabricator.wikimedia.org/T327974) (owner: 10Dzahn)
[21:17:48] <logmsgbot>	 !log kindrobot@deploy1002 Finished scap: Backport for [[gerrit:885046|Enable Linter write namespace, tag and template for group0 and group1 (T299612)]] (duration: 13m 20s)
[21:17:53] <stashbot>	 T299612: Add namespace column and index to table - https://phabricator.wikimedia.org/T299612
[21:18:36] <kindrobot>	 Next up is nray if you're ready.
[21:18:46] <nray>	 thank you, im ready!
[21:19:08] <kindrobot>	 Oh, actually it looks like there's a merge conflict. Could you resolve it?
[21:19:40] <nray>	 kindrobot: let me take a look
[21:19:42] <kindrobot>	 bawolff: yours also has a merge conflict
[21:19:54] <bawolff>	 Oh, i just rebased it, just a second i'll do it again
[21:20:11] <wikibugs>	 (03PS2) 10Nray: Enable ClientPreferences for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885395 (https://phabricator.wikimedia.org/T327979)
[21:20:23] <wikibugs>	 (03PS3) 10Brian Wolff: Restrict flow-edit-title to autoconfirmed on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884142 (https://phabricator.wikimedia.org/T328097)
[21:20:38] <nray>	 @kindrobot should be good now
[21:21:23] <kindrobot>	 Great, merging...
[21:22:11] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by kindrobot@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885395 (https://phabricator.wikimedia.org/T327979) (owner: 10Nray)
[21:23:00] <wikibugs>	 (03Merged) 10jenkins-bot: Enable ClientPreferences for group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885395 (https://phabricator.wikimedia.org/T327979) (owner: 10Nray)
[21:23:23] <logmsgbot>	 !log kindrobot@deploy1002 Started scap: Backport for [[gerrit:885395|Enable ClientPreferences for group0 (T327979)]]
[21:23:28] <stashbot>	 T327979: Enable persistent fixed width setting for anonymous users - https://phabricator.wikimedia.org/T327979
[21:24:04] <logmsgbot>	 !log brett@cumin2002 conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
[21:24:15] <kindrobot>	 RoanKattouw, urbanecm, cjming, or TheresNoTime, could I hand bawolff's patch 884142 off to one of you after this one?
[21:24:16] <logmsgbot>	 !log brett@cumin2002 conftool action : set/pooled=yes; selector: name=cp2039.codfw.wmnet
[21:24:56] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10BCornwall)
[21:25:03] <logmsgbot>	 !log kindrobot@deploy1002 kindrobot and nray: Backport for [[gerrit:885395|Enable ClientPreferences for group0 (T327979)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
[21:25:10] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS bullseye
[21:25:16] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2038.codfw.wmnet with OS bullseye
[21:25:29] <kindrobot>	 nray: could you confirm?
[21:25:37] <nray>	 yes, checking now
[21:27:30] <nray>	 @kindrobot things look good, you can proceed!
[21:27:51] <kindrobot>	 Thank you! Syncing...
[21:29:35] <kindrobot>	 Sorry bawolff, I won't be able to deploy your patch. I've got a commitment coming up, and I can't risk the deployment window running into it. 
[21:29:41] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
[21:29:59] <bawolff>	 kindrobot: no worries, it happens. It's not a particularly urgent patch
[21:30:23] <kindrobot>	 Great, thank you for understanding. :)
[21:30:24] <bawolff>	 If someone else shows up to do more in the window, please ping me :)
[21:31:00] <RhinosF1>	 bawolff: how do you not have prod access?
[21:31:05] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
[21:31:26] <bawolff>	 RhinosF1: I used to, once upon a time
[21:31:44] <zabe>	 I can take a look, but only in like 30min
[21:32:08] <bawolff>	 Umm, around the time i quit my job at WMF, my laptop was stolen (Prague hackathon, it was an interesting time for me), so my access got revoked, and since i was kind of quitting anyways, i never asked for it back
[21:32:52] <bawolff>	 zabe: That'd be awesome if that works out, but if not, no stress, I'll just do some other window
[21:33:41] <logmsgbot>	 !log kindrobot@deploy1002 Finished scap: Backport for [[gerrit:885395|Enable ClientPreferences for group0 (T327979)]] (duration: 10m 17s)
[21:33:46] <stashbot>	 T327979: Enable persistent fixed width setting for anonymous users - https://phabricator.wikimedia.org/T327979
[21:34:05] <bawolff>	 Not to mention, it's very rare I do stuff that involves deploying things. Last time I participated in this process it was still called SWAT
[21:34:15] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
[21:34:51] <bawolff>	 I literally just downloaded the wikimedia debug toolbar ten minutes ago because i haven't needed it since i got my new laptop
[21:35:18] <kindrobot>	 !log close UTC late backport window. Did not deploy bawolff 884142 as I ran out of time. zabe may reopen the window in around 30 minutes to finish it out
[21:35:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:28] <nray>	 thanks for your help @kindrobot !
[21:35:40] <kindrobot>	 No problem, thank you everyone. :)
[21:36:15] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
[21:39:02] <logmsgbot>	 !log eevans@cumin1001 START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
[21:44:05] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
[21:44:59] <logmsgbot>	 !log eevans@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
[21:47:17] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
[22:05:17] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
[22:05:23] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp5020.eqsin.wmnet with OS bullseye completed: - cp5020 (**PASS**)   - Removed from Puppet and PuppetDB if present   -...
[22:07:07] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=cdn
[22:07:08] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=ats-be
[22:07:29] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ssingh)
[22:07:50] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS bullseye
[22:07:55] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2038.codfw.wmnet with OS bullseye completed: - cp2038 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[22:10:02] <wikibugs>	 (03PS1) 10Jcrespo: Add unit tests [software/mediabackups] - 10https://gerrit.wikimedia.org/r/885428
[22:13:14] <logmsgbot>	 !log brett@cumin2002 conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet
[22:13:38] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS bullseye
[22:13:44] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp2040.codfw.wmnet with OS bullseye
[22:13:48] <wikibugs>	 (03PS4) 10Zabe: Restrict flow-edit-title to autoconfirmed on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884142 (https://phabricator.wikimedia.org/T328097) (owner: 10Brian Wolff)
[22:13:56] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10BCornwall)
[22:14:04] <zabe>	 bawolff, we can do this now
[22:14:11] <bawolff>	 Woo. Thanks :)
[22:14:31] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Restrict flow-edit-title to autoconfirmed on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884142 (https://phabricator.wikimedia.org/T328097) (owner: 10Brian Wolff)
[22:15:16] <wikibugs>	 (03Merged) 10jenkins-bot: Restrict flow-edit-title to autoconfirmed on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/884142 (https://phabricator.wikimedia.org/T328097) (owner: 10Brian Wolff)
[22:17:53] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:884142|Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)]]
[22:17:58] <stashbot>	 T328097: make flow-edit-title be autoconfirm only on mediawikiwiki - https://phabricator.wikimedia.org/T328097
[22:19:04] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog1002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog1002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:19:45] <logmsgbot>	 !log zabe@deploy1002 zabe and bawolff: Backport for [[gerrit:884142|Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
[22:20:52] <bawolff>	 zabe: I tested and confirmed it worked
[22:21:01] <zabe>	 cool, syncing
[22:21:45] <bawolff>	 Although I did notice that Flow is not purging the Varnish cache properly, which is :S
[22:23:11] <bawolff>	 Oh never mind, it is just sorted differently for me when logged out
[22:23:58] <wikibugs>	 (03PS2) 10Zabe: Stop writing to cuc_user and cuc_user_text in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885416 (https://phabricator.wikimedia.org/T233004)
[22:24:01] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Stop writing to cuc_user and cuc_user_text in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885416 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[22:24:58] <wikibugs>	 (03Merged) 10jenkins-bot: Stop writing to cuc_user and cuc_user_text in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885416 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[22:26:37] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:884142|Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)]] (duration: 08m 43s)
[22:26:42] <stashbot>	 T328097: make flow-edit-title be autoconfirm only on mediawikiwiki - https://phabricator.wikimedia.org/T328097
[22:26:46] <zabe>	 bawolff, should be live :)
[22:26:52] <wikibugs>	 (03PS1) 10Zabe: Stop writing to cuc_comment in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885431 (https://phabricator.wikimedia.org/T233004)
[22:26:55] <bawolff>	 Awesome. Thank you :)
[22:27:09] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Stop writing to cuc_comment in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885431 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[22:28:01] <wikibugs>	 (03Merged) 10jenkins-bot: Stop writing to cuc_comment in testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885431 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[22:28:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by zabe@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/885431 (https://phabricator.wikimedia.org/T233004) (owner: 10Zabe)
[22:28:24] <logmsgbot>	 !log zabe@deploy1002 Started scap: Backport for [[gerrit:885416|Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004)]], [[gerrit:885431|Stop writing to cuc_comment in testwiki (T233004)]]
[22:28:28] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[22:30:09] <logmsgbot>	 !log zabe@deploy1002 zabe: Backport for [[gerrit:885416|Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004)]], [[gerrit:885431|Stop writing to cuc_comment in testwiki (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
[22:32:35] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
[22:35:41] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
[22:35:58] <logmsgbot>	 !log zabe@deploy1002 Finished scap: Backport for [[gerrit:885416|Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004)]], [[gerrit:885431|Stop writing to cuc_comment in testwiki (T233004)]] (duration: 07m 34s)
[22:36:02] <stashbot>	 T233004: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004
[22:53:28] <wikibugs>	 (03PS1) 10Bking: elastic: add udp_json_logback_compat_profile [puppet] - 10https://gerrit.wikimedia.org/r/885438 (https://phabricator.wikimedia.org/T324335)
[22:53:29] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS bullseye
[22:53:36] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp2040.codfw.wmnet with OS bullseye completed: - cp2040 (**PASS**)   - Downtimed on Icinga/Alertmanager   - Disabled Pu...
[22:54:10] <logmsgbot>	 !log brett@cumin2002 conftool action : set/pooled=yes; selector: name=cp2040.codfw.wmnet
[22:54:39] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10BCornwall)
[22:55:50] <wikibugs>	 (03PS1) 10Bking: elastic: add ESJsonLayout log config [puppet] - 10https://gerrit.wikimedia.org/r/885439 (https://phabricator.wikimedia.org/T324335)
[22:56:06] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+1] "Looks good, ready to test on relforge" [puppet] - 10https://gerrit.wikimedia.org/r/885438 (https://phabricator.wikimedia.org/T324335) (owner: 10Bking)
[22:56:19] <wikibugs>	 (03CR) 10Bking: [C: 03+2] elastic: add udp_json_logback_compat_profile [puppet] - 10https://gerrit.wikimedia.org/r/885438 (https://phabricator.wikimedia.org/T324335) (owner: 10Bking)
[22:57:17] <inflatador>	 mutante gonna merge your etherpad patch if that's cool
[23:01:06] <mutante>	 inflatador: yes, it is. sorry. got distracted
[23:01:14] <inflatador>	 mutante np, it's merged
[23:03:10] <wikibugs>	 (03PS1) 10Bking: Revert "elastic: add udp_json_logback_compat_profile" [puppet] - 10https://gerrit.wikimedia.org/r/885320
[23:04:01] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+1] Revert "elastic: add udp_json_logback_compat_profile" [puppet] - 10https://gerrit.wikimedia.org/r/885320 (owner: 10Bking)
[23:06:07] <wikibugs>	 (03CR) 10Bking: [C: 03+2] Revert "elastic: add udp_json_logback_compat_profile" [puppet] - 10https://gerrit.wikimedia.org/r/885320 (owner: 10Bking)
[23:06:23] <wikibugs>	 (03PS1) 10JHathaway: Add jaeger-{builder,query,collector} [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/885441 (https://phabricator.wikimedia.org/T320553)
[23:08:14] <wikibugs>	 (03PS2) 10Ryan Kemper: elastic: add ESJsonLayout log config [puppet] - 10https://gerrit.wikimedia.org/r/885439 (https://phabricator.wikimedia.org/T324335) (owner: 10Bking)
[23:08:59] <wikibugs>	 (03PS3) 10Ryan Kemper: elastic: add ESJsonLayout log config [puppet] - 10https://gerrit.wikimedia.org/r/885439 (https://phabricator.wikimedia.org/T324335) (owner: 10Bking)
[23:09:19] <wikibugs>	 (03CR) 10JHathaway: "kindly review" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/885441 (https://phabricator.wikimedia.org/T320553) (owner: 10JHathaway)
[23:12:59] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS bullseye
[23:13:05] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp3054.esams.wmnet with OS bullseye
[23:34:36] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
[23:35:27] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
[23:35:34] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp3055.esams.wmnet with OS bullseye
[23:37:43] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
[23:38:39] <wikibugs>	 (03CR) 10RLazarus: "Please also add a test in test_main.py, where you pass a json_body through and assert that it's encoded correctly -- you can use test_form" [software/httpbb] - 10https://gerrit.wikimedia.org/r/884920 (https://phabricator.wikimedia.org/T328280) (owner: 10Ilias Sarantopoulos)
[23:45:48] <logmsgbot>	 !log brett@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3055.esams.wmnet with OS bullseye
[23:45:53] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp3055.esams.wmnet with OS bullseye executed with errors: - cp3055 (**FAIL**)   - Downtimed on Icinga/Alertmanager   -...
[23:51:32] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
[23:51:38] <wikibugs>	 10SRE, 10Traffic: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp3055.esams.wmnet with OS bullseye