[00:04:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P77605 and previous config saved to /var/cache/conftool/dbconfig/20250611-000441-marostegui.json
[00:08:25] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1155346
[00:08:25] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1155346 (owner: TrainBranchBot)
[00:10:14] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 606.79 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[00:11:46] <icinga-wm>	 PROBLEM - Disk space on an-worker1107 is CRITICAL: DISK CRITICAL - free space: / 2056 MB (3% inode=95%): /tmp 2056 MB (3% inode=95%): /var/tmp 2056 MB (3% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1107&var-datasource=eqiad+prometheus/ops
[00:19:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77606 and previous config saved to /var/cache/conftool/dbconfig/20250611-001949-marostegui.json
[00:19:53] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[00:20:05] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[00:29:16] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1155346 (owner: TrainBranchBot)
[00:58:29] <jinxer-wm>	 FIRING: GoRoutinesTooHigh: gNMIc running on netflow1002 have more than 10000 Go routines. - https://wikitech.wikimedia.org/wiki/Network_telemetry#GoRoutinesTooHigh - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGoRoutinesTooHigh
[01:30:28] <wikibugs>	 ops-codfw, DC-Ops: Unresponsive management for nokiatest2002.mgmt:22 - https://phabricator.wikimedia.org/T396546 (phaultfinder) NEW
[01:31:25] <wikibugs>	 ops-codfw, DC-Ops: Unresponsive management for nokiatest2001.mgmt:22 - https://phabricator.wikimedia.org/T396547 (phaultfinder) NEW
[01:31:46] <icinga-wm>	 PROBLEM - Disk space on an-worker1107 is CRITICAL: DISK CRITICAL - free space: / 2099 MB (3% inode=95%): /tmp 2099 MB (3% inode=95%): /var/tmp 2099 MB (3% inode=95%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1107&var-datasource=eqiad+prometheus/ops
[01:46:02] <wikibugs>	 (PS1) DDesouza: miscweb(research-landing-page): bump image version [deployment-charts] - https://gerrit.wikimedia.org/r/1155352 (https://phabricator.wikimedia.org/T219903)
[02:03:23] <wikibugs>	 (CR) DDesouza: [C:+2] miscweb(research-landing-page): bump image version [deployment-charts] - https://gerrit.wikimedia.org/r/1155352 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[02:05:14] <icinga-wm>	 RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:05:32] <wikibugs>	 (Merged) jenkins-bot: miscweb(research-landing-page): bump image version [deployment-charts] - https://gerrit.wikimedia.org/r/1155352 (https://phabricator.wikimedia.org/T219903) (owner: DDesouza)
[02:06:12] <logmsgbot>	 !log dani@deploy1003 helmfile [staging] START helmfile.d/services/miscweb: apply
[02:06:28] <logmsgbot>	 !log dani@deploy1003 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[02:06:29] <logmsgbot>	 !log dani@deploy1003 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[02:06:45] <logmsgbot>	 !log dani@deploy1003 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[02:06:46] <logmsgbot>	 !log dani@deploy1003 helmfile [codfw] START helmfile.d/services/miscweb: apply
[02:07:07] <logmsgbot>	 !log dani@deploy1003 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[03:28:34] <jinxer-wm>	 FIRING: [2x] CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[03:54:26] <wikibugs>	 (PS1) KartikMistry: Update recommendation-api to 2025-06-10-203235-production [deployment-charts] - https://gerrit.wikimedia.org/r/1155359 (https://phabricator.wikimedia.org/T374695)
[04:00:54] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 197920720 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:01:54] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 30368 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:10:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:16:14] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[04:21:35] <_joe_>	 truly nothing relevant
[04:23:56] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+2] robots.txt: add crawl-delay directive for semrushbot [mediawiki-config] - https://gerrit.wikimedia.org/r/1148791 (owner: Giuseppe Lavagetto)
[04:24:42] <_joe_>	 jouncebot: now
[04:24:42] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 35 minute(s)
[04:24:42] <wikibugs>	 (Merged) jenkins-bot: robots.txt: add crawl-delay directive for semrushbot [mediawiki-config] - https://gerrit.wikimedia.org/r/1148791 (owner: Giuseppe Lavagetto)
[04:24:54] <_joe_>	 jouncebot: next
[04:24:54] <jouncebot>	 In 1 hour(s) and 35 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T0600)
[04:25:06] <_joe_>	 yeah I'm going to do this a bit earlier than usua;
[04:25:08] <_joe_>	 *usual
[04:25:44] <logmsgbot>	 !log oblivian@deploy1003 Started scap sync-world: Backport for [[gerrit:1148791|robots.txt: add crawl-delay directive for semrushbot]]
[04:28:19] <logmsgbot>	 !log oblivian@deploy1003 oblivian: Backport for [[gerrit:1148791|robots.txt: add crawl-delay directive for semrushbot]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[04:30:34] <logmsgbot>	 !log oblivian@deploy1003 oblivian: Continuing with sync
[04:37:27] <logmsgbot>	 !log oblivian@deploy1003 Finished scap sync-world: Backport for [[gerrit:1148791|robots.txt: add crawl-delay directive for semrushbot]] (duration: 11m 43s)
[04:43:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.514s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[04:48:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.491s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[04:58:29] <jinxer-wm>	 FIRING: GoRoutinesTooHigh: gNMIc running on netflow1002 have more than 10000 Go routines. - https://wikitech.wikimedia.org/wiki/Network_telemetry#GoRoutinesTooHigh - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGoRoutinesTooHigh
[05:05:58] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
[05:06:14] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:09:04] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
[05:09:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77607 and previous config saved to /var/cache/conftool/dbconfig/20250611-050911-marostegui.json
[05:09:15] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[05:10:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:10:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set db2214 with weight 0 T396509', diff saved to https://phabricator.wikimedia.org/P77608 and previous config saved to /var/cache/conftool/dbconfig/20250611-051056-root.json
[05:11:00] <stashbot>	 T396509: Switchover s6 master (db2229 -> db2214) - https://phabricator.wikimedia.org/T396509
[05:11:09] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T396509
[05:11:21] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote db2214 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1155289 (https://phabricator.wikimedia.org/T396509) (owner: Gerrit maintenance bot)
[05:15:02] <marostegui>	 !log Starting s6 codfw failover from db2229 to db2214 - T396509
[05:15:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:15:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db2214 to s6 primary T396509', diff saved to https://phabricator.wikimedia.org/P77609 and previous config saved to /var/cache/conftool/dbconfig/20250611-051525-marostegui.json
[05:16:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2229 T396509', diff saved to https://phabricator.wikimedia.org/P77610 and previous config saved to /var/cache/conftool/dbconfig/20250611-051612-marostegui.json
[05:16:16] <stashbot>	 T396509: Switchover s6 master (db2229 -> db2214) - https://phabricator.wikimedia.org/T396509
[05:16:59] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2229.codfw.wmnet with reason: Maintenance
[05:18:35] <wikibugs>	 (PS1) Marostegui: db2229: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1155366 (https://phabricator.wikimedia.org/T395989)
[05:19:25] <wikibugs>	 (CR) Marostegui: [C:+2] db2229: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1155366 (https://phabricator.wikimedia.org/T395989) (owner: Marostegui)
[05:26:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77611 and previous config saved to /var/cache/conftool/dbconfig/20250611-052657-marostegui.json
[05:27:02] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[05:27:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2229 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77612 and previous config saved to /var/cache/conftool/dbconfig/20250611-052719-root.json
[05:28:54] <wikibugs>	 (PS1) Marostegui: db2238: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1155369 (https://phabricator.wikimedia.org/T396549)
[05:29:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2238 T396549', diff saved to https://phabricator.wikimedia.org/P77613 and previous config saved to /var/cache/conftool/dbconfig/20250611-052907-marostegui.json
[05:29:11] <stashbot>	 T396549: Migrate s2 to MariaDB 10.11 - https://phabricator.wikimedia.org/T396549
[05:29:27] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2238.codfw.wmnet with reason: Maintenance
[05:29:37] <wikibugs>	 (PS3) Samwilson: InitialiseSettings: wgTemplateDataEnableDiscovery on more wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1151831 (https://phabricator.wikimedia.org/T377975)
[05:30:08] <wikibugs>	 (CR) Marostegui: [C:+2] db2238: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1155369 (https://phabricator.wikimedia.org/T396549) (owner: Marostegui)
[05:35:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77614 and previous config saved to /var/cache/conftool/dbconfig/20250611-053527-root.json
[05:39:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es1040', diff saved to https://phabricator.wikimedia.org/P77615 and previous config saved to /var/cache/conftool/dbconfig/20250611-053903-marostegui.json
[05:39:25] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1040.eqiad.wmnet with reason: Maintenance
[05:40:55] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote es1035 to es7 master [puppet] - https://gerrit.wikimedia.org/r/1155372 (https://phabricator.wikimedia.org/T396550)
[05:40:59] <wikibugs>	 (PS1) Gerrit maintenance bot: wmnet: Update es7-master alias [dns] - https://gerrit.wikimedia.org/r/1155373 (https://phabricator.wikimedia.org/T396550)
[05:42:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P77616 and previous config saved to /var/cache/conftool/dbconfig/20250611-054204-marostegui.json
[05:42:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2229 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77617 and previous config saved to /var/cache/conftool/dbconfig/20250611-054224-root.json
[05:43:18] <wikibugs>	 (PS1) Marostegui: db-production.php: Disable writes in es7 [mediawiki-config] - https://gerrit.wikimedia.org/r/1155374 (https://phabricator.wikimedia.org/T396550)
[05:48:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77618 and previous config saved to /var/cache/conftool/dbconfig/20250611-054835-root.json
[05:50:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2238 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77619 and previous config saved to /var/cache/conftool/dbconfig/20250611-055033-root.json
[05:52:09] <wikibugs>	 (PS1) Marostegui: db1233: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1155375 (https://phabricator.wikimedia.org/T396549)
[05:52:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1233 T396549', diff saved to https://phabricator.wikimedia.org/P77620 and previous config saved to /var/cache/conftool/dbconfig/20250611-055222-marostegui.json
[05:52:26] <stashbot>	 T396549: Migrate s2 to MariaDB 10.11 - https://phabricator.wikimedia.org/T396549
[05:52:55] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
[05:54:31] <wikibugs>	 (CR) Marostegui: [C:+2] db1233: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1155375 (https://phabricator.wikimedia.org/T396549) (owner: Marostegui)
[05:57:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77621 and previous config saved to /var/cache/conftool/dbconfig/20250611-055705-root.json
[05:57:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P77622 and previous config saved to /var/cache/conftool/dbconfig/20250611-055711-marostegui.json
[05:57:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2229 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77623 and previous config saved to /var/cache/conftool/dbconfig/20250611-055730-root.json
[06:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T0600)
[06:00:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77624 and previous config saved to /var/cache/conftool/dbconfig/20250611-060048-root.json
[06:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:02:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77625 and previous config saved to /var/cache/conftool/dbconfig/20250611-060227-marostegui.json
[06:03:03] <wikibugs>	 (CR) Marostegui: [C:+2] db-production.php: Disable writes in es7 [mediawiki-config] - https://gerrit.wikimedia.org/r/1155374 (https://phabricator.wikimedia.org/T396550) (owner: Marostegui)
[06:03:30] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T396550
[06:03:33] <stashbot>	 T396550: Switchover es7 master (es1039 -> es1035) - https://phabricator.wikimedia.org/T396550
[06:03:49] <wikibugs>	 (Merged) jenkins-bot: db-production.php: Disable writes in es7 [mediawiki-config] - https://gerrit.wikimedia.org/r/1155374 (https://phabricator.wikimedia.org/T396550) (owner: Marostegui)
[06:04:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77626 and previous config saved to /var/cache/conftool/dbconfig/20250611-060413-marostegui.json
[06:04:27] <logmsgbot>	 !log marostegui@deploy1003 Started scap sync-world: Backport for [[gerrit:1155374|db-production.php: Disable writes in es7 (T396550)]]
[06:05:04] <wikibugs>	 (PS1) Marostegui: Revert "db-production.php: Disable writes in es7" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155377
[06:05:09] <wikibugs>	 (CR) Marostegui: [C:-2] "Not yet" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155377 (owner: Marostegui)
[06:05:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2238 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77627 and previous config saved to /var/cache/conftool/dbconfig/20250611-060538-root.json
[06:05:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77628 and previous config saved to /var/cache/conftool/dbconfig/20250611-060552-marostegui.json
[06:06:32] <wikibugs>	 (CR) Marostegui: [C:+2] mariadb: Promote es1035 to es7 master [puppet] - https://gerrit.wikimedia.org/r/1155372 (https://phabricator.wikimedia.org/T396550) (owner: Gerrit maintenance bot)
[06:06:40] <logmsgbot>	 !log marostegui@deploy1003 marostegui: Backport for [[gerrit:1155374|db-production.php: Disable writes in es7 (T396550)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[06:07:30] <logmsgbot>	 !log marostegui@deploy1003 marostegui: Continuing with sync
[06:12:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77629 and previous config saved to /var/cache/conftool/dbconfig/20250611-061219-marostegui.json
[06:12:23] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[06:12:35] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
[06:12:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2229 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77630 and previous config saved to /var/cache/conftool/dbconfig/20250611-061236-root.json
[06:12:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77631 and previous config saved to /var/cache/conftool/dbconfig/20250611-061242-marostegui.json
[06:14:30] <logmsgbot>	 !log marostegui@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155374|db-production.php: Disable writes in es7 (T396550)]] (duration: 10m 03s)
[06:14:33] <stashbot>	 T396550: Switchover es7 master (es1039 -> es1035) - https://phabricator.wikimedia.org/T396550
[06:15:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set es1035 with weight 0 T396550', diff saved to https://phabricator.wikimedia.org/P77632 and previous config saved to /var/cache/conftool/dbconfig/20250611-061501-root.json
[06:15:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77633 and previous config saved to /var/cache/conftool/dbconfig/20250611-061553-root.json
[06:16:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote es1035 to es7 primary T396550', diff saved to https://phabricator.wikimedia.org/P77634 and previous config saved to /var/cache/conftool/dbconfig/20250611-061644-root.json
[06:17:27] <wikibugs>	 (CR) Marostegui: [C:+2] wmnet: Update es7-master alias [dns] - https://gerrit.wikimedia.org/r/1155373 (https://phabricator.wikimedia.org/T396550) (owner: Gerrit maintenance bot)
[06:17:34] <logmsgbot>	 !log marostegui@dns1006 START - running authdns-update
[06:18:21] <logmsgbot>	 !log marostegui@dns1006 END - running authdns-update
[06:19:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Pool es1039', diff saved to https://phabricator.wikimedia.org/P77635 and previous config saved to /var/cache/conftool/dbconfig/20250611-061901-marostegui.json
[06:19:04] <marostegui>	 !log Starting es7 eqiad failover from es1039 to es1035 - T396550
[06:19:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:19:12] <wikibugs>	 (CR) Marostegui: Revert "db-production.php: Disable writes in es7" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155377 (owner: Marostegui)
[06:19:19] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db-production.php: Disable writes in es7" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155377 (owner: Marostegui)
[06:20:07] <wikibugs>	 (Merged) jenkins-bot: Revert "db-production.php: Disable writes in es7" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155377 (owner: Marostegui)
[06:20:42] <logmsgbot>	 !log marostegui@deploy1003 Started scap sync-world: Backport for [[gerrit:1155377|Revert "db-production.php: Disable writes in es7"]]
[06:20:44] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77636 and previous config saved to /var/cache/conftool/dbconfig/20250611-062044-root.json
[06:22:50] <logmsgbot>	 !log marostegui@deploy1003 marostegui: Backport for [[gerrit:1155377|Revert "db-production.php: Disable writes in es7"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[06:23:45] <logmsgbot>	 !log marostegui@deploy1003 marostegui: Continuing with sync
[06:25:33] <wikibugs>	 (PS1) Muehlenhoff: Apply installserver role on install7002 [puppet] - https://gerrit.wikimedia.org/r/1155503 (https://phabricator.wikimedia.org/T394263)
[06:25:42] <moritzm>	 !log installing libxml2 security updates
[06:25:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2229 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77637 and previous config saved to /var/cache/conftool/dbconfig/20250611-062741-root.json
[06:30:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
[06:30:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77638 and previous config saved to /var/cache/conftool/dbconfig/20250611-063027-marostegui.json
[06:30:31] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[06:30:48] <logmsgbot>	 !log marostegui@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155377|Revert "db-production.php: Disable writes in es7"]] (duration: 10m 06s)
[06:31:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77639 and previous config saved to /var/cache/conftool/dbconfig/20250611-063059-root.json
[06:32:11] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
[06:32:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
[06:35:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77640 and previous config saved to /var/cache/conftool/dbconfig/20250611-063549-root.json
[06:36:16] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.272s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:36:36] <wikibugs>	 (CR) Jelto: [C:+2] gitlab: bump gitlab-settings to v1.8.0 [puppet] - https://gerrit.wikimedia.org/r/1155152 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[06:38:21] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
[06:38:34] <jinxer-wm>	 FIRING: [2x] CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[06:41:16] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.272s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:42:08] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
[06:42:45] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Apply installserver role on install7002 [puppet] - https://gerrit.wikimedia.org/r/1155503 (https://phabricator.wikimedia.org/T394263) (owner: Muehlenhoff)
[06:42:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2229 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77641 and previous config saved to /var/cache/conftool/dbconfig/20250611-064246-root.json
[06:43:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2028 T395241', diff saved to https://phabricator.wikimedia.org/P77642 and previous config saved to /var/cache/conftool/dbconfig/20250611-064314-marostegui.json
[06:43:40] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2028.codfw.wmnet with reason: Maintenance
[06:45:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P77643 and previous config saved to /var/cache/conftool/dbconfig/20250611-064535-marostegui.json
[06:46:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool es2027 T395241', diff saved to https://phabricator.wikimedia.org/P77644 and previous config saved to /var/cache/conftool/dbconfig/20250611-064606-marostegui.json
[06:46:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77645 and previous config saved to /var/cache/conftool/dbconfig/20250611-064611-root.json
[06:46:26] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2027.codfw.wmnet with reason: Maintenance
[06:48:14] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
[06:48:21] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
[06:49:04] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=1) rolling restart_daemons on A:wdqs-all
[06:49:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
[06:50:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77646 and previous config saved to /var/cache/conftool/dbconfig/20250611-065013-root.json
[06:50:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
[06:52:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77647 and previous config saved to /var/cache/conftool/dbconfig/20250611-065217-root.json
[06:56:29] <wikibugs>	 (PS1) Jelto: gitlab: bump gitlab-settings to v1.9.0 [puppet] - https://gerrit.wikimedia.org/r/1155542 (https://phabricator.wikimedia.org/T395014)
[06:57:18] <wikibugs>	 (CR) Arnaudb: [C:+1] "thanks for the hotfix!" [puppet] - https://gerrit.wikimedia.org/r/1155542 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[06:57:40] <wikibugs>	 (CR) Jelto: [C:+2] gitlab: bump gitlab-settings to v1.9.0 [puppet] - https://gerrit.wikimedia.org/r/1155542 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[06:58:32] <wikibugs>	 (CR) Brouberol: "We need to bump the chart version once more, as another change was merged in the meantime" [deployment-charts] - https://gerrit.wikimedia.org/r/1154248 (https://phabricator.wikimedia.org/T388378) (owner: Btullis)
[07:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: That opportune time for a UTC morning backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T0700).
[07:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[07:00:17] <wikibugs>	 (PS4) Brouberol: Configure dse-k8s-worker100[2-3] with the dse_k8s::worker role [puppet] - https://gerrit.wikimedia.org/r/1155120 (https://phabricator.wikimedia.org/T395557)
[07:00:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P77648 and previous config saved to /var/cache/conftool/dbconfig/20250611-070042-marostegui.json
[07:00:54] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
[07:01:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77649 and previous config saved to /var/cache/conftool/dbconfig/20250611-070117-root.json
[07:03:40] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
[07:04:08] <wikibugs>	 (PS1) Alexandros Kosiaris: aux-k8s: Switch MTU to 1460 [puppet] - https://gerrit.wikimedia.org/r/1155543 (https://phabricator.wikimedia.org/T352956)
[07:05:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77650 and previous config saved to /var/cache/conftool/dbconfig/20250611-070519-root.json
[07:06:34] <wikibugs>	 (PS1) Jelto: gitlab: bump gitlab-settings to v1.10.0 [puppet] - https://gerrit.wikimedia.org/r/1155545 (https://phabricator.wikimedia.org/T395014)
[07:07:09] <wikibugs>	 (CR) Arnaudb: [C:+1] gitlab: bump gitlab-settings to v1.10.0 [puppet] - https://gerrit.wikimedia.org/r/1155545 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[07:07:16] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] aux-k8s: Switch MTU to 1460 [puppet] - https://gerrit.wikimedia.org/r/1155543 (https://phabricator.wikimedia.org/T352956) (owner: Alexandros Kosiaris)
[07:07:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77651 and previous config saved to /var/cache/conftool/dbconfig/20250611-070722-root.json
[07:07:25] <wikibugs>	 (CR) Jelto: [C:+2] gitlab: bump gitlab-settings to v1.10.0 [puppet] - https://gerrit.wikimedia.org/r/1155545 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[07:09:55] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
[07:10:20] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
[07:14:47] <jinxer-wm>	 RESOLVED: CirrusSearchTitleSuggestIndexTooOld: Some search indices that power autocomplete have not been updated recently - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#CirrusSearch_titlesuggest_index_is_too_old - TODO - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchTitleSuggestIndexTooOld
[07:15:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77652 and previous config saved to /var/cache/conftool/dbconfig/20250611-071549-marostegui.json
[07:15:53] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[07:16:05] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
[07:16:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77653 and previous config saved to /var/cache/conftool/dbconfig/20250611-071612-marostegui.json
[07:20:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77654 and previous config saved to /var/cache/conftool/dbconfig/20250611-072024-root.json
[07:20:36] <wikibugs>	 (CR) Elukey: [C:+1] pyrra: update o11y slos to 4w window [puppet] - https://gerrit.wikimedia.org/r/1155246 (https://phabricator.wikimedia.org/T395916) (owner: Herron)
[07:21:06] <wikibugs>	 (PS1) Slyngshede: IDP: Update stylesheets [dns] - https://gerrit.wikimedia.org/r/1155546
[07:22:10] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[07:22:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77655 and previous config saved to /var/cache/conftool/dbconfig/20250611-072227-root.json
[07:24:00] <wikibugs>	 (CR) Slyngshede: [C:+2] IDP: Update stylesheets [dns] - https://gerrit.wikimedia.org/r/1155546 (owner: Slyngshede)
[07:24:08] <logmsgbot>	 !log slyngshede@dns1004 START - running authdns-update
[07:24:58] <logmsgbot>	 !log slyngshede@dns1004 END - running authdns-update
[07:25:01] <wikibugs>	 (PS1) Jelto: gitlab: bump gitlab-settings to v1.11.0 [puppet] - https://gerrit.wikimedia.org/r/1155568 (https://phabricator.wikimedia.org/T395014)
[07:25:19] <icinga-wm>	 PROBLEM - TFTP service on install7002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), regex args .*/usr/sbin/atftpd .* https://wikitech.wikimedia.org/wiki/Monitoring/atftpd
[07:25:40] <wikibugs>	 (CR) Arnaudb: [C:+1] gitlab: bump gitlab-settings to v1.11.0 [puppet] - https://gerrit.wikimedia.org/r/1155568 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[07:26:01] <wikibugs>	 SRE: restbase2030 (and others) running low on disk space - https://phabricator.wikimedia.org/T395845#10902857 (Jgiannelos) Hey @Eevans   * Regarding mobile-sections this has been completely decommisioned for long time now. I don't think we need storage for this anymore. * Mobile-html and media-list has also...
[07:26:03] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: x-provenance header on all DCs [puppet] - https://gerrit.wikimedia.org/r/1154157 (https://phabricator.wikimedia.org/T392217) (owner: Fabfur)
[07:26:05] <wikibugs>	 (CR) Jelto: [C:+2] gitlab: bump gitlab-settings to v1.11.0 [puppet] - https://gerrit.wikimedia.org/r/1155568 (https://phabricator.wikimedia.org/T395014) (owner: Jelto)
[07:27:23] <wikibugs>	 (PS1) Muehlenhoff: atftpd: Add support for Bookworm [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487)
[07:27:36] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
[07:28:37] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: All failures: 1 (install7002), Fresh: 144 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[07:29:22] <wikibugs>	 (CR) CI reject: [V:-1] atftpd: Add support for Bookworm [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[07:31:54] <logmsgbot>	 !log jmm@cumin1003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
[07:31:58] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
[07:33:13] <logmsgbot>	 !log jmm@cumin1003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
[07:33:16] <wikibugs>	 (PS1) Vgutierrez: hiera: Switch magru to unified cert issued by GTS [puppet] - https://gerrit.wikimedia.org/r/1155593 (https://phabricator.wikimedia.org/T395131)
[07:33:32] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
[07:34:09] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1155593 (https://phabricator.wikimedia.org/T395131) (owner: Vgutierrez)
[07:34:15] <jinxer-wm>	 FIRING: ProbeDown: Service idp1004:443 has failed probes (http_idp_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:34:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77656 and previous config saved to /var/cache/conftool/dbconfig/20250611-073457-marostegui.json
[07:35:01] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[07:35:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77657 and previous config saved to /var/cache/conftool/dbconfig/20250611-073530-root.json
[07:35:58] <wikibugs>	 (PS2) Muehlenhoff: atftpd: Add support for Bookworm [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487)
[07:37:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77658 and previous config saved to /var/cache/conftool/dbconfig/20250611-073733-root.json
[07:38:07] <wikibugs>	 (CR) Elukey: phabricator: expand support for Phabricator tasks (3 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/1154786 (owner: Volans)
[07:38:35] <logmsgbot>	 jmm@cumin1003 drain-node (PID 1099825) is awaiting input
[07:39:15] <jinxer-wm>	 RESOLVED: ProbeDown: Service idp1004:443 has failed probes (http_idp_wikimedia_org_ip6) - https://wikitech.wikimedia.org/wiki/CAS-SSO#Alerting - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:41:03] <wikibugs>	 (CR) Elukey: [C:+1] tox: add style checker and formatter environments [software/pywmflib] - https://gerrit.wikimedia.org/r/1154766 (owner: Volans)
[07:41:09] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[07:41:14] <wikibugs>	 (CR) Elukey: [C:+1] git: add .git-blame-ignore-revs [software/pywmflib] - https://gerrit.wikimedia.org/r/1154776 (owner: Volans)
[07:45:11] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1154303 (owner: Ayounsi)
[07:45:34] <wikibugs>	 (CR) Volans: [C:+2] tox: add style checker and formatter environments [software/pywmflib] - https://gerrit.wikimedia.org/r/1154766 (owner: Volans)
[07:45:40] <wikibugs>	 (CR) Volans: [C:+2] git: add .git-blame-ignore-revs [software/pywmflib] - https://gerrit.wikimedia.org/r/1154776 (owner: Volans)
[07:47:09] <wikibugs>	 (CR) Ayounsi: [C:+2] gNMI: spread targets on multiple netflow hosts [puppet] - https://gerrit.wikimedia.org/r/1154303 (owner: Ayounsi)
[07:47:57] <wikibugs>	 (PS1) Muehlenhoff: Add stub keytab for install7002 [labs/private] - https://gerrit.wikimedia.org/r/1155594
[07:49:52] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Add stub keytab for install7002 [labs/private] - https://gerrit.wikimedia.org/r/1155594 (owner: Muehlenhoff)
[07:50:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P77659 and previous config saved to /var/cache/conftool/dbconfig/20250611-075004-marostegui.json
[07:51:02] <wikibugs>	 (Merged) jenkins-bot: tox: add style checker and formatter environments [software/pywmflib] - https://gerrit.wikimedia.org/r/1154766 (owner: Volans)
[07:51:22] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5908/console" [puppet] - https://gerrit.wikimedia.org/r/1154302 (https://phabricator.wikimedia.org/T390251) (owner: Alexandros Kosiaris)
[07:51:45] <wikibugs>	 (PS3) Muehlenhoff: atftpd: Add support for Bookworm [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487)
[07:52:01] <wikibugs>	 (Merged) jenkins-bot: git: add .git-blame-ignore-revs [software/pywmflib] - https://gerrit.wikimedia.org/r/1154776 (owner: Volans)
[07:52:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77660 and previous config saved to /var/cache/conftool/dbconfig/20250611-075240-root.json
[07:52:57] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
[07:53:18] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
[07:53:50] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[07:56:22] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
[07:57:33] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[07:57:51] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[07:59:20] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
[07:59:29] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
[07:59:43] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
[08:00:54] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
[08:01:02] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77661 and previous config saved to /var/cache/conftool/dbconfig/20250611-080101-fceratto.json
[08:03:01] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
[08:03:52] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
[08:04:13] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1150 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[08:04:15] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
[08:05:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P77662 and previous config saved to /var/cache/conftool/dbconfig/20250611-080511-marostegui.json
[08:05:18] <wikibugs>	 (CR) Muehlenhoff: "The PCC failure for P5 is expected, we use Puppet 7 syntax." [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[08:05:34] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.ganeti.makevm for new host netflow1003.eqiad.wmnet
[08:05:35] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[08:07:17] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
[08:07:20] <logmsgbot>	 jmm@cumin1003 drain-node (PID 1102750) is awaiting input
[08:07:29] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
[08:07:45] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
[08:09:17] <wikibugs>	 (PS1) Ayounsi: Add netflow1003 to profile::kafka::broker::custom_ferm_srange_component [puppet] - https://gerrit.wikimedia.org/r/1155599
[08:09:50] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
[08:09:54] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
[08:09:55] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
[08:09:55] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:09:55] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache netflow1003.eqiad.wmnet on all recursors
[08:09:58] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow1003.eqiad.wmnet on all recursors
[08:10:29] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
[08:10:32] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77664 and previous config saved to /var/cache/conftool/dbconfig/20250611-081031-fceratto.json
[08:10:34] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
[08:10:54] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
[08:11:02] <wikibugs>	 (PS1) Elukey: Rename docker_registry_ha's occurrences to docker_registry [labs/private] - https://gerrit.wikimedia.org/r/1155601 (https://phabricator.wikimedia.org/T390251)
[08:11:23] <wikibugs>	 (PS3) Majavah: P:openstack: pdns: auth: Bind the API on IPv6 [puppet] - https://gerrit.wikimedia.org/r/1155219 (https://phabricator.wikimedia.org/T396448)
[08:11:23] <wikibugs>	 (PS4) Majavah: P:openstack: pdns: auth: Support query_local_address for IPv6 [puppet] - https://gerrit.wikimedia.org/r/1155220 (https://phabricator.wikimedia.org/T396448)
[08:11:23] <wikibugs>	 (PS3) Majavah: P:openstack: pdns: recursor: Support binding on multiple addresses [puppet] - https://gerrit.wikimedia.org/r/1155228 (https://phabricator.wikimedia.org/T396448)
[08:11:24] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
[08:11:24] <wikibugs>	 (PS1) Majavah: P:openstack: pdns: Add type definition for host config [puppet] - https://gerrit.wikimedia.org/r/1155602 (https://phabricator.wikimedia.org/T396448)
[08:11:25] <wikibugs>	 (PS1) Majavah: P:openstack: pdns: auth: Explicitely configure IPs to bind on [puppet] - https://gerrit.wikimedia.org/r/1155603 (https://phabricator.wikimedia.org/T396448)
[08:11:28] <wikibugs>	 (PS4) Alexandros Kosiaris: docker_registry_ha: Refactor to make it docker_registry [puppet] - https://gerrit.wikimedia.org/r/1154302 (https://phabricator.wikimedia.org/T390251)
[08:11:30] <wikibugs>	 (PS2) Alexandros Kosiaris: docker_registry: Move rsyslog rules from init to web.pp [puppet] - https://gerrit.wikimedia.org/r/1155257 (https://phabricator.wikimedia.org/T390251)
[08:11:34] <wikibugs>	 (PS2) Alexandros Kosiaris: docker_registry: Refactor to allow >1 instance [puppet] - https://gerrit.wikimedia.org/r/1155258 (https://phabricator.wikimedia.org/T390251)
[08:11:46] <wikibugs>	 (CR) Elukey: "Filed https://gerrit.wikimedia.org/r/c/labs/private/+/1155601 to support the PCC runs. Everything LGTM!" [puppet] - https://gerrit.wikimedia.org/r/1154302 (https://phabricator.wikimedia.org/T390251) (owner: Alexandros Kosiaris)
[08:12:08] <logmsgbot>	 !log stevemunene@cumin1002 START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
[08:12:32] <wikibugs>	 (CR) Elukey: [C:+1] "Makes sense yes!" [puppet] - https://gerrit.wikimedia.org/r/1155257 (https://phabricator.wikimedia.org/T390251) (owner: Alexandros Kosiaris)
[08:12:47] <wikibugs>	 (PS14) Filippo Giunchedi: pdb_resource_exporter: add puppetdb resource exporter to puppedb [puppet] - https://gerrit.wikimedia.org/r/1143600 (https://phabricator.wikimedia.org/T395442) (owner: Tiziano Fogli)
[08:12:48] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5910/co" [puppet] - https://gerrit.wikimedia.org/r/1155228 (https://phabricator.wikimedia.org/T396448) (owner: Majavah)
[08:13:15] <wikibugs>	 (CR) CI reject: [V:-1] P:openstack: pdns: Add type definition for host config [puppet] - https://gerrit.wikimedia.org/r/1155602 (https://phabricator.wikimedia.org/T396448) (owner: Majavah)
[08:13:20] <wikibugs>	 (CR) Ayounsi: [C:+1] "lgtm!" [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[08:13:33] <wikibugs>	 (CR) Filippo Giunchedi: "I reworked the code a little in the next PS since that was easier said than explained through review cycles, let me know what you think !" [puppet] - https://gerrit.wikimedia.org/r/1143600 (https://phabricator.wikimedia.org/T395442) (owner: Tiziano Fogli)
[08:13:35] <logmsgbot>	 ayounsi@cumin1003 makevm (PID 1102824) is awaiting input
[08:13:50] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.hosts.reimage for host netflow1003.eqiad.wmnet with OS bookworm
[08:14:53] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
[08:14:54] <wikibugs>	 (PS1) Gkyziridis: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823)
[08:15:02] <wikibugs>	 SRE, Data-Engineering: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562 (Joe) NEW
[08:15:05] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
[08:15:40] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
[08:17:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
[08:18:32] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
[08:18:35] <wikibugs>	 (CR) Elukey: "Left a nit but the idea looks really nice! I am going to wait for the +1 to when we'll have PPC ready, and/or a successful Pontoon test." [puppet] - https://gerrit.wikimedia.org/r/1155258 (https://phabricator.wikimedia.org/T390251) (owner: Alexandros Kosiaris)
[08:18:42] <wikibugs>	 SRE, Data-Engineering: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562#10903050 (Joe) p:Triage→High
[08:19:51] <wikibugs>	 (CR) Brouberol: [C:+2] Convert an-db100[1-2] to dse-k8s-worker nodes [puppet] - https://gerrit.wikimedia.org/r/1155119 (https://phabricator.wikimedia.org/T395557) (owner: Brouberol)
[08:19:54] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
[08:20:02] <wikibugs>	 (PS15) Filippo Giunchedi: pdb_resource_exporter: add puppetdb resource exporter to puppedb [puppet] - https://gerrit.wikimedia.org/r/1143600 (https://phabricator.wikimedia.org/T395442) (owner: Tiziano Fogli)
[08:20:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77665 and previous config saved to /var/cache/conftool/dbconfig/20250611-082018-marostegui.json
[08:20:22] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[08:20:28] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
[08:20:33] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
[08:20:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77666 and previous config saved to /var/cache/conftool/dbconfig/20250611-082039-marostegui.json
[08:20:55] <wikibugs>	 (PS4) Muehlenhoff: atftpd: Add support for Bookworm [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487)
[08:21:05] <wikibugs>	 (CR) Muehlenhoff: atftpd: Add support for Bookworm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[08:21:21] <wikibugs>	 (PS5) Muehlenhoff: atftpd: Add support for Bookworm [puppet] - https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487)
[08:21:53] <wikibugs>	 (CR) Majavah: "recheck" [puppet] - https://gerrit.wikimedia.org/r/1155602 (https://phabricator.wikimedia.org/T396448) (owner: Majavah)
[08:22:15] <logmsgbot>	 !log stevemunene@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
[08:23:10] <wikibugs>	 (CR) Alexandros Kosiaris: "Yup, working on that." [puppet] - https://gerrit.wikimedia.org/r/1155258 (https://phabricator.wikimedia.org/T390251) (owner: Alexandros Kosiaris)
[08:25:38] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P77667 and previous config saved to /var/cache/conftool/dbconfig/20250611-082538-fceratto.json
[08:25:57] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
[08:26:03] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
[08:26:33] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.rename from an-db1001 to dse-k8s-worker1012
[08:26:57] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[08:27:45] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
[08:27:48] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on netflow1003.eqiad.wmnet with reason: host reimage
[08:30:13] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1150 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[08:30:17] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1001 to dse-k8s-worker1012 - brouberol@cumin2002"
[08:30:46] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1001 to dse-k8s-worker1012 - brouberol@cumin2002"
[08:30:46] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:30:47] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.wipe-cache dse-k8s-worker1012 on all recursors
[08:30:50] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1012 on all recursors
[08:30:51] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
[08:30:53] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
[08:32:09] <logmsgbot>	 !log tappof@cumin1002 START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
[08:32:10] <logmsgbot>	 !log tappof@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
[08:32:11] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
[08:32:22] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1003.eqiad.wmnet with reason: host reimage
[08:32:51] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-db1001 to dse-k8s-worker1012
[08:33:29] <jinxer-wm>	 RESOLVED: GoRoutinesTooHigh: gNMIc running on netflow1002 have more than 10000 Go routines. - https://wikitech.wikimedia.org/wiki/Network_telemetry#GoRoutinesTooHigh - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGoRoutinesTooHigh
[08:33:42] <tappof>	 !log T395240 May 2025 Bookworm reboots: alert2002.wikimedia.org
[08:33:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:47] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] CI: Remove invasive log message on helmfile compilation error [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155204 (https://phabricator.wikimedia.org/T396234) (owner: 10JMeybohm)
[08:35:02] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[08:35:06] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker1012
[08:35:06] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker1012
[08:36:59] <wikibugs>	 (03PS1) 10Muehlenhoff: profile::memcached::instance: Add support for passing firewall as an srange [puppet] - 10https://gerrit.wikimedia.org/r/1155609
[08:37:10] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
[08:37:16] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
[08:37:23] <wikibugs>	 (03CR) 10CI reject: [V:04-1] profile::memcached::instance: Add support for passing firewall as an srange [puppet] - 10https://gerrit.wikimedia.org/r/1155609 (owner: 10Muehlenhoff)
[08:37:43] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Switch lvs7002 to katran [puppet] - 10https://gerrit.wikimedia.org/r/1155610 (https://phabricator.wikimedia.org/T396561)
[08:37:44] <wikibugs>	 (03PS2) 10JMeybohm: CI: Remove invasive log message on helmfile compilation error [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155204 (https://phabricator.wikimedia.org/T396234)
[08:37:44] <wikibugs>	 (03PS3) 10JMeybohm: Add a script to visualize the dependencies of admin_ng environments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155212 (https://phabricator.wikimedia.org/T389080)
[08:38:03] <wikibugs>	 (03CR) 10JMeybohm: Add a script to visualize the dependencies of admin_ng environments (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155212 (https://phabricator.wikimedia.org/T389080) (owner: 10JMeybohm)
[08:39:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77668 and previous config saved to /var/cache/conftool/dbconfig/20250611-083935-marostegui.json
[08:39:39] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[08:39:53] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
[08:40:28] <jinxer-wm>	 FIRING: KeyholderUnarmed: 1 unarmed Keyholder key(s) on alert2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[08:40:45] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P77669 and previous config saved to /var/cache/conftool/dbconfig/20250611-084045-fceratto.json
[08:40:53] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for cmelo - https://phabricator.wikimedia.org/T395966#10903118 (10elukey)
[08:44:09] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
[08:45:10] <wikibugs>	 (03PS1) 10Majavah: hieradata: Add codfw1dev v6 auth DNS IPs [puppet] - 10https://gerrit.wikimedia.org/r/1155613 (https://phabricator.wikimedia.org/T396448)
[08:45:11] <wikibugs>	 (03PS1) 10Majavah: hieradata: Add codfw1dev v6 recursive DNS IPs [puppet] - 10https://gerrit.wikimedia.org/r/1155614 (https://phabricator.wikimedia.org/T396448)
[08:45:24] <wikibugs>	 (03PS2) 10Muehlenhoff: profile::memcached::instance: Add support for passing firewall as an srange [puppet] - 10https://gerrit.wikimedia.org/r/1155609
[08:46:07] <wikibugs>	 (03CR) 10Fabfur: [C:03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/1155593 (https://phabricator.wikimedia.org/T395131) (owner: 10Vgutierrez)
[08:46:34] <wikibugs>	 (03CR) 10CI reject: [V:04-1] profile::memcached::instance: Add support for passing firewall as an srange [puppet] - 10https://gerrit.wikimedia.org/r/1155609 (owner: 10Muehlenhoff)
[08:46:52] <wikibugs>	 (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5911/co" [puppet] - 10https://gerrit.wikimedia.org/r/1155613 (https://phabricator.wikimedia.org/T396448) (owner: 10Majavah)
[08:47:59] <wikibugs>	 (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5912/co" [puppet] - 10https://gerrit.wikimedia.org/r/1155614 (https://phabricator.wikimedia.org/T396448) (owner: 10Majavah)
[08:51:25] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
[08:51:45] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
[08:51:52] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
[08:52:29] <wikibugs>	 (03PS3) 10Muehlenhoff: profile::memcached::instance: Add support for passing firewall as an srange [puppet] - 10https://gerrit.wikimedia.org/r/1155609
[08:52:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job gnmic in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:53:34] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow1003.eqiad.wmnet with OS bookworm
[08:53:34] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow1003.eqiad.wmnet
[08:54:42] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
[08:54:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P77670 and previous config saved to /var/cache/conftool/dbconfig/20250611-085442-marostegui.json
[08:55:52] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77671 and previous config saved to /var/cache/conftool/dbconfig/20250611-085552-fceratto.json
[08:56:09] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
[08:56:16] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77672 and previous config saved to /var/cache/conftool/dbconfig/20250611-085615-fceratto.json
[08:57:42] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job gnmic in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[08:58:17] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1155609 (owner: 10Muehlenhoff)
[08:58:41] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573 (10Vgutierrez) 03NEW
[08:58:57] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] atftpd: Add support for Bookworm [puppet] - 10https://gerrit.wikimedia.org/r/1155585 (https://phabricator.wikimedia.org/T396487) (owner: 10Muehlenhoff)
[08:59:13] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573#10903266 (10Vgutierrez) p:05Triage→03High setting as high priority given the server is getting intermittently pooled and depooled potentially impacting user traffic
[08:59:22] <jinxer-wm>	 FIRING: GnmiTargetDown: fasw2-c1b-eqiad is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[08:59:48] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
[09:00:01] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-worker
[09:00:22] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[09:01:26] <XioNoX>	 GnmiTargetDown is expected, should recover soon
[09:03:07] <jinxer-wm>	 FIRING: [11x] GnmiTargetDown: cloudsw1-d5-eqiad is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[09:03:29] <jinxer-wm>	 FIRING: [11x] GnmiTargetDown: cloudsw1-d5-eqiad is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[09:04:22] <jinxer-wm>	 RESOLVED: GnmiTargetDown: fasw2-c1b-eqiad is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[09:05:01] <wikibugs>	 (03PS2) 10Filippo Giunchedi: thanos: enable tracing for store [puppet] - 10https://gerrit.wikimedia.org/r/1155153 (https://phabricator.wikimedia.org/T394318)
[09:05:02] <wikibugs>	 (03PS2) 10Filippo Giunchedi: thanos: enforce series limit for sidecar [puppet] - 10https://gerrit.wikimedia.org/r/1155190 (https://phabricator.wikimedia.org/T394318)
[09:05:30] <wikibugs>	 (03PS4) 10Muehlenhoff: profile::memcached::instance: Add support for passing firewall as an srange [puppet] - 10https://gerrit.wikimedia.org/r/1155609
[09:05:52] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77673 and previous config saved to /var/cache/conftool/dbconfig/20250611-090552-fceratto.json
[09:06:01] <logmsgbot>	 jmm@cumin1003 drain-node (PID 1110568) is awaiting input
[09:08:07] <jinxer-wm>	 RESOLVED: [11x] GnmiTargetDown: cloudsw1-d5-eqiad is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[09:09:39] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
[09:09:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P77674 and previous config saved to /var/cache/conftool/dbconfig/20250611-090949-marostegui.json
[09:10:08] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[09:11:11] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
[09:11:43] <wikibugs>	 (03PS1) 10Muehlenhoff: Revert "Revert back to install7001" [puppet] - 10https://gerrit.wikimedia.org/r/1155616 (https://phabricator.wikimedia.org/T394263)
[09:11:43] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[09:12:06] <moritzm>	 !log installing libfile-find-rule-perl security updates
[09:12:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:36] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.rename from an-db1002 to dse-k8s-worker1013
[09:14:53] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1155609 (owner: 10Muehlenhoff)
[09:15:06] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.netbox
[09:17:26] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
[09:17:32] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
[09:18:46] <logmsgbot>	 !log elukey@puppetserver1001 conftool action : set/pooled=true; selector: dnsdisc=inference,name=eqiad
[09:19:52] <elukey>	 !log repool eqiad for inference.discovery.wmnet - was left depooled after a long maintenance for k8s infra changes a week ago
[09:19:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:03] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1002 to dse-k8s-worker1013 - brouberol@cumin2002"
[09:20:06] <wikibugs>	 (03PS1) 10Ayounsi: Promote the TransitPeeringIn/OutSaturation alerts to p.aging [alerts] - 10https://gerrit.wikimedia.org/r/1155620 (https://phabricator.wikimedia.org/T388641)
[09:20:59] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P77678 and previous config saved to /var/cache/conftool/dbconfig/20250611-092059-fceratto.json
[09:21:10] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1002 to dse-k8s-worker1013 - brouberol@cumin2002"
[09:21:11] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:21:11] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.dns.wipe-cache dse-k8s-worker1013 on all recursors
[09:21:14] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1013 on all recursors
[09:21:15] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
[09:21:48] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Promote the TransitPeeringIn/OutSaturation alerts to p.aging [alerts] - 10https://gerrit.wikimedia.org/r/1155620 (https://phabricator.wikimedia.org/T388641) (owner: 10Ayounsi)
[09:22:26] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
[09:23:07] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-db1002 to dse-k8s-worker1013
[09:23:29] <wikibugs>	 (03PS2) 10Ayounsi: Promote the TransitPeeringIn/OutSaturation alerts to p.aging [alerts] - 10https://gerrit.wikimedia.org/r/1155620 (https://phabricator.wikimedia.org/T388641)
[09:24:06] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
[09:24:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77679 and previous config saved to /var/cache/conftool/dbconfig/20250611-092457-marostegui.json
[09:25:01] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[09:25:13] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
[09:25:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77680 and previous config saved to /var/cache/conftool/dbconfig/20250611-092518-marostegui.json
[09:26:18] <wikibugs>	 (03PS2) 10Hnowlan: trafficserver: restbaseless reading lists API for all wikis [puppet] - 10https://gerrit.wikimedia.org/r/1149625 (https://phabricator.wikimedia.org/T384891)
[09:29:13] <wikibugs>	 (03CR) 10Ladsgroup: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1155210 (https://phabricator.wikimedia.org/T363581) (owner: 10Ladsgroup)
[09:29:46] <wikibugs>	 (03PS1) 10Joal: Fix analytics webrequest data purge [puppet] - 10https://gerrit.wikimedia.org/r/1155621 (https://phabricator.wikimedia.org/T395934)
[09:30:16] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1155610 (https://phabricator.wikimedia.org/T396561) (owner: 10Vgutierrez)
[09:30:50] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] hiera: Switch magru to unified cert issued by GTS [puppet] - 10https://gerrit.wikimedia.org/r/1155593 (https://phabricator.wikimedia.org/T395131) (owner: 10Vgutierrez)
[09:30:59] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Fix analytics webrequest data purge [puppet] - 10https://gerrit.wikimedia.org/r/1155621 (https://phabricator.wikimedia.org/T395934) (owner: 10Joal)
[09:31:24] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Add a script to visualize the dependencies of admin_ng environments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155212 (https://phabricator.wikimedia.org/T389080) (owner: 10JMeybohm)
[09:31:27] <wikibugs>	 (03CR) 10JMeybohm: [V:03+2 C:03+2] CI: Remove invasive log message on helmfile compilation error [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155204 (https://phabricator.wikimedia.org/T396234) (owner: 10JMeybohm)
[09:34:34] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[09:36:07] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P77681 and previous config saved to /var/cache/conftool/dbconfig/20250611-093606-fceratto.json
[09:37:17] <vgutierrez>	 !log use Google Trust Services (GTS) unified TLS certificate on magru - T395131
[09:37:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:20] <stashbot>	 T395131: Replace Digicert TLS certs with Google Trust Services ones - https://phabricator.wikimedia.org/T395131
[09:37:34] <wikibugs>	 (03CR) 10Majavah: "I'm not a huge fan of `src_sets` overriding `srange`. What do you think about adding both rules if both are set, or at least throwing a vi" [puppet] - 10https://gerrit.wikimedia.org/r/1155609 (owner: 10Muehlenhoff)
[09:38:13] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
[09:40:37] <wikibugs>	 (03CR) 10Ayounsi: [C:03+1] Revert "Revert back to install7001" [puppet] - 10https://gerrit.wikimedia.org/r/1155616 (https://phabricator.wikimedia.org/T394263) (owner: 10Muehlenhoff)
[09:40:45] <logmsgbot>	 !log brouberol@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
[09:42:10] <wikibugs>	 (03PS1) 10Ayounsi: DHCP: install7001->7002 [homer/public] - 10https://gerrit.wikimedia.org/r/1155622 (https://phabricator.wikimedia.org/T394263)
[09:43:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [homer/public] - 10https://gerrit.wikimedia.org/r/1155622 (https://phabricator.wikimedia.org/T394263) (owner: 10Ayounsi)
[09:43:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77682 and previous config saved to /var/cache/conftool/dbconfig/20250611-094319-marostegui.json
[09:43:23] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[09:43:42] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] DHCP: install7001->7002 [homer/public] - 10https://gerrit.wikimedia.org/r/1155622 (https://phabricator.wikimedia.org/T394263) (owner: 10Ayounsi)
[09:43:45] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Revert "Revert back to install7001" [puppet] - 10https://gerrit.wikimedia.org/r/1155616 (https://phabricator.wikimedia.org/T394263) (owner: 10Muehlenhoff)
[09:44:12] <wikibugs>	 (03Merged) 10jenkins-bot: DHCP: install7001->7002 [homer/public] - 10https://gerrit.wikimedia.org/r/1155622 (https://phabricator.wikimedia.org/T394263) (owner: 10Ayounsi)
[09:44:13] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
[09:47:59] <logmsgbot>	 jmm@cumin1003 drain-node (PID 1114280) is awaiting input
[09:48:19] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
[09:51:14] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77683 and previous config saved to /var/cache/conftool/dbconfig/20250611-095113-fceratto.json
[09:51:32] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
[09:51:39] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77684 and previous config saved to /var/cache/conftool/dbconfig/20250611-095139-fceratto.json
[09:51:51] <wikibugs>	 (03Merged) 10jenkins-bot: CI: Remove invasive log message on helmfile compilation error [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155204 (https://phabricator.wikimedia.org/T396234) (owner: 10JMeybohm)
[09:51:52] <wikibugs>	 (03Merged) 10jenkins-bot: Add a script to visualize the dependencies of admin_ng environments [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155212 (https://phabricator.wikimedia.org/T389080) (owner: 10JMeybohm)
[09:53:04] <vgutierrez>	 !log restarting varnish on cp5018 to clear VarnishChildRestarted alert
[09:53:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:54] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
[09:56:01] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
[09:56:34] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[09:58:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P77686 and previous config saved to /var/cache/conftool/dbconfig/20250611-095825-marostegui.json
[09:58:27] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573#10903511 (10Ladsgroup) I looked at the host a bit, it looks healthy (no swapping, no cpu saturation, etc.), nothing in kernel logs, the proxy-logs don't show anything out o...
[09:59:05] <wikibugs>	 (03PS1) 10Muehlenhoff: Failover webproxy to install7002 [dns] - 10https://gerrit.wikimedia.org/r/1155624 (https://phabricator.wikimedia.org/T394263)
[09:59:34] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573#10903517 (10Ladsgroup) What Matthew said about the front-end proxies was that when in doubt, just reboot them, it has uptime of 64 days and should be rebooted anyway, should...
[10:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1000)
[10:00:07] <logmsgbot>	 !log brouberol@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
[10:01:48] <wikibugs>	 (03PS5) 10Brouberol: Configure dse-k8s-worker101[2-3] with the dse_k8s::worker role [puppet] - 10https://gerrit.wikimedia.org/r/1155120 (https://phabricator.wikimedia.org/T395557)
[10:02:00] <wikibugs>	 (03CR) 10JMeybohm: [V:03+2 C:03+2] Make simple-cfssl usable for local WMF PKI deployments [software/cfssl-issuer] - 10https://gerrit.wikimedia.org/r/1154266 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[10:02:00] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573#10903523 (10Vgutierrez) please go ahead @Ladsgroup
[10:02:20] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77687 and previous config saved to /var/cache/conftool/dbconfig/20250611-100220-fceratto.json
[10:02:33] <wikibugs>	 (03PS6) 10JMeybohm: cfssl-issuer: Allow to provide a custom CA certificate store [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107)
[10:02:45] <wikibugs>	 (03PS6) 10JMeybohm: coredns: Run coredns on an unprivileged port (5353) instead of 53 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153977 (https://phabricator.wikimedia.org/T396107)
[10:03:27] <wikibugs>	 (03CR) 10CI reject: [V:04-1] cfssl-issuer: Allow to provide a custom CA certificate store [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[10:03:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] coredns: Run coredns on an unprivileged port (5353) instead of 53 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153977 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[10:04:11] <wikibugs>	 (03CR) 10JMeybohm: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[10:05:41] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Add netflow1003 to profile::kafka::broker::custom_ferm_srange_component [puppet] - 10https://gerrit.wikimedia.org/r/1155599 (owner: 10Ayounsi)
[10:06:08] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
[10:06:58] <wikibugs>	 (03PS1) 10JMeybohm: Revert "CI: Remove invasive log message on helmfile compilation error" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155626
[10:07:36] <moritzm>	 kubestagemaster1003 will go down for a Ganeti reboot
[10:07:41] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
[10:09:36] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[10:09:40] <icinga-wm>	 PROBLEM - Host kubestagemaster1003 is DOWN: PING CRITICAL - Packet loss = 100%
[10:11:18] <wikibugs>	 (03PS2) 10JMeybohm: Revert "CI: Remove invasive log message on helmfile compilation error" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155626
[10:12:25] <wikibugs>	 (03CR) 10JMeybohm: [V:03+2 C:03+2] Revert "CI: Remove invasive log message on helmfile compilation error" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155626 (owner: 10JMeybohm)
[10:12:43] <wikibugs>	 (03PS7) 10JMeybohm: cfssl-issuer: Allow to provide a custom CA certificate store [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107)
[10:12:53] <wikibugs>	 (03PS7) 10JMeybohm: coredns: Run coredns on an unprivileged port (5353) instead of 53 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153977 (https://phabricator.wikimedia.org/T396107)
[10:13:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P77688 and previous config saved to /var/cache/conftool/dbconfig/20250611-101332-marostegui.json
[10:13:57] <jinxer-wm>	 FIRING: KubernetesCalicoDown: kubestagemaster1003.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-staging&var-instance=kubestagemaster1003.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[10:13:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Assign ncredir role to ncredir7003 [puppet] - 10https://gerrit.wikimedia.org/r/1153947 (https://phabricator.wikimedia.org/T394263) (owner: 10Muehlenhoff)
[10:14:03] <wikibugs>	 06SRE, 07SRE-Unowned, 10Maps: Create a new bucket for Tegola's tile cache and duplicate its data - https://phabricator.wikimedia.org/T396584 (10elukey) 03NEW
[10:14:29] <wikibugs>	 06SRE, 07SRE-Unowned, 06Data-Persistence, 10Maps: Create a new bucket for Tegola's tile cache and duplicate its data - https://phabricator.wikimedia.org/T396584#10903575 (10elukey)
[10:15:18] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
[10:15:33] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
[10:16:00] <icinga-wm>	 RECOVERY - Host kubestagemaster1003 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms
[10:16:54] <icinga-wm>	 PROBLEM - Host ms-fe2015 is DOWN: PING CRITICAL - Packet loss = 100%
[10:17:27] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P77689 and previous config saved to /var/cache/conftool/dbconfig/20250611-101727-fceratto.json
[10:17:56] <icinga-wm>	 RECOVERY - Host ms-fe2015 is UP: PING OK - Packet loss = 0%, RTA = 30.30 ms
[10:18:57] <jinxer-wm>	 RESOLVED: KubernetesCalicoDown: kubestagemaster1003.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-staging&var-instance=kubestagemaster1003.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[10:20:01] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
[10:22:37] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573#10903594 (10Ladsgroup) rebooted and I'm seeing the requests are flowing again with 200s. Let's see if that fixes the issue.
[10:23:45] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
[10:27:07] <wikibugs>	 (CR) Mvolz: [C:+2] citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1155195 (owner: PipelineBot)
[10:28:01] <wikibugs>	 (PS3) Muehlenhoff: Add puppetserver2004 [dns] - https://gerrit.wikimedia.org/r/1154296 (https://phabricator.wikimedia.org/T381274)
[10:28:14] <wikibugs>	 (PS2) Muehlenhoff: Add ncredir7003 to conftool [puppet] - https://gerrit.wikimedia.org/r/1153948 (https://phabricator.wikimedia.org/T394263)
[10:28:38] <wikibugs>	 (Merged) jenkins-bot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1155195 (owner: PipelineBot)
[10:28:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77690 and previous config saved to /var/cache/conftool/dbconfig/20250611-102839-marostegui.json
[10:28:43] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[10:28:56] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
[10:29:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77691 and previous config saved to /var/cache/conftool/dbconfig/20250611-102902-marostegui.json
[10:29:04] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
[10:29:39] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
[10:30:42] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 12 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668) (owner: Gkyziridis)
[10:30:46] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+1] ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: Gkyziridis)
[10:31:48] <wikibugs>	 (PS1) Muehlenhoff: Remove obsolete Cumin alias [puppet] - https://gerrit.wikimedia.org/r/1155629
[10:32:20] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[10:32:32] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 12 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: Gkyziridis)
[10:32:32] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
[10:32:34] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P77692 and previous config saved to /var/cache/conftool/dbconfig/20250611-103234-fceratto.json
[10:35:56] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] coredns: Run coredns on an unprivileged port (5353) instead of 53 [deployment-charts] - https://gerrit.wikimedia.org/r/1153977 (https://phabricator.wikimedia.org/T396107) (owner: JMeybohm)
[10:37:24] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] cfssl-issuer: Allow to provide a custom CA certificate store [deployment-charts] - https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107) (owner: JMeybohm)
[10:39:12] <logmsgbot>	 jmm@cumin1003 drain-node (PID 1121814) is awaiting input
[10:40:31] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
[10:41:34] <wikibugs>	 (CR) Muehlenhoff: "Me neither, but OTOH this is really just transitionary: Once all call sites have moved off profile::memcached::srange, I'll change the log" [puppet] - https://gerrit.wikimedia.org/r/1155609 (owner: Muehlenhoff)
[10:45:51] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
[10:45:58] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
[10:46:48] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
[10:46:55] <wikibugs>	 (CR) Majavah: [C:+1] "sgtm!" [puppet] - https://gerrit.wikimedia.org/r/1155609 (owner: Muehlenhoff)
[10:47:42] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77693 and previous config saved to /var/cache/conftool/dbconfig/20250611-104741-fceratto.json
[10:47:50] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77694 and previous config saved to /var/cache/conftool/dbconfig/20250611-104750-marostegui.json
[10:47:54] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[10:48:00] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
[10:48:18] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[10:48:25] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77695 and previous config saved to /var/cache/conftool/dbconfig/20250611-104825-fceratto.json
[10:48:34] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] "LGTM, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1155609 (owner: Muehlenhoff)
[10:50:00] <moritzm>	 kubestagemaster1004 and dse-k8s-etcd1002 will go down for a Ganeti reboot
[10:50:05] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
[10:52:04] <icinga-wm>	 PROBLEM - Host dse-k8s-etcd1002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:52:32] <icinga-wm>	 PROBLEM - Host kubestagemaster1004 is DOWN: PING CRITICAL - Packet loss = 100%
[10:53:28] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Temporarily disable access for Jon [puppet] - https://gerrit.wikimedia.org/r/1152307 (owner: Jdlrobson)
[10:54:14] <wikibugs>	 (CR) Muehlenhoff: [C:+2] "For when you're back; ping me and we will reinstate your access" [puppet] - https://gerrit.wikimedia.org/r/1152307 (owner: Jdlrobson)
[10:55:26] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
[10:55:36] <icinga-wm>	 RECOVERY - Host dse-k8s-etcd1002 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[10:55:41] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
[10:56:02] <icinga-wm>	 RECOVERY - Host kubestagemaster1004 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms
[10:56:57] <jinxer-wm>	 FIRING: KubernetesCalicoDown: kubestagemaster1004.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-staging&var-instance=kubestagemaster1004.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[10:57:01] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
[10:57:41] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service kubestagemaster1004:6443 has failed probes (http_staging_eqiad_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster1004:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:59:01] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77696 and previous config saved to /var/cache/conftool/dbconfig/20250611-105900-fceratto.json
[11:00:06] <jouncebot>	 mvolz: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1100).
[11:00:28] <jinxer-wm>	 RESOLVED: KeyholderUnarmed: 1 unarmed Keyholder key(s) on alert2002:9100 - https://wikitech.wikimedia.org/wiki/Keyholder - TODO - https://alerts.wikimedia.org/?q=alertname%3DKeyholderUnarmed
[11:01:03] <moritzm>	 ml-etcd1001 will go down for a Ganeti reboot
[11:01:08] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
[11:01:57] <jinxer-wm>	 RESOLVED: KubernetesCalicoDown: kubestagemaster1004.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-staging&var-instance=kubestagemaster1004.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[11:02:18] <logmsgbot>	 !log brouberol@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[11:02:28] <logmsgbot>	 !log brouberol@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[11:02:41] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service kubestagemaster1004:6443 has failed probes (http_staging_eqiad_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster1004:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:02:47] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] START helmfile.d/services/citoid: apply
[11:02:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P77697 and previous config saved to /var/cache/conftool/dbconfig/20250611-110257-marostegui.json
[11:03:09] <logmsgbot>	 !log mvolz@deploy1003 helmfile [staging] DONE helmfile.d/services/citoid: apply
[11:03:12] <icinga-wm>	 PROBLEM - Host ml-etcd1001 is DOWN: PING CRITICAL - Packet loss = 100%
[11:03:50] <wikibugs>	 (PS1) Ladsgroup: mariadb: Comment out m4 [puppet] - https://gerrit.wikimedia.org/r/1155637 (https://phabricator.wikimedia.org/T395999)
[11:05:31] <logmsgbot>	 !log mvolz@deploy1003 helmfile [codfw] START helmfile.d/services/citoid: apply
[11:05:42] <icinga-wm>	 RECOVERY - Host ml-etcd1001 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms
[11:05:57] <logmsgbot>	 !log mvolz@deploy1003 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[11:06:21] <logmsgbot>	 !log mvolz@deploy1003 helmfile [eqiad] START helmfile.d/services/citoid: apply
[11:06:41] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
[11:06:49] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
[11:06:51] <logmsgbot>	 !log mvolz@deploy1003 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[11:07:52] <wikibugs>	 SRE, Data-Engineering, LDAP-Access-Requests: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10903827 (elukey) @SKivlehan-WMF Hi! I think you need to request access to the `wmf` LDAP group, please check https://wikitech.wikimedia.org/wiki/SRE...
[11:08:28] <elukey>	 c/12
[11:08:30] <elukey>	 ufff
[11:09:30] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:09:31] <wikibugs>	 (Abandoned) Mvolz: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1154437 (owner: PipelineBot)
[11:10:09] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Move ganeti20[45-50] into production - https://phabricator.wikimedia.org/T396590 (MoritzMuehlenhoff) NEW
[11:10:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:10:47] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Move ganeti2045-ganeti2050 into production and decom ganeti2019-ganeti2024 - https://phabricator.wikimedia.org/T396590#10903839 (MoritzMuehlenhoff) p:Triage→Medium
[11:13:17] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
[11:14:07] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P77698 and previous config saved to /var/cache/conftool/dbconfig/20250611-111407-fceratto.json
[11:15:19] <wikibugs>	 (PS2) Volans: phabricator: expand support for Phabricator tasks [software/pywmflib] - https://gerrit.wikimedia.org/r/1154786
[11:15:32] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[11:16:36] <wikibugs>	 (CR) Volans: "addressed comments" [software/pywmflib] - https://gerrit.wikimedia.org/r/1154786 (owner: Volans)
[11:16:44] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
[11:16:58] <wikibugs>	 (CR) Ladsgroup: [C:+2] mariadb: Comment out m4 [puppet] - https://gerrit.wikimedia.org/r/1155637 (https://phabricator.wikimedia.org/T395999) (owner: Ladsgroup)
[11:17:26] <moritzm>	 !log installing librabbitmq security updates
[11:17:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P77699 and previous config saved to /var/cache/conftool/dbconfig/20250611-111805-marostegui.json
[11:19:30] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:20:14] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add ncredir7003 to conftool [puppet] - https://gerrit.wikimedia.org/r/1153948 (https://phabricator.wikimedia.org/T394263) (owner: Muehlenhoff)
[11:20:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:20:51] <wikibugs>	 (PS1) Andrew-WMDE: wikidata-query-gui: Bump query-gui image version [deployment-charts] - https://gerrit.wikimedia.org/r/1155643 (https://phabricator.wikimedia.org/T396002)
[11:22:04] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
[11:22:11] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
[11:24:31] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
[11:27:09] <wikibugs>	 (PS1) Gmodena: dse-k8s-eqiad: remove deprecated dumps 2 config. [deployment-charts] - https://gerrit.wikimedia.org/r/1155644 (https://phabricator.wikimedia.org/T396593)
[11:28:35] <Ammar>	 !log Ran fixStuckGlobalRename.php for T396545
[11:28:38] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
[11:28:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:39] <stashbot>	 T396545: Unblock stuck global rename of Tok'ra Operative - https://phabricator.wikimedia.org/T396545
[11:28:42] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
[11:28:44] <wikibugs>	 (CR) Ayounsi: [C:+2] Add netflow1003 to profile::kafka::broker::custom_ferm_srange_component [puppet] - https://gerrit.wikimedia.org/r/1155599 (owner: Ayounsi)
[11:29:15] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P77700 and previous config saved to /var/cache/conftool/dbconfig/20250611-112914-fceratto.json
[11:29:50] <wikibugs>	 (PS4) Ladsgroup: Add x1 to DBRecordCache for dumps [mediawiki-config] - https://gerrit.wikimedia.org/r/1145243
[11:32:04] <logmsgbot>	 !log jmm@puppetserver1001 conftool action : set/weight=1; selector: name=ncredir7003.magru.wmnet
[11:32:15] <logmsgbot>	 !log jmm@puppetserver1001 conftool action : set/pooled=yes; selector: name=ncredir7003.magru.wmnet
[11:32:19] <wikibugs>	 (CR) Ayounsi: [C:+1] Failover webproxy to install7002 [dns] - https://gerrit.wikimedia.org/r/1155624 (https://phabricator.wikimedia.org/T394263) (owner: Muehlenhoff)
[11:33:12] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77701 and previous config saved to /var/cache/conftool/dbconfig/20250611-113312-marostegui.json
[11:33:17] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[11:33:19] <wikibugs>	 (PS1) Brouberol: airflow: upgrade base image to pull a new cncf-kubernetes provider version [deployment-charts] - https://gerrit.wikimedia.org/r/1155645 (https://phabricator.wikimedia.org/T396476)
[11:33:28] <logmsgbot>	 !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2227.codfw.wmnet with reason: Maintenance
[11:33:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77702 and previous config saved to /var/cache/conftool/dbconfig/20250611-113336-marostegui.json
[11:33:59] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
[11:33:59] <XioNoX>	 !log disable lvs7003 secondary link switch port - T367731
[11:34:01] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
[11:34:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:05] <logmsgbot>	 !log klausman@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-serve1001.eqiad.wmnet
[11:34:05] <stashbot>	 T367731: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731
[11:34:19] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Failover webproxy to install7002 [dns] - https://gerrit.wikimedia.org/r/1155624 (https://phabricator.wikimedia.org/T394263) (owner: Muehlenhoff)
[11:34:23] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
[11:34:24] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[11:34:33] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
[11:35:14] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[11:35:19] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
[11:35:47] <wikibugs>	 (CR) Brouberol: [C:+2] airflow: upgrade base image to pull a new cncf-kubernetes provider version [deployment-charts] - https://gerrit.wikimedia.org/r/1155645 (https://phabricator.wikimedia.org/T396476) (owner: Brouberol)
[11:37:32] <wikibugs>	 (PS6) Gkyziridis: ores-extension: enable revertrisk filter for simplewiki and trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668)
[11:37:41] <wikibugs>	 (CR) CI reject: [V:-1] ores-extension: enable revertrisk filter for simplewiki and trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668) (owner: Gkyziridis)
[11:38:07] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
[11:38:09] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[11:39:02] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
[11:39:18] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[11:41:09] <logmsgbot>	 !log klausman@cumin2002 START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
[11:42:02] <XioNoX>	 !log disable lvs3010 secondary link switch port - T367731
[11:42:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:05] <stashbot>	 T367731: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731
[11:43:23] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
[11:43:28] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
[11:44:23] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77703 and previous config saved to /var/cache/conftool/dbconfig/20250611-114422-fceratto.json
[11:44:41] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
[11:44:47] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77704 and previous config saved to /var/cache/conftool/dbconfig/20250611-114447-fceratto.json
[11:46:12] <logmsgbot>	 !log klausman@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
[11:47:17] <wikibugs>	 (PS7) Gkyziridis: ores-extension: enable revertrisk filter for simplewiki and trwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668)
[11:51:41] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77706 and previous config saved to /var/cache/conftool/dbconfig/20250611-115140-fceratto.json
[11:51:58] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
[11:52:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77707 and previous config saved to /var/cache/conftool/dbconfig/20250611-115231-marostegui.json
[11:52:36] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[11:56:17] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
[11:57:49] <wikibugs>	 (PS1) Muehlenhoff: Remove ncredir7001 from conftool [puppet] - https://gerrit.wikimedia.org/r/1155649 (https://phabricator.wikimedia.org/T394263)
[12:01:32] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
[12:01:38] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
[12:06:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P77708 and previous config saved to /var/cache/conftool/dbconfig/20250611-120648-fceratto.json
[12:07:01] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
[12:07:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P77709 and previous config saved to /var/cache/conftool/dbconfig/20250611-120740-marostegui.json
[12:12:18] <wikibugs>	 (PS1) Brouberol: Revert "airflow: upgrade base image to pull a new cncf-kubernetes provider version" [deployment-charts] - https://gerrit.wikimedia.org/r/1155651
[12:12:52] <wikibugs>	 (PS1) Gkyziridis: ores-extension: enable extension with revertrisk filter for the third batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1155652 (https://phabricator.wikimedia.org/T395824)
[12:13:38] <wikibugs>	 (CR) CI reject: [V:-1] ores-extension: enable extension with revertrisk filter for the third batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1155652 (https://phabricator.wikimedia.org/T395824) (owner: Gkyziridis)
[12:15:10] <wikibugs>	 (PS2) Gkyziridis: ores-extension: enable extension with revertrisk filter for the third batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1155652 (https://phabricator.wikimedia.org/T395824)
[12:16:13] <wikibugs>	 ops-esams, DC-Ops: esams: remove old lvs secondary links - https://phabricator.wikimedia.org/T396601 (ayounsi) NEW p:Triage→Low
[12:16:15] <wikibugs>	 ops-magru: magru: remove old lvs secondary links - https://phabricator.wikimedia.org/T396602 (ayounsi) NEW p:Triage→Low
[12:17:25] <wikibugs>	 ops-drmrs: drmrs: remove old lvs secondary links - https://phabricator.wikimedia.org/T396603 (ayounsi) NEW p:Triage→Low
[12:17:30] <wikibugs>	 (CR) Brouberol: [C:+2] Revert "airflow: upgrade base image to pull a new cncf-kubernetes provider version" [deployment-charts] - https://gerrit.wikimedia.org/r/1155651 (owner: Brouberol)
[12:17:34] <wikibugs>	 ops-esams, DC-Ops: esams: remove old lvs secondary links - https://phabricator.wikimedia.org/T396601#10904186 (ayounsi)
[12:21:12] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[12:21:15] <wikibugs>	 (CR) Ayounsi: [C:+1] "Not my area of expertise but logic lgtm." [puppet] - https://gerrit.wikimedia.org/r/1155649 (https://phabricator.wikimedia.org/T394263) (owner: Muehlenhoff)
[12:21:51] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[12:21:54] <icinga-wm>	 RECOVERY - Disk space on an-worker1154 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1154&var-datasource=eqiad+prometheus/ops
[12:21:55] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P77710 and previous config saved to /var/cache/conftool/dbconfig/20250611-122155-fceratto.json
[12:22:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P77711 and previous config saved to /var/cache/conftool/dbconfig/20250611-122246-marostegui.json
[12:22:56] <wikibugs>	 (CR) Ayounsi: [C:+2] Netops: remove check_bgp [puppet] - https://gerrit.wikimedia.org/r/1148891 (https://phabricator.wikimedia.org/T388641) (owner: Ayounsi)
[12:23:12] <icinga-wm>	 RECOVERY - Disk space on an-worker1110 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1110&var-datasource=eqiad+prometheus/ops
[12:23:12] <icinga-wm>	 RECOVERY - Disk space on an-worker1131 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1131&var-datasource=eqiad+prometheus/ops
[12:25:33] <wikibugs>	 (PS1) Ilias Sarantopoulos: ml-services: increase workers in viwiki-reverted [deployment-charts] - https://gerrit.wikimedia.org/r/1155655 (https://phabricator.wikimedia.org/T387019)
[12:26:00] <icinga-wm>	 RECOVERY - Disk space on an-worker1093 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1093&var-datasource=eqiad+prometheus/ops
[12:26:56] <icinga-wm>	 RECOVERY - Disk space on an-worker1124 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1124&var-datasource=eqiad+prometheus/ops
[12:27:30] <wikibugs>	 (CR) Dat Nguyen: [C:+1] wikidata-query-gui: Bump query-gui image version [deployment-charts] - https://gerrit.wikimedia.org/r/1155643 (https://phabricator.wikimedia.org/T396002) (owner: Andrew-WMDE)
[12:28:06] <wikibugs>	 (CR) FNegri: [V:+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/5913/console" [puppet] - https://gerrit.wikimedia.org/r/1155229 (owner: FNegri)
[12:30:09] <wikibugs>	 (CR) Kevin Bazira: [C:+1] ml-services: increase workers in viwiki-reverted [deployment-charts] - https://gerrit.wikimedia.org/r/1155655 (https://phabricator.wikimedia.org/T387019) (owner: Ilias Sarantopoulos)
[12:30:58] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+2] ml-services: increase workers in viwiki-reverted [deployment-charts] - https://gerrit.wikimedia.org/r/1155655 (https://phabricator.wikimedia.org/T387019) (owner: Ilias Sarantopoulos)
[12:31:48] <icinga-wm>	 RECOVERY - Disk space on an-worker1107 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1107&var-datasource=eqiad+prometheus/ops
[12:31:53] <wikibugs>	 (PS2) FNegri: Revert "maintain-dbusers: Revert overly strict type" [puppet] - https://gerrit.wikimedia.org/r/1155229 (https://phabricator.wikimedia.org/T395999)
[12:32:26] <wikibugs>	 (Merged) jenkins-bot: ml-services: increase workers in viwiki-reverted [deployment-charts] - https://gerrit.wikimedia.org/r/1155655 (https://phabricator.wikimedia.org/T387019) (owner: Ilias Sarantopoulos)
[12:32:26] <wikibugs>	 (PS1) Aklapper: Don't call time() more than needed [phabricator/antivandalism] (wmf/stable) - https://gerrit.wikimedia.org/r/1155657
[12:32:30] <icinga-wm>	 RECOVERY - Disk space on an-worker1109 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1109&var-datasource=eqiad+prometheus/ops
[12:32:48] <wikibugs>	 (PS3) FNegri: Revert "maintain-dbusers: Revert overly strict type" [puppet] - https://gerrit.wikimedia.org/r/1155229 (https://phabricator.wikimedia.org/T395999)
[12:33:08] <wikibugs>	 (CR) FNegri: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1155229 (https://phabricator.wikimedia.org/T395999) (owner: FNegri)
[12:33:14] <wikibugs>	 (CR) Aklapper: [V:+2 C:+2] Don't call time() more than needed [phabricator/antivandalism] (wmf/stable) - https://gerrit.wikimedia.org/r/1155657 (owner: Aklapper)
[12:33:56] <icinga-wm>	 RECOVERY - Disk space on an-worker1117 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1117&var-datasource=eqiad+prometheus/ops
[12:35:26] <icinga-wm>	 RECOVERY - Disk space on an-worker1105 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-worker1105&var-datasource=eqiad+prometheus/ops
[12:35:43] <wikibugs>	 (03PS1) 10Aklapper: move a comment [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155658
[12:35:51] <wikibugs>	 (03CR) 10Tarrow: [C:03+1] "manually checked the image tag is also present. I apparently don't have +2 here (anymore?)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155643 (https://phabricator.wikimedia.org/T396002) (owner: 10Andrew-WMDE)
[12:36:31] <wikibugs>	 (03PS2) 10Aklapper: move a comment [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155658
[12:36:50] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] move a comment [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155658 (owner: 10Aklapper)
[12:37:02] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77712 and previous config saved to /var/cache/conftool/dbconfig/20250611-123702-fceratto.json
[12:37:20] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
[12:37:28] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77713 and previous config saved to /var/cache/conftool/dbconfig/20250611-123727-fceratto.json
[12:37:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77714 and previous config saved to /var/cache/conftool/dbconfig/20250611-123753-marostegui.json
[12:37:57] <stashbot>	 T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130
[12:41:14] <wikibugs>	 (03CR) 10Majavah: [C:03+1] Revert "maintain-dbusers: Revert overly strict type" [puppet] - 10https://gerrit.wikimedia.org/r/1155229 (https://phabricator.wikimedia.org/T395999) (owner: 10FNegri)
[12:41:32] <XioNoX>	 !log disable lvs7002 secondary link switch port - T367731
[12:41:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:36] <stashbot>	 T367731: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731
[12:43:42] <XioNoX>	 !log disable lvs7001 secondary link switch port - T367731
[12:43:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:49] <wikibugs>	 (03CR) 10Filippo Giunchedi: pdb_resource_exporter: add puppetdb resource exporter to puppedb (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1143600 (https://phabricator.wikimedia.org/T395442) (owner: 10Tiziano Fogli)
[12:44:11] <wikibugs>	 10ops-magru: magru: remove old lvs secondary links - https://phabricator.wikimedia.org/T396602#10904321 (10ayounsi)
[12:44:17] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] hieradata: enable memcache on all titan hosts [puppet] - 10https://gerrit.wikimedia.org/r/1155231 (https://phabricator.wikimedia.org/T394319) (owner: 10Filippo Giunchedi)
[12:45:35] <godog>	 jouncebot: nowandnext
[12:45:35] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 14 minute(s)
[12:45:35] <jouncebot>	 In 0 hour(s) and 14 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1300)
[12:47:37] <logmsgbot>	 !log jmm@cumin1003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
[12:47:39] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
[12:48:04] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[12:48:35] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77715 and previous config saved to /var/cache/conftool/dbconfig/20250611-124834-fceratto.json
[12:49:17] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[12:50:43] <wikibugs>	 (03CR) 10FNegri: [C:03+2] Revert "maintain-dbusers: Revert overly strict type" [puppet] - 10https://gerrit.wikimedia.org/r/1155229 (https://phabricator.wikimedia.org/T395999) (owner: 10FNegri)
[12:51:03] <wikibugs>	 (03PS1) 10Brouberol: airflow: upgrade base image to pull a new cncf-kubernetes provider version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155659 (https://phabricator.wikimedia.org/T396476)
[12:51:17] <XioNoX>	 !log disable lvs3009 secondary link switch port - T367731
[12:51:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:51:21] <stashbot>	 T367731: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731
[12:53:16] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] cfssl-issuer: Allow to provide a custom CA certificate store [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[12:53:59] <wikibugs>	 10ops-esams, 06SRE, 06DC-Ops: esams: remove old lvs secondary links - https://phabricator.wikimedia.org/T396601#10904354 (10ayounsi)
[12:54:00] <wikibugs>	 06SRE, 07SRE-Unowned: The ops-maint-gcal.js script is missing support for some vendors - https://phabricator.wikimedia.org/T381680#10904355 (10Scott_French) @elukey - Ah, I wonder if Google might have changed something. The 16384 number was based entirely on bisection with a small number of test events. It see...
[12:54:28] <wikibugs>	 (03Merged) 10jenkins-bot: cfssl-issuer: Allow to provide a custom CA certificate store [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153978 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[12:54:39] <wikibugs>	 10ops-esams, 06SRE, 06DC-Ops: esams: remove old lvs secondary links - https://phabricator.wikimedia.org/T396601#10904373 (10ayounsi)
[12:54:49] <logmsgbot>	 !log jmm@cumin1003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
[12:54:52] <XioNoX>	 !log disable lvs3008 secondary link switch port - T367731
[12:54:54] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
[12:54:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:57] <godog>	 incoming gerrit spam -- apologies in advance
[12:56:04] <wikibugs>	 (03CR) 10JMeybohm: calico: Add support to manage CNI installation by daemonset (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153976 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[12:56:22] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384308 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155218 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:28] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384321 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155222 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:34] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384425 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155226 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:38] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384427 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155230 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:42] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384924 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155248 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:46] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T384922 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155245 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:50] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384933 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155250 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:53] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384938 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155251 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:56:56] <wikibugs>	 (03PS1) 10Aklapper: Penalize on setting Due Date to default value [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155660 (https://phabricator.wikimedia.org/T396607)
[12:56:57] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384939 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155254 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:01] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T328502 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155138 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:05] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T384998 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155136 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:08] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T370157 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155134 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:12] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T228830 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155144 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:15] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T309012 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155143 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:18] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T374842 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155142 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:22] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T367149 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155141 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:25] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T315866 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155139 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:29] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T367065 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155137 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:46] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T371083 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155131 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:50] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384309 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155133 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:57:58] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T370526 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155132 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:05] <logmsgbot>	 !log jmm@cumin1003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
[12:58:06] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T374823 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155130 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:16] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T374839 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155129 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:20] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T375166 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155128 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:24] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384303 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155127 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:28] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T384305 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155124 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:32] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T385583 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155598 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:36] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T385590 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155600 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:40] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T321808 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155607 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:45] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T358029 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155611 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:49] <wikibugs>	 (03PS4) 10Filippo Giunchedi: monitoring services: add migration task T350694 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155140 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:53] <wikibugs>	 (03PS1) 10Filippo Giunchedi: monitoring services: add migration task T332764 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155627 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:58:57] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T385587 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155135 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:59:01] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T384214 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155619 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:59:05] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T362397 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155612 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:59:13] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T370530 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155625 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:59:17] <wikibugs>	 (03PS4) 10Filippo Giunchedi: monitoring services: add migration task T357099 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155145 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:59:21] <wikibugs>	 (03PS2) 10Filippo Giunchedi: monitoring services: add migration task T384830 to instances [puppet] - 10https://gerrit.wikimedia.org/r/1155240 (https://phabricator.wikimedia.org/T395443) (owner: 10Tiziano Fogli)
[12:59:25] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] airflow: upgrade base image to pull a new cncf-kubernetes provider version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155659 (https://phabricator.wikimedia.org/T396476) (owner: 10Brouberol)
[13:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1300)
[13:00:05] <jouncebot>	 xSavitar, Lucas_WMDE, edsanders, and MatmaRex: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:08] <Lucas_WMDE>	 o/
[13:00:12] <Lucas_WMDE>	 I can deploy!
[13:00:13] <edsanders>	 I'm here
[13:00:16] <xSavitar>	 o/
[13:00:18] <MatmaRex>	 hi
[13:00:19] <XioNoX>	 !log disable lvs6002 secondary link switch port - T367731
[13:00:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:23] <stashbot>	 T367731: drmrs/esams/magru LVS : remove cross-rack links - https://phabricator.wikimedia.org/T367731
[13:00:31] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "starting gate-and-submit ahead of deployment" [extensions/Wikibase] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155244 (https://phabricator.wikimedia.org/T396219) (owner: 10Lucas Werkmeister (WMDE))
[13:00:48] <Lucas_WMDE>	 xSavitar: do you want to self-service your config change?
[13:01:04] <xSavitar>	 You can go ahead, I'm here to help with testing :)
[13:01:09] <Lucas_WMDE>	 ok :)
[13:01:16] <wikibugs>	 (03PS3) 10Jforrester: wikifunctions: Configure memcachedUri for the function-orchestrator and enable [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155241 (https://phabricator.wikimedia.org/T390746)
[13:01:16] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: Update evaluators from 2025-06-03-205630 to 2025-06-09-163022 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155661 (https://phabricator.wikimedia.org/T390753)
[13:01:26] <wikibugs>	 (03PS1) 10Jforrester: wikifunctions: Update orchestrator from 2025-06-04-185118 to 2025-06-10-144243 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155662 (https://phabricator.wikimedia.org/T390753)
[13:01:40] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1152064 (https://phabricator.wikimedia.org/T395185) (owner: 10D3r1ck01)
[13:02:35] <wikibugs>	 (03Merged) 10jenkins-bot: SUL3: Enable client hints data on the auth shared domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1152064 (https://phabricator.wikimedia.org/T395185) (owner: 10D3r1ck01)
[13:03:04] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1152064|SUL3: Enable client hints data on the auth shared domain (T395185)]]
[13:03:04] <wikibugs>	 (03PS2) 10Aklapper: Penalize on setting Due Date to default value [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155660 (https://phabricator.wikimedia.org/T396607)
[13:03:08] <stashbot>	 T395185: Consider enabling client hints on auth.wikimedia.org - https://phabricator.wikimedia.org/T395185
[13:03:27] <akosiaris>	 !log T393557 block requests to /api/rest_v1/page/data-parsoid
[13:03:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:30] <stashbot>	 T393557: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557
[13:03:37] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
[13:03:42] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P77716 and previous config saved to /var/cache/conftool/dbconfig/20250611-130341-fceratto.json
[13:03:46] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] Penalize on setting Due Date to default value [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155660 (https://phabricator.wikimedia.org/T396607) (owner: 10Aklapper)
[13:04:22] <wikibugs>	 10ops-drmrs: drmrs: remove old lvs secondary links - https://phabricator.wikimedia.org/T396603#10904450 (10ayounsi)
[13:05:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 d3r1ck01, lucaswerkmeister-wmde: Backport for [[gerrit:1152064|SUL3: Enable client hints data on the auth shared domain (T395185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:05:21] <Lucas_WMDE>	 xSavitar: please test :)
[13:05:24] <xSavitar>	 okay
[13:06:16] <xSavitar>	 Lucas_WMDE, works as expected.
[13:06:27] <xSavitar>	 let's go live :)
[13:06:41] <wikibugs>	 (03CR) 10Samtar: [C:03+1] InitialiseSettings: wgTemplateDataEnableDiscovery on more wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1151831 (https://phabricator.wikimedia.org/T377975) (owner: 10Samwilson)
[13:06:45] <Lucas_WMDE>	 nice!
[13:07:15] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
[13:07:39] <logmsgbot>	 jmm@cumin1003 drain-node (PID 1138830) is awaiting input
[13:08:28] <wikibugs>	 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Bring relforge100[89] into production - https://phabricator.wikimedia.org/T389957#10904472 (10bking) a:03bking
[13:08:41] <xSavitar>	 Lucas_WMDE, thank you so much for deploying. 🙏🏽
[13:11:40] <logmsgbot>	 !log jmm@cumin1003 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
[13:13:06] <wikibugs>	 06SRE: restbase2030 (and others) running low on disk space - https://phabricator.wikimedia.org/T395845#10904495 (10Eevans) >>! In T395845#10902857, @Jgiannelos wrote: > Hey @Eevans  >  > * Regarding mobile-sections this has been completely decommisioned for long time now. I don't think we need storage for this a...
[13:13:21] <wikibugs>	 (03PS1) 10Samtar: IS: Enable `wgTemplateDataEnableDiscovery` for mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155665 (https://phabricator.wikimedia.org/T377975)
[13:14:14] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1152064|SUL3: Enable client hints data on the auth shared domain (T395185)]] (duration: 11m 09s)
[13:14:17] <stashbot>	 T395185: Consider enabling client hints on auth.wikimedia.org - https://phabricator.wikimedia.org/T395185
[13:14:19] <wikibugs>	 (03CR) 10Elukey: "This will impact some dashboards, most notably the Citoid one (to backlog and we already started the quarter). It should be fine but befor" [puppet] - 10https://gerrit.wikimedia.org/r/1155316 (https://phabricator.wikimedia.org/T395916) (owner: 10Herron)
[13:14:40] <Lucas_WMDE>	 my backport is almost done in CI, so I’ll wait for that to finish
[13:14:50] <edsanders>	 I'll have a go at self-deploying
[13:15:02] <edsanders>	 (after)
[13:15:05] <Lucas_WMDE>	 ok
[13:15:09] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [extensions/Wikibase] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155244 (https://phabricator.wikimedia.org/T396219) (owner: 10Lucas Werkmeister (WMDE))
[13:15:17] <Lucas_WMDE>	 (thanks for the clarification, I was about to complain :D)
[13:16:04] <wikibugs>	 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Bring relforge100[89] into production - https://phabricator.wikimedia.org/T389957#10904507 (10bking) 05Open→03Resolved `relforge100[89]` are now part of the cluster:   ` bking@relforge1008:~$ curl -s http://0:9200/_cat/nodes 10.64.164.14...
[13:18:44] <wikibugs>	 06SRE: restbase2030 (and others) running low on disk space - https://phabricator.wikimedia.org/T395845#10904523 (10Eevans) >>! In T395845#10904495, @Eevans wrote: >>>! In T395845#10902857, @Jgiannelos wrote: >>  >> [ ... ] >  > Ok, and AFAIK a truncate would —in the worst case scenario— just result in a cold cac...
[13:18:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P77717 and previous config saved to /var/cache/conftool/dbconfig/20250611-131848-fceratto.json
[13:19:42] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "ncredir7003 is already pooled, all good :)" [puppet] - 10https://gerrit.wikimedia.org/r/1155649 (https://phabricator.wikimedia.org/T394263) (owner: 10Muehlenhoff)
[13:20:15] <jinxer-wm>	 FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.119s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:23:36] <akosiaris>	 This ^ is probably because of T393557. Calls to parsoid have dropped in codfw from 0.15rps to close to 0. 
[13:23:36] <stashbot>	 T393557: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557
[13:23:38] <wikibugs>	 10SRE-swift-storage, 06Data-Persistence: ms-fe2015 is suffering intermittent errors on port 80 - https://phabricator.wikimedia.org/T396573#10904567 (10Vgutierrez) 05Open→03Resolved a:03Ladsgroup No errors since the reboot, feel free to re-open the task if the issue re-appears: ` vgutierrez@lvs2013:~$...
[13:23:59] <wikibugs>	 (03PS15) 10Ayounsi: Tox: add Python3.12 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452
[13:24:32] <akosiaris>	 I'll silence it for now, let's see how the alert will behave in 15m or so.
[13:25:15] <jinxer-wm>	 RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.119s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[13:25:43] <akosiaris>	 🤞 we will be able to remove mw-parsoid soon
[13:25:59] <wikibugs>	 (03Abandoned) 10Ayounsi: MR: rollback gNMI [homer/public] - 10https://gerrit.wikimedia.org/r/1133398 (https://phabricator.wikimedia.org/T390052) (owner: 10Ayounsi)
[13:26:36] <Lucas_WMDE>	 that CI build is taking longer than expected
[13:26:38] <Lucas_WMDE>	 :S
[13:28:02] <Lucas_WMDE>	 wondering if I should stop my scap and let edsanders go first
[13:28:06] <Lucas_WMDE>	 ah, no, it just finished!
[13:28:18] <Lucas_WMDE>	 then let’s let that deploy go through :)
[13:28:29] <wikibugs>	 (03Merged) 10jenkins-bot: Update searchsuggest message key [extensions/Wikibase] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155244 (https://phabricator.wikimedia.org/T396219) (owner: 10Lucas Werkmeister (WMDE))
[13:28:52] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1155244|Update searchsuggest message key (T396219)]]
[13:28:53] <wikibugs>	 (03Abandoned) 10Ayounsi: [WIP] Initial SONiC config from Homer YAML [homer/public] - 10https://gerrit.wikimedia.org/r/940867 (https://phabricator.wikimedia.org/T320638) (owner: 10Ayounsi)
[13:28:55] <stashbot>	 T396219: ScopedTypeaheadSearch: update "searchsuggest" message key - https://phabricator.wikimedia.org/T396219
[13:29:09] <claime>	 akosiaris: that's awesome :D
[13:29:34] <claime>	 akosiaris: if the alert doesn't shut off we can add a minimum rps threshold for it to fire
[13:29:57] <wikibugs>	 (03CR) 10Xcollazo: [C:03+1] dse-k8s-eqiad: remove deprecated dumps 2 config. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155644 (https://phabricator.wikimedia.org/T396593) (owner: 10Gmodena)
[13:31:01] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Backport for [[gerrit:1155244|Update searchsuggest message key (T396219)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:31:14] <Lucas_WMDE>	 testing…
[13:31:31] <Lucas_WMDE>	 works! \o/
[13:31:42] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Continuing with sync
[13:33:38] <wikibugs>	 (03CR) 10Herron: [C:03+1] thanos: enable tracing for store [puppet] - 10https://gerrit.wikimedia.org/r/1155153 (https://phabricator.wikimedia.org/T394318) (owner: 10Filippo Giunchedi)
[13:33:51] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Tox: add Python3.12 support [software/spicerack] - 10https://gerrit.wikimedia.org/r/1050452 (owner: 10Ayounsi)
[13:33:56] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77718 and previous config saved to /var/cache/conftool/dbconfig/20250611-133355-fceratto.json
[13:34:14] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
[13:34:21] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77719 and previous config saved to /var/cache/conftool/dbconfig/20250611-133420-fceratto.json
[13:34:37] <Lucas_WMDE>	 MatmaRex: I’m guessing your two changes can be deployed together (once we get to them)?
[13:34:53] <MatmaRex>	 Lucas_WMDE: yep
[13:34:59] <Lucas_WMDE>	 ack
[13:36:51] <wikibugs>	 (03CR) 10KartikMistry: [C:03+2] Update recommendation-api to 2025-06-10-203235-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155359 (https://phabricator.wikimedia.org/T374695) (owner: 10KartikMistry)
[13:37:43] <wikibugs>	 06SRE, 06Data-Engineering: WE 5.4 FY 25/26: Improve automata detection at the edge and pass it to the refinery pipeline - https://phabricator.wikimedia.org/T396562#10904642 (10Joe) There's a few open questions here: * In terms of pure traffic control, which is what SRE want, only running detection on cache mis...
[13:38:23] <wikibugs>	 (03Merged) 10jenkins-bot: Update recommendation-api to 2025-06-10-203235-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155359 (https://phabricator.wikimedia.org/T374695) (owner: 10KartikMistry)
[13:38:49] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155244|Update searchsuggest message key (T396219)]] (duration: 09m 57s)
[13:38:52] <stashbot>	 T396219: ScopedTypeaheadSearch: update "searchsuggest" message key - https://phabricator.wikimedia.org/T396219
[13:38:59] <Lucas_WMDE>	 Now witness the firepower of this fully armed and operational SpiderPig!
[13:39:01] <Lucas_WMDE>	 deploy at will, edsanders
[13:39:47] <logmsgbot>	 !log kartik@deploy1003 helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[13:39:51] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by esanders@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155295 (https://phabricator.wikimedia.org/T392121) (owner: 10Esanders)
[13:40:04] <Lucas_WMDE>	 “Running '/usr/local/sbin/restart-php-fpm-all php7.4-fpm [snip]' on 4 host(s)” o_O are we still running php7.4?
[13:40:34] <wikibugs>	 (03PS3) 10Ayounsi: Bird: use the "interface" config option for v6 peers [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392)
[13:40:46] <wikibugs>	 (03Merged) 10jenkins-bot: Enable DiscussionTools visual enhancements everywhere except 12 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155295 (https://phabricator.wikimedia.org/T392121) (owner: 10Esanders)
[13:40:59] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Bird: use the "interface" config option for v6 peers [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[13:41:08] <logmsgbot>	 !log esanders@deploy1003 Started scap sync-world: Backport for [[gerrit:1155295|Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121)]]
[13:41:12] <stashbot>	 T392121: Phase 4: Offer Usability Improvements as default-on feature at wikis - https://phabricator.wikimedia.org/T392121
[13:41:58] <wikibugs>	 (03PS4) 10Ayounsi: Bird: use the "interface" config option for v6 peers [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392)
[13:42:31] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77720 and previous config saved to /var/cache/conftool/dbconfig/20250611-134230-fceratto.json
[13:42:38] <wikibugs>	 (03CR) 10Ayounsi: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[13:43:08] <logmsgbot>	 !log kartik@deploy1003 helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[13:43:17] <logmsgbot>	 !log esanders@deploy1003 esanders: Backport for [[gerrit:1155295|Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:43:59] <wikibugs>	 (03CR) 10Hnowlan: [C:03+2] trafficserver: restbaseless reading lists API for all wikis [puppet] - 10https://gerrit.wikimedia.org/r/1149625 (https://phabricator.wikimedia.org/T384891) (owner: 10Hnowlan)
[13:45:07] <hnowlan>	 !log migrating reading lists out of restbase for all wikis
[13:45:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:41] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] calico: Add support to manage CNI installation by daemonset [deployment-charts] - 10https://gerrit.wikimedia.org/r/1153976 (https://phabricator.wikimedia.org/T396107) (owner: 10JMeybohm)
[13:46:45] <logmsgbot>	 !log esanders@deploy1003 esanders: Continuing with sync
[13:47:11] <Lucas_WMDE>	 \o/
[13:47:54] <logmsgbot>	 !log kartik@deploy1003 helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
[13:48:55] <kart_>	 !log Updated Recommendation-API to 2025-06-10-203235-production (T374695)
[13:48:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:58] <stashbot>	 T374695: Community-defined Translation Collections: Support collections with multiple sub-collections - https://phabricator.wikimedia.org/T374695
[13:49:24] <wikibugs>	 (03PS1) 10Aklapper: Split an if clause [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155677
[13:50:18] <vgutierrez>	 !log upload varnish 7.1.1-2~bpo11+wmf2 to apt.wm.o (bullseye-wikimedia) - T396581
[13:50:19] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] Split an if clause [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155677 (owner: 10Aklapper)
[13:50:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:21] <stashbot>	 T396581: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581
[13:53:45] <logmsgbot>	 !log esanders@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155295|Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121)]] (duration: 12m 36s)
[13:53:48] <stashbot>	 T392121: Phase 4: Offer Usability Improvements as default-on feature at wikis - https://phabricator.wikimedia.org/T392121
[13:55:25] <Lucas_WMDE>	 alright, I’ll finish with MatmaRex’ changes then :)
[13:55:41] <MatmaRex>	 thanks
[13:55:46] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155299 (https://phabricator.wikimedia.org/T362324) (owner: 10Bartosz Dziewoński)
[13:55:47] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155303 (https://phabricator.wikimedia.org/T393963) (owner: 10Bartosz Dziewoński)
[13:56:34] <wikibugs>	 (03Merged) 10jenkins-bot: Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155299 (https://phabricator.wikimedia.org/T362324) (owner: 10Bartosz Dziewoński)
[13:56:35] <wikibugs>	 (03PS5) 10Ayounsi: Bird: use the "interface" config option for v6 peers [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392)
[13:56:37] <wikibugs>	 (03Merged) 10jenkins-bot: Stop logging $wgPHPSessionHandling warnings for now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155303 (https://phabricator.wikimedia.org/T393963) (owner: 10Bartosz Dziewoński)
[13:56:49] <wikibugs>	 (03CR) 10Ayounsi: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[13:56:58] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for [[gerrit:1155299|Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324)]], [[gerrit:1155303|Stop logging $wgPHPSessionHandling warnings for now (T393963)]]
[13:57:03] <stashbot>	 T362324: Disable PHPSessionHandler in Wikimedia production - https://phabricator.wikimedia.org/T362324
[13:57:04] <stashbot>	 T393963: PHP Deprecated: Use of $_SESSION was deprecated in MediaWiki 1.27. [Called from session_write_close in (internal function)] - https://phabricator.wikimedia.org/T393963
[13:57:25] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 10310
[13:57:37] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P77721 and previous config saved to /var/cache/conftool/dbconfig/20250611-135736-fceratto.json
[13:58:39] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] hiera: Switch lvs7002 to katran [puppet] - 10https://gerrit.wikimedia.org/r/1155610 (https://phabricator.wikimedia.org/T396561) (owner: 10Vgutierrez)
[13:59:08] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1155299|Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324)]], [[gerrit:1155303|Stop logging $wgPHPSessionHandling warnings for now (T393963)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:59:29] <MatmaRex>	 looking
[13:59:38] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Switch eqsin to unified cert issued by GTS [puppet] - 10https://gerrit.wikimedia.org/r/1155681 (https://phabricator.wikimedia.org/T395131)
[14:00:04] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1400)
[14:00:20] <logmsgbot>	 !log ayounsi@cumin1002 END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
[14:00:35] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 10310
[14:00:48] <MatmaRex>	 Lucas_WMDE: seems good
[14:01:04] <wikibugs>	 (03CR) 10Andrew-WMDE: [C:03+2] wikidata-query-gui: Bump query-gui image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155643 (https://phabricator.wikimedia.org/T396002) (owner: 10Andrew-WMDE)
[14:01:04] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for cmelo - https://phabricator.wikimedia.org/T395966#10904763 (10cmelo) Hi @elukey, thanks, and yes I would like some help with the process, I also would like to request: **analytics-privatedata-users** access, because I usually need to read some f...
[14:01:11] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 matmarex, lucaswerkmeister-wmde: Continuing with sync
[14:01:12] <Lucas_WMDE>	 ok!
[14:01:50] <Lucas_WMDE>	 oh, I didn’t realize we’re already over time :S
[14:02:16] <James_F>	 Lucas_WMDE: It's fine, we don't clash in practice.
[14:02:39] <wikibugs>	 (03Merged) 10jenkins-bot: wikidata-query-gui: Bump query-gui image version [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155643 (https://phabricator.wikimedia.org/T396002) (owner: 10Andrew-WMDE)
[14:02:42] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] "thanks for working on this" [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/1155237 (https://phabricator.wikimedia.org/T390912) (owner: 10Ssingh)
[14:04:21] <logmsgbot>	 !log andrew-wmde@deploy1003 helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
[14:04:40] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
[14:06:00] <wikibugs>	 (03PS2) 10Jforrester: wikifunctions: Update evaluators from 2025-06-03-205630 to 2025-06-09-163022 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155661 (https://phabricator.wikimedia.org/T390753)
[14:06:00] <wikibugs>	 (03PS2) 10Jforrester: wikifunctions: Update orchestrator from 2025-06-04-185118 to 2025-06-10-144243 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155662 (https://phabricator.wikimedia.org/T390753)
[14:06:00] <wikibugs>	 (03PS4) 10Jforrester: wikifunctions: Configure memcachedUri for the function-orchestrator and enable [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155241 (https://phabricator.wikimedia.org/T390746)
[14:06:33] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment for cmelo - https://phabricator.wikimedia.org/T395966#10904773 (10cmelo) @elukey, Here are the public keys:  PROD public key:  ` ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICB5a7B3Lik8aSZpI3TOgV6uBExCmrkmn8FE/3PHmClG claudiomelo@wmf3041 `  WMCS public key:...
[14:06:40] <wikibugs>	 (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Update evaluators from 2025-06-03-205630 to 2025-06-09-163022 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155661 (https://phabricator.wikimedia.org/T390753) (owner: 10Jforrester)
[14:06:42] <logmsgbot>	 !log andrew-wmde@deploy1003 helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
[14:08:08] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Update evaluators from 2025-06-03-205630 to 2025-06-09-163022 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155661 (https://phabricator.wikimedia.org/T390753) (owner: 10Jforrester)
[14:08:12] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155299|Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324)]], [[gerrit:1155303|Stop logging $wgPHPSessionHandling warnings for now (T393963)]] (duration: 11m 14s)
[14:08:18] <stashbot>	 T362324: Disable PHPSessionHandler in Wikimedia production - https://phabricator.wikimedia.org/T362324
[14:08:18] <stashbot>	 T393963: PHP Deprecated: Use of $_SESSION was deprecated in MediaWiki 1.27. [Called from session_write_close in (internal function)] - https://phabricator.wikimedia.org/T393963
[14:09:45] <MatmaRex>	 Lucas_WMDE: thanks!
[14:10:20] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:10:27] <wikibugs>	 10ops-codfw, 06SRE, 06cloud-services-team, 06DC-Ops: cloudcontrol2010-dev service implementation - https://phabricator.wikimedia.org/T396064#10904809 (10Andrew) 05Open→03Resolved
[14:10:42] <Lucas_WMDE>	 MatmaRex: np! I’m excited to see those deprecation warnings go down in logspam-watch ^^
[14:10:51] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:10:59] <logmsgbot>	 !log andrew-wmde@deploy1003 helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
[14:11:20] <wikibugs>	 (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Update orchestrator from 2025-06-04-185118 to 2025-06-10-144243 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155662 (https://phabricator.wikimedia.org/T390753) (owner: 10Jforrester)
[14:11:37] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[14:11:39] <wikibugs>	 (03PS1) 10Tchanders: temp accounts: Enable temp account creation on three wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155683 (https://phabricator.wikimedia.org/T396464)
[14:11:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:41] <wikibugs>	 (03PS1) 10Tchanders: temp accounts: Enable temp account creation on further wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155684 (https://phabricator.wikimedia.org/T396465)
[14:11:49] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:11:57] <wikibugs>	 (03CR) 10Cory Massaro: wikifunctions: Update orchestrator from 2025-06-04-185118 to 2025-06-10-144243 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155662 (https://phabricator.wikimedia.org/T390753) (owner: 10Jforrester)
[14:12:23] <logmsgbot>	 !log andrew-wmde@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
[14:12:36] <wikibugs>	 (03CR) 10Tchanders: [C:04-2] "Planned for 24 June, 2025. Requires go-ahead from comms." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155684 (https://phabricator.wikimedia.org/T396465) (owner: 10Tchanders)
[14:12:44] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P77722 and previous config saved to /var/cache/conftool/dbconfig/20250611-141243-fceratto.json
[14:12:52] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:13:00] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:13:06] <wikibugs>	 (03CR) 10Tchanders: [C:04-2] "Planned for 17 June, 2025. Requires go-ahead from comms." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155683 (https://phabricator.wikimedia.org/T396464) (owner: 10Tchanders)
[14:13:41] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:14:12] <wikibugs>	 (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Update orchestrator from 2025-06-04-185118 to 2025-06-10-144243 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155662 (https://phabricator.wikimedia.org/T390753) (owner: 10Jforrester)
[14:15:18] <logmsgbot>	 !log andrew-wmde@deploy1003 helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
[14:15:43] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Update orchestrator from 2025-06-04-185118 to 2025-06-10-144243 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155662 (https://phabricator.wikimedia.org/T390753) (owner: 10Jforrester)
[14:15:44] <wikibugs>	 (03PS1) 10Jforrester: WikiLambda: Set repo-only config only in repo mode [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155686
[14:15:44] <wikibugs>	 (03PS1) 10Jforrester: WikiLambda: Enable orchestrator cache updates on edit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155687 (https://phabricator.wikimedia.org/T390746)
[14:15:52] <logmsgbot>	 !log andrew-wmde@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
[14:15:55] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[14:16:32] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:16:53] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:17:26] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[14:18:09] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[14:18:28] <wikibugs>	 06SRE, 06Traffic: haproxy is able to load the same GeoIP & IP-to-ASN data as Varnish does - https://phabricator.wikimedia.org/T329849#10904877 (10Fabfur) 05Open→03Resolved p:05Triage→03Medium a:03Fabfur
[14:18:41] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[14:18:49] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[14:19:18] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[14:21:01] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] Release 9.2.10-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/1155237 (https://phabricator.wikimedia.org/T390912) (owner: 10Ssingh)
[14:21:07] <wikibugs>	 (03CR) 10Cory Massaro: [C:03+2] wikifunctions: Configure memcachedUri for the function-orchestrator and enable [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155241 (https://phabricator.wikimedia.org/T390746) (owner: 10Jforrester)
[14:21:32] <logmsgbot>	 !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[14:22:16] <logmsgbot>	 !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[14:22:43] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Configure memcachedUri for the function-orchestrator and enable [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155241 (https://phabricator.wikimedia.org/T390746) (owner: 10Jforrester)
[14:23:37] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:24:42] <wikibugs>	 (03CR) 10Ssingh: "NOOP on all DNS hosts: https://puppet-compiler.wmflabs.org/output/1052109/5917/" [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[14:26:20] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:27:51] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77723 and previous config saved to /var/cache/conftool/dbconfig/20250611-142750-fceratto.json
[14:28:09] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
[14:28:16] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77724 and previous config saved to /var/cache/conftool/dbconfig/20250611-142816-fceratto.json
[14:28:48] <wikibugs>	 (03PS1) 10Cory Massaro: wikifunctions: make the JSON good with commas. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155690
[14:28:53] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] wikifunctions: make the JSON good with commas. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155690 (owner: 10Cory Massaro)
[14:30:19] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove ncredir7001 from conftool [puppet] - 10https://gerrit.wikimedia.org/r/1155649 (https://phabricator.wikimedia.org/T394263) (owner: 10Muehlenhoff)
[14:30:28] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: make the JSON good with commas. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155690 (owner: 10Cory Massaro)
[14:31:45] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[14:31:55] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[14:34:37] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
[14:34:57] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
[14:36:33] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77726 and previous config saved to /var/cache/conftool/dbconfig/20250611-143633-fceratto.json
[14:37:05] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1155681 (https://phabricator.wikimedia.org/T395131) (owner: 10Vgutierrez)
[14:39:15] <logmsgbot>	 !log tappof@cumin1002 START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
[14:40:24] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
[14:40:37] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
[14:41:14] <wikibugs>	 (03CR) 10Scott French: "Thanks for the review!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1127188 (https://phabricator.wikimedia.org/T388260) (owner: 10Scott French)
[14:44:52] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
[14:45:18] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts relforge[1003-1004].eqiad.wmnet
[14:46:47] <logmsgbot>	 !log tappof@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
[14:51:40] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P77727 and previous config saved to /var/cache/conftool/dbconfig/20250611-145140-fceratto.json
[14:53:18] <wikibugs>	 (03PS1) 10Elukey: sre.hosts.provision: improve Supermicro's PXE configs and logs [cookbooks] - 10https://gerrit.wikimedia.org/r/1155697
[14:54:39] <wikibugs>	 (03PS1) 10Bking: cirrus-streaming-updater (staging): remove references to defunct host [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155698 (https://phabricator.wikimedia.org/T390565)
[14:54:46] <wikibugs>	 (03CR) 10Volans: [C:03+1] "LGTM, question inline for more future-proofing" [cookbooks] - 10https://gerrit.wikimedia.org/r/1155697 (owner: 10Elukey)
[14:55:36] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to deployment for tarrow - https://phabricator.wikimedia.org/T208491#10905094 (10Tarrow) 05Resolved→03Open I was just trying to deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1155643 and discovered that I seem to...
[14:56:11] <wikibugs>	 (03PS8) 10Gkyziridis: ores-extension: enable revertrisk filter for simplewiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668)
[14:56:11] <wikibugs>	 (03PS3) 10Gkyziridis: ores-extension: enable extension with revertrisk filter for the third batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155652 (https://phabricator.wikimedia.org/T395824)
[14:56:44] <wikibugs>	 (03CR) 10Ssingh: "https://puppet-compiler.wmflabs.org/output/1052109/5918/" [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[14:57:18] <logmsgbot>	 !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
[14:57:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr1-magru:xe-0/1/2 (DISABLED) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[14:57:54] <wikibugs>	 (03PS2) 10AOkoth: wmnet: switch active doc host [dns] - 10https://gerrit.wikimedia.org/r/1155306 (https://phabricator.wikimedia.org/T392130)
[14:57:57] <wikibugs>	 (03CR) 10Elukey: sre.hosts.provision: improve Supermicro's PXE configs and logs (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1155697 (owner: 10Elukey)
[14:58:27] <wikibugs>	 (03CR) 10Bking: [C:03+2] cirrus-streaming-updater (staging): remove references to defunct host [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155698 (https://phabricator.wikimedia.org/T390565) (owner: 10Bking)
[14:58:38] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: wikifunctions: Enable staging access to memcached [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155702 (https://phabricator.wikimedia.org/T391986)
[14:58:44] <wikibugs>	 (03CR) 10Bking: [C:03+2] "self-merging, as this only affects a staging environment." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155698 (https://phabricator.wikimedia.org/T390565) (owner: 10Bking)
[14:59:30] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Looking at the PCC diffs, +1 for the Monday deployment." [puppet] - 10https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: 10Ayounsi)
[14:59:54] <wikibugs>	 (03CR) 10Volans: [C:03+1] sre.hosts.provision: improve Supermicro's PXE configs and logs (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1155697 (owner: 10Elukey)
[15:00:12] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus-streaming-updater (staging): remove references to defunct host [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155698 (https://phabricator.wikimedia.org/T390565) (owner: 10Bking)
[15:00:18] <wikibugs>	 (03CR) 10Elukey: [C:03+2] sre.hosts.provision: improve Supermicro's PXE configs and logs [cookbooks] - 10https://gerrit.wikimedia.org/r/1155697 (owner: 10Elukey)
[15:00:53] <wikibugs>	 (03PS9) 10Gkyziridis: ores-extension: enable revertrisk filter for simplewiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668)
[15:00:53] <wikibugs>	 (03PS2) 10Gkyziridis: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823)
[15:01:53] <logmsgbot>	 !log bking@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:02:13] <logmsgbot>	 !log bking@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:02:25] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155703
[15:02:54] <wikibugs>	 06SRE, 06Data-Engineering, 10LDAP-Access-Requests: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10905110 (10elukey) 05Resolved→03Open
[15:03:01] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[15:03:08] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[15:03:18] <logmsgbot>	 !log brouberol@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[15:03:23] <wikibugs>	 (03Abandoned) 10Gkyziridis: ores-extension: enable revertrisk filter for simplewiki and trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1151693 (https://phabricator.wikimedia.org/T395668) (owner: 10Gkyziridis)
[15:03:26] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] wikifunctions: Enable staging access to memcached [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155702 (https://phabricator.wikimedia.org/T391986) (owner: 10Alexandros Kosiaris)
[15:04:38] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] hiera: Switch eqsin to unified cert issued by GTS [puppet] - 10https://gerrit.wikimedia.org/r/1155681 (https://phabricator.wikimedia.org/T395131) (owner: 10Vgutierrez)
[15:04:51] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] wikifunctions: Enable staging access to memcached [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155702 (https://phabricator.wikimedia.org/T391986) (owner: 10Alexandros Kosiaris)
[15:06:15] <logmsgbot>	 !log tappof@cumin1002 START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
[15:06:35] <wikibugs>	 (03Merged) 10jenkins-bot: wikifunctions: Enable staging access to memcached [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155702 (https://phabricator.wikimedia.org/T391986) (owner: 10Alexandros Kosiaris)
[15:06:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:06:45] <wikibugs>	 (03PS3) 10Gkyziridis: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823)
[15:06:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P77729 and previous config saved to /var/cache/conftool/dbconfig/20250611-150647-fceratto.json
[15:06:58] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
[15:08:26] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[15:08:44] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[15:09:04] <wikibugs>	 (03PS4) 10Gkyziridis: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823)
[15:09:12] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[15:09:13] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:09:39] <logmsgbot>	 !log apine@deploy1003 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:10:42] <sukhe>	 !log reprepro -C main include bullseye-wikimedia trafficserver_9.2.10-1wm2_amd64.changes: T390912
[15:10:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:45] <stashbot>	 T390912: Upgrade to ATS 9.2.10 - https://phabricator.wikimedia.org/T390912
[15:13:18] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for tarrow - https://phabricator.wikimedia.org/T208491#10905150 (Addshore) Apr 20, 2023 Removed  @Tarrow  (1620)  @thcipriani  (2321)  https://gerrit.wikimedia.org/r/admin/groups/3fdcf8fd0d569e90a3e9b39788a29f2c50d33be9,audit...
[15:13:29] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [reason: testing 9.2.10 upgrade]
[15:14:29] <sukhe>	 !log depool cp4037 to test ATS 9.2.10 upgrade: T390912
[15:14:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:45] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1012.eqiad.wmnet
[15:15:10] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.dns.netbox
[15:15:20] <logmsgbot>	 !log sukhe@cumin1002 START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.10 upgrade (T390912)
[15:15:25] <logmsgbot>	 !log tappof@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
[15:16:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:16:45] <wikibugs>	 (PS1) Clare Ming: xLab: Deploy v0.6.9 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1155708 (https://phabricator.wikimedia.org/T396457)
[15:16:53] <wikibugs>	 (PS1) Bking: relforge: remove decomm'd hosts [puppet] - https://gerrit.wikimedia.org/r/1155709 (https://phabricator.wikimedia.org/T390565)
[15:18:20] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:18:28] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:18:46] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1059 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:18:56] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:18:58] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:19:12] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:19:14] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1056 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:19:30] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1061 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:19:38] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1054 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:19:39] <wikibugs>	 (PS1) Cwhite: add ecs.version checking [software/ecs] - https://gerrit.wikimedia.org/r/1155710 (https://phabricator.wikimedia.org/T395819)
[15:19:43] <logmsgbot>	 !log sukhe@cumin1002 END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.10 upgrade (T390912)
[15:19:46] <stashbot>	 T390912: Upgrade to ATS 9.2.10 - https://phabricator.wikimedia.org/T390912
[15:19:46] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1059 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:19:56] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1057 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:20:14] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1056 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:20:15] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.dns.netbox
[15:20:21] <wikibugs>	 (CR) Bking: [C:+2] relforge: remove decomm'd hosts [puppet] - https://gerrit.wikimedia.org/r/1155709 (https://phabricator.wikimedia.org/T390565) (owner: Bking)
[15:20:30] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1061 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:20:34] <wikibugs>	 (CR) Cwhite: [C:+2] add ecs.version checking [software/ecs] - https://gerrit.wikimedia.org/r/1155710 (https://phabricator.wikimedia.org/T395819) (owner: Cwhite)
[15:20:38] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1054 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:20:53] <logmsgbot>	 bking@cumin2002 decommission (PID 3555323) is awaiting input
[15:20:56] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1057 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:20:58] <wikibugs>	 (CR) Santiago Faci: [C:+2] xLab: Deploy v0.6.9 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1155708 (https://phabricator.wikimedia.org/T396457) (owner: Clare Ming)
[15:20:58] <wikibugs>	 (Merged) jenkins-bot: add ecs.version checking [software/ecs] - https://gerrit.wikimedia.org/r/1155710 (https://phabricator.wikimedia.org/T395819) (owner: Cwhite)
[15:21:06] <wikibugs>	 (CR) Bking: [C:+2] "Self-merging, as these hosts are already decommed." [puppet] - https://gerrit.wikimedia.org/r/1155709 (https://phabricator.wikimedia.org/T390565) (owner: Bking)
[15:21:13] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for tarrow - https://phabricator.wikimedia.org/T208491#10905187 (thcipriani) Open→Resolved >>! In T208491#10905149, @Addshore wrote: > Apr 20, 2023 Removed  @Tarrow  (1620)  @thcipriani  (2321) >  > https://gerrit...
[15:21:17] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:21:50] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [reason: repooling after testing 9.2.10 upgrade: T390912]
[15:21:55] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77731 and previous config saved to /var/cache/conftool/dbconfig/20250611-152155-fceratto.json
[15:22:11] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for tarrow - https://phabricator.wikimedia.org/T208491#10905205 (Tarrow) Thanks for the super fast response!
[15:22:14] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
[15:22:19] <wikibugs>	 (CR) Vgutierrez: [C:+2] hiera: Switch eqsin to unified cert issued by GTS [puppet] - https://gerrit.wikimedia.org/r/1155681 (https://phabricator.wikimedia.org/T395131) (owner: Vgutierrez)
[15:22:20] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:22:20] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:22:21] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77732 and previous config saved to /var/cache/conftool/dbconfig/20250611-152220-fceratto.json
[15:22:33] <wikibugs>	 (Merged) jenkins-bot: xLab: Deploy v0.6.9 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1155708 (https://phabricator.wikimedia.org/T396457) (owner: Clare Ming)
[15:22:47] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:22:56] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:22:56] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:22:58] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:23:20] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1053 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:23:28] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1049 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:23:46] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1012.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
[15:23:47] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1049 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:23:52] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for tarrow - https://phabricator.wikimedia.org/T208491#10905211 (Addshore) {meme, src="seal-of-approval", above="Such speed", below="much fast"}
[15:23:56] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1050 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:24:06] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1012.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
[15:24:06] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:24:07] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dse-k8s-worker1012.eqiad.wmnet
[15:24:16] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: relforge[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
[15:24:20] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1050 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:24:22] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: relforge[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
[15:24:22] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:24:23] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts relforge[1003-1004].eqiad.wmnet
[15:24:56] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1048 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:25:12] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1048 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:25:15] <wikibugs>	 (PS1) Alexandros Kosiaris: wikifunctions: Fix the syntax of memcachdUri [deployment-charts] - https://gerrit.wikimedia.org/r/1155711 (https://phabricator.wikimedia.org/T391986)
[15:25:20] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1052 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:25:31] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:-1] "the list of models and threshold definitions for simplewiki and trwikfrom https://gerrit.wikimedia.org/r/q/Ifac4768d27eebab0cbd749ae8e5f06" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: Gkyziridis)
[15:25:56] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1052 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[15:26:18] <logmsgbot>	 !log cjming@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[15:26:50] <logmsgbot>	 !log cjming@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[15:27:58] <wikibugs>	 (CR) Cory Massaro: [C:+2] wikifunctions: Fix the syntax of memcachdUri [deployment-charts] - https://gerrit.wikimedia.org/r/1155711 (https://phabricator.wikimedia.org/T391986) (owner: Alexandros Kosiaris)
[15:28:01] <wikibugs>	 (CR) Jforrester: [C:+1] wikifunctions: Fix the syntax of memcachdUri [deployment-charts] - https://gerrit.wikimedia.org/r/1155711 (https://phabricator.wikimedia.org/T391986) (owner: Alexandros Kosiaris)
[15:28:03] <wikibugs>	 (CR) Clément Goubert: [C:+1] wikifunctions: Fix the syntax of memcachdUri [deployment-charts] - https://gerrit.wikimedia.org/r/1155711 (https://phabricator.wikimedia.org/T391986) (owner: Alexandros Kosiaris)
[15:28:25] <logmsgbot>	 bking@cumin2002 decommission (PID 3568090) is awaiting input
[15:29:01] <vgutierrez>	 !log use Google Trust Services (GTS) unified TLS certificate on eqsin - T395131
[15:29:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:04] <stashbot>	 T395131: Replace Digicert TLS certs with Google Trust Services ones - https://phabricator.wikimedia.org/T395131
[15:29:23] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77733 and previous config saved to /var/cache/conftool/dbconfig/20250611-152923-fceratto.json
[15:29:29] <wikibugs>	 (Merged) jenkins-bot: wikifunctions: Fix the syntax of memcachdUri [deployment-charts] - https://gerrit.wikimedia.org/r/1155711 (https://phabricator.wikimedia.org/T391986) (owner: Alexandros Kosiaris)
[15:30:08] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:32:04] <wikibugs>	 ops-eqiad, DC-Ops, decommission-hardware, Data-Platform-SRE (2025.05.24 - 2025.06.13): decommission relforge100[34] - https://phabricator.wikimedia.org/T390565#10905248 (bking) a:bking→None
[15:32:37] <wikibugs>	 ops-eqiad, DC-Ops, decommission-hardware, Data-Platform-SRE (2025.05.24 - 2025.06.13): decommission relforge100[34] - https://phabricator.wikimedia.org/T390565#10905254 (bking) Hello DC Ops,  I think these hosts are ready for y'all. If that's not the case, ping me here on in IRC (inflatador) and...
[15:34:06] <wikibugs>	 (PS2) Bking: search: Return traffic to all DCs [mediawiki-config] - https://gerrit.wikimedia.org/r/1154828 (https://phabricator.wikimedia.org/T388610) (owner: Ebernhardson)
[15:34:08] <wikibugs>	 (PS1) Cwhite: refactor ecs.version testing [software/ecs] - https://gerrit.wikimedia.org/r/1155712
[15:34:15] <wikibugs>	 (PS8) CDobbins: add rest of South America (except Falkland Islands) to geo-maps [dns] - https://gerrit.wikimedia.org/r/1153334
[15:34:19] <wikibugs>	 (PS3) Bking: search: Return traffic to all DCs [mediawiki-config] - https://gerrit.wikimedia.org/r/1154828 (https://phabricator.wikimedia.org/T388610) (owner: Ebernhardson)
[15:34:47] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:35:00] <wikibugs>	 (CR) Cwhite: [C:+2] refactor ecs.version testing [software/ecs] - https://gerrit.wikimedia.org/r/1155712 (owner: Cwhite)
[15:35:28] <wikibugs>	 (Merged) jenkins-bot: refactor ecs.version testing [software/ecs] - https://gerrit.wikimedia.org/r/1155712 (owner: Cwhite)
[15:36:20] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:36:57] <wikibugs>	 (CR) Bking: [C:+2] search: Return traffic to all DCs [mediawiki-config] - https://gerrit.wikimedia.org/r/1154828 (https://phabricator.wikimedia.org/T388610) (owner: Ebernhardson)
[15:37:31] <wikibugs>	 (PS1) Bking: Revert "search: Return traffic to all DCs" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155713
[15:37:39] <wikibugs>	 (CR) Bking: [V:+2 C:+2] Revert "search: Return traffic to all DCs" [mediawiki-config] - https://gerrit.wikimedia.org/r/1155713 (owner: Bking)
[15:38:17] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.network.peering with action 'configure' for AS: 264525
[15:38:42] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264525
[15:38:53] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:39:28] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.dns.netbox
[15:40:08] <wikibugs>	 (CR) Gkyziridis: ores-extension: enable oresUI for the second batch of wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: Gkyziridis)
[15:40:39] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to deployment for cmelo - https://phabricator.wikimedia.org/T395966#10905329 (elukey) >>! In T395966#10904763, @cmelo wrote: > Hi @elukey, thanks, and yes I would like some help with the process, I also would like to request: **analytics-privatedata-users** acc...
[15:42:50] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1012 vlan - btullis@cumin1002"
[15:42:56] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1012 vlan - btullis@cumin1002"
[15:42:56] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:43:33] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:44:30] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P77734 and previous config saved to /var/cache/conftool/dbconfig/20250611-154430-fceratto.json
[15:44:51] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
[15:46:14] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
[15:46:38] <wikibugs>	 (PS1) Elukey: admin: move cmelo to ssh user [puppet] - https://gerrit.wikimedia.org/r/1155717 (https://phabricator.wikimedia.org/T395966)
[15:47:58] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[15:50:15] <wikibugs>	 (CR) Scott French: [C:+2] shellbox: align image version to 2025-06-05-215815 [deployment-charts] - https://gerrit.wikimedia.org/r/1127188 (https://phabricator.wikimedia.org/T388260) (owner: Scott French)
[15:50:33] <wikibugs>	 (CR) Cathal Mooney: [C:+1] Bird: use the "interface" config option for v6 peers [puppet] - https://gerrit.wikimedia.org/r/1052109 (https://phabricator.wikimedia.org/T362392) (owner: Ayounsi)
[15:52:00] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:52:01] <wikibugs>	 (Merged) jenkins-bot: shellbox: align image version to 2025-06-05-215815 [deployment-charts] - https://gerrit.wikimedia.org/r/1127188 (https://phabricator.wikimedia.org/T388260) (owner: Scott French)
[15:53:30] <jinxer-wm>	 FIRING: ProbeDown: Service text-https:443 has failed probes (http_text-https_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:54:31] <wikibugs>	 (PS1) Hnowlan: services_proxy: change mobileapps port [puppet] - https://gerrit.wikimedia.org/r/1155719 (https://phabricator.wikimedia.org/T367418)
[15:54:31] <sukhe>	 huh
[15:54:52] <wikibugs>	 (PS2) Hnowlan: services_proxy: change mobileapps port [puppet] - https://gerrit.wikimedia.org/r/1155719 (https://phabricator.wikimedia.org/T367418)
[15:54:57] <jinxer-wm>	 FIRING: ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:55:01] <sukhe>	 !incidents
[15:55:01] <sirenbot>	 6341 (UNACKED)  ProbeDown sre (103.102.166.224 ip4 text-https:443 probes/service http_text-https_ip4 eqsin)
[15:55:05] <hnowlan>	 uh oh
[15:55:07] <sukhe>	 !ack 6341
[15:55:08] <sirenbot>	 6341 (ACKED)  ProbeDown sre (103.102.166.224 ip4 text-https:443 probes/service http_text-https_ip4 eqsin)
[15:55:09] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox: apply
[15:55:30] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox: apply
[15:55:36] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1155717 (https://phabricator.wikimedia.org/T395966) (owner: Elukey)
[15:55:37] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
[15:55:52] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
[15:55:59] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-media: apply
[15:56:10] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
[15:56:17] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
[15:56:20] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.2 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:56:26] <hnowlan>	 ttfb is way up in the last 25 minutes or so 
[15:56:30] <sukhe>	 yep
[15:56:31] <_joe_>	 sukhe: fwiw things are ok here
[15:56:34] <sukhe>	 moving to -security
[15:56:35] <logmsgbot>	 !log swfrench@deploy1003 helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
[15:56:39] <_joe_>	 like the site is blazing fast
[15:56:40] <sukhe>	 _joe_: only eqsin is impacted
[15:56:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:56:42] <vgutierrez>	 eqsin feels slow
[15:56:44] <vgutierrez>	 even on ssh
[15:57:09] <sukhe>	 that also explains the purged alerts
[15:57:22] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.173.0" for 2 host(s)
[15:57:41] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:57:48] <icinga-wm>	 PROBLEM - Recursive DNS on 103.102.166.8 is CRITICAL: DNS_QUERY CRITICAL - query timed out https://wikitech.wikimedia.org/wiki/DNS
[15:57:51] <wikibugs>	 (PS1) Btullis: Remove obsolete analytics_cluster::postgresql role and profile [puppet] - https://gerrit.wikimedia.org/r/1155720 (https://phabricator.wikimedia.org/T395557)
[15:58:44] <jinxer-wm>	 FIRING: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[15:58:56] <herron>	 !incidents
[15:58:57] <sirenbot>	 6341 (ACKED)  ProbeDown sre (103.102.166.224 ip4 text-https:443 probes/service http_text-https_ip4 eqsin)
[15:58:57] <sirenbot>	 6342 (UNACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[15:59:02] <herron>	 !ack 6342
[15:59:03] <sirenbot>	 6342 (ACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[15:59:03] <sukhe>	 !ack 6342
[15:59:04] <sirenbot>	 6342 (ACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[15:59:12] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.173.0" completed for 2 hosts
[15:59:19] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:59:38] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P77735 and previous config saved to /var/cache/conftool/dbconfig/20250611-155937-fceratto.json
[15:59:47] <icinga-wm>	 RECOVERY - Recursive DNS on 103.102.166.8 is OK: DNS_QUERY OK - Success https://wikitech.wikimedia.org/wiki/DNS
[15:59:57] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:00:13] <icinga-wm>	 RECOVERY - Router interfaces on mr1-eqsin is OK: OK: host 103.102.166.128, interfaces up: 37, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[16:00:22] <logmsgbot>	 !log btullis@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[16:00:23] <wikibugs>	 (PS5) Gkyziridis: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823)
[16:01:27] <logmsgbot>	 !log fceratto@deploy1003 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[16:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:01:50] <wikibugs>	 (CR) Gkyziridis: ores-extension: enable oresUI for the second batch of wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: Gkyziridis)
[16:02:22] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[16:02:41] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:03:44] <jinxer-wm>	 RESOLVED: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[16:06:41] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:06:41] <wikibugs>	 (03CR) 10Hnowlan: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1155719 (https://phabricator.wikimedia.org/T367418) (owner: 10Hnowlan)
[16:09:53] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1013.eqiad.wmnet
[16:10:11] <logmsgbot>	 !log xcollazo@deploy1003 Started deploy [airflow-dags/analytics@b0517a4]: Deploy to pickup T385112#10905490.
[16:10:15] <stashbot>	 T385112: Investigate reasons for remaining inconsistencies - https://phabricator.wikimedia.org/T385112
[16:10:49] <logmsgbot>	 !log xcollazo@deploy1003 Finished deploy [airflow-dags/analytics@b0517a4]: Deploy to pickup T385112#10905490. (duration: 02m 14s)
[16:11:50] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: "Just a small nit regarding sorting the entries in the array, other than that LGTM!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: 10Gkyziridis)
[16:14:05] <wikibugs>	 (03PS1) 10Tchanders: WIP Configure event stream for IP auto-reveal instrument [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155725 (https://phabricator.wikimedia.org/T387600)
[16:14:45] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77736 and previous config saved to /var/cache/conftool/dbconfig/20250611-161444-fceratto.json
[16:15:03] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
[16:15:10] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77737 and previous config saved to /var/cache/conftool/dbconfig/20250611-161509-fceratto.json
[16:16:41] <icinga-wm>	 RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:18:16] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.dns.netbox
[16:18:35] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: "Just a comment regarding sorting alphabetically the wikis in the arrays. Other than that it looks good!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155652 (https://phabricator.wikimedia.org/T395824) (owner: 10Gkyziridis)
[16:18:43] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] "This looks reasonable to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1136044 (https://phabricator.wikimedia.org/T364694) (owner: 10Aklapper)
[16:20:31] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:22:17] <logmsgbot>	 btullis@cumin1003 reimage (PID 1157910) is awaiting input
[16:22:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:23:36] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77738 and previous config saved to /var/cache/conftool/dbconfig/20250611-162335-fceratto.json
[16:23:57] <logmsgbot>	 btullis@cumin1002 decommission (PID 1236153) is awaiting input
[16:24:42] <wikibugs>	 (03CR) 10Dzahn: "I have a side question. Should I not expect to see newer versions on https://docker-registry.wikimedia.org/buildkitd/tags/ ?" [puppet] - 10https://gerrit.wikimedia.org/r/1155324 (https://phabricator.wikimedia.org/T394931) (owner: 10Brennen Bearnes)
[16:24:53] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] gitlab runners: update buildkitd to v0.22.0 [puppet] - 10https://gerrit.wikimedia.org/r/1155324 (https://phabricator.wikimedia.org/T394931) (owner: 10Brennen Bearnes)
[16:25:09] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
[16:25:30] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
[16:25:30] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:25:30] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dse-k8s-worker1013.eqiad.wmnet
[16:25:58] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] "https://docker-registry.wikimedia.org/repos/releng/buildkit/tags/ is where you want to look" [puppet] - 10https://gerrit.wikimedia.org/r/1155324 (https://phabricator.wikimedia.org/T394931) (owner: 10Brennen Bearnes)
[16:27:05] <wikibugs>	 (03PS1) 10Cwhite: Change message when no errors found [software/ecs] - 10https://gerrit.wikimedia.org/r/1155726
[16:27:12] <wikibugs>	 (03CR) 10Ahmon Dancy: [C:03+1] "In fact, if you know of a way to delete  https://docker-registry.wikimedia.org/buildkitd, that would be nice." [puppet] - 10https://gerrit.wikimedia.org/r/1155324 (https://phabricator.wikimedia.org/T394931) (owner: 10Brennen Bearnes)
[16:27:55] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.dns.netbox
[16:28:30] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:28:53] <wikibugs>	 (03PS1) 10Aklapper: Penalize on linking a Pholio Mock [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155727 (https://phabricator.wikimedia.org/T396609)
[16:29:25] <wikibugs>	 (03CR) 10Brennen Bearnes: "Last I knew, deleting images from the registry was considered a no-go, but maybe (hopefully) that has changed..." [puppet] - 10https://gerrit.wikimedia.org/r/1155324 (https://phabricator.wikimedia.org/T394931) (owner: 10Brennen Bearnes)
[16:29:56] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "ah, thanks, gotcha!" [puppet] - 10https://gerrit.wikimedia.org/r/1155324 (https://phabricator.wikimedia.org/T394931) (owner: 10Brennen Bearnes)
[16:30:06] <wikibugs>	 (03CR) 10Aklapper: [V:03+2 C:03+2] Penalize on linking a Pholio Mock [phabricator/antivandalism] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1155727 (https://phabricator.wikimedia.org/T396609) (owner: 10Aklapper)
[16:30:15] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "What is the status of doc2003 regarding backups now? Last comment I saw was about adding it to the ignore list. But if that is production " [dns] - 10https://gerrit.wikimedia.org/r/1155306 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[16:30:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:31:50] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1013 vlan - btullis@cumin1002"
[16:31:56] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1013 vlan - btullis@cumin1002"
[16:31:56] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:32:05] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
[16:33:14] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
[16:34:24] <wikibugs>	 10SRE-SLO, 10Abstract Wikipedia team (25Q4 (Apr–Jun)), 07OKR-Work, 07Workstreams: Establish an SLO for the Wikifunctions integration into Wikimedia projects' wikitext pages, to assure reader experience quality is maintained during roll-out - https://phabricator.wikimedia.org/T390548#10905611 (10DSantamaria)
[16:34:28] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.provision for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:34:42] <wikibugs>	 (03CR) 10Dzahn: [V:03+1 C:03+1] ";; ANSWER SECTION:" [puppet] - 10https://gerrit.wikimedia.org/r/1154855 (owner: 10BCornwall)
[16:36:44] <jinxer-wm>	 FIRING: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[16:36:58] <wikibugs>	 10SRE-SLO, 10Abstract Wikipedia team (25Q4 (Apr–Jun)), 07OKR-Work, 07Workstreams: Establish an SLO for the Wikifunctions integration into Wikimedia projects' wikitext pages, to assure reader experience quality is maintained during roll-out - https://phabricator.wikimedia.org/T390548#10905621 (10DSantamaria)
[16:37:03] <herron>	 !incidents
[16:37:04] <sirenbot>	 6343 (UNACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[16:37:04] <sirenbot>	 6342 (RESOLVED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[16:37:04] <sirenbot>	 6341 (RESOLVED)  ProbeDown sre (103.102.166.224 ip4 text-https:443 probes/service http_text-https_ip4 eqsin)
[16:37:13] <sukhe>	 !ack 6343
[16:37:13] <sirenbot>	 6343 (ACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[16:38:40] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-b3-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396635 (10phaultfinder) 03NEW
[16:38:43] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P77739 and previous config saved to /var/cache/conftool/dbconfig/20250611-163842-fceratto.json
[16:40:22] <logmsgbot>	 btullis@cumin1002 provision (PID 1261705) is awaiting input
[16:41:13] <wikibugs>	 (03CR) 10Aleksandar Mastilovic: "What does this "experimental check failed" mean? Is there a change required to the source code?" [puppet] - 10https://gerrit.wikimedia.org/r/1142712 (https://phabricator.wikimedia.org/T390556) (owner: 10Aleksandar Mastilovic)
[16:41:44] <jinxer-wm>	 RESOLVED: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[16:42:41] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:43:30] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:45:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:47:41] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service text-https:443 has failed probes (http_text-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:48:33] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-b5-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396638 (10phaultfinder) 03NEW
[16:49:40] <wikibugs>	 (03PS1) 10AOkoth: doc: make doc2003 the active host [puppet] - 10https://gerrit.wikimedia.org/r/1155733 (https://phabricator.wikimedia.org/T392130)
[16:53:35] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-c1-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396639 (10phaultfinder) 03NEW
[16:53:51] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P77740 and previous config saved to /var/cache/conftool/dbconfig/20250611-165350-fceratto.json
[16:54:28] <wikibugs>	 (03PS6) 10Gkyziridis: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823)
[16:56:41] <wikibugs>	 (03PS1) 10PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155735
[16:58:19] <wikibugs>	 10SRE-SLO, 06Abstract Wikipedia team, 06SRE Observability, 07Essential-Work: create new SLO dashboard via Pyrra - https://phabricator.wikimedia.org/T394057#10905756 (10Jdforrester-WMF)
[16:58:21] <wikibugs>	 10SRE-SLO, 10Abstract Wikipedia team (25Q4 (Apr–Jun)), 07OKR-Work, 07Workstreams: Establish an SLO for the Wikifunctions integration into Wikimedia projects' wikitext pages, to assure reader experience quality is maintained during roll-out - https://phabricator.wikimedia.org/T390548#10905757 (10Jdforrester-...
[16:58:23] <wikibugs>	 10SRE-SLO, 06Abstract Wikipedia team, 06SRE Observability, 07Essential-Work: create new SLO dashboard via Pyrra - https://phabricator.wikimedia.org/T394057#10905759 (10Jdforrester-WMF)
[16:58:32] <wikibugs>	 (03CR) 10AOkoth: "Ack. I've made `doc2003` the active_host on Puppet. That will automatically disable backups on it. I should probably merge that before thi" [dns] - 10https://gerrit.wikimedia.org/r/1155306 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[16:58:34] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-d1-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396641 (10phaultfinder) 03NEW
[17:00:05] <jouncebot>	 swfrench-wmf and jasmine_: How many deployers does it take to do MediaWiki infrastructure (UTC late) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1700).
[17:00:52] <logmsgbot>	 !log btullis@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[17:01:26] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[17:06:31] <wikibugs>	 (03PS9) 10CDobbins: add rest of South America (except Falkland Islands) to geo-maps [dns] - 10https://gerrit.wikimedia.org/r/1153334
[17:08:38] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-d5-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396642 (10phaultfinder) 03NEW
[17:08:58] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77741 and previous config saved to /var/cache/conftool/dbconfig/20250611-170857-fceratto.json
[17:09:16] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
[17:09:23] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77742 and previous config saved to /var/cache/conftool/dbconfig/20250611-170922-fceratto.json
[17:09:41] <wikibugs>	 (03PS1) 10Bking: cirrussearch: return traffic to all DCs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155738 (https://phabricator.wikimedia.org/T388610)
[17:11:36] <logmsgbot>	 btullis@cumin1002 provision (PID 1261705) is awaiting input
[17:12:04] <wikibugs>	 (03CR) 10Majavah: [C:03+1] P:idm Enable API [puppet] - 10https://gerrit.wikimedia.org/r/1154262 (https://phabricator.wikimedia.org/T364605) (owner: 10Slyngshede)
[17:12:10] <wikibugs>	 (03PS7) 10Ilias Sarantopoulos: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: 10Gkyziridis)
[17:12:14] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:12:40] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: [C:03+1] ores-extension: enable oresUI for the second batch of wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: 10Gkyziridis)
[17:13:40] <logmsgbot>	 !log btullis@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[17:16:32] <wikibugs>	 (03PS8) 10Ilias Sarantopoulos: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: 10Gkyziridis)
[17:16:34] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[17:17:33] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77743 and previous config saved to /var/cache/conftool/dbconfig/20250611-171733-fceratto.json
[17:18:30] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
[17:19:29] <logmsgbot>	 !log btullis@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[17:19:45] <wikibugs>	 (03PS9) 10Ilias Sarantopoulos: ores-extension: enable oresUI for the second batch of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155604 (https://phabricator.wikimedia.org/T395823) (owner: 10Gkyziridis)
[17:20:31] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-web-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-web-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:22:24] <wikibugs>	 (03CR) 10Btullis: [C:03+1] elasticsearch: filter LVS config based on cluster membership [puppet] - 10https://gerrit.wikimedia.org/r/1138400 (https://phabricator.wikimedia.org/T387569) (owner: 10Bking)
[17:22:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-web-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:22:35] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-d6-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396643 (10phaultfinder) 03NEW
[17:29:17] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[17:32:00] <wikibugs>	 (03CR) 10Herron: [C:03+2] pyrra: update o11y slos to 4w window [puppet] - 10https://gerrit.wikimedia.org/r/1155246 (https://phabricator.wikimedia.org/T395916) (owner: 10Herron)
[17:32:13] <wikibugs>	 (03CR) 10Mforns: "One post-merge comment:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1155123 (https://phabricator.wikimedia.org/T394297) (owner: 10Brouberol)
[17:32:41] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P77744 and previous config saved to /var/cache/conftool/dbconfig/20250611-173240-fceratto.json
[17:35:26] <logmsgbot>	 !log btullis@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
[17:35:50] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
[17:37:04] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] etcd data for search-{psi,omega} dns discovery [puppet] - 10https://gerrit.wikimedia.org/r/1151308 (https://phabricator.wikimedia.org/T143553) (owner: 10Bking)
[17:37:56] <ryankemper>	 !log T143553 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151308 (first patch in plan https://phabricator.wikimedia.org/T143553#10861215)
[17:37:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:59] <stashbot>	 T143553: Switching search traffic between datacenters should be faster - https://phabricator.wikimedia.org/T143553
[17:45:13] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
[17:46:00] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] search: Add dnsdisc entries for omega and psi clusters [puppet] - 10https://gerrit.wikimedia.org/r/1151300 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[17:47:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P77745 and previous config saved to /var/cache/conftool/dbconfig/20250611-174747-fceratto.json
[17:48:35] <ryankemper>	 !log T143553 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151300 to add dnsdisc entries for omega/psi clusters (second patch in plan https://phabricator.wikimedia.org/T143553#10861215)
[17:48:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:48:39] <stashbot>	 T143553: Switching search traffic between datacenters should be faster - https://phabricator.wikimedia.org/T143553
[17:48:52] <wikibugs>	 10ops-codfw, 10ops-eqiad, 06SRE, 06DC-Ops: Dell SSD Critical Firmware Update - https://phabricator.wikimedia.org/T394348#10906008 (10RobH)
[17:48:57] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
[17:50:20] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648 (10RobH) 03NEW
[17:50:52] <sukhe>	 !log running agent on A:dnsbox T143553
[17:50:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:39] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
[17:52:55] <wikibugs>	 (03PS2) 10Ryan Kemper: search: Update dnsdisc envoy upstreams [puppet] - 10https://gerrit.wikimedia.org/r/1151316 (https://phabricator.wikimedia.org/T143553) (owner: 10Bking)
[17:53:35] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906027 (10RobH) a:05RobH→03None @jcrespo or @Marostegui: Would one of you be the best person to handle this or should I task it over to Kwaku for assignment?  Basi...
[17:53:38] <wikibugs>	 (03CR) 10AOkoth: "https://puppet-compiler.wmflabs.org/output/1155733/5919/" [puppet] - 10https://gerrit.wikimedia.org/r/1155733 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[17:53:40] <wikibugs>	 (03PS4) 10Ebernhardson: Add search-{psi,omega}.svc.$dc.wmnet cnames [dns] - 10https://gerrit.wikimedia.org/r/1151303 (https://phabricator.wikimedia.org/T143553)
[17:55:05] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
[17:56:08] <logmsgbot>	 !log ryankemper@cumin2002 conftool action : set/pooled=true; selector: dnsdisc=search-psi
[17:56:10] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: SSD firmware update for frbackup2002 - https://phabricator.wikimedia.org/T396649 (10RobH) 03NEW
[17:56:25] <logmsgbot>	 !log ryankemper@cumin2002 conftool action : set/pooled=true; selector: dnsdisc=search-omega
[17:57:02] <ryankemper>	 !log T143553 Pooled `dns-disc=search-(omega|psi)` per plan in https://phabricator.wikimedia.org/T143553#10861215
[17:57:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:05] <stashbot>	 T143553: Switching search traffic between datacenters should be faster - https://phabricator.wikimedia.org/T143553
[17:57:21] <wikibugs>	 (03PS6) 10Ryan Kemper: search: Add search-{psi,omega} geoip discovery entries [dns] - 10https://gerrit.wikimedia.org/r/1151304 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[17:58:01] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] search: Add search-{psi,omega} geoip discovery entries [dns] - 10https://gerrit.wikimedia.org/r/1151304 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[17:58:17] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: SSD firmware update for frbackup2002 - https://phabricator.wikimedia.org/T396649#10906050 (10RobH) @Jgreen: Should this turf to you or should I assign it over to Greg for allocation?   Basically we need to update the firmware on the affected db hosts db...
[17:58:55] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: SSD firmware update for frbackup2002 - https://phabricator.wikimedia.org/T396649#10906051 (10RobH)
[17:59:15] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "yea, that sounds right (about the order of things)" [dns] - 10https://gerrit.wikimedia.org/r/1155306 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[17:59:33] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906058 (10Ladsgroup) I'm on phone give me a second to tell you how each one is and how we can proceed. Some might be simpler than others
[18:00:05] <jouncebot>	 brennen and dduvall: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T1800).
[18:00:28] <wikibugs>	 10ops-codfw, 10ops-eqiad, 06SRE, 06DC-Ops: Dell SSD Critical Firmware Update - https://phabricator.wikimedia.org/T394348#10906061 (10RobH)
[18:02:17] <wikibugs>	 (03PS7) 10Ryan Kemper: search: Add search-{psi,omega} geoip discovery entries [dns] - 10https://gerrit.wikimedia.org/r/1151304 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[18:02:17] <wikibugs>	 (03PS5) 10Ryan Kemper: Add search-{psi,omega}.svc.$dc.wmnet cnames [dns] - 10https://gerrit.wikimedia.org/r/1151303 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[18:02:35] <wikibugs>	 (03CR) 10Ryan Kemper: [V:03+2 C:03+2] search: Add search-{psi,omega} geoip discovery entries [dns] - 10https://gerrit.wikimedia.org/r/1151304 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[18:02:54] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77746 and previous config saved to /var/cache/conftool/dbconfig/20250611-180254-fceratto.json
[18:03:03] <logmsgbot>	 !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1251.eqiad.wmnet with reason: Maintenance
[18:03:10] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Depooling db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77747 and previous config saved to /var/cache/conftool/dbconfig/20250611-180309-fceratto.json
[18:03:12] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[18:03:44] <brennen>	 o/
[18:03:49] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906076 (10Ladsgroup) Db1250 is master of m1, needs a switchover. Db1251 is a normal s1 replica. We can depool it at any moment Db1252 is also a normal replica of w4. C...
[18:04:03] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[18:05:26] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] Add search-{psi,omega}.svc.$dc.wmnet cnames [dns] - 10https://gerrit.wikimedia.org/r/1151303 (https://phabricator.wikimedia.org/T143553) (owner: 10Ebernhardson)
[18:05:47] <logmsgbot>	 !log sukhe@dns1004 START - running authdns-update
[18:05:51] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651 (10RobH) 03NEW
[18:06:20] <logmsgbot>	 !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
[18:06:39] <logmsgbot>	 !log sukhe@dns1004 END - running authdns-update
[18:07:19] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906115 (10Ladsgroup) I can take care of all of them later today except db1250. For that it has to wait a bit
[18:07:35] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10906116 (10RobH) a:03Andrew @andrew, Would you be the best person to handle this or should I task it over to Joanna for assignment? Basically we need...
[18:07:56] <wikibugs>	 10ops-codfw, 10ops-eqiad, 06SRE, 06DC-Ops: Dell SSD Critical Firmware Update - https://phabricator.wikimedia.org/T394348#10906119 (10RobH)
[18:08:25] <wikibugs>	 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations: SSD firmware update not working in firmware cookbook - https://phabricator.wikimedia.org/T394543#10906122 (10RobH) 05Open→03Resolved Thanks!
[18:08:45] <sukhe>	 !log sudo cumin 'A:lvs-secondary-eqiad or A:lvs-secondary-codfw' 'run-puppet-agent': T143553
[18:08:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:49] <stashbot>	 T143553: Switching search traffic between datacenters should be faster - https://phabricator.wikimedia.org/T143553
[18:09:11] <wikibugs>	 10ops-codfw, 10ops-eqiad, 06SRE, 06DC-Ops: Dell SSD Critical Firmware Update - https://phabricator.wikimedia.org/T394348#10906126 (10RobH) The cookbook has been repaired via T394543 (Thank you Riccardo!) and now sub-tasks have been filed and linked from this parent task for all service groups/sre sub-teams...
[18:09:25] <logmsgbot>	 btullis@cumin1003 reimage (PID 1166575) is awaiting input
[18:09:41] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:09:57] <brennen>	 !log 1.45.0-wmf.5 train status (392175): no current blockers, logs reasonably clean, rolling to group1
[18:09:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:40] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906129 (10RobH) a:03Ladsgroup >>! In T396648#10906115, @Ladsgroup wrote: > I can take care of all of them later today except db1250. For that it has to wait a bit  T...
[18:10:57] <sukhe>	 !log sudo cumin 'A:lvs-low-traffic-eqiad or A:lvs-low-traffic-codfw' 'run-puppet-agent': T143553
[18:11:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:01] <wikibugs>	 (03PS1) 10TrainBranchBot: group1 to 1.45.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155747 (https://phabricator.wikimedia.org/T392175)
[18:11:02] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] group1 to 1.45.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155747 (https://phabricator.wikimedia.org/T392175) (owner: 10TrainBranchBot)
[18:11:51] <wikibugs>	 (03Merged) 10jenkins-bot: group1 to 1.45.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155747 (https://phabricator.wikimedia.org/T392175) (owner: 10TrainBranchBot)
[18:12:05] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
[18:12:05] <logmsgbot>	 !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
[18:12:29] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77748 and previous config saved to /var/cache/conftool/dbconfig/20250611-181228-fceratto.json
[18:13:20] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
[18:16:04] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
[18:16:05] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
[18:16:57] <wikibugs>	 (03CR) 10Dzahn: [V:03+1] "the httpbb tests pass, so V+1. I just don't have the background if this was confirmed with releng and CI is configured to upload to the ne" [puppet] - 10https://gerrit.wikimedia.org/r/1155733 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[18:19:27] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+2] search: Update dnsdisc envoy upstreams [puppet] - 10https://gerrit.wikimedia.org/r/1151316 (https://phabricator.wikimedia.org/T143553) (owner: 10Bking)
[18:19:28] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: RMA Damaged Pdu E14 - https://phabricator.wikimedia.org/T395971#10906166 (10VRiley-WMF) I currently have a case pending with servertech.com the ticket is 00503345
[18:21:19] <logmsgbot>	 !log brennen@deploy1003 rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.5  refs T392175
[18:21:23] <stashbot>	 T392175: 1.45.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T392175
[18:21:27] <swfrench-wmf>	 brennen: once the dust settles for group 1 and logs look clean, would it be alright if I use the tail of your window to wrap up some shellbox deployments that got preempted earlier?
[18:22:27] <brennen>	 swfrench-wmf: yeah, give me a few minutes to triage logs and i'll give you a ping?
[18:22:46] <swfrench-wmf>	 brennen: that sounds great - take your time! :)
[18:23:15] <brennen>	 cool.  one or two things here i want to take a closer look at.
[18:23:18] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[18:26:46] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  an-worker1186 - vriley@cumin1002"
[18:26:53] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  an-worker1186 - vriley@cumin1002"
[18:26:53] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:27:36] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P77749 and previous config saved to /var/cache/conftool/dbconfig/20250611-182735-fceratto.json
[18:28:37] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-e1-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396657 (10phaultfinder) 03NEW
[18:29:34] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-e3-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396658 (10phaultfinder) 03NEW
[18:30:42] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[18:31:29] <urandom>	 !log truncating restbase mobile-sections table — T395845
[18:31:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:31:32] <stashbot>	 T395845: restbase2030 (and others) running low on disk space - https://phabricator.wikimedia.org/T395845
[18:31:51] <brennen>	 swfrench-wmf: go ahead i'd say.
[18:32:35] <swfrench-wmf>	 brennen: great, thank you very much
[18:33:35] <wikibugs>	 10ops-codfw, 06DC-Ops: Alert for device lsw1-f1-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T396659 (10phaultfinder) 03NEW
[18:34:58] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox: apply
[18:35:32] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[18:35:43] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[18:36:01] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[18:36:12] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[18:36:26] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[18:36:26] <logmsgbot>	 vriley@cumin1002 netbox (PID 1306732) is awaiting input
[18:36:38] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[18:37:02] <logmsgbot>	 !log swfrench@deploy1003 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[18:37:09] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  an-worker1186 - vriley@cumin1002"
[18:37:15] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  an-worker1186 - vriley@cumin1002"
[18:37:15] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:38:39] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Alert for device lsw1-f3-codfw.mgmt.codfw.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T393785#10906271 (10phaultfinder)
[18:42:12] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[18:42:42] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P77750 and previous config saved to /var/cache/conftool/dbconfig/20250611-184242-fceratto.json
[18:43:31] <logmsgbot>	 !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
[18:44:49] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:45:16] <logmsgbot>	 vriley@cumin1002 provision (PID 474103) is awaiting input
[18:49:11] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[18:49:19] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[18:49:35] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox: apply
[18:49:41] <icinga-wm>	 RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:50:16] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
[18:50:27] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
[18:50:40] <moritzm>	 !log remove ganeti1047 from Ganeti cluster in eqiad for hardware diagnosis
[18:50:42] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
[18:50:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:53] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[18:51:06] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[18:51:18] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[18:51:41] <logmsgbot>	 !log swfrench@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[18:52:11] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[18:53:30] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti1047:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:53:57] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1047 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[18:53:57] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1047 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[18:54:49] <wikibugs>	 10ops-eqiad, 06DC-Ops: Upgrade firmware (NIC and system) on ganeti1047 - https://phabricator.wikimedia.org/T396660 (10MoritzMuehlenhoff) 03NEW
[18:54:51] <moritzm>	 ^ expected
[18:56:20] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
[18:56:26] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906317 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS b...
[18:57:48] <logmsgbot>	 !log fceratto@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77751 and previous config saved to /var/cache/conftool/dbconfig/20250611-185748-fceratto.json
[18:57:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr1-magru:xe-0/1/2 (DISABLED) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[18:58:40] <wikibugs>	 06SRE: restbase2030 (and others) running low on disk space - https://phabricator.wikimedia.org/T395845#10906345 (10Eevans) That didn't move the needle as much as I'd hoped.  Most of the storage volumes are now at about ~75% (using a combination of `nodetool cleanup` and the truncation of mobile-sections).  restba...
[18:59:59] <wikibugs>	 06SRE: restbase2030 (and others) running low on disk space - https://phabricator.wikimedia.org/T395845#10906346 (10Eevans) 05Open→03Resolved
[19:00:36] <swfrench-wmf>	 FYI, I am done with the previously mentioned shellbox updates
[19:02:06] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
[19:02:14] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906351 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1185.eqiad.wmnet with OS b...
[19:10:15] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
[19:10:20] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906360 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS bulls...
[19:12:19] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation [core] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155749 (https://phabricator.wikimedia.org/T396618)
[19:12:31] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, June 11 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [core] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155749 (https://phabricator.wikimedia.org/T396618) (owner: 10Bartosz Dziewoński)
[19:13:59] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
[19:14:08] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906369 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS b...
[19:16:33] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
[19:16:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906370 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1185.eqiad.wmnet with OS bulls...
[19:17:12] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:27:24] <wikibugs>	 (03PS2) 10Cwhite: Perform dot expansion per dot_expander.rb [software/ecs] - 10https://gerrit.wikimedia.org/r/1155726
[19:27:28] <wikibugs>	 10ops-eqiad, 06SRE, 06SRE-OnFire, 10Cassandra, and 4 others: additional sessionstore expansion — eqiad - https://phabricator.wikimedia.org/T395955#10906386 (10Eevans) @VRiley-WMF any eta on this?  I don't need to do any actual drive swapping right now, but knowing the number and disposition of drives avail...
[19:28:26] <wikibugs>	 (03CR) 10Eevans: [C:03+2] cassandra: reuse preseed for JBOD configuration [puppet] - 10https://gerrit.wikimedia.org/r/1152337 (https://phabricator.wikimedia.org/T391544) (owner: 10Eevans)
[19:28:48] <wikibugs>	 10ops-eqiad, 06SRE, 06SRE-OnFire, 10Cassandra, and 4 others: additional sessionstore expansion — eqiad - https://phabricator.wikimedia.org/T395955#10906391 (10VRiley-WMF) Thanks for checking in. I will get you a number soon
[19:29:10] <wikibugs>	 (03PS3) 10Cwhite: Perform dot expansion per dot_expander.rb [software/ecs] - 10https://gerrit.wikimedia.org/r/1155726
[19:30:12] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] Perform dot expansion per dot_expander.rb [software/ecs] - 10https://gerrit.wikimedia.org/r/1155726 (owner: 10Cwhite)
[19:30:32] <wikibugs>	 (03Merged) 10jenkins-bot: Perform dot expansion per dot_expander.rb [software/ecs] - 10https://gerrit.wikimedia.org/r/1155726 (owner: 10Cwhite)
[19:30:34] <logmsgbot>	 vriley@cumin1002 reimage (PID 1311221) is awaiting input
[19:30:50] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
[19:30:55] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906411 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS bulls...
[19:31:06] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:31:28] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:32:23] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:32:40] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:34:11] <wikibugs>	 (03CR) 10AOkoth: [C:03+2] doc: make doc2003 the active host [puppet] - 10https://gerrit.wikimedia.org/r/1155733 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[19:34:35] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:36:13] <wikibugs>	 (03PS3) 10AOkoth: wmnet: switch active doc host [dns] - 10https://gerrit.wikimedia.org/r/1155306 (https://phabricator.wikimedia.org/T392130)
[19:37:38] <wikibugs>	 (03CR) 10AOkoth: [C:03+2] wmnet: switch active doc host [dns] - 10https://gerrit.wikimedia.org/r/1155306 (https://phabricator.wikimedia.org/T392130) (owner: 10AOkoth)
[19:38:03] <logmsgbot>	 !log aokoth@dns1004 START - running authdns-update
[19:38:56] <logmsgbot>	 !log aokoth@dns1004 END - running authdns-update
[19:44:38] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[19:45:19] <logmsgbot>	 vriley@cumin1002 reimage (PID 1315010) is awaiting input
[19:45:49] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
[19:45:54] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906431 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS b...
[19:46:03] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
[19:46:09] <wikibugs>	 (03PS1) 10Eevans: cassandra-dev2001: configure for partition reuse [puppet] - 10https://gerrit.wikimedia.org/r/1155756 (https://phabricator.wikimedia.org/T391544)
[19:46:10] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906432 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1185.eqiad.wmnet with OS b...
[19:50:41] <wikibugs>	 (03Abandoned) 10Eevans: Use instance `ID=default` when no ID is supplied [debs/cassandra-tools-wmf] - 10https://gerrit.wikimedia.org/r/384055 (https://phabricator.wikimedia.org/T178169) (owner: 10Eevans)
[19:51:43] <wikibugs>	 (03Abandoned) 10Eevans: Don't start cassandra on boot or via puppet [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/219503 (https://phabricator.wikimedia.org/T103134) (owner: 10GWicke)
[19:52:40] <icinga-wm>	 RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:52:41] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti1047:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:58:10] <wikibugs>	 10ops-eqiad, 06SRE, 06SRE-OnFire, 10Cassandra, and 4 others: additional sessionstore expansion — eqiad - https://phabricator.wikimedia.org/T395955#10906453 (10VRiley-WMF) @Eevans We currently have 15x 480 drives.
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor My software never has bugs. It just develops random features. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T2000).
[20:00:05] <jouncebot>	 MatmaRex: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:23] <MatmaRex>	 hi
[20:01:09] <cjming>	 hi ! i can deploy
[20:01:58] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy1003 using scap backport" [core] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155749 (https://phabricator.wikimedia.org/T396618) (owner: 10Bartosz Dziewoński)
[20:02:12] <logmsgbot>	 vriley@cumin1002 reimage (PID 1315192) is awaiting input
[20:02:15] <jinxer-wm>	 FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[20:02:25] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
[20:02:32] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906468 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1185.eqiad.wmnet with OS bulls...
[20:02:33] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
[20:02:42] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906469 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS bulls...
[20:05:29] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:05:49] <wikibugs>	 (03Merged) 10jenkins-bot: Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation [core] (wmf/1.45.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1155749 (https://phabricator.wikimedia.org/T396618) (owner: 10Bartosz Dziewoński)
[20:06:04] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:06:12] <logmsgbot>	 !log cjming@deploy1003 Started scap sync-world: Backport for [[gerrit:1155749|Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618)]]
[20:06:15] <stashbot>	 T396618: PHP Deprecated: Use of MediaWiki\Output\OutputPage::wrapWikiTextAsInterface was deprecated in MediaWiki 1.45. [Called from MediaWiki\Extension\Translate\Synchronization\ExportTranslationsSpecialPage::execute] - https://phabricator.wikimedia.org/T396618
[20:06:27] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:07:14] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:07:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-parsoid - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[20:07:29] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
[20:07:37] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906475 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS b...
[20:07:59] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
[20:08:08] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906476 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host an-worker1185.eqiad.wmnet with OS b...
[20:08:27] <logmsgbot>	 !log cjming@deploy1003 matmarex, cjming: Backport for [[gerrit:1155749|Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:08:53] <cjming>	 MatmaRex: ok to sync?
[20:09:00] <MatmaRex>	 cjming: yup
[20:09:11] <logmsgbot>	 !log cjming@deploy1003 matmarex, cjming: Continuing with sync
[20:11:10] <wikibugs>	 (03CR) 10Krinkle: "Tagging with wmf-perf because this changes the cache/expiry handling, and because it moves flamegraph samples. All good and welcome, but I" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075211 (https://phabricator.wikimedia.org/T374997) (owner: 10Bartosz Dziewoński)
[20:16:12] <logmsgbot>	 !log cjming@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155749|Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618)]] (duration: 10m 00s)
[20:16:16] <stashbot>	 T396618: PHP Deprecated: Use of MediaWiki\Output\OutputPage::wrapWikiTextAsInterface was deprecated in MediaWiki 1.45. [Called from MediaWiki\Extension\Translate\Synchronization\ExportTranslationsSpecialPage::execute] - https://phabricator.wikimedia.org/T396618
[20:23:59] <logmsgbot>	 vriley@cumin1002 reimage (PID 1318504) is awaiting input
[20:24:22] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
[20:24:35] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906528 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1186.eqiad.wmnet with OS bulls...
[20:24:51] <logmsgbot>	 vriley@cumin1002 reimage (PID 1318556) is awaiting input
[20:24:55] <wikibugs>	 (03PS1) 10Dwisehaupt: Add civi cname to civicrm for new standalone testing [dns] - 10https://gerrit.wikimedia.org/r/1155761 (https://phabricator.wikimedia.org/T261779)
[20:25:54] <MatmaRex>	 cjming: thanks for deploying
[20:26:02] <cjming>	 np - yw!
[20:27:14] <wikibugs>	 (03CR) 10Jgreen: [C:03+1] Add civi cname to civicrm for new standalone testing [dns] - 10https://gerrit.wikimedia.org/r/1155761 (https://phabricator.wikimedia.org/T261779) (owner: 10Dwisehaupt)
[20:27:53] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
[20:28:03] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.05.24 - 2025.06.13): Q2:rack/setup/install an-worker11[78-86] - https://phabricator.wikimedia.org/T377878#10906537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host an-worker1185.eqiad.wmnet with OS bulls...
[20:30:27] <wikibugs>	 (03CR) 10Dwisehaupt: [C:03+2] Add civi cname to civicrm for new standalone testing [dns] - 10https://gerrit.wikimedia.org/r/1155761 (https://phabricator.wikimedia.org/T261779) (owner: 10Dwisehaupt)
[20:30:42] <logmsgbot>	 !log dwisehaupt@dns1004 START - running authdns-update
[20:31:39] <logmsgbot>	 !log dwisehaupt@dns1004 END - running authdns-update
[20:59:02] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "Looks good, one nit in-line." [alerts] - 10https://gerrit.wikimedia.org/r/1155620 (https://phabricator.wikimedia.org/T388641) (owner: 10Ayounsi)
[21:00:04] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T2100)
[21:00:27] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] Promote the TransitPeeringIn/OutSaturation alerts to p.aging (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1155620 (https://phabricator.wikimedia.org/T388641) (owner: 10Ayounsi)
[21:10:46] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155686 (owner: 10Jforrester)
[21:10:46] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155687 (https://phabricator.wikimedia.org/T390746) (owner: 10Jforrester)
[21:11:33] <wikibugs>	 (03Merged) 10jenkins-bot: WikiLambda: Set repo-only config only in repo mode [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155686 (owner: 10Jforrester)
[21:11:39] <wikibugs>	 (03Merged) 10jenkins-bot: WikiLambda: Enable orchestrator cache updates on edit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155687 (https://phabricator.wikimedia.org/T390746) (owner: 10Jforrester)
[21:12:03] <logmsgbot>	 !log jforrester@deploy1003 Started scap sync-world: Backport for [[gerrit:1155686|WikiLambda: Set repo-only config only in repo mode]], [[gerrit:1155687|WikiLambda: Enable orchestrator cache updates on edit (T390746)]]
[21:12:08] <stashbot>	 T390746: When needing an Object, fetch it from the memcached pool not HTTP if so configured - https://phabricator.wikimedia.org/T390746
[21:14:12] <logmsgbot>	 !log jforrester@deploy1003 jforrester: Backport for [[gerrit:1155686|WikiLambda: Set repo-only config only in repo mode]], [[gerrit:1155687|WikiLambda: Enable orchestrator cache updates on edit (T390746)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:14:55] <logmsgbot>	 !log jforrester@deploy1003 jforrester: Continuing with sync
[21:17:59] <logmsgbot>	 !log cdobbins@cumin2002 START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
[21:18:03] <stashbot>	 T396581: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581
[21:21:49] <logmsgbot>	 !log jforrester@deploy1003 Finished scap sync-world: Backport for [[gerrit:1155686|WikiLambda: Set repo-only config only in repo mode]], [[gerrit:1155687|WikiLambda: Enable orchestrator cache updates on edit (T390746)]] (duration: 09m 45s)
[21:21:52] <stashbot>	 T390746: When needing an Object, fetch it from the memcached pool not HTTP if so configured - https://phabricator.wikimedia.org/T390746
[21:23:47] <wikibugs>	 10SRE-Access-Requests: apine is a member of wmf and project-deployment-prep but not spider pig - https://phabricator.wikimedia.org/T396669 (10cmassaro) 03NEW
[21:24:32] <logmsgbot>	 !log cdobbins@cumin2002 END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
[21:24:36] <stashbot>	 T396581: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581
[21:26:21] <wikibugs>	 10SRE-Access-Requests: apine is a member of wmf and deployers but not spider pig - https://phabricator.wikimedia.org/T396669#10906693 (10cmassaro)
[21:26:43] <wikibugs>	 10SRE-Access-Requests: apine is a member of wmf and deployers but not spider pig - https://phabricator.wikimedia.org/T396669#10906695 (10Jdforrester-WMF)
[21:26:55] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[21:27:40] <logmsgbot>	 !log apine@deploy1003 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[21:27:47] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[21:28:16] <logmsgbot>	 !log apine@deploy1003 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[21:28:31] <wikibugs>	 10SRE-Access-Requests: apine is a member of wmf and deployers but not spider pig - https://phabricator.wikimedia.org/T396669#10906700 (10Dzahn) You need to request membership in groups "deployment" and "spiderpig-access".  The "spiderpig-access" part you can request at https://idm.wikimedia.org/permissions/  see...
[21:30:49] <wikibugs>	 06SRE, 10SRE-Access-Requests: apine is a member of wmf and deployers but not spider pig - https://phabricator.wikimedia.org/T396669#10906709 (10Dzahn) update: since the last edit I see you already have "deployment" (not the same as deployment-prep).  This means all you need is the spiderpig-access part.  Just...
[21:36:15] <wikibugs>	 (03PS1) 10CDobbins: varnish: add libvmod-wmfuniq to apt-get install packages [cookbooks] - 10https://gerrit.wikimedia.org/r/1155771
[21:37:15] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] varnish: add libvmod-wmfuniq to apt-get install packages [cookbooks] - 10https://gerrit.wikimedia.org/r/1155771 (owner: 10CDobbins)
[21:38:47] <wikibugs>	 (03CR) 10BCornwall: [V:03+2 C:03+1] "`" [cookbooks] - 10https://gerrit.wikimedia.org/r/1155771 (owner: 10CDobbins)
[21:39:07] <wikibugs>	 (03CR) 10CDobbins: [C:03+2] varnish: add libvmod-wmfuniq to apt-get install packages [cookbooks] - 10https://gerrit.wikimedia.org/r/1155771 (owner: 10CDobbins)
[21:43:46] <logmsgbot>	 !log cdobbins@cumin2002 START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
[21:43:50] <stashbot>	 T396581: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581
[21:48:57] <logmsgbot>	 !log cdobbins@cumin2002 END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
[21:49:01] <stashbot>	 T396581: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581
[21:50:30] <wikibugs>	 (03PS1) 10Cwhite: beta-logs: bump phatality version [puppet] - 10https://gerrit.wikimedia.org/r/1155773 (https://phabricator.wikimedia.org/T387606)
[21:50:31] <wikibugs>	 (03PS1) 10Cwhite: logstash: bump phatality version [puppet] - 10https://gerrit.wikimedia.org/r/1155774 (https://phabricator.wikimedia.org/T387606)
[21:51:30] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] beta-logs: bump phatality version [puppet] - 10https://gerrit.wikimedia.org/r/1155773 (https://phabricator.wikimedia.org/T387606) (owner: 10Cwhite)
[22:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250611T2200)
[22:01:15] <wikibugs>	 (03PS1) 10Andrew Bogott: Add radosgw access for members of the new 'object_storage' role. [puppet] - 10https://gerrit.wikimedia.org/r/1155775 (https://phabricator.wikimedia.org/T396594)
[22:13:40] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10906820 (10Andrew) Yes -- assuming that the cookbook works reliably for updating the firmware, these should either be managed by me or by the hypothe...
[22:35:09] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10906864 (10RobH) The cookbook worked reliably for updating 4 of the 6 cirrussearch hosts (first couple were used in testing so had issues on the automat...
[22:35:18] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10906865 (10RobH)
[22:36:28] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-coord100[3-4] - https://phabricator.wikimedia.org/T394499#10906866 (10RobH)
[22:36:34] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-coord100[3-4] - https://phabricator.wikimedia.org/T394499#10906867 (10RobH)
[22:38:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-coord100[3-4] - https://phabricator.wikimedia.org/T394499#10906869 (10RobH) a:05RobH→03BTullis @btullis,  With the successful update of the cookbook, an-coord1004 can now be scheduled for downtime and update.  The downtime is about 15minutes or so...
[22:38:58] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-mariadb100[1-2] - https://phabricator.wikimedia.org/T394498#10906872 (10RobH)
[22:40:02] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-mariadb100[1-2] - https://phabricator.wikimedia.org/T394498#10906884 (10RobH) @btullis,  I've updated the task description to answer the question on downtime and steps required.  Would you like to handle the actual firmware update to these hosts via th...
[22:40:09] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-mariadb100[1-2] - https://phabricator.wikimedia.org/T394498#10906885 (10RobH) a:05RobH→03BTullis
[22:40:22] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: SSD firmware update for an-mariadb100[1-2] - https://phabricator.wikimedia.org/T394498#10906886 (10RobH)
[22:40:36] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depool db1253 (T396648)', diff saved to https://phabricator.wikimedia.org/P77754 and previous config saved to /var/cache/conftool/dbconfig/20250611-224035-ladsgroup.json
[22:40:40] <stashbot>	 T396648: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648
[22:40:52] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops: SSD firmware update for frbackup2002 - https://phabricator.wikimedia.org/T396649#10906889 (10RobH) p:05Triage→03Medium
[22:40:54] <wikibugs>	 (03CR) 10Samwilson: [C:03+1] IS: Enable `wgTemplateDataEnableDiscovery` for mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1155665 (https://phabricator.wikimedia.org/T377975) (owner: 10Samtar)
[22:43:46] <logmsgbot>	 !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: Firmware upgrade (T396648)
[22:47:05] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
[22:48:00] <logmsgbot>	 !log ladsgroup@cumin1002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
[22:48:32] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
[22:49:09] <logmsgbot>	 !log ladsgroup@cumin1002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
[22:50:29] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906964 (10Ladsgroup) I'm getting this for db1253: ` db1253.eqiad.wmnet (Gen 15): starting db1253.eqiad.wmnet (SSD): update db1253.eqiad.wmnet (SSD): current version: 1...
[22:53:14] <logmsgbot>	 !log ladsgroup@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1253.eqiad.wmnet with reason: Firmware upgrade (T396648)
[22:53:18] <stashbot>	 T396648: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648
[22:54:59] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10906995 (10Ladsgroup) I bumped the downtime to 48 hours, (I have shut down mariadb and ran swap off since it'll need a reboot) so if you need to do it on your own, plea...
[22:57:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr1-magru:xe-0/1/2 (DISABLED) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[22:58:31] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence, 06DBA, 06DC-Ops: SSD firmware update for db125[0-4] - https://phabricator.wikimedia.org/T396648#10907003 (10Ladsgroup) a:05Ladsgroup→03RobH
[23:38:38] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1155780
[23:38:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1155780 (owner: 10TrainBranchBot)
[23:50:27] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1155780 (owner: 10TrainBranchBot)