[00:00:09] <wikibugs>	 (CR) Zabe: [C: +2] Start reading from af_user(_text)/afh_user(_text) in testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/992830 (https://phabricator.wikimedia.org/T355616) (owner: Zabe)
[00:01:41] <wikibugs>	 (Merged) jenkins-bot: Start reading from af_user(_text)/afh_user(_text) in testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/992830 (https://phabricator.wikimedia.org/T355616) (owner: Zabe)
[00:02:31] <logmsgbot>	 !log zabe@deploy2002 Started scap: Backport for [[gerrit:992830|Start reading from af_user(_text)/afh_user(_text) in testwiki (T355616)]]
[00:02:36] <stashbot>	 T355616: Start reading from af_user(_text)/afh_user(_text) - https://phabricator.wikimedia.org/T355616
[00:03:59] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:992830|Start reading from af_user(_text)/afh_user(_text) in testwiki (T355616)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[00:04:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T354336)', diff saved to https://phabricator.wikimedia.org/P55587 and previous config saved to /var/cache/conftool/dbconfig/20240125-000452-marostegui.json
[00:04:55] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[00:05:01] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[00:05:09] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[00:05:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1247 (T354336)', diff saved to https://phabricator.wikimedia.org/P55588 and previous config saved to /var/cache/conftool/dbconfig/20240125-000515-marostegui.json
[00:05:36] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[00:07:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T354336)', diff saved to https://phabricator.wikimedia.org/P55589 and previous config saved to /var/cache/conftool/dbconfig/20240125-000726-marostegui.json
[00:12:02] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to (general SRE production SSH access) for swfrench - https://phabricator.wikimedia.org/T355834 (Scott_French) In progress→Resolved
[00:12:08] <logmsgbot>	 !log zabe@deploy2002 Finished scap: Backport for [[gerrit:992830|Start reading from af_user(_text)/afh_user(_text) in testwiki (T355616)]] (duration: 09m 36s)
[00:12:27] <stashbot>	 T355616: Start reading from af_user(_text)/afh_user(_text) - https://phabricator.wikimedia.org/T355616
[00:12:56] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2103.codfw.wmnet with OS bullseye
[00:22:33] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P55590 and previous config saved to /var/cache/conftool/dbconfig/20240125-002233-marostegui.json
[00:37:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P55591 and previous config saved to /var/cache/conftool/dbconfig/20240125-003739-marostegui.json
[00:38:55] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/992654
[00:38:58] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/992654 (owner: TrainBranchBot)
[00:52:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T354336)', diff saved to https://phabricator.wikimedia.org/P55592 and previous config saved to /var/cache/conftool/dbconfig/20240125-005245-marostegui.json
[00:52:48] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[00:52:51] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[00:53:02] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[00:53:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1248 (T354336)', diff saved to https://phabricator.wikimedia.org/P55593 and previous config saved to /var/cache/conftool/dbconfig/20240125-005307-marostegui.json
[00:54:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T354336)', diff saved to https://phabricator.wikimedia.org/P55594 and previous config saved to /var/cache/conftool/dbconfig/20240125-005417-marostegui.json
[01:00:43] <wikibugs>	 (CR) CI reject: [V: -1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/992654 (owner: TrainBranchBot)
[01:01:45] <wikibugs>	 (PS1) Cwhite: logstash: consume from mediawiki accesslog sampled topics [puppet] - https://gerrit.wikimedia.org/r/992656 (https://phabricator.wikimedia.org/T355836)
[01:01:47] <wikibugs>	 (PS1) Cwhite: logstash: stop consuming the full mediawiki accesslog topics [puppet] - https://gerrit.wikimedia.org/r/992657 (https://phabricator.wikimedia.org/T355836)
[01:09:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P55595 and previous config saved to /var/cache/conftool/dbconfig/20240125-010923-marostegui.json
[01:24:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P55596 and previous config saved to /var/cache/conftool/dbconfig/20240125-012430-marostegui.json
[01:28:03] <logmsgbot>	 !log fab@deploy2002 Started deploy [airflow-dags/research@e6aa85a]: (no justification provided)
[01:28:17] <logmsgbot>	 !log fab@deploy2002 Finished deploy [airflow-dags/research@e6aa85a]: (no justification provided) (duration: 00m 13s)
[01:38:51] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[01:39:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T354336)', diff saved to https://phabricator.wikimedia.org/P55597 and previous config saved to /var/cache/conftool/dbconfig/20240125-013936-marostegui.json
[01:39:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[01:39:43] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[01:39:53] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[01:39:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1249 (T354336)', diff saved to https://phabricator.wikimedia.org/P55598 and previous config saved to /var/cache/conftool/dbconfig/20240125-013958-marostegui.json
[01:42:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T354336)', diff saved to https://phabricator.wikimedia.org/P55599 and previous config saved to /var/cache/conftool/dbconfig/20240125-014208-marostegui.json
[01:57:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P55600 and previous config saved to /var/cache/conftool/dbconfig/20240125-015714-marostegui.json
[02:12:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P55601 and previous config saved to /var/cache/conftool/dbconfig/20240125-021221-marostegui.json
[02:27:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T354336)', diff saved to https://phabricator.wikimedia.org/P55602 and previous config saved to /var/cache/conftool/dbconfig/20240125-022727-marostegui.json
[02:27:30] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[02:27:34] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[02:27:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
[02:29:16] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[02:29:16] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:29:50] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:39:21] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:50:13] <wikibugs>	 (PS1) Andrew Bogott: disable_tool: remove the archive_db stage from the cron host [puppet] - https://gerrit.wikimedia.org/r/992835 (https://phabricator.wikimedia.org/T353642)
[02:51:24] <wikibugs>	 (PS2) Andrew Bogott: disable_tool: remove the archive_db stage from the cron host [puppet] - https://gerrit.wikimedia.org/r/992835 (https://phabricator.wikimedia.org/T353642)
[02:55:49] <wikibugs>	 (CR) Andrew Bogott: [C: +2] disable_tool: remove the archive_db stage from the cron host [puppet] - https://gerrit.wikimedia.org/r/992835 (https://phabricator.wikimedia.org/T353642) (owner: Andrew Bogott)
[03:09:21] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:57:29] <wikibugs>	 (CR) Samwilson: [C: +1] "I've double-checked it and it's right." [mediawiki-config] - https://gerrit.wikimedia.org/r/992632 (https://phabricator.wikimedia.org/T350653) (owner: Samtar)
[05:03:58] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refinery-sqoop-mediawiki-production-daily.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:38:51] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[05:50:57] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
[05:51:10] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
[05:55:35] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2099.codfw.wmnet with reason: Maintenance
[05:55:59] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2099.codfw.wmnet with reason: Maintenance
[05:56:01] <wikibugs>	 (PS1) Marostegui: Revert "mariadb: Disable notifications on A1 hosts" [puppet] - https://gerrit.wikimedia.org/r/992780
[05:56:05] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2106.codfw.wmnet with reason: Maintenance
[05:56:20] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2106.codfw.wmnet with reason: Maintenance
[05:56:26] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2106 (T354336)', diff saved to https://phabricator.wikimedia.org/P55603 and previous config saved to /var/cache/conftool/dbconfig/20240125-055626-marostegui.json
[05:56:31] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[05:58:38] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2106 (T354336)', diff saved to https://phabricator.wikimedia.org/P55604 and previous config saved to /var/cache/conftool/dbconfig/20240125-055837-marostegui.json
[06:00:07] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "mariadb: Disable notifications on A1 hosts" [puppet] - https://gerrit.wikimedia.org/r/992780 (owner: Marostegui)
[06:02:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55605 and previous config saved to /var/cache/conftool/dbconfig/20240125-060214-root.json
[06:02:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55606 and previous config saved to /var/cache/conftool/dbconfig/20240125-060222-root.json
[06:02:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55607 and previous config saved to /var/cache/conftool/dbconfig/20240125-060240-root.json
[06:02:49] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55608 and previous config saved to /var/cache/conftool/dbconfig/20240125-060249-root.json
[06:10:22] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s2 T355682
[06:10:28] <stashbot>	 T355682: Switchover s2 master (db2107 -> db2104) - https://phabricator.wikimedia.org/T355682
[06:10:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set db2104 with weight 0 T355682', diff saved to https://phabricator.wikimedia.org/P55609 and previous config saved to /var/cache/conftool/dbconfig/20240125-061048-root.json
[06:11:00] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s2 T355682
[06:11:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[06:12:40] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Promote db2104 to s2 master [puppet] - https://gerrit.wikimedia.org/r/992428 (https://phabricator.wikimedia.org/T355682) (owner: Gerrit maintenance bot)
[06:13:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P55610 and previous config saved to /var/cache/conftool/dbconfig/20240125-061344-marostegui.json
[06:15:19] <wikibugs>	 (PS1) Marostegui: ProductionServices.php: Promote pc2014 [mediawiki-config] - https://gerrit.wikimedia.org/r/992842 (https://phabricator.wikimedia.org/T355683)
[06:17:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55611 and previous config saved to /var/cache/conftool/dbconfig/20240125-061719-root.json
[06:17:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55612 and previous config saved to /var/cache/conftool/dbconfig/20240125-061727-root.json
[06:17:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55613 and previous config saved to /var/cache/conftool/dbconfig/20240125-061745-root.json
[06:17:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55614 and previous config saved to /var/cache/conftool/dbconfig/20240125-061753-root.json
[06:26:32] <wikibugs>	 (CR) Marostegui: [C: +2] ProductionServices.php: Promote pc2014 [mediawiki-config] - https://gerrit.wikimedia.org/r/992842 (https://phabricator.wikimedia.org/T355683) (owner: Marostegui)
[06:27:15] <wikibugs>	 (Merged) jenkins-bot: ProductionServices.php: Promote pc2014 [mediawiki-config] - https://gerrit.wikimedia.org/r/992842 (https://phabricator.wikimedia.org/T355683) (owner: Marostegui)
[06:28:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P55615 and previous config saved to /var/cache/conftool/dbconfig/20240125-062851-marostegui.json
[06:29:04] <logmsgbot>	 !log marostegui@deploy2002 Started scap: Backport for [[gerrit:992842|ProductionServices.php: Promote pc2014 (T355683)]]
[06:29:09] <stashbot>	 T355683: Switchover pc2 master - https://phabricator.wikimedia.org/T355683
[06:30:58] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Backport for [[gerrit:992842|ProductionServices.php: Promote pc2014 (T355683)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[06:31:25] <logmsgbot>	 !log marostegui@deploy2002 marostegui: Continuing with sync
[06:32:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55616 and previous config saved to /var/cache/conftool/dbconfig/20240125-063225-root.json
[06:32:32] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55617 and previous config saved to /var/cache/conftool/dbconfig/20240125-063232-root.json
[06:32:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55618 and previous config saved to /var/cache/conftool/dbconfig/20240125-063250-root.json
[06:32:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55619 and previous config saved to /var/cache/conftool/dbconfig/20240125-063258-root.json
[06:37:46] <logmsgbot>	 !log marostegui@deploy2002 Finished scap: Backport for [[gerrit:992842|ProductionServices.php: Promote pc2014 (T355683)]] (duration: 08m 42s)
[06:37:51] <stashbot>	 T355683: Switchover pc2 master - https://phabricator.wikimedia.org/T355683
[06:38:06] <wikibugs>	 (PS1) Marostegui: pc2: Enable notifications on the master [puppet] - https://gerrit.wikimedia.org/r/992843 (https://phabricator.wikimedia.org/T355683)
[06:39:17] <wikibugs>	 (CR) Marostegui: [C: +2] pc2: Enable notifications on the master [puppet] - https://gerrit.wikimedia.org/r/992843 (https://phabricator.wikimedia.org/T355683) (owner: Marostegui)
[06:41:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[06:43:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2106 (T354336)', diff saved to https://phabricator.wikimedia.org/P55620 and previous config saved to /var/cache/conftool/dbconfig/20240125-064357-marostegui.json
[06:44:00] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
[06:44:03] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[06:44:14] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
[06:44:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2110 (T354336)', diff saved to https://phabricator.wikimedia.org/P55621 and previous config saved to /var/cache/conftool/dbconfig/20240125-064420-marostegui.json
[06:47:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55622 and previous config saved to /var/cache/conftool/dbconfig/20240125-064729-root.json
[06:47:37] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55623 and previous config saved to /var/cache/conftool/dbconfig/20240125-064737-root.json
[06:47:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55624 and previous config saved to /var/cache/conftool/dbconfig/20240125-064755-root.json
[06:48:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55625 and previous config saved to /var/cache/conftool/dbconfig/20240125-064803-root.json
[06:53:46] <wikibugs>	 SRE, ops-codfw, Data-Persistence: Relocating servers out of A1 in codfw - https://phabricator.wikimedia.org/T355437 (Marostegui) Database related hosts are being repooled
[06:55:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2110 (T354336)', diff saved to https://phabricator.wikimedia.org/P55626 and previous config saved to /var/cache/conftool/dbconfig/20240125-065535-marostegui.json
[06:55:41] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[07:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T0700)
[07:00:04] <jouncebot>	 kormat, marostegui, and Amir1: May I have your attention please! Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T0700)
[07:00:08] <marostegui>	 arnaudb: ready?
[07:00:41] <arnaudb>	 ready
[07:00:46] <marostegui>	 oooook
[07:00:55] <marostegui>	 !log Starting s2 codfw failover from db2107 to db2104 - T355682
[07:00:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:03] <stashbot>	 T355682: Switchover s2 master (db2107 -> db2104) - https://phabricator.wikimedia.org/T355682
[07:01:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T355682', diff saved to https://phabricator.wikimedia.org/P55627 and previous config saved to /var/cache/conftool/dbconfig/20240125-070120-marostegui.json
[07:01:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Promote db2104 to s2 primary and set section read-write T355682', diff saved to https://phabricator.wikimedia.org/P55628 and previous config saved to /var/cache/conftool/dbconfig/20240125-070153-marostegui.json
[07:02:08] <marostegui>	 arnaudb: done, can you check you can write in any s2 wiki?
[07:02:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55629 and previous config saved to /var/cache/conftool/dbconfig/20240125-070234-root.json
[07:02:42] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55630 and previous config saved to /var/cache/conftool/dbconfig/20240125-070242-root.json
[07:02:47] <arnaudb>	 one sec, on it
[07:03:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55631 and previous config saved to /var/cache/conftool/dbconfig/20240125-070300-root.json
[07:03:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55632 and previous config saved to /var/cache/conftool/dbconfig/20240125-070308-root.json
[07:05:26] <wikibugs>	 (CR) Marostegui: [C: +2] wmnet: Update s2-master alias [dns] - https://gerrit.wikimedia.org/r/992429 (https://phabricator.wikimedia.org/T355682) (owner: Gerrit maintenance bot)
[07:06:05] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2107 T355682', diff saved to https://phabricator.wikimedia.org/P55633 and previous config saved to /var/cache/conftool/dbconfig/20240125-070604-marostegui.json
[07:06:21] <stashbot>	 T355682: Switchover s2 master (db2107 -> db2104) - https://phabricator.wikimedia.org/T355682
[07:07:07] <wikibugs>	 (CR) Mxmxchere: "Hi Joe and thanks for the prompt review. Your goal is that for etcd 3.3/Debian 11 machines the config file should remain untouched to circ" [puppet] - https://gerrit.wikimedia.org/r/992629 (owner: Mxmxchere)
[07:08:06] <arnaudb>	 everything looks ok on my end
[07:08:11] <marostegui>	 ok thanks
[07:08:36] <wikibugs>	 SRE, ops-codfw, Data-Persistence, Infrastructure-Foundations, netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (Marostegui)
[07:12:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2159 db2160 db2109 db2107 db2137:3314 db2135:3315 db2143 db2147 db2177 db2178 db2188 T355549', diff saved to https://phabricator.wikimedia.org/P55634 and previous config saved to /var/cache/conftool/dbconfig/20240125-071253-marostegui.json
[07:12:59] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P55635 and previous config saved to /var/cache/conftool/dbconfig/20240125-071259-marostegui.json
[07:13:00] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[07:13:54] <wikibugs>	 SRE, ops-codfw, Data-Persistence, Infrastructure-Foundations, netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (Marostegui) Database hosts are depooled - @cmooney confirm if you will downtime them or if I should do it myself
[07:17:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55636 and previous config saved to /var/cache/conftool/dbconfig/20240125-071739-root.json
[07:17:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55637 and previous config saved to /var/cache/conftool/dbconfig/20240125-071747-root.json
[07:18:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55638 and previous config saved to /var/cache/conftool/dbconfig/20240125-071805-root.json
[07:18:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55639 and previous config saved to /var/cache/conftool/dbconfig/20240125-071813-root.json
[07:20:10] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2137:3315 T355549', diff saved to https://phabricator.wikimedia.org/P55640 and previous config saved to /var/cache/conftool/dbconfig/20240125-072010-marostegui.json
[07:20:19] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[07:28:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P55641 and previous config saved to /var/cache/conftool/dbconfig/20240125-072806-marostegui.json
[07:31:56] <wikibugs>	 (PS4) Slyngshede: Debian packaging, dependencies and permissions [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/992739
[07:32:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55642 and previous config saved to /var/cache/conftool/dbconfig/20240125-073244-root.json
[07:32:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55643 and previous config saved to /var/cache/conftool/dbconfig/20240125-073252-root.json
[07:32:58] <wikibugs>	 (CR) Slyngshede: Debian packaging, dependencies and permissions (1 comment) [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/992739 (owner: Slyngshede)
[07:33:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55644 and previous config saved to /var/cache/conftool/dbconfig/20240125-073310-root.json
[07:33:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: After on-site maintenance', diff saved to https://phabricator.wikimedia.org/P55645 and previous config saved to /var/cache/conftool/dbconfig/20240125-073319-root.json
[07:43:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2110 (T354336)', diff saved to https://phabricator.wikimedia.org/P55646 and previous config saved to /var/cache/conftool/dbconfig/20240125-074312-marostegui.json
[07:43:15] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2119.codfw.wmnet with reason: Maintenance
[07:43:18] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[07:43:28] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2119.codfw.wmnet with reason: Maintenance
[07:43:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2119 (T354336)', diff saved to https://phabricator.wikimedia.org/P55647 and previous config saved to /var/cache/conftool/dbconfig/20240125-074334-marostegui.json
[07:45:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2119 (T354336)', diff saved to https://phabricator.wikimedia.org/P55648 and previous config saved to /var/cache/conftool/dbconfig/20240125-074546-marostegui.json
[07:59:00] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] ml-serve: Drop explicit list of deployExtraClusterRoles [deployment-charts] - 10https://gerrit.wikimedia.org/r/992764 (https://phabricator.wikimedia.org/T354516) (owner: 10Klausman)
[08:00:04] <jouncebot>	 Amir1 and Urbanecm: Your horoscope predicts another UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T0800).
[08:00:04] <jouncebot>	 Tran: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:16] <Tran>	 👋
[08:00:49] <wikibugs>	 (03CR) 10Muehlenhoff: Debian packaging, dependencies and permissions (031 comment) [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739 (owner: 10Slyngshede)
[08:00:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P55650 and previous config saved to /var/cache/conftool/dbconfig/20240125-080053-marostegui.json
[08:00:54] <kostajh>	 hi
[08:01:23] <wikibugs>	 (03CR) 10Kosta Harlan: "Yes" [extensions/CentralAuth] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992123 (https://phabricator.wikimedia.org/T354928) (owner: 10Kosta Harlan)
[08:01:30] <kostajh>	 I'm going to add https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/992123 to the calendar
[08:01:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[08:01:45] <kostajh>	 Hi Tran! I can deploy your patch
[08:02:01] <Tran>	 I can also run the deploy steps myself if you're here to help bail me out if I mess it up?
[08:02:10] <Tran>	 I do have access to the deploy server
[08:03:10] <kostajh>	 hmm, actually, sorry I just got a notice from my calendar reminding me I need to leave soon
[08:03:16] <kostajh>	 Amir1, are you around?
[08:04:42] <kostajh>	 or perhaps hashar?
[08:05:57] <Tran>	 Alternatively, I could just deploy it and revert immediately if something goes pear shaped. The steps look reasonable.
[08:06:36] <wikibugs>	 (03CR) 10Kosta Harlan: Update beta configs to reflect new temp account naming pattern (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[08:07:11] <kostajh>	 Tran: yeah going with https://deploy-commands.toolforge.org/bacc/992670 should be pretty straightforward
[08:07:34] <kostajh>	 if you're comfortable doing so, I'm around for another 5 minutes or so
[08:07:43] <Tran>	 Okay I can start
[08:08:09] <kostajh>	 likewise, if you're comfortable syncing https://deploy-commands.toolforge.org/bacc/992123, I'd appreciate that. The patch is already live on wmf.14 and merged into master, it just didn't make the branch cut for wmf.15.
[08:08:45] <Tran>	 Let's see if I can get this first one done without problem and if I can, I'll do yours too.
[08:09:19] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove long-absented resource [puppet] - 10https://gerrit.wikimedia.org/r/992700
[08:09:25] <Tran>	 Actually, let me do yours first so I can answer your comment on my patch without rushing
[08:10:32] <wikibugs>	 (03CR) 10STran: [C: 03+2] "backporting" [extensions/CentralAuth] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992123 (https://phabricator.wikimedia.org/T354928) (owner: 10Kosta Harlan)
[08:11:38] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by stran@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992123 (https://phabricator.wikimedia.org/T354928) (owner: 10Kosta Harlan)
[08:12:50] <kostajh>	 Tran: for verifying https://gerrit.wikimedia.org/r/992123, you'd create a new account on test.wikipedia.org via mwdebug2002 and then have a look at logstash debug dashboard https://logstash.wikimedia.org/app/dashboards#/view/mwdebug1002?_g=h@48fceb7&_a=h@b20f488
[08:15:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove long-absented resource [puppet] - 10https://gerrit.wikimedia.org/r/992700 (owner: 10Muehlenhoff)
[08:15:46] <wikibugs>	 (03Merged) 10jenkins-bot: PreAuthenticationProvider: Allow blocking account creation based on IP reputation [extensions/CentralAuth] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992123 (https://phabricator.wikimedia.org/T354928) (owner: 10Kosta Harlan)
[08:16:00] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P55651 and previous config saved to /var/cache/conftool/dbconfig/20240125-081559-marostegui.json
[08:16:12] <logmsgbot>	 !log stran@deploy2002 Started scap: Backport for [[gerrit:992123|PreAuthenticationProvider: Allow blocking account creation based on IP reputation (T354928)]]
[08:16:17] <stashbot>	 T354928: Allow denial of account creation for IPs known to ipoid - https://phabricator.wikimedia.org/T354928
[08:16:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete setting [puppet] - 10https://gerrit.wikimedia.org/r/992407 (owner: 10Muehlenhoff)
[08:19:53] <wikibugs>	 (03PS1) 10Muehlenhoff: Default insetup::buster role to not send notifications as well [puppet] - 10https://gerrit.wikimedia.org/r/992846
[08:20:16] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch hadoop master/standby roles to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/990693 (https://phabricator.wikimedia.org/T349619)
[08:22:40] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/990693 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[08:28:19] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Default insetup::buster role to not send notifications as well [puppet] - 10https://gerrit.wikimedia.org/r/992846 (owner: 10Muehlenhoff)
[08:31:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2119 (T354336)', diff saved to https://phabricator.wikimedia.org/P55652 and previous config saved to /var/cache/conftool/dbconfig/20240125-083106-marostegui.json
[08:31:09] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
[08:31:11] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
[08:31:12] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[08:31:17] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
[08:31:31] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
[08:31:45] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[08:40:25] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Default insetup::buster role to not send notifications as well [puppet] - 10https://gerrit.wikimedia.org/r/992846 (owner: 10Muehlenhoff)
[08:40:50] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Remove Marko from a few groups no longer needed/used [puppet] - 10https://gerrit.wikimedia.org/r/991774 (owner: 10Muehlenhoff)
[08:44:58] <logmsgbot>	 !log stran@deploy2002 stran and kharlan: Backport for [[gerrit:992123|PreAuthenticationProvider: Allow blocking account creation based on IP reputation (T354928)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:45:08] <stashbot>	 T354928: Allow denial of account creation for IPs known to ipoid - https://phabricator.wikimedia.org/T354928
[08:45:42] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: puppet: fail the run with puppet 7 and buster [puppet] - 10https://gerrit.wikimedia.org/r/991540 (owner: 10Filippo Giunchedi)
[08:49:54] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: profile: restart postgres on first install / bootstrap [puppet] - 10https://gerrit.wikimedia.org/r/705704 (owner: 10Filippo Giunchedi)
[08:50:54] <wikibugs>	 (03CR) 10Filippo Giunchedi: profile: restart postgres on first install / bootstrap (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/705704 (owner: 10Filippo Giunchedi)
[08:53:56] <Tran>	 Currently testing 992123, hoping to be done before the window ends and apologies if I run over.
[08:59:10] <kostajh>	 Tran: I'm back; how's it going?
[08:59:50] <Tran>	 Testing it right now. It took longer to deploy than expected. I was able to successfully create an account and logs looked okay to me. Could you double check?
[09:00:05] <jouncebot>	 hashar and jnuche: Deploy window MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T0900)
[09:00:22] <hashar>	 the train is blocked
[09:00:42] <kostajh>	 Tran: yeah, checking it
[09:00:43] <hashar>	 a blocker due to CentralAuth got added yesterday night
[09:00:52] <hashar>	 I have to announce it
[09:01:02] <tgr>	 hashar: do you have a link?
[09:01:27] <hashar>	 tgr: https://gerrit.wikimedia.org/r/992804  UserGroupManager: Fix cross-wiki database access
[09:01:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] udp2log: Replace ferm rules with firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/991793 (owner: 10Muehlenhoff)
[09:01:41] <hashar>	 due to some heavy refactoring in the mediawiki DB layer https://gerrit.wikimedia.org/r/c/mediawiki/core/+/990745
[09:01:54] <hashar>	 according to taavi (but I see no reason to not trust his judgement :] )
[09:02:07] <hashar>	 I am mentioning him for reference
[09:02:14] <hashar>	 and all that code completely escapes me
[09:02:26] <kostajh>	 Tran: it looks good to me
[09:02:41] <Tran>	 great thanks I'll continue with the sync
[09:04:21] <Tran>	 kostajh: do you know how I can recover from a disconnected pipe? I forgot to run this in a screen.
[09:05:53] <kostajh>	 Tran: I am not sure.
[09:06:05] <Tran>	 Well that's awkward. Do I re-run scap? or revert?
[09:06:12] <tgr>	 hashar: give me ten minutes to test.
[09:06:39] <hashar>	 tgr: yeah no worries, that got reported last night
[09:06:49] <kostajh>	 I think you can just re-run `scap backport {changeid}`
[09:06:55] <tgr>	 Tran: check with ps if it's still running?
[09:06:57] <kostajh>	 but maybe someone else here knows
[09:07:13] <kostajh>	 I think it's paused at the "test on mwdebug" stage
[09:07:24] <tgr>	 oh, right
[09:07:59] <tgr>	 that's probably not recoverable without root
[09:08:06] <tgr>	 but yeah you can just re-run it
[09:08:07] <taavi>	 I'm not sure if `scap backport` works on an already merged patch, but `scap sync-world` will surely do the right thing since the patch was already merged and pulled to deploy2002
[09:08:24] <tgr>	 backport works too, it just skips the merge part then
[09:08:36] <taavi>	 oh even better
[09:08:46] <tgr>	 but yeah sync is a little faster
[09:09:00] <tgr>	 you will need to abort the old scap since it has a lock system
[09:09:28] <tgr>	 maybe there is a command line parameter for that?
[09:09:57] <Tran>	 would that be `scap backport --revert <change_number_or_url> `?
[09:09:58] <tgr>	 if not, probably fine to just kill it, if it's waiting for a keypress
[09:10:12] <Tran>	 I don't have access to the process, based on what `ps` is telling me
[09:10:14] <tgr>	 no, revert would try to undo the change
[09:10:47] <kostajh>	 I would try `scap backport 992123`. (after invoking `screen` or `tmux`)
[09:11:09] <Tran>	 okay let me try that and yes, lesson learned. Use `tmux`.
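Editor's sketch of the lesson above: a small guard that refuses to start a long-running command unless the shell is already inside tmux or screen. It relies on tmux exporting `TMUX` and GNU screen exporting `STY` to processes they start. The names `in_multiplexer` and `run_guarded` are hypothetical and are not part of scap.

```python
import os
import subprocess
import sys


def in_multiplexer(env=None):
    """Return True if the environment looks like a tmux or screen session.

    tmux sets TMUX and GNU screen sets STY for processes started inside them.
    """
    env = os.environ if env is None else env
    return bool(env.get("TMUX") or env.get("STY"))


def run_guarded(argv, env=None):
    """Run argv only when inside tmux/screen (hypothetical wrapper, not part of scap)."""
    if not argv:
        print("usage: run_guarded(['command', 'arg', ...])", file=sys.stderr)
        return 2
    if not in_multiplexer(env):
        print("Refusing to run outside tmux/screen: " + " ".join(argv), file=sys.stderr)
        return 1
    # Inside a multiplexer the command survives a dropped SSH connection.
    return subprocess.call(argv)
```

Calling `run_guarded(["scap", "backport", "992123"])` from a plain SSH shell would return 1 without starting anything; from inside `tmux` or `screen` it would run the command as usual.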
[09:12:02] <logmsgbot>	 !log stran@deploy2002 Started scap: Backport for [[gerrit:992123|PreAuthenticationProvider: Allow blocking account creation based on IP reputation (T354928)]]
[09:12:07] <stashbot>	 T354928: Allow denial of account creation for IPs known to ipoid - https://phabricator.wikimedia.org/T354928
[09:13:30] <tgr>	 Tran: you seem to be owning process 29685
[09:14:04] <wikibugs>	 (03PS1) 10Muehlenhoff: Revert "udp2log: Replace ferm rules with firewall::service" [puppet] - 10https://gerrit.wikimedia.org/r/992880
[09:14:16] <logmsgbot>	 !log stran@deploy2002 kharlan and stran: Backport for [[gerrit:992123|PreAuthenticationProvider: Allow blocking account creation based on IP reputation (T354928)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:15:46] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Revert "udp2log: Replace ferm rules with firewall::service" [puppet] - 10https://gerrit.wikimedia.org/r/992880 (owner: 10Muehlenhoff)
[09:16:26] <icinga-wm>	 PROBLEM - Host mwlog2002 is DOWN: PING CRITICAL - Packet loss = 100%
[09:16:44] <kostajh>	 I made https://gitlab.wikimedia.org/toolforge-repos/deploy-commands/-/merge_requests/1 to update the deployment commands page to reference tmux/screen
[09:16:50] <Tran>	 tgr: is that the old scap I disconnected from? I think 12856 is the new one I just kicked off.
[09:18:12] <logmsgbot>	 !log stran@deploy2002 kharlan and stran: Continuing with sync
[09:18:12] <kostajh>	 it looks like 29685 was `scap backport` which has invoked a new process for `sync-world` which is 12856
[09:18:23] <Tran>	 new scap is syncing
[09:19:40] <tgr>	 if it doesn't prevent you from running scap again it's fine. I thought it uses a lockfile but maybe that's only done during the sync step.
[09:21:50] <icinga-wm>	 RECOVERY - Host mwlog2002 is UP: PING OK - Packet loss = 0%, RTA = 30.29 ms
[09:22:41] <hashar>	 I can't remember where (or whether) the scap log files are, but it emits its logs over syslog which can then be seen in Kibana https://logstash.wikimedia.org/app/dashboards#/view/f7e31de0-9f0d-11eb-863c-3588009e4dd9
[09:22:57] <hashar>	 so you can potentially check the progress from there
[09:23:24] <hashar>	 yesterday a backport took 10/11 minutes, I am guessing that is the new baseline
[09:25:20] <hashar>	 and pid 12856 is still emitting logs (can be checked by filtering on `process.pid:12856`)
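Editor's sketch of the same filter applied offline, assuming log documents have been exported as Python dicts with a nested `process` object holding `pid`. That layout, the helper names, and the sample records are all assumptions for illustration; real scap log documents carry many more fields.

```python
def matches_pid(record, pid):
    """True if the record's nested process.pid field equals pid."""
    return record.get("process", {}).get("pid") == pid


def filter_by_pid(records, pid):
    """Keep only the records emitted by the given process id."""
    return [r for r in records if matches_pid(r, pid)]


# Hypothetical records standing in for exported log documents.
records = [
    {"process": {"pid": 29685}, "message": "placeholder message A"},
    {"process": {"pid": 12856}, "message": "placeholder message B"},
    {"process": {"pid": 12856}, "message": "placeholder message C"},
]

for record in filter_by_pid(records, 12856):
    print(record["message"])
```

An empty result for a pid, as tgr later reports for the old run, suggests that process is no longer logging.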
[09:29:26] <logmsgbot>	 !log stran@deploy2002 Finished scap: Backport for [[gerrit:992123|PreAuthenticationProvider: Allow blocking account creation based on IP reputation (T354928)]] (duration: 17m 24s)
[09:29:31] <stashbot>	 T354928: Allow denial of account creation for IPs known to ipoid - https://phabricator.wikimedia.org/T354928
[09:29:56] <Tran>	 Well I think that finished successfully
[09:30:09] <Tran>	 it took 17 minutes
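Editor's sketch checking the reported duration against the channel timestamps of the re-run's "Started scap" (09:12:02) and "Finished scap" (09:29:26) lines. The helper name `elapsed` is hypothetical, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed(start, end, fmt="%H:%M:%S"):
    """Return (minutes, seconds) between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return divmod(int(delta.total_seconds()), 60)


# Timestamps of the "Started scap" and "Finished scap" log lines for the re-run.
minutes, seconds = elapsed("09:12:02", "09:29:26")
print(f"{minutes}m {seconds}s")  # prints "17m 24s"
```

This matches the `duration: 17m 24s` in the log line, so the figure covers only the second run, not the first attempt started at 08:16:12.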
[09:30:30] <hashar>	 and the `!log` shows it has completed
[09:30:38] <hashar>	 no clue why it took SO long though :-\\\\\
[09:31:08] <Tran>	 checking `ps`, I don't see any processes I own that refer to the commands I ran about 40 minutes ago so I guess it timed out?
[09:31:08] <wikibugs>	 (03CR) 10Kosta Harlan: Update beta configs to reflect new temp account naming pattern (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[09:31:40] <tgr>	 hashar: the old one is not emitting logs: https://logstash.wikimedia.org/goto/9ccb86a05ea94cb1f559061b9d21e0cb
[09:32:04] <taavi>	 I would assume the old process was just killed when the SSH session timed out
[09:32:07] <kostajh>	 right
[09:32:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T354336)', diff saved to https://phabricator.wikimedia.org/P55653 and previous config saved to /var/cache/conftool/dbconfig/20240125-093208-marostegui.json
[09:32:16] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[09:32:33] <kostajh>	 Tran: will you sync the config patch or do you want to do that another time? I had a question about one of the values there, so maybe a later window is better.
[09:32:41] <taavi>	 and as the last scap run finished successfully, everything should be in a consistent state now
[09:33:22] <Tran>	 kostajh We're out of the window so I can reschedule it. I came to the same assumption you did but I pinged someone with more context about it and we can wait for that answer.
[09:33:29] <kostajh>	 ok
[09:33:41] <tgr>	 taavi: do you know how to reproduce the train blocked bug locally?
[09:33:55] <tgr>	 I guess I need to clear the central user cache?
[09:34:21] <tgr>	 Tran: the train is blocked so syncing a config change should be fine
[09:34:39] <tgr>	 hashar: ^ right?
[09:34:45] <hashar>	 yes
[09:34:55] <hashar>	 Tran: yes please continue with your deployment
[09:35:00] <hashar>	 jouncebot: now
[09:35:00] <jouncebot>	 For the next 1 hour(s) and 24 minute(s): MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T0900)
[09:35:12] <hashar>	 we are not running the train this morning 
[09:35:33] <hashar>	 and I am usually more than happy having the backport window to be extended as long as all parties are aware :)
[09:36:04] <tgr>	 FWIW the fix for T355813 looks good, I just need to figure out how to test it
[09:36:04] <stashbot>	 T355813: CentralAuth doesn't shows user rights correctly - https://phabricator.wikimedia.org/T355813
[09:36:14] <hashar>	 ah great
[09:36:34] <taavi>	 tgr: I'm able to repro just by visiting Special:CA on my local wiki without the fix applied
[09:36:52] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] "Looks good, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/990693 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[09:36:59] <hashar>	 I am not qualified at all in reviewing any of that since I know nothing about CentralAuth, shared DB or the DB abstraction layer or architecture
[09:37:16] <hashar>	 taavi: then I guess we can cherry pick and try it out on mwdebug?
[09:37:41] <icinga-wm>	 PROBLEM - Host mwlog2002 is DOWN: PING CRITICAL - Packet loss = 100%
[09:38:17] <hashar>	 eek
[09:38:39] <taavi>	 moritzm: is mwlog2002 being down related to the udp2log patch you merged earlier?
[09:38:42] <hashar>	 I can ssh on mwlog2002
[09:38:51] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[09:39:01] <icinga-wm>	 RECOVERY - Host mwlog2002 is UP: PING OK - Packet loss = 0%, RTA = 31.31 ms
[09:39:10] <tgr>	 duh, I'm being stupid
[09:39:30] <tgr>	 of course it works locally if I have the same groups on every wiki
[09:40:21] <taavi>	 whoops
[09:41:18] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch hadoop master/standby roles to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/990693 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[09:42:01] <Tran>	 Sorry for the delay, was discussing if we were ready to deploy the config change. If I could still make it in 5-10 minutes, that would be great otherwise we're not in a rush.
[09:43:58] <wikibugs>	 (03PS5) 10Slyngshede: Debian packaging, dependencies and database migration. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739
[09:44:06] <wikibugs>	 (03Abandoned) 10Muehlenhoff: Also default insetup::buster role disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/990695 (owner: 10Muehlenhoff)
[09:45:30] <wikibugs>	 (03PS4) 10STran: Update beta configs to reflect new temp account naming pattern [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503)
[09:46:48] <wikibugs>	 (03CR) 10Kosta Harlan: Update beta configs to reflect new temp account naming pattern (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[09:47:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P55654 and previous config saved to /var/cache/conftool/dbconfig/20240125-094714-marostegui.json
[09:47:34] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+1] Update beta configs to reflect new temp account naming pattern (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[09:48:11] <wikibugs>	 (03CR) 10Muehlenhoff: Debian packaging, dependencies and database migration. (031 comment) [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739 (owner: 10Slyngshede)
[09:50:01] <wikibugs>	 (03PS5) 10STran: Update beta configs to reflect new temp account naming pattern [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503)
[09:50:37] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+1] "thanks! Looks good to me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[09:51:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Fold linux44 into the regular wmf kmod::blacklist [puppet] - 10https://gerrit.wikimedia.org/r/992702 (owner: 10Muehlenhoff)
[09:51:39] <tgr>	 hashar: +2-d. I'll leave the backport to someone else, it's getting late.
[09:53:00] <Tran>	 If no one has any objections, could I start my config backport of 992670?
[09:53:33] <tgr>	 Tran: I'd say go for it
[09:53:48] <tgr>	 core merges take way longer than config merges
[09:53:57] <Tran>	 alright then I'm starting
[09:54:31] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by stran@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[09:55:18] <wikibugs>	 (03Merged) 10jenkins-bot: Update beta configs to reflect new temp account naming pattern [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992670 (https://phabricator.wikimedia.org/T349503) (owner: 10STran)
[09:55:32] <wikibugs>	 (03PS6) 10Slyngshede: Debian packaging, dependencies and database migration. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739
[09:59:26] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Ship it :-)" [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739 (owner: 10Slyngshede)
[09:59:51] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+2] Debian packaging, dependencies and database migration. (031 comment) [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739 (owner: 10Slyngshede)
[10:00:05] <Tran>	 backport of 992670 is done
[10:01:34] <kostajh>	 hashar: should we log that the backport window is done? 
[10:02:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P55655 and previous config saved to /var/cache/conftool/dbconfig/20240125-100221-marostegui.json
[10:02:52] <wikibugs>	 (03Merged) 10jenkins-bot: Debian packaging, dependencies and database migration. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992739 (owner: 10Slyngshede)
[10:05:34] <wikibugs>	 (03CR) 10Majavah: [V: 03+1 C: 03+2] Bring cloudrabbit1003 in service as a new cluster [puppet] - 10https://gerrit.wikimedia.org/r/992725 (owner: 10Majavah)
[10:07:46] <wikibugs>	 (03PS1) 10Muehlenhoff: Install debmonitor-server on bookworm [puppet] - 10https://gerrit.wikimedia.org/r/992881 (https://phabricator.wikimedia.org/T241049)
[10:12:45] <wikibugs>	 (03PS1) 10Majavah: P:openstack: rabbitmq: fix RABBITMQ_NODENAME [puppet] - 10https://gerrit.wikimedia.org/r/992882
[10:12:47] <wikibugs>	 (03PS1) 10Majavah: rabbitmq: fix order of invalidate_rabbitmq_guest_account [puppet] - 10https://gerrit.wikimedia.org/r/992883
[10:14:12] <wikibugs>	 (03CR) 10Majavah: [V: 03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1208/co" [puppet] - 10https://gerrit.wikimedia.org/r/992883 (owner: 10Majavah)
[10:17:04] <moritzm>	 !log upgrading python-pymysql in S6 DB hosts to 1.0.2-2~wmf11u1 T355531
[10:17:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:10] <stashbot>	 T355531: Migrate all db-* scripts to Bookworm - https://phabricator.wikimedia.org/T355531
[10:17:24] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] P:openstack: rabbitmq: fix RABBITMQ_NODENAME [puppet] - 10https://gerrit.wikimedia.org/r/992882 (owner: 10Majavah)
[10:17:28] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T354336)', diff saved to https://phabricator.wikimedia.org/P55656 and previous config saved to /var/cache/conftool/dbconfig/20240125-101728-marostegui.json
[10:17:29] <wikibugs>	 (03CR) 10Majavah: [V: 03+1 C: 03+2] rabbitmq: fix order of invalidate_rabbitmq_guest_account [puppet] - 10https://gerrit.wikimedia.org/r/992883 (owner: 10Majavah)
[10:17:30] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
[10:17:33] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[10:17:44] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
[10:17:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2138:3314 (T354336)', diff saved to https://phabricator.wikimedia.org/P55657 and previous config saved to /var/cache/conftool/dbconfig/20240125-101750-marostegui.json
[10:20:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T354336)', diff saved to https://phabricator.wikimedia.org/P55658 and previous config saved to /var/cache/conftool/dbconfig/20240125-102002-marostegui.json
[10:21:24] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (10cmooney) >>! In T355549#9487462, @Marostegui wrote: > Database hosts are depooled - @cmooney confirm if you wi...
[10:21:42] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.reimage for host cloudrabbit1003.eqiad.wmnet with OS bookworm
[10:22:04] <wikibugs>	 (03PS1) 10Majavah: wikimediacloud.org: Move RabbitMQ traffic to cloudrabbit1003 [dns] - 10https://gerrit.wikimedia.org/r/992884 (https://phabricator.wikimedia.org/T345610)
[10:27:37] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (10Marostegui) Great thank you!
[10:31:37] <wikibugs>	 (03CR) 10Slyngshede: Install debmonitor-server on bookworm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/992881 (https://phabricator.wikimedia.org/T241049) (owner: 10Muehlenhoff)
[10:35:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P55659 and previous config saved to /var/cache/conftool/dbconfig/20240125-103509-marostegui.json
[10:35:55] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.eqiad.wmnet with reason: host reimage
[10:38:18] <logmsgbot>	 !log isaranto@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
[10:39:10] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.eqiad.wmnet with reason: host reimage
[10:43:18] <wikibugs>	 (03PS1) 10Hnowlan: tegola: temporarily disable maps2006 db [deployment-charts] - 10https://gerrit.wikimedia.org/r/992887 (https://phabricator.wikimedia.org/T355549)
[10:45:21] <wikibugs>	 (03CR) 10Clément Goubert: [C: 03+1] tegola: temporarily disable maps2006 db [deployment-charts] - 10https://gerrit.wikimedia.org/r/992887 (https://phabricator.wikimedia.org/T355549) (owner: 10Hnowlan)
[10:46:02] <wikibugs>	 (03CR) 10Muehlenhoff: Install debmonitor-server on bookworm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/992881 (https://phabricator.wikimedia.org/T241049) (owner: 10Muehlenhoff)
[10:48:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] deployment_server: add dummy oauth2-proxy secrets for jaeger [labs/private] - 10https://gerrit.wikimedia.org/r/992699 (https://phabricator.wikimedia.org/T320555) (owner: 10Filippo Giunchedi)
[10:48:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] mariadb::monitor_memory: Update package name [puppet] - 10https://gerrit.wikimedia.org/r/983721 (owner: 10Muehlenhoff)
[10:49:24] <moritzm>	 godog: merging your oauth labs-private patch
[10:49:33] <moritzm>	 done
[10:49:42] <wikibugs>	 (03PS1) 10Majavah: systemd: timer_service: Move ConditionPathExists to correct section [puppet] - 10https://gerrit.wikimedia.org/r/992888
[10:50:15] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P55660 and previous config saved to /var/cache/conftool/dbconfig/20240125-105015-marostegui.json
[10:50:58] <godog>	 moritzm: thank you!
[10:52:34] <wikibugs>	 (03PS1) 10Muehlenhoff: mariabdb::monitor_memory: Also update update name in dependency [puppet] - 10https://gerrit.wikimedia.org/r/992890
[10:52:53] <wikibugs>	 (03PS1) 10Zabe: UserGroupManager: Fix cross-wiki database access [core] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992781 (https://phabricator.wikimedia.org/T355813)
[10:53:46] <hashar>	 kostajh: sorry I was in meeting.  `!log` the end of the backport window is often done yes, that is a good way to broadcast it has completed :)
[10:53:55] <hashar>	 the alternative is to ask / sync up here
[10:54:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] mariabdb::monitor_memory: Also update update name in dependency [puppet] - 10https://gerrit.wikimedia.org/r/992890 (owner: 10Muehlenhoff)
[10:54:40] <kostajh>	 hashar: T.gr mentioned +2'ing some core change, did you end up syncing that? Or was that not for a backport?
[10:54:50] <hashar>	 the cherry pick is in the pipe https://gerrit.wikimedia.org/r/c/mediawiki/core/+/992781 
[10:55:03] <hashar>	 I think taavi now how to reproduces it
[10:55:08] <hashar>	 s/now/know/
[10:55:10] <hashar>	 I will deploy it
[10:55:14] <wikibugs>	 (03CR) 10Hashar: [C: 03+2] UserGroupManager: Fix cross-wiki database access [core] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992781 (https://phabricator.wikimedia.org/T355813) (owner: 10Zabe)
[10:57:15] <hashar>	 looks like I can check it comparing meta vs frwiki
[10:57:17] <hashar>	 https://meta.wikimedia.org/wiki/Special:CentralAuth?target=hashar 
[10:57:23] <hashar>	 https://fr.wikipedia.org/wiki/Special:CentralAuth?target=hashar
[10:57:27] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.eqiad.wmnet with OS bookworm
[10:58:16] <wikibugs>	 (03PS1) 10Muehlenhoff: ganeti: Stop using transition package [puppet] - 10https://gerrit.wikimedia.org/r/992891
[11:00:05] <jouncebot>	 mvolz: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Citoid / Zotero . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1100).
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1100)
[11:05:22] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T354336)', diff saved to https://phabricator.wikimedia.org/P55662 and previous config saved to /var/cache/conftool/dbconfig/20240125-110521-marostegui.json
[11:05:24] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
[11:05:27] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[11:05:37] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
[11:05:43] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
[11:05:57] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
[11:07:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T354336)', diff saved to https://phabricator.wikimedia.org/P55663 and previous config saved to /var/cache/conftool/dbconfig/20240125-110714-marostegui.json
[11:10:23] <wikibugs>	 (03PS1) 10Btullis: varnish: enrich X-Analytics for browser prefetch / prerender / preview [puppet] - 10https://gerrit.wikimedia.org/r/992782 (https://phabricator.wikimedia.org/T346463)
[11:11:25] <wikibugs>	 (03PS2) 10Btullis: varnish: enrich X-Analytics for browser prefetch / prerender / preview [puppet] - 10https://gerrit.wikimedia.org/r/992782 (https://phabricator.wikimedia.org/T346463)
[11:12:30] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] varnish: enrich X-Analytics for browser prefetch / prerender / preview [puppet] - 10https://gerrit.wikimedia.org/r/992782 (https://phabricator.wikimedia.org/T346463) (owner: 10Btullis)
[11:13:10] <wikibugs>	 (03PS3) 10Btullis: varnish: enrich X-Analytics for browser prefetch / prerender / preview [puppet] - 10https://gerrit.wikimedia.org/r/992782 (https://phabricator.wikimedia.org/T346463)
[11:15:22] <wikibugs>	 (03CR) 10Muehlenhoff: Upstream release v0.3.4 (031 comment) [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[11:15:52] <wikibugs>	 (03Merged) 10jenkins-bot: UserGroupManager: Fix cross-wiki database access [core] (wmf/1.42.0-wmf.15) - 10https://gerrit.wikimedia.org/r/992781 (https://phabricator.wikimedia.org/T355813) (owner: 10Zabe)
[11:16:55] <wikibugs>	 (03CR) 10Muehlenhoff: "They all look harmless or are not relevant to us (like the standards version), I'd say we can ignore them and revisit later if/when we aim" [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[11:19:33] <wikibugs>	 (03PS1) 10Zabe: Start reading from af_actor/afh_actor in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992894 (https://phabricator.wikimedia.org/T355616)
[11:19:53] <hashar>	 jouncebot: now
[11:19:53] <jouncebot>	 For the next 0 hour(s) and 40 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1100)
[11:19:54] <jouncebot>	 For the next 0 hour(s) and 40 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1100)
[11:20:06] <hashar>	 I am deploying that mediawiki/core patch for CentralAuth
[11:20:52] <logmsgbot>	 !log hashar@deploy2002 Started scap: Backport for [[gerrit:992781|UserGroupManager: Fix cross-wiki database access (T355813)]]
[11:20:58] <stashbot>	 T355813: CentralAuth doesn't shows user rights correctly - https://phabricator.wikimedia.org/T355813
[11:22:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P55664 and previous config saved to /var/cache/conftool/dbconfig/20240125-112220-marostegui.json
[11:22:36] <logmsgbot>	 !log hashar@deploy2002 hashar and zabe: Backport for [[gerrit:992781|UserGroupManager: Fix cross-wiki database access (T355813)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:23:25] <logmsgbot>	 !log hashar@deploy2002 hashar and zabe: Continuing with sync
[11:25:24] <wikibugs>	 (03PS2) 10Volans: Upstream release v0.3.4 [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788
[11:25:29] <wikibugs>	 (03CR) 10Volans: "addressed comments" [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[11:26:26] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
[11:26:40] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
[11:26:43] <claime>	 !log Restarting ferm.service on k8s node kubernetes2036.codfw.wmnet - T354855
[11:26:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:48] <stashbot>	 T354855: ferm sometimes fails to restart on Kubernetes workers via xtables lock held by kube-proxy - https://phabricator.wikimedia.org/T354855
[11:26:55] <wikibugs>	 (03CR) 10Muehlenhoff: "One final bit inline" [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[11:27:00] <icinga-wm>	 RECOVERY - Check systemd state on kubernetes2036 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:27:59] <wikibugs>	 (03PS3) 10Volans: Upstream release v0.3.4 [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788
[11:28:01] <wikibugs>	 (03CR) 10Volans: Upstream release v0.3.4 (031 comment) [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[11:28:35] <hashar>	 I will run the train after lunch
[11:29:43] <logmsgbot>	 !log hashar@deploy2002 Finished scap: Backport for [[gerrit:992781|UserGroupManager: Fix cross-wiki database access (T355813)]] (duration: 08m 50s)
[11:29:48] <stashbot>	 T355813: CentralAuth doesn't shows user rights correctly - https://phabricator.wikimedia.org/T355813
[11:31:34] <wikibugs>	 (03PS1) 10PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/992660
[11:31:51] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Nice, ship it :-)" [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[11:33:44] <wikibugs>	 (03CR) 10Vgutierrez: hiera: add acls for heavy ratelimiting abusing ip from list (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/989968 (https://phabricator.wikimedia.org/T353910) (owner: 10Fabfur)
[11:33:46] <wikibugs>	 (03PS1) 10Slyngshede: Enable debmonitor service on installation [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898
[11:35:21] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B4 from asw-b4-codfw to lsw1-b4-codfw - https://phabricator.wikimedia.org/T355860 (10cmooney) p:05Triage→03Medium
[11:35:25] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Start reading from af_actor/afh_actor in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992894 (https://phabricator.wikimedia.org/T355616) (owner: 10Zabe)
[11:35:39] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B4 from asw-b4-codfw to lsw1-b4-codfw - https://phabricator.wikimedia.org/T355860 (10cmooney)
[11:35:45] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:36:08] <wikibugs>	 (03Merged) 10jenkins-bot: Start reading from af_actor/afh_actor in group0 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992894 (https://phabricator.wikimedia.org/T355616) (owner: 10Zabe)
[11:36:42] <logmsgbot>	 !log zabe@deploy2002 Started scap: Backport for [[gerrit:992894|Start reading from af_actor/afh_actor in group0 wikis (T355616)]]
[11:36:48] <stashbot>	 T355616: Start reading from af_actor/afh_actor - https://phabricator.wikimedia.org/T355616
[11:36:52] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A2 from asw-a2-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T355861 (10cmooney) p:05Triage→03Medium
[11:37:27] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P55665 and previous config saved to /var/cache/conftool/dbconfig/20240125-113727-marostegui.json
[11:37:51] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A3 from asw-a3-codfw to lsw1-a3-codfw - https://phabricator.wikimedia.org/T355862 (10cmooney) p:05Triage→03Medium
[11:38:01] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A3 from asw-a3-codfw to lsw1-a3-codfw - https://phabricator.wikimedia.org/T355862 (10cmooney)
[11:38:11] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:992894|Start reading from af_actor/afh_actor in group0 wikis (T355616)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[11:38:17] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A2 from asw-a2-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T355861 (10cmooney)
[11:38:43] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[11:39:04] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A4 from asw-a4-codfw to lsw1-a4-codfw - https://phabricator.wikimedia.org/T355863 (10cmooney) p:05Triage→03Medium
[11:39:15] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:39:21] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A4 from asw-a4-codfw to lsw1-a4-codfw - https://phabricator.wikimedia.org/T355863 (10cmooney)
[11:39:31] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: ipoid: Fix chart default ports [deployment-charts] - 10https://gerrit.wikimedia.org/r/992899 (https://phabricator.wikimedia.org/T355167)
[11:40:28] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A2 from asw-a2-codfw to lsw1-a2-codfw - https://phabricator.wikimedia.org/T355861 (10cmooney)
[11:40:33] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:41:03] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:41:09] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A3 from asw-a3-codfw to lsw1-a3-codfw - https://phabricator.wikimedia.org/T355862 (10cmooney)
[11:42:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1038.eqiad.wmnet to cluster eqiad and group D
[11:42:30] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw - https://phabricator.wikimedia.org/T355864 (10cmooney) p:05Triage→03Medium
[11:42:39] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw - https://phabricator.wikimedia.org/T355864 (10cmooney)
[11:42:45] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:43:28] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866 (10cmooney) p:05Triage→03Medium
[11:43:35] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866 (10cmooney)
[11:43:41] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:44:22] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1038.eqiad.wmnet to cluster eqiad and group D
[11:45:07] <logmsgbot>	 !log zabe@deploy2002 Finished scap: Backport for [[gerrit:992894|Start reading from af_actor/afh_actor in group0 wikis (T355616)]] (duration: 08m 25s)
[11:45:19] <stashbot>	 T355616: Start reading from af_actor/afh_actor - https://phabricator.wikimedia.org/T355616
[11:45:22] <wikibugs>	 (03PS1) 10PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/992661
[11:45:23] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867 (10cmooney) p:05Triage→03Medium
[11:45:31] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867 (10cmooney)
[11:45:37] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:46:58] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868 (10cmooney) p:05Triage→03Medium
[11:47:07] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:47:13] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B2 from asw-b2-codfw to lsw1-b2-codfw - https://phabricator.wikimedia.org/T355868 (10cmooney)
[11:47:49] <wikibugs>	 (03CR) 10Vgutierrez: [C: 04-1] "this doesn't seem to be a bug, see I7fb15acdf1c5cd6e6b257d1de82437b33f96fbc3." [puppet] - 10https://gerrit.wikimedia.org/r/991409 (https://phabricator.wikimedia.org/T355158) (owner: 10Fabfur)
[11:52:03] <logmsgbot>	 !log jgiannelos@deploy2002 Started deploy [restbase/deploy@708f0f3]: (no justification provided)
[11:52:10] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2036 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[11:52:12] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Create netbox script to support moving a cable from one network port to another - https://phabricator.wikimedia.org/T355869 (10cmooney) p:05Triage→03Low
[11:52:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T354336)', diff saved to https://phabricator.wikimedia.org/P55666 and previous config saved to /var/cache/conftool/dbconfig/20240125-115233-marostegui.json
[11:52:36] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
[11:52:40] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[11:52:50] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
[11:52:52] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
[11:52:53] <wikibugs>	 (03CR) 10Fabfur: [C: 03+1] "looks good to me!" [puppet] - 10https://gerrit.wikimedia.org/r/992782 (https://phabricator.wikimedia.org/T346463) (owner: 10Btullis)
[11:53:13] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw - https://phabricator.wikimedia.org/T355870 (10cmooney) p:05Triage→03Medium
[11:53:16] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
[11:53:20] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw - https://phabricator.wikimedia.org/T355870 (10cmooney)
[11:53:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2155 (T354336)', diff saved to https://phabricator.wikimedia.org/P55667 and previous config saved to /var/cache/conftool/dbconfig/20240125-115322-marostegui.json
[11:53:26] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:54:31] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B6 from asw-b6-codfw to lsw1-b6-codfw - https://phabricator.wikimedia.org/T355871 (10cmooney) p:05Triage→03Medium
[11:54:40] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B6 from asw-b6-codfw to lsw1-b6-codfw - https://phabricator.wikimedia.org/T355871 (10cmooney)
[11:54:46] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:55:29] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B7 from asw-b7-codfw to lsw1-b7-codfw - https://phabricator.wikimedia.org/T355872 (10cmooney) p:05Triage→03Medium
[11:55:40] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:55:46] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B7 from asw-b7-codfw to lsw1-b7-codfw - https://phabricator.wikimedia.org/T355872 (10cmooney)
[11:56:28] <wikibugs>	 (03PS1) 10Hnowlan: installserver: fix disk profiles for new k8s workers [puppet] - 10https://gerrit.wikimedia.org/r/992900 (https://phabricator.wikimedia.org/T354791)
[11:56:31] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873 (10cmooney) p:05Triage→03Medium
[11:56:42] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[11:56:48] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B8 from asw-b8-codfw to lsw1-b8-codfw - https://phabricator.wikimedia.org/T355873 (10cmooney)
[11:57:35] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[12:01:00] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874 (10cmooney) p:05Triage→03Medium
[12:01:09] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874 (10cmooney)
[12:01:15] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[12:06:36] <moritzm>	 !log installing openssh security updates
[12:06:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:56] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/992900 (https://phabricator.wikimedia.org/T354791) (owner: 10Hnowlan)
[12:10:10] <wikibugs>	 (03CR) 10Clément Goubert: [C: 03+1] installserver: fix disk profiles for new k8s workers [puppet] - 10https://gerrit.wikimedia.org/r/992900 (https://phabricator.wikimedia.org/T354791) (owner: 10Hnowlan)
[12:12:32] <logmsgbot>	 !log jgiannelos@deploy2002 Finished deploy [restbase/deploy@708f0f3]: (no justification provided) (duration: 20m 28s)
[12:13:07] <wikibugs>	 (03CR) 10Muehlenhoff: Enable debmonitor service on installation (031 comment) [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898 (owner: 10Slyngshede)
[12:19:52] <wikibugs>	 (03PS31) 10Fabfur: hiera: add acls for heavy ratelimiting abusing ip from list [puppet] - 10https://gerrit.wikimedia.org/r/989968 (https://phabricator.wikimedia.org/T353910)
[12:21:48] <wikibugs>	 (03PS32) 10Fabfur: hiera: add acls for heavy ratelimiting abusing ip from list [puppet] - 10https://gerrit.wikimedia.org/r/989968 (https://phabricator.wikimedia.org/T353910)
[12:23:03] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[12:24:21] <wikibugs>	 (03CR) 10Fabfur: [V: 03+1] "PCC SUCCESS (DIFF 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/989968 (https://phabricator.wikimedia.org/T353910) (owner: 10Fabfur)
[12:25:14] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:25:55] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/992662
[12:26:23] <wikibugs>	 (03PS5) 10AOkoth: vrts: enable connection pooling [puppet] - 10https://gerrit.wikimedia.org/r/988679
[12:26:24] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (10cmooney)
[12:26:34] <wikibugs>	 10SRE, 10ops-codfw, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack A8 from asw-a8-codfw to lsw1-a8-codfw - https://phabricator.wikimedia.org/T355874 (10cmooney)
[12:26:53] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (LIST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[12:28:26] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:29:48] * topranks looking 
[12:30:04] <wikibugs>	 (03PS2) 10Slyngshede: Enable debmonitor service on installation [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898
[12:30:13] <wikibugs>	 (03CR) 10Slyngshede: Enable debmonitor service on installation (031 comment) [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898 (owner: 10Slyngshede)
[12:31:14] <claime>	 topranks: check with hnowlan if it's not the hosts he's working on
[12:31:15] <_joe_>	 jouncebot: nowandnext
[12:31:16] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 28 minute(s)
[12:31:16] <jouncebot>	 In 0 hour(s) and 28 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1300)
[12:31:53] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (LIST pods) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[12:32:05] <topranks>	 claime: ok will do 
[12:32:36] <topranks>	 hnowlan: alert for BGP on cr in codfw, for hosts mw2395, mw2267 and mw2357 - that related to anything you're doing?
[12:32:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Ship it" [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898 (owner: 10Slyngshede)
[12:33:02] <hnowlan>	 topranks: yeah, I just drained those hosts 
[12:33:18] <topranks>	 hnowlan: all good, I acked the alert there 
[12:33:22] <hnowlan>	 thanks
[12:34:18] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+2] Enable debmonitor service on installation [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898 (owner: 10Slyngshede)
[12:34:40] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (3) rsyslog on mw2267:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[12:35:04] <wikibugs>	 (03CR) 10Fabfur: hiera: add acls for heavy ratelimiting abusing ip from list (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/989968 (https://phabricator.wikimedia.org/T353910) (owner: 10Fabfur)
[12:35:08] <hnowlan>	 me also, acked 
[12:35:46] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] installserver: fix disk profiles for new k8s workers [puppet] - 10https://gerrit.wikimedia.org/r/992900 (https://phabricator.wikimedia.org/T354791) (owner: 10Hnowlan)
[12:37:25] <wikibugs>	 (03Merged) 10jenkins-bot: Enable debmonitor service on installation [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992898 (owner: 10Slyngshede)
[12:38:18] <wikibugs>	 (03Abandoned) 10Muehlenhoff: Bump standards version [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/981293 (owner: 10Muehlenhoff)
[12:39:40] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] varnish: enrich X-Analytics for browser prefetch / prerender / preview [puppet] - 10https://gerrit.wikimedia.org/r/992782 (https://phabricator.wikimedia.org/T346463) (owner: 10Btullis)
[12:41:15] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2357.codfw.wmnet with OS bullseye
[12:41:30] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2395.codfw.wmnet with OS bullseye
[12:41:47] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.reimage for host mw2267.codfw.wmnet with OS bullseye
[12:42:25] <wikibugs>	 (03CR) 10Muehlenhoff: "Ack, I'll do that now." [puppet] - 10https://gerrit.wikimedia.org/r/989090 (https://phabricator.wikimedia.org/T329529) (owner: 10Muehlenhoff)
[12:43:16] <hashar>	 I will promote the wikis at 13:00 UTC (17 minutes from now)
[12:47:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] eventschemas: Select the custom nginx provider with no additional modules [puppet] - 10https://gerrit.wikimedia.org/r/989090 (https://phabricator.wikimedia.org/T329529) (owner: 10Muehlenhoff)
[12:53:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T354336)', diff saved to https://phabricator.wikimedia.org/P55669 and previous config saved to /var/cache/conftool/dbconfig/20240125-125353-marostegui.json
[12:54:00] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[12:57:41] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: host reimage
[12:58:29] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: host reimage
[12:58:57] <logmsgbot>	 !log hnowlan@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw2267.codfw.wmnet with reason: host reimage
[13:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1300)
[13:01:57] <wikibugs>	 (03CR) 10Majavah: "A rather large PCC run can be seen here: https://puppet-compiler.wmflabs.org/output/992888/1210/" [puppet] - 10https://gerrit.wikimedia.org/r/992888 (owner: 10Majavah)
[13:02:13] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2395.codfw.wmnet with reason: host reimage
[13:02:21] <topranks>	 !log draining VMs from ganeti2021 ahead of codfw rack b5 maintenance T355549
[13:02:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:27] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[13:02:53] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
[13:04:12] <wikibugs>	 (03PS1) 10TrainBranchBot: group2 wikis to 1.42.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992927 (https://phabricator.wikimedia.org/T354433)
[13:04:14] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] group2 wikis to 1.42.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992927 (https://phabricator.wikimedia.org/T354433) (owner: 10TrainBranchBot)
[13:04:58] <wikibugs>	 (03Merged) 10jenkins-bot: group2 wikis to 1.42.0-wmf.15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992927 (https://phabricator.wikimedia.org/T354433) (owner: 10TrainBranchBot)
[13:05:00] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2267.codfw.wmnet with reason: host reimage
[13:08:18] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: host reimage
[13:09:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P55670 and previous config saved to /var/cache/conftool/dbconfig/20240125-130900-marostegui.json
[13:12:46] <logmsgbot>	 !log hashar@deploy2002 rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.15  refs T354433
[13:13:11] <stashbot>	 T354433: 1.42.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T354433
[13:14:38] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: s6 #page on db2129 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Error Unknown column user_is_temp in field list on query. Default database: jawiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:15:03] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] ganeti: Stop using transition package [puppet] - 10https://gerrit.wikimedia.org/r/992891 (owner: 10Muehlenhoff)
[13:15:16] <jynus>	 Amir1: ^
[13:15:24] <marostegui>	 mmmm
[13:15:25] <jynus>	 is it only 1 host?
[13:15:28] <hashar>	 scap failed to sync on mw2267 mw2395 and mw2357, I am assuming they are being reimaged
[13:15:31] <Amir1>	 let me check
[13:15:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2129', diff saved to https://phabricator.wikimedia.org/P55671 and previous config saved to /var/cache/conftool/dbconfig/20240125-131547-marostegui.json
[13:15:50] <Lucas_WMDE>	 both of those are listed in T354791
[13:15:51] <marostegui>	 depooled for now
[13:15:56] <jynus>	 thanks, marostegui
[13:16:00] <Lucas_WMDE>	 (reclaiming jobrunners for k8s)
[13:16:01] <jynus>	 if it is 1 host, no biggie
[13:16:05] <stashbot>	 T354791: Reclaim jobrunner hardware for k8s - https://phabricator.wikimedia.org/T354791
[13:16:18] <marostegui>	 it is the candidate master
[13:16:23] <jynus>	 uf
[13:16:31] * topranks here 
[13:16:44] <wikibugs>	 10SRE-Sprint-Week-Sustainability-March2023, 10Beta-Cluster-Infrastructure, 10DBA, 10MediaWiki-libs-Rdbms, 10Epic: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255 (10Reedy)
[13:16:50] <Lucas_WMDE>	 *all three of the hosts that hashar listed
[13:16:53] <hashar>	 I also promoted the wikis a few minutes ago, which might cause various issues
[13:16:53] * Lucas_WMDE shuts up while dbas talk
[13:17:04] <Amir1>	 marostegui: Made T355885 
[13:17:05] <stashbot>	 T355885: replication broken on db2129 - https://phabricator.wikimedia.org/T355885
[13:17:14] <marostegui>	 k
[13:17:15] <hashar>	 Lucas_WMDE: thank you for verifying! :)
[13:17:24] <Amir1>	 ah, I know what's going on
[13:17:35] <jynus>	 I pinged Amir because he was the person most likely to know about it
[13:17:35] <Amir1>	 I'll fix it
[13:17:47] <jynus>	 and it seems I wasn't wrong :-D
[13:18:07] <marostegui>	 It might be a leftover of https://phabricator.wikimedia.org/T336886
[13:18:37] <jynus>	 If mw looks sane, we can de-escalate the response and let you handle it on the ticket, without a rush?
[13:18:37] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
[13:18:46] <Amir1>	 yup
[13:18:52] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
[13:18:56] <Amir1>	 I'm sure I ran --check everywhere
[13:19:09] <Amir1>	 The schema change is running
[13:19:27] <marostegui>	 Amir1: can you check frwiki, ruwiki and labswiki as well on that host?
[13:19:32] <icinga-wm>	 RECOVERY - MariaDB Replica SQL: s6 #page on db2129 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[13:19:42] <Amir1>	 marostegui: the script does that automatically
[13:19:48] <marostegui>	 Amir1: ok
[13:19:53] <jynus>	 although discuss with hashar, as that could have been the trigger (deployment)
[13:20:03] <marostegui>	 Amir1: all clean? Can I repool?
[13:20:11] <Amir1>	 yes, the patch that starts writing to it got merged in this train
[13:20:18] <Amir1>	 marostegui: yes, thanks!
[13:20:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: After T355885', diff saved to https://phabricator.wikimedia.org/P55672 and previous config saved to /var/cache/conftool/dbconfig/20240125-132043-root.json
[13:21:16] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2395.codfw.wmnet with OS bullseye
[13:21:17] <Amir1>	 I run check again
[13:21:45] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:24:04] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2267.codfw.wmnet with OS bullseye
[13:24:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P55673 and previous config saved to /var/cache/conftool/dbconfig/20240125-132407-marostegui.json
[13:24:45] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 241, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:24:55] <wikibugs>	 (03CR) 10Volans: [C: 03+2] Upstream release v0.3.4 [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[13:25:01] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
[13:25:03] <topranks>	 !log stopping logstash service on logstash2025 to facilitate VM migration T355549
[13:25:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:09] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[13:26:18] <wikibugs>	 (03PS1) 10Slyngshede: Bump version number for Debian package release to 0.4.0. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992928
[13:26:21] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
[13:26:25] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
[13:26:58] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
[13:27:03] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
[13:27:12] <Amir1>	 hashar: for the sake of the train, I checked everything again (except s3 because checking every db in every replica gonna take at least six hours) and it was fine
[13:27:33] <hashar>	 <3
[13:27:41] <Amir1>	 Sorry for the mess :D
[13:27:50] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
[13:28:01] <topranks>	 !log draining VMs from ganeti2022 ahead of codfw rack b5 maintenance T355549
[13:28:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:17] <logmsgbot>	 !log hnowlan@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2357.codfw.wmnet with OS bullseye
[13:28:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] Bump version number for Debian package release to 0.4.0. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992928 (owner: 10Slyngshede)
[13:28:46] <wikibugs>	 (03CR) 10Ayounsi: "Let's add it to the pile of things to check after upgrading Netbox :)" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/985113 (https://phabricator.wikimedia.org/T303529) (owner: 10Ayounsi)
[13:29:27] <hashar>	 Amir1: as long as the mess is handled by someone, I am all fine with eggs being broken
[13:29:39] <hashar>	 (something like that, I don't know how to translate the french idiom I have in mind)
[13:29:41] <Amir1>	 Thanks <3
[13:29:53] <hashar>	 we should invent our own idioms
[13:30:09] <hashar>	 "don't put the carrot in the fridge when your DBA have an umbrella"
[13:30:17] <wikibugs>	 (03CR) 10Majavah: [C: 03+1] "+1. This means that any tools doing something stupid like `var_dump( $_SERVER );` will now leak their database credentials but that feels " [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/988498 (https://phabricator.wikimedia.org/T354320) (owner: 10David Caro)
[13:32:05] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
[13:32:49] <wikibugs>	 (03CR) 10David Caro: [C: 03+2] lighthttpd: don't remove environment vars [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/988498 (https://phabricator.wikimedia.org/T354320) (owner: 10David Caro)
[13:32:59] * hashar whistles at LiquidThreads
[13:33:21] <hashar>	 ¡log Undeploying LiquidThreads
[13:33:42] <Amir1>	 ooh, that'd be nice :D
[13:33:45] <Lucas_WMDE>	 if only
[13:33:52] <Amir1>	 or even Flow
[13:34:15] <hashar>	 both are in the pipeline, as I understand it
[13:34:43] <hashar>	 I was almost going to file my Annual Planning Santa wishlist asking for both to be prioritized for decommissioned
[13:34:49] <hashar>	 decommissionment
[13:34:55] <hashar>	 well something like that
[13:35:24] <hashar>	 and we have: https://phabricator.wikimedia.org/T350164 `[Spike] Investigate Undeploying LiquidThreads`
[13:35:38] <hashar>	 and https://phabricator.wikimedia.org/T332022 `[Epic] Undeploying StructuredDiscussions (Flow)`
[13:35:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: After T355885', diff saved to https://phabricator.wikimedia.org/P55674 and previous config saved to /var/cache/conftool/dbconfig/20240125-133547-root.json
[13:36:01] <stashbot>	 T355885: replication broken on db2129 - https://phabricator.wikimedia.org/T355885
[13:38:51] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[13:39:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T354336)', diff saved to https://phabricator.wikimedia.org/P55675 and previous config saved to /var/cache/conftool/dbconfig/20240125-133913-marostegui.json
[13:39:16] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
[13:39:24] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[13:39:29] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
[13:39:36] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2172 (T354336)', diff saved to https://phabricator.wikimedia.org/P55676 and previous config saved to /var/cache/conftool/dbconfig/20240125-133935-marostegui.json
[13:40:06] <wikibugs>	 (03Merged) 10jenkins-bot: Upstream release v0.3.4 [software/debmonitor-client] (debian) - 10https://gerrit.wikimedia.org/r/992788 (owner: 10Volans)
[13:40:10] <wikibugs>	 (03Merged) 10jenkins-bot: lighthttpd: don't remove environment vars [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/988498 (https://phabricator.wikimedia.org/T354320) (owner: 10David Caro)
[13:41:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T354336)', diff saved to https://phabricator.wikimedia.org/P55677 and previous config saved to /var/cache/conftool/dbconfig/20240125-134147-marostegui.json
[13:43:56] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
[13:47:48] <volans>	 !log uploaded debmonitor-client_0.3.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
[13:47:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:54] <volans>	 moritzm: ^^^
[13:48:56] <moritzm>	 thanks, will take care of the rollout in a bit
[13:50:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: After T355885', diff saved to https://phabricator.wikimedia.org/P55678 and previous config saved to /var/cache/conftool/dbconfig/20240125-135052-root.json
[13:51:17] <stashbot>	 T355885: replication broken on db2129 - https://phabricator.wikimedia.org/T355885
[13:53:18] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
[13:56:54] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P55679 and previous config saved to /var/cache/conftool/dbconfig/20240125-135653-marostegui.json
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: Time to do the UTC afternoon backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1400).
[14:00:05] <jouncebot>	 anzx: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:03:53] <Lucas_WMDE>	 hm, I don’t see anything in https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1400
[14:03:57] <icinga-wm>	 PROBLEM - Check systemd state on ml-serve2005 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:04:04] <Lucas_WMDE>	 ah, https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=prev&oldid=2142737
[14:05:40] <logmsgbot>	 !log btullis@cumin1002 START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
[14:05:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: After T355885', diff saved to https://phabricator.wikimedia.org/P55680 and previous config saved to /var/cache/conftool/dbconfig/20240125-140557-root.json
[14:06:04] <stashbot>	 T355885: replication broken on db2129 - https://phabricator.wikimedia.org/T355885
[14:08:50] <wikibugs>	 (03PS1) 10Jdlrobson: Begin capturing errors for Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992931
[14:12:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P55681 and previous config saved to /var/cache/conftool/dbconfig/20240125-141200-marostegui.json
[14:15:30] <moritzm>	 !log failover ganeti master for codfw to ganeti2020 T355549
[14:15:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:35] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[14:15:40] <wikibugs>	 (03PS3) 10Zabe: foreachwikiindblist: Return early when no arg is passed [puppet] - 10https://gerrit.wikimedia.org/r/992263
[14:17:05] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ml-serve2005 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[14:18:04] <logmsgbot>	 !log btullis@cumin1002 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
[14:19:31] <icinga-wm>	 PROBLEM - ganeti-wconfd running on ganeti2022 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (gnt-masterd), command name ganeti-wconfd https://wikitech.wikimedia.org/wiki/Ganeti
[14:21:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: After T355885', diff saved to https://phabricator.wikimedia.org/P55682 and previous config saved to /var/cache/conftool/dbconfig/20240125-142102-root.json
[14:21:03] <claime>	 !log Draining kubernetes2031 - T355549
[14:21:09] <stashbot>	 T355885: replication broken on db2129 - https://phabricator.wikimedia.org/T355885
[14:21:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:15] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[14:22:13] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:22:33] <icinga-wm>	 RECOVERY - BGP status on lsw1-b2-codfw.mgmt is OK: BGP OK - up: 2, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:23:51] <claime>	 !log Draining kubernetes2032 - T355549
[14:23:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:13] <icinga-wm>	 PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: httpbb_kubernetes_mw-web_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:25:29] <claime>	 ^probably me, will relaunch once done
[14:25:38] <claime>	 !log Draining kubernetes2033 - T355549
[14:25:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:55] <claime>	 !log Draining kubernetes2023 - T355549
[14:25:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:02] <claime>	 With the right node, better.
[14:26:21] <moritzm>	 !log installing debmonitor-client 0.3.4 fleet-wide
[14:26:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T354336)', diff saved to https://phabricator.wikimedia.org/P55683 and previous config saved to /var/cache/conftool/dbconfig/20240125-142706-marostegui.json
[14:27:09] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
[14:27:12] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[14:27:23] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
[14:27:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2179 (T354336)', diff saved to https://phabricator.wikimedia.org/P55684 and previous config saved to /var/cache/conftool/dbconfig/20240125-142729-marostegui.json
[14:28:53] <icinga-wm>	 RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:29:55] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - linkrecommendation-external_4006: Servers kubernetes2058.codfw.wmnet, kubernetes2017.codfw.wmnet, kubernetes2011.codfw.wmnet, kubernetes2026.codfw.wmnet, kubernetes2015.codfw.wmnet, kubernetes2052.codfw.wmnet, kubernetes2028.codfw.wmnet, kubernetes2059.codfw.wmnet, kubernetes2013.codfw.wmnet, kubernetes2049.codfw.wmnet, kubernetes2040.codfw.wmnet, ku
[14:29:55] <icinga-wm>	 2041.codfw.wmnet, kubernetes2037.codfw.wmnet, kubernetes2006.codfw.wmnet, kubernetes2035.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:30:35] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:30:55] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:34:38] <claime>	 !log Depooling parse2006 (setting inactive) - T355549
[14:34:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:47] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[14:34:56] <logmsgbot>	 !log cgoubert@cumin2002 conftool action : set/pooled=inactive; selector: name=parse2006.codfw.wmnet
[14:35:13] <claime>	 !log Depooling parse2007 (setting inactive) - T355549
[14:35:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:20] <logmsgbot>	 !log cgoubert@cumin2002 conftool action : set/pooled=inactive; selector: name=parse2007.codfw.wmnet
[14:37:04] <wikibugs>	 10ops-codfw, 10DC-Ops, 10Data-Platform-SRE (2024.01.22 - 2024.02.11): Unable to reimage elastic2088 and elastic2094 to bullseye - https://phabricator.wikimedia.org/T355830 (10bking)
[14:39:21] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:50:14] <wikibugs>	 10SRE, 10ops-eqiad, 10Cloud-VPS, 10DC-Ops, 10cloud-services-team (FY2023/2024-Q1-Q2): cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (10Jclark-ctr) Submitted all new tsr reports along with smartctl data
[14:54:33] <wikibugs>	 (03PS1) 10Ssingh: wikimedia.org: add DKIM selectors for store.wm.org [dns] - 10https://gerrit.wikimedia.org/r/992936 (https://phabricator.wikimedia.org/T355835)
[14:59:22] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:05:34] <icinga-wm>	 RECOVERY - Check systemd state on ml-serve2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:07:22] <wikibugs>	 10SRE, 10Data Products (Data Products Sprint 08): Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (10xcollazo)
[15:10:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: wdqs::internal
[15:10:14] <icinga-wm>	 RECOVERY - Disk space on stat1005 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=stat1005&var-datasource=eqiad+prometheus/ops
[15:12:33] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch wdqs::internal to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/992940 (https://phabricator.wikimedia.org/T349619)
[15:12:49] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch wdqs::internal to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/992940 (https://phabricator.wikimedia.org/T349619)
[15:14:31] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] tegola: temporarily disable maps2006 db [deployment-charts] - 10https://gerrit.wikimedia.org/r/992887 (https://phabricator.wikimedia.org/T355549) (owner: 10Hnowlan)
[15:15:07] <wikibugs>	 (03PS1) 10Zabe: Start reading from af_actor/afh_actor in group1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992942 (https://phabricator.wikimedia.org/T355616)
[15:15:09] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch wdqs::internal to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/992940 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[15:15:26] <wikibugs>	 (03Merged) 10jenkins-bot: tegola: temporarily disable maps2006 db [deployment-charts] - 10https://gerrit.wikimedia.org/r/992887 (https://phabricator.wikimedia.org/T355549) (owner: 10Hnowlan)
[15:17:34] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ml-serve2005 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[15:18:57] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
[15:19:04] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+2] Bump version number for Debian package release to 0.4.0. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992928 (owner: 10Slyngshede)
[15:19:17] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
[15:20:33] <logmsgbot>	 !log hnowlan@puppetmaster1001 conftool action : set/pooled=no; selector: name=maps2006.cofw.wmnet
[15:20:43] <wikibugs>	 10SRE, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Hardware): Cloudvirt1063.eqiad.wmnet overheating - https://phabricator.wikimedia.org/T353408 (10Jclark-ctr) updated system settings server is back up now
[15:20:56] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wdqs::internal
[15:21:48] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[15:22:03] <wikibugs>	 (03PS5) 10Ssingh: P:dns::auth: add support for depooling authdns via confd [puppet] - 10https://gerrit.wikimedia.org/r/980427 (https://phabricator.wikimedia.org/T347054)
[15:22:05] <wikibugs>	 (03Merged) 10jenkins-bot: Bump version number for Debian package release to 0.4.0. [software/debmonitor] (debian) - 10https://gerrit.wikimedia.org/r/992928 (owner: 10Slyngshede)
[15:23:36] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (CORE_DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/980427 (https://phabricator.wikimedia.org/T347054) (owner: 10Ssingh)
[15:25:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-role for role: wcqs::public
[15:25:52] <wikibugs>	 (03CR) 10Ssingh: P:dns::auth: add support for depooling authdns via confd (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/980427 (https://phabricator.wikimedia.org/T347054) (owner: 10Ssingh)
[15:27:07] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch wcqs::public to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/992966 (https://phabricator.wikimedia.org/T349619)
[15:28:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2179 (T354336)', diff saved to https://phabricator.wikimedia.org/P55687 and previous config saved to /var/cache/conftool/dbconfig/20240125-152801-marostegui.json
[15:28:17] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[15:28:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch wcqs::public to Puppet 7 [puppet] - 10https://gerrit.wikimedia.org/r/992966 (https://phabricator.wikimedia.org/T349619) (owner: 10Muehlenhoff)
[15:29:13] <wikibugs>	 (03CR) 10Slyngshede: Install debmonitor-server on bookworm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/992881 (https://phabricator.wikimedia.org/T241049) (owner: 10Muehlenhoff)
[15:29:48] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+1] "The Debian package has been updated, so the existing patch is good to go" [puppet] - 10https://gerrit.wikimedia.org/r/992881 (https://phabricator.wikimedia.org/T241049) (owner: 10Muehlenhoff)
[15:33:33] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wcqs::public
[15:38:38] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove obsolete Hiera entries [puppet] - 10https://gerrit.wikimedia.org/r/992967 (https://phabricator.wikimedia.org/T354959)
[15:39:03] <wikibugs>	 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (10MoritzMuehlenhoff)
[15:39:23] <wikibugs>	 10SRE, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Hardware): Cloudvirt1063.eqiad.wmnet overheating - https://phabricator.wikimedia.org/T353408 (10Andrew) thanks! Let's let this sit w/out workload for a week or so and see if stays up, then we can try giving it some work to do.
[15:43:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P55688 and previous config saved to /var/cache/conftool/dbconfig/20240125-154307-marostegui.json
[15:43:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Install debmonitor-server on bookworm [puppet] - 10https://gerrit.wikimedia.org/r/992881 (https://phabricator.wikimedia.org/T241049) (owner: 10Muehlenhoff)
[15:46:04] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on asw-b-codfw,lsw1-b5-codfw.mgmt with reason: prepping for server uplink migration
[15:46:31] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw-b-codfw,lsw1-b5-codfw.mgmt with reason: prepping for server uplink migration
[15:46:40] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=34ae871a-7149-43dd-8180-02ddd5b8c983) set by...
[15:46:57] <topranks>	 !log configuring lsw1-b5-codfw switch ports for servers to be moved T355549
[15:47:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:02] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[15:47:42] <wikibugs>	 (03PS1) 10Stevemunene: Remove dummy-keytabs for decommissioned druid hosts [labs/private] - 10https://gerrit.wikimedia.org/r/992968 (https://phabricator.wikimedia.org/T336043)
[15:49:50] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] deployment_server: add dummy oauth2-proxy secrets for jaeger [labs/private] - 10https://gerrit.wikimedia.org/r/992699 (https://phabricator.wikimedia.org/T320555) (owner: 10Filippo Giunchedi)
[15:50:03] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "Done" [deployment-charts] - 10https://gerrit.wikimedia.org/r/984143 (https://phabricator.wikimedia.org/T320555) (owner: 10Filippo Giunchedi)
[15:50:57] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977 (10cmooney) Just an update here, the restriction still exists however I think I know how I went wrong.  In order for the irb interface to be "up" the associated vlan ne...
[15:51:58] <topranks>	 !log disabling puppet fleet-wide to allow for maintenance in codfw rack b5 which hosts puppetmaster2003 T355549
[15:52:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:12] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[15:54:50] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'preparing to clone db2169 on db2196 as per TT343674', diff saved to https://phabricator.wikimedia.org/P55689 and previous config saved to /var/cache/conftool/dbconfig/20240125-155450-arnaudb.json
[15:56:52] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [labs/private] - 10https://gerrit.wikimedia.org/r/992968 (https://phabricator.wikimedia.org/T336043) (owner: 10Stevemunene)
[15:57:12] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 1:30:00 on cr[1-2]-codfw with reason: prepping for server uplink migration
[15:57:28] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr[1-2]-codfw with reason: prepping for server uplink migration
[15:57:37] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e2f0518c-1df7-4528-89a1-5f2b248a7520) set by...
[15:58:02] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 1:30:00 on 32 hosts with reason: Migrating servers in codfw rack B5 to lsw1-b5-codfw T355549
[15:58:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P55690 and previous config saved to /var/cache/conftool/dbconfig/20240125-155813-marostegui.json
[15:58:17] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[15:58:31] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 32 hosts with reason: Migrating servers in codfw rack B5 to lsw1-b5-codfw T355549
[16:02:22] <icinga-wm>	 PROBLEM - Disk space on stat1005 is CRITICAL: DISK CRITICAL - free space: / 1959 MB (2% inode=83%): /tmp 1959 MB (2% inode=83%): /var/tmp 1959 MB (2% inode=83%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=stat1005&var-datasource=eqiad+prometheus/ops
[16:03:23] <topranks>	 !log Network maintenance codfw rack b5 underway T355549
[16:03:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:48] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[16:10:16] <icinga-wm>	 PROBLEM - Host ml-staging-ctrl2001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:13:21] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2179 (T354336)', diff saved to https://phabricator.wikimedia.org/P55691 and previous config saved to /var/cache/conftool/dbconfig/20240125-161320-marostegui.json
[16:13:49] <stashbot>	 T354336: Add columns cul_result_id and cul_result_plaintext_id to cu_log - https://phabricator.wikimedia.org/T354336
[16:14:09] <jinxer-wm>	 (KubernetesCalicoDown) firing: ml-staging-ctrl2001.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlstaging&var-instance=ml-staging-ctrl2001.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[16:14:21] <jinxer-wm>	 (ProbeDown) firing: (2) Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:14:22] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job ganeti in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:15:44] <wikibugs>	 (03PS1) 10Hnowlan: kubernetes: make 5 jobrunners kubernetes workers [puppet] - 10https://gerrit.wikimedia.org/r/992973 (https://phabricator.wikimedia.org/T354791)
[16:16:48] <wikibugs>	 (03PS1) 10Ebernhardson: cirrus: Disable cloudelastic writes to testwiki and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335)
[16:17:28] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] cirrus: Disable cloudelastic writes to testwiki and mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/992974 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[16:19:09] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] cirrus updater: Align consumer-devnull with deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/992806 (owner: 10Ebernhardson)
[16:19:54] <wikibugs>	 (03Abandoned) 10Ebernhardson: cirrus updater: Remove consumer start time override [deployment-charts] - 10https://gerrit.wikimedia.org/r/975321 (owner: 10Ebernhardson)
[16:20:23] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus updater: Align consumer-devnull with deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/992806 (owner: 10Ebernhardson)
[16:24:07] <wikibugs>	 (03PS1) 10Jgiannelos: mobileapps: Use core /page/html output in all envs [deployment-charts] - 10https://gerrit.wikimedia.org/r/992975
[16:26:53] <wikibugs>	 (03PS2) 10Jgiannelos: mobileapps: Use core /page/html output in all envs [deployment-charts] - 10https://gerrit.wikimedia.org/r/992975 (https://phabricator.wikimedia.org/T339865)
[16:27:15] <wikibugs>	 10SRE, 10Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (10xcollazo)
[16:28:04] <wikibugs>	 (03PS1) 10Hnowlan: Revert "tegola: temporarily disable maps2006 db" [deployment-charts] - 10https://gerrit.wikimedia.org/r/992986
[16:28:11] <wikibugs>	 (03PS2) 10Ebernhardson: cirrus updater: Expand test deployment to prod+cloudelastic [deployment-charts] - 10https://gerrit.wikimedia.org/r/979147 (https://phabricator.wikimedia.org/T352335)
[16:29:36] <claime>	 !log uncordoning kubernetes2031 - T355549
[16:29:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:54] <stashbot>	 T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549
[16:30:11] <wikibugs>	 (03CR) 10BCornwall: [C: 03+1] wikimedia.org: add DKIM selectors for store.wm.org [dns] - 10https://gerrit.wikimedia.org/r/992936 (https://phabricator.wikimedia.org/T355835) (owner: 10Ssingh)
[16:31:50] <icinga-wm>	 RECOVERY - Host ml-staging-ctrl2001 is UP: PING OK - Packet loss = 0%, RTA = 56.51 ms
[16:32:31] <claime>	 !log uncordoning kubernetes2032 - T355549
[16:32:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:41] <claime>	 !log uncordoning kubernetes2023 - T355549
[16:33:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:34] <claime>	 !log repooling parse2006 - T355549
[16:33:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:44] <logmsgbot>	 !log cgoubert@cumin2002 conftool action : set/pooled=yes; selector: name=parse2006.codfw.wmnet
[16:34:09] <jinxer-wm>	 (KubernetesCalicoDown) resolved: ml-staging-ctrl2001.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s-mlstaging&var-instance=ml-staging-ctrl2001.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[16:34:15] <claime>	 !log repooling parse2007 - T355549
[16:34:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:20] <jinxer-wm>	 (ProbeDown) resolved: (2) Service ml-staging-ctrl2001:6443 has failed probes (http_ml_staging_codfw_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#ml-staging-ctrl2001:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:34:21] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job ganeti in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:23] <logmsgbot>	 !log cgoubert@cumin2002 conftool action : set/pooled=yes; selector: name=parse2007.codfw.wmnet
[16:35:55] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (10cmooney) Migration done!  Serious props to @papaul and @Jhancock.wm for the smooth and super-fast execution!...
[16:36:12] <wikibugs>	 (03CR) 10Majavah: [C: 03+2] wikimediacloud.org: Move RabbitMQ traffic to cloudrabbit1003 [dns] - 10https://gerrit.wikimedia.org/r/992884 (https://phabricator.wikimedia.org/T345610) (owner: 10Majavah)
[16:38:57] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Switch canaries to 0.1% OpenTelemetry sampling [puppet] - 10https://gerrit.wikimedia.org/r/984814 (https://phabricator.wikimedia.org/T351566)
[16:40:16] <wikibugs>	 (03PS6) 10Alexandros Kosiaris: Switch canaries to 0.1% OpenTelemetry sampling [puppet] - 10https://gerrit.wikimedia.org/r/984814 (https://phabricator.wikimedia.org/T351566)
[16:41:56] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.remove-downtime for cr[1-2]-codfw
[16:41:57] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr[1-2]-codfw
[16:42:49] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.remove-downtime for 32 hosts
[16:43:02] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
[16:43:12] <icinga-wm>	 PROBLEM - cassandra-b SSL 10.192.16.83:7000 on restbase2013 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[16:43:20] <icinga-wm>	 PROBLEM - cassandra-a service on restbase2013 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is inactive https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:43:44] <icinga-wm>	 PROBLEM - cassandra-c service on restbase2013 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is inactive https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:43:48] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.16.83:9042 on restbase2013 is CRITICAL: connect to address 10.192.16.83 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[16:43:50] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2013 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is inactive https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[16:44:22] <icinga-wm>	 PROBLEM - cassandra-c SSL 10.192.16.84:7000 on restbase2013 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
[16:44:36] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.192.16.84:9042 on restbase2013 is CRITICAL: connect to address 10.192.16.84 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[16:45:09] <wikibugs>	 (03Abandoned) 10Andrew Bogott: base: puppet_alert: don't advertise the disable file [puppet] - 10https://gerrit.wikimedia.org/r/868221 (owner: 10Majavah)
[16:48:01] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2013.codfw.wmnet with reason: Decommissioning — T352469
[16:48:17] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2013.codfw.wmnet with reason: Decommissioning — T352469
[16:48:19] <stashbot>	 T352469: Decommission restbase20[13-20]) - https://phabricator.wikimedia.org/T352469
[16:48:34] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.decommission for hosts cloudrabbit[1001-1002].wikimedia.org
[16:48:45] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] cirrus updater: Expand test deployment to prod+cloudelastic [deployment-charts] - 10https://gerrit.wikimedia.org/r/979147 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[16:49:08] <logmsgbot>	 !log eevans@cumin1002 START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2014.codfw.wmnet with reason: Decommissioning — T352469
[16:49:12] <logmsgbot>	 !log eevans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2014.codfw.wmnet with reason: Decommissioning — T352469
[16:51:23] <wikibugs>	 (03CR) 10Ssingh: [C: 03+2] wikimedia.org: add DKIM selectors for store.wm.org [dns] - 10https://gerrit.wikimedia.org/r/992936 (https://phabricator.wikimedia.org/T355835) (owner: 10Ssingh)
[16:51:37] <wikibugs>	 (03PS2) 10Ssingh: wikimedia.org: add DKIM selectors for store.wm.org [dns] - 10https://gerrit.wikimedia.org/r/992936 (https://phabricator.wikimedia.org/T355835)
[16:52:33] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+2] Revert "tegola: temporarily disable maps2006 db" [deployment-charts] - 10https://gerrit.wikimedia.org/r/992986 (owner: 10Hnowlan)
[16:52:57] <sukhe>	 !log running authdns-update for CR 992936: T355835
[16:53:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:13] <stashbot>	 T355835: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835
[16:53:25] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "tegola: temporarily disable maps2006 db" [deployment-charts] - 10https://gerrit.wikimedia.org/r/992986 (owner: 10Hnowlan)
[16:56:01] <wikibugs>	 10SRE, 10serviceops, 10SecTeam-Processed, 10Security, 10Vuln-Misconfiguration: Helm Chart misconfigurations - https://phabricator.wikimedia.org/T355167 (10sbassett) 05In progress→03Resolved p:05Triage→03Low
[16:56:30] <wikibugs>	 10SRE, 10serviceops, 10SecTeam-Processed, 10Security, 10Vuln-Misconfiguration: Helm Chart misconfigurations - https://phabricator.wikimedia.org/T355167 (10sbassett) 05Resolved→03In progress Whoops, I'll leave it in progress until the patches are actually merged/deployed.
[16:56:51] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.dns.netbox
[16:57:37] <wikibugs>	 10SRE, 10ops-codfw, 10Data-Persistence, 10Infrastructure-Foundations, 10netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (10klausman) Nice work. On our machine (ml-serve2002), it was but four seconds:  `[Thu Jan 25 16:09:14 2024] tg3...
[16:58:40] <wikibugs>	 (03PS3) 10Ebernhardson: cirrus updater: Expand test deployment to prod+cloudelastic [deployment-charts] - 10https://gerrit.wikimedia.org/r/979147 (https://phabricator.wikimedia.org/T352335)
[16:59:19] <wikibugs>	 (03CR) 10Stevemunene: [V: 03+2 C: 03+2] Remove dummy-keytabs for decommissioned druid hosts [labs/private] - 10https://gerrit.wikimedia.org/r/992968 (https://phabricator.wikimedia.org/T336043) (owner: 10Stevemunene)
[16:59:28] <wikibugs>	 10SRE-OnFire, 10Znuny, 10collaboration-services: ticket.wikimedia.org should page when down - https://phabricator.wikimedia.org/T354479 (10LSobanski) a:05Jelto→03LSobanski Claiming this as it's a process / SLA question for the time being.
[17:00:05] <jouncebot>	 jhathaway and rzl: #bothumor I � Unicode. All rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:00:22] <wikibugs>	 (03CR) 10Clément Goubert: [C: 03+1] kubernetes: make 5 jobrunners kubernetes workers [puppet] - 10https://gerrit.wikimedia.org/r/992973 (https://phabricator.wikimedia.org/T354791) (owner: 10Hnowlan)
[17:00:53] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudrabbit[1001-1002].wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1002"
[17:01:06] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
[17:01:23] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
[17:01:34] <wikibugs>	 (03PS1) 10Btullis: Update the datahub containers to pick up new JRE [deployment-charts] - 10https://gerrit.wikimedia.org/r/992980 (https://phabricator.wikimedia.org/T354273)
[17:03:22] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] cirrus updater: Expand test deployment to prod+cloudelastic [deployment-charts] - 10https://gerrit.wikimedia.org/r/979147 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[17:04:15] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus updater: Expand test deployment to prod+cloudelastic [deployment-charts] - 10https://gerrit.wikimedia.org/r/979147 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[17:04:19] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudrabbit[1001-1002].wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1002"
[17:04:19] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:04:20] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudrabbit[1001-1002].wikimedia.org
[17:04:33] <wikibugs>	 10SRE, 10ops-eqiad, 10Cloud-VPS, 10DC-Ops, and 2 others: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by taavi@cumin1002 for hosts: `cloudrabbit[1001-1002].wikimedia.org` - cloudrabbit100...
[17:05:24] <wikibugs>	 (03CR) 10Btullis: [C: 03+2] Update the datahub containers to pick up new JRE [deployment-charts] - 10https://gerrit.wikimedia.org/r/992980 (https://phabricator.wikimedia.org/T354273) (owner: 10Btullis)
[17:05:40] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic, 10Patch-For-Review: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835 (10ssingh) @bcampbell: The changes have been merged, please try the authenticate domain part now....
[17:05:43] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.dns.netbox
[17:06:20] <wikibugs>	 (03Merged) 10jenkins-bot: Update the datahub containers to pick up new JRE [deployment-charts] - 10https://gerrit.wikimedia.org/r/992980 (https://phabricator.wikimedia.org/T354273) (owner: 10Btullis)
[17:06:34] <wikibugs>	 10SRE, 10ops-eqiad, 10Cloud-VPS, 10DC-Ops, and 2 others: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (10taavi)
[17:07:07] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:09:22] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[17:09:39] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[17:13:03] <wikibugs>	 10SRE, 10Cumin, 10Infrastructure-Foundations: Feature request: When cumin is running with -b (and -s), it should display the current host being affected - https://phabricator.wikimedia.org/T355811 (10Volans) p:05Triage→03Medium
[17:14:32] <Amir1>	 jouncebot: nowandnext
[17:14:32] <jouncebot>	 For the next 0 hour(s) and 45 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1700)
[17:14:32] <jouncebot>	 In 0 hour(s) and 45 minute(s): Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1800)
[17:14:32] <jouncebot>	 In 0 hour(s) and 45 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1800)
[17:17:05] <wikibugs>	 (03PS1) 10Ebernhardson: flink-operator: Add cirrus-streaming-updater to prod watched namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/992983 (https://phabricator.wikimedia.org/T352335)
[17:17:29] <logmsgbot>	 !log btullis@deploy2002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[17:19:08] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] flink-operator: Add cirrus-streaming-updater to prod watched namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/992983 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[17:20:10] <wikibugs>	 (03CR) 10Bking: [C: 03+1] flink-operator: Add cirrus-streaming-updater to prod watched namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/992983 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[17:21:39] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
[17:22:11] <sukhe>	 ^ BGP alerts expected in drmrs
[17:22:34] <logmsgbot>	 !log btullis@deploy2002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[17:22:47] <logmsgbot>	 !log btullis@deploy2002 helmfile [codfw] START helmfile.d/services/datahub: apply on main
[17:25:51] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic, 10Patch-For-Review: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835 (10bcampbell) @ssingh Thank you, I just initiated the process, which Shopify says may take 24 hou...
[17:26:09] <icinga-wm>	 PROBLEM - BFD status on asw1-b12-drmrs.mgmt is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[17:26:11] <icinga-wm>	 PROBLEM - BGP status on asw1-b12-drmrs.mgmt is CRITICAL: BGP CRITICAL - AS64605/IPv6: Connect - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:27:47] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that wikimediafoundation.myshopify.com complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355833 (10jhathaway) @bcampbell I assume the intent is to allow shopify to dkim sign their mail with keys we adv...
[17:30:28] <Amir1>	 !log deploying new captchas (T141490)
[17:30:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:30:48] <stashbot>	 T141490: Deploy improved FancyCaptcha - https://phabricator.wikimedia.org/T141490
[17:33:09] <logmsgbot>	 !log btullis@deploy2002 helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
[17:34:07] <logmsgbot>	 !log btullis@deploy2002 helmfile [eqiad] START helmfile.d/services/datahub: apply on main
[17:38:28] <wikibugs>	 10SRE, 10DNS, 10Foundational Technology Requests, 10Traffic: Ensure that wikimediafoundation.myshopify.com complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355833 (10ssingh) >>! In T355833#9489071, @jhathaway wrote: > @bcampbell I assume the intent is to allow shopify...
[17:38:49] <logmsgbot>	 !log btullis@deploy2002 helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
[17:38:51] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[17:40:17] <wikibugs>	 (03PS2) 10Ladsgroup: mediawiki: Use the new captcha [puppet] - 10https://gerrit.wikimedia.org/r/990697 (https://phabricator.wikimedia.org/T141490)
[17:40:57] <wikibugs>	 (03PS3) 10Ladsgroup: mediawiki: Use the new captcha [puppet] - 10https://gerrit.wikimedia.org/r/990697 (https://phabricator.wikimedia.org/T141490)
[17:43:59] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
[17:44:52] <wikibugs>	 (03PS4) 10Ladsgroup: mediawiki: Use the new captcha [puppet] - 10https://gerrit.wikimedia.org/r/990697 (https://phabricator.wikimedia.org/T141490)
[17:44:57] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] mediawiki: Use the new captcha [puppet] - 10https://gerrit.wikimedia.org/r/990697 (https://phabricator.wikimedia.org/T141490) (owner: 10Ladsgroup)
[17:45:14] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.remove-downtime for asw-b-codfw,lsw1-b5-codfw.mgmt
[17:45:14] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw-b-codfw,lsw1-b5-codfw.mgmt
[17:46:24] <wikibugs>	 (03PS5) 10Reedy: mediawiki: Replace deprecated blacklist parameter in captchaloop [puppet] - 10https://gerrit.wikimedia.org/r/774940 (https://phabricator.wikimedia.org/T277936)
[17:47:16] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
[17:48:04] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2159 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55692 and previous config saved to /var/cache/conftool/dbconfig/20240125-174803-root.json
[17:48:20] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2109 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55693 and previous config saved to /var/cache/conftool/dbconfig/20240125-174819-root.json
[17:48:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55694 and previous config saved to /var/cache/conftool/dbconfig/20240125-174825-root.json
[17:48:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55695 and previous config saved to /var/cache/conftool/dbconfig/20240125-174833-root.json
[17:48:41] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3315 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55696 and previous config saved to /var/cache/conftool/dbconfig/20240125-174840-root.json
[17:48:46] <wikibugs>	 (03PS1) 10Ssingh: wikimedia.org: fix store.wm.org records [dns] - 10https://gerrit.wikimedia.org/r/993008 (https://phabricator.wikimedia.org/T355835)
[17:48:47] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55697 and previous config saved to /var/cache/conftool/dbconfig/20240125-174846-root.json
[17:48:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2177 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55698 and previous config saved to /var/cache/conftool/dbconfig/20240125-174851-root.json
[17:48:58] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55699 and previous config saved to /var/cache/conftool/dbconfig/20240125-174857-root.json
[17:49:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2188 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55700 and previous config saved to /var/cache/conftool/dbconfig/20240125-174902-root.json
[17:49:17] <wikibugs>	 (03CR) 10JHathaway: "maybe add the condition both to the timer and the service? https://github.com/systemd/systemd/issues/3963" [puppet] - 10https://gerrit.wikimedia.org/r/992888 (owner: 10Majavah)
[17:49:39] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
[17:49:43] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wikimedia.org: fix store.wm.org records [dns] - 10https://gerrit.wikimedia.org/r/993008 (https://phabricator.wikimedia.org/T355835) (owner: 10Ssingh)
[17:49:53] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
[17:51:19] <wikibugs>	 (03PS2) 10Ssingh: wikimedia.org: fix store.wm.org records [dns] - 10https://gerrit.wikimedia.org/r/993008 (https://phabricator.wikimedia.org/T355835)
[17:52:18] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wikimedia.org: fix store.wm.org records [dns] - 10https://gerrit.wikimedia.org/r/993008 (https://phabricator.wikimedia.org/T355835) (owner: 10Ssingh)
[17:54:28] <wikibugs>	 (03PS2) 10Reedy: captchaloop: Generate old and new captchas [puppet] - 10https://gerrit.wikimedia.org/r/990715
[17:54:30] <wikibugs>	 (03PS1) 10Reedy: mediawiki: Refactor and improve captchaloop [puppet] - 10https://gerrit.wikimedia.org/r/993010
[17:55:33] <wikibugs>	 (03PS3) 10Ssingh: wikimedia.org: fix store.wm.org records [dns] - 10https://gerrit.wikimedia.org/r/993008 (https://phabricator.wikimedia.org/T355835)
[17:58:18] <wikibugs>	 (03PS1) 10Btullis: Update the spark-operator image name and version [deployment-charts] - 10https://gerrit.wikimedia.org/r/993012 (https://phabricator.wikimedia.org/T354273)
[17:59:21] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] mediawiki: Replace deprecated blacklist parameter in captchaloop [puppet] - 10https://gerrit.wikimedia.org/r/774940 (https://phabricator.wikimedia.org/T277936) (owner: 10Reedy)
[17:59:37] <wikibugs>	 (03CR) 10Ladsgroup: [V: 03+2 C: 03+2] "Done" [puppet] - 10https://gerrit.wikimedia.org/r/774940 (https://phabricator.wikimedia.org/T277936) (owner: 10Reedy)
[18:00:05] <jouncebot>	 bd808: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1800).
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T1800)
[18:00:51] <wikibugs>	 (03CR) 10Ssingh: [C: 03+2] wikimedia.org: fix store.wm.org records [dns] - 10https://gerrit.wikimedia.org/r/993008 (https://phabricator.wikimedia.org/T355835) (owner: 10Ssingh)
[18:01:06] <sukhe>	 !log running authdns-update for CR 993008: T355835
[18:01:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:01:33] <stashbot>	 T355835: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835
[18:03:09] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2159 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55701 and previous config saved to /var/cache/conftool/dbconfig/20240125-180308-root.json
[18:03:25] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2109 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55702 and previous config saved to /var/cache/conftool/dbconfig/20240125-180324-root.json
[18:03:30] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55703 and previous config saved to /var/cache/conftool/dbconfig/20240125-180330-root.json
[18:03:39] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55704 and previous config saved to /var/cache/conftool/dbconfig/20240125-180338-root.json
[18:03:46] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3315 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55705 and previous config saved to /var/cache/conftool/dbconfig/20240125-180345-root.json
[18:03:52] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55706 and previous config saved to /var/cache/conftool/dbconfig/20240125-180351-root.json
[18:03:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2177 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55707 and previous config saved to /var/cache/conftool/dbconfig/20240125-180356-root.json
[18:04:03] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55708 and previous config saved to /var/cache/conftool/dbconfig/20240125-180402-root.json
[18:04:08] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2188 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55709 and previous config saved to /var/cache/conftool/dbconfig/20240125-180407-root.json
[18:05:16] <wikibugs>	 SRE, ops-codfw, Data-Persistence, Infrastructure-Foundations, netops: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (Marostegui) @Jhancock.wm @papaul <3
[18:07:29] <wikibugs>	 SRE, DNS, Foundational Technology Requests, Traffic, Patch-For-Review: Ensure that store.wikimedia.org complies with Google's new email sender guidelines - https://phabricator.wikimedia.org/T355835 (ssingh) ` $ dig n1j._domainkey.wikimedia.org +short dkim1.327bdf87d37c.p413.email.myshopify.co...
[18:10:55] <icinga-wm>	 RECOVERY - BGP status on asw1-b12-drmrs.mgmt is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[18:11:09] <icinga-wm>	 RECOVERY - BFD status on asw1-b12-drmrs.mgmt is OK: UP: 5 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[18:13:04] <logmsgbot>	 !log sukhe@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bookworm
[18:18:14] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2159 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55710 and previous config saved to /var/cache/conftool/dbconfig/20240125-181814-root.json
[18:18:28] <wikibugs>	 (PS1) Bking: cloudelastic: add CNAME for migration canary [dns] - https://gerrit.wikimedia.org/r/993014 (https://phabricator.wikimedia.org/T355617)
[18:18:29] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2109 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55711 and previous config saved to /var/cache/conftool/dbconfig/20240125-181829-root.json
[18:18:35] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2107 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55712 and previous config saved to /var/cache/conftool/dbconfig/20240125-181835-root.json
[18:18:43] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55713 and previous config saved to /var/cache/conftool/dbconfig/20240125-181843-root.json
[18:18:51] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3315 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55714 and previous config saved to /var/cache/conftool/dbconfig/20240125-181850-root.json
[18:18:57] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55715 and previous config saved to /var/cache/conftool/dbconfig/20240125-181856-root.json
[18:19:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2177 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55716 and previous config saved to /var/cache/conftool/dbconfig/20240125-181901-root.json
[18:19:07] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55717 and previous config saved to /var/cache/conftool/dbconfig/20240125-181907-root.json
[18:19:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55718 and previous config saved to /var/cache/conftool/dbconfig/20240125-181912-root.json
[18:21:00] <wikibugs>	 (CR) Ebernhardson: [C: +1] cloudelastic: add CNAME for migration canary [dns] - https://gerrit.wikimedia.org/r/993014 (https://phabricator.wikimedia.org/T355617) (owner: Bking)
[18:33:19] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2159 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55719 and previous config saved to /var/cache/conftool/dbconfig/20240125-183318-root.json
[18:33:34] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2109 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55720 and previous config saved to /var/cache/conftool/dbconfig/20240125-183334-root.json
[18:33:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55721 and previous config saved to /var/cache/conftool/dbconfig/20240125-183340-root.json
[18:33:48] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55722 and previous config saved to /var/cache/conftool/dbconfig/20240125-183348-root.json
[18:33:56] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3315 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55723 and previous config saved to /var/cache/conftool/dbconfig/20240125-183355-root.json
[18:34:02] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55724 and previous config saved to /var/cache/conftool/dbconfig/20240125-183401-root.json
[18:34:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2177 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55725 and previous config saved to /var/cache/conftool/dbconfig/20240125-183406-root.json
[18:34:13] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55726 and previous config saved to /var/cache/conftool/dbconfig/20240125-183412-root.json
[18:34:18] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2188 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55727 and previous config saved to /var/cache/conftool/dbconfig/20240125-183417-root.json
[18:42:54] <wikibugs>	 (PS2) Bking: cloudelastic: add CNAME for migration canary [dns] - https://gerrit.wikimedia.org/r/993014 (https://phabricator.wikimedia.org/T355617)
[18:43:17] <wikibugs>	 SRE, SRE-Access-Requests: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (Ahoelzl)
[18:45:35] <logmsgbot>	 !log dzahn@cumin1002 START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: reboot
[18:46:00] <logmsgbot>	 !log dzahn@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: reboot
[18:47:00] <mutante>	 !log phab2002 - rebooting
[18:47:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:07] <wikibugs>	 SRE, SRE-Access-Requests: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (Ahoelzl) @RLazarus an ETA / update on the request would be very much appreciated. Cluster access is a key step for onboarding Aleksandar to my engineering team. Thank you!
[18:48:24] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2159 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55728 and previous config saved to /var/cache/conftool/dbconfig/20240125-184823-root.json
[18:48:40] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55729 and previous config saved to /var/cache/conftool/dbconfig/20240125-184839-root.json
[18:48:45] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55730 and previous config saved to /var/cache/conftool/dbconfig/20240125-184845-root.json
[18:48:53] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55731 and previous config saved to /var/cache/conftool/dbconfig/20240125-184853-root.json
[18:49:01] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2137:3315 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55732 and previous config saved to /var/cache/conftool/dbconfig/20240125-184900-root.json
[18:49:06] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55733 and previous config saved to /var/cache/conftool/dbconfig/20240125-184906-root.json
[18:49:11] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2177 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55734 and previous config saved to /var/cache/conftool/dbconfig/20240125-184911-root.json
[18:49:17] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55735 and previous config saved to /var/cache/conftool/dbconfig/20240125-184917-root.json
[18:49:23] <logmsgbot>	 !log marostegui@cumin1002 dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P55736 and previous config saved to /var/cache/conftool/dbconfig/20240125-184922-root.json
[18:49:26] <wikibugs>	 SRE, SRE-Access-Requests: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (RLazarus) This week's clinic duty SRE is @Arnoldokoth.
[18:49:31] <icinga-wm>	 RECOVERY - Check systemd state on phab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:52:09] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Ahoelzl) @Arnoldokoth would it be possible to get an update / ETA on the request? Ldap / wmf access is blocking onboarding Aleksandar to the engineering team. Thank you!
[18:59:11] <wikibugs>	 SRE, SRE-Access-Requests: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (Arnoldokoth) Thanks @RLazarus   Apologies @Ahoelzl This will be done as soon as @odimitrijevic / @Milimetric approve the request.
[19:01:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:02:02] <wikibugs>	 SRE, SRE-Access-Requests: Requesting analytics-privatedata-users access for amastilovic - https://phabricator.wikimedia.org/T355606 (Arnoldokoth) a:odimitrijevic
[19:06:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:11:39] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Arnoldokoth) Open→In progress
[19:16:24] <wikibugs>	 (PS49) AOkoth: prometheus: puppetise sql_exporter [puppet] - https://gerrit.wikimedia.org/r/945872 (https://phabricator.wikimedia.org/T310822)
[19:16:26] <wikibugs>	 (PS6) AOkoth: vrts: enable connection pooling [puppet] - https://gerrit.wikimedia.org/r/988679
[19:16:28] <wikibugs>	 (PS1) AOkoth: admin: add amastilovic to LDAP users [puppet] - https://gerrit.wikimedia.org/r/993019 (https://phabricator.wikimedia.org/T355607)
[19:19:08] <wikibugs>	 (CR) Dzahn: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/993019 (https://phabricator.wikimedia.org/T355607) (owner: AOkoth)
[19:19:23] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Arnoldokoth) @Ahoelzl This will be resolved shortly.
[19:19:34] <wikibugs>	 (CR) Bking: [C: +2] flink-operator: Add cirrus-streaming-updater to prod watched namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/992983 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[19:20:39] <wikibugs>	 (PS2) AOkoth: admin: add amastilovic to LDAP users [puppet] - https://gerrit.wikimedia.org/r/993019 (https://phabricator.wikimedia.org/T355607)
[19:22:19] <wikibugs>	 (Merged) jenkins-bot: flink-operator: Add cirrus-streaming-updater to prod watched namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/992983 (https://phabricator.wikimedia.org/T352335) (owner: Ebernhardson)
[19:22:30] <wikibugs>	 (CR) AOkoth: [C: +2] admin: add amastilovic to LDAP users [puppet] - https://gerrit.wikimedia.org/r/993019 (https://phabricator.wikimedia.org/T355607) (owner: AOkoth)
[19:24:50] <logmsgbot>	 !log bking@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[19:25:55] <logmsgbot>	 !log bking@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[19:28:13] <wikibugs>	 SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Arnoldokoth) @amastilovic This should be okay now.
[19:28:28] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[19:28:34] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[19:29:16] <logmsgbot>	 !log bking@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[19:29:33] <logmsgbot>	 !log bking@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[19:29:42] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (VRiley-WMF)
[19:29:47] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (VRiley-WMF)
[19:30:36] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to wmf for arinaigum - https://phabricator.wikimedia.org/T355591 (Dzahn) >>! In T355591#9486355, @Arinaigu wrote: > There seems to be a problem with my developer account as well.   Hi! It seems the problem is there is an account "Arinaugu" without the trailing...
[19:31:35] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (VRiley-WMF) cloudrabbit1002 is now in   E4 U17 CableID 2M-20220016 Port 3
[19:41:19] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to wmf for arinaigum - https://phabricator.wikimedia.org/T355591 (Arinaigu) Hi! I created the account Arinaigu for Meta Wikimedia and MediaWiki. Then I created a separate developer/Wikitech account arinaigum. I think I read somewhere in the documentation that t...
[19:43:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:45:34] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to wmf for arinaigum - https://phabricator.wikimedia.org/T355591 (taavi) You do have a developer account, and the fact that you can log in to https://idm.wikimedia.org and https://idp.wikimedia.org confirms that. The problem with logging in to https://wikitech....
[19:48:15] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[19:56:39] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.dns.netbox
[19:57:09] <wikibugs>	 SRE, ops-eqiad, Cloud-VPS, DC-Ops, cloud-services-team: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (VRiley-WMF) cloudrabbit1001 is now in   C8 U19 CableID 5336 Port 21
[19:58:51] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPs for cloudrabbit1002 - taavi@cumin1002"
[19:59:35] <zabe>	 jouncebot: nowandnext
[19:59:35] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 0 minute(s)
[19:59:35] <jouncebot>	 In 1 hour(s) and 0 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T2100)
[19:59:43] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPs for cloudrabbit1002 - taavi@cumin1002"
[19:59:43] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:00:34] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit1002
[20:01:21] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit1002
[20:02:09] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.dns.netbox
[20:03:41] <wikibugs>	 (CR) Zabe: [C: +2] Start reading from af_actor/afh_actor in group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/992942 (https://phabricator.wikimedia.org/T355616) (owner: Zabe)
[20:04:20] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPs for cloudrabbit1001 - taavi@cumin1002"
[20:04:25] <wikibugs>	 (Merged) jenkins-bot: Start reading from af_actor/afh_actor in group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/992942 (https://phabricator.wikimedia.org/T355616) (owner: Zabe)
[20:05:14] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPs for cloudrabbit1001 - taavi@cumin1002"
[20:05:14] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:05:20] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit1001
[20:05:21] <logmsgbot>	 !log zabe@deploy2002 Started scap: Backport for [[gerrit:992942|Start reading from af_actor/afh_actor in group1 wikis (T355616)]]
[20:05:29] <wikibugs>	 SRE, Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (Dzahn) Hi @xcollazo are you asking to add the list in addition to the current recipients or to entirely replace them (to remove the other recipients)?   ` ops-du...
[20:05:42] <stashbot>	 T355616: Start reading from af_actor/afh_actor - https://phabricator.wikimedia.org/T355616
[20:06:10] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit1001
[20:06:47] <wikibugs>	 (PS1) Majavah: Move cloudrabbit1001/2 to private vlan [puppet] - https://gerrit.wikimedia.org/r/993026 (https://phabricator.wikimedia.org/T345610)
[20:08:09] <wikibugs>	 (CR) Majavah: [C: +2] Move cloudrabbit1001/2 to private vlan [puppet] - https://gerrit.wikimedia.org/r/993026 (https://phabricator.wikimedia.org/T345610) (owner: Majavah)
[20:09:36] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:992942|Start reading from af_actor/afh_actor in group1 wikis (T355616)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:10:05] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[20:10:56] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.reimage for host cloudrabbit1001.eqiad.wmnet with OS bookworm
[20:11:30] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.reimage for host cloudrabbit1002.eqiad.wmnet with OS bookworm
[20:14:23] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:14:34] <wikibugs>	 SRE, Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (xcollazo)  > And another question, would it make sense if we move this to a Google group where your team becomes admin so in the future you can control it yourse...
[20:14:53] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:15:07] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:15:37] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:15:46] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:16:48] <logmsgbot>	 !log zabe@deploy2002 Finished scap: Backport for [[gerrit:992942|Start reading from af_actor/afh_actor in group1 wikis (T355616)]] (duration: 11m 27s)
[20:16:53] <stashbot>	 T355616: Start reading from af_actor/afh_actor - https://phabricator.wikimedia.org/T355616
[20:19:33] <logmsgbot>	 !log taavi@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1002.eqiad.wmnet with OS bookworm
[20:19:48] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.reimage for host cloudrabbit1002.eqiad.wmnet with OS bookworm
[20:20:53] <wikibugs>	 SRE, Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (Dzahn) >>! In T355891#9489388, @xcollazo wrote: > Oh, that would be nice. How about we do that instead? >  > Then I can take care of forwarding/figuring out if t...
[20:25:30] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set cloudrabbit1001/2 as active - taavi@cumin1002"
[20:26:38] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set cloudrabbit1001/2 as active - taavi@cumin1002"
[20:27:30] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.eqiad.wmnet with reason: host reimage
[20:30:05] <wikibugs>	 SRE, Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (Dzahn) ITS request #99871
[20:30:51] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Dzahn) In progress→Resolved a:Dzahn
[20:32:46] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.eqiad.wmnet with reason: host reimage
[20:32:53] <wikibugs>	 (PS1) Ebernhardson: cirrus-updater: Increase producer memory from 2g to 3g [deployment-charts] - https://gerrit.wikimedia.org/r/993028 (https://phabricator.wikimedia.org/T352335)
[20:33:00] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.eqiad.wmnet with reason: host reimage
[20:33:48] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Dzahn) Also added to WMF-NDA group in Phabricator. (per https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#WMF_Group)  You can now see non-public tickets.
[20:33:52] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:33:56] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:34:46] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (Dzahn) a:Dzahn→None
[20:35:39] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:35:44] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:36:23] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.eqiad.wmnet with reason: host reimage
[20:37:02] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:37:07] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:44:40] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Thu 15 Feb 2024 02:11:55 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:45:12] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.248 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:45:30] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 51305 bytes in 0.062 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:50:40] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
[20:51:29] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
[20:51:30] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.eqiad.wmnet with OS bookworm
[20:54:24] <wikibugs>	 (PS1) Hashar: gerrit: use finer groups for commit commentlink [puppet] - https://gerrit.wikimedia.org/r/993029 (https://phabricator.wikimedia.org/T354886)
[20:54:52] <wikibugs>	 (CR) Hashar: "Example https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LiquidThreads/+/992939" [puppet] - https://gerrit.wikimedia.org/r/993029 (https://phabricator.wikimedia.org/T354886) (owner: Hashar)
[20:54:57] <logmsgbot>	 !log taavi@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
[20:55:21] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for Aleksandar Mastilovic - https://phabricator.wikimedia.org/T355607 (amastilovic) @Arnoldokoth thank you!
[20:55:47] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
[20:55:48] <logmsgbot>	 !log taavi@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.eqiad.wmnet with OS bookworm
[20:56:23] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:56:28] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:57:49] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:57:53] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:58:09] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[20:58:11] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[20:59:17] <wikibugs>	 (CR) Paladox: [C: +1] "Tested locally and works." [puppet] - https://gerrit.wikimedia.org/r/993029 (https://phabricator.wikimedia.org/T354886) (owner: Hashar)
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: Your horoscope predicts another UTC late backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240125T2100).
[21:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[21:00:53] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] logstash: consume from mediawiki accesslog sampled topics [puppet] - 10https://gerrit.wikimedia.org/r/992656 (https://phabricator.wikimedia.org/T355836) (owner: 10Cwhite)
[21:05:20] <wikibugs>	 (03PS1) 10Ebernhardson: cirrus-updater: Normalize kafka configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/993032 (https://phabricator.wikimedia.org/T352335)
[21:07:07] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] gerrit: use finer groups for commit commentlink [puppet] - 10https://gerrit.wikimedia.org/r/993029 (https://phabricator.wikimedia.org/T354886) (owner: 10Hashar)
[21:09:56] <wikibugs>	 (03PS2) 10Ebernhardson: cirrus-updater: Normalize kafka configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/993032 (https://phabricator.wikimedia.org/T352335)
[21:11:43] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] cirrus-updater: Normalize kafka configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/993032 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[21:11:51] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "deployed and config reload." [puppet] - 10https://gerrit.wikimedia.org/r/993029 (https://phabricator.wikimedia.org/T354886) (owner: 10Hashar)
[21:12:37] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus-updater: Normalize kafka configuration [deployment-charts] - 10https://gerrit.wikimedia.org/r/993032 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[21:13:39] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[21:13:44] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:14:09] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:14:16] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:19:00] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[21:19:32] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:35:11] <wikibugs>	 (03PS1) 10Ryan Kemper: cloudelastic: remove old masters [puppet] - 10https://gerrit.wikimedia.org/r/993038 (https://phabricator.wikimedia.org/T351354)
[21:36:01] <wikibugs>	 (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/993038 (https://phabricator.wikimedia.org/T351354) (owner: 10Ryan Kemper)
[21:36:16] <wikibugs>	 (03PS1) 10Ebernhardson: cirrus-updater: Update list of allowed wikis in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/993039
[21:37:55] <wikibugs>	 (03CR) 10Bking: [C: 03+1] cloudelastic: remove old masters [puppet] - 10https://gerrit.wikimedia.org/r/993038 (https://phabricator.wikimedia.org/T351354) (owner: 10Ryan Kemper)
[21:38:51] <jinxer-wm>	 (RdfStreamingUpdaterSpaceUsageTooHigh) firing: (2) The RDF Streaming Updater is using more than 50GiB of storage - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterSpaceUsageTooHigh
[21:40:00] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] cirrus-updater: Update list of allowed wikis in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/993039 (owner: 10Ebernhardson)
[21:40:55] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus-updater: Update list of allowed wikis in production [deployment-charts] - 10https://gerrit.wikimedia.org/r/993039 (owner: 10Ebernhardson)
[21:44:06] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[21:44:10] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:44:23] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:44:41] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:55:19] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:55:26] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:07:39] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: cloudelastic maintenance
[22:08:08] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: cloudelastic maintenance
[22:08:10] <wikibugs>	 (03PS2) 10Ryan Kemper: cloudelastic: remove old masters [puppet] - 10https://gerrit.wikimedia.org/r/993038 (https://phabricator.wikimedia.org/T351354)
[22:08:57] <ryankemper>	 !log T351354 Downtimed `cloudelastic*`; shortly will restart `cloudelastic100[1,2,4]` one host at a time to make them no longer masters
[22:09:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:09:11] <stashbot>	 T351354: Service implementation for cloudelastic1007-1010 - https://phabricator.wikimedia.org/T351354
[22:09:13] <wikibugs>	 (03CR) 10Ryan Kemper: [V: 03+2 C: 03+2] cloudelastic: remove old masters [puppet] - 10https://gerrit.wikimedia.org/r/993038 (https://phabricator.wikimedia.org/T351354) (owner: 10Ryan Kemper)
[22:11:25] <ryankemper>	 !log T351354 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/993038; restarting `cloudelastic1001` following puppet run
[22:11:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:12:07] <logmsgbot>	 !log dzahn@cumin1002 START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
[22:13:23] <wikibugs>	 (03PS1) 10Ebernhardson: cirrus updater: Configure http routes for prod clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/993045 (https://phabricator.wikimedia.org/T352335)
[22:14:20] <wikibugs>	 (03PS2) 10Ebernhardson: cirrus updater: Configure http routes for prod clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/993045 (https://phabricator.wikimedia.org/T352335)
[22:15:37] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] cirrus updater: Configure http routes for prod clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/993045 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[22:15:50] <ryankemper>	 !log T351354 Restarting `cloudelastic1004` following puppet run
[22:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:55] <stashbot>	 T351354: Service implementation for cloudelastic1007-1010 - https://phabricator.wikimedia.org/T351354
[22:16:42] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus updater: Configure http routes for prod clusters [deployment-charts] - 10https://gerrit.wikimedia.org/r/993045 (https://phabricator.wikimedia.org/T352335) (owner: 10Ebernhardson)
[22:19:20] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:19:30] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:25:58] <ryankemper>	 !log T351354 Restarting `cloudelastic1002`
[22:26:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:03] <stashbot>	 T351354: Service implementation for cloudelastic1007-1010 - https://phabricator.wikimedia.org/T351354
[22:28:02] <wikibugs>	 (03PS1) 10BCornwall: fixup! Add module for ncmonitor [puppet] - 10https://gerrit.wikimedia.org/r/993046
[22:28:18] <wikibugs>	 (03Abandoned) 10BCornwall: fixup! Add module for ncmonitor [puppet] - 10https://gerrit.wikimedia.org/r/993046 (owner: 10BCornwall)
[22:33:19] <ryankemper>	 !log T351354 Now restarting new masters to keep configs in sync; restarting `cloudelastic1007`
[22:33:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:33:24] <stashbot>	 T351354: Service implementation for cloudelastic1007-1010 - https://phabricator.wikimedia.org/T351354
[22:34:42] <ryankemper>	 !log T351354 Now restarting new masters to keep configs in sync; restarting `cloudelastic1009`
[22:34:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:40:06] <ryankemper>	 !log T351354 Restarting `cloudelastic1006` (final restart for today)
[22:40:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:40:12] <stashbot>	 T351354: Service implementation for cloudelastic1007-1010 - https://phabricator.wikimedia.org/T351354
[22:52:41] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1010 for use cloudelastic1010 as migration canary - bking@cumin2002 - T355617
[22:52:41] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: cloudelastic1010 for use cloudelastic1010 as migration canary - bking@cumin2002 - T355617
[22:52:46] <stashbot>	 T355617: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617
[22:53:47] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1010 for use cloudelastic1010 as migration canary - bking@cumin2002 - T355617
[22:53:48] <logmsgbot>	 !log bking@cumin2002 END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: cloudelastic1010 for use cloudelastic1010 as migration canary - bking@cumin2002 - T355617
[22:53:55] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1010.wikimedia.org for use cloudelastic1010 as migration canary - bking@cumin2002 - T355617
[22:53:58] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1010.wikimedia.org for use cloudelastic1010 as migration canary - bking@cumin2002 - T355617
[23:14:18] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to ops for swfrench - https://phabricator.wikimedia.org/T355912 (10Scott_French)
[23:15:17] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Grant Access to ops for swfrench - https://phabricator.wikimedia.org/T355912 (10Scott_French) 05Open→03In progress p:05Triage→03Medium
[23:16:23] <wikibugs>	 (03PS1) 10Scott French: admin: move swfrench from sre-admins to ops [puppet] - 10https://gerrit.wikimedia.org/r/993050 (https://phabricator.wikimedia.org/T355912)
[23:17:41] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cloudelastic1010.wikimedia.org with reason: migration canary T355617
[23:17:55] <stashbot>	 T355617: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617
[23:17:56] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cloudelastic1010.wikimedia.org with reason: migration canary T355617
[23:21:00] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] admin: move swfrench from sre-admins to ops [puppet] - 10https://gerrit.wikimedia.org/r/993050 (https://phabricator.wikimedia.org/T355912) (owner: 10Scott French)
[23:22:35] <wikibugs>	 10SRE, 10Data Products: Forward ops-dumps@wikimedia.org to data-engineering-alerts@lists.wikimedia.org - https://phabricator.wikimedia.org/T355891 (10Dzahn) a:03Dzahn
[23:29:08] <zabe>	 !log zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # T355485
[23:29:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:13] <stashbot>	 T355485: Server side upload for Sturm - https://phabricator.wikimedia.org/T355485
[23:41:43] <wikibugs>	 (03PS3) 10Zabe: Setup namespace for 2025, 2026, enable subpages for 2023-2026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/961963 (https://phabricator.wikimedia.org/T347622) (owner: 10Robertsky)
[23:44:26] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by zabe@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/961963 (https://phabricator.wikimedia.org/T347622) (owner: 10Robertsky)
[23:45:20] <wikibugs>	 (03Merged) 10jenkins-bot: Setup namespace for 2025, 2026, enable subpages for 2023-2026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/961963 (https://phabricator.wikimedia.org/T347622) (owner: 10Robertsky)
[23:45:35] <logmsgbot>	 !log zabe@deploy2002 Started scap: Backport for [[gerrit:961963|Setup namespace for 2025, 2026, enable subpages for 2023-2026 (T347622)]]
[23:45:40] <stashbot>	 T347622: wikimaniawiki: create namespace for 2025 and 2026 - https://phabricator.wikimedia.org/T347622
[23:46:59] <logmsgbot>	 !log zabe@deploy2002 robertsky and zabe: Backport for [[gerrit:961963|Setup namespace for 2025, 2026, enable subpages for 2023-2026 (T347622)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[23:47:29] <logmsgbot>	 !log zabe@deploy2002 robertsky and zabe: Continuing with sync
[23:54:05] <logmsgbot>	 !log zabe@deploy2002 Finished scap: Backport for [[gerrit:961963|Setup namespace for 2025, 2026, enable subpages for 2023-2026 (T347622)]] (duration: 08m 30s)
[23:54:17] <zabe>	 !log zabe@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=wikimaniawiki --fix # T347622
[23:54:27] <stashbot>	 T347622: wikimaniawiki: create namespace for 2025 and 2026 - https://phabricator.wikimedia.org/T347622
[23:54:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log