[00:02:31] <wikibugs>	 (PS9) BryanDavis: [WIP] Allow provisioning MediaWiki with PHP 8.1 [puppet] - https://gerrit.wikimedia.org/r/1085471 (https://phabricator.wikimedia.org/T378752)
[00:02:31] <logmsgbot>	 !log ebernhardson@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087592|TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593|TextPassDumper: refresh content address on failure (T377594)]] (duration: 08m 48s)
[00:02:34] <stashbot>	 T377594: Fix Dumps - errors exporting good revisions - https://phabricator.wikimedia.org/T377594
[00:07:21] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[00:07:46] <wikibugs>	 (PS2) Ryan Kemper: Resume XML/SQL dumps now that data qual fixed [puppet] - https://gerrit.wikimedia.org/r/1087598 (https://phabricator.wikimedia.org/T377594)
[00:08:10] <wikibugs>	 (CR) Ssingh: "A bit late sorry but as communicated on IRC, looks good and thanks Amir for rolling it out. No ATS restart was required (and you didn't bu" [puppet] - https://gerrit.wikimedia.org/r/1087591 (https://phabricator.wikimedia.org/T374683) (owner: Arlolra)
[00:08:30] <wikibugs>	 (CR) Btullis: [C:+1] Resume XML/SQL dumps now that data qual fixed [puppet] - https://gerrit.wikimedia.org/r/1087598 (https://phabricator.wikimedia.org/T377594) (owner: Ryan Kemper)
[00:08:45] <wikibugs>	 (CR) Ebernhardson: [C:+1] "Copied votes on follow-up patch sets have been updated:" [puppet] - https://gerrit.wikimedia.org/r/1087598 (https://phabricator.wikimedia.org/T377594) (owner: Ryan Kemper)
[00:09:45] <wikibugs>	 (CR) Ryan Kemper: [C:+2] Resume XML/SQL dumps now that data qual fixed [puppet] - https://gerrit.wikimedia.org/r/1087598 (https://phabricator.wikimedia.org/T377594) (owner: Ryan Kemper)
[00:11:23] <wikibugs>	 (CR) Santiago Faci: replace list of cassandra hosts with faux values (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1087514 (owner: Eevans)
[00:21:29] <ryankemper>	 !log T377594 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now
[00:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:21:38] <stashbot>	 T377594: Fix Dumps - errors exporting good revisions - https://phabricator.wikimedia.org/T377594
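The `snapshot101[0-7]*` target in the !log entry above is a host pattern. Assuming shell-style glob semantics (cumin's own query grammar may differ), `[0-7]` matches a single digit from 0 to 7 and `*` matches any suffix such as a domain, so the pattern would select snapshot1010 through snapshot1017. A minimal Python sketch using `fnmatch`; the hostnames below are hypothetical examples, not an inventory of real hosts:

```python
from fnmatch import fnmatchcase

# Pattern copied from the !log entry; glob semantics are an assumption.
PATTERN = "snapshot101[0-7]*"

# Hypothetical hostnames for illustration only.
candidates = [
    "snapshot1010.eqiad.wmnet",
    "snapshot1017.eqiad.wmnet",
    "snapshot1018.eqiad.wmnet",
    "snapshot1009.eqiad.wmnet",
]


def matching_hosts(pattern, hosts):
    """Return the hosts that match a case-sensitive shell-style glob."""
    return [h for h in hosts if fnmatchcase(h, pattern)]


if __name__ == "__main__":
    for host in matching_hosts(PATTERN, candidates):
        print(host)
```

Only the 1010 and 1017 names are printed: snapshot1018 fails the `[0-7]` class and snapshot1009 fails the literal `101` prefix.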
[00:38:30] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[00:38:32] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1087600
[00:38:32] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1087600 (owner: TrainBranchBot)
[00:41:13] <sukhe>	 !incidents
[00:41:13] <sirenbot>	 5367 (UNACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[00:41:15] <sukhe>	 !ack 5367
[00:41:16] <sirenbot>	 5367 (ACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[00:42:17] <swfrench-wmf>	 sukhe: here now as well if you want more hands
[00:42:24] <cdanis>	 which link is it?
[00:42:24] <sukhe>	 swfrench-wmf: thanks <3
[00:42:30] <cdanis>	 is it equinix 
[00:43:06] <swfrench-wmf>	 I was about to assume this is US election-related hot linking :)
[00:43:30] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[00:43:32] <sukhe>	 should be recovering.
[00:43:33] <cdanis>	 NTT transit 
[00:43:33] <sukhe>	 ok
[00:43:42] <cdanis>	 thats interesting 
[00:44:16] <sukhe>	 yeah, seems to be that
[00:44:40] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over
[00:44:52] <cdanis>	 I don’t think we have to do anything unless it happens again 
[00:45:08] <cdanis>	 I’m going afk 
[00:45:23] <sukhe>	 yeah I am not sure what we can do. go ahead, I will be here intermittently
[00:55:01] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission aqs1013 - https://phabricator.wikimedia.org/T379026#10294950 (VRiley-WMF)
[00:57:17] <wikibugs>	 ops-eqiad, SRE, DC-Ops, decommission-hardware: decommission aqs1013 - https://phabricator.wikimedia.org/T379026#10294952 (VRiley-WMF) Open→Resolved a:VRiley-WMF This has been decomissioned
[01:08:35] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1087602
[01:08:35] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1087602 (owner: TrainBranchBot)
[01:10:41] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1087600 (owner: TrainBranchBot)
[01:21:31] <wikibugs>	 (PS1) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1087605 (https://phabricator.wikimedia.org/T378260)
[01:21:33] <wikibugs>	 (CR) Zabe: [C:+2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1087605 (https://phabricator.wikimedia.org/T378260) (owner: Zabe)
[01:22:19] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1087605 (https://phabricator.wikimedia.org/T378260) (owner: Zabe)
[01:23:00] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: T378260
[01:23:03] <stashbot>	 T378260: Retire labtestwiki - https://phabricator.wikimedia.org/T378260
[01:27:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[01:30:34] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: T378260 (duration: 07m 34s)
[01:30:46] <stashbot>	 T378260: Retire labtestwiki - https://phabricator.wikimedia.org/T378260
[01:31:00] <swfrench-wmf>	 !incidents
[01:31:00] <sirenbot>	 5368 (UNACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[01:31:01] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[01:31:08] <swfrench-wmf>	 !ack 5368
[01:31:09] <sirenbot>	 5368 (ACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[01:32:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[01:32:32] <sukhe>	 Lumen
[01:32:33] <swfrench-wmf>	 same situation as before (same port, etc.)
[01:32:49] <swfrench-wmf>	 sukhe: Lumen?
[01:32:55] <sukhe>	 swfrench-wmf: so NTT as well? 
[01:32:57] <sukhe>	 https://librenms.wikimedia.org/graphs/to=1730856600/id=11624/type=port_bits/from=1730770200/
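The LibreNMS graph link above carries its time window in the `from=` and `to=` path segments. Assuming these are Unix epoch seconds (the usual convention for LibreNMS graph URLs) and that the log timestamps are UTC, they decode to the 24 hours ending just before this exchange. A small Python sketch; the `graph_params` helper is hypothetical, not part of any LibreNMS API:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# URL copied verbatim from the message above.
URL = "https://librenms.wikimedia.org/graphs/to=1730856600/id=11624/type=port_bits/from=1730770200/"


def graph_params(url):
    """Split path segments of the form key=value into a dict (hypothetical helper)."""
    segments = urlparse(url).path.strip("/").split("/")
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)


params = graph_params(URL)
# Interpret from/to as Unix epoch seconds in UTC (assumption).
start = datetime.fromtimestamp(int(params["from"]), tz=timezone.utc)
end = datetime.fromtimestamp(int(params["to"]), tz=timezone.utc)
print(start.isoformat(), "->", end.isoformat(), "window:", end - start)
```

Run as-is it prints `2024-11-05T01:30:00+00:00 -> 2024-11-06T01:30:00+00:00 window: 1 day, 0:00:00`, i.e. the graph shows the previous day of port traffic up to 01:30 UTC.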
[01:33:15] <sukhe>	 Lumen also looks unhappy
[01:33:21] <sukhe>	 I have not found the source of this though
[01:33:34] <swfrench-wmf>	 this one was AFAICT for xe-3/1/6 again (NTT)
[01:33:43] <sukhe>	 ok
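For scale: Junos uses the `xe-` prefix for 10-Gigabit Ethernet interfaces, so if xe-3/1/6 runs at a nominal 10 Gbit/s, the "over 80%" page means roughly 8 Gbit/s or more of outbound traffic. The actual alert rule is not shown in this log; the Python sketch below is a hypothetical recomputation from two samples of a cumulative outbound octet counter (such as SNMP `ifHCOutOctets`), and it ignores counter wrap:

```python
# Hypothetical sketch: the real alert query is not shown in this log.
LINK_SPEED_BPS = 10_000_000_000  # assumed nominal speed of an xe- (10GbE) port
THRESHOLD = 0.80                 # "over 80%" from the alert name


def outbound_utilisation(octets_t0, octets_t1, interval_s, speed_bps=LINK_SPEED_BPS):
    """Fraction of link capacity used, from two cumulative octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    bits_per_s = (octets_t1 - octets_t0) * 8 / interval_s
    return bits_per_s / speed_bps


def over_threshold(utilisation, threshold=THRESHOLD):
    """True when utilisation strictly exceeds the alert threshold."""
    return utilisation > threshold
```

For example, 63,750,000,000 octets sent in 60 seconds is 8.5 Gbit/s, or 0.85 of the assumed link speed, which would be over the threshold.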
[01:34:41] <sukhe>	 https://w.wiki/BrSp
[01:34:44] <sukhe>	 now let's dig a bit deeper
[01:35:06] <wikibugs>	 (PS1) Zabe: mediawiki-cache-warmup: Remove labtestwiki from dbname filter [puppet] - https://gerrit.wikimedia.org/r/1087606 (https://phabricator.wikimedia.org/T378260)
[01:44:39] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1087602 (owner: TrainBranchBot)
[01:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-memcached-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:53:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[01:53:44] <wikibugs>	 (PS1) Zabe: snapshot: Remove labtestwiki from excluded wikis [puppet] - https://gerrit.wikimedia.org/r/1087609 (https://phabricator.wikimedia.org/T378260)
[01:54:22] <wikibugs>	 (CR) CI reject: [V:-1] snapshot: Remove labtestwiki from excluded wikis [puppet] - https://gerrit.wikimedia.org/r/1087609 (https://phabricator.wikimedia.org/T378260) (owner: Zabe)
[01:54:36] <wikibugs>	 (CR) Zabe: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087609 (https://phabricator.wikimedia.org/T378260) (owner: Zabe)
[01:55:18] <wikibugs>	 (PS2) Zabe: snapshot: Remove labtestwiki from excluded wikis [puppet] - https://gerrit.wikimedia.org/r/1087609 (https://phabricator.wikimedia.org/T378260)
[01:56:58] <wikibugs>	 SRE, collaboration-services, Wikimedia-Mailing-lists: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045#10295009 (Platonides) @eoghan these messageids are of the form %Y%m%d%H%M%S@test.bug.T377045.wikimedia.es  So, filtering at the entries that w...
[01:58:47] <sukhe>	 !incidents
[01:58:48] <sirenbot>	 5369 (UNACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[01:58:48] <sirenbot>	 5368 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[01:58:48] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[01:58:50] <sukhe>	 !ack 5369
[01:58:50] <sirenbot>	 5369 (ACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:08:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[02:12:21] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[02:12:50] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[02:13:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:22:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[02:22:50] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[02:27:05] <sukhe>	 should recover, smaller spike
[02:27:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[02:33:30] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[02:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:38:30] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[02:57:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[02:58:54] <cdanis>	 !inc!incidents
[02:58:58] <cdanis>	 !incidents
[02:58:58] <sirenbot>	 5372 (UNACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:58:59] <sirenbot>	 5371 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:58:59] <sirenbot>	 5370 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:58:59] <sirenbot>	 5369 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:58:59] <sirenbot>	 5368 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:58:59] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[02:59:02] <cdanis>	 !ack 5372
[02:59:02] <sirenbot>	 5372 (ACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[03:01:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:03:09] <wikibugs>	 (PS1) CDanis: haproxy: bwlim-by-path: also roll out to eqiad [puppet] - https://gerrit.wikimedia.org/r/1087615 (https://phabricator.wikimedia.org/T317799)
[03:07:02] <wikibugs>	 (PS2) CDanis: haproxy: bwlim-by-path: also roll out to eqiad [puppet] - https://gerrit.wikimedia.org/r/1087615 (https://phabricator.wikimedia.org/T317799)
[03:07:06] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087615 (https://phabricator.wikimedia.org/T317799) (owner: CDanis)
[03:07:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[03:08:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[03:13:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[03:28:30] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[03:33:30] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[03:58:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[04:03:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[04:09:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[04:13:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:14:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[04:16:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:20:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[04:21:42] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:23:31] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[04:24:24] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2024-10-25-044319-production [deployment-charts] - https://gerrit.wikimedia.org/r/1087616 (https://phabricator.wikimedia.org/T377160)
[04:25:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[04:28:31] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[04:31:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[04:36:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[04:46:42] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:29:44] <kart_>	 Doing cxserver deployment. Minor changes.
[05:30:13] <wikibugs>	 (CR) KartikMistry: [C:+2] Update cxserver to 2024-10-25-044319-production [deployment-charts] - https://gerrit.wikimedia.org/r/1087616 (https://phabricator.wikimedia.org/T377160) (owner: KartikMistry)
[05:31:12] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2024-10-25-044319-production [deployment-charts] - https://gerrit.wikimedia.org/r/1087616 (https://phabricator.wikimedia.org/T377160) (owner: KartikMistry)
[05:32:31] <jinxer-wm>	 FIRING: [2x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota got worse   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[05:33:49] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[05:34:12] <logmsgbot>	 !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[05:36:52] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[05:37:19] <logmsgbot>	 !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[05:37:31] <jinxer-wm>	 FIRING: [3x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota got worse   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[05:38:06] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[05:38:41] <logmsgbot>	 !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[05:41:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[05:43:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:46:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[05:52:31] <jinxer-wm>	 FIRING: [3x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota got worse   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[05:52:51] <kart_>	 !log Updated cxserver to 2024-10-25-044319-production (T377160, T375102, T371420)
[05:52:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:52:56] <stashbot>	 T377160: Post-creation work for annwiki - https://phabricator.wikimedia.org/T377160
[05:52:57] <stashbot>	 T375102: Post-creation work for nrwiki - https://phabricator.wikimedia.org/T375102
[05:52:57] <stashbot>	 T371420: Complete enablement Section Translation in new wikis and make the process less manual for the future - https://phabricator.wikimedia.org/T371420
[05:57:31] <jinxer-wm>	 RESOLVED: Traffic bill over quota: Alert for device cr2-codfw.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[06:04:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[06:19:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[06:26:41] <icinga-wm>	 PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - AS6939/IPv6: Connect - HE, AS6939/IPv4: Connect - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:32:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[06:36:47] <icinga-wm>	 PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - AS6939/IPv6: Idle - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:41:11] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10295141 (Papaul)
[06:45:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:46:25] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10295142 (Papaul) There will be some maintenance in magru sometime next week and the site will be de-pool we can take advantage of this maintenance window to upgrade the router th...
[06:57:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T0700)
[07:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:05:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[07:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:19:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:40:59] <wikibugs>	 (CR) Arnaudb: [C:+2] Revert "mariadb: wipe /srv on pc1017" [puppet] - https://gerrit.wikimedia.org/r/1087499 (owner: Arnaudb)
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: That opportune time for a UTC morning backport window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T0800).
[08:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:03:40] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087508 (owner: Muehlenhoff)
[08:04:02] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add a helper script to setup the Ganeti LVM vg [puppet] - https://gerrit.wikimedia.org/r/1087412 (owner: Muehlenhoff)
[08:07:29] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087572 (https://phabricator.wikimedia.org/T336485) (owner: Cathal Mooney)
[08:07:39] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 220, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:08:01] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 129, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:12:13] <volans>	 !log manually cleared /root/.ssh/known_hosts on the cumin hosts - T336485
[08:12:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:22] <wikibugs>	 SRE, SRE-tools, Infrastructure-Foundations, netops, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485#10295209 (Volans) >>! In T336485#10294334, @cmooney wrote: > I don't see that forced in /etc/ssh/ssh_config though.  Also w...
[08:12:26] <stashbot>	 T336485: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485
[08:13:13] <wikibugs>	 (PS1) Volans: mysql_legacy: improve pymysql usability [software/spicerack] - https://gerrit.wikimedia.org/r/1087852
[08:13:13] <wikibugs>	 (PS1) Volans: mysql: remove deprecated call [software/spicerack] - https://gerrit.wikimedia.org/r/1087853
[08:13:14] <wikibugs>	 (PS1) Volans: mysql_legacy: add MysqlClient class [software/spicerack] - https://gerrit.wikimedia.org/r/1087854
[08:13:14] <wikibugs>	 (PS1) Volans: mysql: remove unused module [software/spicerack] - https://gerrit.wikimedia.org/r/1087855
[08:13:14] <wikibugs>	 (PS1) Volans: mysql_legacy: rename to mysql [software/spicerack] - https://gerrit.wikimedia.org/r/1087856
[08:14:20] <wikibugs>	 (CR) Elukey: [V:+1 C:+2] role::aux_k8s: clean up after containerd migration [puppet] - https://gerrit.wikimedia.org/r/1085391 (https://phabricator.wikimedia.org/T378345) (owner: Elukey)
[08:22:00] <wikibugs>	 (CR) Elukey: [C:+2] ms-be-simple: partman EFI recipe [puppet] - https://gerrit.wikimedia.org/r/1087538 (https://phabricator.wikimedia.org/T371400) (owner: JHathaway)
[08:26:54] <wikibugs>	 (PS1) Elukey: profile::installserver::preseed: use the EFI recipe for ms-be2083 [puppet] - https://gerrit.wikimedia.org/r/1087858 (https://phabricator.wikimedia.org/T371400)
[08:29:00] <wikibugs>	 (CR) MVernon: [C:+1] profile::installserver::preseed: use the EFI recipe for ms-be2083 [puppet] - https://gerrit.wikimedia.org/r/1087858 (https://phabricator.wikimedia.org/T371400) (owner: Elukey)
[08:29:13] <wikibugs>	 (CR) Volans: mysql_legacy: add MysqlClient class (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1087854 (owner: Volans)
[08:33:43] <wikibugs>	 (CR) Muehlenhoff: "Let's not do that, this will only lead to confusion. Going forward all of this information will move as metadata into Bitu, so the additio" [puppet] - https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[08:36:53] <wikibugs>	 (CR) Elukey: [C:+1] mysql_legacy: improve pymysql usability (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1087852 (owner: Volans)
[08:37:38] <wikibugs>	 (CR) Elukey: [C:+1] mysql: remove deprecated call [software/spicerack] - https://gerrit.wikimedia.org/r/1087853 (owner: Volans)
[08:39:17] <wikibugs>	 (PS1) Muehlenhoff: Add the ganeti role to ganeti1043/ganeti1044 [puppet] - https://gerrit.wikimedia.org/r/1087859 (https://phabricator.wikimedia.org/T378921)
[08:39:21] <wikibugs>	 (CR) Elukey: mysql_legacy: add MysqlClient class (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1087854 (owner: Volans)
[08:39:58] <wikibugs>	 (PS1) Volans: sre.switchdc.databases: use mysql native methods [cookbooks] - https://gerrit.wikimedia.org/r/1087860
[08:39:58] <wikibugs>	 (PS1) Volans: Adapt to new Spicerack API renaming mysql_legacy [cookbooks] - https://gerrit.wikimedia.org/r/1087861
[08:40:24] <wikibugs>	 (CR) Elukey: [C:+1] mysql_legacy: add MysqlClient class (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1087854 (owner: Volans)
[08:40:36] <jinxer-wm>	 FIRING: [4x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota got acknowledged   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[08:40:42] <wikibugs>	 (CR) Elukey: [C:+1] mysql: remove unused module [software/spicerack] - https://gerrit.wikimedia.org/r/1087855 (owner: Volans)
[08:41:32] <wikibugs>	 (CR) Volans: mysql_legacy: add MysqlClient class (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1087854 (owner: Volans)
[08:41:45] <wikibugs>	 (CR) Elukey: [C:+1] mysql_legacy: rename to mysql [software/spicerack] - https://gerrit.wikimedia.org/r/1087856 (owner: Volans)
[08:41:58] <wikibugs>	 (CR) Elukey: [C:+2] profile::installserver::preseed: use the EFI recipe for ms-be2083 [puppet] - https://gerrit.wikimedia.org/r/1087858 (https://phabricator.wikimedia.org/T371400) (owner: Elukey)
[08:42:25] <wikibugs>	 (CR) Volans: "CI failing is because of the depends-on that has not yet been merged/released into spicerack." [cookbooks] - https://gerrit.wikimedia.org/r/1087860 (owner: Volans)
[08:43:00] <wikibugs>	 (CR) Volans: "CI failing is because of the depends-on that has not yet been merged/released into spicerack." [cookbooks] - https://gerrit.wikimedia.org/r/1087861 (owner: Volans)
[08:43:15] <wikibugs>	 (PS2) Volans: mysql_legacy: improve pymysql usability [software/spicerack] - https://gerrit.wikimedia.org/r/1087852
[08:43:16] <wikibugs>	 (PS2) Volans: mysql: remove deprecated call [software/spicerack] - https://gerrit.wikimedia.org/r/1087853
[08:43:16] <wikibugs>	 (PS2) Volans: mysql_legacy: add MysqlClient class [software/spicerack] - https://gerrit.wikimedia.org/r/1087854
[08:43:16] <wikibugs>	 (PS2) Volans: mysql: remove unused module [software/spicerack] - https://gerrit.wikimedia.org/r/1087855
[08:43:17] <wikibugs>	 (PS2) Volans: mysql_legacy: rename to mysql [software/spicerack] - https://gerrit.wikimedia.org/r/1087856
[08:43:22] <wikibugs>	 (CR) CI reject: [V:-1] sre.switchdc.databases: use mysql native methods [cookbooks] - https://gerrit.wikimedia.org/r/1087860 (owner: Volans)
[08:43:22] <wikibugs>	 (CR) CI reject: [V:-1] Adapt to new Spicerack API renaming mysql_legacy [cookbooks] - https://gerrit.wikimedia.org/r/1087861 (owner: Volans)
[08:43:50] <wikibugs>	 (CR) Volans: mysql_legacy: improve pymysql usability (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1087852 (owner: Volans)
[08:44:46] <wikibugs>	 (CR) Arnaudb: [C:+1] mysql_legacy: improve pymysql usability [software/spicerack] - https://gerrit.wikimedia.org/r/1087852 (owner: Volans)
[08:45:36] <jinxer-wm>	 FIRING: [7x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota got acknowledged   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[08:46:39] <wikibugs>	 (CR) Arnaudb: [C:+1] mysql: remove deprecated call [software/spicerack] - https://gerrit.wikimedia.org/r/1087853 (owner: Volans)
[08:46:41] <wikibugs>	 (PS1) Fabfur: hiera: enable haproxykafka on whole ulsfo dc [puppet] - https://gerrit.wikimedia.org/r/1087862 (https://phabricator.wikimedia.org/T378578)
[08:46:44] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:46:47] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add the ganeti role to ganeti1043/ganeti1044 [puppet] - https://gerrit.wikimedia.org/r/1087859 (https://phabricator.wikimedia.org/T378921) (owner: Muehlenhoff)
[08:49:51] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087862 (https://phabricator.wikimedia.org/T378578) (owner: Fabfur)
[08:51:15] <wikibugs>	 (PS1) Jaime Nuche: Fix category creations [extensions/Translate] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087863 (https://phabricator.wikimedia.org/T285463)
[08:53:00] <wikibugs>	 (CR) Arnaudb: [C:+1] mysql_legacy: add MysqlClient class [software/spicerack] - https://gerrit.wikimedia.org/r/1087854 (owner: Volans)
[08:54:08] <wikibugs>	 (CR) Arnaudb: [C:+1] mysql: remove unused module [software/spicerack] - https://gerrit.wikimedia.org/r/1087855 (owner: Volans)
[08:55:03] <wikibugs>	 (CR) Jaime Nuche: "This is currently blocking the train deployment: https://phabricator.wikimedia.org/T375661 A pair of eyes here would be appreciated" [extensions/Translate] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087863 (https://phabricator.wikimedia.org/T285463) (owner: Jaime Nuche)
[08:56:57] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
[08:57:02] <wikibugs>	 (CR) Arnaudb: [C:+1] mysql_legacy: rename to mysql [software/spicerack] - https://gerrit.wikimedia.org/r/1087856 (owner: Volans)
[08:58:21] <wikibugs>	 (PS2) Abijeet Patro: Fix automatic category creations by FuzzyBot [extensions/Translate] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087863 (https://phabricator.wikimedia.org/T285463) (owner: Jaime Nuche)
[09:00:05] <jouncebot>	 jnuche and dduvall: That opportune time for a MediaWiki train - Utc-0+Utc-7 Version deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T0900).
[09:00:36] <jinxer-wm>	 FIRING: [7x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota got acknowledged   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[09:01:36] <jnuche>	 hi there, train is currently blocked on T285463
[09:01:37] <stashbot>	 T285463: FuzzyBot should automatically create wanted categories that are marked for translation - https://phabricator.wikimedia.org/T285463
[09:05:36] <jinxer-wm>	 RESOLVED: [3x] Traffic bill over quota: Alert for device cr2-codfw.wikimedia.org - Traffic bill over quota got acknowledged   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[09:09:07] <wikibugs>	 (PS1) Elukey: sre.hosts.reimage: fix _validate() when using UEFI [cookbooks] - https://gerrit.wikimedia.org/r/1087865 (https://phabricator.wikimedia.org/T373519)
[09:10:20] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
[09:13:06] <wikibugs>	 (PS1) Arnaudb: mariadb: add es204[1-6] [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146)
[09:14:38] <wikibugs>	 (CR) Elukey: "test-cookbooked with ms-be2083 and it worked :)" [cookbooks] - https://gerrit.wikimedia.org/r/1087865 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:15:25] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1087865 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:18:25] <wikibugs>	 (CR) Elukey: [C:+2] sre.hosts.reimage: fix _validate() when using UEFI [cookbooks] - https://gerrit.wikimedia.org/r/1087865 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:20:11] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10295315 (ABran-WMF) >>! In T378146#10293606, @Jhancock.wm wrote: > @ABran-WMF we've received these servers. Please update the site.pp file. trying to...
[09:20:30] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
[09:21:32] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Configure cumin ssh to use network settings for fasw switches [puppet] - https://gerrit.wikimedia.org/r/1087572 (https://phabricator.wikimedia.org/T336485) (owner: Cathal Mooney)
[09:22:03] <wikibugs>	 SRE, Infrastructure-Foundations: sre.netbox.update-extras hits KeyError with logging - https://phabricator.wikimedia.org/T379072#10295326 (Volans) In our netbox config we have for the logging formatters: ` 'django.server': {         '()': 'django.utils.log.ServerFormatter',         'format': '[%(server_t...
[09:24:21] <wikibugs>	 SRE, Infrastructure-Foundations: sre.netbox.update-extras hits KeyError with logging - https://phabricator.wikimedia.org/T379072#10295330 (cmooney) >>! In T379072#10295326, @Volans wrote: > and I guess server_time is not defined outside of django web app when when we run the `manage.py syncdatasource` co...
[09:25:14] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043
[09:25:23] <wikibugs>	 (CR) Abijeet Patro: [C:+1] Fix automatic category creations by FuzzyBot [extensions/Translate] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087863 (https://phabricator.wikimedia.org/T285463) (owner: Jaime Nuche)
[09:27:40] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043
[09:28:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044
[09:29:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044
[09:30:45] <wikibugs>	 (CR) Jaime Nuche: [C:+2] Fix automatic category creations by FuzzyBot [extensions/Translate] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087863 (https://phabricator.wikimedia.org/T285463) (owner: Jaime Nuche)
[09:31:16] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
[09:34:35] <wikibugs>	 (PS1) Lucas Werkmeister (WMDE): tables-catalog: Add GlobalUsage (globalimagelinks) [puppet] - https://gerrit.wikimedia.org/r/1087867 (https://phabricator.wikimedia.org/T363581)
[09:35:27] <wikibugs>	 (CR) Arnaudb: [C:+1] sre.switchdc.databases: use mysql native methods [cookbooks] - https://gerrit.wikimedia.org/r/1087860 (owner: Volans)
[09:36:41] <wikibugs>	 (CR) Arnaudb: [C:+1] Adapt to new Spicerack API renaming mysql_legacy [cookbooks] - https://gerrit.wikimedia.org/r/1087861 (owner: Volans)
[09:37:12] <wikibugs>	 (PS1) Elukey: sre.hosts.reimage: fix remote command to use to test if d-i started [cookbooks] - https://gerrit.wikimedia.org/r/1087869 (https://phabricator.wikimedia.org/T373519)
[09:38:29] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
[09:38:34] <logmsgbot>	 !log elukey@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
[09:41:42] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
[09:43:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:48:59] <wikibugs>	 (Merged) jenkins-bot: Fix automatic category creations by FuzzyBot [extensions/Translate] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087863 (https://phabricator.wikimedia.org/T285463) (owner: Jaime Nuche)
[09:49:08] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
[09:49:17] <wikibugs>	 (PS2) Slyngshede: P:idp enable Redis TGT backend [puppet] - https://gerrit.wikimedia.org/r/1087167 (https://phabricator.wikimedia.org/T377728)
[09:49:45] <wikibugs>	 (PS3) Slyngshede: P:idp enable Redis TGT backend [puppet] - https://gerrit.wikimedia.org/r/1087167 (https://phabricator.wikimedia.org/T377728)
[09:50:54] <wikibugs>	 (CR) Elukey: "The current /proc/cmd value is:" [cookbooks] - https://gerrit.wikimedia.org/r/1087869 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:51:38] <logmsgbot>	 !log jnuche@deploy2002 Started scap sync-world: Backport for [[gerrit:1087863|Fix automatic category creations by FuzzyBot (T285463)]]
[09:51:41] <stashbot>	 T285463: FuzzyBot should automatically create wanted categories that are marked for translation - https://phabricator.wikimedia.org/T285463
[09:52:07] <wikibugs>	 (CR) Marostegui: "remember these need to go to zarcillo with the final rack location too." [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146) (owner: Arnaudb)
[09:52:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B
[09:52:30] <wikibugs>	 (CR) Marostegui: [C:-1] "missing partman recipe" [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146) (owner: Arnaudb)
[09:53:25] <jinxer-wm>	 FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:53:45] <wikibugs>	 (CR) Ayounsi: [C:+1] sre.hosts.reimage: fix remote command to use to test if d-i started [cookbooks] - https://gerrit.wikimedia.org/r/1087869 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:53:50] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B
[09:54:19] <wikibugs>	 (CR) Volans: [C:+1] "checking preseed LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1087869 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:54:36] <logmsgbot>	 !log jnuche@deploy2002 jnuche: Backport for [[gerrit:1087863|Fix automatic category creations by FuzzyBot (T285463)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:54:51] <logmsgbot>	 !log jnuche@deploy2002 jnuche: Continuing with sync
[09:54:58] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
[09:55:12] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
[09:55:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:56:50] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q4:rack/setup/install ganeti1039 to ganeti1052 - https://phabricator.wikimedia.org/T365650#10295412 (MoritzMuehlenhoff) >>! In T365650#10287518, @elukey wrote: > Fixed 1044. For some reason IPv6 support was disabled, so our settings like `IPv6Au...
[09:57:15] <wikibugs>	 (CR) Elukey: [C:+2] sre.hosts.reimage: fix remote command to use to test if d-i started [cookbooks] - https://gerrit.wikimedia.org/r/1087869 (https://phabricator.wikimedia.org/T373519) (owner: Elukey)
[09:57:19] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295413 (MoritzMuehlenhoff)
[09:59:42] <logmsgbot>	 !log jnuche@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087863|Fix automatic category creations by FuzzyBot (T285463)]] (duration: 08m 03s)
[09:59:45] <stashbot>	 T285463: FuzzyBot should automatically create wanted categories that are marked for translation - https://phabricator.wikimedia.org/T285463
[09:59:57] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
[10:05:22] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295447 (ops-monitoring-bot) VM ml-etcd1001.eqiad.wmnet switching disk type to drbd
[10:12:46] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
[10:12:56] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[10:13:35] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/nda for Deepesha Burse WMDE - https://phabricator.wikimedia.org/T378182#10295498 (MatthewVernon) @Deepesha_WMDE do you have a Wikimedia developer account? If so, what is the username? If not, can you create one following [[ https://wikitech.wikimedia.org/wi...
[10:15:16] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295503 (ops-monitoring-bot) Draining ganeti1014.eqiad.wmnet of running VMs
[10:15:17] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
[10:15:53] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[10:16:07] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295504 (ops-monitoring-bot) Draining ganeti1014.eqiad.wmnet of running VMs
[10:16:10] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
[10:16:31] <wikibugs>	 (PS1) Slyngshede: Block Search: Priorities form input over query params. [software/bitu] - https://gerrit.wikimedia.org/r/1087875 (https://phabricator.wikimedia.org/T378338)
[10:17:01] <Lucas_WMDE>	 jouncebot: now
[10:17:01] <jouncebot>	 For the next 0 hour(s) and 42 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T0900)
[10:17:02] <wikibugs>	 (PS2) Fabfur: hiera: enable haproxykafka on whole ulsfo dc [puppet] - https://gerrit.wikimedia.org/r/1087862 (https://phabricator.wikimedia.org/T378578)
[10:17:19] <Lucas_WMDE>	 ah, nevermind, jnuche already deployed https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/1087863 :)
[10:18:16] <jnuche>	 Lucas_WMDE: yeah, I'm waiting for confirmation the fix worked, then I'll continue with the train
[10:19:04] <Lucas_WMDE>	 nice
[10:26:05] <wikibugs>	 (PS2) Arnaudb: mariadb: add es204[1-6] [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146)
[10:26:20] <wikibugs>	 (CR) Arnaudb: "good catch!" [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146) (owner: Arnaudb)
[10:26:34] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10295522 (elukey) @MatthewVernon I tried to provision/reimage ms-be2083 with UEFI but we have the same `/dev/disk/by-path` duplication issue, I think it...
[10:27:48] <wikibugs>	 (PS3) Arnaudb: mariadb: add es204[1-6] [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146)
[10:27:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
[10:28:20] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295524 (ops-monitoring-bot) VM ml-etcd1001.eqiad.wmnet switching disk type to plain
[10:28:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
[10:28:38] <wikibugs>	 (PS1) Lucas Werkmeister (WMDE): Document available wbformatvalue options [extensions/Wikibase] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087877 (https://phabricator.wikimedia.org/T323778)
[10:29:26] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): "I think I’d like to backport this so it’s live when we send the breaking change announcement out. (But backports with i18n changes have be" [extensions/Wikibase] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087877 (https://phabricator.wikimedia.org/T323778) (owner: Lucas Werkmeister (WMDE))
[10:29:59] <jnuche>	 fix seemed to have worked, train will be rolling ahead in the next few minutes
[10:30:00] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 06 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/Wikibase] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087877 (https://phabricator.wikimedia.org/T323778) (owner: Lucas Werkmeister (WMDE))
[10:32:35] <XioNoX>	 !log push new pfw policies - T379127
[10:32:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:37] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087862 (https://phabricator.wikimedia.org/T378578) (owner: Fabfur)
[10:33:16] <wikibugs>	 (PS1) TrainBranchBot: group1 to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087878 (https://phabricator.wikimedia.org/T375661)
[10:33:18] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group1 to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087878 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[10:34:03] <wikibugs>	 (Merged) jenkins-bot: group1 to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087878 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[10:35:38] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 06 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1085572 (owner: Hamish)
[10:36:03] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10295541 (MatthewVernon) Alas :(   I think adjusting the fact is the way to go? Presumably it now needs to keep track of the targets of the symlinks in...
[10:39:29] <wikibugs>	 (CR) Elukey: [V:+1] "Adding also Brian and David since the tlproxy::localssl is deployed on elastic nodes, even if it will be a no-op." [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: Elukey)
[10:39:29] <wikibugs>	 (CR) Vgutierrez: [C:+1] hiera: enable haproxykafka on whole ulsfo dc [puppet] - https://gerrit.wikimedia.org/r/1087862 (https://phabricator.wikimedia.org/T378578) (owner: Fabfur)
[10:41:15] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2  refs T375661
[10:41:18] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[10:42:44] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: enable haproxykafka on whole ulsfo dc [puppet] - https://gerrit.wikimedia.org/r/1087862 (https://phabricator.wikimedia.org/T378578) (owner: Fabfur)
[10:43:06] <elukey>	 !log depool maps1005 to test an nginx config - T378944
[10:43:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:13] <stashbot>	 T378944: Strategy to slowly move Kartotherian's traffic from bare metal to k8s - https://phabricator.wikimedia.org/T378944
[10:43:33] <fabfur>	 !log rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) (T378578)
[10:43:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:43] <stashbot>	 T378578: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578
[10:45:20] <wikibugs>	 (CR) Marostegui: [C:+1] mariadb: add es204[1-6] [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146) (owner: Arnaudb)
[10:45:26] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es104[1-6] - https://phabricator.wikimedia.org/T378143#10295570 (ABran-WMF) so, 1020 would go in [[ https://netbox.wikimedia.org/dcim/racks/2/ | A2 ]] es1045 in [[ https://netbox.wikimedia.org/dcim/racks/8...
[10:45:44] <wikibugs>	 (CR) Arnaudb: [C:+2] mariadb: add es204[1-6] [puppet] - https://gerrit.wikimedia.org/r/1087866 (https://phabricator.wikimedia.org/T378146) (owner: Arnaudb)
[10:47:03] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10295572 (ABran-WMF) >>! In T378146#10295315, @ABran-WMF wrote: > done, but given that's my first install task, I'll wait for CR approval to come!  done!
[10:50:50] <logmsgbot>	 !log elukey@cumin2002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
[10:51:43] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087167 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[10:55:56] <wikibugs>	 (CR) Slyngshede: [C:+2] P:idp enable Redis TGT backend [puppet] - https://gerrit.wikimedia.org/r/1087167 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[10:56:44] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10295607 (Marostegui) >>! In T378146#10293606, @Jhancock.wm wrote: > @Marostegui this should make it diverse. lmk if you want something different. > e...
[10:56:50] <wikibugs>	 ops-codfw, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10295608 (elukey) >>! In T371400#10295541, @MatthewVernon wrote: > Alas :(  >  > I think adjusting the fact is the way to go? Presumably it now needs to...
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1100)
[11:05:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[11:05:50] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] "Confirming that lines were only moved around and not changed, as far as I can tell." [mediawiki-config] - https://gerrit.wikimedia.org/r/1085572 (owner: Hamish)
[11:06:45] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q4:rack/setup/install ganeti1039 to ganeti1052 - https://phabricator.wikimedia.org/T365650#10295649 (MoritzMuehlenhoff) I had been running into issues with moving VMs to ganeti1041 this morning (which is already added to the Ganeti cluster) and...
[11:08:09] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
[11:08:20] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q4:rack/setup/install ganeti1039 to ganeti1052 - https://phabricator.wikimedia.org/T365650#10295654 (ops-monitoring-bot) Draining ganeti1041.eqiad.wmnet of running VMs
[11:17:12] <wikibugs>	 (PS1) Vgutierrez: hiera,liberica: Disable rp_filter [puppet] - https://gerrit.wikimedia.org/r/1087887 (https://phabricator.wikimedia.org/T377127)
[11:18:05] <wikibugs>	 (PS2) Vgutierrez: hiera,liberica: Disable rp_filter [puppet] - https://gerrit.wikimedia.org/r/1087887 (https://phabricator.wikimedia.org/T377127)
[11:18:10] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087887 (https://phabricator.wikimedia.org/T377127) (owner: Vgutierrez)
[11:19:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:19:56] <wikibugs>	 (CR) Vgutierrez: [C:+2] hiera,liberica: Disable rp_filter [puppet] - https://gerrit.wikimedia.org/r/1087887 (https://phabricator.wikimedia.org/T377127) (owner: Vgutierrez)
[11:25:48] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Deprecate system::role for memcached/redis roles [puppet] - https://gerrit.wikimedia.org/r/1083160 (owner: Muehlenhoff)
[11:30:03] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
[11:30:06] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[11:30:20] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[11:31:53] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[11:32:42] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[11:34:03] <icinga-wm>	 PROBLEM - Host ganeti1041 is DOWN: PING CRITICAL - Packet loss = 100%
[11:34:05] <icinga-wm>	 PROBLEM - Host ganeti1044 is DOWN: PING CRITICAL - Packet loss = 100%
[11:35:12] <elukey>	 these two are me --^
[11:36:33] <icinga-wm>	 RECOVERY - Host ganeti1044 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms
[11:37:07] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[11:37:17] <icinga-wm>	 RECOVERY - Host ganeti1041 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms
[11:37:37] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti1041:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:37:57] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[11:38:20] <wikibugs>	 (PS1) Muehlenhoff: sre.hosts.provision: Turn virt warning into a hard error [cookbooks] - https://gerrit.wikimedia.org/r/1087889
[11:38:58] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti1041:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:40:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[11:41:41] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: designate: deploy and enable wmcs-nova-fixed-ptr [puppet] - https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192)
[11:42:10] <wikibugs>	 (CR) Elukey: [C:+1] "LGTM, we could also think about silently enabling the flag instead of hard failing, but it is also good to be explicit. Fine to proceed fo" [cookbooks] - https://gerrit.wikimedia.org/r/1087889 (owner: Muehlenhoff)
[11:42:51] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/1087889 (owner: Muehlenhoff)
[11:45:43] <wikibugs>	 (CR) Vgutierrez: [V:+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4456/console" [puppet] - https://gerrit.wikimedia.org/r/1087371 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[11:48:17] <wikibugs>	 (PS1) MVernon: facts: adjust swift_disks fact to handle new SM kit [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400)
[11:48:47] <wikibugs>	 (PS1) Marostegui: control-mariadb-10.6-bookworm: Update version [software] - https://gerrit.wikimedia.org/r/1087892 (https://phabricator.wikimedia.org/T378940)
[11:49:27] <wikibugs>	 (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[11:49:28] <wikibugs>	 (CR) Marostegui: [C:+2] control-mariadb-10.6-bookworm: Update version [software] - https://gerrit.wikimedia.org/r/1087892 (https://phabricator.wikimedia.org/T378940) (owner: Marostegui)
[11:49:57] <wikibugs>	 (Merged) jenkins-bot: control-mariadb-10.6-bookworm: Update version [software] - https://gerrit.wikimedia.org/r/1087892 (https://phabricator.wikimedia.org/T378940) (owner: Marostegui)
[11:50:58] <wikibugs>	 (Abandoned) Fabfur: haproxykafka: restart service on config file changes [puppet] - https://gerrit.wikimedia.org/r/1087371 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[11:52:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[11:53:18] <wikibugs>	 (PS4) Abijeet Patro: tables-catalog: Add translate_message_group_subscriptions table [puppet] - https://gerrit.wikimedia.org/r/1082549 (https://phabricator.wikimedia.org/T372287)
[11:53:21] <wikibugs>	 (PS3) Abijeet Patro: tables-catalog: Add translate_cache table [puppet] - https://gerrit.wikimedia.org/r/1082546 (https://phabricator.wikimedia.org/T370265)
[11:53:26] <wikibugs>	 (CR) Abijeet Patro: tables-catalog: Add translate_cache table (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1082546 (https://phabricator.wikimedia.org/T370265) (owner: Abijeet Patro)
[11:53:28] <wikibugs>	 (CR) Abijeet Patro: tables-catalog: Add translate_message_group_subscriptions table (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1082549 (https://phabricator.wikimedia.org/T372287) (owner: Abijeet Patro)
[11:53:29] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:53:51] <icinga-wm>	 PROBLEM - BFD status on cr2-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[11:54:09] <icinga-wm>	 PROBLEM - OSPF status on cr1-drmrs is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 1/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:55:09] <icinga-wm>	 RECOVERY - OSPF status on cr1-drmrs is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:55:29] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:55:51] <icinga-wm>	 RECOVERY - BFD status on cr2-eqiad is OK: UP: 25 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[11:57:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[11:59:15] <wikibugs>	 (CR) Mvolz: [C:+2] citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1087435 (owner: PipelineBot)
[12:00:05] <jouncebot>	 mvolz: Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1200). Please do the needful.
[12:00:16] <wikibugs>	 (Merged) jenkins-bot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1087435 (owner: PipelineBot)
[12:02:17] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:02:47] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:03:25] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[12:03:53] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[12:05:37] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json
[12:06:22] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[12:06:57] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[12:08:10] <wikibugs>	 (PS1) Arnaudb: mariadb: disable notifications on pc1017 [puppet] - https://gerrit.wikimedia.org/r/1087893
[12:09:35] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool
[12:09:39] <logmsgbot>	 !log arnaudb@cumin1002 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool
[12:09:47] <wikibugs>	 (CR) Arnaudb: [C:+2] mariadb: disable notifications on pc1017 [puppet] - https://gerrit.wikimedia.org/r/1087893 (owner: Arnaudb)
[12:14:08] <wikibugs>	 (CR) Muehlenhoff: [C:+2] sre.hosts.provision: Turn virt warning into a hard error [cookbooks] - https://gerrit.wikimedia.org/r/1087889 (owner: Muehlenhoff)
[12:21:11] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
[12:21:13] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
[12:21:17] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
[12:21:20] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
[12:22:11] <icinga-wm>	 PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1002 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[12:22:25] <icinga-wm>	 PROBLEM - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is CRITICAL: CRITICAL - Uncommitted dbctl configuration changes, check dbctl config diff https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[12:22:40] <wikibugs>	 (PS1) Volans: sre.mysql.pool: fix check for diff [cookbooks] - https://gerrit.wikimedia.org/r/1087895
[12:23:02] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10295844 (ABran-WMF) >>! In T378146#10295607, @Marostegui wrote: >>>! In T378146#10293606, @Jhancock.wm wrote: >> es2043 > [[ https://netbox.wikimedia...
[12:23:19] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json
[12:23:37] <marostegui>	 !log Migrate db1125 to MariaDB 10.6.20 T378940
[12:23:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:23:39] <stashbot>	 T378940: Compile and package MariaDB 10.11.10 and 10.6.20 - https://phabricator.wikimedia.org/T378940
[12:23:48] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json
[12:25:00] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test 1087895
[12:25:42] <wikibugs>	 (CR) Arnaudb: "tested on db1206 →" [cookbooks] - https://gerrit.wikimedia.org/r/1087895 (owner: Volans)
[12:25:49] <wikibugs>	 (CR) Arnaudb: [C:+1] sre.mysql.pool: fix check for diff [cookbooks] - https://gerrit.wikimedia.org/r/1087895 (owner: Volans)
[12:27:11] <icinga-wm>	 RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin1002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[12:27:25] <icinga-wm>	 RECOVERY - Uncommitted dbctl configuration changes- check dbctl config diff on cumin2002 is OK: OK - no diffs https://wikitech.wikimedia.org/wiki/Dbctl%23Uncommitted_dbctl_diffs
[12:27:35] <arnaudb>	 sorry for the noise 
[12:28:43] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10295858 (Marostegui) I would suggest to update the task so this new racking proposal is clearer.
[12:30:36] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10295863 (ABran-WMF)
[12:31:21] <wikibugs>	 (CR) Volans: [C:+2] sre.mysql.pool: fix check for diff [cookbooks] - https://gerrit.wikimedia.org/r/1087895 (owner: Volans)
[12:34:23] <wikibugs>	 (PS1) Slyngshede: Upgrade CAS and enable Redis TGT [dns] - https://gerrit.wikimedia.org/r/1087896 (https://phabricator.wikimedia.org/T377728)
[12:37:05] <wikibugs>	 (Merged) jenkins-bot: sre.mysql.pool: fix check for diff [cookbooks] - https://gerrit.wikimedia.org/r/1087895 (owner: Volans)
[12:40:10] <wikibugs>	 (CR) Nikerabbit: [C:+1] tables-catalog: Add translate_message_group_subscriptions table [puppet] - https://gerrit.wikimedia.org/r/1082549 (https://phabricator.wikimedia.org/T372287) (owner: Abijeet Patro)
[12:40:12] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test 1087895
[12:40:19] <wikibugs>	 (CR) Nikerabbit: [C:+1] tables-catalog: Add translate_cache table [puppet] - https://gerrit.wikimedia.org/r/1082546 (https://phabricator.wikimedia.org/T370265) (owner: Abijeet Patro)
[12:40:40] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM, idp2004 appears to work fine" [dns] - https://gerrit.wikimedia.org/r/1087896 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[12:41:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
[12:41:35] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295895 (ops-monitoring-bot) VM ml-etcd1001.eqiad.wmnet switching disk type to drbd
[12:43:27] <wikibugs>	 (CR) Arturo Borrero Gonzalez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192) (owner: Arturo Borrero Gonzalez)
[12:45:56] <wikibugs>	 (CR) Arnaudb: [C:+2] mariadb: productionize db2236 [puppet] - https://gerrit.wikimedia.org/r/1087202 (https://phabricator.wikimedia.org/T373579) (owner: Arnaudb)
[12:50:56] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
[12:52:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[12:52:13] <wikibugs>	 (CR) Slyngshede: [C:+2] Upgrade CAS and enable Redis TGT [dns] - https://gerrit.wikimedia.org/r/1087896 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[12:52:18] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295908 (ops-monitoring-bot) Draining ganeti1014.eqiad.wmnet of running VMs
[12:52:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
[12:52:53] <slyngs>	 !log IDP/CAS-SSO Enable Redis TGT backend
[12:52:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:54:41] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
[12:54:43] <stashbot>	 T373579: Productionize db22[21-40] - https://phabricator.wikimedia.org/T373579
[12:54:55] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
[12:54:58] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
[12:55:12] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
[12:55:15] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236
[12:55:21] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236
[12:55:45] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
[12:56:39] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295921 (ops-monitoring-bot) VM ml-etcd1001.eqiad.wmnet switching disk type to plain
[12:56:49] <logmsgbot>	 !log arnaudb@cumin1002 dbctl commit (dc=all): 'Cloning db2136 in db2236 for T373579', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json
[12:56:52] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
[12:58:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
[13:00:39] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10295930 (ops-monitoring-bot) VM dse-k8s-etcd1002.eqiad.wmnet switching disk type to drbd
[13:02:17] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet
[13:04:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:07:17] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:08:30] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
[13:16:35] <wikibugs>	 (PS1) Arnaudb: mariadb: productionize db2226 [puppet] - https://gerrit.wikimedia.org/r/1087902 (https://phabricator.wikimedia.org/T373579)
[13:19:11] <wikibugs>	 (PS1) Brouberol: airflow: render the spark/hadoop/hdfs/yarn configuration files [deployment-charts] - https://gerrit.wikimedia.org/r/1087903 (https://phabricator.wikimedia.org/T377928)
[13:20:01] <wikibugs>	 (PS2) Brouberol: airflow: render the spark/hadoop/hdfs/yarn configuration files [deployment-charts] - https://gerrit.wikimedia.org/r/1087903 (https://phabricator.wikimedia.org/T377928)
[13:20:55] <wikibugs>	 (CR) CI reject: [V:-1] airflow: render the spark/hadoop/hdfs/yarn configuration files [deployment-charts] - https://gerrit.wikimedia.org/r/1087903 (https://phabricator.wikimedia.org/T377928) (owner: Brouberol)
[13:24:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:25:55] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:27:17] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:27:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
[13:27:41] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet
[13:27:48] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[13:28:23] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10296011 (ops-monitoring-bot) Draining ganeti1014.eqiad.wmnet of running VMs
[13:28:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
[13:30:02] <wikibugs>	 (CR) Hnowlan: [C:+1] replace list of cassandra hosts with faux values (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1087514 (owner: Eevans)
[13:33:38] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: openstack: designate: deploy and enable wmcs-nova-fixed-ptr [puppet] - https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192)
[13:34:14] <wikibugs>	 (CR) CI reject: [V:-1] openstack: designate: deploy and enable wmcs-nova-fixed-ptr [puppet] - https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192) (owner: Arturo Borrero Gonzalez)
[13:38:07] <denisse>	 !incidents
[13:38:08] <sirenbot>	 5375 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:08] <sirenbot>	 5374 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:08] <sirenbot>	 5373 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:08] <sirenbot>	 5372 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:09] <sirenbot>	 5371 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:09] <sirenbot>	 5370 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:09] <sirenbot>	 5369 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:10] <sirenbot>	 5368 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:38:10] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[13:39:22] <wikibugs>	 (PS3) Brouberol: airflow: render the spark/hadoop/hdfs/yarn configuration files [deployment-charts] - https://gerrit.wikimedia.org/r/1087903 (https://phabricator.wikimedia.org/T377928)
[13:41:59] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
[13:42:25] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[13:43:09] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[13:43:53] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10296070 (ops-monitoring-bot) VM dse-k8s-etcd1002.eqiad.wmnet switching disk type to plain
[13:44:07] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
[13:47:29] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
[13:50:30] <wikibugs>	 (PS4) Brouberol: airflow: render the spark/hadoop/hdfs/yarn configuration files [deployment-charts] - https://gerrit.wikimedia.org/r/1087903 (https://phabricator.wikimedia.org/T377928)
[13:52:09] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
[13:52:28] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[13:52:42] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10296112 (MoritzMuehlenhoff)
[13:52:50] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10296116 (ops-monitoring-bot) Draining ganeti1014.eqiad.wmnet of running VMs
[13:53:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:54:20] <wikibugs>	 (PS1) Muehlenhoff: Add ganeti1045/ganeti1046 as Ganeti nodes [puppet] - https://gerrit.wikimedia.org/r/1087912 (https://phabricator.wikimedia.org/T378921)
[13:57:10] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add ganeti1045/ganeti1046 as Ganeti nodes [puppet] - https://gerrit.wikimedia.org/r/1087912 (https://phabricator.wikimedia.org/T378921) (owner: Muehlenhoff)
[13:57:29] <wikibugs>	 (CR) Marostegui: [C:-1] "Needs to be included in s2" [puppet] - https://gerrit.wikimedia.org/r/1087902 (https://phabricator.wikimedia.org/T373579) (owner: Arnaudb)
[13:58:25] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: wmf_auto_restart_prometheus-mcrouter-exporter.service on idp2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1400).
[14:00:05] <jouncebot>	 Hamishcz and Lucas_WMDE: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:45] <wikibugs>	 (PS5) Arturo Borrero Gonzalez: openstack: designate: deploy and enable wmcs-nova-fixed-ptr [puppet] - https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192)
[14:01:06] <wikibugs>	 (CR) Hnowlan: [C:+1] changeprop-jobqueue: update to 2024-11-05-170900-production [deployment-charts] - https://gerrit.wikimedia.org/r/1087557 (https://phabricator.wikimedia.org/T356241) (owner: Scott French)
[14:01:40] <wikibugs>	 (PS1) Slyngshede: P:idp enable Redis TGT for all of production. [puppet] - https://gerrit.wikimedia.org/r/1087913 (https://phabricator.wikimedia.org/T377728)
[14:02:27] <wikibugs>	 SRE, Dumps 2.0, Dumps-Generation, Patch-For-Review: Dumps generation cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#10296152 (ABran-WMF) it seems that the issue occured again [[ https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-data-persistence/202411...
[14:02:40] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
[14:02:44] <stashbot>	 T378453: Testing liberica with ncredir@eqiad - https://phabricator.wikimedia.org/T378453
[14:02:46] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
[14:03:07] <Lucas_WMDE>	 o/
[14:03:09] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4457/console" [puppet] - https://gerrit.wikimedia.org/r/1087913 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[14:03:11] <Lucas_WMDE>	 I can deploy!
[14:04:16] <Lucas_WMDE>	 waiting for Hamishcz to show up, I think
[14:04:26] <Lucas_WMDE>	 my backport will take a long time so I’d prefer not to do that first
[14:04:30] <Lucas_WMDE>	 (though I guess i can still +2 it)
[14:04:36] <wikibugs>	 (PS2) Dzahn: admin: add group approvers for druid-admins, htmldumps-admin, udp2log-users [puppet] - https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465)
[14:04:49] <wikibugs>	 (CR) Dzahn: admin: add group approvers for druid-admins, htmldumps-admin, udp2log-users (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[14:04:59] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+2] "starting gate-and-submit ahead of deployment" [extensions/Wikibase] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087877 (https://phabricator.wikimedia.org/T323778) (owner: Lucas Werkmeister (WMDE))
[14:05:03] <wikibugs>	 (CR) Dzahn: "Ok, gotcha!" [puppet] - https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[14:06:56] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4458/co" [puppet] - https://gerrit.wikimedia.org/r/1087913 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[14:09:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus-ganeti-exporter.service on ganeti1046:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:10:16] <wikibugs>	 ops-codfw, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10296184 (Jhancock.wm) it is! must have overlooked it. thanks!
[14:10:18] <wikibugs>	 ops-codfw, SRE, DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T379116#10296181 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[14:10:21] <wikibugs>	 ops-codfw, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10296186 (Jhancock.wm)
[14:10:40] <wikibugs>	 ops-codfw, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10296188 (Jhancock.wm) a:Jhancock.wm
[14:10:55] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-ganeti-exporter.service on ganeti1046:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:13:04] <Hamishcz>	 Sorry i was stuck by traffic jam, is the windows still available pls?
[14:13:36] <Hamishcz>	 for https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1085572
[14:13:56] <Lucas_WMDE>	 yes! hi!
[14:14:18] <Hamishcz>	 Lucas_WMDE: hi! sorry for waiting, again :)
[14:14:35] <Hamishcz>	 I found your +1 and was going to ping u lol
[14:14:38] <wikibugs>	 (PS1) Fabfur: hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931)
[14:14:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045
[14:15:23] <wikibugs>	 (PS1) Ayounsi: Policy BGP_Infra_In: add term prioritize_experimental [homer/public] - https://gerrit.wikimedia.org/r/1087918 (https://phabricator.wikimedia.org/T378453)
[14:15:23] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[14:15:30] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1085572 (owner: Hamish)
[14:15:54] <wikibugs>	 (PS2) Fabfur: hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931)
[14:16:01] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045
[14:16:15] <wikibugs>	 (Merged) jenkins-bot: Cleanup for logo related file [mediawiki-config] - https://gerrit.wikimedia.org/r/1085572 (owner: Hamish)
[14:16:43] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1085572|Cleanup for logo related file]]
[14:16:58] <Hamishcz>	 Lucas_WMDE: Brilliant, and thank you!
[14:16:59] <wikibugs>	 (CR) Cathal Mooney: [C:+1] Policy BGP_Infra_In: add term prioritize_experimental [homer/public] - https://gerrit.wikimedia.org/r/1087918 (https://phabricator.wikimedia.org/T378453) (owner: Ayounsi)
[14:17:31] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[14:18:23] <wikibugs>	 (CR) Gmodena: [C:+1] hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[14:18:30] <wikibugs>	 (CR) Ayounsi: [C:+2] Policy BGP_Infra_In: add term prioritize_experimental [homer/public] - https://gerrit.wikimedia.org/r/1087918 (https://phabricator.wikimedia.org/T378453) (owner: Ayounsi)
[14:19:03] <wikibugs>	 (Merged) jenkins-bot: Policy BGP_Infra_In: add term prioritize_experimental [homer/public] - https://gerrit.wikimedia.org/r/1087918 (https://phabricator.wikimedia.org/T378453) (owner: Ayounsi)
[14:19:15] <icinga-wm>	 PROBLEM - Host cp2031 is DOWN: PING CRITICAL - Packet loss = 100%
[14:19:18] <sukhe>	 huh
[14:19:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
[14:19:34] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 hamishz, lucaswerkmeister-wmde: Backport for [[gerrit:1085572|Cleanup for logo related file]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:19:41] <sukhe>	 !log depool cp2031
[14:19:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:49] <Lucas_WMDE>	 Hamishcz: can you check that the logos still look like they’re supposed to?
[14:19:55] <mutante>	 sukhe: I can take a look at mgmt if you are already busy
[14:19:59] <Lucas_WMDE>	 (IIUC there should be no change from this deployment, it’s just cleaning up the order)
[14:20:05] <sukhe>	 mutante: thanks but can look as well
[14:20:08] <mutante>	 ok
[14:20:34] <wikibugs>	 (CR) Vgutierrez: "I'd go a step further and set the topics on hieradata/role/common/ rather than per DC" [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[14:20:56] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet
[14:20:59] <icinga-wm>	 RECOVERY - Host cp2031 is UP: PING OK - Packet loss = 0%, RTA = 30.28 ms
[14:21:48] <sukhe>	 mutante: thanks for the offer <3
[14:22:07] <mutante>	 sukhe: yw, looks like it rebooted itself?
[14:22:29] <sukhe>	 Description: The power input for power supply 2 is lost.
[14:22:57] <mutante>	 maybe check if dcops-codfw is on site 
[14:23:21] <mutante>	 btw there was already a broken PSU on this very host in the past  https://phabricator.wikimedia.org/T335110
[14:23:28] <wikibugs>	 (Merged) jenkins-bot: Document available wbformatvalue options [extensions/Wikibase] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087877 (https://phabricator.wikimedia.org/T323778) (owner: Lucas Werkmeister (WMDE))
[14:23:41] <Lucas_WMDE>	 ^ I’ll deploy that backport soon, hopefully
[14:23:50] <Hamishcz>	 Lucas_WMDE: doing, need to check w/ files so may need some time
[14:24:08] <mutante>	 sukhe: you would think losing one of 2 PSUs doesnt cause this :/
[14:24:58] <sukhe>	 mutante: yeah, Jenn is on site so she is looking
[14:24:58] <mutante>	 maybe it was the power cable after all. unfortunately the ticket doesnt say how it was fixed
[14:25:02] <mutante>	 cool
[14:26:37] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
[14:26:55] <Hamishcz>	 Lucas_WMDE: Confirm the logos are good
[14:27:00] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 hamishz, lucaswerkmeister-wmde: Continuing with sync
[14:27:09] <Lucas_WMDE>	 ok, thanks!
[14:27:24] <Hamishcz>	 my pleasure:)
[14:28:01] <wikibugs>	 (CR) Elukey: "The approach looks good, and it seems working but to be honest it seems to make the whole readability and maintainability to decrease (eve" [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[14:28:12] <wikibugs>	 (PS1) FNegri: Add komla to wmcs-roots [puppet] - https://gerrit.wikimedia.org/r/1087919 (https://phabricator.wikimedia.org/T379159)
[14:30:56] <wikibugs>	 (PS3) Fabfur: hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931)
[14:31:23] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
[14:31:25] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
[14:31:30] <stashbot>	 T378453: Testing liberica with ncredir@eqiad - https://phabricator.wikimedia.org/T378453
[14:31:45] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1085572|Cleanup for logo related file]] (duration: 15m 01s)
[14:31:58] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[14:31:59] <wikibugs>	 (CR) Elukey: "The regex could be simplified even more to (ata-\d.\d|scsi-\d:\d:\d{2}:\d)-part4" [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
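Elukey's suggested pattern can be exercised with a short Python sketch. The device names below are hypothetical /dev/disk/by-path style strings chosen for illustration; they are not taken from the swift_disks fact or from real SM hosts. One detail worth noticing: the `.` in `\d.\d` is unescaped, so it matches any character, not only a literal dot; `\d\.\d` would be stricter.

```python
import re

# Elukey's suggested pattern, copied verbatim from the review comment.
# The unescaped "." in "\d.\d" matches ANY character, not just a dot.
PART4_RE = re.compile(r"(ata-\d.\d|scsi-\d:\d:\d{2}:\d)-part4")

# Hypothetical by-path style names, for illustration only.
SAMPLES = [
    "pci-0000:00:17.0-ata-1.0-part4",        # ATA disk, partition 4: match
    "pci-0000:18:00.0-scsi-0:0:10:0-part4",  # SCSI, two-digit target: match
    "pci-0000:18:00.0-scsi-0:0:1:0-part4",   # one-digit target: no match
    "pci-0000:00:17.0-ata-1.0-part3",        # partition 3: no match
]


def matching(names):
    """Return the names in which PART4_RE finds a partition-4 device."""
    return [name for name in names if PART4_RE.search(name)]


if __name__ == "__main__":
    for name in matching(SAMPLES):
        print(name)
```

Because the pattern uses `\d{2}` for the SCSI target, single-digit targets are excluded; whether that is intended for the new hardware is not stated in the log.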
[14:33:14] <Lucas_WMDE>	 I’m starting my backport now, since it was already merged
[14:33:22] <Lucas_WMDE>	 it will probably take more than 30 minutes, because it touches i18n :(
[14:33:28] <Lucas_WMDE>	 sorry in advance
[14:33:58] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1087877|Document available wbformatvalue options (T323778)]]
[14:34:01] <stashbot>	 T323778: [ACTION-API] [TECH] Wikibase doesn’t validate formatter options, can crash with different TypeErrors - https://phabricator.wikimedia.org/T323778
[14:34:19] <wikibugs>	 (CR) Hnowlan: [C:+1] changeprop-jobqueue: set max poll interval and revert concurrency [deployment-charts] - https://gerrit.wikimedia.org/r/1087558 (https://phabricator.wikimedia.org/T356241) (owner: Scott French)
[14:36:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:36:49] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046
[14:40:38] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087913 (https://phabricator.wikimedia.org/T377728) (owner: Slyngshede)
[14:42:28] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046
[14:43:09] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
[14:45:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 95.68% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[14:46:30] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10296409 (MoritzMuehlenhoff)
[14:47:12] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10296412 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff All done!
[14:47:33] <wikibugs>	 (CR) Brouberol: [C:+1] spark: Avoid Ferm-specific syntax (take 2) [puppet] - https://gerrit.wikimedia.org/r/1087488 (owner: Muehlenhoff)
[14:48:32] <Lucas_WMDE>	 scap is currently at sync-testservers-k8s
[14:48:39] <moritzm>	 !log installing usb.ids updates from Bookworm point release
[14:48:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:46] <Lucas_WMDE>	 taking longer than usual, probably because the image diff is bigger due to the l10n rebuild :/
[14:50:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
[14:50:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 92.08% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
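The MaxConntrack alert reports how full the netfilter connection-tracking table is. A minimal sketch, assuming the percentage is simply tracked entries divided by the table limit, the same two quantities Linux exposes as /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max. The alert's actual firing threshold is not stated in this log and is not modelled here.

```python
from pathlib import Path

NETFILTER = Path("/proc/sys/net/netfilter")


def conntrack_usage_pct(count: int, limit: int) -> float:
    """Percentage of the conntrack table currently in use (assumed formula)."""
    if limit <= 0:
        raise ValueError("conntrack limit must be positive")
    return 100.0 * count / limit


def read_local_usage(base: Path = NETFILTER) -> float:
    """Read the live counters; needs Linux with nf_conntrack loaded."""
    count = int((base / "nf_conntrack_count").read_text())
    limit = int((base / "nf_conntrack_max").read_text())
    return conntrack_usage_pct(count, limit)


if __name__ == "__main__":
    if (NETFILTER / "nf_conntrack_count").exists():
        print(f"{read_local_usage():.2f}% of the conntrack table in use")
    else:
        # Hypothetical numbers when not running on a conntrack-enabled host.
        print(f"{conntrack_usage_pct(95_680, 100_000):.2f}% (hypothetical)")
```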
[14:51:14] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.7 point update - https://phabricator.wikimedia.org/T373783#10296453 (MoritzMuehlenhoff)
[14:51:27] <moritzm>	 !log installing php7.4 security updates
[14:51:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:19] <Lucas_WMDE>	 now at scap-cdb-rebuild (still in the test servers)
[14:59:40] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for [[gerrit:1087877|Document available wbformatvalue options (T323778)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:59:43] <Lucas_WMDE>	 yay
[14:59:43] <stashbot>	 T323778: [ACTION-API] [TECH] Wikibase doesn’t validate formatter options, can crash with different TypeErrors - https://phabricator.wikimedia.org/T323778
[14:59:44] <Lucas_WMDE>	 testing…
[15:00:05] <Lucas_WMDE>	 works \o/
[15:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1500)
[15:00:06] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync
[15:02:28] <wikibugs>	 (CR) Eevans: [C:+2] replace list of cassandra hosts with faux values (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1087514 (owner: Eevans)
[15:03:16] <wikibugs>	 (CR) Eevans: [C:+2] replace list of cassandra hosts with faux values (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1087514 (owner: Eevans)
[15:04:34] <Lucas_WMDE>	 still deploying, sorry :(
[15:04:42] <wikibugs>	 (Merged) jenkins-bot: replace list of cassandra hosts with faux values [deployment-charts] - https://gerrit.wikimedia.org/r/1087514 (owner: Eevans)
[15:05:41] <wikibugs>	 (PS1) Dzahn: admin: add group approver for udp2log-users [puppet] - https://gerrit.wikimedia.org/r/1087926 (https://phabricator.wikimedia.org/T276465)
[15:06:03] <wikibugs>	 (PS2) MVernon: facts: adjust swift_disks fact to handle new SM kit [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400)
[15:06:14] <wikibugs>	 (PS1) Esanders: Deploy EditCheck (references) to hiwiki, bnwiki, idwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1087927 (https://phabricator.wikimedia.org/T366381)
[15:06:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:07:13] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet
[15:08:35] <wikibugs>	 (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[15:09:04] <wikibugs>	 (CR) Elukey: [C:+1] facts: adjust swift_disks fact to handle new SM kit [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[15:09:53] <wikibugs>	 (CR) MVernon: [C:+2] facts: adjust swift_disks fact to handle new SM kit [puppet] - https://gerrit.wikimedia.org/r/1087891 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[15:11:31] <wikibugs>	 (PS2) Abijeet Patro: Translate: Enable message bundle Scribunto module [mediawiki-config] - https://gerrit.wikimedia.org/r/1087914 (https://phabricator.wikimedia.org/T359918)
[15:12:43] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087877|Document available wbformatvalue options (T323778)]] (duration: 38m 45s)
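The reported scap duration can be cross-checked against the IRC timestamps of the matching "Started" and "Finished" lines. A small helper, assuming both stamps are same-day UTC times (or at most one midnight apart) and mimicking the zero-padded `MMm SSs` display seen in these log lines:

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> str:
    """Return the time between two HH:MM:SS stamps, formatted like '38m 45s'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # the window crossed midnight UTC
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}m {seconds:02d}s"


if __name__ == "__main__":
    print(elapsed("14:33:58", "15:12:43"))  # Wikibase backport
    print(elapsed("14:16:43", "14:31:45"))  # logo cleanup backport
```

For the Wikibase backport this gives 38m 45s, matching scap. For the earlier logo backport it gives 15m 02s against scap's 15m 01s, presumably because the IRC stamps record when the bot's message arrived rather than scap's own start and end times.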
[15:12:46] <stashbot>	 T323778: [ACTION-API] [TECH] Wikibase doesn’t validate formatter options, can crash with different TypeErrors - https://phabricator.wikimedia.org/T323778
[15:13:08] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
[15:14:02] <wikibugs>	 (PS2) Arnaudb: mariadb: productionize db2226 [puppet] - https://gerrit.wikimedia.org/r/1087902 (https://phabricator.wikimedia.org/T373579)
[15:14:22] <wikibugs>	 (CR) Arnaudb: "good catch!" [puppet] - https://gerrit.wikimedia.org/r/1087902 (https://phabricator.wikimedia.org/T373579) (owner: Arnaudb)
[15:18:42] <mutante>	 !log gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab  (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" T379166
[15:18:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:44] <stashbot>	 T379166: SystemdUnitFailed - wmf_auto_restart_ssh-gitlab.service on gitlab1004:9100 - https://phabricator.wikimedia.org/T379166
[15:19:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:19:29] <wikibugs>	 (CR) Nikerabbit: [C:+1] Translate: Enable message bundle Scribunto module [mediawiki-config] - https://gerrit.wikimedia.org/r/1087914 (https://phabricator.wikimedia.org/T359918) (owner: Abijeet Patro)
[15:21:26] * Lucas_WMDE done deploying
[15:23:15] <wikibugs>	 (PS1) Slyngshede: P:netbox remove CAS authentication leftovers. [puppet] - https://gerrit.wikimedia.org/r/1087931 (https://phabricator.wikimedia.org/T371892)
[15:24:54] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236
[15:26:02] <wikibugs>	 (CR) CI reject: [V:-1] P:netbox remove CAS authentication leftovers. [puppet] - https://gerrit.wikimedia.org/r/1087931 (https://phabricator.wikimedia.org/T371892) (owner: Slyngshede)
[15:27:32] <wikibugs>	 (PS1) Clément Goubert: Revert "profile::docker::report: use the internal registry endpoint" [puppet] - https://gerrit.wikimedia.org/r/1087932
[15:28:45] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+1] Revert "profile::docker::report: use the internal registry endpoint" [puppet] - https://gerrit.wikimedia.org/r/1087932 (owner: Clément Goubert)
[15:29:46] <wikibugs>	 (CR) Clément Goubert: [C:+2] Revert "profile::docker::report: use the internal registry endpoint" [puppet] - https://gerrit.wikimedia.org/r/1087932 (owner: Clément Goubert)
[15:31:48] <moritzm>	 !log installing Linux 5.10.226 on bullseye hosts
[15:31:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:08] <wikibugs>	 (PS1) Andrea Denisse: titan: bring thanos 5m retention to 44w [puppet] - https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927)
[15:33:34] <wikibugs>	 (PS2) Slyngshede: P:netbox remove CAS authentication leftovers. [puppet] - https://gerrit.wikimedia.org/r/1087931 (https://phabricator.wikimedia.org/T371892)
[15:33:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2024.10.19 - 2024.11.08): an-presto1018.eqiad.wmnet: DRAC is down - https://phabricator.wikimedia.org/T378854#10296585 (bking) DC Ops,  Per IRC conversation in dc-ops channel , Cathal checked the network plumbing and everything looks good. `bmc-info` fro...
[15:33:49] <wikibugs>	 (CR) Dzahn: [C:+1] "confirmed by Leo on IRC" [puppet] - https://gerrit.wikimedia.org/r/1087926 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[15:33:58] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087926 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[15:34:28] <wikibugs>	 (CR) Dzahn: [C:+2] admin: add group approver for udp2log-users [puppet] - https://gerrit.wikimedia.org/r/1087926 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[15:35:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:36:14] <wikibugs>	 (CR) CI reject: [V:-1] P:netbox remove CAS authentication leftovers. [puppet] - https://gerrit.wikimedia.org/r/1087931 (https://phabricator.wikimedia.org/T371892) (owner: Slyngshede)
[15:36:19] <wikibugs>	 (CR) LMata: [V:+1] admin: add group approver for udp2log-users [puppet] - https://gerrit.wikimedia.org/r/1087926 (https://phabricator.wikimedia.org/T276465) (owner: Dzahn)
[15:37:13] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to ldap/nda for Deepesha Burse WMDE - https://phabricator.wikimedia.org/T378182#10296591 (Deepesha_WMDE) @MatthewVernon here is my username: Deepesha_WMDE. Please let me know if you need anything else from me. Thanks!
[15:38:10] <wikibugs>	 (PS2) Ssingh: hiera: set profile::lvs::do_ipv6_ra_primary on lvs4010 [puppet] - https://gerrit.wikimedia.org/r/1082257 (https://phabricator.wikimedia.org/T358260)
[15:39:54] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: eqiad: request 1 VM for wdqs-categories - https://phabricator.wikimedia.org/T376079#10296594 (bking)
[15:41:25] <wikibugs>	 (CR) Ssingh: [C:+2] hiera: set profile::lvs::do_ipv6_ra_primary on lvs4010 [puppet] - https://gerrit.wikimedia.org/r/1082257 (https://phabricator.wikimedia.org/T358260) (owner: Ssingh)
[15:41:31] <wikibugs>	 SRE, Infrastructure Security, Infrastructure-Foundations, Patch-For-Review: puppet admin module: Assign approvers to unix groups - https://phabricator.wikimedia.org/T276465#10296614 (Dzahn)
[15:42:37] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:43:08] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:48:19] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:48:49] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:49:35] <wikibugs>	 (PS4) Fabfur: hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931)
[15:50:35] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:50:56] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[15:51:05] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:53:53] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[15:55:03] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
[15:55:04] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
[15:55:08] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
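The downtime target `cr[3-4]-ulsfo` uses a bracketed range that stands for two routers, cr3-ulsfo and cr4-ulsfo. Cumin host lists follow ClusterShell's NodeSet syntax, which supports considerably more than this; the sketch below is a simplified, hypothetical expander that handles only numeric ranges and comma lists inside brackets, enough to show what such a pattern denotes. For real use, ClusterShell's own NodeSet is the appropriate tool.

```python
import re

# Matches the first bracket group, e.g. "[3-4]" or "[01-03,07]".
_GROUP = re.compile(r"\[([^\]]+)\]")


def expand(pattern: str) -> list[str]:
    """Expand bracketed numeric ranges/lists (simplified, hypothetical helper)."""
    m = _GROUP.search(pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[: m.start()], pattern[m.end():]
    values: list[str] = []
    for item in m.group(1).split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            width = len(lo)  # preserve zero padding, e.g. [01-03]
            values.extend(str(n).zfill(width) for n in range(int(lo), int(hi) + 1))
        else:
            values.append(item)
    # Recurse so that any later bracket groups are expanded too.
    return [head + v + rest for v in values for rest in expand(tail)]


if __name__ == "__main__":
    print(expand("cr[3-4]-ulsfo"))
```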
[15:55:37] <topranks>	 !log rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work T358260
[15:55:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:40] <stashbot>	 T358260: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260
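T358260 concerns disabling IPv6 router-advertisement acceptance on the non-default LVS interfaces; the kernel knob for that is the per-interface `net.ipv6.conf.<iface>.accept_ra` sysctl. The sketch below generates such settings under an assumed policy of accept_ra=1 on the primary interface and 0 everywhere else. The interface names are hypothetical, and the values actually applied by the puppet change (profile::lvs::do_ipv6_ra_primary) are not shown in this log.

```python
def ra_sysctls(interfaces: list[str], primary: str) -> dict[str, int]:
    """Build accept_ra settings: 1 on the primary interface, 0 on all others.

    The 1/0 policy is an assumption for illustration. Interface names are
    assumed to contain no dots, since sysctl keys use dots as separators.
    """
    if primary not in interfaces:
        raise ValueError(f"primary interface {primary!r} not in {interfaces}")
    return {
        f"net.ipv6.conf.{iface}.accept_ra": 1 if iface == primary else 0
        for iface in interfaces
    }


def render(settings: dict[str, int]) -> str:
    """Render settings as sysctl.conf-style 'key = value' lines, sorted by key."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    # Hypothetical interface names for an LVS host.
    print(render(ra_sysctls(["eno1", "vlan1201", "vlan1202"], primary="eno1")))
```

The reboot logged above is the kind of check that confirms such settings are applied at boot rather than only at runtime.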
[15:57:21] <logmsgbot>	 !log mfossati@deploy2002 Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0
[15:57:28] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  fransc1001 - vriley@cumin1002"
[15:57:50] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  fransc1001 - vriley@cumin1002"
[15:57:50] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:57:58] <logmsgbot>	 !log mfossati@deploy2002 Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s)
[15:58:13] <wikibugs>	 (CR) Jcrespo: [C:-1] "This needs test-s4 extensive testing, blocking merge until done." [cookbooks] - https://gerrit.wikimedia.org/r/1087860 (owner: Volans)
[15:58:50] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[15:59:20] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:01:26] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
[16:01:43] <wikibugs>	 (PS1) MVernon: swift: use regex to handle Dell & SM accounts|containers disks [puppet] - https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400)
[16:02:02] <wikibugs>	 (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[16:08:34] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:08:59] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:10:06] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:10:16] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236
[16:10:45] <wikibugs>	 (PS1) Vgutierrez: liberica: Harden healthcheck systemd unit [puppet] - https://gerrit.wikimedia.org/r/1087939
[16:11:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[16:13:35] <wikibugs>	 (Abandoned) Fabfur: hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087917 (https://phabricator.wikimedia.org/T377931) (owner: Fabfur)
[16:14:10] <wikibugs>	 (CR) Cwhite: [C:+1] "Looks good, thanks!" [puppet] - https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: Andrea Denisse)
[16:14:31] <wikibugs>	 (PS2) MVernon: swift: use regex to handle Dell & SM accounts|containers disks [puppet] - https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400)
[16:14:41] <wikibugs>	 (CR) MVernon: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400) (owner: MVernon)
[16:15:37] <wikibugs>	 (PS5) FNegri: WMCS: split cloudvirt alerts from generic nodes [alerts] - https://gerrit.wikimedia.org/r/1084782 (https://phabricator.wikimedia.org/T375479)
[16:16:05] <wikibugs>	 (PS1) Fabfur: hiera: split haproxykafka topics based on role [puppet] - https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931)
[16:16:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[16:16:43] <wikibugs>	 (PS3) MVernon: swift: use regex to handle Dell & SM accounts|containers disks [puppet] - https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400)
[16:16:50] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[16:16:51] <wikibugs>	 (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400) (owner: 10MVernon)
[16:16:58] <wikibugs>	 (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931) (owner: 10Fabfur)
[16:17:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] WMCS: split cloudvirt alerts from generic nodes [alerts] - 10https://gerrit.wikimedia.org/r/1084782 (https://phabricator.wikimedia.org/T375479) (owner: 10FNegri)
[16:17:14] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[16:18:11] <wikibugs>	 (03PS2) 10Vgutierrez: liberica: Harden healthcheck systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/1087939
[16:18:11] <wikibugs>	 (03PS1) 10Vgutierrez: liberica: Harden cp systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/1087941
[16:18:31] <wikibugs>	 (03CR) 10Ottomata: hiera: split haproxykafka topics based on role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931) (owner: 10Fabfur)
[16:18:44] <wikibugs>	 (03PS3) 10Gmodena: refinery: gobblin: add webrequest_frontend. [puppet] - 10https://gerrit.wikimedia.org/r/1082434 (https://phabricator.wikimedia.org/T377931)
[16:18:59] <icinga-wm>	 PROBLEM - Disk space on thanos-be1003 is CRITICAL: DISK CRITICAL - free space: /srv/swift-storage/sdh1 176741 MB (4% inode=92%): /srv/swift-storage/sdc1 186571 MB (4% inode=91%): /srv/swift-storage/sdf1 229716 MB (6% inode=91%): /srv/swift-storage/sdg1 206647 MB (5% inode=91%): /srv/swift-storage/sdd1 193936 MB (5% inode=91%): /srv/swift-storage/sde1 199289 MB (5% inode=92%): /srv/swift-storage/sdi1 184636 MB (4% inode=91%): /srv/swift-st
[16:18:59] <icinga-wm>	 k1 181125 MB (4% inode=92%): /srv/swift-storage/sdj1 193339 MB (5% inode=91%): /srv/swift-storage/sdl1 182177 MB (4% inode=91%): /srv/swift-storage/sdm1 189422 MB (4% inode=91%): /srv/swift-storage/sdn1 152482 MB (3% inode=90%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=thanos-be1003&var-datasource=eqiad+prometheus/ops
[16:19:00] <wikibugs>	 (03PS2) 10Vgutierrez: liberica: Harden cp systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/1087941
[16:19:01] <wikibugs>	 (03PS4) 10MVernon: swift: use regex to handle Dell & SM accounts|containers disks [puppet] - 10https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400)
[16:19:08] <wikibugs>	 (03CR) 10MVernon: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400) (owner: 10MVernon)
[16:20:29] <wikibugs>	 (03PS2) 10Fabfur: hiera: split haproxykafka topics based on role [puppet] - 10https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931)
[16:20:36] <wikibugs>	 (03CR) 10Fabfur: hiera: split haproxykafka topics based on role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931) (owner: 10Fabfur)
[16:20:46] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
[16:20:51] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
[16:20:51] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:20:57] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] hiera: split haproxykafka topics based on role [puppet] - 10https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931) (owner: 10Fabfur)
[16:21:23] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] hiera: split haproxykafka topics based on role [puppet] - 10https://gerrit.wikimedia.org/r/1087940 (https://phabricator.wikimedia.org/T377931) (owner: 10Fabfur)
[16:22:13] <wikibugs>	 (03CR) 10MVernon: "I'm not sure if the commit message or code change are what you were aiming for, but I don't think they match..." [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: 10Andrea Denisse)
[16:23:36] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:24:14] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:24:55] <wikibugs>	 (03CR) 10Elukey: [C:03+1] swift: use regex to handle Dell & SM accounts|containers disks [puppet] - 10https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400) (owner: 10MVernon)
[16:25:10] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10296744 (10MoritzMuehlenhoff)
[16:25:15] <wikibugs>	 (03CR) 10MVernon: [C:03+2] swift: use regex to handle Dell & SM accounts|containers disks [puppet] - 10https://gerrit.wikimedia.org/r/1087935 (https://phabricator.wikimedia.org/T371400) (owner: 10MVernon)
[16:25:38] <logmsgbot>	 !log elukey@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
[16:26:00] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:26:25] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:28:43] <icinga-wm>	 PROBLEM - BGP status on lsw1-e1-eqiad.mgmt is CRITICAL: BGP CRITICAL - AS64600/IPv4: Active - PyBal https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:30:11] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192) (owner: 10Arturo Borrero Gonzalez)
[16:31:28] <wikibugs>	 (03PS1) 10Fabfur: hiera: moving haproxykafka common keys to profile [puppet] - 10https://gerrit.wikimedia.org/r/1087943 (https://phabricator.wikimedia.org/T377931)
[16:31:43] <icinga-wm>	 RECOVERY - BGP status on lsw1-e1-eqiad.mgmt is OK: BGP OK - up: 9, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:31:58] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
[16:32:34] <moritzm>	 !log remove ganeti1014 from active ganeti nodes T378921
[16:32:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:44] <stashbot>	 T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921
[16:32:54] <wikibugs>	 (03CR) 10Cwhite: titan: bring thanos 5m retention to 44w (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: 10Andrea Denisse)
[16:33:46] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10296782 (10MoritzMuehlenhoff)
[16:34:24] <wikibugs>	 (03PS1) 10Vgutierrez: liberica: Harden fp systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/1087944
[16:35:05] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1014 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 112 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[16:35:19] <logmsgbot>	 !log jclark@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:35:23] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1014 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[16:36:14] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:37:19] <logmsgbot>	 !log vriley@cumin1002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[16:37:37] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti1014:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:38:25] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to deployment shell access for Kgraessle - https://phabricator.wikimedia.org/T379173 (10Kgraessle) 03NEW
[16:38:37] <wikibugs>	 (03CR) 10Aqu: [C:03+1] refinery: gobblin: add webrequest_frontend. [puppet] - 10https://gerrit.wikimedia.org/r/1082434 (https://phabricator.wikimedia.org/T377931) (owner: 10Gmodena)
[16:39:38] <wikibugs>	 (03CR) 10Xcollazo: [C:03+1] "LGTM. Thanks for the patch!" [puppet] - 10https://gerrit.wikimedia.org/r/1087609 (https://phabricator.wikimedia.org/T378260) (owner: 10Zabe)
[16:40:07] <wikibugs>	 (03CR) 10Xcollazo: [C:03+1] "CCing @btullis@wikimedia.org FYI." [puppet] - 10https://gerrit.wikimedia.org/r/1087609 (https://phabricator.wikimedia.org/T378260) (owner: 10Zabe)
[16:43:31] <wikibugs>	 (03PS2) 10Andrea Denisse: titan: bring thanos 5m retention to 10w [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927)
[16:44:19] <wikibugs>	 (03CR) 10Andrea Denisse: titan: bring thanos 5m retention to 10w (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: 10Andrea Denisse)
[16:44:21] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296813 (10Papaul)
[16:45:42] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296815 (10Papaul)
[16:45:42] <wikibugs>	 (03CR) 10Cwhite: [C:03+1] "Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: 10Andrea Denisse)
[16:45:58] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296817 (10Papaul)
[16:52:26] <wikibugs>	 (03CR) 10MVernon: [C:03+1] "LGTM thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: 10Andrea Denisse)
[16:52:47] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] titan: bring thanos 5m retention to 10w [puppet] - 10https://gerrit.wikimedia.org/r/1087933 (https://phabricator.wikimedia.org/T351927) (owner: 10Andrea Denisse)
[16:55:26] <wikibugs>	 (03PS4) 10Dzahn: admin: add group approvers for druid-admins and htmldumps-admin [puppet] - 10https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465)
[16:58:53] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
[17:04:00] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] "One nit. +1 otherwise.  let me know if I can help merge when you are ready." [puppet] - 10https://gerrit.wikimedia.org/r/1082434 (https://phabricator.wikimedia.org/T377931) (owner: 10Gmodena)
[17:05:34] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.dns.netbox
[17:06:52] <wikibugs>	 (03PS1) 10MVernon: preseed - use ms-be_simple-efi.cfg for new SM Config-J nodes [puppet] - 10https://gerrit.wikimedia.org/r/1087949 (https://phabricator.wikimedia.org/T371400)
[17:07:26] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296916 (10ssingh) `cr1-eqiad` is stated for Nov 13 but note that T376737 is also scheduled for that period (Nov 13, 8 CT) and it might make tricky for both `magru` and `eqiad` to...
[17:09:11] <icinga-wm>	 PROBLEM - Host ms-be2083 is DOWN: PING CRITICAL - Packet loss = 100%
[17:09:41] <icinga-wm>	 RECOVERY - Host ms-be2083 is UP: PING OK - Packet loss = 0%, RTA = 30.42 ms
[17:09:51] <wikibugs>	 (03PS6) 10FNegri: WMCS: split cloudvirt alerts from generic nodes [alerts] - 10https://gerrit.wikimedia.org/r/1084782 (https://phabricator.wikimedia.org/T375479)
[17:11:07] <logmsgbot>	 !log vriley@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  fransw1001 - vriley@cumin1002"
[17:11:11] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update  mgmt  fransw1001 - vriley@cumin1002"
[17:11:11] <logmsgbot>	 !log vriley@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:11:25] <wikibugs>	 (03CR) 10FNegri: WMCS: split cloudvirt alerts from generic nodes (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1084782 (https://phabricator.wikimedia.org/T375479) (owner: 10FNegri)
[17:11:34] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
[17:12:26] <wikibugs>	 (03PS1) 10Dzahn: backup: add /var/lib/gerrit to gerrit repodata backup filesets [puppet] - 10https://gerrit.wikimedia.org/r/1087950 (https://phabricator.wikimedia.org/T338470)
[17:12:44] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296934 (10Papaul)
[17:13:21] <wikibugs>	 (03CR) 10Dzahn: "cc: Hashar: a good example how just changing the user name turns into a rabbit hole, but we can do it." [puppet] - 10https://gerrit.wikimedia.org/r/1087950 (https://phabricator.wikimedia.org/T338470) (owner: 10Dzahn)
[17:13:40] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296938 (10Papaul) @ssingh thanks i forgot about the 13th I update the dates.
[17:13:53] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296937 (10Joe) I see there is a maintenance planned for codfw now, and that the plan is to depool the datacenter. Does this mean we're doing a datacenter switchover?   Because oth...
[17:14:34] <wikibugs>	 (03CR) 10Daimona Eaytoy: [C:03+1] "LGTM!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1078764 (https://phabricator.wikimedia.org/T376061) (owner: 10ZhaoFJx)
[17:14:37] <wikibugs>	 (03PS1) 10Majavah: WMCS: Lookup IPv6 records more generally [puppet] - 10https://gerrit.wikimedia.org/r/1087951
[17:14:37] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
[17:15:11] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:17:12] <hnowlan>	 !log importing debs for mercurius-1.0.1
[17:17:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:17] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[17:18:56] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296958 (10akosiaris) > Upgrades should follow the standard process  The standard process docs are outdated I fear.   > Depool site (optional) > (optional) if codfw, drain mw traff...
[17:18:57] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296959 (10Papaul) >>! In T364092#10296937, @Joe wrote: > I see there is a maintenance planned for codfw now, and that the plan is to depool the datacenter. Does this mean we're do...
[17:18:58] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti1014:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:19:34] <wikibugs>	 (03CR) 10Jcrespo: [C:03+1] "No issue on my side, looks fine, but please check/CC people if it could affect non production (beta or cloud envs)." [puppet] - 10https://gerrit.wikimedia.org/r/1087950 (https://phabricator.wikimedia.org/T338470) (owner: 10Dzahn)
[17:22:01] <wikibugs>	 (03CR) 10Dzahn: "thank you! it will either be non-existing or be larger than 0 bytes, that is for sure since at least some config files will always be in i" [puppet] - 10https://gerrit.wikimedia.org/r/1087950 (https://phabricator.wikimedia.org/T338470) (owner: 10Dzahn)
[17:22:02] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296980 (10Papaul) Thanks @akosiaris @Joe we can hold back on codfw for now and work on eqiad. when we switch back to eqiad we can schedule the upgrade for codfw.
[17:22:34] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10296985 (10Papaul)
[17:27:37] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm
[17:27:55] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10297015 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host mc-gp2006.codfw.wmnet with OS bookworm
[17:28:03] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10297017 (akosiaris) >>! In T364092#10296980, @Papaul wrote: > Thanks @akosiaris @Joe we can hold back on codfw for now and work on eqiad. when we switch back to eqiad we can sche...
[17:29:56] <wikibugs>	 (CR) Dzahn: [C:+2] backup: add /var/lib/gerrit to gerrit repodata backup filesets [puppet] - https://gerrit.wikimedia.org/r/1087950 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
[17:30:46] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10297026 (cmooney) >>! In T364092#10296958, @akosiaris wrote: > codfw will be the primary during that set of dates, it should NOT be depooled.  Agreed.  It should also be possible...
[17:31:50] <urandom>	 jouncebot: next
[17:31:50] <jouncebot>	 In 0 hour(s) and 28 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1800)
[17:32:08] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] preseed - use ms-be_simple-efi.cfg for new SM Config-J nodes [puppet] - 10https://gerrit.wikimedia.org/r/1087949 (https://phabricator.wikimedia.org/T371400) (owner: 10MVernon)
[17:32:51] <wikibugs>	 (03CR) 10MVernon: [C:03+2] preseed - use ms-be_simple-efi.cfg for new SM Config-J nodes [puppet] - 10https://gerrit.wikimedia.org/r/1087949 (https://phabricator.wikimedia.org/T371400) (owner: 10MVernon)
[17:33:18] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297053 (10cmooney) @Jclark-ctr could you also let me know what ports on the fmsw these two were plugged into?  |Device 1|Fron...
[17:35:15] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
[17:37:48] <wikibugs>	 (03CR) 10Btullis: "Looks good. There are a couple of files that I think we could probably exclude and some settings that seem extraneous, but nothing big." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087903 (https://phabricator.wikimedia.org/T377928) (owner: 10Brouberol)
[17:44:07] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+1] "LGTM, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1087379 (https://phabricator.wikimedia.org/T374827) (owner: 10Slyngshede)
[17:45:09] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
[17:48:18] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
[17:58:39] <wikibugs>	 (03CR) 10Majavah: "PCC: https://puppet-compiler.wmflabs.org/output/1087951/4459/" [puppet] - 10https://gerrit.wikimedia.org/r/1087951 (owner: 10Majavah)
[17:59:17] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Thanks for the patch. Taking the liberty to merge it." [puppet] - 10https://gerrit.wikimedia.org/r/1087508 (owner: 10Muehlenhoff)
[17:59:22] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] dns::auth::update: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1087508 (owner: 10Muehlenhoff)
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1800)
[18:03:00] <sukhe>	 !log dummy authdns-update to test CR 10857508
[18:03:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:47] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[18:06:49] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q1:rack/setup/install fransc1001 - https://phabricator.wikimedia.org/T367814#10297264 (VRiley-WMF)
[18:10:26] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
[18:10:26] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
[18:11:15] <wikibugs>	 (03PS1) 10BCornwall: varnish: Increase RSA cert warnings to 5% of views [puppet] - 10https://gerrit.wikimedia.org/r/1087954 (https://phabricator.wikimedia.org/T370837)
[18:11:52] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] varnish: Increase RSA cert warnings to 5% of views [puppet] - 10https://gerrit.wikimedia.org/r/1087954 (https://phabricator.wikimedia.org/T370837) (owner: 10BCornwall)
[18:17:26] <wikibugs>	 (03CR) 10BCornwall: [V:03+2 C:03+2] "`" [puppet] - 10https://gerrit.wikimedia.org/r/1087954 (https://phabricator.wikimedia.org/T370837) (owner: 10BCornwall)
[18:19:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297284 (10phaultfinder)
[18:25:38] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Hardening looks good. [nit] link bug T378341." [puppet] - 10https://gerrit.wikimedia.org/r/1087944 (owner: 10Vgutierrez)
[18:25:53] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "[nit] add Bug: T378341" [puppet] - 10https://gerrit.wikimedia.org/r/1087941 (owner: 10Vgutierrez)
[18:25:56] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "[nit] add Bug: T378341" [puppet] - 10https://gerrit.wikimedia.org/r/1087939 (owner: 10Vgutierrez)
[18:28:38] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297301 (10cmooney) >>! In T377381#10250655, @Jgreen wrote: > There are 6 servers being replaced: > {T369565} > {T369947} > {T...
[18:29:05] <wikibugs>	 (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087943 (https://phabricator.wikimedia.org/T377931) (owner: 10Fabfur)
[18:31:02] <wikibugs>	 (03PS1) 10Scott French: shellbox-syntaxhighlight: 1 of 12 replicas on PHP 8.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087579 (https://phabricator.wikimedia.org/T377038)
[18:34:42] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297318 (10phaultfinder)
[18:39:32] <wikibugs>	 (03CR) 10BCornwall: [C:03+2] idp: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075608 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall)
[18:39:42] <wikibugs>	 (03PS2) 10BCornwall: idp: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075608 (https://phabricator.wikimedia.org/T375569)
[18:39:48] <wikibugs>	 (03CR) 10BCornwall: [V:03+2 C:03+2] idp: Remove rsa-2048 certs [puppet] - 10https://gerrit.wikimedia.org/r/1075608 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall)
[18:41:27] <brett>	 !log Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) (T375569)
[18:41:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:41:31] <stashbot>	 T375569: Remove RSA certificates from puppet - https://phabricator.wikimedia.org/T375569
[18:47:30] <wikibugs>	 (03CR) 10BCornwall: "Marking unresolved." [puppet] - 10https://gerrit.wikimedia.org/r/1075604 (https://phabricator.wikimedia.org/T375569) (owner: 10BCornwall)
[18:50:59] <wikibugs>	 06SRE, 10Domains, 06Traffic, 13Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#10297394 (10BCornwall) Hi, @violetwtf, has Thomas responded? Thanks for getting on this. :)
[19:00:04] <jouncebot>	 jnuche and dduvall: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T1900).
[19:06:17] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 14 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1084883 (https://phabricator.wikimedia.org/T378343) (owner: 10Scardenasmolinar)
[19:07:07] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 07 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1084883 (https://phabricator.wikimedia.org/T378343) (owner: 10Scardenasmolinar)
[19:19:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:19:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297470 (phaultfinder)
[19:21:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297478 (cmooney) All, just to be aware I hit another snag this evening which may be problematic.  When trying to configure...
[19:23:34] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Backup, DC-Ops: Q1:rack/setup/install backup1012 - https://phabricator.wikimedia.org/T371416#10297480 (wiki_willy) Just a heads up @Jclark-ctr & @VRiley-WMF - the test controller kit should've arrived yesterday:  https://www.fedex.com/fedextr...
[19:26:03] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-Backup, DC-Ops: Q1:rack/setup/install backup2012 - https://phabricator.wikimedia.org/T371984#10297496 (wiki_willy) Hi @Jhancock.wm and @Papaul - just a heads up, it looks like the test controller kit arrived yesterday:  https://www.fedex.com/...
[19:34:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297520 (phaultfinder)
[19:41:42] <wikibugs>	 (PS1) Dzahn: devtools: update gerrit user from gerrit2 to gerrit [puppet] - https://gerrit.wikimedia.org/r/1087963 (https://phabricator.wikimedia.org/T338470)
[19:54:42] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297558 (phaultfinder)
[19:55:51] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[19:55:52] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm
[19:57:29] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10297564 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host mc-gp2006.codfw.wmnet with OS bookworm completed: - mc-gp2006 (**WARN**...
[19:57:44] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10297568 (Jhancock.wm)
[19:58:46] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10297569 (Jhancock.wm) Open→Resolved @Clement_Goubert this is ready for y'all
[20:02:20] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297575 (Jhancock.wm)
[20:09:07] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297583 (Dwisehaupt) > Thanks @Jgreen .  Looking at the existing ports on the switch I think it might make sense if we chang...
[20:16:51] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[20:19:00] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[20:22:19] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297595 (Jclark-ctr) @cmooney replaced 1g dac cables with sfpt and cat6 cables. These two switches have been removed from...
[20:24:41] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
[20:25:15] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
[20:25:15] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:25:47] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:25:50] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:25:52] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:25:54] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:25:56] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:26:07] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:27:04] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:32:42] <wikibugs>	 ops-eqiad, SRE, DC-Ops: hw troubleshooting: server failure for cloudvirt1063.eqiad.wmnet - https://phabricator.wikimedia.org/T375372#10297608 (Jclark-ctr) Open→Resolved @fnegri cpu2 and mainboard were replaced today
[20:35:25] <wikibugs>	 (PS1) Dzahn: gerrit: add chown parameter to lfs data rsync, ensure daemon_user is used [puppet] - https://gerrit.wikimedia.org/r/1087967 (https://phabricator.wikimedia.org/T338470)
[20:36:08] <wikibugs>	 (PS2) Dzahn: devtools: update gerrit user from gerrit2 to gerrit [puppet] - https://gerrit.wikimedia.org/r/1087963 (https://phabricator.wikimedia.org/T338470)
[20:36:41] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:36:43] <wikibugs>	 (CR) Dzahn: [C:+2] "cloud only and VM is currently down" [puppet] - https://gerrit.wikimedia.org/r/1087963 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
[20:36:49] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:36:56] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:37:08] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:37:38] <wikibugs>	 SRE-OnFire, Incident Tooling: corto: review irc grammar ergonomics - https://phabricator.wikimedia.org/T370786#10297622 (Eevans) Ok, I think I'm going to be bold and say we just move forward with the version of the bot that requires you poke/nick highlight it. Bots that work this way are actually pretty...
[20:37:57] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[20:38:35] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146']
[20:38:40] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147']
[20:38:41] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148']
[20:38:43] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149']
[20:38:44] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150']
[20:39:00] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146']
[20:39:02] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147']
[20:39:03] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148']
[20:39:05] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149']
[20:39:06] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150']
[20:39:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297623 (phaultfinder)
[20:40:13] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm
[20:40:15] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm
[20:40:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm
[20:40:20] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm
[20:40:22] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm
[20:40:24] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297624 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2146.codfw.wmnet with O...
[20:40:26] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297625 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2147.codfw.wmnet with O...
[20:40:30] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297626 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2149.codfw.wmnet with O...
[20:40:31] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297627 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2150.codfw.wmnet with O...
[20:40:36] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297628 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2148.codfw.wmnet with O...
[20:44:47] <wikibugs>	 (CR) Dzahn: [V:+1 C:+1] "https://puppet-compiler.wmflabs.org/output/1087967/4462/" [puppet] - https://gerrit.wikimedia.org/r/1087967 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
[20:45:58] <wikibugs>	 (PS1) Raymond Ndibe: openstack: keystone: fix radosgw 500 errors with Object Storage [puppet] - https://gerrit.wikimedia.org/r/1087968 (https://phabricator.wikimedia.org/T360626)
[20:46:43] <wikibugs>	 (CR) Dzahn: [V:+1 C:+1] "tested by manually running the rsync command and does the job as it should. files on gerrit2003 will be owned gerrit:gerrit but on gerrit2" [puppet] - https://gerrit.wikimedia.org/r/1087967 (https://phabricator.wikimedia.org/T338470) (owner: Dzahn)
[20:50:33] <wikibugs>	 (CR) Raymond Ndibe: openstack: keystone: fix radosgw 500 errors with Object Storage (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1087968 (https://phabricator.wikimedia.org/T360626) (owner: Raymond Ndibe)
[20:53:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations: Q4:rack/setup/install ganeti1039 to ganeti1052 - https://phabricator.wikimedia.org/T365650#10297670 (Jclark-ctr) @MoritzMuehlenhoff these were not handed over to service owner while. Luca and dceng researched License /provisioning issue. T...
[20:53:31] <wikibugs>	 SRE-OnFire, Incident Tooling: corto: review irc grammar ergonomics - https://phabricator.wikimedia.org/T370786#10297671 (jhathaway) sounds good to me!
[20:54:45] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10297675 (cmooney) >>! In T377381#10297595, @Jclark-ctr wrote: > These two switches have been removed from racks. > asw2-d5-e...
[20:57:46] <wikibugs>	 (CR) Raymond Ndibe: "Also someone should add `profile::openstack::base::keystone::credential_key_0` to the cloudcontrol nodes (`1005`, `1006`, `1007`). I don't" [puppet] - https://gerrit.wikimedia.org/r/1087968 (https://phabricator.wikimedia.org/T360626) (owner: Raymond Ndibe)
[20:58:40] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
[20:58:45] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
[20:58:59] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
[20:59:03] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
[20:59:06] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
[20:59:32] <wikibugs>	 (PS3) Scott French: Add title-case mapping to support migration to PHP 8.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087604 (https://phabricator.wikimedia.org/T372603)
[21:00:04] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: May I have your attention please! UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T2100)
[21:00:04] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[21:01:46] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
[21:05:13] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
[21:05:43] <wikibugs>	 SRE-OnFire, Incident Tooling: Corto: configuration improvements - https://phabricator.wikimedia.org/T375309#10297699 (Eevans)
[21:06:04] <wikibugs>	 SRE-OnFire, Incident Tooling: Corto: configuration improvements - https://phabricator.wikimedia.org/T375309#10297700 (Eevans) Done: [[ https://gitlab.wikimedia.org/repos/sre/corto/-/commit/445e00ef3be5e47c73592122434f6e175b02f994 | corto/-/commit/445e00e ]]
[21:06:18] <wikibugs>	 SRE-OnFire, Incident Tooling: corto: review irc grammar ergonomics - https://phabricator.wikimedia.org/T370786#10297693 (Eevans) Open→Resolved a:Eevans Done: [[ https://gitlab.wikimedia.org/repos/sre/corto/-/commit/445e00ef3be5e47c73592122434f6e175b02f994 | corto/-/commit/445e00e ]]
[21:07:35] <wikibugs>	 SRE-OnFire, Incident Tooling: Corto: configuration improvements - https://phabricator.wikimedia.org/T375309#10297701 (Eevans) Open→Resolved a:Eevans
[21:08:21] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
[21:08:44] <wikibugs>	 (CR) Scott French: "Tim, Timo - it would be great to get your review on this. All of this "makes sense" from a purely technical perspective (i.e., in terms of" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087604 (https://phabricator.wikimedia.org/T372603) (owner: Scott French)
[21:12:24] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
[21:12:52] <logmsgbot>	 !log sukhe@puppetserver1001 conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced]
[21:16:28] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
[21:18:14] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[21:19:48] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Backup, DC-Ops: Q1:rack/setup/install backup1012 - https://phabricator.wikimedia.org/T371416#10297756 (VRiley-WMF) We have received the test controller kit. We are ready to install it whenever you're ready!
[21:20:22] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:20:32] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:22:23] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Backup, DC-Ops: Q1:rack/setup/install backup1012 - https://phabricator.wikimedia.org/T371416#10297762 (jcrespo) >>! In T371416#10276034, @jcrespo wrote: > Feel free to use and replace/service backup1012 and backup2012 as you want.
[21:24:40] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T378916#10297766 (phaultfinder)
[21:25:39] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:25:57] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q1:rack/setup/install fransw1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T367801#10297770 (Jgreen)
[21:26:42] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:26:43] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm
[21:26:46] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:26:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm
[21:26:59] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297771 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2149.codfw.wmnet with OS bo...
[21:27:00] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297772 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2146.codfw.wmnet with OS bo...
[21:27:02] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q1:rack/setup/install fransc1001 - https://phabricator.wikimedia.org/T367814#10297773 (Jgreen)
[21:27:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:27:23] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:27:24] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm
[21:27:39] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297774 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2147.codfw.wmnet with OS bo...
[21:31:13] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:31:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:31:36] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm
[21:31:50] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297789 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2148.codfw.wmnet with OS bo...
[21:32:58] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q1:rack/setup/install franio100[1-3] - https://phabricator.wikimedia.org/T367820#10297791 (Jgreen)
[21:35:33] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:42:58] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[21:42:59] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm
[21:43:13] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297808 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2150.codfw.wmnet with OS bo...
[21:45:29] <jinxer-wm>	 FIRING: SystemdUnitFailed: uwsgi-netbox.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:45:42] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job netbox_global in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[21:46:20] <jinxer-wm>	 FIRING: CirrusSearchPoolCounterRejectionTooHigh: MediaWiki CirrusSearch failing to obtain a token from the pool counter at a very high rate - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Pool_Counter_rejections_(search_is_currently_too_busy) - https://grafana.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=4&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchPoolCounterRejectionToo
[21:46:32] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1007:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[21:46:43] <jinxer-wm>	 FIRING: VarnishUnavailable: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable
[21:46:44] <jinxer-wm>	 FIRING: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[21:47:10] <denisse>	 !incidents
[21:47:11] <sirenbot>	 5376 (UNACKED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[21:47:11] <sirenbot>	 5377 (UNACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[21:47:11] <sirenbot>	 5375 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:11] <sirenbot>	 5374 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:11] <sirenbot>	 5373 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:12] <sirenbot>	 5372 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:12] <sirenbot>	 5371 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:12] <sirenbot>	 5370 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:13] <sirenbot>	 5369 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:13] <sirenbot>	 5368 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:14] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[21:47:19] <denisse>	 !ack 5376
[21:47:19] <sirenbot>	 5376 (ACKED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[21:47:25] <denisse>	 !ack 5377
[21:47:26] <sirenbot>	 5377 (ACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[21:47:31] <denisse>	 Here.
[21:48:24] <rzl>	 I'm officially out sick but nearby, looking
[21:50:25] <jinxer-wm>	 FIRING: [7x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:51:20] <jinxer-wm>	 RESOLVED: CirrusSearchPoolCounterRejectionTooHigh: MediaWiki CirrusSearch failing to obtain a token from the pool counter at a very high rate - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Pool_Counter_rejections_(search_is_currently_too_busy) - https://grafana.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=4&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchPoolCounterRejectionT
[21:51:32] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1007:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[21:51:44] <jinxer-wm>	 RESOLVED: VarnishUnavailable: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable
[21:51:44] <jinxer-wm>	 RESOLVED: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[21:55:25] <jinxer-wm>	 FIRING: [9x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T2200)
[22:00:25] <jinxer-wm>	 FIRING: [13x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:01:26] <wikibugs>	 (PS1) Ebernhardson: search: Exclude automated pool counter from alerts [alerts] - https://gerrit.wikimedia.org/r/1087973
[22:03:13] <jinxer-wm>	 FIRING: VarnishUnavailable: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable
[22:03:14] <jinxer-wm>	 FIRING: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[22:03:40] <denisse>	 !incidents
[22:03:40] <sirenbot>	 5378 (UNACKED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[22:03:41] <sirenbot>	 5379 (UNACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[22:03:41] <sirenbot>	 5377 (RESOLVED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[22:03:41] <sirenbot>	 5376 (RESOLVED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[22:03:41] <sirenbot>	 5375 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:41] <sirenbot>	 5374 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:42] <sirenbot>	 5373 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:42] <sirenbot>	 5372 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:43] <sirenbot>	 5371 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:43] <sirenbot>	 5370 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:44] <sirenbot>	 5369 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:44] <sirenbot>	 5368 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:45] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:03:49] <denisse>	 !ack 5378
[22:03:49] <sirenbot>	 5378 (ACKED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[22:03:54] <denisse>	 !ack 5379
[22:03:54] <sirenbot>	 5379 (ACKED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[22:04:26] <jinxer-wm>	 FIRING: CirrusSearchPoolCounterRejectionTooHigh: MediaWiki CirrusSearch failing to obtain a token from the pool counter at a very high rate - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Pool_Counter_rejections_(search_is_currently_too_busy) - https://grafana.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=4&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchPoolCounterRejectionToo
[22:04:51] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1007:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[22:05:25] <jinxer-wm>	 FIRING: [13x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:05:42] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job netbox_global in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:08:13] <jinxer-wm>	 RESOLVED: VarnishUnavailable: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable
[22:08:14] <jinxer-wm>	 RESOLVED: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable
[22:09:20] <jinxer-wm>	 RESOLVED: CirrusSearchPoolCounterRejectionTooHigh: MediaWiki CirrusSearch failing to obtain a token from the pool counter at a very high rate - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Pool_Counter_rejections_(search_is_currently_too_busy) - https://grafana.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=4&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchPoolCounterRejectionT
[22:09:51] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1007:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[22:10:25] <jinxer-wm>	 FIRING: [13x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:10:32] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.dns.netbox
[22:11:13] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297878 (Jhancock.wm)
[22:14:20] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004  - jclark@cumin1002"
[22:14:24] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004  - jclark@cumin1002"
[22:14:24] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:15:25] <jinxer-wm>	 FIRING: [13x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:16:10] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:16:11] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:16:12] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:16:58] <mutante>	 !incidents
[22:16:59] <sirenbot>	 5379 (RESOLVED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[22:16:59] <sirenbot>	 5378 (RESOLVED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[22:16:59] <sirenbot>	 5377 (RESOLVED)  HaproxyUnavailable cache_text global sre (thanos-rule)
[22:16:59] <sirenbot>	 5376 (RESOLVED)  VarnishUnavailable global sre (varnish-text thanos-rule)
[22:16:59] <sirenbot>	 5375 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:00] <sirenbot>	 5374 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:00] <sirenbot>	 5373 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:00] <sirenbot>	 5372 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:01] <sirenbot>	 5371 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:01] <sirenbot>	 5370 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:02] <sirenbot>	 5369 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:02] <sirenbot>	 5368 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:17:03] <sirenbot>	 5367 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-eqiad.wikimedia.org)
[22:18:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[22:18:34] <wikibugs>	 (PS1) Bvibber: DB config for testcommonswiki deployment for Charts [mediawiki-config] - https://gerrit.wikimedia.org/r/1087975 (https://phabricator.wikimedia.org/T379199)
[22:19:16] <wikibugs>	 (CR) CI reject: [V:-1] DB config for testcommonswiki deployment for Charts [mediawiki-config] - https://gerrit.wikimedia.org/r/1087975 (https://phabricator.wikimedia.org/T379199) (owner: Bvibber)
[22:20:19] <wikibugs>	 (PS2) Bvibber: DB config for testcommonswiki deployment for Charts [mediawiki-config] - https://gerrit.wikimedia.org/r/1087975 (https://phabricator.wikimedia.org/T379199)
[22:20:25] <jinxer-wm>	 RESOLVED: [13x] SystemdUnitFailed: netbox_ganeti_codfw02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:22:37] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
[22:22:42] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
[22:22:42] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:23:48] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:23:52] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:23:54] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:23:56] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:23:59] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:24:06] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:24:11] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:25:00] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:25:15] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10297909 (Jclark-ctr)
[22:25:21] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:30:28] <wikibugs>	 ops-eqiad, SRE, Data-Persistence, Data-Persistence-Backup, DC-Ops: Q1:rack/setup/install backup1012 - https://phabricator.wikimedia.org/T371416#10297911 (VRiley-WMF)
[22:33:54] <wikibugs>	 (PS1) Eevans: Add (fake) corto bot password [labs/private] - https://gerrit.wikimedia.org/r/1087979 (https://phabricator.wikimedia.org/T379204)
[22:33:59] <wikibugs>	 SRE-OnFire, Incident Tooling, Patch-For-Review: corto: update production deployment for project changes - https://phabricator.wikimedia.org/T379204 (Eevans) NEW
[22:34:17] <wikibugs>	 (CR) Eevans: [V:+2 C:+2] Add (fake) corto bot password [labs/private] - https://gerrit.wikimedia.org/r/1087979 (https://phabricator.wikimedia.org/T379204) (owner: Eevans)
[22:35:03] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:35:22] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:35:34] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:36:01] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:36:19] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:37:59] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155']
[22:38:00] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154']
[22:38:01] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153']
[22:38:02] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152']
[22:38:03] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151']
[22:38:30] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151']
[22:38:32] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152']
[22:38:34] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153']
[22:38:35] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154']
[22:38:37] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155']
[22:38:44] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:38:48] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:39:12] <wikibugs>	 (PS1) Eevans: Update corto puppetization [puppet] - https://gerrit.wikimedia.org/r/1087980 (https://phabricator.wikimedia.org/T379204)
[22:39:42] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm
[22:39:44] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm
[22:39:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm
[22:39:47] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm
[22:39:48] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm
[22:39:48] <wikibugs>	 (CR) CI reject: [V:-1] Update corto puppetization [puppet] - https://gerrit.wikimedia.org/r/1087980 (https://phabricator.wikimedia.org/T379204) (owner: Eevans)
[22:39:52] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297954 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2151.codfw.wmnet with O...
[22:39:57] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297955 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2152.codfw.wmnet with O...
[22:39:59] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297956 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2153.codfw.wmnet with O...
[22:40:01] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297957 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2154.codfw.wmnet with O...
[22:40:03] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10297958 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2155.codfw.wmnet with O...
[22:40:44] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:43:46] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm
[22:44:00] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10297962 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host mc-gp1006.eqiad.wmnet with OS bookworm
[22:44:03] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm
[22:44:05] <wikibugs>	 (PS2) Eevans: Update corto puppetization [puppet] - https://gerrit.wikimedia.org/r/1087980 (https://phabricator.wikimedia.org/T379204)
[22:44:08] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10297964 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host mc-gp1005.eqiad.wmnet with OS bookworm
[22:44:16] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm
[22:44:21] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10297965 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host mc-gp1004.eqiad.wmnet with OS bookworm
[22:46:13] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087980 (https://phabricator.wikimedia.org/T379204) (owner: Eevans)
[22:57:05] <wikibugs>	 (CR) Bking: [C:+2] search: Exclude automated pool counter from alerts [alerts] - https://gerrit.wikimedia.org/r/1087973 (owner: Ebernhardson)
[22:58:03] <wikibugs>	 (CR) Eevans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087980 (https://phabricator.wikimedia.org/T379204) (owner: Eevans)
[22:58:14] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
[22:58:14] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
[22:58:20] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
[22:58:21] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
[22:58:30] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
[23:00:15] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
[23:00:40] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
[23:02:12] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
[23:02:19] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
[23:05:27] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
[23:08:46] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
[23:12:08] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
[23:15:48] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
[23:18:55] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:19:13] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
[23:19:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:19:41] <wikibugs>	 (CR) Scott French: [C:+2] "Thanks for the clean-up!" [puppet] - https://gerrit.wikimedia.org/r/1087606 (https://phabricator.wikimedia.org/T378260) (owner: Zabe)
[23:22:55] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
[23:23:20] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:23:21] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm
[23:23:28] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:23:31] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10298075 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2155.codfw.wmnet with OS bo...
[23:23:56] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:23:57] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm
[23:24:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10298076 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host mc-gp1004.eqiad.wmnet with OS bookworm completed: - mc-gp1004 (**PASS**)   -...
[23:25:12] <wikibugs>	 (PS1) Aude: Enable Chart extension on testwiki and testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127)
[23:25:17] <wikibugs>	 (CR) Eevans: "I'm not sure what is causing the PCC failures.  I did add `profile::corto::irc_config::password` to labs/private, but it doesn't seem to f" [puppet] - https://gerrit.wikimedia.org/r/1087980 (https://phabricator.wikimedia.org/T379204) (owner: Eevans)
[23:25:54] <wikibugs>	 (CR) CI reject: [V:-1] Enable Chart extension on testwiki and testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127) (owner: Aude)
[23:26:58] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
[23:27:57] <wikibugs>	 (PS2) Aude: Enable Chart extension on testwiki and testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127)
[23:27:59] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:28:57] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:28:59] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm
[23:29:10] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10298086 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2153.codfw.wmnet with OS bo...
[23:30:03] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:31:33] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:31:34] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm
[23:31:39] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10298090 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host mc-gp1005.eqiad.wmnet with OS bookworm completed: - mc-gp1005 (**PASS**)   -...
[23:34:05] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:36:02] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:36:04] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm
[23:36:14] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10298098 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2154.codfw.wmnet with OS bo...
[23:37:56] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:39:46] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:39:47] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
[23:39:57] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10298108 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2151.codfw.wmnet with OS bo...
[23:41:24] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:41:41] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
[23:41:42] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm
[23:41:47] <wikibugs>	 ops-eqiad, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp100[4-6] - https://phabricator.wikimedia.org/T377032#10298120 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host mc-gp1006.eqiad.wmnet with OS bookworm completed: - mc-gp1006 (**PASS**)   -...
[23:45:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:46:15] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:46:16] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
[23:46:25] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10298135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2152.codfw.wmnet with OS bo...
[23:46:39] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10298136 (Jhancock.wm)
[23:50:54] <wikibugs>	 (CR) Jdlrobson: Enable Chart extension on testwiki and testcommonswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127) (owner: Aude)
[23:54:11] <wikibugs>	 (PS3) Aude: Enable Chart extension on testwiki and testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127)
[23:54:54] <wikibugs>	 (CR) CI reject: [V:-1] Enable Chart extension on testwiki and testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127) (owner: Aude)
[23:55:01] <wikibugs>	 (CR) Aude: Enable Chart extension on testwiki and testcommonswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1087987 (https://phabricator.wikimedia.org/T378127) (owner: Aude)