[00:00:44] <wikibugs>	 SRE-OnFire, Incident Tooling: Corto: ensure Phabricator tasks are created with correct default visibility & priority - https://phabricator.wikimedia.org/T376500#10291276 (Eevans) >>! In T376500#10286143, @Aklapper wrote: > @corto could set view policy and edit policy to #acl_sre-team (`PHID-PROJ-fqb3bher...
[00:01:58] <wikibugs>	 SRE-OnFire, Incident Tooling: Corto: ensure Phabricator tasks are created with correct default visibility & priority - https://phabricator.wikimedia.org/T376500#10291278 (Eevans) Open→Resolved a:Eevans
[00:03:29] <wikibugs>	 (CR) RLazarus: [C:+2] "Thanks!" [deployment-charts] - https://gerrit.wikimedia.org/r/1085506 (https://phabricator.wikimedia.org/T376230) (owner: RLazarus)
[00:04:03] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[00:06:19] <rzl>	 jouncebot: nowandnext
[00:06:19] <jouncebot>	 No deployments scheduled for the next 2 hour(s) and 53 minute(s)
[00:06:19] <jouncebot>	 In 2 hour(s) and 53 minute(s): Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0300)
[00:07:05] <rzl>	 quick rollout to resolve the helmfile diffs for a chart bump that only affects mw-script
[00:08:09] <logmsgbot>	 !log rzl@deploy2002 Started scap sync-world: 1085506
[00:10:02] <logmsgbot>	 !log rzl@deploy2002 Finished scap sync-world: 1085506 (duration: 02m 50s)
[00:10:55] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[00:11:14] <wikibugs>	 (CR) RLazarus: [C:+2] "Thanks for the review!" [puppet] - https://gerrit.wikimedia.org/r/1085507 (https://phabricator.wikimedia.org/T376230) (owner: RLazarus)
[00:21:59] <wikibugs>	 (PS8) BryanDavis: [WIP] Allow provisioning MediaWiki with PHP 8.1 [puppet] - https://gerrit.wikimedia.org/r/1085471 (https://phabricator.wikimedia.org/T378752)
[00:22:40] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops: Q2:rack/setup/install mc-gp200[4-6] - https://phabricator.wikimedia.org/T376968#10291293 (Jhancock.wm) mc-gp2006 nic port isn't coming up. tried different DAC cables but not coming up. will try a different port to see if it's a switch issue in the morning.
[00:32:50] <icinga-wm>	 RECOVERY - MD RAID on wikikube-worker2068 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[00:34:21] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[00:38:37] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1087269
[00:38:37] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1087269 (owner: TrainBranchBot)
[01:01:09] <wikibugs>	 (PS2) Aude: Create releases for chart-renderer service [deployment-charts] - https://gerrit.wikimedia.org/r/1085633 (https://phabricator.wikimedia.org/T376948)
[01:01:59] <wikibugs>	 (CR) Aude: Create releases for chart-renderer service (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1085633 (https://phabricator.wikimedia.org/T376948) (owner: Aude)
[01:03:54] <icinga-wm>	 RECOVERY - Host ganeti2042 is UP: PING WARNING - Packet loss = 75%, RTA = 30.38 ms
[01:03:58] <icinga-wm>	 PROBLEM - SSH on ganeti2042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[01:08:36] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1087273
[01:08:36] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1087273 (owner: TrainBranchBot)
[01:10:18] <icinga-wm>	 PROBLEM - Host ganeti2042 is DOWN: PING CRITICAL - Packet loss = 100%
[01:10:47] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1087269 (owner: TrainBranchBot)
[01:11:24] <wikibugs>	 SRE, collaboration-services, Wikimedia-Mailing-lists: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045#10291366 (Platonides) I have been testing this by sending emails to be held. **It is very clearly losing emails**. There is a portion in the h...
[01:14:21] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[01:41:53] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1087273 (owner: TrainBranchBot)
[02:08:36] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/1.44.0-wmf.2 [core] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087276 (https://phabricator.wikimedia.org/T375661)
[02:08:38] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.44.0-wmf.2 [core] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087276 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[02:37:37] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:42:43] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/1.44.0-wmf.2 [core] (wmf/1.44.0-wmf.2) - https://gerrit.wikimedia.org/r/1087276 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[03:00:05] <jouncebot>	 Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0300)
[03:02:37] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:00:05] <jouncebot>	 Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0400)
[04:01:43] <wikibugs>	 (PS1) TrainBranchBot: testwikis to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087290 (https://phabricator.wikimedia.org/T375661)
[04:01:44] <wikibugs>	 (CR) TrainBranchBot: [C:+2] testwikis to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087290 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[04:02:32] <wikibugs>	 (Merged) jenkins-bot: testwikis to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087290 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[04:03:01] <logmsgbot>	 !log mwpresync@deploy2002 Started scap sync-world: testwikis to 1.44.0-wmf.2  refs T375661
[04:03:04] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[05:00:05] <jouncebot>	 Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0500)
[05:10:39] <logmsgbot>	 !log mwpresync@deploy2002 Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s)
[05:14:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: mediawiki_job_startupregistrystats-testwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:41:16] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1003), Fresh: 136 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[06:12:42] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 215, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:12:54] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 112, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:13:45] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 219, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:13:54] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 128, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0700)
[07:00:05] <jouncebot>	 marostegui, Amir1, and arnaudb: OwO what's this, a deployment window?? Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0700). nyaa~
[07:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:39:36] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 11414
[07:39:49] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414
[07:41:18] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 137 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[07:55:04] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 14593
[07:55:12] <icinga-wm>	 PROBLEM - Memcached on idp1004 is CRITICAL: connect to address 208.80.154.7 and port 11000: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[07:56:15] <wikibugs>	 (PS1) Fabfur: hiera: haproxykafka defaults to 2 workers [puppet] - https://gerrit.wikimedia.org/r/1087359 (https://phabricator.wikimedia.org/T374473)
[07:56:31] <moritzm>	 idp1004 is just some alert spam, should recover shortly
[07:56:40] <moritzm>	 memcached is unused there and I'm cleaning up things
[07:58:36] <wikibugs>	 ops-codfw, SRE, DC-Ops: ganeti2042 seems to have a broken CPU? (new Supermicro node) - https://phabricator.wikimedia.org/T378358#10291601 (MoritzMuehlenhoff) >>! In T378358#10289446, @Jhancock.wm wrote: > removed CPU 2. gonna let it run for a little and see if it generates errors. then we'll at least...
[07:58:43] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 216, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:58:47] <wikibugs>	 (CR) Vgutierrez: [C:+1] hiera: haproxykafka defaults to 2 workers [puppet] - https://gerrit.wikimedia.org/r/1087359 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[07:58:54] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 113, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:59:39] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: haproxykafka defaults to 2 workers [puppet] - https://gerrit.wikimedia.org/r/1087359 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: #bothumor My software never has bugs. It just develops random features. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0800).
[08:00:05] <jouncebot>	 Tchanders: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:03:17] <Tchanders>	 Are we OK to go ahead with the deployment window now? I can deploy
[08:03:34] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
[08:03:43] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'email' for AS: 2828
[08:05:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[08:06:13] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828
[08:08:55] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tchanders@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087195 (https://phabricator.wikimedia.org/T378336) (owner: Tchanders)
[08:09:43] <wikibugs>	 (Merged) jenkins-bot: temp accounts: Enable temp account creation on second-round pilots [mediawiki-config] - https://gerrit.wikimedia.org/r/1087195 (https://phabricator.wikimedia.org/T378336) (owner: Tchanders)
[08:10:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[08:10:28] <logmsgbot>	 !log tchanders@deploy2002 Started scap sync-world: Backport for [[gerrit:1087195|temp accounts: Enable temp account creation on second-round pilots (T378336)]]
[08:10:31] <stashbot>	 T378336: Temporary Accounts: Minor pilots - Nov 5 deploy - https://phabricator.wikimedia.org/T378336
[08:16:00] <Tchanders>	 I'm getting backport failed: "'mwscript eval.php --wiki testwiki' generated unexpected output: Warning: socket_sendto(): unable to write to socket [101]: Network is unreachable in /srv/mediawiki-staging/php-1.44.0-wmf.2/includes/debug/logger/monolog/LegacyHandler.php on line 234"
[08:17:51] <wikibugs>	 (CR) Ayounsi: [C:+2] Add BGP.tools sessions [homer/public] - https://gerrit.wikimedia.org/r/1084129 (owner: Ayounsi)
[08:17:57] <wikibugs>	 (CR) CI reject: [V:-1] Add BGP.tools sessions [homer/public] - https://gerrit.wikimedia.org/r/1084129 (owner: Ayounsi)
[08:18:05] <wikibugs>	 (PS3) Ayounsi: Add BGP.tools sessions [homer/public] - https://gerrit.wikimedia.org/r/1084129
[08:19:12] <icinga-wm>	 RECOVERY - Memcached on idp1004 is OK: TCP OK - 0.000 second response time on 208.80.154.7 port 11000 https://wikitech.wikimedia.org/wiki/Memcached
[08:20:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[08:20:32] <wikibugs>	 (CR) Ayounsi: "recheck" [homer/public] - https://gerrit.wikimedia.org/r/1084129 (owner: Ayounsi)
[08:21:04] <wikibugs>	 (Merged) jenkins-bot: Add BGP.tools sessions [homer/public] - https://gerrit.wikimedia.org/r/1084129 (owner: Ayounsi)
[08:24:02] <jnuche>	 Tchanders: it looks like that error already happened last night during the train presync, I'll see what I can dig up
[08:24:21] <vgutierrez>	 !log uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm)
[08:24:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[08:26:27] <wikibugs>	 (PS1) Muehlenhoff: Failover idp.w.o to idp1004 [dns] - https://gerrit.wikimedia.org/r/1087363
[08:27:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[08:29:55] <Tchanders>	 jnuche: Thanks. Is it worth re-running in the meantime?
[08:30:23] <jnuche>	 my guess is it will probably fail again
[08:31:06] <Tchanders>	 OK, I'll hold off
[08:32:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[08:36:38] <wikibugs>	 (PS1) Fabfur: hiera: hpk batch_deadline on socket set to 1s [puppet] - https://gerrit.wikimedia.org/r/1087365 (https://phabricator.wikimedia.org/T374473)
[08:37:59] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087365 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[08:39:09] <wikibugs>	 (CR) Vgutierrez: [C:+1] hiera: hpk batch_deadline on socket set to 1s [puppet] - https://gerrit.wikimedia.org/r/1087365 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[08:39:12] <kostajh>	 jnuche: is there a task tracking the error?
[08:40:07] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: hpk batch_deadline on socket set to 1s [puppet] - https://gerrit.wikimedia.org/r/1087365 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[08:40:13] <jnuche>	 kostajh: not yet, I'm putting together details to create one
[08:41:11] <jnuche>	 it seems to me maybe the behavior of the mwscript "eval.php" has changed and now it needs network access, where before it didn't
[08:41:57] <jnuche>	 kostajh: can I ping you in the story once it's ready? don't know if that sounds like something you're familiar with
[08:44:15] <wikibugs>	 (PS1) Muehlenhoff: Disable installation of memcached on IDP hosts [puppet] - https://gerrit.wikimedia.org/r/1087366
[08:44:52] <kostajh>	 jnuche: I could take a look at it, sure 
[08:44:56] <Tchanders>	 jnuche: thanks for investigating. I'll abandon our deploy attempt for this window and try again next window
[08:45:13] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087366 (owner: Muehlenhoff)
[08:46:21] <wikibugs>	 (PS1) Fabfur: haproxykafka: restart service on config file changes [puppet] - https://gerrit.wikimedia.org/r/1087371 (https://phabricator.wikimedia.org/T374473)
[08:47:26] <Tchanders>	 jnuche: Since the patch got merged, should I run scap backport --revert to clean up?
[08:48:34] <jnuche>	 Tchanders: yeah, probably that's a good idea, thx
[08:48:45] <jnuche>	 (most likely will fail at the same place though)
[08:49:10] <wikibugs>	 (PS1) Arnaudb: dbproxy: update grants with ip and fqdn [puppet] - https://gerrit.wikimedia.org/r/1087369 (https://phabricator.wikimedia.org/T368874)
[08:49:10] <wikibugs>	 (CR) Arnaudb: "I suppose those templates are not applied automatically, right?" [puppet] - https://gerrit.wikimedia.org/r/1087369 (https://phabricator.wikimedia.org/T368874) (owner: Arnaudb)
[08:50:04] <wikibugs>	 (PS1) TrainBranchBot: Revert "temp accounts: Enable temp account creation on second-round pilots" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087373
[08:50:04] <wikibugs>	 (CR) TrainBranchBot: "tchanders@deploy2002 created a revert of this change as I8585140e96e99c710bc591455b86e9d19cc8ad88" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087195 (https://phabricator.wikimedia.org/T378336) (owner: Tchanders)
[08:50:44] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Disable installation of memcached on IDP hosts [puppet] - https://gerrit.wikimedia.org/r/1087366 (owner: Muehlenhoff)
[08:51:10] <kostajh>	 I guess this is related to work around T341560? 
[08:51:11] <stashbot>	 T341560: Migrate mwmaint server functionality to mw-on-k8s - https://phabricator.wikimedia.org/T341560
[08:52:48] <wikibugs>	 (PS1) Arnaudb: dbproxy: switch CNAMEs [dns] - https://gerrit.wikimedia.org/r/1087374 (https://phabricator.wikimedia.org/T368874)
[08:56:11] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tchanders@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087373 (owner: TrainBranchBot)
[08:56:55] <wikibugs>	 (Merged) jenkins-bot: Revert "temp accounts: Enable temp account creation on second-round pilots" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087373 (owner: TrainBranchBot)
[08:57:22] <jnuche>	 kostajh: mmh, not sure, the failing script is running on the deployment server directly, T341560 seems to be about mwmaint
[08:57:22] <stashbot>	 T341560: Migrate mwmaint server functionality to mw-on-k8s - https://phabricator.wikimedia.org/T341560
[08:57:27] <logmsgbot>	 !log tchanders@deploy2002 Started scap sync-world: Backport for [[gerrit:1087373|Revert "temp accounts: Enable temp account creation on second-round pilots"]]
[09:00:05] <jouncebot>	 jnuche and dduvall: OwO what's this, a deployment window?? MediaWiki train - Utc-0+Utc-7 Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0900). nyaa~
[09:02:46] <jnuche>	 kostajh: T379044, my best guess is the script's behavior changed recently, but I'm gonna keep looking
[09:02:47] <stashbot>	 T379044: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044
[09:05:03] <wikibugs>	 SRE, serviceops, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291704 (kostajh)
[09:06:34] <wikibugs>	 SRE, serviceops, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291711 (kostajh)
[09:06:34] <wikibugs>	 (PS1) Vgutierrez: liberica: fix hcforwarder configuration [puppet] - https://gerrit.wikimedia.org/r/1087376 (https://phabricator.wikimedia.org/T377127)
[09:06:36] <wikibugs>	 SRE, serviceops, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291707 (kostajh) p:Triage→Unbreak! Marking as UBN, as this is blocking train and backport deployments.
[09:06:44] <kostajh>	 thanks jnuche. I added some tags and marked as a subtask of this week's train blockers 
[09:07:48] <wikibugs>	 (PS2) Vgutierrez: liberica: fix hcforwarder configuration [puppet] - https://gerrit.wikimedia.org/r/1087376 (https://phabricator.wikimedia.org/T377127)
[09:08:27] <jnuche>	 thanks kostajh
[09:08:57] <jnuche>	 any idea how be a good person to ask about this?
[09:09:01] <jnuche>	 *would be
[09:09:22] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087376 (https://phabricator.wikimedia.org/T377127) (owner: Vgutierrez)
[09:10:35] <wikibugs>	 (CR) Slyngshede: [C:+2] P:firewall remove Icinga conntrack check [puppet] - https://gerrit.wikimedia.org/r/1085515 (https://phabricator.wikimedia.org/T374827) (owner: Slyngshede)
[09:13:24] <kostajh>	 jnuche: maybe post in #wikimedia-sre with a link to the task? 
[09:13:52] <kostajh>	 I've added the `#SRE` tag to the task so it should get triaged quickly, but given the urgency seems like a good idea to message in IRC as well 
[09:14:17] <jnuche>	 sure
[09:14:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: mediawiki_job_startupregistrystats-testwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:15:42] <wikibugs>	 (PS1) Slyngshede: P:firewall absent check_conntrack script. [puppet] - https://gerrit.wikimedia.org/r/1087379 (https://phabricator.wikimedia.org/T374827)
[09:19:23] <wikibugs>	 (PS1) Vgutierrez: liberica: Provide ipip0 and ipip60 devices [puppet] - https://gerrit.wikimedia.org/r/1087381 (https://phabricator.wikimedia.org/T377127)
[09:20:26] <wikibugs>	 SRE, serviceops, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291760 (Joe) Investigating right now. It would help to know on what server this happened. I assume the active deployment server?
[09:21:38] <wikibugs>	 SRE, serviceops, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291762 (jnuche) >  I assume the active deployment server?  Yeah, on `deploy2002`
[09:21:49] <_joe_>	 !log restarted rsyslog on deploy2002 T379044
[09:21:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:52] <stashbot>	 T379044: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044
[09:22:45] <logmsgbot>	 !log jnuche@deploy2002 Started scap sync-world: testwikis to 1.44.0-wmf.2  refs T375661
[09:22:48] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[09:23:34] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1087381 (https://phabricator.wikimedia.org/T377127) (owner: Vgutierrez)
[09:23:51] <kostajh>	 jnuche: can I try to sync the config patch after you're done with train deployment?
[09:24:32] <jnuche>	 kostajh: that's no problem, failure is still happening though
[09:25:43] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4453/console" [puppet] - https://gerrit.wikimedia.org/r/1087379 (https://phabricator.wikimedia.org/T374827) (owner: Slyngshede)
[09:26:53] <wikibugs>	 (CR) Slyngshede: P:firewall absent check_conntrack script. [puppet] - https://gerrit.wikimedia.org/r/1087379 (https://phabricator.wikimedia.org/T374827) (owner: Slyngshede)
[09:29:44] <wikibugs>	 (CR) Vgutierrez: [C:+2] liberica: fix hcforwarder configuration [puppet] - https://gerrit.wikimedia.org/r/1087376 (https://phabricator.wikimedia.org/T377127) (owner: Vgutierrez)
[09:31:18] <wikibugs>	 (CR) Vgutierrez: [C:+2] liberica: Provide ipip0 and ipip60 devices [puppet] - https://gerrit.wikimedia.org/r/1087381 (https://phabricator.wikimedia.org/T377127) (owner: Vgutierrez)
[09:31:44] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
[09:31:46] <stashbot>	 T373037: Make ParserCache more like a ring - https://phabricator.wikimedia.org/T373037
[09:31:58] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
[09:33:30] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
[09:34:38] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087379 (https://phabricator.wikimedia.org/T374827) (owner: Slyngshede)
[09:37:39] <wikibugs>	 SRE, serviceops, MediaWiki-backport-deployments, Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291795 (Joe) Rsyslog had some tls errors so i restarted it, but i doubted it could be the real culprit, given it is reached via udp...
[09:41:06] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to snapshot* with group snapshot-admins for ebernhardson - https://phabricator.wikimedia.org/T379025#10291797 (MatthewVernon) Open→Resolved a:Volans AFAICT the requested change was done by @Volans already, so closing this ticket.
[09:41:55] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Failover idp.w.o to idp1004 [dns] - https://gerrit.wikimedia.org/r/1087363 (owner: Muehlenhoff)
[09:45:27] <wikibugs>	 (PS1) Alexandros Kosiaris: api|rest-gateway: Support sending request headers to upstream [deployment-charts] - https://gerrit.wikimedia.org/r/1087388 (https://phabricator.wikimedia.org/T374683)
[09:45:47] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
[09:45:49] <wikibugs>	 (PS1) MVernon: admin/data.yaml: set krb: present for jsn [puppet] - https://gerrit.wikimedia.org/r/1087389 (https://phabricator.wikimedia.org/T378786)
[09:48:21] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1087389 (https://phabricator.wikimedia.org/T378786) (owner: MVernon)
[09:48:43] <wikibugs>	 (CR) MVernon: [C:+2] admin/data.yaml: set krb: present for jsn [puppet] - https://gerrit.wikimedia.org/r/1087389 (https://phabricator.wikimedia.org/T378786) (owner: MVernon)
[09:49:16] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
[09:50:47] <wikibugs>	 (03PS1) 10Muehlenhoff: Assign ganeti role for ganeti1041/ganeti1042 [puppet] - 10https://gerrit.wikimedia.org/r/1087393 (https://phabricator.wikimedia.org/T378921)
[09:50:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 93.3% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[09:52:51] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Platform-SRE, 13Patch-For-Review: Request Kerberos identity for jsn.sherman - https://phabricator.wikimedia.org/T378786#10291834 (10MatthewVernon) 05Open→03Resolved a:03MatthewVernon Hi @jsn.sherman this is all done for you now.
[09:53:50] <wikibugs>	 (03PS1) 10Slyngshede: Revert "P:firewall remove Icinga conntrack check" [puppet] - 10https://gerrit.wikimedia.org/r/1087397
[09:55:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Assign ganeti role for ganeti1041/ganeti1042 [puppet] - 10https://gerrit.wikimedia.org/r/1087393 (https://phabricator.wikimedia.org/T378921) (owner: 10Muehlenhoff)
[09:55:47] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "P:firewall remove Icinga conntrack check" [puppet] - 10https://gerrit.wikimedia.org/r/1087397 (owner: 10Slyngshede)
[09:56:37] <wikibugs>	 (03PS2) 10Slyngshede: Revert "P:firewall remove Icinga conntrack check" [puppet] - 10https://gerrit.wikimedia.org/r/1087397
[09:57:55] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:03+1] "I'm not very familiar with helm charts, but the intent looks right to me" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087388 (https://phabricator.wikimedia.org/T374683) (owner: 10Alexandros Kosiaris)
[09:58:06] <wikibugs>	 (03CR) 10David Caro: [C:03+1] "LGTM, if you prefer not re-enabling it for everyone it could use some parameter (though complicates things)" [puppet] - 10https://gerrit.wikimedia.org/r/1087397 (owner: 10Slyngshede)
[09:59:22] <wikibugs>	 (03CR) 10Slyngshede: "It's fine, it's just not a very active alert, so the inconvenience is minimal." [puppet] - 10https://gerrit.wikimedia.org/r/1087397 (owner: 10Slyngshede)
[09:59:39] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Revert "P:firewall remove Icinga conntrack check" [puppet] - 10https://gerrit.wikimedia.org/r/1087397 (owner: 10Slyngshede)
[10:00:01] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: api|rest-gateway: Support sending request headers to upstream [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087388 (https://phabricator.wikimedia.org/T374683)
[10:00:36] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[10:00:39] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[10:00:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 99.42% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[10:01:16] <wikibugs>	 06SRE, 06serviceops, 05MediaWiki-backport-deployments, 05Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291857 (10Joe) I would suggest, in the short term, to just run docker with `--network=host` and then check what log calls are being m...
[10:03:45] <wikibugs>	 (03Abandoned) 10Ayounsi: Add Netbox script to change a server's NIC [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/1012680 (https://phabricator.wikimedia.org/T360297) (owner: 10Ayounsi)
[10:05:00] <wikibugs>	 (03CR) 10Elukey: [V:03+1 C:03+2] profile::docker::report: use the internal registry endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1087205 (https://phabricator.wikimedia.org/T378618) (owner: 10Elukey)
[10:05:08] <wikibugs>	 (03CR) 10Elukey: [C:03+2] docker_registry_ha: reduce from 300 to 180 the nginx timeout [puppet] - 10https://gerrit.wikimedia.org/r/1087206 (https://phabricator.wikimedia.org/T378618) (owner: 10Elukey)
[10:07:07] <wikibugs>	 06SRE, 06serviceops, 05MediaWiki-backport-deployments, 05Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10291863 (10jnuche) More details, successful backports were still happening yesterday, e.g.: https://sal.toolforge.org/log/6iIf-ZIBFFSC...
[10:07:41] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
[10:09:44] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
[10:09:47] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
[10:09:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json
[10:11:32] <elukey>	 !log set proxy timeouts of docker registry's nginx instances from 300s to 180s - T378618
[10:11:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json
[10:19:44] <wikibugs>	 (03PS1) 10Urbanecm: CirrusSearch: Disable updating weighted tags via EventBus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087407 (https://phabricator.wikimedia.org/T378983)
[10:21:01] <wikibugs>	 (03PS1) 10Vgutierrez: liberica: Fix hcforwarder config (take two) [puppet] - 10https://gerrit.wikimedia.org/r/1087408 (https://phabricator.wikimedia.org/T377127)
[10:21:49] <wikibugs>	 (03PS4) 10Alexandros Kosiaris: Revert^2 "ats: Route rest_v1/page/(html|title) to rest-gateway" [puppet] - 10https://gerrit.wikimedia.org/r/1087230 (https://phabricator.wikimedia.org/T374683)
[10:22:22] <wikibugs>	 (03CR) 10Btullis: [C:03+1] aqs1013 replaced by aqs1022 (hardware refresh) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087240 (https://phabricator.wikimedia.org/T379026) (owner: 10Eevans)
[10:22:55] <wikibugs>	 (03CR) 10DCausse: [C:03+1] "thanks" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087407 (https://phabricator.wikimedia.org/T378983) (owner: 10Urbanecm)
[10:23:20] <urbanecm>	 thanks dcausse!
[10:23:46] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087408 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[10:27:45] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "> Note that this requires an Airflow user with the same username as the" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1075875 (https://phabricator.wikimedia.org/T375716) (owner: 10Brouberol)
[10:28:15] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] api|rest-gateway: Support sending request headers to upstream [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087388 (https://phabricator.wikimedia.org/T374683) (owner: 10Alexandros Kosiaris)
[10:28:45] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Great, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1087187 (owner: 10Muehlenhoff)
[10:29:07] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] "We absolutely could run it as a one-off CLI call, but we could also ensure these user/roles via a DAG. Whatever floats our boat!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1075875 (https://phabricator.wikimedia.org/T375716) (owner: 10Brouberol)
[10:29:10] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1087186 (owner: 10Muehlenhoff)
[10:30:22] <wikibugs>	 (03CR) 10Fabfur: [C:03+1] "makes sense to me" [puppet] - 10https://gerrit.wikimedia.org/r/1087408 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[10:30:32] <wikibugs>	 (03CR) 10Btullis: [C:03+1] global_config: expose additional ports on hadoop masters/workers [puppet] - 10https://gerrit.wikimedia.org/r/1087135 (https://phabricator.wikimedia.org/T377928) (owner: 10Brouberol)
[10:31:01] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] liberica: Fix hcforwarder config (take two) [puppet] - 10https://gerrit.wikimedia.org/r/1087408 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[10:31:01] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json
[10:31:37] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1087136 (https://phabricator.wikimedia.org/T377928) (owner: 10Brouberol)
[10:31:47] <wikibugs>	 (03CR) 10Brouberol: [V:03+1 C:03+2] global_config: expose additional ports on hadoop masters/workers [puppet] - 10https://gerrit.wikimedia.org/r/1087135 (https://phabricator.wikimedia.org/T377928) (owner: 10Brouberol)
[10:31:58] <wikibugs>	 (03CR) 10Brouberol: [V:03+1 C:03+2] global_config: define external services entries for the hive metastore servers [puppet] - 10https://gerrit.wikimedia.org/r/1087136 (https://phabricator.wikimedia.org/T377928) (owner: 10Brouberol)
[10:32:22] <wikibugs>	 (03CR) 10Marostegui: [C:04-1] mariadb: productionize db2236 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087202 (https://phabricator.wikimedia.org/T373579) (owner: 10Arnaudb)
[10:34:42] <wikibugs>	 (03PS3) 10Arnaudb: mariadb: productionize db2236 [puppet] - 10https://gerrit.wikimedia.org/r/1087202 (https://phabricator.wikimedia.org/T373579)
[10:34:50] <wikibugs>	 (03CR) 10Marostegui: "Grants were applied on the sections already?" [puppet] - 10https://gerrit.wikimedia.org/r/1087369 (https://phabricator.wikimedia.org/T368874) (owner: 10Arnaudb)
[10:35:02] <wikibugs>	 (03CR) 10Arnaudb: "good catch!" [puppet] - 10https://gerrit.wikimedia.org/r/1087202 (https://phabricator.wikimedia.org/T373579) (owner: 10Arnaudb)
[10:35:35] <wikibugs>	 (03CR) 10Marostegui: "let's do one at a time. Let's start with m5" [dns] - 10https://gerrit.wikimedia.org/r/1087374 (https://phabricator.wikimedia.org/T368874) (owner: 10Arnaudb)
[10:35:48] <wikibugs>	 (03CR) 10Arnaudb: "nope!" [puppet] - 10https://gerrit.wikimedia.org/r/1087369 (https://phabricator.wikimedia.org/T368874) (owner: 10Arnaudb)
[10:35:54] <wikibugs>	 (03CR) 10Marostegui: [C:03+1] mariadb: productionize db2236 [puppet] - 10https://gerrit.wikimedia.org/r/1087202 (https://phabricator.wikimedia.org/T373579) (owner: 10Arnaudb)
[10:37:21] <wikibugs>	 (03CR) 10Marostegui: [C:04-1] "Then that needs to be done before we commit this." [puppet] - 10https://gerrit.wikimedia.org/r/1087369 (https://phabricator.wikimedia.org/T368874) (owner: 10Arnaudb)
[10:37:50] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install kubestage200[3-4] - https://phabricator.wikimedia.org/T377009#10291974 (10Clement_Goubert) Thanks @Jhancock.wm !
[10:40:41] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[10:41:22] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[10:41:23] <logmsgbot>	 !log jnuche@deploy2002 Installing scap version "4.121.0" for 209 hosts
[10:43:30] <wikibugs>	 (03PS1) 10Muehlenhoff: Add a helper script to setup the Ganeti KVM vg [puppet] - 10https://gerrit.wikimedia.org/r/1087412
[10:43:58] <wikibugs>	 (03PS2) 10Muehlenhoff: Add a helper script to setup the Ganeti LVM vg [puppet] - 10https://gerrit.wikimedia.org/r/1087412
[10:44:27] <logmsgbot>	 !log jnuche@deploy2002 install-world aborted: (no justification provided) (duration: 03m 09s)
[10:44:40] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add a helper script to setup the Ganeti LVM vg [puppet] - 10https://gerrit.wikimedia.org/r/1087412 (owner: 10Muehlenhoff)
[10:46:08] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json
[10:46:10] <logmsgbot>	 !log jnuche@deploy2002 Installing scap version "4.121.0" for 209 hosts
[10:46:12] <wikibugs>	 (03PS3) 10Muehlenhoff: Add a helper script to setup the Ganeti LVM vg [puppet] - 10https://gerrit.wikimedia.org/r/1087412
[10:47:04] <wikibugs>	 (03CR) 10Peter Fischer: [C:03+1] CirrusSearch: Disable updating weighted tags via EventBus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087407 (https://phabricator.wikimedia.org/T378983) (owner: 10Urbanecm)
[10:48:25] <urbanecm>	 jouncebot: nowandnext
[10:48:26] <jouncebot>	 For the next 0 hour(s) and 11 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T0900)
[10:48:26] <jouncebot>	 In 0 hour(s) and 11 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1100)
[10:50:38] <wikibugs>	 (03PS2) 10Arnaudb: dbproxy: switch CNAMEs [dns] - 10https://gerrit.wikimedia.org/r/1087374 (https://phabricator.wikimedia.org/T368874)
[10:51:01] <wikibugs>	 (03CR) 10Arnaudb: "done!" [dns] - 10https://gerrit.wikimedia.org/r/1087374 (https://phabricator.wikimedia.org/T368874) (owner: 10Arnaudb)
[10:52:13] <wikibugs>	 (03CR) 10Marostegui: [C:04-1] "-1 as this is blocked on the grants" [dns] - 10https://gerrit.wikimedia.org/r/1087374 (https://phabricator.wikimedia.org/T368874) (owner: 10Arnaudb)
[10:52:14] <jnuche>	 urbanecm: I'm trying to solve a train blocker and would like to try to run the presync again, are you thinking of deploying something?
[10:52:37] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:53:06] <wikibugs>	 06SRE, 07SRE-Unowned, 10wikitech.wikimedia.org: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10292031 (10jijiki) >>! In T376400#10287901, @MatthewVernon wrote: > @jijiki can you expand on what you mean, please? This task is currently too broad...  For the time being the task is delibe...
[10:53:08] <urbanecm>	 jnuche: was thinking of deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1087407, but i can wait for the nearest window, i don't want to interfere with the train :)
[10:53:49] <jnuche>	 urbanecm: ty :)
[10:55:46] <jnuche>	 running the train presync in the next few mins if I don't see any objections
[10:56:28] <wikibugs>	 (03PS20) 10Clément Goubert: Provide conftool data for mwcron and mwscript-k8s [puppet] - 10https://gerrit.wikimedia.org/r/1083146 (https://phabricator.wikimedia.org/T341555)
[10:56:30] <wikibugs>	 (03CR) 10Clément Goubert: Provide conftool data for mwcron and mwscript-k8s (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1083146 (https://phabricator.wikimedia.org/T341555) (owner: 10Clément Goubert)
[10:58:38] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] hiera: fix haproxykafka workers number [puppet] - 10https://gerrit.wikimedia.org/r/1085465 (https://phabricator.wikimedia.org/T377614) (owner: 10Fabfur)
[10:59:51] <wikibugs>	 (03Abandoned) 10Fabfur: hiera: fix haproxykafka workers number [puppet] - 10https://gerrit.wikimedia.org/r/1085465 (https://phabricator.wikimedia.org/T377614) (owner: 10Fabfur)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1100)
[11:01:15] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json
[11:01:20] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
[11:01:33] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
[11:01:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json
[11:01:58] <wikibugs>	 (03PS1) 10Vgutierrez: liberica: Set healthcheck type [puppet] - 10https://gerrit.wikimedia.org/r/1087417 (https://phabricator.wikimedia.org/T377127)
[11:02:29] <logmsgbot>	 !log jnuche@deploy2002 Started scap sync-world: testwikis to 1.44.0-wmf.2  refs T375661
[11:02:32] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[11:02:46] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087417 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[11:07:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json
[11:09:23] <wikibugs>	 (03PS1) 10Elukey: tlsproxy::localssl: allow multiple listens for tls ports [puppet] - 10https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944)
[11:09:24] <wikibugs>	 (03PS1) 10Elukey: Change port for kartotherian-ssl [puppet] - 10https://gerrit.wikimedia.org/r/1087422 (https://phabricator.wikimedia.org/T378944)
[11:09:26] <wikibugs>	 (03PS1) 10Elukey: profile::trafficserver::backend: move kartotherian to port 6543 [puppet] - 10https://gerrit.wikimedia.org/r/1087423 (https://phabricator.wikimedia.org/T378944)
[11:10:00] <wikibugs>	 (03CR) 10CI reject: [V:04-1] tlsproxy::localssl: allow multiple listens for tls ports [puppet] - 10https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: 10Elukey)
[11:10:16] <wikibugs>	 (03PS1) 10Kosta Harlan: Revert^2 "temp accounts: Enable temp account creation on second-round pilots" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087424
[11:11:16] <wikibugs>	 (03PS2) 10Kosta Harlan: Revert^2 "temp accounts: Enable temp account creation on second-round pilots" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087424 (https://phabricator.wikimedia.org/T378336)
[11:12:40] <kostajh>	 urbanecm: I also have a patch to sync if scap is unbroken 
[11:17:00] <wikibugs>	 06SRE, 07SRE-Unowned, 10wikitech.wikimedia.org: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10292157 (10jijiki)
[11:17:21] <wikibugs>	 06SRE, 07SRE-Unowned, 10wikitech.wikimedia.org: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10292158 (10jijiki) @MatthewVernon updated description
[11:17:30] <wikibugs>	 (03CR) 10Fabfur: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1087417 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[11:18:59] <wikibugs>	 (03PS2) 10Vgutierrez: liberica: Set healthcheck type [puppet] - 10https://gerrit.wikimedia.org/r/1087417 (https://phabricator.wikimedia.org/T377127)
[11:19:08] <wikibugs>	 (03CR) 10Vgutierrez: liberica: Set healthcheck type (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1087417 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[11:22:40] <wikibugs>	 (03PS2) 10Elukey: tlsproxy::localssl: allow multiple listens for tls ports [puppet] - 10https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944)
[11:22:40] <wikibugs>	 (03PS2) 10Elukey: Change port for kartotherian-ssl [puppet] - 10https://gerrit.wikimedia.org/r/1087422 (https://phabricator.wikimedia.org/T378944)
[11:22:47] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json
[11:23:26] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] liberica: Set healthcheck type [puppet] - 10https://gerrit.wikimedia.org/r/1087417 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[11:23:44] <urbanecm>	 kostajh: not sure whether the message was for me, we're in mw infra window anyway, and i'm not doing any changes atm :)
[11:24:52] <kostajh>	 urbanecm: it was intended for you, as I saw you were interested to sync a patch as well
[11:25:45] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+1] Deprecate system::role for memcached/redis roles [puppet] - 10https://gerrit.wikimedia.org/r/1083160 (owner: 10Muehlenhoff)
[11:25:50] <urbanecm>	 ah, gotcha.
[11:27:04] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] kubernetes: fix hostnames for eqiad refresh and expansion [puppet] - 10https://gerrit.wikimedia.org/r/1087216 (https://phabricator.wikimedia.org/T376185) (owner: 10Clément Goubert)
[11:30:10] <kostajh>	 jnuche: did it work?
[11:31:23] <wikibugs>	 06SRE, 06serviceops, 05MediaWiki-backport-deployments, 05Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10292204 (10jnuche)
[11:31:40] <jnuche>	 kostajh: yeah, train presync is currently running :)
[11:31:40] <claime>	 kostajh: urbanecm: We don't have much to do in that window today afaict, so if you want to go ahead and take it to finish backports once scap's good, I think it's all right
[11:32:00] <wikibugs>	 06SRE, 06serviceops, 05MediaWiki-backport-deployments, 05Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10292200 (10jnuche) p:05Unbreak!→03Triage Train now got past the failing stage. Removing this as a blocker
[11:32:15] <kostajh>	 +1
[11:32:22] <urbanecm>	 ty!
[11:32:34] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] kubernetes: fix hostnames for eqiad refresh and expansion [puppet] - 10https://gerrit.wikimedia.org/r/1087216 (https://phabricator.wikimedia.org/T376185) (owner: 10Clément Goubert)
[11:34:40] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops, 13Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10292215 (10Clement_Goubert) Hostnames fixed in tasks and in puppet, sorry about that.
[11:34:46] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 13Patch-For-Review: Q2:rack/setup/install wikikube-worker13[13-28] - https://phabricator.wikimedia.org/T378185#10292216 (10Clement_Goubert) Hostnames fixed in tasks and in puppet, sorry about that.
[11:37:15] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] Revert^2 "ats: Route rest_v1/page/(html|title) to rest-gateway" [puppet] - 10https://gerrit.wikimedia.org/r/1087230 (https://phabricator.wikimedia.org/T374683) (owner: 10Alexandros Kosiaris)
[11:37:54] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json
[11:38:57] <logmsgbot>	 !log jnuche@deploy2002 Finished scap sync-world: testwikis to 1.44.0-wmf.2  refs T375661 (duration: 36m 28s)
[11:39:05] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[11:39:57] <jnuche>	 presync done, I'm gonna promote to group0 and then we can do the backports
[11:40:12] <wikibugs>	 (03CR) 10Hnowlan: [V:03+1] "PCC SUCCESS (CORE_DIFF 16): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4454/c" [puppet] - 10https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: 10Elukey)
[11:41:15] <wikibugs>	 (03PS1) 10TrainBranchBot: group0 to 1.44.0-wmf.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087433 (https://phabricator.wikimedia.org/T375661)
[11:41:16] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] group0 to 1.44.0-wmf.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087433 (https://phabricator.wikimedia.org/T375661) (owner: 10TrainBranchBot)
[11:41:44] <wikibugs>	 (03CR) 10Hnowlan: [V:03+1 C:03+1] tlsproxy::localssl: allow multiple listens for tls ports [puppet] - 10https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: 10Elukey)
[11:41:50] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] Change port for kartotherian-ssl [puppet] - 10https://gerrit.wikimedia.org/r/1087422 (https://phabricator.wikimedia.org/T378944) (owner: 10Elukey)
[11:42:01] <wikibugs>	 (03Merged) 10jenkins-bot: group0 to 1.44.0-wmf.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087433 (https://phabricator.wikimedia.org/T375661) (owner: 10TrainBranchBot)
[11:42:13] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] profile::trafficserver::backend: move kartotherian to port 6543 [puppet] - 10https://gerrit.wikimedia.org/r/1087423 (https://phabricator.wikimedia.org/T378944) (owner: 10Elukey)
[11:46:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040
[11:47:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040
[11:47:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041
[11:49:03] <wikibugs>	 (03PS1) 10Slyngshede: Netfilter: Route alerts for cloud hosts to WMCS. [alerts] - 10https://gerrit.wikimedia.org/r/1087434
[11:49:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041
[11:52:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042
[11:53:01] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json
[11:53:18] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2  refs T375661
[11:53:21] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[11:53:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042
[11:55:57] <jnuche>	 checking everything looks healthy
[11:57:24] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
[11:58:14] <jnuche>	 well damn, I need to rollback...
[11:58:26] <wikibugs>	 (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087435
[12:01:13] <wikibugs>	 (03PS1) 10TrainBranchBot: testwikis to 1.44.0-wmf.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087438 (https://phabricator.wikimedia.org/T375661)
[12:01:14] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] testwikis to 1.44.0-wmf.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087438 (https://phabricator.wikimedia.org/T375661) (owner: 10TrainBranchBot)
[12:01:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
[12:02:00] <wikibugs>	 (03Merged) 10jenkins-bot: testwikis to 1.44.0-wmf.2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087438 (https://phabricator.wikimedia.org/T375661) (owner: 10TrainBranchBot)
[12:02:27] <logmsgbot>	 !log jnuche@deploy2002 Started scap sync-world: testwikis to 1.44.0-wmf.2  refs T375661
[12:02:29] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[12:02:40] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B
[12:04:33] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B
[12:10:10] <logmsgbot>	 !log jnuche@deploy2002 Finished scap sync-world: testwikis to 1.44.0-wmf.2  refs T375661 (duration: 07m 43s)
[12:10:18] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[12:11:13] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10292360 (10MoritzMuehlenhoff)
[12:11:34] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Drive host network config from Netbox, and move away from ifupdown - https://phabricator.wikimedia.org/T347411#10292358 (10cmooney) While thinking about T378346 it occurs to me there may be a half-way approach to all this which might work.  In brief:   * We provision hosts...
[12:13:16] <urbanecm>	 jnuche: i see the rollback finished, does that mean things are done (for now), please?
[12:14:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: mediawiki_job_startupregistrystats-testwiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:14:34] <jnuche>	 kostajh, urbanecm: yeah, sorry, I was checking errors went back to normal after the rollback. You can go ahead with your backports from my side
[12:14:42] <urbanecm>	 thanks!
[12:14:53] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] CirrusSearch: Disable updating weighted tags via EventBus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087407 (https://phabricator.wikimedia.org/T378983) (owner: 10Urbanecm)
[12:15:33] <wikibugs>	 (03Merged) 10jenkins-bot: CirrusSearch: Disable updating weighted tags via EventBus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087407 (https://phabricator.wikimedia.org/T378983) (owner: 10Urbanecm)
[12:16:09] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1087407|CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]]
[12:16:13] <stashbot>	 T378983: Add Link recommendation are not being processed by CirrusSearch (November 2024) - https://phabricator.wikimedia.org/T378983
[12:16:13] <stashbot>	 T377150: Config: enable CirrusSearchEnableEventBusWeightedTags  - https://phabricator.wikimedia.org/T377150
[12:16:22] <kostajh>	 urbanecm: lmk when you're done please 
[12:16:32] <urbanecm>	 kostajh: will do!
[12:17:37] <jinxer-wm>	 RESOLVED: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:17:58] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
[12:18:12] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
[12:18:18] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
[12:18:31] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
[12:19:49] <wikibugs>	 (CR) Tchanders: [C:+1] Revert^2 "temp accounts: Enable temp account creation on second-round pilots" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087424 (https://phabricator.wikimedia.org/T378336) (owner: Kosta Harlan)
[12:22:47] <wikibugs>	 (PS1) Kosta Harlan: maintain-views.yml: Fix globalblocks filtering [puppet] - https://gerrit.wikimedia.org/r/1087443 (https://phabricator.wikimedia.org/T378994)
[12:23:49] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087407|CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)]] (duration: 07m 39s)
[12:23:53] <stashbot>	 T378983: Add Link recommendation are not being processed by CirrusSearch (November 2024) - https://phabricator.wikimedia.org/T378983
[12:23:54] <stashbot>	 T377150: Config: enable CirrusSearchEnableEventBusWeightedTags  - https://phabricator.wikimedia.org/T377150
[12:27:46] <wikibugs>	 (CR) FNegri: [C:+1] "Adding Ben and Amir to reviewers, but I think this one can be merged straight away as it's just a simple fix." [puppet] - https://gerrit.wikimedia.org/r/1087443 (https://phabricator.wikimedia.org/T378994) (owner: Kosta Harlan)
[12:28:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
[12:28:30] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10292394 (ops-monitoring-bot) Draining ganeti1013.eqiad.wmnet of running VMs
[12:30:13] <urbanecm>	 kostajh: deployment done
[12:30:31] <kostajh>	 Thanks
[12:32:13] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
[12:32:18] <wikibugs>	 (CR) Marostegui: [C:+1] maintain-views.yml: Fix globalblocks filtering [puppet] - https://gerrit.wikimedia.org/r/1087443 (https://phabricator.wikimedia.org/T378994) (owner: Kosta Harlan)
[12:32:51] <wikibugs>	 (CR) FNegri: [C:+2] maintain-views.yml: Fix globalblocks filtering [puppet] - https://gerrit.wikimedia.org/r/1087443 (https://phabricator.wikimedia.org/T378994) (owner: Kosta Harlan)
[12:33:02] <urbanecm>	 !log mwmaint2002: kill all instances of refreshLinkRecommendation (T378983)
[12:33:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:05] <stashbot>	 T378983: Add Link recommendation are not being processed by CirrusSearch (November 2024) - https://phabricator.wikimedia.org/T378983
[12:33:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
[12:33:27] <urbanecm>	 !log eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; T378983)
[12:33:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:14] <urbanecm>	 voila, update is here!
[12:34:25] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.update-views
[12:34:25] <logmsgbot>	 !log fnegri@cumin1002 END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
[12:35:00] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.update-views
[12:35:01] <logmsgbot>	 !log fnegri@cumin1002 END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
[12:35:09] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10292423 (ops-monitoring-bot) Draining ganeti1013.eqiad.wmnet of running VMs
[12:35:38] <logmsgbot>	 !log fnegri@cumin1002 START - Cookbook sre.wikireplicas.update-views
[12:36:25] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: mediawiki_job_growthexperiments-refreshLinkRecommendations-s1.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:36:46] <kostajh>	 jouncebot: nowandnext
[12:36:47] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 23 minute(s)
[12:36:47] <jouncebot>	 In 0 hour(s) and 23 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1300)
[12:36:58] <kostajh>	 ok, I'm going ahead with the temp accounts patch
[12:38:04] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087424 (https://phabricator.wikimedia.org/T378336) (owner: Kosta Harlan)
[12:38:46] <wikibugs>	 (Merged) jenkins-bot: Revert^2 "temp accounts: Enable temp account creation on second-round pilots" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087424 (https://phabricator.wikimedia.org/T378336) (owner: Kosta Harlan)
[12:39:11] <logmsgbot>	 !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1087424|Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]]
[12:39:21] <stashbot>	 T378336: Temporary Accounts: Minor pilots - Nov 5 deploy - https://phabricator.wikimedia.org/T378336
[12:40:26] <logmsgbot>	 !log fnegri@cumin1002 END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
[12:42:01] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1087424|Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[12:46:13] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Continuing with sync
[12:49:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
[12:50:21] <wikibugs>	 ops-codfw, SRE, Data-Persistence, Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10292432 (Marostegui) es2022 → will be replaced by es2043, server can be installed in C6: There is already one es in C6, better to look for a differen...
[12:50:58] <logmsgbot>	 !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087424|Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)]] (duration: 11m 46s)
[12:51:01] <stashbot>	 T378336: Temporary Accounts: Minor pilots - Nov 5 deploy - https://phabricator.wikimedia.org/T378336
[12:56:37] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
[12:57:40] <kostajh>	 I'm finished
[13:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1300)
[13:03:00] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
[13:03:25] <wikibugs>	 ops-codfw, SRE, DC-Ops, serviceops: Degraded RAID on wikikube-worker2068 - https://phabricator.wikimedia.org/T378255#10292455 (Clement_Goubert) In progress→Resolved RAID is now rebuilt.
[13:05:40] <wikibugs>	 (CR) Cathal Mooney: [C:+2] Only try to find 'real' netmask for IPs if they are /32 or /128 [software/netbox-extras] - https://gerrit.wikimedia.org/r/1085590 (https://phabricator.wikimedia.org/T378751) (owner: Cathal Mooney)
[13:07:49] <wikibugs>	 (Merged) jenkins-bot: Only try to find 'real' netmask for IPs if they are /32 or /128 [software/netbox-extras] - https://gerrit.wikimedia.org/r/1085590 (https://phabricator.wikimedia.org/T378751) (owner: Cathal Mooney)
[13:08:42] <moritzm>	 !log installing php7.4 security updates on remaining non-wikikube servers T378173
[13:08:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:17] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
[13:09:53] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
[13:09:58] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
[13:10:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
[13:10:28] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
[13:11:06] <wikibugs>	 (PS2) Elukey: profile::trafficserver::backend: move kartotherian to port 6543 [puppet] - https://gerrit.wikimedia.org/r/1087423 (https://phabricator.wikimedia.org/T378944)
[13:11:56] <wikibugs>	 (CR) Elukey: "Hey Valentin, adding you to verify that I am not doing something horrible on the nginx side. This config is almost deprecated and used by " [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: Elukey)
[13:16:42] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
[13:16:44] <stashbot>	 T378068: pc1017 crashed - https://phabricator.wikimedia.org/T378068
[13:16:55] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
[13:24:34] <wikibugs>	 (PS1) Brouberol: airflow: export a PYTHONPATH env var reflecting the new bullseye based image [deployment-charts] - https://gerrit.wikimedia.org/r/1087454 (https://phabricator.wikimedia.org/T377928)
[13:24:48] <wikibugs>	 (PS1) Gergő Tisza: JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors [mediawiki-config] - https://gerrit.wikimedia.org/r/1087455 (https://phabricator.wikimedia.org/T379067)
[13:26:27] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] "Thanks to both!" [deployment-charts] - https://gerrit.wikimedia.org/r/1087388 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[13:27:16] <wikibugs>	 (CR) Btullis: [C:+1] airflow: export a PYTHONPATH env var reflecting the new bullseye based image [deployment-charts] - https://gerrit.wikimedia.org/r/1087454 (https://phabricator.wikimedia.org/T377928) (owner: Brouberol)
[13:27:33] <wikibugs>	 (Merged) jenkins-bot: api|rest-gateway: Support sending request headers to upstream [deployment-charts] - https://gerrit.wikimedia.org/r/1087388 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[13:28:22] <wikibugs>	 (CR) Brouberol: [C:+2] airflow: export a PYTHONPATH env var reflecting the new bullseye based image [deployment-charts] - https://gerrit.wikimedia.org/r/1087454 (https://phabricator.wikimedia.org/T377928) (owner: Brouberol)
[13:28:26] <wikibugs>	 (PS1) Arnaudb: dotfiles: add ssh config for mgmt [puppet] - https://gerrit.wikimedia.org/r/1087457 (https://phabricator.wikimedia.org/T378068)
[13:28:28] <wikibugs>	 SRE, Huggle, Infrastructure-Foundations: IRC recent changes provider fails in Huggle after recent irc.wikimedia.org upgrade - https://phabricator.wikimedia.org/T378667#10292537 (Petrb) Open→Resolved a:Petrb Hello, new version of Huggle was released that contains a patched libirc  http...
[13:29:09] <wikibugs>	 (CR) Dreamy Jazz: [C:+1] JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors [mediawiki-config] - https://gerrit.wikimedia.org/r/1087455 (https://phabricator.wikimedia.org/T379067) (owner: Gergő Tisza)
[13:29:38] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[13:31:25] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: mediawiki_job_growthexperiments-refreshLinkRecommendations-s1.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:32:58] <wikibugs>	 (CR) Bvibber: [C:+1] "Does exactly what I'd have done for now. :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087455 (https://phabricator.wikimedia.org/T379067) (owner: Gergő Tisza)
[13:33:33] <wikibugs>	 ops-eqiad, Data-Persistence-SRE, DBA, DC-Ops: db1246 crashed, doesn't reboot cleanly - https://phabricator.wikimedia.org/T374215#10292551 (Marostegui) Resolved→Open @papaul @Jclark-ctr @VRiley-WMF I have reopened this ticket because I do think this is a HW issue and it is in fact a recurr...
[13:33:49] <wikibugs>	 ops-eqiad, Data-Persistence-SRE, DBA, DC-Ops: db1246 crashed, doesn't reboot cleanly - https://phabricator.wikimedia.org/T374215#10292557 (Marostegui)
[13:34:28] <moritzm>	 !log imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia T379059
[13:34:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:30] <moritzm>	 ^ hashar
[13:34:30] <stashbot>	 T379059: Upgrade Jenkins instances to  2.479.1 - https://phabricator.wikimedia.org/T379059
[13:35:11] <hashar>	 moritzm: Thank you very much!
[13:36:25] <wikibugs>	 SRE, Infrastructure-Foundations: sre.netbox.update-extras hits KeyError with logging - https://phabricator.wikimedia.org/T379072 (cmooney) NEW p:Triage→Low
[13:36:39] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Top-of-rack 'MoveServersUplinks' Netbox scripts doesn't clean up the old trunk port - https://phabricator.wikimedia.org/T375216#10292564 (ayounsi) I added some logging (`self.log_info(f"{interface} {interface.enabled} {interface.untagged_vlan} {interface.tagge...
[13:39:47] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[13:41:40] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[13:42:29] <wikibugs>	 (PS1) Brouberol: airflow: create the kerberos token PVC even if kerberos is disabled [deployment-charts] - https://gerrit.wikimedia.org/r/1087459 (https://phabricator.wikimedia.org/T375875)
[13:42:40] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[13:42:54] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
[13:44:41] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[13:44:51] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[13:49:45] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 05 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087455 (https://phabricator.wikimedia.org/T379067) (owner: Gergő Tisza)
[13:52:19] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[13:52:28] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[13:53:33] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
[13:54:05] <arnaudb>	 !log reimage pc1017 T378068
[13:54:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:08] <stashbot>	 T378068: pc1017 crashed - https://phabricator.wikimedia.org/T378068
[13:55:56] <wikibugs>	 (PS1) Alexandros Kosiaris: Fixup for rest-gateway's request_headers_to_add [deployment-charts] - https://gerrit.wikimedia.org/r/1087462 (https://phabricator.wikimedia.org/T374683)
[13:57:13] <moritzm>	 !log installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release
[13:57:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:26] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10292648 (MoritzMuehlenhoff)
[13:57:42] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] Fixup for rest-gateway's request_headers_to_add [deployment-charts] - https://gerrit.wikimedia.org/r/1087462 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[13:58:43] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Top-of-rack 'MoveServersUplinks' Netbox scripts doesn't clean up the old trunk port - https://phabricator.wikimedia.org/T375216#10292651 (ayounsi) Another point, after running the script, the changelog on a problematic interface shows 3 changes (for that inter...
[13:58:49] <wikibugs>	 (Merged) jenkins-bot: Fixup for rest-gateway's request_headers_to_add [deployment-charts] - https://gerrit.wikimedia.org/r/1087462 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1400).
[14:00:05] <jouncebot>	 tgr: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:24] <Lucas_WMDE>	 o/
[14:01:25] <Lucas_WMDE>	 I can deploy, if someone can test the fix for T379067
[14:01:25] <stashbot>	 T379067: Wikimedia\Rdbms\DBQueryError: Error 1146: Table 'testwikidatawiki.globaljsonlinks' doesn't existFunction: JsonConfig\GlobalJsonLinks::getLinksFromPageQuery: SELECT  gjlt_namespace,gjlt_title  FROM `globaljsonlinks` JOIN `glob - https://phabricator.wikimedia.org/T379067
[14:03:27] <Lucas_WMDE>	 so far I haven’t been able to reproduce it, which is annoying
[14:05:31] <Lucas_WMDE>	 ok, evidently I managed to trigger it, it’s just not user-visible
[14:05:40] <tgr|away>	 o/
[14:05:42] <urbanecm>	 Lucas_WMDE: according to the logs, it seems to appear in deferred update
[14:05:48] <Lucas_WMDE>	 because post output deferred updates fun
[14:05:49] <Lucas_WMDE>	 yup
[14:05:57] <Lucas_WMDE>	 tgr|away: want to self-serve or should I deploy?
[14:06:51] <Lucas_WMDE>	 seems to be reproducible pretty reliably, just make a page edit
[14:06:56] <Lucas_WMDE>	 (e.g. on https://test.wikidata.org/wiki/User:Lucas_Werkmeister_(WMDE)/sandbox)
[14:07:11] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[14:07:20] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[14:07:40] <tgr|away>	 Lucas_WMDE: thx, I can deploy it
[14:07:48] <Lucas_WMDE>	 ok!
[14:07:52] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[14:08:00] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[14:08:30] <moritzm>	 !log installing PHP 7.4 security updates on bullseye (as packaged in Debian)
[14:08:30] <Lucas_WMDE>	 (reproducible on mwdebug too, yay)
[14:08:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:09:08] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
[14:10:16] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tgr@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1087455 (https://phabricator.wikimedia.org/T379067) (owner: Gergő Tisza)
[14:10:36] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[14:10:50] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[14:11:00] <wikibugs>	 (Merged) jenkins-bot: JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors [mediawiki-config] - https://gerrit.wikimedia.org/r/1087455 (https://phabricator.wikimedia.org/T379067) (owner: Gergő Tisza)
[14:11:02] <wikibugs>	 (CR) Alexandros Kosiaris: [C:+2] Revert^2 "ats: Route rest_v1/page/(html|title) to rest-gateway" [puppet] - https://gerrit.wikimedia.org/r/1087230 (https://phabricator.wikimedia.org/T374683) (owner: Alexandros Kosiaris)
[14:11:28] <logmsgbot>	 !log tgr@deploy2002 Started scap sync-world: Backport for [[gerrit:1087455|JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]]
[14:11:37] <stashbot>	 T379067: Wikimedia\Rdbms\DBQueryError: Error 1146: Table 'testwikidatawiki.globaljsonlinks' doesn't existFunction: JsonConfig\GlobalJsonLinks::getLinksFromPageQuery: SELECT  gjlt_namespace,gjlt_title  FROM `globaljsonlinks` JOIN `glob - https://phabricator.wikimedia.org/T379067
[14:12:07] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
[14:12:43] <wikibugs>	 (CR) Marostegui: [C:+1] dotfiles: add ssh config for mgmt [puppet] - https://gerrit.wikimedia.org/r/1087457 (https://phabricator.wikimedia.org/T378068) (owner: Arnaudb)
[14:12:45] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10292711 (MoritzMuehlenhoff)
[14:12:56] <wikibugs>	 (CR) Arnaudb: [C:+2] dotfiles: add ssh config for mgmt [puppet] - https://gerrit.wikimedia.org/r/1087457 (https://phabricator.wikimedia.org/T378068) (owner: Arnaudb)
[14:15:01] <tgr|away>	 well this is new
[14:15:09] <tgr|away>	 14:13:47 Check 'check_testservers_k8s-1_of_1' failed: Sending to mwdebug.discovery.wmnet...
[14:15:17] <tgr|away>	     Location header: expected 'http://it.wikipedia.org/wiki/Saturno_(astronomia)?a=test', was missing.
[14:15:41] <tgr|away>	 FAIL: 131 requests sent to mwdebug.discovery.wmnet. 1 request with failed assertions.
[14:15:59] <urbanecm>	 tgr|away: i'd say rerun the checks
[14:16:08] <logmsgbot>	 !log tgr@deploy2002 tgr: Backport for [[gerrit:1087455|JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:16:11] <urbanecm>	 i saw flaky checks like this before
[14:16:11] <Lucas_WMDE>	 yeah, it happens sometimes for me (maybe once every few months)
[14:16:18] <urbanecm>	 ^^
[14:16:35] <Lucas_WMDE>	 did it say that the HTTP status was 500, or something else?
[14:16:54] <tgr|away>	 503
[14:17:25] <urbanecm>	 very likely transient then. might be worth filing a "this happens every now and then" task if it doesn't exist yet
[14:21:19] <Lucas_WMDE>	 only task I’m aware of is T364886
[14:21:20] <stashbot>	 T364886: httpbb should show more information / details about failed checks - https://phabricator.wikimedia.org/T364886
[14:22:06] <wikibugs>	 (CR) Vgutierrez: "sure, no problem 😊" [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: Elukey)
[14:22:26] <wikibugs>	 (PS1) Arnaudb: mariadb: wipe /srv on pc1017 [puppet] - https://gerrit.wikimedia.org/r/1087468 (https://phabricator.wikimedia.org/T378068)
[14:24:05] <logmsgbot>	 !log tgr@deploy2002 tgr: Continuing with sync
[14:28:52] <logmsgbot>	 !log tgr@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087455|JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)]] (duration: 17m 24s)
[14:28:55] <stashbot>	 T379067: Wikimedia\Rdbms\DBQueryError: Error 1146: Table 'testwikidatawiki.globaljsonlinks' doesn't existFunction: JsonConfig\GlobalJsonLinks::getLinksFromPageQuery: SELECT  gjlt_namespace,gjlt_title  FROM `globaljsonlinks` JOIN `glob - https://phabricator.wikimedia.org/T379067
[14:29:11] <vgutierrez>	 !log upload liberica 0.3 to apt.wm.o (bookworm-wikimedia)
[14:29:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:39] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
[14:29:52] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
[14:30:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json
[14:30:03] <wikibugs>	 (PS3) Elukey: tlsproxy::localssl: allow multiple listens for tls ports [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944)
[14:30:03] <wikibugs>	 (PS3) Elukey: Change port for kartotherian-ssl [puppet] - https://gerrit.wikimedia.org/r/1087422 (https://phabricator.wikimedia.org/T378944)
[14:30:03] <wikibugs>	 (PS3) Elukey: profile::trafficserver::backend: move kartotherian to port 6543 [puppet] - https://gerrit.wikimedia.org/r/1087423 (https://phabricator.wikimedia.org/T378944)
[14:30:19] <wikibugs>	 (CR) Elukey: tlsproxy::localssl: allow multiple listens for tls ports (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: Elukey)
[14:31:48] <Lucas_WMDE>	 nothing else to deploy, I think?
[14:32:37] <wikibugs>	 (CR) Marostegui: [C:+1] mariadb: wipe /srv on pc1017 [puppet] - https://gerrit.wikimedia.org/r/1087468 (https://phabricator.wikimedia.org/T378068) (owner: Arnaudb)
[14:32:48] <wikibugs>	 (CR) Arnaudb: [C:+2] mariadb: wipe /srv on pc1017 [puppet] - https://gerrit.wikimedia.org/r/1087468 (https://phabricator.wikimedia.org/T378068) (owner: Arnaudb)
[14:32:48] <wikibugs>	 (CR) Elukey: [V:+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/4455/co" [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: Elukey)
[14:32:49] <tgr|away>	 !log UTC afternoon deploys done
[14:32:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:59] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
[14:34:57] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
[14:35:18] <wikibugs>	 (CR) Vgutierrez: [C:+1] tlsproxy::localssl: allow multiple listens for tls ports [puppet] - https://gerrit.wikimedia.org/r/1087421 (https://phabricator.wikimedia.org/T378944) (owner: Elukey)
[14:35:40] <wikibugs>	 (PS1) Slyngshede: Account Managers: Allow account managers to be assigned by LDAP. [software/bitu] - https://gerrit.wikimedia.org/r/1087474
[14:35:53] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json
[14:37:37] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:37:56] <jnuche>	 I'm gonna use the remainder of the window to roll forward the train in a couple mins
[14:38:09] <wikibugs>	 (CR) Muehlenhoff: [C:+2] spark: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/1087186 (owner: Muehlenhoff)
[14:38:13] <wikibugs>	 (PS3) CDanis: Create releases for chart-renderer service [deployment-charts] - https://gerrit.wikimedia.org/r/1085633 (https://phabricator.wikimedia.org/T376948) (owner: Aude)
[14:38:27] <Lucas_WMDE>	 ok!
[14:39:26] <hashar>	 :)
[14:39:50] <wikibugs>	 (PS1) TrainBranchBot: group0 to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087478 (https://phabricator.wikimedia.org/T375661)
[14:39:52] <wikibugs>	 (CR) TrainBranchBot: [C:+2] group0 to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087478 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[14:40:02] <wikibugs>	 (PS1) Muehlenhoff: Revert "spark: Avoid Ferm-specific syntax" [puppet] - https://gerrit.wikimedia.org/r/1087479
[14:40:36] <wikibugs>	 (Merged) jenkins-bot: group0 to 1.44.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1087478 (https://phabricator.wikimedia.org/T375661) (owner: TrainBranchBot)
[14:42:17] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Revert "spark: Avoid Ferm-specific syntax" [puppet] - https://gerrit.wikimedia.org/r/1087479 (owner: Muehlenhoff)
[14:42:30] <wikibugs>	 (CR) CDanis: [C:+2] Create releases for chart-renderer service [deployment-charts] - https://gerrit.wikimedia.org/r/1085633 (https://phabricator.wikimedia.org/T376948) (owner: Aude)
[14:42:47] <wikibugs>	 (PS4) Elukey: Create new lvs service kartotherian-k8s-ssl [puppet] - https://gerrit.wikimedia.org/r/1087422 (https://phabricator.wikimedia.org/T378944)
[14:42:47] <wikibugs>	 (PS4) Elukey: profile::trafficserver::backend: move kartotherian to port 6543 [puppet] - https://gerrit.wikimedia.org/r/1087423 (https://phabricator.wikimedia.org/T378944)
[14:43:28] <wikibugs>	 (Merged) jenkins-bot: Create releases for chart-renderer service [deployment-charts] - https://gerrit.wikimedia.org/r/1085633 (https://phabricator.wikimedia.org/T376948) (owner: Aude)
[14:44:16] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] START helmfile.d/services/chart-renderer: apply
[14:44:38] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
[14:47:18] <wikibugs>	 (PS4) FNegri: WMCS: split cloudvirt alerts from generic nodes [alerts] - https://gerrit.wikimedia.org/r/1084782 (https://phabricator.wikimedia.org/T375479)
[14:48:06] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2  refs T375661
[14:48:09] <stashbot>	 T375661: 1.44.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T375661
[14:48:26] <wikibugs>	 (PS1) Slyngshede: P:idm Assign permission to TarLogic. [puppet] - https://gerrit.wikimedia.org/r/1087487 (https://phabricator.wikimedia.org/T352144)
[14:48:45] <wikibugs>	 (PS1) Muehlenhoff: spark: Avoid Ferm-specific syntax (take 2) [puppet] - https://gerrit.wikimedia.org/r/1087488
[14:49:05] <wikibugs>	 (CR) FNegri: [C:+2] alertmanager: fix WMCS template (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1077038 (https://phabricator.wikimedia.org/T375479) (owner: FNegri)
[14:50:03] <wikibugs>	 ops-codfw, SRE, DC-Ops, Discovery-Search, Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install elastic211[0-5] - https://phabricator.wikimedia.org/T378034#10292809 (Gehel) a:bking→None
[14:50:20] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
[14:51:00] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json
[14:52:03] <hashar>	 jnuche: if the train looks good, I will update the CI Jenkins
[14:53:33] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
[14:54:18] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087488 (owner: 10Muehlenhoff)
[14:54:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1087487 (https://phabricator.wikimedia.org/T352144) (owner: 10Slyngshede)
[14:55:09] <jnuche>	 hashar: there's another new issue which is probably another blocker, but it affects only one wiki for now so I'm not rolling back
[14:55:14] <jnuche>	 you can update the CI jenkins :)
[14:55:46] <hashar>	 great!
[14:59:56] <wikibugs>	 (03PS1) 10FNegri: Update .pint.hcl syntax for pint >=0.64 [alerts] - 10https://gerrit.wikimedia.org/r/1087490
[15:01:08] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Update .pint.hcl syntax for pint >=0.64 [alerts] - 10https://gerrit.wikimedia.org/r/1087490 (owner: 10FNegri)
[15:01:08] <hashar>	 !log Upgrading CI Jenkins | T379059
[15:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:12] <stashbot>	 T379059: Upgrade Jenkins instances to  2.479.1 - https://phabricator.wikimedia.org/T379059
[15:01:58] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] START helmfile.d/services/chart-renderer: apply
[15:02:10] <hashar>	 oh nice
[15:02:19] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
[15:02:23] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
[15:02:36] <hashar>	 the agents can't connect over ssh
[15:02:37] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:02:56] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
[15:05:12] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] P:idm Assign permission to TarLogic. [puppet] - 10https://gerrit.wikimedia.org/r/1087487 (https://phabricator.wikimedia.org/T352144) (owner: 10Slyngshede)
[15:06:07] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json
[15:06:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Drive host network config from Netbox, and move away from ifupdown - https://phabricator.wikimedia.org/T347411#10292867 (10cmooney) I spoke with @ayounsi on irc about this approach and he expressed some concerns with the use of a cookbook to manage the host's network config...
[15:12:25] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove spark2 profile [puppet] - 10https://gerrit.wikimedia.org/r/1087187
[15:12:37] <hashar>	 I gotta rollback
[15:12:50] <hashar>	 short of installing Java 17 on the agents :D
[15:13:47] <hashar>	 what I don't get is that I was sure I had all the agents upgraded bah
[15:14:30] <moritzm>	 better now than when there's the next Jenkins security release (and then we can only move to the current LTS)
[15:14:36] <wikibugs>	 (03PS5) 10Elukey: Create new lvs service kartotherian-k8s-ssl [puppet] - 10https://gerrit.wikimedia.org/r/1087422 (https://phabricator.wikimedia.org/T378944)
[15:14:37] <wikibugs>	 (03PS5) 10Elukey: profile::trafficserver::backend: move kartotherian to port 6543 [puppet] - 10https://gerrit.wikimedia.org/r/1087423 (https://phabricator.wikimedia.org/T378944)
[15:15:19] <hashar>	 and of course I can't log in to Horizon
[15:15:19] <hashar>	 bah
[15:15:21] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
[15:16:18] <hashar>	 https://openstack.eqiad1.wikimediacloud.org:25000/protected goes in a loop
[15:16:28] <hashar>	 ah I got in eventually
[15:18:09] <hashar>	 !log Switched WMCS integration instances from Java 11 to Java 17 via Horizon project wide config. That was forgotten in T359795 and blocks today's Jenkins upgrade ( T379059 )
[15:18:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:20] <stashbot>	 T359795: Switch Jenkins instances from Java 11 to Java 17 - https://phabricator.wikimedia.org/T359795
[15:18:20] <stashbot>	 T379059: Upgrade Jenkins instances to  2.479.1 - https://phabricator.wikimedia.org/T379059
[15:18:51] <hashar>	 https://integration.wikimedia.org/ci/computer/integration%2Dagent%2Ddocker%2D1044/log
[15:18:53] <hashar>	 \o/
[15:19:12] <hashar>	 moritzm: yup, thank you for the positive mood!
[15:19:56] <hashar>	 I don't know why I forgot the WMCS instances
[15:20:35] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
[15:20:42] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] varnish: Move wm_recv_early subroutine to inline [puppet] - 10https://gerrit.wikimedia.org/r/1083913 (https://phabricator.wikimedia.org/T370200) (owner: 10BCornwall)
[15:21:06] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] varnish: Move wm_recv_purge subroutine to inline [puppet] - 10https://gerrit.wikimedia.org/r/1083914 (https://phabricator.wikimedia.org/T370200) (owner: 10BCornwall)
[15:21:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json
[15:21:19] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
[15:21:32] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
[15:21:39] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json
[15:27:54] <hashar>	 !log Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # T359795
[15:27:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:57] <stashbot>	 T359795: Switch Jenkins instances from Java 11 to Java 17 - https://phabricator.wikimedia.org/T359795
[15:28:20] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json
[15:29:20] <wikibugs>	 (03CR) 10BCornwall: [C:03+2] varnish: Move wm_recv_purge subroutine to inline [puppet] - 10https://gerrit.wikimedia.org/r/1083914 (https://phabricator.wikimedia.org/T370200) (owner: 10BCornwall)
[15:29:20] <wikibugs>	 (03CR) 10BCornwall: [C:03+2] varnish: Move wm_recv_early subroutine to inline [puppet] - 10https://gerrit.wikimedia.org/r/1083913 (https://phabricator.wikimedia.org/T370200) (owner: 10BCornwall)
[15:29:47] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Platform-SRE: Request Kerberos identity for jsn.sherman - https://phabricator.wikimedia.org/T378786#10292979 (10jsn.sherman) >>! In T378786#10291834, @MatthewVernon wrote: > Hi @jsn.sherman this is all done for you now. Thanks @MatthewVernon! I w...
[15:32:36] <hashar>	 !log Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker  # T359795
[15:32:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:38] <wikibugs>	 (03PS1) 10Arnaudb: Revert "mariadb: wipe /srv on pc1017" [puppet] - 10https://gerrit.wikimedia.org/r/1087499
[15:37:05] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
[15:37:28] <wikibugs>	 (03PS1) 10Brouberol: global_config: register the Yarn resourcemanager IPC port in the hadoop service [puppet] - 10https://gerrit.wikimedia.org/r/1087500 (https://phabricator.wikimedia.org/T377602)
[15:38:05] <wikibugs>	 (03CR) 10CI reject: [V:04-1] global_config: register the Yarn resourcemanager IPC port in the hadoop service [puppet] - 10https://gerrit.wikimedia.org/r/1087500 (https://phabricator.wikimedia.org/T377602) (owner: 10Brouberol)
[15:40:01] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
[15:41:18] <Lucas_WMDE>	 jouncebot: nowandnext
[15:41:19] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 18 minute(s)
[15:41:19] <jouncebot>	 In 0 hour(s) and 18 minute(s): SRE Collaboration Services office hours (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1600)
[15:41:40] <Lucas_WMDE>	 (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ConfirmEdit/+/1087494 will probably want backporting soonish, though it should go through gate-and-submit on master first)
[15:41:56] <hashar>	 I have successfully upgraded the CI Jenkins
[15:42:28] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087187 (owner: 10Muehlenhoff)
[15:42:40] <Lucas_WMDE>	 ah, I thought things looked slightly different :)
[15:42:47] <Lucas_WMDE>	 yay for upgrades!
[15:42:55] <hashar>	 jnuche: I have upgraded the CI Jenkins successfully!  The releases one can wait until next week, there is no security update in that LTS version
[15:43:14] <hashar>	 the changelog at https://www.jenkins.io/changelog-stable/  lists a bunch of UI improvements
[15:43:19] <claime>	 Lucas_WMDE: Can I merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1083146 first, so I can use your backport to see if I didn't break anything existing?
[15:43:25] <hashar>	  Enhancements and refinements for the appearance of several pages in Jenkins. pull 9521, pull 9707, pull 9461, pull 9411, pull 9393, pull 9381
[15:43:25] <hashar>	 Refinements and modernizations to sections of the Jenkins UI. pull 9453, pull 9380, pull 9365, pull 9395, pull 9641 
[15:43:26] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json
[15:43:44] <Lucas_WMDE>	 claime: as far as I’m concerned, sure
[15:43:50] <wikibugs>	 (03CR) 10Scott French: [C:03+1] Provide conftool data for mwcron and mwscript-k8s [puppet] - 10https://gerrit.wikimedia.org/r/1083146 (https://phabricator.wikimedia.org/T341555) (owner: 10Clément Goubert)
[15:43:50] <Lucas_WMDE>	 not sure it’s “my” backport to begin with ^^
[15:43:56] <Lucas_WMDE>	 we’ll see
[15:44:08] <claime>	 s/your/that/ :p
[15:44:26] <Lucas_WMDE>	 :P
[15:44:32] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+2] Provide conftool data for mwcron and mwscript-k8s [puppet] - 10https://gerrit.wikimedia.org/r/1083146 (https://phabricator.wikimedia.org/T341555) (owner: 10Clément Goubert)
[15:45:53] <wikibugs>	 (03PS2) 10Brouberol: global_config: register the Yarn resourcemanager IPC port in the hadoop service [puppet] - 10https://gerrit.wikimedia.org/r/1087500 (https://phabricator.wikimedia.org/T377602)
[15:46:14] <jnuche>	 hashar: ack!
[15:47:32] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
[15:48:09] <moritzm>	 !log remove ganeti1013 from active ganeti nodes T378921
[15:48:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:24] <stashbot>	 T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921
[15:50:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B
[15:50:34] <icinga-wm>	 PROBLEM - ganeti-confd running on ganeti1013 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 112 (gnt-confd), command name ganeti-confd https://wikitech.wikimedia.org/wiki/Ganeti
[15:51:00] <icinga-wm>	 PROBLEM - ganeti-noded running on ganeti1013 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), command name ganeti-noded https://wikitech.wikimedia.org/wiki/Ganeti
[15:51:29] <wikibugs>	 (03CR) 10Scott French: [C:03+2] trafficserver: Lua script for routing 8.1-enrolled traffic [puppet] - 10https://gerrit.wikimedia.org/r/1072821 (https://phabricator.wikimedia.org/T377042) (owner: 10Scott French)
[15:51:33] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B
[15:51:57] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B
[15:51:57] <wikibugs>	 (03CR) 10Btullis: [C:03+1] global_config: register the Yarn resourcemanager IPC port in the hadoop service [puppet] - 10https://gerrit.wikimedia.org/r/1087500 (https://phabricator.wikimedia.org/T377602) (owner: 10Brouberol)
[15:52:37] <jinxer-wm>	 FIRING: ProbeDown: Service ganeti1013:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:53:11] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B
[15:53:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[15:53:46] <wikibugs>	 (03CR) 10Eevans: [C:03+2] aqs1013 replaced by aqs1022 (hardware refresh) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087240 (https://phabricator.wikimedia.org/T379026) (owner: 10Eevans)
[15:54:16] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[15:54:21] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet
[15:54:33] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
[15:54:48] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10293116 (10ops-monitoring-bot) Draining ganeti1014.eqiad.wmnet of running VMs
[15:55:24] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10293120 (10MoritzMuehlenhoff)
[15:56:38] <wikibugs>	 06SRE, 10Ganeti, 06Infrastructure-Foundations: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 - https://phabricator.wikimedia.org/T378921#10293134 (10MoritzMuehlenhoff)
[15:56:38] <wikibugs>	 (03PS1) 10JHathaway: ms-be: partman EFI recipe [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400)
[15:58:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[15:58:33] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json
[15:58:50] <claime>	 Lucas_WMDE: Done, should be all good
[15:59:12] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Fixup paths to moved resources [extensions/ConfirmEdit] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087507 (https://phabricator.wikimedia.org/T379080)
[15:59:19] <Lucas_WMDE>	 uploaded the backport ^
[15:59:33] <Lucas_WMDE>	 Reedy: do you want to deploy or should I?
[16:00:05] <jouncebot>	 eoghan, jelto, arnoldokoth, and mutante: OwO what's this, a deployment window?? SRE Collaboration Services office hours. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1600). nyaa~
[16:00:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
[16:00:10] <Reedy>	 feel free if you're bored :P
[16:00:15] <Lucas_WMDE>	 ^^
[16:00:16] <Reedy>	 But I can do it if you'd rather
[16:00:45] <Reedy>	 I think I should file a bug about that... It feels like something that should be tested by CI
[16:00:57] <Lucas_WMDE>	 I don’t mind either way
[16:01:09] <Lucas_WMDE>	 (T290932 is a more general “make remoteExtPath better” task, I suppose)
[16:01:10] <stashbot>	 T290932: Figure out remoteExtPath/remoteBasePath automatically for the common case - https://phabricator.wikimedia.org/T290932
[16:01:11] <wikibugs>	 (03CR) 10Elukey: "Left a nit, LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[16:01:18] <wikibugs>	 (03CR) 10Elukey: [C:03+1] ms-be: partman EFI recipe [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[16:01:56] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
[16:02:55] <wikibugs>	 (03PS2) 10JHathaway: ms-be: partman EFI recipe [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400)
[16:02:57] <Lucas_WMDE>	 eoghan, jelto, arnoldokoth, mutante: there’s a train blocker fix (T379080) that would be nice to backport soon, is it okay to do it during your window?
[16:02:58] <stashbot>	 T379080: RuntimeException: package file not found or not a file: "/srv/mediawiki/php-1.44.0-wmf.2/extensions/ConfirmEdit/FancyCaptcha/resources/ext.confirmEdit.fancyCaptcha.js" - https://phabricator.wikimedia.org/T379080
[16:03:05] <wikibugs>	 (03CR) 10JHathaway: ms-be: partman EFI recipe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[16:03:35] <wikibugs>	 (03CR) 10Elukey: "More info: https://wikitech.wikimedia.org/wiki/UEFI_Boot" [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[16:04:15] <wikibugs>	 (03CR) 10JHathaway: ms-be: partman EFI recipe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[16:05:32] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Fixup paths to moved resources [extensions/ConfirmEdit] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087507 (https://phabricator.wikimedia.org/T379080) (owner: 10Lucas Werkmeister (WMDE))
[16:05:51] <Amir1>	 Lucas_WMDE: go for it!
[16:06:23] <Lucas_WMDE>	 ok ^^
[16:06:28] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087507 (https://phabricator.wikimedia.org/T379080) (owner: 10Lucas Werkmeister (WMDE))
[16:06:39] <Lucas_WMDE>	 if anyone thinks I shouldn’t deploy, you still have some time while gate-and-submit runs
[16:07:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[16:07:30] <jnuche>	 all good from the conductor side (and thank you!)
[16:07:46] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove spark2 profile [puppet] - 10https://gerrit.wikimedia.org/r/1087187 (owner: 10Muehlenhoff)
[16:09:00] * Lucas_WMDE checks if the issue is reproducible
[16:09:29] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] global_config: register the Yarn resourcemanager IPC port in the hadoop service [puppet] - 10https://gerrit.wikimedia.org/r/1087500 (https://phabricator.wikimedia.org/T377602) (owner: 10Brouberol)
[16:09:46] <Reedy>	 In theory, some missing JS/less/css... whether it's noticeable from the UI.. :P
[16:10:11] <Lucas_WMDE>	 https://www.mediawiki.org/w/load.php?lang=en&modules=ext.confirmEdit.fancyCaptcha is good enough for me, that shows the error ^^
[16:10:15] <wikibugs>	 (03CR) 10MVernon: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[16:12:34] <Lucas_WMDE>	 (https://www.mediawiki.org/wiki/Special:AbuseFilter/?rulescope=global&deletedfilters=hide&limit=500 looks like all the showcaptcha abuse filters are private so I wouldn’t know how to trigger them anyway)
[16:12:45] <Lucas_WMDE>	 (and I’m happy to stay ignorant there)
[16:13:40] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json
[16:14:23] <wikibugs>	 (03PS1) 10Muehlenhoff: dns::auth::update: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1087508
[16:14:35] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
[16:14:48] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
[16:14:55] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json
[16:17:25] <wikibugs>	 (03PS1) 10Dzahn: admin: add a yubikey SSH key to user dzahn [puppet] - 10https://gerrit.wikimedia.org/r/1087509
[16:18:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service ganeti1013:1811 has failed probes (tcp_ganeti_noded_ip4) - https://wikitech.wikimedia.org/wiki/Ganeti - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:20:49] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json
[16:22:47] <wikibugs>	 (03PS1) 10CDanis: Add service chart-renderer to k8s ingress as a/a [dns] - 10https://gerrit.wikimedia.org/r/1087511 (https://phabricator.wikimedia.org/T372081)
[16:27:07] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] Add service chart-renderer to k8s ingress as a/a [dns] - 10https://gerrit.wikimedia.org/r/1087511 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[16:28:23] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] Add service chart-renderer to k8s ingress as a/a [dns] - 10https://gerrit.wikimedia.org/r/1087511 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[16:28:39] <wikibugs>	 (03Merged) 10jenkins-bot: Fixup paths to moved resources [extensions/ConfirmEdit] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087507 (https://phabricator.wikimedia.org/T379080) (owner: 10Lucas Werkmeister (WMDE))
[16:29:13] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1087507|Fixup paths to moved resources (T379080)]]
[16:29:26] <wikibugs>	 (03CR) 10CDanis: [C:03+2] Add service chart-renderer to k8s ingress as a/a [dns] - 10https://gerrit.wikimedia.org/r/1087511 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[16:29:26] <stashbot>	 T379080: RuntimeException: package file not found or not a file: "/srv/mediawiki/php-1.44.0-wmf.2/extensions/ConfirmEdit/FancyCaptcha/resources/ext.confirmEdit.fancyCaptcha.js" - https://phabricator.wikimedia.org/T379080
[16:32:04] <logmsgbot>	 !log cdanis@cumin1002 START - Cookbook sre.dns.netbox
[16:32:13] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for [[gerrit:1087507|Fixup paths to moved resources (T379080)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[16:32:27] <Lucas_WMDE>	 https://www.mediawiki.org/w/load.php?lang=en&modules=ext.confirmEdit.fancyCaptcha works with mwdebug \o/
[16:32:29] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync
[16:34:20] <logmsgbot>	 !log cdanis@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:35:46] <wikibugs>	 (03PS1) 10Eevans: remove outdated (and incorrect) host entries [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514
[16:35:56] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json
[16:37:15] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087507|Fixup paths to moved resources (T379080)]] (duration: 08m 02s)
[16:37:18] <stashbot>	 T379080: RuntimeException: package file not found or not a file: "/srv/mediawiki/php-1.44.0-wmf.2/extensions/ConfirmEdit/FancyCaptcha/resources/ext.confirmEdit.fancyCaptcha.js" - https://phabricator.wikimedia.org/T379080
[16:38:21] * Lucas_WMDE done deploying
[16:38:45] <wikibugs>	 (03CR) 10Eevans: "I'm not completely certain of the rationale provided in the commit message, but I'm employing [Cunningham's Law](https://meta.wikimedia.or" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514 (owner: 10Eevans)
[16:39:20] <wikibugs>	 (03PS1) 10CDanis: service catalog: add chart-renderer in service_setup [puppet] - 10https://gerrit.wikimedia.org/r/1087515 (https://phabricator.wikimedia.org/T372081)
[16:40:54] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] service catalog: add chart-renderer in service_setup [puppet] - 10https://gerrit.wikimedia.org/r/1087515 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[16:43:00] <wikibugs>	 (03PS1) 10Vgutierrez: prometheus::ops: Scrape liberica endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1087517 (https://phabricator.wikimedia.org/T377127)
[16:43:45] <wikibugs>	 (03CR) 10Hnowlan: "I kinda suspect the config rendered by an empty array here will just break the service when used in a test setting. The rationale for this" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514 (owner: 10Eevans)
[16:46:08] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087517 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[16:50:54] <wikibugs>	 (03CR) 10Scott French: "Agreed, yeah - this is roughly what I suspected when this question came up on I5bae756dfa8ed3d0e22d2281d4c45e68a6236be7 (e.g., possibly us" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514 (owner: 10Eevans)
[16:51:03] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json
[16:51:39] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: openstack: designate: deploy and enable wmcs-nova-fixed-ptr [puppet] - 10https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192)
[16:54:55] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] prometheus::ops: Scrape liberica endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1087517 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[16:55:18] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] prometheus::ops: Scrape liberica endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1087517 (https://phabricator.wikimedia.org/T377127) (owner: 10Vgutierrez)
[16:57:46] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: openstack: designate: deploy and enable wmcs-nova-fixed-ptr [puppet] - 10https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192)
[16:58:05] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1087521 (https://phabricator.wikimedia.org/T378192) (owner: 10Arturo Borrero Gonzalez)
[16:58:57] <wikibugs>	 06SRE, 06serviceops, 05MediaWiki-backport-deployments, 05Train Deployments: MW script "eval.php" failing during scap operations - https://phabricator.wikimedia.org/T379044#10293477 (10thcipriani) >>! In T379044#10291795, @Joe wrote: > and something in eval.php tries to log the call. I guess it's something...
[16:59:17] <wikibugs>	 (03CR) 10CDanis: [C:03+2] service catalog: add chart-renderer in service_setup [puppet] - 10https://gerrit.wikimedia.org/r/1087515 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:00:05] <jouncebot>	 jhathaway and rzl: #bothumor I � Unicode. All rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:03:23] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10293524 (10Jhancock.wm)
[17:06:10] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json
[17:06:15] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
[17:06:29] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
[17:06:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json
[17:11:15] <wikibugs>	 07Puppet, 10MW-on-K8s, 10Observability-Alerting, 10SRE Observability (FY2024/2025-Q2): Clean up "git repo needs merge" checks - https://phabricator.wikimedia.org/T370530#10293583 (10lmata)
[17:11:21] <wikibugs>	 10SRE-swift-storage, 10Observability-Alerting, 10SRE Observability (FY2024/2025-Q2): Remove load_average check for ms-be/thanos-be - https://phabricator.wikimedia.org/T370526#10293584 (10lmata)
[17:12:16] <wikibugs>	 07sre-alert-triage, 10SRE Observability (FY2024/2025-Q2): Alert in need of triage: AlertLintProblem (instance localhost:9123) - https://phabricator.wikimedia.org/T354255#10293597 (10lmata)
[17:13:30] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json
[17:13:51] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10293609 (10Jhancock.wm)
[17:15:12] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10293622 (10Jhancock.wm) @bking  we got these in. Please update the site.pp file. We should be able to get these to you by End of...
[17:16:03] <wikibugs>	 06SRE, 06cloud-services-team, 10Cloud-VPS, 10observability, and 3 others: ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710#10293600 (10lmata)
[17:16:41] <wikibugs>	 10ops-codfw, 06SRE, 06Data-Persistence, 06Data-Persistence-SRE, and 3 others: Q2:rack/setup/install es204[1-6] - https://phabricator.wikimedia.org/T378146#10293606 (10Jhancock.wm) @Marostegui this should make it diverse. lmk if you want something different. es2043 > C7 es2045 > A7 es2046 > B7  @ABran-WMF w...
[17:26:23] <wikibugs>	 (03CR) 10David Caro: [C:03+1] "LGTM" [alerts] - 10https://gerrit.wikimedia.org/r/1084782 (https://phabricator.wikimedia.org/T375479) (owner: 10FNegri)
[17:28:37] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json
[17:28:52] <wikibugs>	 (03PS1) 10CDanis: chart-renderer: enable ingress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087527 (https://phabricator.wikimedia.org/T372081)
[17:29:11] <wikibugs>	 (03PS1) 10Hnowlan: rest-gateway: order mw-api-int paths strictly [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087528 (https://phabricator.wikimedia.org/T379097)
[17:30:18] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] chart-renderer: enable ingress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087527 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:31:01] <wikibugs>	 (03CR) 10Alexandros Kosiaris: "thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087528 (https://phabricator.wikimedia.org/T379097) (owner: 10Hnowlan)
[17:31:02] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] rest-gateway: order mw-api-int paths strictly [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087528 (https://phabricator.wikimedia.org/T379097) (owner: 10Hnowlan)
[17:31:02] <wikibugs>	 (03CR) 10CDanis: [C:03+2] chart-renderer: enable ingress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087527 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:32:02] <wikibugs>	 (03Merged) 10jenkins-bot: rest-gateway: order mw-api-int paths strictly [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087528 (https://phabricator.wikimedia.org/T379097) (owner: 10Hnowlan)
[17:32:27] <wikibugs>	 (03Merged) 10jenkins-bot: chart-renderer: enable ingress [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087527 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:32:30] <wikibugs>	 (03PS1) 10FNegri: alertmanager: simplify WMCS templates [puppet] - 10https://gerrit.wikimedia.org/r/1087531 (https://phabricator.wikimedia.org/T375479)
[17:32:43] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] START helmfile.d/services/chart-renderer: apply
[17:32:59] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
[17:33:10] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[17:33:20] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[17:34:36] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[17:34:51] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[17:35:10] <wikibugs>	 (03PS1) 10CDanis: fix vscode file extension mangling [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087532 (https://phabricator.wikimedia.org/T372081)
[17:36:03] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[17:36:04] <wikibugs>	 (03PS2) 10Eevans: replace list of cassandra hosts with faux values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514
[17:36:12] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[17:37:40] <wikibugs>	 (03CR) 10CDanis: [C:03+2] fix vscode file extension mangling [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087532 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:37:53] <wikibugs>	 (03CR) 10Eevans: "Thanks @hnowlan@wikimedia.org; Thanks @swfrench@wikimedia.org!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514 (owner: 10Eevans)
[17:38:31] <wikibugs>	 (03Abandoned) 10JHathaway: ms-be: partman EFI recipe [puppet] - 10https://gerrit.wikimedia.org/r/1087505 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[17:38:49] <wikibugs>	 (03Merged) 10jenkins-bot: fix vscode file extension mangling [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087532 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:39:20] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] START helmfile.d/services/chart-renderer: apply
[17:39:29] <logmsgbot>	 !log cdanis@deploy2002 helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
[17:40:29] <icinga-wm>	 PROBLEM - mysqld processes on pc1017 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[17:40:29] <icinga-wm>	 PROBLEM - MariaDB Replica SQL: pc5 on pc1017 is CRITICAL: CRITICAL slave_sql_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:40:29] <icinga-wm>	 PROBLEM - MariaDB Replica IO: pc5 on pc1017 is CRITICAL: CRITICAL slave_io_state could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:40:29] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: pc5 on pc1017 is CRITICAL: CRITICAL slave_sql_lag could not connect https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[17:40:31] <icinga-wm>	 PROBLEM - MariaDB read only pc5 on pc1017 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[17:40:31] <icinga-wm>	 PROBLEM - MariaDB Event Scheduler pc5 on pc1017 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Event_Scheduler
[17:41:40] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
[17:41:56] <logmsgbot>	 !log cdanis@deploy2002 helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
[17:41:59] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] START helmfile.d/services/chart-renderer: apply
[17:42:16] <logmsgbot>	 !log cdanis@deploy2002 helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
[17:43:44] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json
[17:44:09] <hnowlan>	 uhhh is that pc1017 alert something to worry about? 
[17:44:18] <hnowlan>	 ah, just reimaged 
[17:51:01] <wikibugs>	 (03PS1) 10CDanis: chart-renderer: to production [puppet] - 10https://gerrit.wikimedia.org/r/1087536 (https://phabricator.wikimedia.org/T372081)
[17:51:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] chart-renderer: to production [puppet] - 10https://gerrit.wikimedia.org/r/1087536 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:51:14] <wikibugs>	 (03PS2) 10CDanis: chart-renderer: to production [puppet] - 10https://gerrit.wikimedia.org/r/1087536 (https://phabricator.wikimedia.org/T372081)
[17:55:00] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[17:55:41] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[17:55:50] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] chart-renderer: to production [puppet] - 10https://gerrit.wikimedia.org/r/1087536 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:55:59] <wikibugs>	 (03CR) 10CDanis: [C:03+2] chart-renderer: to production [puppet] - 10https://gerrit.wikimedia.org/r/1087536 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[17:57:35] <wikibugs>	 (03PS1) 10JHathaway: ms-be-simple: partman EFI recipe [puppet] - 10https://gerrit.wikimedia.org/r/1087538 (https://phabricator.wikimedia.org/T371400)
[17:57:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 100% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[17:58:51] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json
[17:59:53] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
[17:59:53] <wikibugs>	 (03PS1) 10Esanders: Deploy DiscussionTools visual enhancements to top 10 wikis (exc. enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087539 (https://phabricator.wikimedia.org/T379102)
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1800)
[18:00:07] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
[18:00:14] <logmsgbot>	 !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling es1021 (T376905)', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json
[18:02:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 92.12% on krb1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[18:04:38] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM!  Hopefully one less annoying manual thing." [puppet] - 10https://gerrit.wikimedia.org/r/1087412 (owner: 10Muehlenhoff)
[18:06:34] <wikibugs>	 (03CR) 10Scott French: [C:03+1] replace list of cassandra hosts with faux values [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087514 (owner: 10Eevans)
[18:06:38] <wikibugs>	 (03CR) 10Majavah: Add a helper script to setup the Ganeti LVM vg (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087412 (owner: 10Muehlenhoff)
[18:06:47] <wikibugs>	 (03CR) 10MVernon: [C:03+1] "LGTM, thanks, although I am a very long way from a partman expert!" [puppet] - 10https://gerrit.wikimedia.org/r/1087538 (https://phabricator.wikimedia.org/T371400) (owner: 10JHathaway)
[18:10:52] <Amir1>	 !log gradual delete of thumbs in fawiki local images in both dcs
[18:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:18:25] <wikibugs>	 (03PS3) 10Scott French: changeprop: add per-rule consumer properties in jobqueue [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087542 (https://phabricator.wikimedia.org/T356241)
[18:22:07] <icinga-wm>	 PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/f1609191c411ecee6097dae25efe86b565ec8bcc80fe892af9879c08eb96a590/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[18:23:29] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] "Nice, thanks!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087542 (https://phabricator.wikimedia.org/T356241) (owner: 10Scott French)
[18:24:31] <wikibugs>	 (03PS4) 10Anzx: cswiki: adding throttle rule for Editathon Czechoslovakia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087540 (https://phabricator.wikimedia.org/T379060)
[18:24:55] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 05 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087540 (https://phabricator.wikimedia.org/T379060) (owner: 10Anzx)
[18:28:25] <wikibugs>	 (03PS5) 10Anzx: cswiki: adding throttle rule for Editathon Czechoslovakia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087540 (https://phabricator.wikimedia.org/T379060)
[18:40:25] <wikibugs>	 (03PS4) 10Muehlenhoff: Add a helper script to setup the Ganeti LVM vg [puppet] - 10https://gerrit.wikimedia.org/r/1087412
[18:41:15] <wikibugs>	 (03CR) 10Muehlenhoff: Add a helper script to setup the Ganeti LVM vg (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1087412 (owner: 10Muehlenhoff)
[18:42:07] <icinga-wm>	 RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops
[18:43:05] <wikibugs>	 (03PS2) 10Dzahn: admin: add group approver for ldap-admins (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1085435 (https://phabricator.wikimedia.org/T276465)
[18:45:45] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[18:46:41] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 74428224 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:47:41] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 6898296 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:48:20] <wikibugs>	 (03PS2) 10Muehlenhoff: admin: add group approver for ldap-admins (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1085435 (https://phabricator.wikimedia.org/T276465) (owner: 10Dzahn)
[18:48:20] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1085435 (https://phabricator.wikimedia.org/T276465) (owner: 10Dzahn)
[19:00:05] <jouncebot>	 jnuche and dduvall: May I have your attention please! MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T1900)
[19:21:10] <wikibugs>	 (03PS2) 10Dzahn: admin: add group approver for ldap-admins (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1085435 (https://phabricator.wikimedia.org/T276465)
[19:21:19] <wikibugs>	 (03PS3) 10Dzahn: admin: add group approver for ldap-admins [puppet] - 10https://gerrit.wikimedia.org/r/1085435 (https://phabricator.wikimedia.org/T276465)
[19:21:35] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] admin: add group approver for ldap-admins [puppet] - 10https://gerrit.wikimedia.org/r/1085435 (https://phabricator.wikimedia.org/T276465) (owner: 10Dzahn)
[19:36:22] <wikibugs>	 (03PS1) 10Scott French: changeprop-jobqueue: update to 2024-11-05-170900-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087557 (https://phabricator.wikimedia.org/T356241)
[19:36:24] <wikibugs>	 (03PS1) 10Scott French: changeprop-jobqueue: set max poll interval and revert concurrency [deployment-charts] - 10https://gerrit.wikimedia.org/r/1087558 (https://phabricator.wikimedia.org/T356241)
[19:39:26] <wikibugs>	 (03PS1) 10Urbanecm: AbstractProvider: Normalize top level config correctly [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087560 (https://phabricator.wikimedia.org/T379094)
[19:39:36] <wikibugs>	 (03PS1) 10Urbanecm: AbstractProvider: Normalize top level config correctly [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087561 (https://phabricator.wikimedia.org/T379094)
[19:40:31] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10294144 (10cmooney) >>! In T377381#10274577, @Jclark-ctr wrote: > @cmooney  fyi i have 10x of the 100g green handled optics  J...
[19:52:33] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
[19:52:33] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
[19:56:36] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
[19:56:37] <logmsgbot>	 !log cmooney@cumin1002 END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
[19:57:50] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
[19:57:52] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[20:02:30] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002"
[20:02:58] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002"
[20:02:58] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:05:17] <wikibugs>	 (03PS1) 10CDanis: services_proxy: add chart-renderer & enable for MW [puppet] - 10https://gerrit.wikimedia.org/r/1087565 (https://phabricator.wikimedia.org/T372081)
[20:07:21] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@codfw to codfw) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[20:07:34] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet
[20:07:36] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.dns.netbox
[20:14:00] <logmsgbot>	 !log cmooney@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002"
[20:14:05] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002"
[20:14:05] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[20:48:39] <wikibugs>	 (03CR) 10Scott French: [C:03+1] services_proxy: add chart-renderer & enable for MW [puppet] - 10https://gerrit.wikimedia.org/r/1087565 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[20:53:22] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 05 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087560 (https://phabricator.wikimedia.org/T379094) (owner: 10Urbanecm)
[20:53:54] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 05 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087561 (https://phabricator.wikimedia.org/T379094) (owner: 10Urbanecm)
[20:54:37] <wikibugs>	 06SRE, 10SRE-tools, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485#10294334 (10cmooney) I used the cookbook to provision the two new frack switches in eqiad this evening.  Mostly it worked ok,...
[20:55:46] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] AbstractProvider: Normalize top level config correctly [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087560 (https://phabricator.wikimedia.org/T379094) (owner: 10Urbanecm)
[20:55:47] <wikibugs>	 (03CR) 10Urbanecm: [C:03+2] AbstractProvider: Normalize top level config correctly [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087561 (https://phabricator.wikimedia.org/T379094) (owner: 10Urbanecm)
[20:56:31] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[20:56:56] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
[20:58:19] <icinga-wm>	 PROBLEM - Host ganeti1041 is DOWN: PING CRITICAL - Packet loss = 100%
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T2100). Please do the needful.
[21:00:05] <jouncebot>	 anzx and urbanecm: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:25] <urbanecm>	 i can deploy today
[21:00:28] <urbanecm>	 anzx: hi!
[21:00:36] <logmsgbot>	 !log cmooney@cumin1002 END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet
[21:00:47] <icinga-wm>	 RECOVERY - Host ganeti1041 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[21:00:47] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by urbanecm@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087540 (https://phabricator.wikimedia.org/T379060) (owner: 10Anzx)
[21:00:55] <urbanecm>	 i'll go ahead given it is a throttle rule, and there's nothing to do anyway
[21:01:26] <logmsgbot>	 !log jclark@cumin1002 START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[21:01:37] <wikibugs>	 (03Merged) 10jenkins-bot: cswiki: adding throttle rule for Editathon Czechoslovakia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087540 (https://phabricator.wikimedia.org/T379060) (owner: 10Anzx)
[21:02:08] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1087540|cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]]
[21:02:19] <stashbot>	 T379060: Lift IP cap on 2024-11-06 for Editathon Czechoslovakia - cs.wikipedia - https://phabricator.wikimedia.org/T379060
[21:02:29] <wikibugs>	 (03PS1) 10Cathal Mooney: Configure cumin ssh to use network settings for fasw switches [puppet] - 10https://gerrit.wikimedia.org/r/1087572 (https://phabricator.wikimedia.org/T336485)
[21:03:09] <icinga-wm>	 PROBLEM - Host ganeti1041 is DOWN: PING CRITICAL - Packet loss = 100%
[21:05:47] <icinga-wm>	 RECOVERY - Host ganeti1041 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[21:06:46] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[21:06:48] <wikibugs>	 10ops-codfw, 06DC-Ops: PowerSupplyFailure - https://phabricator.wikimedia.org/T379116 (10phaultfinder) 03NEW
[21:11:41] <logmsgbot>	 !log jclark@cumin1002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
[21:13:52] <wikibugs>	 (03Merged) 10jenkins-bot: AbstractProvider: Normalize top level config correctly [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087560 (https://phabricator.wikimedia.org/T379094) (owner: 10Urbanecm)
[21:13:53] <wikibugs>	 (03Merged) 10jenkins-bot: AbstractProvider: Normalize top level config correctly [extensions/CommunityConfiguration] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087561 (https://phabricator.wikimedia.org/T379094) (owner: 10Urbanecm)
[21:14:10] <urbanecm>	 still in progress...
[21:17:28] <wikibugs>	 06SRE, 10Infrastructure Security, 06Infrastructure-Foundations: puppet admin module: Assign approvers to unix groups - https://phabricator.wikimedia.org/T276465#10294415 (10Dzahn)
[21:22:00] <wikibugs>	 (03PS1) 10Dzahn: admin: add group approvers for druid-admins, htmldumps-admin, udp2log-users [puppet] - 10https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465)
[21:25:42] <wikibugs>	 (03CR) 10Dzahn: "While doing this I was also thinking about adding a new type of group to this yaml. "group of approvers", so we could use "*data_platform_" [puppet] - 10https://gerrit.wikimedia.org/r/1087575 (https://phabricator.wikimedia.org/T276465) (owner: 10Dzahn)
[21:27:33] <wikibugs>	 (03CR) 10CDanis: [C:03+2] services_proxy: add chart-renderer & enable for MW [puppet] - 10https://gerrit.wikimedia.org/r/1087565 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis)
[21:28:31] <urbanecm>	 still deploying the throttle patches...
[21:33:10] <cdanis>	 jouncebot: nowandnext
[21:33:10] <jouncebot>	 For the next 0 hour(s) and 26 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241105T2100)
[21:33:10] <jouncebot>	 In 9 hour(s) and 26 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241106T0700)
[21:33:26] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087540|cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)]] (duration: 31m 18s)
[21:33:29] <stashbot>	 T379060: Lift IP cap on 2024-11-06 for Editathon Czechoslovakia - cs.wikipedia - https://phabricator.wikimedia.org/T379060
[21:33:33] <urbanecm>	 finally
[21:33:36] <urbanecm>	 cdanis: still deploying
[21:33:44] <urbanecm>	 31 minutes for a simple config change was unexpected
[21:33:47] <cdanis>	 :|
[21:33:51] <cdanis>	 what step took so long?
[21:34:23] <urbanecm>	 all of them, kind of
[21:34:31] <logmsgbot>	 !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1087560|AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561|AbstractProvider: Normalize top level config correctly (T379094)]]
[21:34:33] <stashbot>	 T379094: TypeError: Argument 1 passed to MediaWiki\Extension\CommunityConfiguration\Provider\AbstractProvider::normalizeTopLevelConfigData() must be an instance of stdClass, array given, called in /srv/mediawiki/php-1.44.0-wmf.1/extensi - https://phabricator.wikimedia.org/T379094
[21:35:19] <urbanecm>	 longest two on scap record: build-and-push-container-images (duration: 11m 57s) , sync-testservers-k8s (duration: 05m 02s)
[21:35:23] <urbanecm>	 but that doesn't add up to 31 minutes...
[21:35:29] <cdanis>	 yeah...
[21:36:45] <urbanecm>	 complete traceback https://www.irccloud.com/pastebin/MaBCwDHA/
[21:36:47] <urbanecm>	 cdanis: ^^
[21:39:13] <dduvall>	 urbanecm: can you provide the contents of /home/urbanecm/scap-image-build-and-push-log as well? i just merged some changes into the release project that performs php8.1 builds and it's possible that your backport kicked off the first of those builds
[21:39:51] <urbanecm>	 dduvall: not unless that file gets archived somewhere; i started another scap by now
[21:40:02] <swfrench-wmf>	 dduvall: yeah, I was just about to comment - looks like this pulled in the release scripts changes that were merged
[21:41:09] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Discovery-Search, 10Data-Platform-SRE (2024.10.19 - 2024.11.08): Q2:rack/setup/install wdqs202[67] - https://phabricator.wikimedia.org/T378031#10294508 (10bking) @Jhancock.wm I think the hosts are [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1084253/1/manifests/...
[21:41:39] <dduvall>	 urbanecm: ok. no problem. it's most likely the full image build that took a long time, but we'll definitely investigate if it persists
[21:41:49] <dduvall>	 sorry, i should have mentioned that here prior to the window
[21:42:07] <urbanecm>	 ack. let's see how this one goes :)
[21:42:18] <dduvall>	 :)
[21:42:34] <cdanis>	 dduvall: should we start auto-archiving those logs? maybe when a build-and-push step takes longer than N minutes 
[21:43:12] <dduvall>	 i believe the plan is to integrate build-images.py into scap soon. at that point the logs will be unified
[21:43:16] <cdanis>	 ack
[21:43:24] <cdanis>	 ... docker buildkit can emit opentelemetry?
[21:43:52] <urbanecm>	 or maybe do an empty backport whenever a change is merged, so that it doesn't impact random deployers?
[21:44:08] <urbanecm>	 it'd be super inconvenient were i to do a rollback that i need to get out ASAP
[21:44:12] <dduvall>	 cdanis: buildkitd can
[21:44:41] <cdanis>	 we could hook that up to jaeger if we wanted to
[21:44:56] <dduvall>	 interesting...
[21:45:09] <cdanis>	 urbanecm: yeah I think +1 on that from me, ideally at least build, push, and maybe pre-pull the images in prod
[21:45:48] <dduvall>	 urbanecm: the incremental image builds are a lot faster. it's just that your backport happened to kick off the very first php8.1 based image which is a full rebuild
[21:45:52] <dduvall>	 _was_ a full rebuild
[21:46:05] <dduvall>	 from here on the incremental optimizations _should_ work correctly
[21:46:17] <urbanecm>	 still can hit a patch that needs to go out ASAP though :)
[21:46:23] <dduvall>	 yeah very true
[21:47:10] <logmsgbot>	 !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1087560|AbstractProvider: Normalize top level config correctly (T379094)]], [[gerrit:1087561|AbstractProvider: Normalize top level config correctly (T379094)]] (duration: 12m 39s)
[21:47:16] <dduvall>	 there's a task open that tracks an idea for preemptively building images based on current backports
[21:47:21] <stashbot>	 T379094: TypeError: Argument 1 passed to MediaWiki\Extension\CommunityConfiguration\Provider\AbstractProvider::normalizeTopLevelConfigData() must be an instance of stdClass, array given, called in /srv/mediawiki/php-1.44.0-wmf.1/extensi - https://phabricator.wikimedia.org/T379094
[21:47:21] * dduvall finds
[21:47:24] <urbanecm>	 well, this one was a lot quicker
[21:47:29] <dduvall>	 oh good!
[21:47:33] <urbanecm>	 dduvall: let me know if you want me to capture this idea somewhere
[21:48:52] <dduvall>	 urbanecm: take a look at https://phabricator.wikimedia.org/T378428 and see if it captures some of the idea, and please add your idea/thoughts
[21:49:03] <wikibugs>	 06SRE, 10LDAP-Access-Requests: Grant Access to ldap/nda for Deepesha Burse WMDE - https://phabricator.wikimedia.org/T378182#10294542 (10KFrancis) Hi all, confirming the NDA has been signed.  Thanks!
[21:49:32] <swfrench-wmf>	 there might have also been some amount of churn for the existing 7.4-based images as well, going by how long the various updates took (e.g., sync-testservers-k8s took 5m)
[21:49:43] <urbanecm>	 ty dduvall 
[21:49:48] <swfrench-wmf>	 just guessing, as those delays are usually the image pull
[21:49:51] <dduvall>	 np
[21:49:53] <dduvall>	 swfrench-wmf: ah yeah
[21:50:07] <urbanecm>	 12 mins is still longer than 7.5min earlier today
[21:50:37] <urbanecm>	 but at least not as significant
[21:50:50] <dduvall>	 urbanecm: do you have the image build log of this one still?
[21:50:54] <dduvall>	 i would love to have a look
[21:50:57] <urbanecm>	 yep
[21:51:06] <urbanecm>	 dduvall: is it ok to send them in public?
[21:51:33] <dduvall>	 the image builds are now done in parallel. i was hoping it would be a net optimization, but there is a whole additional mw image build in there too
[21:51:42] <dduvall>	 urbanecm: good question. i'm not 100% on that
[21:51:51] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294547 (10Jhancock.wm)
[21:51:57] <cdanis>	 I can copy them between homedirs on deploy200x if needed
[21:52:09] <dduvall>	 cdanis: urbanecm that works for me
[21:53:05] <dduvall>	 swfrench-wmf: optimizations aside, i'm glad it's working :D
[21:53:21] <urbanecm>	 cdanis: the file's 664 anyway, pretty much anyone can do anything to it
[21:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-memcached-exporter.service on idp1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:53:36] <swfrench-wmf>	 dduvall: yeah, I was just about to say - hooray and thank you :)
[21:53:58] <dduvall>	 urbanecm: copied it. ty!
[21:54:01] <swfrench-wmf>	 we now indeed have 8.1 images published
[21:54:10] <wikibugs>	 10ops-eqiad, 06SRE, 06Data-Persistence-SRE, 06DBA, 06DC-Ops: db1246 crashed, doesn't reboot cleanly - https://phabricator.wikimedia.org/T374215#10294549 (10Jclark-ctr) This is still under warranty @VRiley-WMF  If you want to reopen the ticket with Dell    ` 2024-11-05T21:25:01.773142+00:00 db1246 rsyslog...
[21:54:15] <dduvall>	 swfrench-wmf: yw and thanks for the reviews
[21:54:34] <dduvall>	 (and the scap changes :D)
[21:55:16] <urbanecm>	 dduvall: https://phabricator.wikimedia.org/P70951 if you want a more permanent link (it also has the scap terminal log in case you need it too)
[21:55:24] <urbanecm>	 posted as a NDA paste, should be good enough
[21:55:48] <dduvall>	 urbanecm: thanks!
[21:55:58] <dduvall>	 seems like the image builds were pretty fast this time actually
[21:56:46] <dduvall>	 urbanecm: do you still have the scap output as well? i'm curious to see how the image build log timestamps and that task timing compare
[21:57:01] <urbanecm>	 dduvall: that should be on the paste?
[21:57:06] <dduvall>	 *the image build task* timings compare i mean
[21:57:19] <dduvall>	 urbanecm: ah, yes indeed
[21:57:37] <dduvall>	 sorry, looked at the one on disk first
[21:57:44] <wikibugs>	 (03PS1) 10Tchanders: temp accounts: Enable IP reveal rights for local groups on meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087584 (https://phabricator.wikimedia.org/T356294)
[21:58:09] <urbanecm>	 i don't have timing from the short one, cleared my terminal logs in the last...9 hours or so
[21:58:30] <dduvall>	 np
[21:59:07] <dduvall>	 would be great if scap finished with a summary of the top 5 longest timed tasks or similar
[22:00:21] <urbanecm>	 yep. or logged everything to /var/log/scap or something
[22:09:20] <cdanis>	 scap does log to logstash
[22:10:04] <cdanis>	 https://logstash.wikimedia.org/goto/1f3d40b27eae2e66cc9e73c8386af9d2 the dashboard is challenging, even by kibana standards
[22:10:07] <cdanis>	 but the data is there
[22:10:42] <cdanis>	 anyway I have to go afk now but sometime soon dduvall let's explore exporting whatever otel trace data we can from buildx/buildkit 
[22:12:43] <urbanecm>	 ooo, cool! til
[22:25:42] <wikibugs>	 (03CR) 10Peter Fischer: [C:03+1] WIP: Migrate package to opensearch [software/opensearch/plugins] - 10https://gerrit.wikimedia.org/r/1080749 (https://phabricator.wikimedia.org/T372769) (owner: 10Ebernhardson)
[22:26:15] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.dns.netbox
[22:26:24] <dduvall>	 cdanis: sounds good!
[22:29:44] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132
[22:29:46] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002"
[22:29:55] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002"
[22:29:56] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[22:30:08] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130
[22:30:10] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131
[22:30:11] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132
[22:30:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133
[22:30:14] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134
[22:30:15] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135
[22:30:21] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130
[22:30:22] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131
[22:30:25] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132
[22:30:30] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133
[22:30:31] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135
[22:30:33] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134
[22:31:30] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:31:32] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:31:35] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:31:36] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:31:38] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:31:40] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:42:20] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:42:24] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:42:33] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:42:42] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:42:45] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:42:50] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
[22:52:13] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130']
[22:52:17] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131']
[22:52:21] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132']
[22:52:26] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133']
[22:52:31] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134']
[22:52:38] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135']
[22:52:57] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130']
[22:53:00] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131']
[22:53:04] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132']
[22:53:07] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133']
[22:53:11] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134']
[22:53:14] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135']
[22:54:06] <wikibugs>	 06SRE, 06collaboration-services, 10Wikimedia-Mailing-lists: Message content lost when mailing list is the only recipient - https://phabricator.wikimedia.org/T377045#10294725 (10eoghan) >>! In T377045#10287153, @Platonides wrote: > The bug for multiple mailing lists was fixed several years ago: https://gitlab...
[22:58:13] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
[22:58:14] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
[22:58:16] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm
[22:58:18] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm
[22:58:23] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294729 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2130.codfw.wmnet with OS bookworm
[22:58:24] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294730 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2131.codfw.wmnet with OS bookworm
[22:58:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294731 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2132.codfw.wmnet with OS bookworm
[22:58:27] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294732 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2133.codfw.wmnet with OS bookworm
[23:00:13] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm
[23:00:15] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm
[23:01:50] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294736 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2134.codfw.wmnet with OS bookworm
[23:01:51] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294737 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2135.codfw.wmnet with OS bookworm
[23:04:55] <wikibugs>	 (03PS1) 10Arlolra: Revert "Revert^2 "ats: Route rest_v1/page/(html|title) to rest-gateway"" [puppet] - 10https://gerrit.wikimedia.org/r/1087591 (https://phabricator.wikimedia.org/T374683)
[23:09:27] <wikibugs>	 (03CR) 10Subramanya Sastry: [C:03+1] Revert "Revert^2 "ats: Route rest_v1/page/(html|title) to rest-gateway"" [puppet] - 10https://gerrit.wikimedia.org/r/1087591 (https://phabricator.wikimedia.org/T374683) (owner: 10Arlolra)
[23:10:11] <wikibugs>	 (03PS1) 10Ebernhardson: TextPassDumper: refresh content address on failure [core] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087592 (https://phabricator.wikimedia.org/T377594)
[23:10:34] <wikibugs>	 (03PS1) 10Ebernhardson: TextPassDumper: refresh content address on failure [core] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087593 (https://phabricator.wikimedia.org/T377594)
[23:11:31] <icinga-wm>	 PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:16:41] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
[23:16:42] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
[23:16:58] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
[23:17:24] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
[23:17:51] <icinga-wm>	 PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[23:17:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ebernhardson@deploy2002 using scap backport" [core] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087592 (https://phabricator.wikimedia.org/T377594) (owner: 10Ebernhardson)
[23:17:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ebernhardson@deploy2002 using scap backport" [core] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087593 (https://phabricator.wikimedia.org/T377594) (owner: 10Ebernhardson)
[23:18:27] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] Revert "Revert^2 "ats: Route rest_v1/page/(html|title) to rest-gateway"" [puppet] - 10https://gerrit.wikimedia.org/r/1087591 (https://phabricator.wikimedia.org/T374683) (owner: 10Arlolra)
[23:18:31] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
[23:18:45] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
[23:19:59] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
[23:23:01] <wikibugs>	 (03CR) 10Urbanecm: [C:04-1] "per conversation in Slack, Global Contributions should be more clear about how permissions work there, especially for users with local acce" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1087584 (https://phabricator.wikimedia.org/T356294) (owner: 10Tchanders)
[23:23:28] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
[23:26:43] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
[23:30:26] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
[23:33:32] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
[23:38:09] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
[23:39:10] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:42:34] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Q4:rack/setup/install ganeti1039 to ganeti1052 - https://phabricator.wikimedia.org/T365650#10294794 (10Jclark-ctr) I have gone through and rerun all provision scripts on these. I believe this is good to close @elukey
[23:42:51] <wikibugs>	 (03PS1) 10Ryan Kemper: Revert "Pause the XML/SQL dumps due to potential data quality issues" [puppet] - 10https://gerrit.wikimedia.org/r/1087598 (https://phabricator.wikimedia.org/T377594)
[23:43:30] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "Pause the XML/SQL dumps due to potential data quality issues" [puppet] - 10https://gerrit.wikimedia.org/r/1087598 (https://phabricator.wikimedia.org/T377594) (owner: 10Ryan Kemper)
[23:44:47] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:50:04] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:53:04] <wikibugs>	 (03Merged) 10jenkins-bot: TextPassDumper: refresh content address on failure [core] (wmf/1.44.0-wmf.1) - 10https://gerrit.wikimedia.org/r/1087592 (https://phabricator.wikimedia.org/T377594) (owner: 10Ebernhardson)
[23:53:09] <wikibugs>	 (03Merged) 10jenkins-bot: TextPassDumper: refresh content address on failure [core] (wmf/1.44.0-wmf.2) - 10https://gerrit.wikimedia.org/r/1087593 (https://phabricator.wikimedia.org/T377594) (owner: 10Ebernhardson)
[23:53:42] <logmsgbot>	 !log ebernhardson@deploy2002 Started scap sync-world: Backport for [[gerrit:1087592|TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593|TextPassDumper: refresh content address on failure (T377594)]]
[23:53:54] <stashbot>	 T377594: Fix Dumps - errors exporting good revisions - https://phabricator.wikimedia.org/T377594
[23:54:47] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:54:48] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm
[23:54:51] <logmsgbot>	 !log jhancock@cumin2002 END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:54:52] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm
[23:54:53] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:54:54] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm
[23:55:11] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:55:27] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294843 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2131.codfw.wmnet with OS bookworm completed: - wi...
[23:55:28] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294844 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2133.codfw.wmnet with OS bookworm completed: - wi...
[23:55:30] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294845 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2130.codfw.wmnet with OS bookworm completed: - wi...
[23:56:03] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:56:04] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm
[23:56:37] <logmsgbot>	 !log ebernhardson@deploy2002 ebernhardson: Backport for [[gerrit:1087592|TextPassDumper: refresh content address on failure (T377594)]], [[gerrit:1087593|TextPassDumper: refresh content address on failure (T377594)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[23:57:03] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:57:09] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:57:10] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm
[23:57:12] <logmsgbot>	 !log jhancock@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:57:39] <logmsgbot>	 !log ebernhardson@deploy2002 ebernhardson: Continuing with sync
[23:58:01] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
[23:58:02] <logmsgbot>	 !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm
[23:58:54] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294848 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2132.codfw.wmnet with OS bookworm completed: - wi...
[23:58:58] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294850 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2135.codfw.wmnet with OS bookworm completed: - wi...
[23:59:00] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294851 (10Jhancock.wm)
[23:59:02] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294852 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2134.codfw.wmnet with OS bookworm completed: - wi...
[23:59:04] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker21[28-35] - https://phabricator.wikimedia.org/T377007#10294853 (10Jhancock.wm)
[23:59:17] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over