[00:07:49] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:08:39] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:08:49] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:15:03] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:15:39] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:16:29] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:23:27] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:24:17] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:24:25] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:31:05] <jinxer-wm>	 (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[00:32:49] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:33:39] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:33:45] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:38:21] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:38:27] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:38:47] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/986834
[00:38:53] <wikibugs>	 (CR) TrainBranchBot: [C: +2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/986834 (owner: TrainBranchBot)
[00:39:07] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:43:49] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:44:35] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:44:41] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:58:41] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[00:59:07] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/986834 (owner: TrainBranchBot)
[01:00:17] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[01:01:06] <jinxer-wm>	 (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads  - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[02:37:10] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:46:07] <wikibugs>	 (CR) Gergő Tisza: add foundationwiki to the list of central auth login wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/987138 (https://phabricator.wikimedia.org/T205347) (owner: ArielGlenn)
[03:06:42] <wikibugs>	 SRE, ops-codfw, serviceops: Broken CPU on mw2394 - https://phabricator.wikimedia.org/T354193 (Papaul) `Your dispatch shipped on 1/3/2024 4:20 PM`
[03:08:57] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:18:33] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, DC-Ops: Disk (sdh) failed in ms-be2068 - https://phabricator.wikimedia.org/T354180 (Papaul) Request replacement `Create Dispatch: Success You have successfully submitted request SR182683531.`
[03:49:56] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[04:18:53] <icinga-wm>	 RECOVERY - MD RAID on ganeti1031 is OK: OK: Active: 12, Working: 12, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[05:41:33] <wikibugs>	 (PS1) Jdlrobson: Revise logic for creating compact links button on Vector 2022 [extensions/UniversalLanguageSelector] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987473 (https://phabricator.wikimedia.org/T353850)
[05:45:57] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmde, nda for Dima Koushha - https://phabricator.wikimedia.org/T354276 (Aklapper) @Dima: Hi! //In case// you are WMDE staff/contractor, could you please state so on https://phabricator.wikimedia.org/p/Dima/ for transparency? Thanks! :)
[05:46:06] <wikibugs>	 (CR) Hashar: [C: -1] "The contint* and doc* hosts already have php 7.4 and I don't understand the intent of this change." [puppet] - https://gerrit.wikimedia.org/r/987458 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[06:07:17] <wikibugs>	 (PS1) Marostegui: db2[096-120]: Add hosts [puppet] - https://gerrit.wikimedia.org/r/987517 (https://phabricator.wikimedia.org/T354210)
[06:07:59] <wikibugs>	 (CR) Marostegui: [C: +2] db2[096-120]: Add hosts [puppet] - https://gerrit.wikimedia.org/r/987517 (https://phabricator.wikimedia.org/T354210) (owner: Marostegui)
[06:10:41] <wikibugs>	 (PS1) Marostegui: db2143: Migrare to 10.6 [puppet] - https://gerrit.wikimedia.org/r/987518 (https://phabricator.wikimedia.org/T353499)
[06:11:20] <wikibugs>	 (CR) Marostegui: [C: +2] db2143: Migrare to 10.6 [puppet] - https://gerrit.wikimedia.org/r/987518 (https://phabricator.wikimedia.org/T353499) (owner: Marostegui)
[06:14:09] <wikibugs>	 (CR) Marostegui: [C: +2] update_zarcillo: Push to the repo [software] - https://gerrit.wikimedia.org/r/987445 (owner: Marostegui)
[06:14:47] <wikibugs>	 (Merged) jenkins-bot: update_zarcillo: Push to the repo [software] - https://gerrit.wikimedia.org/r/987445 (owner: Marostegui)
[06:28:05] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.reimage for host db1151.eqiad.wmnet with OS bookworm
[06:40:07] <logmsgbot>	 !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on db1151.eqiad.wmnet with reason: host reimage
[06:42:57] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1151.eqiad.wmnet with reason: host reimage
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T0700)
[07:00:05] <jouncebot>	 kormat, marostegui, and Amir1: That opportune time for a Primary database switchover deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T0700).
[07:00:41] <logmsgbot>	 !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1151.eqiad.wmnet with OS bookworm
[07:16:15] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987153 (owner: Muehlenhoff)
[07:17:31] <wikibugs>	 SRE, ops-eqiad: Degraded RAID on ganeti1031 - https://phabricator.wikimedia.org/T354251 (MoritzMuehlenhoff)
[07:17:45] <wikibugs>	 SRE, ops-eqiad: SMART errors on ganeti1031 - https://phabricator.wikimedia.org/T353324 (MoritzMuehlenhoff)
[07:19:46] <wikibugs>	 (PS3) Muehlenhoff: nftables: On Buster install nftables and libnftnl from backports [puppet] - https://gerrit.wikimedia.org/r/987439 (https://phabricator.wikimedia.org/T354279)
[07:22:05] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987439 (https://phabricator.wikimedia.org/T354279) (owner: Muehlenhoff)
[07:51:34] <jinxer-wm>	 (CirrusSearchJobQueueLagTooHigh) firing: CirrusSearch job cirrusSearchOtherIndex lag is too high: 6h 1m 15s - TODO - https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=codfw%20prometheus/k8s&var-job=cirrusSearchOtherIndex - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJobQueueLagTooHigh
[07:51:54] <wikibugs>	 (PS4) Muehlenhoff: nftables: On Buster install nftables and libnftnl from backports [puppet] - https://gerrit.wikimedia.org/r/987439 (https://phabricator.wikimedia.org/T354279)
[07:56:14] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987439 (https://phabricator.wikimedia.org/T354279) (owner: Muehlenhoff)
[07:56:34] <jinxer-wm>	 (CirrusSearchJobQueueLagTooHigh) resolved: CirrusSearch job cirrusSearchOtherIndex lag is too high: 6h 3m 37s - TODO - https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=codfw%20prometheus/k8s&var-job=cirrusSearchOtherIndex - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchJobQueueLagTooHigh
[08:00:04] <jouncebot>	 Amir1, apergos, and jnuche: Time to snap out of that daydream and deploy UTC morning backport and config training. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T0800).
[08:00:37] <apergos>	 morning!  no trainees wish to learn the joys of deployment and no one has a patch to send around to production anyways, so there we have it.   wishing you all a 2024 that is better in all ways than last year, and see you next time! 
[08:00:51] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Switch peopleweb to nftables [puppet] - https://gerrit.wikimedia.org/r/984161 (owner: Muehlenhoff)
[08:14:14] <wikibugs>	 (CR) David Caro: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/987465 (owner: Andrew Bogott)
[08:17:39] <wikibugs>	 (PS10) Ladsgroup: Add compare tables periodic job [puppet] - https://gerrit.wikimedia.org/r/979390 (https://phabricator.wikimedia.org/T207253)
[08:17:47] <wikibugs>	 (CR) Ladsgroup: Add compare tables periodic job (2 comments) [puppet] - https://gerrit.wikimedia.org/r/979390 (https://phabricator.wikimedia.org/T207253) (owner: Ladsgroup)
[08:21:54] <wikibugs>	 (PS2) Ladsgroup: Add virtual domain config for reading lists extension [mediawiki-config] - https://gerrit.wikimedia.org/r/985160 (https://phabricator.wikimedia.org/T353948)
[08:22:19] <wikibugs>	 (CR) Ladsgroup: [C: +2] Add virtual domain config for reading lists extension [mediawiki-config] - https://gerrit.wikimedia.org/r/985160 (https://phabricator.wikimedia.org/T353948) (owner: Ladsgroup)
[08:22:58] <wikibugs>	 (Merged) jenkins-bot: Add virtual domain config for reading lists extension [mediawiki-config] - https://gerrit.wikimedia.org/r/985160 (https://phabricator.wikimedia.org/T353948) (owner: Ladsgroup)
[08:24:28] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/985160 (https://phabricator.wikimedia.org/T353948) (owner: Ladsgroup)
[08:25:23] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap: Backport for [[gerrit:985160|Add virtual domain config for reading lists extension (T353948)]]
[08:25:30] <stashbot>	 T353948: Migrate reading lists to use a virtual database domain - https://phabricator.wikimedia.org/T353948
[08:27:02] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:985160|Add virtual domain config for reading lists extension (T353948)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:28:18] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[08:30:02] <wikibugs>	 (PS1) Muehlenhoff: idm: Avoid Ferm-specific syntax [puppet] - https://gerrit.wikimedia.org/r/987656
[08:33:36] <wikibugs>	 (PS1) Ladsgroup: Set commonswiki pagelinks migration stage to READ NEW [mediawiki-config] - https://gerrit.wikimedia.org/r/987657 (https://phabricator.wikimedia.org/T351237)
[08:33:48] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987656 (owner: Muehlenhoff)
[08:34:29] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap: Backport for [[gerrit:985160|Add virtual domain config for reading lists extension (T353948)]] (duration: 09m 05s)
[08:34:33] <stashbot>	 T353948: Migrate reading lists to use a virtual database domain - https://phabricator.wikimedia.org/T353948
[08:34:47] <wikibugs>	 (CR) Jgiannelos: [C: -1] "This is ready for review. Blocking so we don't deploy before apps teams approval." [deployment-charts] - https://gerrit.wikimedia.org/r/987158 (owner: Jgiannelos)
[08:35:38] <wikibugs>	 (PS2) Ladsgroup: Update virtual domain for url shortener [mediawiki-config] - https://gerrit.wikimedia.org/r/987134
[08:35:41] <wikibugs>	 (CR) Ladsgroup: [C: +2] Update virtual domain for url shortener [mediawiki-config] - https://gerrit.wikimedia.org/r/987134 (owner: Ladsgroup)
[08:35:53] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/987134 (owner: Ladsgroup)
[08:36:28] <wikibugs>	 (Merged) jenkins-bot: Update virtual domain for url shortener [mediawiki-config] - https://gerrit.wikimedia.org/r/987134 (owner: Ladsgroup)
[08:36:53] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap: Backport for [[gerrit:987134|Update virtual domain for url shortener]]
[08:38:24] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Backport for [[gerrit:987134|Update virtual domain for url shortener]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[08:41:58] <wikibugs>	 (PS1) Slyngshede: C:raid::perccli Support compression of output. [puppet] - https://gerrit.wikimedia.org/r/987659 (https://phabricator.wikimedia.org/T354254)
[08:43:40] <logmsgbot>	 !log ladsgroup@deploy2002 ladsgroup: Continuing with sync
[08:49:28] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap: Backport for [[gerrit:987134|Update virtual domain for url shortener]] (duration: 12m 35s)
[08:56:38] <wikibugs>	 (PS1) Peter Fischer: Search update pipeline: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987692
[08:56:51] <wikibugs>	 (CR) Peter Fischer: [C: +2] Search update pipeline: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987692 (owner: Peter Fischer)
[08:57:12] <wikibugs>	 (CR) Alexandros Kosiaris: "Indeed they do. https://w.wiki/8j4j for a quick view on anyone with a grafana account that wants to take a peek. There are some interestin" [deployment-charts] - https://gerrit.wikimedia.org/r/984482 (owner: JMeybohm)
[08:57:14] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] Bump memory for calico-node on wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/984482 (owner: JMeybohm)
[08:58:12] <wikibugs>	 (Merged) jenkins-bot: Search update pipeline: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987692 (owner: Peter Fischer)
[09:00:42] <wikibugs>	 (Merged) jenkins-bot: Bump memory for calico-node on wikikube [deployment-charts] - https://gerrit.wikimedia.org/r/984482 (owner: JMeybohm)
[09:04:44] <wikibugs>	 (CR) Muehlenhoff: Package Debmonitor server as .deb (3 comments) [software/debmonitor] (debian) - https://gerrit.wikimedia.org/r/981300 (owner: Slyngshede)
[09:07:25] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] pybal: Disable Pint promql/series checks [alerts] - https://gerrit.wikimedia.org/r/987499 (https://phabricator.wikimedia.org/T353760) (owner: BCornwall)
[09:07:47] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] prometheus::migration: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987153 (owner: Muehlenhoff)
[09:09:45] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[09:11:21] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:12:44] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[09:13:04] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:13:21] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[09:13:32] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:22:08] <wikibugs>	 (PS6) Brouberol: spark3: enable event logging and history server integration for all spark jobs [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849)
[09:22:15] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[09:22:49] <akosiaris>	 !log bump memory limits for calico-node in wikikube codfw/eqiad by 25% (i.e. from 400Mi to 500Mi)
[09:22:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:31:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:34:01] <wikibugs>	 (CR) Muehlenhoff: [C: +2] prometheus::migration: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987153 (owner: Muehlenhoff)
[09:34:18] <wikibugs>	 (CR) Hashar: [C: +1] releases: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987436 (owner: Muehlenhoff)
[09:36:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:36:28] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[09:36:36] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[09:38:22] <akosiaris>	 !log bump memory limits for calico-node in wikikube codfw/eqiad by 25% (i.e. from 400Mi to 500Mi) take #2
[09:38:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:40] <akosiaris>	 !log delete mw1377-mw1383 from eqiad wikikube nodes
[09:38:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:12] <wikibugs>	 SRE, SRE-swift-storage, ops-codfw, DC-Ops: Disk (sdh) failed in ms-be2068 - https://phabricator.wikimedia.org/T354180 (MatthewVernon) @Papaul Thanks for the quick swap :)
[09:39:42] <wikibugs>	 (CR) MVernon: [C: +2] swift: add new storagehosts [puppet] - https://gerrit.wikimedia.org/r/987448 (https://phabricator.wikimedia.org/T353149) (owner: MVernon)
[09:41:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[09:46:09] <wikibugs>	 (CR) Volans: [C: +2] raid handler: fix broken cases [puppet] - https://gerrit.wikimedia.org/r/987409 (owner: Volans)
[09:53:35] <wikibugs>	 (CR) Volans: "The compression looks ok. One comment on the other change." [puppet] - https://gerrit.wikimedia.org/r/987659 (https://phabricator.wikimedia.org/T354254) (owner: Slyngshede)
[09:54:40] <wikibugs>	 (PS1) DDesouza: research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987707 (https://phabricator.wikimedia.org/T352583)
[09:54:48] <wikibugs>	 (CR) CI reject: [V: -1] research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987707 (https://phabricator.wikimedia.org/T352583) (owner: DDesouza)
[09:55:19] <wikibugs>	 (PS2) DDesouza: research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987707 (https://phabricator.wikimedia.org/T352583)
[09:55:24] <wikibugs>	 (CR) Muehlenhoff: [C: +2] profile::prometheus::rsyncd: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/984808 (owner: Muehlenhoff)
[09:57:09] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[09:57:53] <wikibugs>	 (PS3) DDesouza: research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987707 (https://phabricator.wikimedia.org/T352583)
[09:58:03] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmde, nda for Dima Koushha - https://phabricator.wikimedia.org/T354276 (Dima) @Aklapper : Hi Andre! Sure thing, I added that. :)
[10:01:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[10:01:19] <wikibugs>	 (PS1) Alexandros Kosiaris: Calico: Bump timeout to 1H [deployment-charts] - https://gerrit.wikimedia.org/r/987708
[10:03:07] <wikibugs>	 SRE: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (LSobanski) There are 31 HP servers and 1 storage array remaining (https://netbox.wikimedia.org/dcim/manufacturers/6/), excluding the Swift hosts the majority remaining are DBs. Looking at the purchase dates m...
[10:03:09] <wikibugs>	 (PS2) Alexandros Kosiaris: Calico: Bump timeout to 1H [deployment-charts] - https://gerrit.wikimedia.org/r/987708
[10:06:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[10:07:29] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:08:13] <wikibugs>	 (CR) Hashar: [C: +2] Merge tag 'v3.6.8' into wmf/stable-3.6 [software/gerrit] (wmf/stable-3.6) - https://gerrit.wikimedia.org/r/987438 (https://phabricator.wikimedia.org/T309870) (owner: Hashar)
[10:08:34] <wikibugs>	 (PS1) Muehlenhoff: rsyslog::receiver: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/987709
[10:08:36] <wikibugs>	 (PS1) Muehlenhoff: rsyslog::receiver: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987710
[10:08:38] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] Calico: Bump timeout to 1H [deployment-charts] - https://gerrit.wikimedia.org/r/987708 (owner: Alexandros Kosiaris)
[10:08:54] <wikibugs>	 (CR) JMeybohm: [C: +2] pki::multirootca: Merge custom profiles on top of default_profiles [puppet] - https://gerrit.wikimedia.org/r/982854 (https://phabricator.wikimedia.org/T353314) (owner: JMeybohm)
[10:09:01] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:09:10] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987709 (owner: Muehlenhoff)
[10:09:22] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987710 (owner: Muehlenhoff)
[10:10:00] <jinxer-wm>	 (HelmReleaseBadStatus) firing: Helm release kube-system/calico on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:11:28] <wikibugs>	 (Merged) jenkins-bot: Calico: Bump timeout to 1H [deployment-charts] - https://gerrit.wikimedia.org/r/987708 (owner: Alexandros Kosiaris)
[10:11:43] <wikibugs>	 SRE: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (Marostegui) Fine by me!
[10:14:51] <wikibugs>	 (Merged) jenkins-bot: Merge tag 'v3.6.8' into wmf/stable-3.6 [software/gerrit] (wmf/stable-3.6) - https://gerrit.wikimedia.org/r/987438 (https://phabricator.wikimedia.org/T309870) (owner: Hashar)
[10:16:55] <wikibugs>	 (CR) Slyngshede: [C: +1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/987656 (owner: Muehlenhoff)
[10:17:02] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[10:17:19] <akosiaris>	 !log bump memory limits for calico-node in wikikube codfw/eqiad by 25% (i.e. from 400Mi to 500Mi) take #3
[10:17:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:00] <jinxer-wm>	 (HelmReleaseBadStatus) resolved: Helm release kube-system/calico on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:26:09] <wikibugs>	 SRE, LDAP-Access-Requests: Grant Access to wmde, nda for Dima Koushha - https://phabricator.wikimedia.org/T354276 (WMDE-leszek) FWIW, I confirm @Dima is WMDE software engineer, and approve this request on WMDE's behalf. Thank you.
[10:31:30] <jinxer-wm>	 (HelmReleaseBadStatus) firing: (2) Helm release kube-system/calico on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:32:37] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[10:33:40] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[10:36:30] <jinxer-wm>	 (HelmReleaseBadStatus) resolved: (2) Helm release kube-system/calico on k8s@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:37:16] <wikibugs>	 (PS1) Zabe: Revert "Support new block schema" [extensions/CentralAuth] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987482 (https://phabricator.wikimedia.org/T354298)
[10:37:51] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:38:04] <wikibugs>	 (PS1) Zabe: Revert "Get blocks from DatabaseBlockStore instead of doing our own query" [extensions/CheckUser] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987483 (https://phabricator.wikimedia.org/T353620)
[10:39:41] <wikibugs>	 SRE: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (MatthewVernon) Hm, actually, that list from netbox includes servers not in the description of this task (ah, and they have manufacturer = HPE not HP) and the necessary binary is now /usr/sbin/ssacli. So the c...
[10:42:05] <wikibugs>	 (PS3) Ladsgroup: snapshot: Improve border of dumps cards [puppet] - https://gerrit.wikimedia.org/r/986181
[10:42:33] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 298, down: 2, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:43:12] <zabe>	 jouncebot: nowandnext
[10:43:12] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 16 minute(s)
[10:43:13] <jouncebot>	 In 0 hour(s) and 16 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1100)
[10:43:13] <jouncebot>	 In 0 hour(s) and 16 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1100)
[10:43:23] <wikibugs>	 (03CR) 10Ladsgroup: snapshot: Improve border of dumps cards (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/986181 (owner: 10Ladsgroup)
[10:45:02] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Revert "Get blocks from DatabaseBlockStore instead of doing our own query" [extensions/CheckUser] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987483 (https://phabricator.wikimedia.org/T353620) (owner: 10Zabe)
[10:48:30] <jinxer-wm>	 (HelmReleaseBadStatus) firing: (2) Helm release kube-system/calico on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency  - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[10:51:10] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1027/co" [puppet] - 10https://gerrit.wikimedia.org/r/983363 (https://phabricator.wikimedia.org/T353314) (owner: 10JMeybohm)
[10:51:14] <logmsgbot>	 !log akosiaris@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[10:53:05] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1 C: 03+2] pki::multirootca: Override the server profiles expiry for k8s staging [puppet] - 10https://gerrit.wikimedia.org/r/983363 (https://phabricator.wikimedia.org/T353314) (owner: 10JMeybohm)
[10:53:30] <jinxer-wm>	 (HelmReleaseBadStatus) resolved: Helm release kube-system/calico on k8s@codfw in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=kube-system - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[11:00:05] <jouncebot>	 mvolz: #bothumor My software never has bugs. It just develops random features. Rise for Services – Citoid / Zotero. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1100).
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1100)
[11:01:43] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host mw1379.eqiad.wmnet with OS bullseye
[11:15:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] idm: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/987656 (owner: 10Muehlenhoff)
[11:16:07] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1379.eqiad.wmnet with reason: host reimage
[11:19:00] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1379.eqiad.wmnet with reason: host reimage
[11:21:31] <wikibugs>	 (03PS6) 10EoghanGaffney: [gerrit] Add rsync job for lfs sync [puppet] - 10https://gerrit.wikimedia.org/r/987135 (https://phabricator.wikimedia.org/T257741)
[11:27:10] <wikibugs>	 (03PS1) 10Muehlenhoff: idm: Configure tlsproxy::envoy::firewall_srange [puppet] - 10https://gerrit.wikimedia.org/r/987715
[11:27:18] <wikibugs>	 (03PS2) 10Muehlenhoff: idm: Configure tlsproxy::envoy::firewall_srange [puppet] - 10https://gerrit.wikimedia.org/r/987715
[11:32:15] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/987715 (owner: 10Muehlenhoff)
[11:34:07] <wikibugs>	 (03PS1) 10MVernon: swift: add new nodes, drain old nodes from the rings [puppet] - 10https://gerrit.wikimedia.org/r/987718 (https://phabricator.wikimedia.org/T353149)
[11:35:55] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1379.eqiad.wmnet with OS bullseye
[11:37:43] <wikibugs>	 (03PS1) 10Kamila Součková: Revert "Set MW API servers to insetup to fix failed reimage" [puppet] - 10https://gerrit.wikimedia.org/r/987484
[11:38:05] <wikibugs>	 (03PS2) 10Kamila Součková: Revert "Set MW API servers to insetup to fix failed reimage" [puppet] - 10https://gerrit.wikimedia.org/r/987484
[11:38:43] <wikibugs>	 (03PS1) 10Dreamy Jazz: Attempt to send original file to PhotoDNA if no thumbnail [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987485 (https://phabricator.wikimedia.org/T353854)
[11:40:43] <wikibugs>	 (03PS1) 10Dreamy Jazz: Attempt to send original file to PhotoDNA if no thumbnail [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - 10https://gerrit.wikimedia.org/r/987726 (https://phabricator.wikimedia.org/T353854)
[11:41:21] <wikibugs>	 (03CR) 10Kamila Součková: [C: 03+2] Revert "Set MW API servers to insetup to fix failed reimage" [puppet] - 10https://gerrit.wikimedia.org/r/987484 (owner: 10Kamila Součková)
[11:42:14] <wikibugs>	 (03CR) 10Btullis: "Instead of putting these options in to all of the role definition files where `profile::hadoop::spark3` is applied, we could instead make " [puppet] - 10https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: 10Brouberol)
[11:50:17] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/981418 (owner: 10Majavah)
[11:52:12] <moritzm>	 !log installing libde265 security updates
[11:52:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:28] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+1] "Looks good." [puppet] - 10https://gerrit.wikimedia.org/r/987715 (owner: 10Muehlenhoff)
[12:04:14] <moritzm>	 !log installing lua5.3 security updates
[12:04:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:13] <wikibugs>	 (03PS1) 10Muehlenhoff: Add Cumin alias for cloudlb [puppet] - 10https://gerrit.wikimedia.org/r/987748
[12:10:14] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reboot-single for host mw1377.eqiad.wmnet
[12:10:26] <wikibugs>	 (03PS2) 10Muehlenhoff: Add Cumin alias for cloudlb [puppet] - 10https://gerrit.wikimedia.org/r/987748
[12:16:25] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add Cumin alias for cloudlb [puppet] - 10https://gerrit.wikimedia.org/r/987748 (owner: 10Muehlenhoff)
[12:19:09] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] rancid/librenms: Switch rsync service to use firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/987403 (owner: 10Muehlenhoff)
[12:20:41] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Cannot enter configuration mode on cr2-drmrs - https://phabricator.wikimedia.org/T354340 (10cmooney) p:05Triage→03Medium
[12:21:20] <wikibugs>	 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.8 point update - https://phabricator.wikimedia.org/T348327 (10MoritzMuehlenhoff)
[12:22:23] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops: Cannot enter configuration mode on cr2-drmrs - https://phabricator.wikimedia.org/T354340 (10cmooney)
[12:23:41] <wikibugs>	 (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/986835
[12:23:52] <wikibugs>	 (03CR) 10Phuedx: [C: 04-1] "AIUI we're only adding this proxy to support existing MediaWiki installations submitting pingbacks as the information is valuable (though " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/985023 (https://phabricator.wikimedia.org/T353817) (owner: 10Ottomata)
[12:25:27] <wikibugs>	 (03PS1) 10Jelto: miscweb: add design.wikimedia.org services [deployment-charts] - 10https://gerrit.wikimedia.org/r/987758 (https://phabricator.wikimedia.org/T350791)
[12:30:31] <icinga-wm>	 PROBLEM - Host mw1377 is DOWN: PING CRITICAL - Packet loss = 100%
[12:32:12] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] idm: Configure tlsproxy::envoy::firewall_srange [puppet] - 10https://gerrit.wikimedia.org/r/987715 (owner: 10Muehlenhoff)
[12:38:03] <zabe>	 jouncebot: nowandnext
[12:38:03] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 21 minute(s)
[12:38:03] <jouncebot>	 In 0 hour(s) and 21 minute(s): Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1300)
[12:38:23] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Revert "Get blocks from DatabaseBlockStore instead of doing our own query" [extensions/CheckUser] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987483 (https://phabricator.wikimedia.org/T353620) (owner: 10Zabe)
[12:38:30] <wikibugs>	 (03CR) 10Zabe: [C: 03+2] Revert "Support new block schema" [extensions/CentralAuth] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987482 (https://phabricator.wikimedia.org/T354298) (owner: 10Zabe)
[12:43:19] <wikibugs>	 (03PS1) 10Ayounsi: Depool drmrs for cr2 maintenance [dns] - 10https://gerrit.wikimedia.org/r/987767 (https://phabricator.wikimedia.org/T354340)
[12:45:36] <wikibugs>	 (03PS1) 10Muehlenhoff: idm-test: Configure tlsproxy::envoy::firewall_srange [puppet] - 10https://gerrit.wikimedia.org/r/987768
[12:45:58] <wikibugs>	 (03PS2) 10Muehlenhoff: idm-test: Configure tlsproxy::envoy::firewall_srange [puppet] - 10https://gerrit.wikimedia.org/r/987768
[12:51:48] <wikibugs>	 (03PS1) 10Muehlenhoff: Switch idm-test to nftables [puppet] - 10https://gerrit.wikimedia.org/r/987769
[12:52:38] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] "LGTM!" [dns] - 10https://gerrit.wikimedia.org/r/987767 (https://phabricator.wikimedia.org/T354340) (owner: 10Ayounsi)
[12:52:44] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 63296
[12:53:08] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63296
[12:56:19] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Get blocks from DatabaseBlockStore instead of doing our own query" [extensions/CheckUser] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987483 (https://phabricator.wikimedia.org/T353620) (owner: 10Zabe)
[12:56:22] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "Support new block schema" [extensions/CentralAuth] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987482 (https://phabricator.wikimedia.org/T354298) (owner: 10Zabe)
[12:56:52] <logmsgbot>	 !log zabe@deploy2002 Started scap: Backport for [[gerrit:987483|Revert "Get blocks from DatabaseBlockStore instead of doing our own query" (T353620)]], [[gerrit:987482|Revert "Support new block schema" (T354298)]]
[12:56:57] <stashbot>	 T353620: Wikimedia\Assert\PreconditionException: Expected MediaWiki\Block\AbstractBlock to belong to the local wiki, but it belongs to 'commonswiki' - https://phabricator.wikimedia.org/T353620
[12:56:57] <stashbot>	 T354298: Wikimedia\Assert\PreconditionException: Expected MediaWiki\Block\AbstractBlock to belong to the local wiki, but it belongs to 'xxxwiki' - https://phabricator.wikimedia.org/T354298
[12:59:05] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] rsyslog::receiver: Switch rsync service to use firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/987710 (owner: 10Muehlenhoff)
[13:00:05] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1300)
[13:00:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] rsyslog::receiver: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/987709 (owner: 10Muehlenhoff)
[13:00:29] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:987483|Revert "Get blocks from DatabaseBlockStore instead of doing our own query" (T353620)]], [[gerrit:987482|Revert "Support new block schema" (T354298)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:01:06] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[13:01:43] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Depool drmrs for cr2 maintenance [dns] - 10https://gerrit.wikimedia.org/r/987767 (https://phabricator.wikimedia.org/T354340) (owner: 10Ayounsi)
[13:02:29] <XioNoX>	 !log depool drmrs for router work - T354340
[13:02:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:33] <stashbot>	 T354340: Cannot enter configuration mode on cr2-drmrs - https://phabricator.wikimedia.org/T354340
[13:02:54] <logmsgbot>	 !log kamila@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mw1377.eqiad.wmnet
[13:05:20] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] rsyslog::receiver: Remove support for buster [puppet] - 10https://gerrit.wikimedia.org/r/987709 (owner: 10Muehlenhoff)
[13:06:58] <logmsgbot>	 !log zabe@deploy2002 Finished scap: Backport for [[gerrit:987483|Revert "Get blocks from DatabaseBlockStore instead of doing our own query" (T353620)]], [[gerrit:987482|Revert "Support new block schema" (T354298)]] (duration: 10m 06s)
[13:07:13] <stashbot>	 T353620: Wikimedia\Assert\PreconditionException: Expected MediaWiki\Block\AbstractBlock to belong to the local wiki, but it belongs to 'commonswiki' - https://phabricator.wikimedia.org/T353620
[13:07:14] <stashbot>	 T354298: Wikimedia\Assert\PreconditionException: Expected MediaWiki\Block\AbstractBlock to belong to the local wiki, but it belongs to 'xxxwiki' - https://phabricator.wikimedia.org/T354298
[13:09:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] rsyslog::receiver: Switch rsync service to use firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/987710 (owner: 10Muehlenhoff)
[13:11:03] <wikibugs>	 (03PS1) 10Ayounsi: Repool drmrs after cr2 maintenance [dns] - 10https://gerrit.wikimedia.org/r/987727 (https://phabricator.wikimedia.org/T354340)
[13:11:31] <wikibugs>	 10SRE: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (10fgiunchedi) >>! In T141756#9434526, @LSobanski wrote: > There are 31 HP servers and 1 storage array remaining (https://netbox.wikimedia.org/dcim/manufacturers/6/), excluding the Swift hosts the majority remai...
[13:13:40] <MichaelG_WMDE>	 Hi, we're seeing critical alerts about wdqs1019 being 503: https://alerts.wikimedia.org/?q=%40state%3Dactive&q=wikidata&q=alertname%21%3DCirrusSearchIndexTooOld&q=instance%21%3Dwikidata-analytics-1 --- I'm trying to understand if that is something we should worry about or whether there is anything needed from our side?
[13:15:06] <wikibugs>	 (03CR) 10Cathal Mooney: [C: 03+1] Repool drmrs after cr2 maintenance [dns] - 10https://gerrit.wikimedia.org/r/987727 (https://phabricator.wikimedia.org/T354340) (owner: 10Ayounsi)
[13:16:41] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] aptrepo:rsync: Switch rsync service to use firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/984805 (owner: 10Muehlenhoff)
[13:18:21] <wikibugs>	 10SRE: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (10LSobanski) 05Open→03Resolved a:03LSobanski
[13:18:31] <wikibugs>	 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631 (10LSobanski)
[13:18:35] <wikibugs>	 10SRE: audit / test / upgrade hp smartarray P840 firmware - https://phabricator.wikimedia.org/T141756 (10LSobanski) a:05LSobanski→03None
[13:20:03] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] enable page_rerender for 2nd batch: dewiki, frwiktionary, and kuwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/984810 (owner: 10Peter Fischer)
[13:20:44] <dcausse>	 MichaelG_WMDE: looking
[13:21:08] <MichaelG_WMDE>	 thank you 🙏
[13:24:37] <dcausse>	 !log restarting blazegraph on wdqs1019 (stuck with high thread count)
[13:24:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:21] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1019 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.025 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[13:25:22] <dcausse>	 lag alerts will fire for this host but can be ignored (it's depooled)
[13:26:05] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1019 is OK: HTTP OK: HTTP/1.1 200 OK - 692 bytes in 0.249 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[13:27:30] <MichaelG_WMDE>	 dcausse Ok, good to know. Thank you for taking care of it 
[13:28:24] <dcausse>	 np! thanks for raising this here :)
[13:30:58] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) firing: wdqs1019:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[13:32:43] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] rancid/librenms: Switch rsync service to use firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/987403 (owner: 10Muehlenhoff)
[13:37:26] <icinga-wm>	 PROBLEM - Host cr2-drmrs #page is DOWN: PING CRITICAL - Packet loss = 100%
[13:37:47] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:37:52] <vgutierrez>	 XioNoX, topranks: ^^?
[13:37:54] <moritzm>	 expired downtime?
[13:37:54] <_joe_>	 uh
[13:38:00] <XioNoX>	 drmrs is depooled
[13:38:03] <_joe_>	 ok
[13:38:05] <vgutierrez>	 ack
[13:38:05] <icinga-wm>	 PROBLEM - BGP status on asw1-b13-drmrs.mgmt is CRITICAL: BGP CRITICAL - AS14907/IPv4: Active - wmf_public_asn, AS14907/IPv6: Idle - wmf_public_asn https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:38:07] * Emperor twitches
[13:38:15] <icinga-wm>	 PROBLEM - Host cr2-drmrs IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[13:38:16] * vgutierrez sends a kill -SIGHUP to his heart attack
[13:38:20] <_joe_>	 !incidents
[13:38:20] <sirenbot>	 4371 (ACKED)  Host cr2-drmrs (paged) - PING  - Packet loss = 100%
[13:38:29] <icinga-wm>	 PROBLEM - BGP status on asw1-b12-drmrs.mgmt is CRITICAL: BGP CRITICAL - AS14907/IPv6: Active - wmf_public_asn https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:38:37] <topranks>	 ^^ this was due to maintenance we were doing which hasn't gone as smoothly as hoped 
[13:38:49] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:38:52] <topranks>	 rebooting device now - site is depooled
[13:38:55] <Emperor>	 At least we know the batphone is working ;-)
[13:39:15] <icinga-wm>	 PROBLEM - BFD status on cr2-eqdfw is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[13:39:21] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[13:40:37] <XioNoX>	 it's also only one router, so even if some users are sticking to drmrs due to bad DNS config, it would still work for them
[13:43:17] <icinga-wm>	 PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:43:25] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] idm-test: Configure tlsproxy::envoy::firewall_srange [puppet] - 10https://gerrit.wikimedia.org/r/987768 (owner: 10Muehlenhoff)
[13:44:19] <icinga-wm>	 RECOVERY - BGP status on asw1-b13-drmrs.mgmt is OK: BGP OK - up: 12, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:44:41] <icinga-wm>	 RECOVERY - BGP status on asw1-b12-drmrs.mgmt is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:44:48] <icinga-wm>	 RECOVERY - Host cr2-drmrs #page is UP: PING OK - Packet loss = 0%, RTA = 86.02 ms
[13:44:53] <icinga-wm>	 RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[13:45:07] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:45:33] <icinga-wm>	 RECOVERY - BFD status on cr2-eqdfw is OK: UP: 13 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[13:45:37] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[13:45:37] <icinga-wm>	 RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[13:49:35] <icinga-wm>	 RECOVERY - Host cr2-drmrs IPv6 is UP: PING OK - Packet loss = 0%, RTA = 85.71 ms
[13:50:01] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch idm-test to nftables [puppet] - 10https://gerrit.wikimedia.org/r/987769
[13:51:58] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/987769 (owner: 10Muehlenhoff)
[13:53:13] <jinxer-wm>	 (KubernetesCalicoDown) firing: mw1377.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=mw1377.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[13:53:29] <moritzm>	 !log installing libssh security updates
[13:53:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:56:51] <logmsgbot>	 !log ayounsi@cumin1002 START - Cookbook sre.network.peering with action 'configure' for AS: 2686
[13:57:20] <logmsgbot>	 !log ayounsi@cumin1002 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2686
[13:57:34] <wikibugs>	 (03PS1) 10Peter Fischer: Search update pipeline: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/987773
[13:57:44] <wikibugs>	 (03CR) 10Peter Fischer: [C: 03+2] Search update pipeline: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/987773 (owner: 10Peter Fischer)
[13:58:34] <wikibugs>	 (03Merged) 10jenkins-bot: Search update pipeline: bump version [deployment-charts] - 10https://gerrit.wikimedia.org/r/987773 (owner: 10Peter Fischer)
[14:00:05] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1400).
[14:00:05] <jouncebot>	 Dreamy_Jazz: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:23] <Dreamy_Jazz>	 \o
[14:00:41] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:00:45] <TheresNoTime>	 if someone else could deploy, that'd be great :-)
[14:00:55] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:01:07] <Dreamy_Jazz>	 My backports cannot be tested on-wiki as the only changes are to maintenance scripts. These are currently run manually, so if there are problems I will see them once the script is run manually.
[14:01:24] <Dreamy_Jazz>	 *to maintenance scripts and code that interacts only with maintenance scripts.
[14:01:28] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:01:30] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Repool drmrs after cr2 maintenance [dns] - 10https://gerrit.wikimedia.org/r/987727 (https://phabricator.wikimedia.org/T354340) (owner: 10Ayounsi)
[14:02:02] <Dreamy_Jazz>	 Need to backport as I want to run another scan on testwiki later today.
[14:03:10] <XioNoX>	 !log repool drmrs - T354340
[14:03:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:14] <stashbot>	 T354340: Cannot enter configuration mode on cr2-drmrs - https://phabricator.wikimedia.org/T354340
[14:04:53] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Traffic, 10netops: Test depool of drmrs - https://phabricator.wikimedia.org/T344968 (10ayounsi) 05Open→03Resolved a:03ayounsi Depooled esams for 1h and everything went well.
[14:06:04] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:07:42] <wikibugs>	 (03CR) 10FNegri: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/984815 (owner: 10Muehlenhoff)
[14:08:24] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:08:43] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:08:51] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:09:03] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:09:07] <wikibugs>	 (03CR) 10Brouberol: spark3: enable event logging and history server integration for all spark jobs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: 10Brouberol)
[14:09:24] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:09:34] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:10:20] * James_F waves.
[14:10:20] <Tchanders>	 Dreamy_Jazz, I can deploy the MediaModeration fixes along with https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/987483
[14:10:40] <Dreamy_Jazz>	 Sure.
[14:10:49] <Dreamy_Jazz>	 Has that CheckUser change already been backported?
[14:10:57] <Dreamy_Jazz>	 Just as zabe gave it a +2
[14:11:05] <James_F>	 I think so.
[14:11:10] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:11:47] <Dreamy_Jazz>	 Based on https://phabricator.wikimedia.org/T353620#9435093 I think that the CheckUser backport has already been scap backported
[14:11:56] <James_F>	 Yup.
[14:12:01] <James_F>	 Also https://sal.toolforge.org/production?p=0&q=987483&d=
[14:12:07] <Dreamy_Jazz>	 However, my MediaModeration changes have not :)
[14:12:10] <Tchanders>	 Dreamy_Jazz, OK just MediaModeration then
[14:12:18] <Dreamy_Jazz>	 👍
[14:12:31] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[14:12:43] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[14:13:25] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by tchanders@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987485 (https://phabricator.wikimedia.org/T353854) (owner: 10Dreamy Jazz)
[14:14:02] <wikibugs>	 (03CR) 10Slyngshede: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/987769 (owner: 10Muehlenhoff)
[14:16:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Switch idm-test to nftables [puppet] - 10https://gerrit.wikimedia.org/r/987769 (owner: 10Muehlenhoff)
[14:16:04] <wikibugs>	 (03Merged) 10jenkins-bot: Attempt to send original file to PhotoDNA if no thumbnail [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - 10https://gerrit.wikimedia.org/r/987485 (https://phabricator.wikimedia.org/T353854) (owner: 10Dreamy Jazz)
[14:16:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[14:16:27] <logmsgbot>	 !log tchanders@deploy2002 Started scap: Backport for [[gerrit:987485|Attempt to send original file to PhotoDNA if no thumbnail (T353854)]]
[14:16:37] <stashbot>	 T353854: Attempt to send original file to PhotoDNA for the scan if thumbnail fails - https://phabricator.wikimedia.org/T353854
[14:20:04] <logmsgbot>	 !log tchanders@deploy2002 dreamyjazz and tchanders: Backport for [[gerrit:987485|Attempt to send original file to PhotoDNA if no thumbnail (T353854)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:20:10] <logmsgbot>	 !log tchanders@deploy2002 dreamyjazz and tchanders: Continuing with sync
[14:25:51] <logmsgbot>	 !log tchanders@deploy2002 Finished scap: Backport for [[gerrit:987485|Attempt to send original file to PhotoDNA if no thumbnail (T353854)]] (duration: 09m 24s)
[14:25:55] <stashbot>	 T353854: Attempt to send original file to PhotoDNA for the scan if thumbnail fails - https://phabricator.wikimedia.org/T353854
[14:27:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C: 03+2] "Approved by tchanders@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - 10https://gerrit.wikimedia.org/r/987726 (https://phabricator.wikimedia.org/T353854) (owner: 10Dreamy Jazz)
[14:28:01] <wikibugs>	 (03PS7) 10Brouberol: spark3: enable event logging and history server integration for all spark jobs [puppet] - 10https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849)
[14:28:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] keystone: Switch rsync service to use firewall::service [puppet] - 10https://gerrit.wikimedia.org/r/984815 (owner: 10Muehlenhoff)
[14:28:32] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] spark3: enable event logging and history server integration for all spark jobs [puppet] - 10https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: 10Brouberol)
[14:29:51] <wikibugs>	 (03PS8) 10Brouberol: spark3: enable event logging and history server integration for all spark jobs [puppet] - 10https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849)
[14:30:22] <wikibugs>	 (03Merged) 10jenkins-bot: Attempt to send original file to PhotoDNA if no thumbnail [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - 10https://gerrit.wikimedia.org/r/987726 (https://phabricator.wikimedia.org/T353854) (owner: 10Dreamy Jazz)
[14:30:49] <logmsgbot>	 !log tchanders@deploy2002 Started scap: Backport for [[gerrit:987726|Attempt to send original file to PhotoDNA if no thumbnail (T353854)]]
[14:34:27] <logmsgbot>	 !log tchanders@deploy2002 tchanders and dreamyjazz: Backport for [[gerrit:987726|Attempt to send original file to PhotoDNA if no thumbnail (T353854)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:34:31] <stashbot>	 T353854: Attempt to send original file to PhotoDNA for the scan if thumbnail fails - https://phabricator.wikimedia.org/T353854
[14:34:32] <pfischer>	 Hi, am I too late for the current UTC afternoon backport window? I just added a config change.
[14:34:34] <logmsgbot>	 !log tchanders@deploy2002 tchanders and dreamyjazz: Continuing with sync
[14:34:35] <wikibugs>	 (PS1) Muehlenhoff: mwmaint: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987778
[14:35:30] <wikibugs>	 (PS2) Ssingh: hiera: dnsbox: remove anycast-hc dependency on pdns-rec [puppet] - https://gerrit.wikimedia.org/r/979159 (https://phabricator.wikimedia.org/T347054)
[14:36:37] <wikibugs>	 (PS1) Muehlenhoff: deployment servers: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987779
[14:36:55] <Tchanders>	 pfischer, I'll do that one next
[14:37:11] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:38:16] <wikibugs>	 (PS1) Muehlenhoff: toolforge::docker::registry: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987781
[14:38:34] <wikibugs>	 (PS1) Ayounsi: Depool esams for cr1 maintenance [dns] - https://gerrit.wikimedia.org/r/987782 (https://phabricator.wikimedia.org/T346779)
[14:38:43] <pfischer>	 Tchanders: thanks!
[14:39:12] <wikibugs>	 (CR) Cathal Mooney: [C: +1] "LGTM." [dns] - https://gerrit.wikimedia.org/r/987782 (https://phabricator.wikimedia.org/T346779) (owner: Ayounsi)
[14:40:14] <logmsgbot>	 !log tchanders@deploy2002 Finished scap: Backport for [[gerrit:987726|Attempt to send original file to PhotoDNA if no thumbnail (T353854)]] (duration: 09m 25s)
[14:40:18] <stashbot>	 T353854: Attempt to send original file to PhotoDNA for the scan if thumbnail fails - https://phabricator.wikimedia.org/T353854
[14:40:22] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987778 (owner: Muehlenhoff)
[14:41:00] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by tchanders@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/984810 (owner: Peter Fischer)
[14:41:31] <wikibugs>	 (Abandoned) Phuedx: ext-EventStreamConfig: Add eventlogging_MediaWikiPingback stream config [mediawiki-config] - https://gerrit.wikimedia.org/r/981446 (https://phabricator.wikimedia.org/T323828) (owner: Phuedx)
[14:41:48] <wikibugs>	 (Merged) jenkins-bot: enable page_rerender for 2nd batch: dewiki, frwiktionary, and kuwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/984810 (owner: Peter Fischer)
[14:42:13] <logmsgbot>	 !log tchanders@deploy2002 Started scap: Backport for [[gerrit:984810|enable page_rerender for 2nd batch: dewiki, frwiktionary, and kuwiktionary]]
[14:42:42] <wikibugs>	 SRE, Infrastructure-Foundations, netops: Cannot enter configuration mode on cr2-drmrs - https://phabricator.wikimedia.org/T354340 (cmooney) Open→Resolved Problem has resolved following device reboot.  It looked like killing the mgd processes in "lockf" state was working, but I made an error a...
[14:45:17] <wikibugs>	 (PS1) Peter Fischer: Search update pipeline: 2nd batch page_rerender [deployment-charts] - https://gerrit.wikimedia.org/r/987783 (https://phabricator.wikimedia.org/T351503)
[14:45:27] <Dreamy_Jazz>	 Thanks for doing my backports Tchanders!
[14:45:48] <logmsgbot>	 !log tchanders@deploy2002 pfischer and tchanders: Backport for [[gerrit:984810|enable page_rerender for 2nd batch: dewiki, frwiktionary, and kuwiktionary]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[14:46:01] <wikibugs>	 (CR) Peter Fischer: "MW config change is already on its way…" [deployment-charts] - https://gerrit.wikimedia.org/r/987783 (https://phabricator.wikimedia.org/T351503) (owner: Peter Fischer)
[14:46:14] <Tchanders>	 pfischer, how does that look?
[14:46:21] <wikibugs>	 (CR) Dzahn: ""values-landing-page.yaml" - wouldn't that be "values-design-landing-page.yaml" to match the other 2 yaml file names?" [deployment-charts] - https://gerrit.wikimedia.org/r/987758 (https://phabricator.wikimedia.org/T350791) (owner: Jelto)
[14:46:26] <pfischer>	 Tchanders: just a sec.
[14:46:28] <wikibugs>	 (PS1) Aklapper: phabricator: quarterly_metrics.sh: Correct Year output [puppet] - https://gerrit.wikimedia.org/r/987784
[14:47:21] <wikibugs>	 (PS2) Jelto: miscweb: add design.wikimedia.org services [deployment-charts] - https://gerrit.wikimedia.org/r/987758 (https://phabricator.wikimedia.org/T350791)
[14:48:08] <Tchanders>	 Dreamy_Jazz, no problem!
[14:49:39] <wikibugs>	 (CR) Jelto: miscweb: add design.wikimedia.org services (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/987758 (https://phabricator.wikimedia.org/T350791) (owner: Jelto)
[14:51:38] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987781 (owner: Muehlenhoff)
[14:53:39] <pfischer>	 Tchanders: it’s alright, we can continue
[14:54:09] <Tchanders>	 pfischer, OK
[14:54:13] <logmsgbot>	 !log tchanders@deploy2002 pfischer and tchanders: Continuing with sync
[14:55:48] <wikibugs>	 (CR) Milimetric: [C: +1] webrequest varnishkafka - Add to X-Analytics the Sec-Purpose HTTP header [puppet] - https://gerrit.wikimedia.org/r/981352 (https://phabricator.wikimedia.org/T346463) (owner: Ottomata)
[14:56:08] <logmsgbot>	 !log volans@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on mw1378.eqiad.wmnet with reason: WIP hosts to be setup
[14:56:22] <logmsgbot>	 !log volans@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1378.eqiad.wmnet with reason: WIP hosts to be setup
[14:56:41] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/987779 (owner: Muehlenhoff)
[14:57:10] <jinxer-wm>	 (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:57:21] <wikibugs>	 (CR) Btullis: "Looks good." [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: Brouberol)
[14:57:50] <wikibugs>	 (CR) DCausse: [C: +1] Search update pipeline: 2nd batch page_rerender [deployment-charts] - https://gerrit.wikimedia.org/r/987783 (https://phabricator.wikimedia.org/T351503) (owner: Peter Fischer)
[14:59:38] <volans>	 !log rebooting mw1378 (downtimed and depooled) to debug reboot issues after reimage - T351074
[14:59:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:42] <stashbot>	 T351074: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074
[15:00:08] <logmsgbot>	 !log tchanders@deploy2002 Finished scap: Backport for [[gerrit:984810|enable page_rerender for 2nd batch: dewiki, frwiktionary, and kuwiktionary]] (duration: 17m 55s)
[15:00:55] <wikibugs>	 (CR) Ayounsi: [C: +2] Depool esams for cr1 maintenance [dns] - https://gerrit.wikimedia.org/r/987782 (https://phabricator.wikimedia.org/T346779) (owner: Ayounsi)
[15:00:57] <wikibugs>	 (CR) Peter Fischer: [C: +2] Search update pipeline: 2nd batch page_rerender [deployment-charts] - https://gerrit.wikimedia.org/r/987783 (https://phabricator.wikimedia.org/T351503) (owner: Peter Fischer)
[15:01:28] <XioNoX>	 !log depool esams for router work - T346779
[15:01:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:31] <stashbot>	 T346779: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779
[15:02:17] <wikibugs>	 (Merged) jenkins-bot: Search update pipeline: 2nd batch page_rerender [deployment-charts] - https://gerrit.wikimedia.org/r/987783 (https://phabricator.wikimedia.org/T351503) (owner: Peter Fischer)
[15:02:25] <wikibugs>	 (PS1) Ayounsi: Repool esams after maintenance [dns] - https://gerrit.wikimedia.org/r/987732 (https://phabricator.wikimedia.org/T346779)
[15:04:43] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:04:44] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:05:02] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:05:03] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:07:16] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:07:22] <wikibugs>	 (PS9) Brouberol: spark3: enable event logging and history server integration for all spark jobs [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849)
[15:07:31] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:07:48] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:07:58] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:08:06] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:08:11] <volans>	 !log rebooting mw1378 (downtimed and depooled) to debug reboot issues after reimage - T351074
[15:08:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:15] <stashbot>	 T351074: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074
[15:08:18] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:08:27] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:08:53] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:12:16] <moritzm>	 !log installing curl security updates
[15:12:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:36] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:13:44] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:13:59] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:14:08] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for WBrown (WMF) - https://phabricator.wikimedia.org/T353735 (Dreamy_Jazz) Are there any updates on this request?
[15:14:37] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:15:32] <wikibugs>	 (PS1) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[15:15:34] <wikibugs>	 (PS1) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[15:16:09] <XioNoX>	 !log drain esams-eqiad transport - T346779
[15:16:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:16] <stashbot>	 T346779: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779
[15:17:17] <wikibugs>	 (CR) Dzahn: [C: +1] "lgtm, afaict:)" [deployment-charts] - https://gerrit.wikimedia.org/r/987758 (https://phabricator.wikimedia.org/T350791) (owner: Jelto)
[15:18:35] <wikibugs>	 (CR) Btullis: spark3: enable event logging and history server integration for all spark jobs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: Brouberol)
[15:19:06] <wikibugs>	 (CR) Dzahn: "the intent of this change is to remove one blocker for "contint on bullseye". with an "eq buster" the php74 APT repo won't be installed on" [puppet] - https://gerrit.wikimedia.org/r/987458 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[15:19:16] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.hosts.provision for host mw1378.mgmt.eqiad.wmnet with reboot policy GRACEFUL
[15:19:54] <wikibugs>	 (PS4) Dzahn: contint: use php7.4 on bullseye just like on buster [puppet] - https://gerrit.wikimedia.org/r/987458 (https://phabricator.wikimedia.org/T334517)
[15:19:54] <volans>	 !log running sre.hosts.provision for mw1378 - T351074
[15:19:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:04] <stashbot>	 T351074: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074
[15:21:50] <wikibugs>	 (CR) Dzahn: "the intent is to keep using the wmf php packages, so that there is no change for CI besides the rest of the OS upgrade" [puppet] - https://gerrit.wikimedia.org/r/987458 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[15:26:06] <wikibugs>	 SRE, ops-eqiad, Cassandra, DC-Ops: hw troubleshooting: SSD failure (/dev/sd3) for aqs1013.eqiad.wmnet - https://phabricator.wikimedia.org/T354200 (Eevans)
[15:26:55] <XioNoX>	 !log disable peering/transit on cr1-esams for linecard reboot - T346779
[15:26:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:26:58] <stashbot>	 T346779: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779
[15:29:23] <logmsgbot>	 !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1378.mgmt.eqiad.wmnet with reboot policy GRACEFUL
[15:30:22] <XioNoX>	 !log reboot fpc0 on cr1-esams - T346779
[15:30:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:29] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 211, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:34:29] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 185.15.59.129, interfaces up: 59, down: 5, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:34:44] <XioNoX>	 expected ^
[15:34:51] <icinga-wm>	 PROBLEM - BFD status on cr2-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[15:35:25] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1378.eqiad.wmnet
[15:36:03] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 212, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:36:05] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 185.15.59.129, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:36:25] <icinga-wm>	 RECOVERY - BFD status on cr2-eqiad is OK: UP: 19 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[15:37:10] <XioNoX>	 !log re-enable peering/transit on cr1-esams - T346779
[15:37:11] <icinga-wm>	 PROBLEM - IPv4 ping to esams on ripe-atlas-esams is CRITICAL: CRITICAL - failed 292 probes of 794 (alerts on 35) - https://atlas.ripe.net/measurements/59935536/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:37:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:37:26] <stashbot>	 T346779: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779
[15:37:37] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 320 probes of 728 (alerts on 90) - https://atlas.ripe.net/measurements/59935539/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:38:57] <XioNoX>	 !log undrain esams-eqiad transport - T346779
[15:38:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:06] <wikibugs>	 (CR) Brouberol: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: Brouberol)
[15:42:50] <wikibugs>	 (PS2) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[15:42:52] <wikibugs>	 (PS2) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[15:43:11] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 75 probes of 728 (alerts on 90) - https://atlas.ripe.net/measurements/59935539/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:43:33] <wikibugs>	 (CR) CI reject: [V: -1] Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166) (owner: Btullis)
[15:43:39] <wikibugs>	 (CR) CI reject: [V: -1] Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790) (owner: Btullis)
[15:46:04] <logmsgbot>	 !log volans@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw1378.eqiad.wmnet
[15:46:28] <wikibugs>	 (CR) Ayounsi: [C: +2] Repool esams after maintenance [dns] - https://gerrit.wikimedia.org/r/987732 (https://phabricator.wikimedia.org/T346779) (owner: Ayounsi)
[15:47:58] <XioNoX>	 !log repool esams - T346779
[15:48:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:02] <stashbot>	 T346779: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779
[15:48:19] <icinga-wm>	 RECOVERY - IPv4 ping to esams on ripe-atlas-esams is OK: OK - failed 8 probes of 794 (alerts on 35) - https://atlas.ripe.net/measurements/59935536/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:51:16] <moritzm>	 !log rolling restart of FPM/apache on mw canaries to pick up curl updates
[15:51:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:52:16] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: cr1-esams:fpc0 errors - https://phabricator.wikimedia.org/T346779 (ayounsi) a:ayounsi Error logs stopped showing up after the linecard reboot. Monitoring it for a bit before closing the task.
[15:53:32] <wikibugs>	 (PS1) Filippo Giunchedi: prometheus: validate check Prometheus instance [puppet] - https://gerrit.wikimedia.org/r/987789
[15:58:42] <moritzm>	 !log installing libdatetime-timezone-perl updates
[15:58:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:21] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs admin scripts: run everything through Black [puppet] - https://gerrit.wikimedia.org/r/987465 (owner: Andrew Bogott)
[15:59:32] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1378.eqiad.wmnet
[16:00:03] <logmsgbot>	 !log volans@cumin2002 END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1378.eqiad.wmnet
[16:12:42] <wikibugs>	 (PS1) Muehlenhoff: Switch role::test to nftables [puppet] - https://gerrit.wikimedia.org/r/987791
[16:17:29] <wikibugs>	 (PS10) Brouberol: spark3: enable event logging and history server integration for all spark jobs [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849)
[16:20:08] <wikibugs>	 (CR) Brouberol: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: Brouberol)
[16:22:07] <wikibugs>	 (PS3) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[16:22:09] <wikibugs>	 (PS3) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[16:25:43] <wikibugs>	 (CR) Brouberol: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: Brouberol)
[16:25:56] <logmsgbot>	 !log volans@cumin2002 START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1378.eqiad.wmnet
[16:26:50] <wikibugs>	 (PS11) Brouberol: spark3: enable event logging and history server integration for all spark jobs [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849)
[16:28:17] <wikibugs>	 (CR) Brouberol: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/984132 (https://phabricator.wikimedia.org/T352849) (owner: Brouberol)
[16:35:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:35:57] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] cli: Fix IRC logging [software/conftool] - https://gerrit.wikimedia.org/r/987167 (https://phabricator.wikimedia.org/T354209) (owner: Majavah)
[16:36:02] <logmsgbot>	 !log volans@cumin2002 END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw1378.eqiad.wmnet
[16:39:00] <wikibugs>	 (Merged) jenkins-bot: cli: Fix IRC logging [software/conftool] - https://gerrit.wikimedia.org/r/987167 (https://phabricator.wikimedia.org/T354209) (owner: Majavah)
[16:39:59] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] tox: show black diff on failure [software/conftool] - https://gerrit.wikimedia.org/r/987170 (owner: Majavah)
[16:41:10] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[16:41:25] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[16:41:58] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[16:42:29] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[16:42:52] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[16:42:56] <wikibugs>	 (Merged) jenkins-bot: tox: show black diff on failure [software/conftool] - https://gerrit.wikimedia.org/r/987170 (owner: Majavah)
[16:43:12] <logmsgbot>	 !log pfischer@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[16:45:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:50:14] <wikibugs>	 (PS1) Giuseppe Lavagetto: requestctl: ensure no irc logging happens [software/conftool] - https://gerrit.wikimedia.org/r/987792 (https://phabricator.wikimedia.org/T354209)
[16:50:16] <wikibugs>	 (PS1) Giuseppe Lavagetto: Release 2.3.3 [software/conftool] - https://gerrit.wikimedia.org/r/987793
[16:51:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) firing: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:53:28] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] requestctl: ensure no irc logging happens [software/conftool] - https://gerrit.wikimedia.org/r/987792 (https://phabricator.wikimedia.org/T354209) (owner: Giuseppe Lavagetto)
[16:56:09] <jinxer-wm>	 (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad mw-api-int (k8s) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-int - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[16:56:33] <wikibugs>	 (Merged) jenkins-bot: requestctl: ensure no irc logging happens [software/conftool] - https://gerrit.wikimedia.org/r/987792 (https://phabricator.wikimedia.org/T354209) (owner: Giuseppe Lavagetto)
[16:57:16] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Release 2.3.3 [software/conftool] - https://gerrit.wikimedia.org/r/987793 (owner: Giuseppe Lavagetto)
[16:58:04] <taavi>	 thanks _joe_ 
[17:00:05] <jouncebot>	 jhathaway and rzl: Time to do the Puppet request window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:00:11] <_joe_>	 taavi: I'm in the process of building it
[17:03:03] <wikibugs>	 (PS4) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[17:03:05] <wikibugs>	 (PS4) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[17:03:52] <wikibugs>	 (CR) CI reject: [V: -1] Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790) (owner: Btullis)
[17:03:54] <wikibugs>	 (CR) CI reject: [V: -1] Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166) (owner: Btullis)
[17:09:05] <icinga-wm>	 RECOVERY - Host mw1377 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[17:10:36] <logmsgbot>	 !log oblivian@puppetmaster2001 conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=mw1377.*
[17:10:46] <_joe_>	 taavi: ^^
[17:10:48] <_joe_>	 :)
[17:12:28] <wikibugs>	 (PS5) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[17:12:30] <wikibugs>	 (PS5) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[17:13:08] <wikibugs>	 (CR) CI reject: [V: -1] Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166) (owner: Btullis)
[17:13:12] <wikibugs>	 (CR) CI reject: [V: -1] Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790) (owner: Btullis)
[17:14:49] <icinga-wm>	 PROBLEM - Host mw1377 is DOWN: PING CRITICAL - Packet loss = 100%
[17:16:07] <icinga-wm>	 PROBLEM - Host mw1380 is DOWN: PING CRITICAL - Packet loss = 100%
[17:16:13] <icinga-wm>	 PROBLEM - Host mw1379 is DOWN: PING CRITICAL - Packet loss = 100%
[17:16:29] <icinga-wm>	 RECOVERY - Host mw1380 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[17:16:43] <icinga-wm>	 RECOVERY - Host mw1379 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[17:16:49] <wikibugs>	 (PS6) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[17:16:51] <wikibugs>	 (PS6) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[17:17:05] <icinga-wm>	 RECOVERY - Host mw1377 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[17:17:29] <wikibugs>	 (CR) CI reject: [V: -1] Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166) (owner: Btullis)
[17:17:34] <wikibugs>	 (CR) CI reject: [V: -1] Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790) (owner: Btullis)
[17:22:44] <wikibugs>	 (PS7) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[17:22:46] <wikibugs>	 (PS7) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[17:24:32] <wikibugs>	 (CR) CI reject: [V: -1] Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166) (owner: Btullis)
[17:24:34] <wikibugs>	 (CR) CI reject: [V: -1] Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790) (owner: Btullis)
[17:24:49] <wikibugs>	 (PS1) Kamila Součková: mw1377: change role to insetup for debugging [puppet] - https://gerrit.wikimedia.org/r/987797 (https://phabricator.wikimedia.org/T351074)
[17:25:55] <wikibugs>	 (CR) Kamila Součková: [C: +2] mw1377: change role to insetup for debugging [puppet] - https://gerrit.wikimedia.org/r/987797 (https://phabricator.wikimedia.org/T351074) (owner: Kamila Součková)
[17:26:29] <wikibugs>	 (PS1) RobH: updated for energy star sku [software] - https://gerrit.wikimedia.org/r/987798
[17:28:46] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.reimage for host mw1377.eqiad.wmnet with OS bullseye
[17:30:25] <wikibugs>	 (CR) RobH: [C: +2] updated for energy star sku [software] - https://gerrit.wikimedia.org/r/987798 (owner: RobH)
[17:30:58] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) firing: wdqs1019:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[17:31:25] <wikibugs>	 (PS1) Andrew Bogott: mwopenstackclients: allow passing the 'edit-managed' flag to designate [puppet] - https://gerrit.wikimedia.org/r/987799 (https://phabricator.wikimedia.org/T354365)
[17:31:27] <wikibugs>	 (PS1) Andrew Bogott: wmcs-novastats-dnsleaks.py: use upstream designate clients [puppet] - https://gerrit.wikimedia.org/r/987800 (https://phabricator.wikimedia.org/T354365)
[17:34:59] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[17:35:15] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[17:42:42] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[17:42:57] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[17:43:42] <logmsgbot>	 !log kamila@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1377.eqiad.wmnet with reason: host reimage
[17:46:36] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1377.eqiad.wmnet with reason: host reimage
[17:52:56] <wikibugs>	 (CR) Xcollazo: [C: +1] "Fair enough." [puppet] - https://gerrit.wikimedia.org/r/986181 (owner: Ladsgroup)
[17:54:09] <jinxer-wm>	 (MXQueueHigh) firing: MX host mx2001:9100 has many queued messages: 4843 #page - https://wikitech.wikimedia.org/wiki/Exim - https://grafana.wikimedia.org/d/000000451/mail - https://alerts.wikimedia.org/?q=alertname%3DMXQueueHigh
[17:55:01] <sukhe>	 looking
[17:55:07] * Emperor summoned by phone
[17:55:28] <jhathaway>	 taking a look as well
[17:56:39] <Emperor>	 looks like a lot of mails to one gmail user (and thus gmail is rate-limiting)
[17:57:00] <sukhe>	 see -security
[17:57:53] <sukhe>	 !log mx2001: exiqgrep -i -r w*@gmail.com | xargs exim -Mrm
[17:57:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:05] <jouncebot>	 bd808: OwO what's this, a deployment window?? Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1800). nyaa~
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1800)
[18:01:40] <bd808>	 nothing for me today
[18:02:30] <wikibugs>	 (PS8) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[18:02:32] <wikibugs>	 (PS8) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[18:03:12] <wikibugs>	 (CR) CI reject: [V: -1] Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166) (owner: Btullis)
[18:03:17] <logmsgbot>	 !log kamila@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1377.eqiad.wmnet with OS bullseye
[18:03:20] <wikibugs>	 (CR) CI reject: [V: -1] Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790) (owner: Btullis)
[18:04:09] <jinxer-wm>	 (MXQueueHigh) resolved: MX host mx2001:9100 has many queued messages: 4902 #page - https://wikitech.wikimedia.org/wiki/Exim - https://grafana.wikimedia.org/d/000000451/mail - https://alerts.wikimedia.org/?q=alertname%3DMXQueueHigh
[18:22:40] <wikibugs>	 (PS9) Btullis: Add a deployment chart for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987785 (https://phabricator.wikimedia.org/T352166)
[18:22:42] <wikibugs>	 (PS9) Btullis: Add helmfile deployments for Superset [deployment-charts] - https://gerrit.wikimedia.org/r/987786 (https://phabricator.wikimedia.org/T353790)
[18:27:58] <wikibugs>	 (PS1) Dreamy Jazz: Check for invalid JSON on a good response from PhotoDNA [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987734 (https://phabricator.wikimedia.org/T354370)
[18:28:10] <wikibugs>	 (PS1) Dreamy Jazz: Check for invalid JSON on a good response from PhotoDNA [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/987735 (https://phabricator.wikimedia.org/T354370)
[18:42:10] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:46:54] <sukhe>	 !log [second time] mx2001: exiqgrep -i -r w*@gmail.com | xargs exim -Mrm
[18:46:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:10] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:00:05] <jouncebot>	 dduvall and dancy: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-7 Version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T1900).
[19:04:55] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[19:04:59] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[19:06:15] <dduvall>	 dancy, Jdlrobson: o/
[19:06:31] <dancy>	 o/
[19:06:38] <dduvall>	 let's start with the backport for https://phabricator.wikimedia.org/T353850 
[19:07:22] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by dduvall@deploy2002 using scap backport" [extensions/UniversalLanguageSelector] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987473 (https://phabricator.wikimedia.org/T353850) (owner: Jdlrobson)
[19:11:26] <dduvall>	 random thought: scap backport could probably query gerrit for the check (ci job) progress and display it
[19:11:58] <dancy>	 It could definitely do that.
[19:13:16] <wikibugs>	 (CR) Dzahn: [C: +2] releases: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987436 (owner: Muehlenhoff)
[19:13:16] <jinxer-wm>	 (KubernetesRsyslogDown) firing: rsyslog on mw1381:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1381 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:18:16] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (6) rsyslog on mw1378:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:22:45] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[19:22:53] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[19:24:17] <wikibugs>	 (Merged) jenkins-bot: Revise logic for creating compact links button on Vector 2022 [extensions/UniversalLanguageSelector] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987473 (https://phabricator.wikimedia.org/T353850) (owner: Jdlrobson)
[19:24:40] <logmsgbot>	 !log dduvall@deploy2002 Started scap: Backport for [[gerrit:987473|Revise logic for creating compact links button on Vector 2022 (T353850)]]
[19:24:44] <stashbot>	 T353850: UniversalLanguageSelector compact links and @wikimedia/codex loads on page load - https://phabricator.wikimedia.org/T353850
[19:25:02] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[19:25:10] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[19:26:12] <logmsgbot>	 !log dduvall@deploy2002 jdlrobson and dduvall: Backport for [[gerrit:987473|Revise logic for creating compact links button on Vector 2022 (T353850)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[19:26:42] <logmsgbot>	 !log dduvall@deploy2002 jdlrobson and dduvall: Continuing with sync
[19:26:58] <dduvall>	 continuing the sync since this is only to testwikis
[19:28:49] <Jdlrobson>	 dduvall: let me know when synced and i can test!
[19:29:15] <dduvall>	 Jdlrobson: 👍
[19:30:10] <wikibugs>	 (CR) Dzahn: [C: +2] "private.pp and security.pp are affecting deployment servers, common.pp affects the actual releases server(s)" [puppet] - https://gerrit.wikimedia.org/r/987436 (owner: Muehlenhoff)
[19:32:39] <logmsgbot>	 !log dduvall@deploy2002 Finished scap: Backport for [[gerrit:987473|Revise logic for creating compact links button on Vector 2022 (T353850)]] (duration: 07m 58s)
[19:32:43] <stashbot>	 T353850: UniversalLanguageSelector compact links and @wikimedia/codex loads on page load - https://phabricator.wikimedia.org/T353850
[19:32:49] <dduvall>	 Jdlrobson: ^
[19:32:54] <wikibugs>	 (PS1) Ebernhardson: cirrus-updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/987812
[19:37:23] <Jdlrobson>	 dduvall: it worked \o/
[19:40:18] <wikibugs>	 (CR) Ebernhardson: [C: +2] cirrus-updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/987812 (owner: Ebernhardson)
[19:41:09] <wikibugs>	 (Merged) jenkins-bot: cirrus-updater: Update container image [deployment-charts] - https://gerrit.wikimedia.org/r/987812 (owner: Ebernhardson)
[19:41:27] <dduvall>	 Jdlrobson: yay. thanks so much for the fix
[19:41:34] <dduvall>	 rolling group0
[19:41:48] <wikibugs>	 (PS1) TrainBranchBot: group0 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987813 (https://phabricator.wikimedia.org/T350088)
[19:41:50] <wikibugs>	 (CR) TrainBranchBot: [C: +2] group0 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987813 (https://phabricator.wikimedia.org/T350088) (owner: TrainBranchBot)
[19:42:34] <wikibugs>	 (Merged) jenkins-bot: group0 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987813 (https://phabricator.wikimedia.org/T350088) (owner: TrainBranchBot)
[19:47:08] <wikibugs>	 (CR) Ryan Kemper: [C: +2] elastic: test out elastic2087 puppet 7 [puppet] - https://gerrit.wikimedia.org/r/987189 (owner: Ryan Kemper)
[19:47:12] <wikibugs>	 (CR) Ryan Kemper: [C: +2] elastic: prepare new hosts [puppet] - https://gerrit.wikimedia.org/r/987188 (https://phabricator.wikimedia.org/T353878) (owner: Ryan Kemper)
[19:47:24] <mutante>	 !log deploy1002 - systemctl start rsync-patches_module after gerrit:987436
[19:47:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:50] <logmsgbot>	 !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.12  refs T350088
[19:49:54] <stashbot>	 T350088: 1.42.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T350088
[19:51:51] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[19:52:02] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[19:53:45] <dduvall>	 i'll wait until the hour and then roll group1
[19:55:59] <jinxer-wm>	 (RdfStreamingUpdaterHighConsumerUpdateLag) resolved: wdqs1019:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[19:57:11] <dcausse>	 !log repooling wdqs1019
[19:57:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:07] <mutante>	 !log releases1003 - systemctl start rsync-srv-patches-releases-primary after gerrit:987436
[19:59:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:47] <brett>	 !log restarting pybal on lvs5006 for testing purposes - T353760
[19:59:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:59:54] <stashbot>	 T353760: pybal_monitor_down_results_total metric only created when PyBal goes down - https://phabricator.wikimedia.org/T353760
[20:01:03] <mutante>	 !log releases2003 - systemctl start rsync-srv-patches-releases2003.codfw.wmnet after gerrit:987436
[20:01:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:39] <mutante>	 !log releases2003 - systemctl status rsync-srv-org-wikimedia-releases-releases2003.codfw.wmnet after gerrit:987436
[20:03:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:04:18] <wikibugs>	 (CR) Dzahn: [C: +2] "did some tests:" [puppet] - https://gerrit.wikimedia.org/r/987436 (owner: Muehlenhoff)
[20:04:41] <wikibugs>	 (PS1) TrainBranchBot: group1 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987816 (https://phabricator.wikimedia.org/T350088)
[20:04:43] <wikibugs>	 (CR) TrainBranchBot: [C: +2] group1 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987816 (https://phabricator.wikimedia.org/T350088) (owner: TrainBranchBot)
[20:06:05] <wikibugs>	 (Merged) jenkins-bot: group1 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987816 (https://phabricator.wikimedia.org/T350088) (owner: TrainBranchBot)
[20:07:11] <jinxer-wm>	 (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[20:08:47] <wikibugs>	 (PS4) Eevans: restbase: set production role and add config for restbase2034 [puppet] - https://gerrit.wikimedia.org/r/981608 (https://phabricator.wikimedia.org/T352468)
[20:08:48] <wikibugs>	 (PS4) Eevans: restbase: set production role and add config for restbase2035 [puppet] - https://gerrit.wikimedia.org/r/981609 (https://phabricator.wikimedia.org/T352468)
[20:13:56] <logmsgbot>	 !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.12  refs T350088
[20:14:07] <stashbot>	 T350088: 1.42.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T350088
[20:16:11] <wikibugs>	 (CR) Dzahn: [C: +2] mwmaint: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987778 (owner: Muehlenhoff)
[20:20:06] <logmsgbot>	 !log dduvall@deploy2002 Synchronized php: group1 wikis to 1.42.0-wmf.12  refs T350088 (duration: 06m 09s)
[20:20:13] <stashbot>	 T350088: 1.42.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T350088
[20:20:32] <wikibugs>	 (PS1) Dreamy Jazz: Ensure all non-okay statuses from ::getImageContents have a message [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/987737 (https://phabricator.wikimedia.org/T354374)
[20:21:07] <wikibugs>	 (PS1) Dreamy Jazz: Ensure all non-okay statuses from ::getImageContents have a message [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987738 (https://phabricator.wikimedia.org/T354374)
[20:21:35] <wikibugs>	 (CR) Dr0ptp4kt: "Hopefully quick question." [puppet] - https://gerrit.wikimedia.org/r/981352 (https://phabricator.wikimedia.org/T346463) (owner: Ottomata)
[20:27:10] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:30:05] <mutante>	 !log mwmaint2002 - /usr/local/sbin/sync-home-mwmaint after gerrit:987778
[20:30:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:28] <wikibugs>	 (CR) Dzahn: [C: +2] "this unified rules into a single file /etc/ferm/conf.d/10_rsyncd_access_home_mwmaint on mwmaint1002. mwmaint2002 is allowed to connect to " [puppet] - https://gerrit.wikimedia.org/r/987778 (owner: Muehlenhoff)
[20:31:05] <wikibugs>	 (PS1) TrainBranchBot: group2 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987819 (https://phabricator.wikimedia.org/T350088)
[20:31:07] <wikibugs>	 (CR) TrainBranchBot: [C: +2] group2 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987819 (https://phabricator.wikimedia.org/T350088) (owner: TrainBranchBot)
[20:31:10] <wikibugs>	 (CR) Dzahn: [C: +1] deployment servers: Switch rsync service to use firewall::service [puppet] - https://gerrit.wikimedia.org/r/987779 (owner: Muehlenhoff)
[20:31:52] <wikibugs>	 (Merged) jenkins-bot: group2 wikis to 1.42.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/987819 (https://phabricator.wikimedia.org/T350088) (owner: TrainBranchBot)
[20:32:10] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[20:32:38] <wikibugs>	 (CR) Eevans: [C: +2] restbase: set production role and add config for restbase2034 [puppet] - https://gerrit.wikimedia.org/r/981608 (https://phabricator.wikimedia.org/T352468) (owner: Eevans)
[20:34:07] <wikibugs>	 (CR) Dzahn: [C: +2] "no automatic sync / timer here. since I started the sync manually it's now copying many gigabytes from /home/ladsgroup/moveToExternal/  an" [puppet] - https://gerrit.wikimedia.org/r/987778 (owner: Muehlenhoff)
[20:39:05] <logmsgbot>	 !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.12  refs T350088
[20:39:08] <stashbot>	 T350088: 1.42.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T350088
[20:41:11] <ryankemper>	 !log [apifeatureusage] T350703 Restarted `logstash` on `apifeatureusage[1,2]001`
[20:41:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:41:15] <stashbot>	 T350703: Restart Search Platform-owned services for Java 8 / Java 11 security updates - https://phabricator.wikimedia.org/T350703
[21:00:04] <jouncebot>	 brennen and TheresNoTime: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for UTC late backport and config training deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240104T2100).
[21:00:04] <jouncebot>	 Dreamy_Jazz: A patch you scheduled for UTC late backport and config training is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:11] <Dreamy_Jazz>	 \o
[21:00:58] <Dreamy_Jazz>	 My backports cannot be tested as they backport code only used by a maintenance script that is run manually. The script will be tested once the scan is re-started on testwiki either later today or tomorrow.
[21:01:18] <brennen>	 o/
[21:02:09] <brennen>	 Dreamy_Jazz: we're canceling the regular backport training sessions at this time, but i'm around atm and can deploy.
[21:02:22] <Dreamy_Jazz>	 Sure. Thanks.
[21:04:20] <brennen>	 i think we can skip the .10 backports here?  looks like train has reached all wikis.
[21:04:33] <wikibugs>	 (CR) Dzahn: [C: -1] "compiler shows only on gerrit1003 the rsync service and config gets installed, while all other things happen on both servers." [puppet] - https://gerrit.wikimedia.org/r/987135 (https://phabricator.wikimedia.org/T257741) (owner: EoghanGaffney)
[21:05:16] <Dreamy_Jazz>	 We can if you prefer
[21:05:44] <Dreamy_Jazz>	 I had made them just in case the train rolls back to wmf.10, but don't need to do them necessarily.
[21:06:10] <Dreamy_Jazz>	 I'll go ahead and abandon the wmf.10 backports.
[21:06:16] <wikibugs>	 (CR) TrainBranchBot: [C: +2] "Approved by brennen@deploy2002 using scap backport" [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987734 (https://phabricator.wikimedia.org/T354370) (owner: Dreamy Jazz)
[21:06:44] <wikibugs>	 (Abandoned) Dreamy Jazz: Check for invalid JSON on a good response from PhotoDNA [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/987735 (https://phabricator.wikimedia.org/T354370) (owner: Dreamy Jazz)
[21:06:51] <wikibugs>	 (Abandoned) Dreamy Jazz: Ensure all non-okay statuses from ::getImageContents have a message [extensions/MediaModeration] (wmf/1.42.0-wmf.10) - https://gerrit.wikimedia.org/r/987737 (https://phabricator.wikimedia.org/T354374) (owner: Dreamy Jazz)
[21:08:37] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for WBrown (WMF) - https://phabricator.wikimedia.org/T353735 (thcipriani) Approved! Sorry for the delay!
[21:08:39] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] Ensure all non-okay statuses from ::getImageContents have a message [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987738 (https://phabricator.wikimedia.org/T354374) (owner: Dreamy Jazz)
[21:08:45] <wikibugs>	 (CR) Dzahn: [C: -1] "btw: cool how you fixed the previous CI issue! didn't know" [puppet] - https://gerrit.wikimedia.org/r/987135 (https://phabricator.wikimedia.org/T257741) (owner: EoghanGaffney)
[21:08:51] <wikibugs>	 (Merged) jenkins-bot: Check for invalid JSON on a good response from PhotoDNA [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987734 (https://phabricator.wikimedia.org/T354370) (owner: Dreamy Jazz)
[21:09:08] <logmsgbot>	 !log brennen@deploy2002 Started scap: Backport for [[gerrit:987734|Check for invalid JSON on a good response from PhotoDNA (T354370)]]
[21:09:12] <stashbot>	 T354370: Argument 1 passed to MediaWiki\Extension\MediaModeration\PhotoDNA\Response::newFromArray() must be of the type array, null given - https://phabricator.wikimedia.org/T354370
[21:10:18] <wikibugs>	 (CR) Dzahn: [C: -1] [gerrit] Add rsync job for lfs sync (1 comment) [puppet] - https://gerrit.wikimedia.org/r/987135 (https://phabricator.wikimedia.org/T257741) (owner: EoghanGaffney)
[21:10:43] <logmsgbot>	 !log brennen@deploy2002 brennen and dreamyjazz: Backport for [[gerrit:987734|Check for invalid JSON on a good response from PhotoDNA (T354370)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:10:57] <wikibugs>	 (Merged) jenkins-bot: Ensure all non-okay statuses from ::getImageContents have a message [extensions/MediaModeration] (wmf/1.42.0-wmf.12) - https://gerrit.wikimedia.org/r/987738 (https://phabricator.wikimedia.org/T354374) (owner: Dreamy Jazz)
[21:11:14] <logmsgbot>	 !log brennen@deploy2002 brennen and dreamyjazz: Continuing with sync
[21:11:40] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.192.48.234:9042 on restbase2034 is CRITICAL: connect to address 10.192.48.234 and port 9042: Connection refused https://phabricator.wikimedia.org/T93886
[21:12:40] <wikibugs>	 (CR) Dzahn: [C: +2] research-landing-page: bump version [deployment-charts] - https://gerrit.wikimedia.org/r/987707 (https://phabricator.wikimedia.org/T352583) (owner: DDesouza)
[21:17:06] <logmsgbot>	 !log brennen@deploy2002 Finished scap: Backport for [[gerrit:987734|Check for invalid JSON on a good response from PhotoDNA (T354370)]] (duration: 07m 57s)
[21:17:14] <stashbot>	 T354370: Argument 1 passed to MediaWiki\Extension\MediaModeration\PhotoDNA\Response::newFromArray() must be of the type array, null given - https://phabricator.wikimedia.org/T354370
[21:18:04] <logmsgbot>	 !log brennen@deploy2002 Started scap: Backport for [[gerrit:987738|Ensure all non-okay statuses from ::getImageContents have a message (T354374)]]
[21:18:08] <stashbot>	 T354374: Internal error: MediaWiki\Status\StatusFormatter::getWikiText: Invalid result object: no error text but not OK in output of scanning script - https://phabricator.wikimedia.org/T354374
[21:19:51] <logmsgbot>	 !log brennen@deploy2002 brennen and dreamyjazz: Backport for [[gerrit:987738|Ensure all non-okay statuses from ::getImageContents have a message (T354374)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[21:20:16] <brennen>	 Dreamy_Jazz: ok, going ahead with this last one
[21:20:19] <Dreamy_Jazz>	 Thanks!
[21:20:20] <logmsgbot>	 !log brennen@deploy2002 brennen and dreamyjazz: Continuing with sync
[21:22:57] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to deployment for WBrown (WMF) - https://phabricator.wikimedia.org/T353735 (Dreamy_Jazz) >>! In T353735#9436330, @thcipriani wrote: > Approved! Sorry for the delay! Thanks!
[21:26:06] <logmsgbot>	 !log brennen@deploy2002 Finished scap: Backport for [[gerrit:987738|Ensure all non-okay statuses from ::getImageContents have a message (T354374)]] (duration: 08m 01s)
[21:26:10] <stashbot>	 T354374: Internal error: MediaWiki\Status\StatusFormatter::getWikiText: Invalid result object: no error text but not OK in output of scanning script - https://phabricator.wikimedia.org/T354374
[21:26:22] <wikibugs>	 (CR) Dzahn: contint: use php7.4 on bullseye just like on buster (5 comments) [puppet] - https://gerrit.wikimedia.org/r/987458 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[21:26:35] <Dreamy_Jazz>	 Thanks for deploying.
[21:27:05] <brennen>	 sure thing
[21:27:11] <brennen>	 !log end of utc late backport window
[21:27:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:37] <wikibugs>	 (CR) Dzahn: contint: use php7.4 on bullseye just like on buster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/987458 (https://phabricator.wikimedia.org/T334517) (owner: Dzahn)
[21:38:24] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[21:38:34] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:38:59] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on elastic2083:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[21:43:09] <wikibugs>	 (PS1) BCornwall: Add new release of wmf-debci images for building [docker-images/production-images] - https://gerrit.wikimedia.org/r/987837 (https://phabricator.wikimedia.org/T352003)
[21:45:42] <wikibugs>	 (PS2) BCornwall: Add new release of wmf-debci images for building [docker-images/production-images] - https://gerrit.wikimedia.org/r/987837 (https://phabricator.wikimedia.org/T352003)
[21:46:51] <wikibugs>	 (CR) BCornwall: [C: +1] Add new release of wmf-debci images for building [docker-images/production-images] - https://gerrit.wikimedia.org/r/987837 (https://phabricator.wikimedia.org/T352003) (owner: BCornwall)
[21:47:48] <wikibugs>	 (CR) BCornwall: [V: +2 C: +2] Add new release of wmf-debci images for building [docker-images/production-images] - https://gerrit.wikimedia.org/r/987837 (https://phabricator.wikimedia.org/T352003) (owner: BCornwall)
[21:51:39] <wikibugs>	 (PS1) RobH: updated config F skus [software] - https://gerrit.wikimedia.org/r/987841
[21:52:01] <wikibugs>	 (CR) RobH: [C: +2] updated config F skus [software] - https://gerrit.wikimedia.org/r/987841 (owner: RobH)
[21:52:33] <wikibugs>	 (Merged) jenkins-bot: updated config F skus [software] - https://gerrit.wikimedia.org/r/987841 (owner: RobH)
[22:00:22] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[22:00:26] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[22:03:37] <wikibugs>	 (PS1) RobH: updates for Config G [software] - https://gerrit.wikimedia.org/r/987843
[22:03:59] <wikibugs>	 (CR) RobH: [C: +2] updates for Config G [software] - https://gerrit.wikimedia.org/r/987843 (owner: RobH)
[22:04:30] <wikibugs>	 (Merged) jenkins-bot: updates for Config G [software] - https://gerrit.wikimedia.org/r/987843 (owner: RobH)
[22:21:09] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] START helmfile.d/services/miscweb: apply
[22:21:29] <logmsgbot>	 !log dani@deploy2002 helmfile [staging] DONE helmfile.d/services/miscweb: apply
[22:21:34] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[22:22:01] <logmsgbot>	 !log dani@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[22:22:08] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] START helmfile.d/services/miscweb: apply
[22:22:38] <logmsgbot>	 !log dani@deploy2002 helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
[22:24:06] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[22:24:14] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[22:25:30] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:25:39] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:27:11] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:29:11] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:29:14] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:29:47] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:29:58] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:31:31] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:31:40] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:32:11] <jinxer-wm>	 (JobUnavailable) resolved: (2) Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:33:35] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:33:49] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:33:53] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[22:34:26] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:36:36] <wikibugs>	 (PS2) BCornwall: slo_definitions: Use trafficserver_backend_sli_bad [grafana-grizzly] - https://gerrit.wikimedia.org/r/973872 (https://phabricator.wikimedia.org/T341606)
[22:36:53] <wikibugs>	 (PS1) RobH: R750xs sku updates [software] - https://gerrit.wikimedia.org/r/987845
[22:37:51] <wikibugs>	 (CR) BCornwall: "Hello again! We've accumulated more data and a preview of the dashboard can be found at https://grafana-rw.wikimedia.org/dashboard/snapsho" [grafana-grizzly] - https://gerrit.wikimedia.org/r/973872 (https://phabricator.wikimedia.org/T341606) (owner: BCornwall)
[22:38:28] <wikibugs>	 (CR) RobH: [C: +2] R750xs sku updates [software] - https://gerrit.wikimedia.org/r/987845 (owner: RobH)
[22:39:58] <wikibugs>	 (CR) BCornwall: "Hello again! We've accumulated more data and a preview of the dashboard can be found at https://grafana-rw.wikimedia.org/dashboard/snapsho" [grafana-grizzly] - https://gerrit.wikimedia.org/r/973871 (https://phabricator.wikimedia.org/T341606) (owner: BCornwall)
[22:48:17] <wikibugs>	 (PS2) Andrew Bogott: mwopenstackclients: allow passing the 'edit-managed' flag to designate [puppet] - https://gerrit.wikimedia.org/r/987799 (https://phabricator.wikimedia.org/T354365)
[22:48:19] <wikibugs>	 (PS2) Andrew Bogott: wmcs-novastats-dnsleaks.py: use upstream designate clients [puppet] - https://gerrit.wikimedia.org/r/987800 (https://phabricator.wikimedia.org/T354365)
[22:48:21] <wikibugs>	 (PS1) Andrew Bogott: Rename wmcs-novastats-dnsleaks to wmcs-dnsleaks [puppet] - https://gerrit.wikimedia.org/r/987851 (https://phabricator.wikimedia.org/T354365)
[22:48:23] <wikibugs>	 (PS1) Andrew Bogott: wmcs-dnsleaks.py: add the --to-prometheus flag [puppet] - https://gerrit.wikimedia.org/r/987852 (https://phabricator.wikimedia.org/T354365)
[22:48:25] <wikibugs>	 (PS1) Andrew Bogott: wmcs-dnsleaks: Add prometheus metric [puppet] - https://gerrit.wikimedia.org/r/987853 (https://phabricator.wikimedia.org/T354365)
[22:49:26] <wikibugs>	 (CR) CI reject: [V: -1] wmcs-dnsleaks.py: add the --to-prometheus flag [puppet] - https://gerrit.wikimedia.org/r/987852 (https://phabricator.wikimedia.org/T354365) (owner: Andrew Bogott)
[22:50:43] <wikibugs>	 (PS1) RobH: config I skus [software] - https://gerrit.wikimedia.org/r/987855
[22:50:51] <wikibugs>	 (CR) CI reject: [V: -1] config I skus [software] - https://gerrit.wikimedia.org/r/987855 (owner: RobH)
[22:57:46] <wikibugs>	 (PS1) RobH: update sku for g15 restbase [software] - https://gerrit.wikimedia.org/r/987857
[22:58:09] <wikibugs>	 (CR) CI reject: [V: -1] update sku for g15 restbase [software] - https://gerrit.wikimedia.org/r/987857 (owner: RobH)
[22:58:30] <wikibugs>	 (CR) RobH: [C: +2] config I skus [software] - https://gerrit.wikimedia.org/r/987855 (owner: RobH)
[22:58:51] <wikibugs>	 (CR) RobH: [V: +2 C: +2] config I skus [software] - https://gerrit.wikimedia.org/r/987855 (owner: RobH)
[22:59:13] <wikibugs>	 (PS1) Andrew Bogott: team-wmcs: alert when stray DNS records appear in designate. [alerts] - https://gerrit.wikimedia.org/r/987858 (https://phabricator.wikimedia.org/T354365)
[22:59:20] <wikibugs>	 (CR) RobH: [C: +2] update sku for g15 restbase [software] - https://gerrit.wikimedia.org/r/987857 (owner: RobH)
[22:59:54] <wikibugs>	 (Merged) jenkins-bot: update sku for g15 restbase [software] - https://gerrit.wikimedia.org/r/987857 (owner: RobH)
[23:00:09] <wikibugs>	 (CR) Andrew Bogott: "That {{ $value }} bit is cribbed from another alert here but I'm not sure I understand what it actually does." [alerts] - https://gerrit.wikimedia.org/r/987858 (https://phabricator.wikimedia.org/T354365) (owner: Andrew Bogott)
[23:01:00] <wikibugs>	 (CR) CI reject: [V: -1] team-wmcs: alert when stray DNS records appear in designate. [alerts] - https://gerrit.wikimedia.org/r/987858 (https://phabricator.wikimedia.org/T354365) (owner: Andrew Bogott)
[23:01:28] <wikibugs>	 (PS1) Bking: aptrepo: add Elastic-related components to bookworm repo [puppet] - https://gerrit.wikimedia.org/r/987859 (https://phabricator.wikimedia.org/T353392)
[23:02:02] <wikibugs>	 (PS2) Andrew Bogott: wmcs-dnsleaks.py: add the --to-prometheus flag [puppet] - https://gerrit.wikimedia.org/r/987852 (https://phabricator.wikimedia.org/T354365)
[23:02:04] <wikibugs>	 (PS2) Andrew Bogott: wmcs-dnsleaks: Add prometheus metric [puppet] - https://gerrit.wikimedia.org/r/987853 (https://phabricator.wikimedia.org/T354365)
[23:05:55] <wikibugs>	 (PS2) Andrew Bogott: team-wmcs: alert when stray DNS records appear in designate. [alerts] - https://gerrit.wikimedia.org/r/987858 (https://phabricator.wikimedia.org/T354365)
[23:07:08] <wikibugs>	 (CR) CI reject: [V: -1] team-wmcs: alert when stray DNS records appear in designate. [alerts] - https://gerrit.wikimedia.org/r/987858 (https://phabricator.wikimedia.org/T354365) (owner: Andrew Bogott)
[23:10:25] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[23:10:32] <logmsgbot>	 !log ebernhardson@deploy2002 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:16:41] <wikibugs>	 (CR) Andrew Bogott: "recheck" [alerts] - https://gerrit.wikimedia.org/r/987858 (https://phabricator.wikimedia.org/T354365) (owner: Andrew Bogott)
[23:17:21] <wikibugs>	 (PS1) VolkerE: styles: Replace obsolete WikimediaUI Base var with Codex alias [mediawiki-config] - https://gerrit.wikimedia.org/r/987861
[23:18:16] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (6) rsyslog on mw1378:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown