[00:13:36] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:13:42] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:23:00] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:23:06] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:25:04] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:25:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:30:59] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:31:05] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:37:33] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1011427
[00:37:35] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1011427 (owner: TrainBranchBot)
[00:45:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:46:54] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:47:01] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:50:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[00:54:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:54:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[00:58:51] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[00:58:57] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:00:31] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1011427 (owner: TrainBranchBot)
[01:12:11] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:12:13] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:12:18] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:17:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:18:31] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:18:38] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:20:45] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:20:51] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:23:58] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:24:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:27:32] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[01:27:38] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[01:32:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:33:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:37:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:45:25] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-parsoid_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:52:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[01:57:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:04:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:14:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:20:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:37:16] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:45:25] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-parsoid_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:57:16] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:00:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[03:06:32] SRE, SRE-swift-storage, Data-Persistence, Thumbor, and 5 others: Change default image thumbnail size - https://phabricator.wikimedia.org/T355914#9636507 (Ganesha811) @Jdlrobson, do you know if there has been any further progress on this? Has this change been added to the schedule or work plan of...
[03:16:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[03:21:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[03:32:16] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[03:32:22] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[04:27:35] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[04:27:42] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[04:29:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:34:25] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[04:34:32] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[04:34:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:40:32] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[04:40:38] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[04:42:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[04:42:41] (PS1) Andrew Bogott: cloudbackups: move postgres data to /srv [puppet] - https://gerrit.wikimedia.org/r/1012091
[04:44:57] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[04:45:02] (CR) CI reject: [V:-1] cloudbackups: move postgres data to /srv [puppet] - https://gerrit.wikimedia.org/r/1012091 (owner: Andrew Bogott)
[04:45:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[04:46:31] (PS2) Andrew Bogott: cloudbackups: move postgres data to /srv [puppet] - https://gerrit.wikimedia.org/r/1012091
[04:48:11] (CR) Andrew Bogott: [C:+2] cloudbackups: move postgres data to /srv [puppet] - https://gerrit.wikimedia.org/r/1012091 (owner: Andrew Bogott)
[04:51:58] (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:57:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[05:06:02] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[05:06:09] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[05:16:58] (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:17:06] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[05:17:13] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[05:37:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:42:40] (KubernetesRsyslogDown) firing: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[05:46:22] <_joe_> !log restarted rsyslog on mw1374
[05:46:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1374:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1374 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[05:54:57] (PS1) KartikMistry: Update cxserver to 2024-03-18-053939-production [deployment-charts] - https://gerrit.wikimedia.org/r/1012130 (https://phabricator.wikimedia.org/T350773)
[06:04:01] * kart_ deploying cxserver..
[06:04:07] (CR) KartikMistry: [C:+2] Update cxserver to 2024-03-18-053939-production [deployment-charts] - https://gerrit.wikimedia.org/r/1012130 (https://phabricator.wikimedia.org/T350773) (owner: KartikMistry)
[06:05:28] (Merged) jenkins-bot: Update cxserver to 2024-03-18-053939-production [deployment-charts] - https://gerrit.wikimedia.org/r/1012130 (https://phabricator.wikimedia.org/T350773) (owner: KartikMistry)
[06:07:01] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[06:07:27] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[06:22:33] !log kartik@deploy2002 helmfile [codfw] START helmfile.d/services/cxserver: apply
[06:23:06] !log kartik@deploy2002 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[06:23:39] !log kartik@deploy2002 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[06:24:15] !log kartik@deploy2002 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[06:25:20] !log Updated cxserver to 2024-03-18-053939-production (T350773)
[06:25:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:27] T350773: Remove preq and use node fetch - https://phabricator.wikimedia.org/T350773
[06:26:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[06:26:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[06:31:45] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[06:45:25] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:57:31] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:04:21] (PoolcounterFullQueues) firing: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:09:21] (PoolcounterFullQueues) resolved: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[07:19:51] ops-codfw, SRE, Infrastructure-Foundations, netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9636601 (ayounsi) FYI it's alerting for one of its PSU being down, but we don't really care anymore : > asw-a-codfw> show system alarms > 1 alarms currently active > Ala...
[07:21:45] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[07:23:25] (PS1) Ayounsi: Remove ripe-atlas-ulsfo from Icinga [puppet] - https://gerrit.wikimedia.org/r/1012225 (https://phabricator.wikimedia.org/T325824)
[07:30:45] (CR) Ayounsi: [C:+2] "https://puppet-compiler.wmflabs.org/output/1012225/1646/alert1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/1012225 (https://phabricator.wikimedia.org/T325824) (owner: Ayounsi)
[07:30:54] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[07:31:00] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[07:37:25] (SystemdUnitFailed) resolved: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:42:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:45:25] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:48:59] ops-codfw, DBA: Netbox: mismatched device models: PowerEdge R450 - ConfigE-10G (netbox) != PowerEdge R650xs (puppetdb) - https://phabricator.wikimedia.org/T360285 (ayounsi) NEW
[07:49:24] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[07:49:31] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[07:52:40] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[07:52:47] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[08:00:05] Amir1 and Urbanecm: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T0800).
[08:00:05] Ammar and kostajh: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[08:00:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[08:01:04] ops-codfw, DC-Ops: Netbox: mismatched device models: PowerEdge R450 - ConfigE-10G (netbox) != PowerEdge R650xs (puppetdb) - https://phabricator.wikimedia.org/T360285#9636651 (ABran-WMF) p:Triage→Medium
[08:02:30] hi, I'm here
[08:05:15] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[08:05:21] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[08:06:59] I see a scap message that I haven't seen before, so I don't think I'll proceed with deployment (cc jnuche hashar)
[08:07:07] > 08:05:41 Change '1008111', project 'mediawiki/core', branch 'master' not found in any deployed wikiversion. Deployed wikiversions: ['1.42.0-wmf.22']
[08:07:21] That is for deploying an operations/mediawiki-config change
[08:23:38] (CR) Kosta Harlan: "I've left a comment on https://phabricator.wikimedia.org/T360145#9636661" [mediawiki-config] - https://gerrit.wikimedia.org/r/1011662 (https://phabricator.wikimedia.org/T360145) (owner: Ammarpad)
[08:27:03] kostajh: hi, the change number in that error message is 1008111 which is indeed mediawiki/core: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1008111
[08:27:07] think your change is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1008112
[08:29:05] jnuche: oh dear. Thanks for catching that
[08:29:24] np :)
[08:29:36] (PS1) Muehlenhoff: Remove access for pearley [puppet] - https://gerrit.wikimedia.org/r/1012340
[08:32:16] jnuche: wait, there is a bug here after all
[08:32:43] https://phabricator.wikimedia.org/P58805
[08:33:55] mmh, looking
[08:37:05] !log root@cumin2002 START - Cookbook sre.idm.logout Logging PEarley (WMF) out of all services on: 2210 hosts
[08:37:16] kostajh: I'm gonna file a bug for this
[08:37:20] in the meantime, can you remove the `Depends-On: Ibf36ac96f717107bace6f0a3326f79ed129a1dfe` from your commit message?
[08:37:28] that should let you deploy your change?
[08:37:36] s/?//
[08:37:43] (PS4) Kosta Harlan: throttle: Allow for overriding temp account creation limits [mediawiki-config] - https://gerrit.wikimedia.org/r/1008112 (https://phabricator.wikimedia.org/T357777)
[08:37:57] !log root@cumin2002 END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging PEarley (WMF) out of all services on: 2210 hosts
[08:37:59] jnuche: yes, it does
[08:38:28] (CR) Muehlenhoff: [C:+2] Remove access for pearley [puppet] - https://gerrit.wikimedia.org/r/1012340 (owner: Muehlenhoff)
[08:39:26] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1008112 (https://phabricator.wikimedia.org/T357777) (owner: Kosta Harlan)
[08:39:26] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1011662 (https://phabricator.wikimedia.org/T360145) (owner: Ammarpad)
[08:39:30] that was one hell of a coincidence with the change numbers btw
[08:40:48] (Merged) jenkins-bot: throttle: Add throttle rule for editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/1011662 (https://phabricator.wikimedia.org/T360145) (owner: Ammarpad)
[08:44:28] (PS5) Kosta Harlan: throttle: Allow for overriding temp account creation limits [mediawiki-config] - https://gerrit.wikimedia.org/r/1008112 (https://phabricator.wikimedia.org/T357777)
[08:44:38] (CR) TrainBranchBot: "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1008112 (https://phabricator.wikimedia.org/T357777) (owner: Kosta Harlan)
[08:46:04] (Merged) jenkins-bot: throttle: Allow for overriding temp account creation limits [mediawiki-config] - https://gerrit.wikimedia.org/r/1008112 (https://phabricator.wikimedia.org/T357777) (owner: Kosta Harlan)
[08:47:04] !log kharlan@deploy2002 Started scap: Backport for [[gerrit:1008112|throttle: Allow for overriding temp account creation limits (T357777)]], [[gerrit:1011662|throttle: Add throttle rule for editathon (T360145)]]
[08:47:09] T357777: Implement more restrictive rate limit for temporary account creation - https://phabricator.wikimedia.org/T357777
[08:47:10] T360145: Requesting temporary lift of IP cap - Wiki Edit-a-thon March 21 - https://phabricator.wikimedia.org/T360145
[09:15:28] hmm, I see a 503 error in scap now, so I will roll this back.
[09:16:09] !log kharlan@deploy2002 ammarpad and kharlan: Backport for [[gerrit:1008112|throttle: Allow for overriding temp account creation limits (T357777)]], [[gerrit:1011662|throttle: Add throttle rule for editathon (T360145)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[09:16:17] T357777: Implement more restrictive rate limit for temporary account creation - https://phabricator.wikimedia.org/T357777
[09:16:18] T360145: Requesting temporary lift of IP cap - Wiki Edit-a-thon March 21 - https://phabricator.wikimedia.org/T360145
[09:16:23] ... but on "retry" it seems ok
[09:17:13] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:17:14] !log kharlan@deploy2002 ammarpad and kharlan: Continuing with sync
[09:21:11] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:21:28] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:29:28] !log kharlan@deploy2002 Finished scap: Backport for [[gerrit:1008112|throttle: Allow for overriding temp account creation limits (T357777)]], [[gerrit:1011662|throttle: Add throttle rule for editathon (T360145)]] (duration: 42m 23s)
[09:29:37] T357777: Implement more restrictive rate limit for temporary account creation - https://phabricator.wikimedia.org/T357777
[09:29:37] T360145: Requesting temporary lift of IP cap - Wiki Edit-a-thon March 21 - https://phabricator.wikimedia.org/T360145
[09:30:26] !log UTC morning deploys done
[09:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:53] !log installing libuv1 security updates
[09:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:37] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:47:43] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:49:44] (duration: 42m 23s)
[09:50:04] iirc the first backport of monday ends up syncing everything for whatever reason
[09:50:15] but I don't know what/why
[09:50:40] (KubernetesRsyslogDown) firing: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[09:50:45] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:50:51] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:53:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:53:30] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:54:28] ah no `Finished check-testservers (duration: 17m 32s)` that took a bit of time to verify/test
[09:54:33] still that is a long sync :/
[09:55:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[09:55:51] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[09:55:58] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:00:48] !log jmm@cumin2002 START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
[10:05:31] !log jmm@cumin2002 END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
[10:05:40] (KubernetesRsyslogDown) firing: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:13:08] !log jmm@cumin2002 START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
[10:15:39] SRE, SRE-Access-Requests: Requesting access to "researchers" and "analytics-privatedata-users" for Xiao Xiao - https://phabricator.wikimedia.org/T352098#9636743 (Marostegui) Resolved→Open We just had an email letting us know that this key is now being used in WMCS as well
[10:18:19] !log jmm@cumin2002 END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
[10:20:37] !log jmm@cumin2002 START - Cookbook sre.maps.roll-restart-reboot-master rolling restart_daemons on A:maps-master
[10:21:41] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:21:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:22:01] !log jmm@cumin2002 END (PASS) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=0) rolling restart_daemons on A:maps-master
[10:24:41] SRE, Infrastructure-Foundations, Patch-For-Review: Adapt profile::nginx to new packaging scheme introduced in Bookworm - https://phabricator.wikimedia.org/T329529#9636867 (MoritzMuehlenhoff)
[10:25:13] (CR) Ayounsi: [C:+1] "lgtm, one small suggestion inline." [software/netbox-extras] - https://gerrit.wikimedia.org/r/1009789 (https://phabricator.wikimedia.org/T359629) (owner: Cathal Mooney)
[10:25:29] (PS1) Muehlenhoff: apt-staging: Select the custom nginx provider with no additional modules [puppet] - https://gerrit.wikimedia.org/r/1012346 (https://phabricator.wikimedia.org/T329529)
[10:25:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:25:45] SRE, Infrastructure-Foundations, Patch-For-Review: Adapt profile::nginx to new packaging scheme introduced in Bookworm - https://phabricator.wikimedia.org/T329529#9636888 (MoritzMuehlenhoff)
[10:29:11] (PS1) Aklapper: phabricator: MFA status check: Exclude bot accounts [puppet] - https://gerrit.wikimedia.org/r/1012350
[10:30:39] SRE-tools, cloud-services-team, Infrastructure-Foundations, Spicerack: [spicerack] Add remote command output to log file - https://phabricator.wikimedia.org/T347093#9636927 (aborrero) I was bitten by this recently. I think the proposal made to show at least _something_ in the logs with...
[10:31:23] (PS1) Majavah: hieradata: remove non-private nets from private_reverse_zones [puppet] - https://gerrit.wikimedia.org/r/1012351
[10:31:39] (CR) Jelto: [C:+1] "lgtm, one comment in-line!" [deployment-charts] - https://gerrit.wikimedia.org/r/1011028 (https://phabricator.wikimedia.org/T350796) (owner: AOkoth)
[10:35:12] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1012346 (https://phabricator.wikimedia.org/T329529) (owner: Muehlenhoff)
[10:36:06] (CR) Majavah: [C:+2] P:toolforge::proxy: fix custom 429 page [puppet] - https://gerrit.wikimedia.org/r/1011301 (owner: Majavah)
[10:38:56] !log jmm@cumin2002 START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
[10:41:40] (KubernetesRsyslogDown) firing: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:43:03] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:43:10] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:46:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:50:50] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:50:56] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:52:11] !log jmm@cumin2002 END (FAIL) - Cookbook sre.aqs.roll-restart-reboot (exit_code=1) rolling restart_daemons on A:aqs-codfw
[10:53:27] SRE-tools, Spicerack, Puppet (Puppet 7.0): Spicerack puppetserver.destroy() raises an exception when certificate does not exist - https://phabricator.wikimedia.org/T360293 (taavi) NEW
[10:53:58] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:54:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[10:55:10] (KubernetesRsyslogDown) firing: (2) rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:55:35] (PS19) Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384)
[10:57:31] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:58:32] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[10:58:39] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[11:00:04] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1100)
[11:00:10] (KubernetesRsyslogDown) resolved: (2) rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:07:25] (PS3) KartikMistry: Enable Content/Section translation on some Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/1010226 (https://phabricator.wikimedia.org/T353510)
[11:08:42] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[11:08:49] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[11:08:51] (CR) Majavah: [V:+1] "PCC SUCCESS (NOOP 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1647/console" [puppet] - https://gerrit.wikimedia.org/r/1011138 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[11:08:58] (CR) Majavah: [V:+1 C:+2] kubeadm: Drop buster support [puppet] - https://gerrit.wikimedia.org/r/1011138 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[11:09:40] (KubernetesRsyslogDown) firing: rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1382 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:13:40] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[11:13:47] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[11:14:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1382 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:16:58] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[11:17:05] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[11:17:58] SRE-tools, Infrastructure-Foundations, Spicerack, Puppet (Puppet 7.0): Spicerack puppetserver.destroy() raises an exception when certificate does not exist - https://phabricator.wikimedia.org/T360293#9637119 (Volans) p:Triage→Medium That's indeed the current behaviour and clearly an error...
[11:20:04] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:20:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:38:40] (KubernetesRsyslogDown) firing: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [11:40:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:40:36] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:42:25] (SystemdUnitFailed) firing: rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:43:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1474:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1474 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [11:49:57] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:50:04] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:57:40] (KubernetesRsyslogDown) firing: rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1382 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [11:58:29] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:58:35] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply 
[11:58:55] (KubernetesRsyslogDown) firing: (2) rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [12:00:50] (03PS1) 10Fabfur: Revert "hiera: temporary disable haproxy logging to benthos for cp4037" [puppet] - 10https://gerrit.wikimedia.org/r/1011453 [12:02:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1382 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [12:05:23] (03CR) 10Fabfur: [V:03+1] "PCC SUCCESS (NOOP 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/1011453 (owner: 10Fabfur) [12:05:41] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:05:48] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:08:31] !log disabling puppet and depooling cp4037 to gradually apply new HAProxy/Benthos configuration (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1011453) T358109 [12:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:47] T358109: Install new Benthos instance on cp hosts - https://phabricator.wikimedia.org/T358109 [12:08:56] !log fabfur@cumin1002 conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [12:09:29] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:09:35] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:10:44] (03CR) 10Fabfur: [V:03+1 C:03+2] Revert "hiera: temporary disable haproxy logging to benthos for cp4037" [puppet] - 10https://gerrit.wikimedia.org/r/1011453 (owner: 10Fabfur) 
[12:12:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:12:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:15:43] (03PS1) 10KartikMistry: Update cxserver to 2024-03-18-111401-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1012364 (https://phabricator.wikimedia.org/T353510) [12:22:25] (SystemdUnitFailed) firing: (2) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:23:38] (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1011431 [12:24:09] (03PS1) 10Majavah: P:toolforge::harbor: add back docker config [puppet] - 10https://gerrit.wikimedia.org/r/1012368 [12:26:06] (03CR) 10David Caro: P:toolforge::harbor: add back docker config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1012368 (owner: 10Majavah) [12:26:44] (03CR) 10Majavah: P:toolforge::harbor: add back docker config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1012368 (owner: 10Majavah) [12:31:07] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:31:14] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:41:30] (ProbeDown) firing: Service wdqs1014:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:41:33] (03CR) 10David Caro: [C:03+1] P:toolforge::harbor: add back docker config [puppet] - 10https://gerrit.wikimedia.org/r/1012368 (owner: 10Majavah) [12:41:51] (03CR) 10Majavah: 
[C:03+2] P:toolforge::harbor: add back docker config [puppet] - 10https://gerrit.wikimedia.org/r/1012368 (owner: 10Majavah) [12:45:04] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:45:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:45:44] (03CR) 10Majavah: [C:03+2] dynamicproxy: fix 429 error page [puppet] - 10https://gerrit.wikimedia.org/r/1011302 (owner: 10Majavah) [12:46:02] !log slyngshede@cumin1002 START - Cookbook sre.ganeti.reboot-vm for VM idp-test1003.wikimedia.org [12:46:30] (ProbeDown) resolved: (2) Service wdqs1014:443 has failed probes (http_wdqs_external_sparql_endpoint_search_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:49:53] !log slyngshede@cumin1002 END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1003.wikimedia.org [12:57:08] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:57:15] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:00:04] RoanKattouw, Lucas_WMDE, Urbanecm, awight, and TheresNoTime: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1300). [13:00:04] No Gerrit patches in the queue for this window AFAICS. 
[13:08:30] 10ops-codfw, 06SRE, 06DC-Ops: Netbox: mismatched device models: PowerEdge R450 - ConfigE-10G (netbox) != PowerEdge R650xs (puppetdb) - https://phabricator.wikimedia.org/T360285#9637761 (10RobH) a:03Jhancock.wm [13:09:13] 10ops-codfw, 06SRE, 06DC-Ops: Netbox: mismatched device models: PowerEdge R450 - ConfigE-10G (netbox) != PowerEdge R650xs (puppetdb) - https://phabricator.wikimedia.org/T360285#9637759 (10RobH) It appears when these were racked they had the incorrect model selected. There was already an R650 in Netbox (that... [13:10:26] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:10:33] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:12:36] !log slyngshede@cumin1002 START - Cookbook sre.hosts.reimage for host idp-test1003.wikimedia.org with OS bookworm [13:16:13] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1011438 [13:17:13] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:17:53] (03CR) 10Muehlenhoff: P:idp Use Tomcat9 build for Bookworm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1009709 (https://phabricator.wikimedia.org/T357748) (owner: 10Slyngshede) [13:19:29] 10ops-codfw, 06SRE: 14Inbound interface errors - 14https://phabricator.wikimedia.org/T358417#9637812 (10Papaul) 05Open→03Resolved 14We have been having this issue a long time ago with this same server so I always close the task when i can the inbound interface error on this server. The error is not t... 
[13:20:58] 10ops-codfw, 06SRE, 06DC-Ops, 06serviceops: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#9637825 (10Papaul) @JMeybohm hello is there anything DC-ops need to do on this task? [13:22:41] !log slyngshede@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1003.wikimedia.org with reason: host reimage [13:23:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:23:29] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:24:52] !log slyngshede@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1003.wikimedia.org with reason: host reimage [13:24:56] 10ops-codfw, 06SRE, 06DC-Ops: 14Netbox: mismatched device models: PowerEdge R450 - ConfigE-10G (netbox) != PowerEdge R650xs (puppetdb) - 14https://phabricator.wikimedia.org/T360285#9637828 (10Jhancock.wm) 05Open→03Resolved 14My bad! I got that fixed. Must have been a muscle memory thing. [13:25:11] (03PS1) 10Slyngshede: C:tomcat Pin Tomcat9 packages for Bookworm. [puppet] - 10https://gerrit.wikimedia.org/r/1012374 (https://phabricator.wikimedia.org/T357748) [13:25:54] 06SRE, 06serviceops: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#9637831 (10JMeybohm) >>! In T358489#9637825, @Papaul wrote: > @JMeybohm hello is there anything DC-ops need to do on this task? No, all good from your end. 
Cheers [13:29:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:29:29] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:32:34] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:32:40] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:36:52] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1649/console" [puppet] - 10https://gerrit.wikimedia.org/r/1012374 (https://phabricator.wikimedia.org/T357748) (owner: 10Slyngshede) [13:36:53] !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-codfw [13:37:57] (03PS20) 10Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) [13:38:23] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:38:30] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:38:47] (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1012374 (https://phabricator.wikimedia.org/T357748) (owner: 10Slyngshede) [13:39:10] (03Abandoned) 10Muehlenhoff: tomcat: When on bookworm, install from component [puppet] - 10https://gerrit.wikimedia.org/r/1009506 (https://phabricator.wikimedia.org/T359333) (owner: 10Muehlenhoff) [13:40:22] !log slyngshede@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1003.wikimedia.org with OS bookworm [13:40:54] (03CR) 10Slyngshede: [V:03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1650/console" [puppet] - 10https://gerrit.wikimedia.org/r/1012374 
(https://phabricator.wikimedia.org/T357748) (owner: 10Slyngshede) [13:41:22] (03CR) 10Slyngshede: [V:03+1 C:03+2] C:tomcat Pin Tomcat9 packages for Bookworm. [puppet] - 10https://gerrit.wikimedia.org/r/1012374 (https://phabricator.wikimedia.org/T357748) (owner: 10Slyngshede) [13:41:37] !log installing tar security updates [13:41:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:17] !log Restarted MediaModeration scanning script for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration [13:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:25] (SystemdUnitFailed) firing: (2) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:44:17] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:44:24] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:45:34] !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-codfw [13:47:21] 10ops-codfw, 06SRE: Degraded RAID on elastic2037 - https://phabricator.wikimedia.org/T359742#9637936 (10Jhancock.wm) This server is out of warranty. I searched our spare stock and I do not have an exact replacement for this drive (1.6 TB). I do have a slightly larger one available (1.92 GB). If that is accepta... 
[13:47:25] (SystemdUnitFailed) firing: (2) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:47:57] (03CR) 10Arnaudb: "idempotency has been tested so far but I was wondering:" [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: 10Arnaudb) [13:49:42] (03PS21) 10Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) [13:52:13] i am going to steal the remainder of the window [13:52:47] (03PS2) 10Urbanecm: [Growth] frwiki: Enable personalized praise backend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011284 (https://phabricator.wikimedia.org/T360152) [13:52:49] (03CR) 10Urbanecm: [C:03+2] [Growth] frwiki: Enable personalized praise backend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011284 (https://phabricator.wikimedia.org/T360152) (owner: 10Urbanecm) [13:53:39] (03CR) 10CI reject: [V:04-1] mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) (owner: 10Arnaudb) [13:53:43] !log jnuche@deploy2002 Started deploy [releng/jenkins-deploy@b3ccf85] (releasing): (no justification provided) [13:53:52] (03PS22) 10Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) [13:54:06] !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-eqiad [13:54:23] (03Merged) 10jenkins-bot: [Growth] frwiki: Enable personalized praise backend [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011284 (https://phabricator.wikimedia.org/T360152) (owner: 10Urbanecm) [13:54:37] !log 
jnuche@deploy2002 Finished deploy [releng/jenkins-deploy@b3ccf85] (releasing): (no justification provided) (duration: 00m 54s) [13:56:12] (03PS23) 10Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) [13:57:29] (03PS1) 10Urbanecm: skwiki: Create autopatrolled and patroller groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012378 (https://phabricator.wikimedia.org/T353980) [13:57:31] (03PS1) 10Urbanecm: skwiki: Enable RC patrol [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012379 (https://phabricator.wikimedia.org/T353980) [13:58:01] !log urbanecm@deploy2002 Started scap: Backport for [[gerrit:1011284|[Growth] frwiki: Enable personalized praise backend (T360152)]] [13:58:06] T360152: Deploy Personal Praise at French Wikipedia - https://phabricator.wikimedia.org/T360152 [13:58:41] (03PS2) 10Urbanecm: skwiki: Create autopatrolled and patroller groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012378 (https://phabricator.wikimedia.org/T353980) [13:58:48] (03PS2) 10Urbanecm: skwiki: Enable RC patrol [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012379 (https://phabricator.wikimedia.org/T353980) [13:59:00] (03CR) 10Urbanecm: [C:03+2] skwiki: Create autopatrolled and patroller groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012378 (https://phabricator.wikimedia.org/T353980) (owner: 10Urbanecm) [14:00:04] (03Merged) 10jenkins-bot: skwiki: Create autopatrolled and patroller groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012378 (https://phabricator.wikimedia.org/T353980) (owner: 10Urbanecm) [14:00:09] !log jmm@cumin2002 END (FAIL) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=1) rolling restart_daemons on A:restbase-eqiad [14:00:10] (03CR) 10CI reject: [V:04-1] mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 
(https://phabricator.wikimedia.org/T327384) (owner: 10Arnaudb) [14:00:15] !log urbanecm@deploy2002 urbanecm: Backport for [[gerrit:1011284|[Growth] frwiki: Enable personalized praise backend (T360152)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [14:00:24] (03PS24) 10Arnaudb: mysqld-exporter-config: simplify manual runs [puppet] - 10https://gerrit.wikimedia.org/r/984232 (https://phabricator.wikimedia.org/T327384) [14:03:25] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1011439 [14:03:30] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1011439 (owner: 10TrainBranchBot) [14:03:40] (KubernetesRsyslogDown) firing: rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1382 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [14:03:45] !log urbanecm@deploy2002 urbanecm: Continuing with sync [14:04:35] !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on P{restbase102[4-5]*} and A:restbase [14:04:46] (03CR) 10Urbanecm: [C:03+2] skwiki: Enable RC patrol [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012379 (https://phabricator.wikimedia.org/T353980) (owner: 10Urbanecm) [14:04:59] (03PS2) 10Urbanecm: [Growth] frwiki: Enable personalized-praise module [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011285 (https://phabricator.wikimedia.org/T360152) [14:05:01] (03CR) 10Urbanecm: [C:03+2] [Growth] frwiki: Enable personalized-praise module [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011285 (https://phabricator.wikimedia.org/T360152) (owner: 10Urbanecm) [14:05:42] !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) 
rolling restart_daemons on P{restbase102[4-5]*} and A:restbase [14:05:44] (03Merged) 10jenkins-bot: skwiki: Enable RC patrol [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012379 (https://phabricator.wikimedia.org/T353980) (owner: 10Urbanecm) [14:05:53] (03Merged) 10jenkins-bot: [Growth] frwiki: Enable personalized-praise module [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011285 (https://phabricator.wikimedia.org/T360152) (owner: 10Urbanecm) [14:06:45] (03PS1) 10Filippo Giunchedi: hieradata: remove ripe atlas ulsfo measurements [puppet] - 10https://gerrit.wikimedia.org/r/1012380 (https://phabricator.wikimedia.org/T325824) [14:07:25] XioNoX: I noticed icinga config was broken, ^ should fix it [14:08:30] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:08:40] (KubernetesRsyslogDown) resolved: rsyslog on mw1382:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw1382 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown [14:08:47] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:10:08] (03CR) 10Ssingh: [V:03+1] "I just realized that this won't work at all. 
The reason is that we say $abuse_networks = network::parse_abuse_nets('varnish') but on P" [puppet] - 10https://gerrit.wikimedia.org/r/1007953 (https://phabricator.wikimedia.org/T358887) (owner: 10Ssingh) [14:11:00] (03PS1) 10Muehlenhoff: SREBatchBase: Provide an example in the help for --query [cookbooks] - 10https://gerrit.wikimedia.org/r/1012382 [14:11:07] !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on P{restbase102[6-9]*} and A:restbase [14:13:03] !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on P{restbase102[6-9]*} and A:restbase [14:15:51] !log urbanecm@deploy2002 Finished scap: Backport for [[gerrit:1011284|[Growth] frwiki: Enable personalized praise backend (T360152)]] (duration: 17m 50s) [14:15:56] T360152: Deploy Personal Praise at French Wikipedia - https://phabricator.wikimedia.org/T360152 [14:16:31] !log urbanecm@deploy2002 Started scap: Backport for [[gerrit:1011285|[Growth] frwiki: Enable personalized-praise module (T360152)]], [[gerrit:1012378|skwiki: Create autopatrolled and patroller groups (T353980)]], [[gerrit:1012379|skwiki: Enable RC patrol (T353980)]] [14:16:37] T353980: New user groups on skwiki and RCPatrol on skwiki - https://phabricator.wikimedia.org/T353980 [14:17:20] (03CR) 10Ssingh: [C:03+1] varnish: Remove 10.68.0.0/16 reference [puppet] - 10https://gerrit.wikimedia.org/r/1004082 (owner: 10Majavah) [14:17:59] !log disable puppet on A:cp to rollout 1004082 [14:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:41] !log urbanecm@deploy2002 urbanecm: Backport for [[gerrit:1011285|[Growth] frwiki: Enable personalized-praise module (T360152)]], [[gerrit:1012378|skwiki: Create autopatrolled and patroller groups (T353980)]], [[gerrit:1012379|skwiki: Enable RC patrol (T353980)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [14:18:46] (03CR) 10Majavah: 
[C:03+2] varnish: Remove 10.68.0.0/16 reference [puppet] - 10https://gerrit.wikimedia.org/r/1004082 (owner: 10Majavah) [14:19:40] !log urbanecm@deploy2002 urbanecm: Continuing with sync [14:20:22] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:20:28] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:23:47] (03CR) 10Volans: [C:03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/1012382 (owner: 10Muehlenhoff) [14:23:53] (03PS16) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [14:23:53] (03PS17) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [14:23:56] (03PS1) 10Andrew Bogott: eqiad pdns: move the 'puppet' alias to the new puppetserver [puppet] - 10https://gerrit.wikimedia.org/r/1012384 (https://phabricator.wikimedia.org/T351450) [14:24:56] !log re-enable puppet for 1004082 rollout after testing on cp3066, cp3080 [14:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:17] (03PS1) 10Muehlenhoff: Revert "More Tomcat 10 changes T357748" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012385 [14:28:17] (03PS1) 10Muehlenhoff: Revert "More postinst changes to cope with Tomcat 9->10 changes" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012386 [14:28:19] (03PS1) 10Muehlenhoff: Revert "Rebuild cas for Bookworm, and depend on tomcat 10 now" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012387 [14:30:08] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1011439 (owner: 10TrainBranchBot) [14:31:07] !log urbanecm@deploy2002 Finished scap: Backport for [[gerrit:1011285|[Growth] frwiki: Enable 
personalized-praise module (T360152)]], [[gerrit:1012378|skwiki: Create autopatrolled and patroller groups (T353980)]], [[gerrit:1012379|skwiki: Enable RC patrol (T353980)]] (duration: 14m 36s) [14:31:14] T360152: Deploy Personal Praise at French Wikipedia - https://phabricator.wikimedia.org/T360152 [14:31:14] godog: <3 I ran PCC but that wasn't enough [14:31:14] T353980: New user groups on skwiki and RCPatrol on skwiki - https://phabricator.wikimedia.org/T353980 [14:31:30] (03CR) 10Ayounsi: [C:03+1] hieradata: remove ripe atlas ulsfo measurements [puppet] - 10https://gerrit.wikimedia.org/r/1012380 (https://phabricator.wikimedia.org/T325824) (owner: 10Filippo Giunchedi) [14:31:51] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:31:57] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:33:36] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Revert "More Tomcat 10 changes T357748" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012385 (owner: 10Muehlenhoff) [14:37:16] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:37:42] (03PS2) 10Andrew Bogott: eqiad pdns: move the 'puppet' alias to the new puppetserver [puppet] - 10https://gerrit.wikimedia.org/r/1012384 (https://phabricator.wikimedia.org/T351450) [14:37:43] (03PS17) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [14:37:48] (03PS18) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [14:37:56] (03PS1) 10Andrew Bogott: puppetserver: allow specifying an autosign script [puppet] - 
10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) [14:38:19] (03PS2) 10Andrew Bogott: puppetserver: allow specifying an autosign script [puppet] - 10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) [14:38:20] (03PS3) 10Andrew Bogott: eqiad pdns: move the 'puppet' alias to the new puppetserver [puppet] - 10https://gerrit.wikimedia.org/r/1012384 (https://phabricator.wikimedia.org/T351450) [14:38:22] (03PS18) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [14:38:28] (03PS19) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [14:38:36] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:38:46] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Revert "Rebuild cas for Bookworm, and depend on tomcat 10 now" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012387 (owner: 10Muehlenhoff) [14:38:55] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Revert "More postinst changes to cope with Tomcat 9->10 changes" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012386 (owner: 10Muehlenhoff) [14:39:09] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:39:16] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:42:53] (03PS1) 10Majavah: P:toolforge: move webservice CLI to the CLI profile [puppet] - 10https://gerrit.wikimedia.org/r/1012390 (https://phabricator.wikimedia.org/T314664) [14:42:56] (03PS1) 10Majavah: P:toolforge::bastion: remove tekton component [puppet] - 10https://gerrit.wikimedia.org/r/1012391 [14:43:49] (03PS3) 10Majavah: P:dumps::distribution::nfs: use networks class for WMCS network 
ranges [puppet] - 10https://gerrit.wikimedia.org/r/1007889 [14:44:18] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:44:24] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:45:23] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1651/co" [puppet] - 10https://gerrit.wikimedia.org/r/1007889 (owner: 10Majavah) [14:45:37] (03CR) 10Majavah: [C:04-1] "-1 for the use of `defined()`, and a question inline" [puppet] - 10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:48:10] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:48:17] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:50:12] (03PS2) 10Majavah: P:toolforge: move webservice CLI to the CLI profile [puppet] - 10https://gerrit.wikimedia.org/r/1012390 (https://phabricator.wikimedia.org/T314664) [14:50:12] (03PS2) 10Majavah: P:toolforge::bastion: remove tekton component [puppet] - 10https://gerrit.wikimedia.org/r/1012391 [14:50:48] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:50:55] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:51:41] (03PS3) 10Andrew Bogott: puppetserver: allow specifying an autosign script [puppet] - 10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) [14:51:45] (03PS4) 10Andrew Bogott: eqiad pdns: move the 'puppet' alias to the new puppetserver [puppet] - 10https://gerrit.wikimedia.org/r/1012384 (https://phabricator.wikimedia.org/T351450) [14:51:53] (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (NOOP 6 CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/" 
[puppet] - 10https://gerrit.wikimedia.org/r/1012390 (https://phabricator.wikimedia.org/T314664) (owner: 10Majavah) [14:52:01] (03PS19) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [14:52:09] (03PS20) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [14:52:25] (03PS1) 10Muehlenhoff: cas-overlay: Revert Tomcat10-related changes [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012394 (https://phabricator.wikimedia.org/T357749) [14:54:31] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:54:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:55:47] (03CR) 10Muehlenhoff: [C:03+2] SREBatchBase: Provide an example in the help for --query [cookbooks] - 10https://gerrit.wikimedia.org/r/1012382 (owner: 10Muehlenhoff) [14:56:16] (03CR) 10Ssingh: [C:03+1] "Looks good, though I leave the grok expressions to your expertise!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1009724 (https://phabricator.wikimedia.org/T358109) (owner: 10Fabfur) [14:57:19] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] cas-overlay: Revert Tomcat10-related changes [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012394 (https://phabricator.wikimedia.org/T357749) (owner: 10Muehlenhoff) [14:57:27] (03CR) 10Majavah: [C:03+1] puppetserver: allow specifying an autosign script (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:57:35] (03CR) 10Filippo Giunchedi: [C:03+2] hieradata: remove ripe atlas ulsfo measurements [puppet] - 10https://gerrit.wikimedia.org/r/1012380 (https://phabricator.wikimedia.org/T325824) (owner: 10Filippo Giunchedi) [14:57:50] XioNoX: yeah this case is totally outside PCC's reach :( [14:57:57] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:58:03] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:58:46] (03CR) 10Andrew Bogott: [C:03+2] puppetserver: allow specifying an autosign script [puppet] - 10https://gerrit.wikimedia.org/r/1012389 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:58:58] (03CR) 10Andrew Bogott: [C:03+2] eqiad pdns: move the 'puppet' alias to the new puppetserver [puppet] - 10https://gerrit.wikimedia.org/r/1012384 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [14:59:23] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9638240 (10Papaul) [14:59:38] (03CR) 10Fabfur: [V:03+1 C:03+2] benthos/haproxy: fix parsing for possible missing headers [puppet] - 10https://gerrit.wikimedia.org/r/1009724 (https://phabricator.wikimedia.org/T358109) (owner: 10Fabfur) [14:59:57] !log disable puppet on A:dns-rec to merge CR 1009316 [15:00:00] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:02] (03CR) 10Fabfur: [V:03+1 C:03+2] benthos: fixe metadata field [puppet] - 10https://gerrit.wikimedia.org/r/1009722 (https://phabricator.wikimedia.org/T358109) (owner: 10Fabfur) [15:00:26] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:00:32] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:00:54] !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [15:01:06] (03CR) 10Ssingh: [C:03+2] Make auth NSID distinct from recdns on same host [puppet] - 10https://gerrit.wikimedia.org/r/1009316 (owner: 10BBlack) [15:02:12] 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 10Puppet (Puppet 7.0): Spicerack puppetserver.destroy() raises an exception when certificate does not exist - https://phabricator.wikimedia.org/T360293#9638262 (10taavi) If there would be a method to check whether a certificate exists or not we could... 
[15:02:16] (JobUnavailable) firing: (3) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:02:31] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:02:37] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:03:25] (03PS1) 10Muehlenhoff: Bump version number to supercede old versions [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012395 [15:06:03] 10ops-codfw, 06DC-Ops, 10cloud-services-team (Hardware): Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9638287 (10Jhancock.wm) [15:06:09] 10ops-codfw, 06DC-Ops, 10cloud-services-team (Hardware): Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9638286 (10Jhancock.wm) @Andrew we've received these servers. Could you update this ticket with racking requirements and names of the servers? [15:06:31] (03CR) 10Muehlenhoff: [V:03+2 C:03+2] Bump version number to supercede old versions [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/1012395 (owner: 10Muehlenhoff) [15:07:26] !log sukhe@puppetmaster1001 conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [15:10:08] !log sudo cumin -b1 -s120 "A:dns-rec and not P{dns6001*}" "run-puppet-agent --enable 'merging CR 1009316'" [15:10:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:22:06] 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 10Puppet (Puppet 7.0): Spicerack puppetserver.destroy() raises an exception when certificate does not exist - https://phabricator.wikimedia.org/T360293#9638308 (10Volans) We do have `get_certificate_metadata()` that raises `spicerack.puppet.PuppetServ... 
[15:23:04] (03PS20) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [15:23:05] (03PS21) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [15:23:07] (03PS1) 10Andrew Bogott: profile::puppetserver::wmcs: include validatelabsfqdn.py script [puppet] - 10https://gerrit.wikimedia.org/r/1012396 (https://phabricator.wikimedia.org/T351450) [15:27:54] (03CR) 10Andrew Bogott: [C:03+2] profile::puppetserver::wmcs: include validatelabsfqdn.py script [puppet] - 10https://gerrit.wikimedia.org/r/1012396 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [15:29:32] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9638343 (10Papaul) [15:30:04] jan_drewniak: I, the Bot under the Fountain, call upon thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1530). 
[15:57:33] !log uploaded cas 6.6.12+wmf12u4 (rebuild with/for tomcat9) T357748 [15:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:38] T357748: Migrate CAS to Bookworm - https://phabricator.wikimedia.org/T357748 [16:00:26] !log installing squid security updates [16:00:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:48] !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on P{restbase103*} and A:restbase [16:04:09] (03PS1) 10Papaul: Remove asw-a-codfw from monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1012402 (https://phabricator.wikimedia.org/T358244) [16:06:42] !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on P{restbase103*} and A:restbase [16:07:36] (03CR) 10Ayounsi: [C:03+2] "LGTM, I also checked that https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?hostgroup=asw-a-codfw&style=overview was empty" [puppet] - 10https://gerrit.wikimedia.org/r/1012402 (https://phabricator.wikimedia.org/T358244) (owner: 10Papaul) [16:07:59] (03CR) 10Ayounsi: [C:03+1] Remove asw-a-codfw from monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1012402 (https://phabricator.wikimedia.org/T358244) (owner: 10Papaul) [16:09:29] (03CR) 10Papaul: [C:03+2] Remove asw-a-codfw from monitoring [puppet] - 10https://gerrit.wikimedia.org/r/1012402 (https://phabricator.wikimedia.org/T358244) (owner: 10Papaul) [16:10:16] !log jmm@cumin2002 START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on P{restbase104*} and A:restbase [16:11:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.91% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - 
https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [16:11:48] !log jmm@cumin2002 END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on P{restbase104*} and A:restbase [16:11:58] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:14:46] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9638559 (10Papaul) [16:16:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 48.72% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [16:18:26] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9638580 (10Papaul) [16:18:39] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9638581 (10Papaul) [16:19:43] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops, 13Patch-For-Review: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9638602 (10Papaul) [16:22:01] !log hashar@deploy2002 Started deploy [integration/docroot@b2c74b7]: doc: add Blubber - T352262 [16:22:08] !log hashar@deploy2002 Finished deploy [integration/docroot@b2c74b7]: doc: add Blubber - T352262 (duration: 00m 06s) [16:22:12] T352262: Review supporting deployment pipeline documentation - https://phabricator.wikimedia.org/T352262 [16:22:59] (03PS1) 
10Elukey: profile::prometheus::k8s: move istio metrics to a separate job [puppet] - 10https://gerrit.wikimedia.org/r/1012404 (https://phabricator.wikimedia.org/T351390) [16:23:08] jouncebot: nowandnext [16:23:08] No deployments scheduled for the next 0 hour(s) and 36 minute(s) [16:23:08] In 0 hour(s) and 36 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1700) [16:23:08] In 0 hour(s) and 36 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1700) [16:25:15] (PHPFPMTooBusy) firing: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.06% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [16:27:37] !log repooling cp4037 for very short time to collect HAProxy logs with Benthos (T358109) [16:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:27:43] T358109: Install new Benthos instance on cp hosts - https://phabricator.wikimedia.org/T358109 [16:28:55] !log fabfur@cumin1002 conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [16:30:00] !log fabfur@cumin1002 conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [16:30:15] (PHPFPMTooBusy) resolved: Not enough idle PHP-FPM workers for Mediawiki mw-parsoid at codfw: 49.06% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [16:31:20] (03CR) 10Elukey: "Hi folks! 
Not sure if the patch is correct, lemme know if it makes sense or not :)" [puppet] - 10https://gerrit.wikimedia.org/r/1012404 (https://phabricator.wikimedia.org/T351390) (owner: 10Elukey) [16:37:26] (RoutinatorRsyncErrors) firing: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [16:41:41] 06SRE, 10Wikimedia-Mailing-lists: Subscribe Elton to Internal mailing list for Meta-Wiki oversighters - https://phabricator.wikimedia.org/T360263#9638760 (10Dzahn) Can't you simply subscribe yourself using the form on https://lists.wikimedia.org/postorius/lists/meta-oversight.lists.wikimedia.org/ ? Or is it t... [16:42:26] (RoutinatorRsyncErrors) resolved: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [16:43:35] (03CR) 10SBassett: "> It would need to go into docroot/standard-docroot for most of the sites, and then docroot/mediawiki.org and docroot/wikimediafoundation." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1010971 (https://phabricator.wikimedia.org/T337949) (owner: 10Mmartorana) [16:45:56] 06SRE, 10Wikimedia-Mailing-lists: Subscribe Elton to Internal mailing list for Meta-Wiki oversighters - https://phabricator.wikimedia.org/T360263#9638779 (10Dzahn) I checked on `oversight-meta@wikimedia.org` and it's a VRTS (formerly OTRS) queue, not a mailman list or Google inbox. So this looks like it should... 
[17:00:01] (03PS8) 10SBassett: Remove X-Webkit-CSP-Report-Only response header from foundationwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1003108 (https://phabricator.wikimedia.org/T357479) (owner: 10TheDJ) [17:00:07] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1700) [17:00:07] ryankemper: May I have your attention please! Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T1700) [17:20:25] (03PS1) 10Fabfur: benthos: drop ssl handshake errors early in the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/1012413 (https://phabricator.wikimedia.org/T359627) [17:27:08] (03CR) 10Fabfur: [V:03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/1655/console" [puppet] - 10https://gerrit.wikimedia.org/r/1012413 (https://phabricator.wikimedia.org/T359627) (owner: 10Fabfur) [17:35:05] 06SRE, 10SRE-Access-Requests, 06Infrastructure-Foundations: Request access to servers Dcops group - https://phabricator.wikimedia.org/T360356 (10Jclark-ctr) 03NEW [17:36:37] (03PS21) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [17:36:37] (03PS22) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [17:36:39] (03PS1) 10Andrew Bogott: puppetserver: allow specifying certname in [server] conf [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) [17:37:22] 06SRE, 10SRE-Access-Requests, 06Infrastructure-Foundations: Request access to servers Dcops group - https://phabricator.wikimedia.org/T360356#9639193 (10Jclark-ctr) p:05Triage→03Medium [17:38:14] (03CR) 10CI reject: [V:04-1] puppetserver: allow specifying certname in [server] 
conf [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [17:41:15] (03PS2) 10Andrew Bogott: puppetserver: allow specifying certname in [server] conf [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) [17:41:16] (03PS22) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [17:41:18] (03PS23) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [17:41:41] (03PS2) 10Fabfur: benthos: drop ssl handshake errors early in the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/1012413 (https://phabricator.wikimedia.org/T359627) [17:41:45] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [17:42:28] (03PS3) 10Andrew Bogott: puppetserver: allow specifying certname in [server] conf [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) [17:42:29] (03PS23) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [17:42:36] (03PS24) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [17:42:44] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [17:46:13] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:46:20] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:46:34] (03CR) 10CI reject: [V:04-1] puppetserver: allow specifying certname in 
[server] conf [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [17:47:25] (SystemdUnitFailed) firing: (2) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:47:47] 06SRE, 06collaboration-services, 10Stewards-Onboarding-Tool, 10Wikimedia-Mailing-lists: stewards1001 / stewards2001: Enable API access for Mailman3 - https://phabricator.wikimedia.org/T351202#9639226 (10Dzahn) [17:48:28] 06SRE, 06collaboration-services, 10Stewards-Onboarding-Tool, 10Wikimedia-Mailing-lists: stewards1001 / stewards2001: Enable API access for Mailman3 - https://phabricator.wikimedia.org/T351202#9639225 (10Dzahn) Following up on the last comment by Legoktm. I am also in favor of this option and would be happy... [17:48:48] (03PS4) 10Andrew Bogott: puppetserver: allow specifying certname in [server] conf [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) [17:48:48] (03PS24) 10Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) [17:48:53] (03PS25) 10Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - 10https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) [17:50:55] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1012414 (https://phabricator.wikimedia.org/T351450) (owner: 10Andrew Bogott) [17:54:23] 10ops-codfw, 06SRE, 06Infrastructure-Foundations, 10netops: Decom asw-a-codfw switch stack - https://phabricator.wikimedia.org/T358244#9639269 (10Papaul) Zeroize done on asw-a1 setups: - delete the member from the master - Disconnect both cable going to asw-a2 and asw-a7 - while login into to console r... 
[17:54:56] (03PS3) 10Fabfur: benthos: drop ssl handshake errors early in the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/1012413 (https://phabricator.wikimedia.org/T359627) [17:55:58] (03CR) 10Ssingh: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1012413 (https://phabricator.wikimedia.org/T359627) (owner: 10Fabfur) [17:56:51] !log kafka-logging1001:~# kafka reassign-partitions -reassignment-json-file mediawiki.httpd.accesslog.json --execute --throttle 50000000 T326419 [17:56:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:56] T326419: Expand kafka-logging using hosts kafka-logging[12]00[45] - https://phabricator.wikimedia.org/T326419 [17:58:00] (03PS4) 10FNegri: [wmcs-backup] Fix parsing of exclude_volumes [puppet] - 10https://gerrit.wikimedia.org/r/1009787 (https://phabricator.wikimedia.org/T359192) [17:59:15] (03CR) 10Fabfur: [C:03+2] benthos: drop ssl handshake errors early in the pipeline [puppet] - 10https://gerrit.wikimedia.org/r/1012413 (https://phabricator.wikimedia.org/T359627) (owner: 10Fabfur) [18:02:02] (03CR) 10CI reject: [V:04-1] [wmcs-backup] Fix parsing of exclude_volumes [puppet] - 10https://gerrit.wikimedia.org/r/1009787 (https://phabricator.wikimedia.org/T359192) (owner: 10FNegri) [18:03:03] (KafkaUnderReplicatedPartitions) firing: Under replicated partitions for Kafka cluster logging-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-kafka_cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions [18:03:28] thats me ^ acking [18:06:54] (03PS5) 10FNegri: [wmcs-backup] Fix parsing of exclude_volumes [puppet] - 10https://gerrit.wikimedia.org/r/1009787 (https://phabricator.wikimedia.org/T359192) [18:08:29] (03PS2) 10RhinosF1: Throttle: add event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 
(https://phabricator.wikimedia.org/T360357) [18:08:55] (03PS3) 10RhinosF1: Throttle: add event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 (https://phabricator.wikimedia.org/T360357) [18:09:40] (03PS4) 10RhinosF1: Throttle: add event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 (https://phabricator.wikimedia.org/T360357) [18:10:35] (03PS1) 10Htriedman: T354456: 18 March 2024 update of ruwiki redacted pages [deployment-charts] - 10https://gerrit.wikimedia.org/r/1012418 [18:11:17] (03CR) 10RhinosF1: [C:04-1] "see task" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 (https://phabricator.wikimedia.org/T360357) (owner: 10RhinosF1) [18:12:59] jouncebot: nowandnext [18:12:59] No deployments scheduled for the next 1 hour(s) and 47 minute(s) [18:12:59] In 1 hour(s) and 47 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T2000) [18:13:35] 06SRE, 06collaboration-services, 10Stewards-Onboarding-Tool, 10Wikimedia-Mailing-lists: stewards1001 / stewards2001: Enable API access for Mailman3 - https://phabricator.wikimedia.org/T351202#9639341 (10Dzahn) Specifically the `mailman-wrapper syncmembers` command seems like the right thing for this: `... 
[18:15:54] !log fabfur@cumin1002 conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [18:16:06] !log fabfur@cumin1002 conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [18:23:28] (03CR) 10RhinosF1: Throttle: add event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 (https://phabricator.wikimedia.org/T360357) (owner: 10RhinosF1) [18:54:59] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:55:06] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:02:31] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:05:00] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:05:07] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:06:08] 06SRE, 10Wikimedia-Mailing-lists: Mailing list request for Igbo Wikimedians - https://phabricator.wikimedia.org/T360350#9639463 (10Ladsgroup) Is it a recognized UG (by AffCom)? 
[19:09:20] (03PS1) 10Jdlrobson: [phase 4] Projects with < 50 user scripts no longer share skin scripts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012425 (https://phabricator.wikimedia.org/T301212) [19:19:51] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:19:58] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:21:58] (SystemdUnitFailed) firing: (3) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:39:46] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:39:52] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:43:40] (03PS1) 10Ahmon Dancy: static.php: Handle COPYING and CREDITS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012427 (https://phabricator.wikimedia.org/T359643) [19:44:26] (03CR) 10CI reject: [V:04-1] static.php: Handle COPYING and CREDITS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012427 (https://phabricator.wikimedia.org/T359643) (owner: 10Ahmon Dancy) [19:46:58] (SystemdUnitFailed) firing: (3) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:47:25] (SystemdUnitFailed) firing: (2) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:49:49] (03PS2) 10Ahmon Dancy: static.php: Handle COPYING and CREDITS files [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/1012427 (https://phabricator.wikimedia.org/T359643) [19:52:25] (SystemdUnitFailed) firing: (2) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:52:46] 06SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for Katie Coleman - https://phabricator.wikimedia.org/T360367 (10KColeman-WMF) 03NEW [20:00:04] RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T2000). Please do the needful. [20:00:04] RhinosF1 and Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:31] I can deploy today [20:00:45] Jdlrobson: RhinosF1: are you around? 
[20:00:59] urbanecm: ye [20:01:04] yay [20:01:04] Mine can't be tested [20:01:15] (03CR) 10Urbanecm: [C:03+2] Throttle: add event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 (https://phabricator.wikimedia.org/T360357) (owner: 10RhinosF1) [20:01:19] yup yup, i know [20:02:27] o/ present [20:02:34] welcome :) [20:02:42] (03CR) 10Urbanecm: [C:03+2] [phase 4] Projects with < 50 user scripts no longer share skin scripts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012425 (https://phabricator.wikimedia.org/T301212) (owner: 10Jdlrobson) [20:02:58] (03Merged) 10jenkins-bot: Throttle: add event [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1011454 (https://phabricator.wikimedia.org/T360357) (owner: 10RhinosF1) [20:04:41] (03Merged) 10jenkins-bot: [phase 4] Projects with < 50 user scripts no longer share skin scripts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1012425 (https://phabricator.wikimedia.org/T301212) (owner: 10Jdlrobson) [20:05:39] !log urbanecm@deploy2002 Started scap: Backport for [[gerrit:1011454|Throttle: add event (T360357)]], [[gerrit:1012425|[phase 4] Projects with < 50 user scripts no longer share skin scripts (T301212)]] [20:05:46] T360357: Lift IP cap on 2024-04-05 for Editathon for commonswiki, eswiki and Wikidata - https://phabricator.wikimedia.org/T360357 [20:05:46] T301212: Vector-2022.js should no longer load legacy Vector site and user scripts/styles - https://phabricator.wikimedia.org/T301212 [20:07:28] 06SRE, 10Wikimedia-Mailing-lists: Mailing list request for Igbo Wikimedians - https://phabricator.wikimedia.org/T360350#9639766 (10Tochiprecious) >>! In T360350#9639463, @Ladsgroup wrote: > Is it a recognized UG (by AffCom)? Yes, it has been a recognized UG by AffCom since May 2018. 
[20:08:00] !log urbanecm@deploy2002 rhinosf1 and urbanecm and jdlrobson: Backport for [[gerrit:1011454|Throttle: add event (T360357)]], [[gerrit:1012425|[phase 4] Projects with < 50 user scripts no longer share skin scripts (T301212)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[20:08:10] Jdlrobson: can you test the change, please?
[20:08:51] urbanecm: on it
[20:10:01] urbanecm: lgtm please sync
[20:10:06] !log urbanecm@deploy2002 rhinosf1 and urbanecm and jdlrobson: Continuing with sync
[20:10:11] proceeding
[20:17:45] (PS1) Ahmon Dancy: Route /w/CREDITS and /w/COPYING to /w/static.php [puppet] - https://gerrit.wikimedia.org/r/1012439 (https://phabricator.wikimedia.org/T359643)
[20:19:34] (PS3) Ahmon Dancy: static.php: Handle COPYING and CREDITS files [mediawiki-config] - https://gerrit.wikimedia.org/r/1012427 (https://phabricator.wikimedia.org/T359643)
[20:19:42] (CR) CI reject: [V:-1] Route /w/CREDITS and /w/COPYING to /w/static.php [puppet] - https://gerrit.wikimedia.org/r/1012439 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[20:22:05] !log urbanecm@deploy2002 Finished scap: Backport for [[gerrit:1011454|Throttle: add event (T360357)]], [[gerrit:1012425|[phase 4] Projects with < 50 user scripts no longer share skin scripts (T301212)]] (duration: 16m 26s)
[20:22:11] T360357: Lift IP cap on 2024-04-05 for Editathon for commonswiki, eswiki and Wikidata - https://phabricator.wikimedia.org/T360357
[20:22:11] T301212: Vector-2022.js should no longer load legacy Vector site and user scripts/styles - https://phabricator.wikimedia.org/T301212
[20:22:18] Jdlrobson: RhinosF1: should be live
[20:22:20] anything else?
[20:22:35] urbanecm: nope, thanks for the magic
[20:22:41] any time :)
[20:54:54] (PS4) Krinkle: static.php: Handle COPYING and CREDITS files [mediawiki-config] - https://gerrit.wikimedia.org/r/1012427 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[20:55:01] (CR) Krinkle: [C:+1] "LGTM, feel free to deploy anytime!" [mediawiki-config] - https://gerrit.wikimedia.org/r/1012427 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[20:56:53] (CR) Krinkle: Route /w/CREDITS and /w/COPYING to /w/static.php (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1012439 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[21:00:04] Reedy, sbassett, Maryum, and manfredi: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240318T2100).
[21:07:10] thank urbanecm
[21:39:09] (PS2) Ahmon Dancy: Route /w/CREDITS and /w/COPYING to /w/static.php [puppet] - https://gerrit.wikimedia.org/r/1012439 (https://phabricator.wikimedia.org/T359643)
[21:41:44] (CR) Ahmon Dancy: Route /w/CREDITS and /w/COPYING to /w/static.php (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1012439 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[21:43:17] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[21:43:24] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[21:55:14] (PS1) Jdlrobson: Enable night mode on pilot wikis in AMC mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1012452 (https://phabricator.wikimedia.org/T359152)
[21:55:22] (CR) CI reject: [V:-1] Enable night mode on pilot wikis in AMC mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1012452 (https://phabricator.wikimedia.org/T359152) (owner: Jdlrobson)
[21:55:28] (PS2) Jdlrobson: Enable night
mode on pilot wikis in AMC mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1012452 (https://phabricator.wikimedia.org/T359152)
[22:07:25] (SystemdUnitFailed) firing: (3) rsync-aptrepo-apt2001.wikimedia.org.service on apt1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:10:05] (PS1) Jdlrobson: The new class should be present alongside the old class for all page views [skins/MinervaNeue] (wmf/1.42.0-wmf.22) - https://gerrit.wikimedia.org/r/1011457 (https://phabricator.wikimedia.org/T359983)
[22:10:21] (PS3) Jdlrobson: Enable night mode on pilot wikis in AMC mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1012452 (https://phabricator.wikimedia.org/T359152)
[22:15:25] (PS1) Fabfur: benthos: added minor unit tests [puppet] - https://gerrit.wikimedia.org/r/1012453 (https://phabricator.wikimedia.org/T359626)
[22:15:37] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[22:15:44] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[22:17:10] !log urbanecm@deploy2002 Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 13m 27s)
[22:21:59] (SystemdUnitFailed) firing: (3) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:26:39] (CR) Krinkle: [C:+1] Route /w/CREDITS and /w/COPYING to /w/static.php [puppet] - https://gerrit.wikimedia.org/r/1012439 (https://phabricator.wikimedia.org/T359643) (owner: Ahmon Dancy)
[22:37:57] (Abandoned) Andrew Bogott: puppetserver: allow specifying certname in [server] conf [puppet] - https://gerrit.wikimedia.org/r/1012414
(https://phabricator.wikimedia.org/T351450) (owner: Andrew Bogott)
[22:38:31] (CR) Andrew Bogott: [C:+2] wmf_sink: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455) (owner: Andrew Bogott)
[22:39:51] (PS25) Andrew Bogott: wmf_sink: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007445 (https://phabricator.wikimedia.org/T351455)
[22:39:51] (PS26) Andrew Bogott: wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455)
[22:41:01] (CR) Andrew Bogott: [C:+2] wmcs-puppetcertleaks: Use puppet7 syntax [puppet] - https://gerrit.wikimedia.org/r/1007444 (https://phabricator.wikimedia.org/T351455) (owner: Andrew Bogott)
[22:46:59] (SystemdUnitFailed) firing: (3) netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:02:31] (JobUnavailable) firing: (2) Reduced availability for job ldap in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:19:04] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:19:11] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:49:53] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:50:00] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[23:56:44] !log @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[23:56:50] !log @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply