[00:00:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70072 and previous config saved to /var/cache/conftool/dbconfig/20241016-000057-ladsgroup.json
[00:09:25] FIRING: [7x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:13:42] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1080412 (owner: TrainBranchBot)
[00:16:05] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70073 and previous config saved to /var/cache/conftool/dbconfig/20241016-001604-ladsgroup.json
[00:16:09] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
[00:16:23] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
[00:16:30] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70074 and previous config saved to /var/cache/conftool/dbconfig/20241016-001629-ladsgroup.json
[00:24:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70075 and previous config saved to /var/cache/conftool/dbconfig/20241016-002446-ladsgroup.json
[00:39:54] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70076 and previous config saved to
/var/cache/conftool/dbconfig/20241016-003953-ladsgroup.json
[00:55:04] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70077 and previous config saved to /var/cache/conftool/dbconfig/20241016-005500-ladsgroup.json
[01:10:11] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70078 and previous config saved to /var/cache/conftool/dbconfig/20241016-011010-ladsgroup.json
[01:10:15] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
[01:10:29] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
[01:10:36] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70079 and previous config saved to /var/cache/conftool/dbconfig/20241016-011036-ladsgroup.json
[01:17:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70080 and previous config saved to /var/cache/conftool/dbconfig/20241016-011747-ladsgroup.json
[01:28:27] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70081 and previous config saved to /var/cache/conftool/dbconfig/20241016-012826-ladsgroup.json
[01:28:31] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[01:32:55] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70082 and previous config saved to /var/cache/conftool/dbconfig/20241016-013254-ladsgroup.json
[01:42:03] (CR) RLazarus: mediawiki:
add mw.name helper (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1079443 (owner: Giuseppe Lavagetto)
[01:43:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70083 and previous config saved to /var/cache/conftool/dbconfig/20241016-014333-ladsgroup.json
[01:48:01] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70084 and previous config saved to /var/cache/conftool/dbconfig/20241016-014801-ladsgroup.json
[01:58:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70085 and previous config saved to /var/cache/conftool/dbconfig/20241016-015840-ladsgroup.json
[02:03:08] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70086 and previous config saved to /var/cache/conftool/dbconfig/20241016-020308-ladsgroup.json
[02:03:13] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[02:03:27] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
[02:03:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70087 and previous config saved to /var/cache/conftool/dbconfig/20241016-020333-ladsgroup.json
[02:09:36] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232191 (phaultfinder)
[02:10:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70088 and previous config saved to
/var/cache/conftool/dbconfig/20241016-021047-ladsgroup.json
[02:13:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70089 and previous config saved to /var/cache/conftool/dbconfig/20241016-021347-ladsgroup.json
[02:13:49] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
[02:13:51] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[02:13:52] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
[02:13:59] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70090 and previous config saved to /var/cache/conftool/dbconfig/20241016-021358-ladsgroup.json
[02:25:55] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70091 and previous config saved to /var/cache/conftool/dbconfig/20241016-022554-ladsgroup.json
[02:34:44] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232196 (phaultfinder)
[02:37:13] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:41:02] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70092 and previous config saved to /var/cache/conftool/dbconfig/20241016-024101-ladsgroup.json
[02:56:08] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to
https://phabricator.wikimedia.org/P70093 and previous config saved to /var/cache/conftool/dbconfig/20241016-025608-ladsgroup.json
[02:56:13] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[02:56:27] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
[02:56:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70094 and previous config saved to /var/cache/conftool/dbconfig/20241016-025633-ladsgroup.json
[03:02:13] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:03:45] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70095 and previous config saved to /var/cache/conftool/dbconfig/20241016-030345-ladsgroup.json
[03:14:53] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232319 (phaultfinder)
[03:18:53] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70096 and previous config saved to /var/cache/conftool/dbconfig/20241016-031852-ladsgroup.json
[03:34:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70097 and previous config saved to /var/cache/conftool/dbconfig/20241016-033400-ladsgroup.json
[03:34:52] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232356 (phaultfinder)
[03:49:07] !log ladsgroup@cumin1002 dbctl commit
(dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70098 and previous config saved to /var/cache/conftool/dbconfig/20241016-034907-ladsgroup.json
[03:49:12] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
[03:49:25] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
[03:49:32] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70099 and previous config saved to /var/cache/conftool/dbconfig/20241016-034932-ladsgroup.json
[03:52:14] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70100 and previous config saved to /var/cache/conftool/dbconfig/20241016-035214-ladsgroup.json
[03:52:18] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[03:56:43] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70101 and previous config saved to /var/cache/conftool/dbconfig/20241016-035643-ladsgroup.json
[04:01:15] !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[04:05:41] !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
[04:05:46] !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
[04:05:47] !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[04:07:21] !log ladsgroup@cumin1002 dbctl commit (dc=all):
'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70102 and previous config saved to /var/cache/conftool/dbconfig/20241016-040721-ladsgroup.json
[04:09:25] FIRING: [7x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:09:37] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232374 (phaultfinder)
[04:10:44] (CR) Hamish: Configure ContactPage and IPBE contact form on zhwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1072876 (https://phabricator.wikimedia.org/T359998) (owner: Hamish)
[04:11:50] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70103 and previous config saved to /var/cache/conftool/dbconfig/20241016-041150-ladsgroup.json
[04:18:01] !log pt1979@cumin2002 START - Cookbook sre.dns.netbox
[04:21:53] !log pt1979@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
[04:21:57] !log pt1979@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
[04:21:58] !log pt1979@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[04:22:28] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70104 and previous config saved to /var/cache/conftool/dbconfig/20241016-042227-ladsgroup.json
[04:26:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after
maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70105 and previous config saved to /var/cache/conftool/dbconfig/20241016-042657-ladsgroup.json
[04:37:35] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70106 and previous config saved to /var/cache/conftool/dbconfig/20241016-043734-ladsgroup.json
[04:37:37] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
[04:37:39] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[04:37:50] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
[04:37:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70107 and previous config saved to /var/cache/conftool/dbconfig/20241016-043757-ladsgroup.json
[04:42:04] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70108 and previous config saved to /var/cache/conftool/dbconfig/20241016-044204-ladsgroup.json
[04:42:09] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
[04:42:11] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
[04:46:37] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[04:46:51] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
[04:46:58] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1247 (T376905)',
diff saved to https://phabricator.wikimedia.org/P70109 and previous config saved to /var/cache/conftool/dbconfig/20241016-044657-ladsgroup.json
[04:53:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70110 and previous config saved to /var/cache/conftool/dbconfig/20241016-045356-ladsgroup.json
[05:09:04] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70111 and previous config saved to /var/cache/conftool/dbconfig/20241016-050904-ladsgroup.json
[05:09:43] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232406 (phaultfinder)
[05:19:04] (PS1) Hamish: zhwiki: Revise contact page deprecated usage [mediawiki-config] - https://gerrit.wikimedia.org/r/1080429
[05:19:14] (Abandoned) Hamish: zhwiki: Revise contact page field usage [mediawiki-config] - https://gerrit.wikimedia.org/r/1078773 (owner: Hamish)
[05:24:11] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70112 and previous config saved to /var/cache/conftool/dbconfig/20241016-052411-ladsgroup.json
[05:35:09] (PS1) Santhosh: Update cxserver to 2024-10-15-033213-production [deployment-charts] - https://gerrit.wikimedia.org/r/1080431 (https://phabricator.wikimedia.org/T357950)
[05:36:00] (CR) CI reject: [V:-1] Update cxserver to 2024-10-15-033213-production [deployment-charts] - https://gerrit.wikimedia.org/r/1080431 (https://phabricator.wikimedia.org/T357950) (owner: Santhosh)
[05:38:04] (PS2) Santhosh: Update cxserver to 2024-10-15-033213-production [deployment-charts] - https://gerrit.wikimedia.org/r/1080431 (https://phabricator.wikimedia.org/T357950)
[05:39:19] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff
saved to https://phabricator.wikimedia.org/P70113 and previous config saved to /var/cache/conftool/dbconfig/20241016-053918-ladsgroup.json
[05:39:23] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[05:39:37] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
[05:39:44] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70114 and previous config saved to /var/cache/conftool/dbconfig/20241016-053943-ladsgroup.json
[05:44:50] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232418 (phaultfinder)
[05:45:45] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70115 and previous config saved to /var/cache/conftool/dbconfig/20241016-054544-ladsgroup.json
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T0600)
[06:00:52] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70116 and previous config saved to /var/cache/conftool/dbconfig/20241016-060051-ladsgroup.json
[06:04:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:07:15] FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid (k8s) 955.7ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded -
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:09:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1006:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:09:44] ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232425 (phaultfinder)
[06:12:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid (k8s) 880.1ms - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[06:14:02] (CR) Brouberol: "I'm not well versed with promql, at least clearly less than you are, given you wrote these queries :).
My personal feeling is that they co" [alerts] - https://gerrit.wikimedia.org/r/1077986 (https://phabricator.wikimedia.org/T370153) (owner: Tiziano Fogli)
[06:16:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70117 and previous config saved to /var/cache/conftool/dbconfig/20241016-061558-ladsgroup.json
[06:17:03] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70118 and previous config saved to /var/cache/conftool/dbconfig/20241016-061703-ladsgroup.json
[06:17:07] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[06:31:07] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70119 and previous config saved to /var/cache/conftool/dbconfig/20241016-063107-ladsgroup.json
[06:31:12] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[06:31:26] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
[06:31:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70120 and previous config saved to /var/cache/conftool/dbconfig/20241016-063132-ladsgroup.json
[06:32:10] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70121 and previous config saved to /var/cache/conftool/dbconfig/20241016-063210-ladsgroup.json
[06:39:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70122 and previous config saved to
/var/cache/conftool/dbconfig/20241016-063940-ladsgroup.json
[06:47:17] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70123 and previous config saved to /var/cache/conftool/dbconfig/20241016-064717-ladsgroup.json
[06:52:22] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, October 16 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - https://gerrit.wikimedia.org/r/1080429 (owner: Hamish)
[06:54:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70124 and previous config saved to /var/cache/conftool/dbconfig/20241016-065447-ladsgroup.json
[07:00:05] Amir1 and Urbanecm: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T0700). nyaa~
[07:00:05] Hamishcz: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:22] im here :)
[07:02:24] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70125 and previous config saved to /var/cache/conftool/dbconfig/20241016-070224-ladsgroup.json
[07:02:26] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
[07:02:28] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742
[07:02:34] (CR) KartikMistry: [C:+2] "Deploying this in staging to test. We can update patch if anything breaks there."
[deployment-charts] - https://gerrit.wikimedia.org/r/1080431 (https://phabricator.wikimedia.org/T357950) (owner: Santhosh)
[07:02:40] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
[07:02:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70126 and previous config saved to /var/cache/conftool/dbconfig/20241016-070246-ladsgroup.json
[07:03:47] (Merged) jenkins-bot: Update cxserver to 2024-10-15-033213-production [deployment-charts] - https://gerrit.wikimedia.org/r/1080431 (https://phabricator.wikimedia.org/T357950) (owner: Santhosh)
[07:05:11] Hamishcz: is your patch being deployed?
[07:05:49] not really
[07:05:56] (CR) Volans: "reply inline" [software/spicerack] - https://gerrit.wikimedia.org/r/1077661 (owner: Ayounsi)
[07:06:48] OK. I can quick deploy cxserver then.
[07:08:45] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply
[07:09:10] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[07:09:54] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70127 and previous config saved to /var/cache/conftool/dbconfig/20241016-070954-ladsgroup.json
[07:10:31] Hamishcz: I can deploy the config patch once you're done with cxserver, kart_ .
[07:12:03] alright, should i schedule the patch to next window? if this is better for you
[07:14:40] Hamishcz: we can still try for this window, if you're okay with sticking around...
[07:15:07] (CR) Awight: [C:+1] "Thanks, looks right to me!"
[mediawiki-config] - https://gerrit.wikimedia.org/r/1080429 (owner: Hamish)
[07:16:31] yea im ok w/ that, just no more than 2 hrs lol
[07:16:55] cuz dinner is more important than the patch
[07:19:56] +1 :-p
[07:20:15] kart_: please ping me when I can take over deployment
[07:23:27] awight: sorry. Done.
[07:25:01] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70128 and previous config saved to /var/cache/conftool/dbconfig/20241016-072501-ladsgroup.json
[07:28:49] ack!
[07:29:58] (CR) TrainBranchBot: [C:+2] "Approved by awight@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1080429 (owner: Hamish)
[07:30:45] (Merged) jenkins-bot: zhwiki: Revise contact page deprecated usage [mediawiki-config] - https://gerrit.wikimedia.org/r/1080429 (owner: Hamish)
[07:31:11] (PS2) Volans: Fix issues reported by pylint >3 [software/spicerack] - https://gerrit.wikimedia.org/r/1078663
[07:31:39] !log awight@deploy2002 Started scap sync-world: Backport for [[gerrit:1080429|zhwiki: Revise contact page deprecated usage]]
[07:32:12] Hamishcz: is it correct that the contact page is currently broken, before the patch? https://zh.wikipedia.org/wiki/Special:%E8%81%94%E7%B3%BB
[07:33:18] yes, we have note configured this page
[07:33:26] not*
[07:34:03] !log awight@deploy2002 awight, hamishz: Backport for [[gerrit:1080429|zhwiki: Revise contact page deprecated usage]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[07:34:42] Hamishcz: patch is ready to test on k8s-mwdebug
[07:35:28] awight: tested and LGTM, thanks
[07:35:48] ack
[07:35:50] !log awight@deploy2002 awight, hamishz: Continuing with sync
[07:36:23] Hamishcz: Just out of curiosity, can you send a link to the working page? Special:Contact still fails for me, even with the patch.
[07:36:48] https://zh.wikipedia.org/wiki/Special:Contact/IPBE
[07:37:01] (CR) Stevemunene: [C:+1] "Looks good" [deployment-charts] - https://gerrit.wikimedia.org/r/1080039 (https://phabricator.wikimedia.org/T371874) (owner: Brouberol)
[07:37:21] excellent, thanks!
[07:37:45] (and TIL...)
[07:37:47] (CR) Stevemunene: [C:+1] airflow: monitor the availability of the deployments [alerts] - https://gerrit.wikimedia.org/r/1080219 (https://phabricator.wikimedia.org/T377178) (owner: Brouberol)
[07:38:34] (CR) Ayounsi: [C:+2] Netbox: better logging for scripts import [puppet] - https://gerrit.wikimedia.org/r/1080240 (owner: Ayounsi)
[07:39:24] awight: thank you for handling the config change :]
[07:39:40] no problem, if you look into enwp.org/Special:Contact, it is broken as well
[07:39:48] as the real contact form is on other page
[07:40:46] !log awight@deploy2002 Finished scap sync-world: Backport for [[gerrit:1080429|zhwiki: Revise contact page deprecated usage]] (duration: 09m 07s)
[07:40:47] Interesting, yeah and the footer links to an unrelated page Wikipedia:Contact
[07:41:00] enwp.org/Special:Contact/arbcom-blockappeal, if my memory is correct
[07:41:02] !log UTC morning deployments done
[07:41:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:10] ah yes it does work
[07:42:22] hashar: happily! This deployment window will be much better for my schedule
[07:42:29] where is the footer link exactly?
i'm curious about it
[07:42:58] I can see a link to another contact page, on zhwiki but no link to Special:Contact
[07:47:15] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.001s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[07:52:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid (k8s) 1.001s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[07:56:34] ops-eqiad, SRE, DC-Ops: Repurposing 2x Decommissioned Servers for Phasing Out Puppet 5 - https://phabricator.wikimedia.org/T375000#10232583 (VRiley-WMF) Open→Resolved After speaking with Moritz at length on this, we have set aside 2 servers in the event that these need to be replaced. Clo...
[07:58:14] (CR) Brouberol: [C:+2] airflow: monitor the availability of the deployments [alerts] - https://gerrit.wikimedia.org/r/1080219 (https://phabricator.wikimedia.org/T377178) (owner: Brouberol)
[07:58:21] (CR) Brouberol: [C:+2] flink-operator: deply an image with fixes for recent OpenJDK vulns [deployment-charts] - https://gerrit.wikimedia.org/r/1080039 (https://phabricator.wikimedia.org/T371874) (owner: Brouberol)
[07:59:30] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[08:00:08] jeena and andre: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot).
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T0800). [08:00:17] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. [08:01:27] !log brouberol@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'. [08:01:51] (03PS1) 10Elukey: sre.hosts.provision: first refactor with vendor-specific classes [cookbooks] - 10https://gerrit.wikimedia.org/r/1080456 (https://phabricator.wikimedia.org/T365372) [08:02:25] !log brouberol@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'. [08:03:23] (03PS1) 10Michael Große: fix(growthexperiments.pp): correct order of arguments for mwscript [puppet] - 10https://gerrit.wikimedia.org/r/1080453 (https://phabricator.wikimedia.org/T372337) [08:03:33] 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission scandium - https://phabricator.wikimedia.org/T376632#10232606 (10VRiley-WMF) a:03VRiley-WMF [08:03:52] !log brouberol@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'. [08:04:21] !log brouberol@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'. [08:04:31] (03PS2) 10Elukey: sre.hosts.provision: first refactor with vendor-specific classes [cookbooks] - 10https://gerrit.wikimedia.org/r/1080456 (https://phabricator.wikimedia.org/T365372) [08:05:00] 10ops-eqiad, 06SRE, 06DC-Ops, 10decommission-hardware: decommission scandium - https://phabricator.wikimedia.org/T376632#10232607 (10VRiley-WMF) 05Open→03Resolved [08:05:15] !log brouberol@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'. 
[08:07:04] !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [08:07:17] !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [08:09:25] FIRING: [7x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:13:21] 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T376235#10232636 (10VRiley-WMF) 05Open→03Resolved a:03VRiley-WMF Multiple attempts to rebalance power. We may not be able to rebalance it until a server is decomissioned or even moved out of the rack. We may nee... [08:18:31] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Disk (sdv) failed on ms-be1065 - https://phabricator.wikimedia.org/T376775#10232669 (10MatthewVernon) 05Resolved→03Open @Jclark-ctr please do replace the disk at your earliest convenience - the server is ready for the disk swap. [I've reopened this ta... [08:28:11] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10232705 (10MatthewVernon) @Papaul the problem looks to be that the kernel can't see any of the spinning disks on this system, only the two SSDs: ` [ 3... 
[08:29:44] (03CR) 10JMeybohm: [C:03+2] containerd: Remove container log line length limit [puppet] - 10https://gerrit.wikimedia.org/r/1080071 (https://phabricator.wikimedia.org/T377132) (owner: 10JMeybohm) [08:29:48] (03CR) 10JMeybohm: [C:03+2] wikikube: Prepare clusters for containerd workers [puppet] - 10https://gerrit.wikimedia.org/r/1079970 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm) [08:32:59] (03PS30) 10Arnaudb: mariadb: pii cleaner cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) [08:34:08] (03PS9) 10Arnaudb: mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) [08:34:31] 10SRE-swift-storage, 06Commons, 10MediaWiki-File-management, 06Traffic: Commons' file is inaccessible for some users - https://phabricator.wikimedia.org/T377202#10232734 (10MatthewVernon) It's not in general a consistent problem, no - mostly MW manages to successfully upload two copies (one to eqiad, one t... 
[08:36:31] (03CR) 10Ayounsi: [C:03+2] re-image: ask user about migrating to per-rack vlan/IP [cookbooks] - 10https://gerrit.wikimedia.org/r/1080012 (owner: 10Ayounsi) [08:36:31] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [08:36:36] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70129 and previous config saved to /var/cache/conftool/dbconfig/20241016-083636-ladsgroup.json [08:36:40] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742 [08:36:45] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [08:36:52] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70130 and previous config saved to /var/cache/conftool/dbconfig/20241016-083651-ladsgroup.json [08:43:12] (03CR) 10CI reject: [V:04-1] mysql_legacy: double quote escape in run_query [software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) (owner: 10Arnaudb) [08:46:27] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70131 and previous config saved to /var/cache/conftool/dbconfig/20241016-084626-ladsgroup.json [08:47:52] (03PS1) 10JMeybohm: wikikube-staging: Migrate control planes to containerd [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) [08:48:08] (03PS2) 10JMeybohm: wikikube-staging: Migrate control planes to containerd [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) [08:48:23] (03CR) 10JMeybohm: "check experimental" [puppet] - 
10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm) [08:48:51] (03Merged) 10jenkins-bot: re-image: ask user about migrating to per-rack vlan/IP [cookbooks] - 10https://gerrit.wikimedia.org/r/1080012 (owner: 10Ayounsi) [08:50:06] (03PS1) 10Ilias Sarantopoulos: ml-services: enable mp for fiwiki-damaging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080555 (https://phabricator.wikimedia.org/T363336) [08:51:43] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70132 and previous config saved to /var/cache/conftool/dbconfig/20241016-085143-ladsgroup.json [08:53:26] (03CR) 10Kevin Bazira: [C:03+1] ml-services: enable mp for fiwiki-damaging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080555 (https://phabricator.wikimedia.org/T363336) (owner: 10Ilias Sarantopoulos) [08:54:00] (03CR) 10Kevin Bazira: [C:03+2] ml-services: enable mp for fiwiki-damaging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080555 (https://phabricator.wikimedia.org/T363336) (owner: 10Ilias Sarantopoulos) [08:54:17] 10SRE-swift-storage, 06Commons, 10MediaWiki-File-management, 06Traffic: Commons' file is inaccessible for some users - https://phabricator.wikimedia.org/T377202#10232792 (10TheDJ) Maybe the filerepo layer should actively confirm if the write happened to both servers, and kick of a retry job or something if... [08:55:02] (03Merged) 10jenkins-bot: ml-services: enable mp for fiwiki-damaging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080555 (https://phabricator.wikimedia.org/T363336) (owner: 10Ilias Sarantopoulos) [08:57:53] !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [08:59:27] (03CR) 10Ladsgroup: "it has a merge conflict, if you fix it. I'll do PCC and then merge it." 
[puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [08:59:40] (03CR) 10Volans: "I did a quick pass over the python bits, I didn't spend too much on it, I think it might be simplified a bit though. Couple of suggestions" [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) (owner: 10Cathal Mooney) [09:01:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70133 and previous config saved to /var/cache/conftool/dbconfig/20241016-090133-ladsgroup.json [09:02:30] (03CR) 10JMeybohm: [C:04-1] "Some suggestions:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079465 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto) [09:04:03] !log kevinbazira@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [09:05:05] (03CR) 10JMeybohm: [C:04-1] "Not sure if it makes sense, but would it be an option to change layout of the `configMaps` value to explicitly define the config map name," [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079465 (https://phabricator.wikimedia.org/T350793) (owner: 10Jelto) [09:06:25] (03Abandoned) 10Jelto: gitlab: set throttling policy to accept again [puppet] - 10https://gerrit.wikimedia.org/r/1073740 (https://phabricator.wikimedia.org/T366882) (owner: 10Jelto) [09:06:34] (03PS8) 10Kosta Harlan: dumps: Drop the globalblocks table dump [puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) [09:06:50] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70134 and previous config saved to /var/cache/conftool/dbconfig/20241016-090650-ladsgroup.json [09:07:09] (03CR) 10Kosta Harlan: "Done, thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [09:07:49] (03CR) 10Volans: "Nice! the logic to split them LGTM. I didn't check all the implementation for now (the moving parts). I think we could also rename the var" [cookbooks] - 10https://gerrit.wikimedia.org/r/1080456 (https://phabricator.wikimedia.org/T365372) (owner: 10Elukey) [09:11:38] (03PS2) 10Elukey: registry: expand the HTTP Accept headers [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/1078345 [09:12:11] (03PS3) 10JMeybohm: wikikube: Remove explicit container_runtime config [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) [09:12:23] (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm) [09:13:07] (03PS4) 10JMeybohm: wikikube: Remove explicit container_runtime config [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) [09:13:35] (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm) [09:15:03] (03CR) 10Elukey: "Quicklink: https://github.com/opencontainers/image-spec/blob/main/image-index.md" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/1078345 (owner: 10Elukey) [09:16:41] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70136 and previous config saved to /var/cache/conftool/dbconfig/20241016-091640-ladsgroup.json [09:21:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70137 and previous config saved to /var/cache/conftool/dbconfig/20241016-092157-ladsgroup.json [09:21:59] !log ladsgroup@cumin1002 START - Cookbook 
sre.hosts.downtime for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance [09:22:01] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742 [09:22:13] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance [09:22:20] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70138 and previous config saved to /var/cache/conftool/dbconfig/20241016-092219-ladsgroup.json [09:23:32] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "Can this be deployed at any time, or should the WikibaseMediaInfo (and WikibaseCirrusSearch?) change(s) be backported to the deployed bran" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080332 (https://phabricator.wikimedia.org/T377226) (owner: 10DCausse) [09:24:13] (03CR) 10Ladsgroup: mariadb: pii cleaner cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) (owner: 10Arnaudb) [09:29:06] (03CR) 10DCausse: "could be merged any time it won't break anything but might be cleaner to wait for the dependent patch to be deployed so that this data is " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080332 (https://phabricator.wikimedia.org/T377226) (owner: 10DCausse) [09:31:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70139 and previous config saved to /var/cache/conftool/dbconfig/20241016-093147-ladsgroup.json [09:31:54] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance [09:32:08] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance [09:33:13] (03PS31) 10Arnaudb: mariadb: pii 
cleaner cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) [09:33:30] (03CR) 10Arnaudb: mariadb: pii cleaner cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1080129 (https://phabricator.wikimedia.org/T366146) (owner: 10Arnaudb) [09:38:31] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [09:38:45] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [09:38:52] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70140 and previous config saved to /var/cache/conftool/dbconfig/20241016-093852-ladsgroup.json [09:40:14] (03CR) 10JMeybohm: [C:03+1] "While this changes the behavior of `get_tags()` I don't think that's an issue. Looking at the code I don't seen anything that relies on th" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/1078345 (owner: 10Elukey) [09:41:47] (03CR) 10JMeybohm: [C:03+2] wikikube: Remove explicit container_runtime config [puppet] - 10https://gerrit.wikimedia.org/r/1080554 (https://phabricator.wikimedia.org/T362408) (owner: 10JMeybohm) [09:43:13] (03PS1) 10Giuseppe Lavagetto: sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 [09:46:10] (03CR) 10Btullis: airflow: monitor the availability of the deployments (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1080219 (https://phabricator.wikimedia.org/T377178) (owner: 10Brouberol) [09:50:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70141 and previous config saved to /var/cache/conftool/dbconfig/20241016-095032-ladsgroup.json [09:52:30] (03PS12) 10Arnaudb: mysql_legacy: double quote escape in run_query 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/1078658 (https://phabricator.wikimedia.org/T376712) [09:58:13] (03CR) 10CI reject: [V:04-1] sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 (owner: 10Giuseppe Lavagetto) [10:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1000) [10:05:40] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70142 and previous config saved to /var/cache/conftool/dbconfig/20241016-100539-ladsgroup.json [10:07:09] (03CR) 10Ladsgroup: "This can be abandoned now?" [puppet] - 10https://gerrit.wikimedia.org/r/1013368 (https://phabricator.wikimedia.org/T347967) (owner: 10Cparle) [10:07:41] (03PS9) 10Ladsgroup: dumps: Drop the globalblocks table dump [puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [10:07:48] (03CR) 10Ladsgroup: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [10:08:06] (03PS3) 10Arnaudb: mariadb: get systemd status for instance [software/spicerack] - 10https://gerrit.wikimedia.org/r/1080019 (https://phabricator.wikimedia.org/T377129) [10:14:41] 10ops-eqiad, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T377317 (10phaultfinder) 03NEW [10:15:26] (03PS1) 10Hnowlan: Add chart for mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080583 (https://phabricator.wikimedia.org/T371701) [10:16:01] (03PS10) 10Ladsgroup: dumps: Drop the globalblocks table dump [puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [10:16:03] (03CR) 10Ladsgroup: [C:03+2] dumps: Drop the globalblocks table dump [puppet] - 10https://gerrit.wikimedia.org/r/1078901 
(https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [10:16:04] (03CR) 10Ladsgroup: [V:03+2 C:03+2] dumps: Drop the globalblocks table dump [puppet] - 10https://gerrit.wikimedia.org/r/1078901 (https://phabricator.wikimedia.org/T376726) (owner: 10Kosta Harlan) [10:16:18] (03CR) 10CI reject: [V:04-1] Add chart for mercurius [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080583 (https://phabricator.wikimedia.org/T371701) (owner: 10Hnowlan) [10:17:15] (03PS1) 10JMeybohm: role::etcd::v3::kubernetes is no more [puppet] - 10https://gerrit.wikimedia.org/r/1080584 (https://phabricator.wikimedia.org/T353464) [10:20:07] (03PS1) 10JMeybohm: kubernetes::master_stacked: Limit etcd access to localhost [puppet] - 10https://gerrit.wikimedia.org/r/1080585 (https://phabricator.wikimedia.org/T353464) [10:20:35] (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080585 (https://phabricator.wikimedia.org/T353464) (owner: 10JMeybohm) [10:20:47] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70143 and previous config saved to /var/cache/conftool/dbconfig/20241016-102046-ladsgroup.json [10:23:31] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "Thanks… I might backport it anyway in the next window if there’s nothing else to do – I wouldn’t mind getting this task resolved quicker :" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080332 (https://phabricator.wikimedia.org/T377226) (owner: 10DCausse) [10:27:33] (03CR) 10Elukey: "LGTM, should we also remove:" [puppet] - 10https://gerrit.wikimedia.org/r/1080584 (https://phabricator.wikimedia.org/T353464) (owner: 10JMeybohm) [10:28:56] (03CR) 10Elukey: [C:03+1] kubernetes::master_stacked: Limit etcd access to localhost [puppet] - 10https://gerrit.wikimedia.org/r/1080585 (https://phabricator.wikimedia.org/T353464) (owner: 10JMeybohm) [10:35:54] !log ladsgroup@cumin1002 dbctl commit (dc=all): 
'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70144 and previous config saved to /var/cache/conftool/dbconfig/20241016-103553-ladsgroup.json [10:35:59] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [10:36:13] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [10:36:20] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70145 and previous config saved to /var/cache/conftool/dbconfig/20241016-103620-ladsgroup.json [10:38:33] (03CR) 10DCausse: "no worries, I'll watch for the backports and try to schedule a deploy of this one asap, thanks for taking care of this cleanup! :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080332 (https://phabricator.wikimedia.org/T377226) (owner: 10DCausse) [10:47:42] (03PS9) 10Arnaudb: mariadb: get systemd status for instance [software/spicerack] - 10https://gerrit.wikimedia.org/r/1080019 (https://phabricator.wikimedia.org/T377129) [10:51:19] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70146 and previous config saved to /var/cache/conftool/dbconfig/20241016-105118-ladsgroup.json [10:51:23] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742 [11:00:05] mvolz: I, the Bot under the Fountain, call upon thee, The Deployer, to do Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1100). 
[11:06:26] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70147 and previous config saved to /var/cache/conftool/dbconfig/20241016-110625-ladsgroup.json [11:12:13] (03CR) 10Mvolz: [C:03+2] citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1066786 (owner: 10PipelineBot) [11:13:24] (03Merged) 10jenkins-bot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1066786 (owner: 10PipelineBot) [11:21:14] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply [11:21:33] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70148 and previous config saved to /var/cache/conftool/dbconfig/20241016-112132-ladsgroup.json [11:22:47] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply [11:23:46] (03CR) 10Sergio Gimeno: [C:03+1] Remove wgGEUseNewImpactModule config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075196 (https://phabricator.wikimedia.org/T350077) (owner: 10Cyndywikime) [11:25:50] !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply [11:26:22] !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply [11:28:19] !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply [11:29:07] !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply [11:33:48] (03PS1) 10Kosta Harlan: ProofreadPage: Remove pagequality permission override [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080621 (https://phabricator.wikimedia.org/T326940) [11:36:14] (03CR) 10Brouberol: [C:03+2] airflow: monitor the availability of the deployments (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1080219 (https://phabricator.wikimedia.org/T377178) (owner: 10Brouberol) [11:36:40] !log ladsgroup@cumin1002 dbctl 
commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70149 and previous config saved to /var/cache/conftool/dbconfig/20241016-113639-ladsgroup.json [11:36:42] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance [11:36:44] T371742: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742 [11:36:46] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70150 and previous config saved to /var/cache/conftool/dbconfig/20241016-113645-ladsgroup.json [11:36:56] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance [11:38:03] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, October 16 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075196 (https://phabricator.wikimedia.org/T350077) (owner: 10Cyndywikime) [11:38:18] (03PS7) 10Volans: sre.mysql.pool: add two new cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) [11:38:26] (03PS1) 10Brouberol: airflow: fix the hardcoded namespace [alerts] - 10https://gerrit.wikimedia.org/r/1080622 (https://phabricator.wikimedia.org/T377178) [11:38:37] (03CR) 10Volans: "Addressed comments. There are still a couple of comments pending reply from the reviewers." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [11:40:01] (03CR) 10CI reject: [V:04-1] airflow: fix the hardcoded namespace [alerts] - 10https://gerrit.wikimedia.org/r/1080622 (https://phabricator.wikimedia.org/T377178) (owner: 10Brouberol) [11:40:46] (03PS2) 10Brouberol: airflow: fix the hardcoded namespace [alerts] - 10https://gerrit.wikimedia.org/r/1080622 (https://phabricator.wikimedia.org/T377178) [11:45:44] Doing quick cxserver update on staging.. [11:46:54] (03PS1) 10KartikMistry: Update cxserver to 2024-10-16-114208-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080623 (https://phabricator.wikimedia.org/T357950) [11:51:53] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70152 and previous config saved to /var/cache/conftool/dbconfig/20241016-115152-ladsgroup.json [11:55:36] (03CR) 10Brouberol: [C:03+2] airflow: fix the hardcoded namespace [alerts] - 10https://gerrit.wikimedia.org/r/1080622 (https://phabricator.wikimedia.org/T377178) (owner: 10Brouberol) [11:58:19] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1066785 (owner: 10PipelineBot) [11:58:26] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1066773 (owner: 10PipelineBot) [11:58:33] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1066772 (owner: 10PipelineBot) [11:58:43] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1064024 (owner: 10PipelineBot) [11:58:52] (03Abandoned) 10Mvolz: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1064023 (owner: 10PipelineBot) [12:02:38] (03PS2) 10JMeybohm: role::etcd::v3::kubernetes is no more [puppet] - 
10https://gerrit.wikimedia.org/r/1080584 (https://phabricator.wikimedia.org/T353464) [12:02:38] (03PS2) 10JMeybohm: kubernetes::master_stacked: Limit etcd access to localhost [puppet] - 10https://gerrit.wikimedia.org/r/1080585 (https://phabricator.wikimedia.org/T353464) [12:04:13] (03CR) 10JMeybohm: [C:03+2] kubernetes::master_stacked: Limit etcd access to localhost [puppet] - 10https://gerrit.wikimedia.org/r/1080585 (https://phabricator.wikimedia.org/T353464) (owner: 10JMeybohm) [12:04:14] (03CR) 10KartikMistry: [C:03+2] Update cxserver to 2024-10-16-114208-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080623 (https://phabricator.wikimedia.org/T357950) (owner: 10KartikMistry) [12:04:43] (03CR) 10JMeybohm: [C:03+2] role::etcd::v3::kubernetes is no more [puppet] - 10https://gerrit.wikimedia.org/r/1080584 (https://phabricator.wikimedia.org/T353464) (owner: 10JMeybohm) [12:05:17] (03Merged) 10jenkins-bot: Update cxserver to 2024-10-16-114208-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080623 (https://phabricator.wikimedia.org/T357950) (owner: 10KartikMistry) [12:07:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70153 and previous config saved to /var/cache/conftool/dbconfig/20241016-120659-ladsgroup.json [12:09:25] FIRING: [7x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:14:51] !log kartik@deploy2002 helmfile [staging] START helmfile.d/services/cxserver: apply [12:15:15] !log kartik@deploy2002 helmfile [staging] DONE helmfile.d/services/cxserver: apply [12:16:44] Done ^ [12:22:06] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2147 
(T376905)', diff saved to https://phabricator.wikimedia.org/P70154 and previous config saved to /var/cache/conftool/dbconfig/20241016-122206-ladsgroup.json [12:22:13] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance [12:22:27] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance [12:22:28] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance [12:22:41] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance [12:22:49] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70155 and previous config saved to /var/cache/conftool/dbconfig/20241016-122248-ladsgroup.json [12:27:02] (03PS2) 10Giuseppe Lavagetto: sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 [12:27:44] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "While change I87b05cb9bb, which removes the config variable, was only merged yesterday (and will be part of next week’s train), the variab" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075196 (https://phabricator.wikimedia.org/T350077) (owner: 10Cyndywikime) [12:28:43] !log stevemunene@cumin1002 START - Cookbook sre.dns.netbox [12:29:06] (03PS8) 10Cathal Mooney: Authdns: add class to create zonefile snippets for K8s PTR delegation [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) [12:30:10] (03CR) 10Cathal Mooney: "Thanks @volans for the suggestions, new patchset uses them :)" [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) (owner: 10Cathal Mooney) [12:32:46] !log stevemunene@cumin1002 START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002" [12:32:50] !log stevemunene@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002" [12:32:51] !log stevemunene@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [12:33:10] !log Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration [12:33:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:26] (03PS9) 10Cathal Mooney: Authdns: add class to create zonefile snippets for K8s PTR delegation [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) [12:34:48] !log stevemunene@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host an-worker1176 [12:35:19] !log stevemunene@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1176 [12:35:44] !log stevemunene@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host an-worker1177 [12:35:57] !log stevemunene@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1177 [12:36:31] (03CR) 10Cathal Mooney: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) (owner: 10Cathal Mooney) [12:40:26] (03CR) 10CI reject: [V:04-1] sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 (owner: 10Giuseppe Lavagetto) [12:40:30] (03PS10) 10Cathal Mooney: Authdns: add class to create zonefile snippets for K8s PTR delegation [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) [12:42:36] (03CR) 
10Arnaudb: "replies containing the essence of the various conversation I had with @Ladsgroup@gmail.com and @rcoccioli@wikimedia.org about those topics" [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [12:43:03] !log stevemunene@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye [12:43:42] (03PS11) 10Cathal Mooney: Authdns: add class to create zonefile snippets for K8s PTR delegation [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) [12:43:54] (03PS1) 10Lucas Werkmeister (WMDE): Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) [12:43:59] (03PS1) 10Lucas Werkmeister (WMDE): Remove LabelCountField [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) [12:44:04] (03CR) 10Volans: "Thanks for clarifying the remaining comments. Then it's ready for a final review before starting to test it with real instances." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [12:44:22] (03PS1) 10Lucas Werkmeister (WMDE): Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) [12:44:56] (03PS1) 10Lucas Werkmeister (WMDE): Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080673 (https://phabricator.wikimedia.org/T377226) [12:45:02] (03CR) 10CI reject: [V:04-1] Remove LabelCountField [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [12:45:07] (03CR) 10CI reject: [V:04-1] Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [12:45:13] (03PS1) 10Lucas Werkmeister (WMDE): Remove LabelCountField [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 (https://phabricator.wikimedia.org/T377226) [12:45:25] (03PS1) 10Lucas Werkmeister (WMDE): Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) [12:45:37] (03PS3) 10Elukey: sre.hosts.provision: first refactor with vendor-specific classes [cookbooks] - 10https://gerrit.wikimedia.org/r/1080456 (https://phabricator.wikimedia.org/T365372) [12:46:02] !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [12:46:10] (03CR) 10Jcrespo: ":-( This is mysql_legacy, of course!" 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) (owner: 10Volans) [12:46:15] !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [12:46:35] !log elukey@cumin2002 START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL [12:47:23] !log elukey@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL [12:48:04] (03CR) 10Lucas Werkmeister (WMDE): "recheck" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [12:48:23] (03CR) 10Lucas Werkmeister (WMDE): "recheck" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [12:48:27] (03CR) 10Elukey: "Makes sense, I've renamed the methods too!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1080456 (https://phabricator.wikimedia.org/T365372) (owner: 10Elukey) [12:49:28] (03CR) 10Elukey: "Tested with:" [cookbooks] - 10https://gerrit.wikimedia.org/r/1080456 (https://phabricator.wikimedia.org/T365372) (owner: 10Elukey) [12:49:47] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, October 16 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [extensions/WikiLambda] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1078762 (owner: 10Jforrester) [12:51:08] (03CR) 10Jcrespo: "See the comment. Otherwise I will just test it thoroughly and merge it ASAP." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) (owner: 10Volans) [12:51:20] (03PS13) 10Ayounsi: redfish: add UEFI functions [software/spicerack] - 10https://gerrit.wikimedia.org/r/1077661 [12:51:57] (03CR) 10Ayounsi: redfish: add UEFI functions (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/1077661 (owner: 10Ayounsi) [12:52:08] (03PS3) 10Giuseppe Lavagetto: sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 [12:52:08] (03PS1) 10Giuseppe Lavagetto: tox: run the multitude of linters only on one python version [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 [12:52:19] !log stevemunene@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye [12:52:39] (03CR) 10CDanis: [C:03+1] tox: run the multitude of linters only on one python version [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [12:52:46] (03PS12) 10Cathal Mooney: Authdns: add class to create zonefile snippets for K8s PTR delegation [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) [12:54:24] (03CR) 10Jcrespo: sre.switchdc.databases.prepare: fix heartbeat (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) (owner: 10Volans) [12:55:10] (03CR) 10Cathal Mooney: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080276 (https://phabricator.wikimedia.org/T376291) (owner: 10Cathal Mooney) [12:58:26] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance [12:58:39] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance [12:59:37] (03CR) 10Elukey: tox: run the multitude of linters only on one python version (031 comment) [cookbooks] - 
10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:00:05] Lucas_WMDE, Urbanecm, awight, and TheresNoTime: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1300). [13:00:05] Cyndywikime, Lucas_WMDE, and James_F: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:11] o/ [13:00:14] o/ [13:01:18] (03CR) 10Giuseppe Lavagetto: "For the record, 6 minutes for CI for such a change is still absolutely ridiculous and unacceptable. We will need to revisit further." [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:01:31] “Update Z669x references to Z609x” sounds like an obscure pair of those IBM Z architectures ^^ [13:01:34] anyway, I can deploy! [13:01:39] Lucas_WMDE: Quite. :-) [13:01:55] Thanks! [13:02:00] Cyndywikime: are you there? [13:02:50] let’s start with James_F then [13:02:51] yes [13:02:52] I can oversee Cyndy's change [13:02:56] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikiLambda] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1078762 (owner: 10Jforrester) [13:03:02] ah, okay ^^ [13:03:03] oh, jit :) [13:03:05] then let’s do the config change first! 
[13:03:11] and CI for the backport can finish in the meantime [13:03:11] :), hi sorry, didn't hear the ping [13:03:22] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075196 (https://phabricator.wikimedia.org/T350077) (owner: 10Cyndywikime) [13:04:06] (03Merged) 10jenkins-bot: Remove wgGEUseNewImpactModule config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1075196 (https://phabricator.wikimedia.org/T350077) (owner: 10Cyndywikime) [13:04:28] maaany submodules being fetched by scap [13:04:42] lots of changes in extensions’ REL1_41 branches apparently ^^ [13:04:42] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1075196|Remove wgGEUseNewImpactModule config (T350077)]] [13:04:47] T350077: Drop support for the old Impact module - https://phabricator.wikimedia.org/T350077 [13:05:45] (03CR) 10CI reject: [V:04-1] Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:05:49] (03CR) 10Giuseppe Lavagetto: tox: run the multitude of linters only on one python version (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:05:52] (03PS2) 10Giuseppe Lavagetto: tox: run the multitude of linters only on one python version [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 [13:06:05] (03CR) 10CI reject: [V:04-1] Remove LabelCountField [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:06:12] noooo my backports :( [13:06:37] (03PS3) 10Volans: sre.switchdc.databases.prepare: add check [cookbooks] - 10https://gerrit.wikimedia.org/r/1074127 (https://phabricator.wikimedia.org/T371351) [13:06:38] (03PS3) 
10Volans: sre.switchdc.databases: update Phabricator more [cookbooks] - 10https://gerrit.wikimedia.org/r/1074128 (https://phabricator.wikimedia.org/T371351) [13:06:38] (03PS2) 10Volans: sre.switchdc.databases.prepare: fix heartbeat [cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) [13:06:38] (03PS3) 10Volans: sre.switchdc.databases: allow to select a section [cookbooks] - 10https://gerrit.wikimedia.org/r/1079537 (https://phabricator.wikimedia.org/T375144) [13:07:01] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, cyndywikime: Backport for [[gerrit:1075196|Remove wgGEUseNewImpactModule config (T350077)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:07:07] (03CR) 10Elukey: [C:03+1] tox: run the multitude of linters only on one python version [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:07:23] Cyndywikime: anything to test for your change? [13:07:34] (on WikimediaDebug, I mean) [13:07:54] (03CR) 10Volans: "addressed comment" [cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) (owner: 10Volans) [13:07:56] (03CR) 10Lucas Werkmeister (WMDE): "CI is failing due to T377197 >.<" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:08:06] minor test to see nothing breaks [13:08:22] (03PS1) 10Lucas Werkmeister (WMDE): Tests: Skip testViewForExistingGlobalTemporaryAccount [extensions/CentralAuth] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080702 (https://phabricator.wikimedia.org/T377197) [13:08:27] (03Merged) 10jenkins-bot: Update Z669x references to Z609x [extensions/WikiLambda] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1078762 (owner: 10Jforrester) [13:08:36] (03PS1) 10Lucas Werkmeister (WMDE): Tests: Skip 
testViewForExistingGlobalTemporaryAccount [extensions/CentralAuth] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080703 (https://phabricator.wikimedia.org/T377197) [13:08:52] (03PS2) 10Lucas Werkmeister (WMDE): Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) [13:08:57] (03PS2) 10Lucas Werkmeister (WMDE): Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080673 (https://phabricator.wikimedia.org/T377226) [13:09:09] sounds good [13:11:33] change looks good [13:11:36] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, cyndywikime: Continuing with sync [13:11:39] alright, thanks! [13:13:10] (03CR) 10Giuseppe Lavagetto: [C:03+2] tox: run the multitude of linters only on one python version [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:16:11] Thanks @Lucas_WMDE :) [13:16:17] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1075196|Remove wgGEUseNewImpactModule config (T350077)]] (duration: 11m 35s) [13:16:24] T350077: Drop support for the old Impact module - https://phabricator.wikimedia.org/T350077 [13:16:30] !log Started time limited scan on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration [13:16:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:37] 10ops-codfw, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: codfw:frack pfw3 and old fasw decommission - https://phabricator.wikimedia.org/T377254#10233448 (10Papaul) [13:16:38] np :) [13:16:40] -bash: scaap: command not found [13:16:41] heh [13:16:45] (03CR) 10CDanis: [C:03+1] sre.deploy: add cookbook to deploy hiddenparma (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 (owner: 10Giuseppe Lavagetto) [13:16:56] !log lucaswerkmeister-wmde@deploy2002 Started scap 
sync-world: Backport for [[gerrit:1078762|Update Z669x references to Z609x]] [13:19:03] (03CR) 10Kosta Harlan: [C:03+1] Tests: Skip testViewForExistingGlobalTemporaryAccount [extensions/CentralAuth] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080702 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:19:12] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, jforrester: Backport for [[gerrit:1078762|Update Z669x references to Z609x]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [13:19:13] (03CR) 10Kosta Harlan: [C:03+1] Tests: Skip testViewForExistingGlobalTemporaryAccount [extensions/CentralAuth] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080703 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:19:44] James_F: can you test the change? [13:19:47] Sure. [13:20:28] Yup, works. [13:20:43] (03Merged) 10jenkins-bot: tox: run the multitude of linters only on one python version [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:20:49] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, jforrester: Continuing with sync [13:20:52] nice, thanks! [13:21:03] Also, yay for Lexeme access on Wikifunctions. :-) [13:21:09] (03CR) 10Volans: "FWIW I have a plan to redo CI with other tools once I'm back in I/F but not sure when that will be prioritized, it will also need support " [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:22:30] do [13:22:32] * :o [13:23:15] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70156 and previous config saved to /var/cache/conftool/dbconfig/20241016-132314-ladsgroup.json [13:24:59] (03PS1) 10Jcrespo: Revert "cumin: Exclude test-* mariadb sections." 
[puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) [13:25:14] (03CR) 10CI reject: [V:04-1] Revert "cumin: Exclude test-* mariadb sections." [puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) (owner: 10Jcrespo) [13:25:18] (03PS8) 10Arnaudb: sre.mysql.pool: add two new cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [13:25:19] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1078762|Update Z669x references to Z609x]] (duration: 08m 23s) [13:25:27] alright [13:25:35] I’ll try to get my huge block of backports through then [13:25:49] it’s a trivial change in principle but it takes some doing… [13:26:30] (03CR) 10Arnaudb: "I've "upgraded" upgrade to use depool/pool so we can test it in good conditions, I'll have to upgrade a bunch of host so this will be perf" [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [13:26:52] (03CR) 10Volans: [C:03+1] "LGTM, thanks" [software/spicerack] - 10https://gerrit.wikimedia.org/r/1077661 (owner: 10Ayounsi) [13:27:14] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080702 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:27:15] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:27:16] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 
(https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:27:19] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:27:23] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080703 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:27:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080673 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:27:37] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:27:45] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:27:49] (03PS5) 10Giuseppe Lavagetto: sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 [13:27:53] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080702 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:27:58] buhhhh… [13:28:01] (03CR) 10TrainBranchBot: [C:03+2] 
"Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:28:07] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:28:15] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:28:19] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080703 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:28:27] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080673 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:28:32] apparently I needed to clear out stray V-1 votes manually [13:28:33] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:28:37] let’s see what happens now [13:28:41] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 
(https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:28:46] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080702 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:28:56] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:29:02] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:29:10] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:29:14] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/CentralAuth] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080703 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:29:22] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080673 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:29:28] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 
(https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:29:36] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:29:44] (03CR) 10Volans: [C:04-1] "Please restore the previous PS." [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [13:30:39] (03CR) 10Ayounsi: [C:03+2] redfish: add UEFI functions [software/spicerack] - 10https://gerrit.wikimedia.org/r/1077661 (owner: 10Ayounsi) [13:30:58] (03CR) 10CI reject: [V:04-1] sre.mysql.pool: add two new cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [13:31:00] (03PS1) 10Vgutierrez: liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) [13:31:28] (03PS2) 10Jcrespo: Revert "cumin: Exclude test-* mariadb sections." [puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) [13:32:58] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70157 and previous config saved to /var/cache/conftool/dbconfig/20241016-133257-ladsgroup.json [13:33:03] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856 [13:33:45] (03PS3) 10Jcrespo: Revert "cumin: Exclude test-* mariadb sections." 
[puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) [13:34:42] (03PS9) 10Volans: sre.mysql.pool: add two new cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) [13:35:23] (03CR) 10Giuseppe Lavagetto: sre.deploy: add cookbook to deploy hiddenparma (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 (owner: 10Giuseppe Lavagetto) [13:35:34] (03CR) 10Giuseppe Lavagetto: [C:03+2] sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 (owner: 10Giuseppe Lavagetto) [13:35:42] (03Merged) 10jenkins-bot: Tests: Skip testViewForExistingGlobalTemporaryAccount [extensions/CentralAuth] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080702 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:37:27] (03PS10) 10Brouberol: Define the ceph-csi-cephfs admin_ng helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/1077878 (https://phabricator.wikimedia.org/T376406) [13:38:22] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70158 and previous config saved to /var/cache/conftool/dbconfig/20241016-133821-ladsgroup.json [13:38:48] (03Merged) 10jenkins-bot: Tests: Skip testViewForExistingGlobalTemporaryAccount [extensions/CentralAuth] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080703 (https://phabricator.wikimedia.org/T377197) (owner: 10Lucas Werkmeister (WMDE)) [13:41:15] !log stevemunene@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye [13:41:16] (03PS1) 10Jcrespo: WIP: Rename tendril to db_inventory [puppet] - 10https://gerrit.wikimedia.org/r/1080711 (https://phabricator.wikimedia.org/T297605) [13:41:19] (03Merged) 10jenkins-bot: redfish: add UEFI functions [software/spicerack] - 
10https://gerrit.wikimedia.org/r/1077661 (owner: 10Ayounsi) [13:41:23] (03Merged) 10jenkins-bot: sre.deploy: add cookbook to deploy hiddenparma [cookbooks] - 10https://gerrit.wikimedia.org/r/1080571 (owner: 10Giuseppe Lavagetto) [13:42:15] jouncebot: next [13:42:15] In 0 hour(s) and 17 minute(s): Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1400) [13:42:18] hm. [13:42:25] not sure all my CI will finish before then tbh :/ [13:42:48] (03PS4) 10Jcrespo: Revert "cumin: Exclude test-* mariadb sections." [puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) [13:43:16] I’m especially concerned that, if I understand Zuul correctly, the last two changes haven’t even been enqueued for gate-and-submit-wmf yet [13:43:38] !log stevemunene@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye [13:43:58] (03CR) 10Jcrespo: "zarcillo dbs are still known as tendril internally, apparently. See: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1080711/" [puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) (owner: 10Jcrespo) [13:46:25] (03CR) 10Volans: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) (owner: 10Jcrespo) [13:46:30] (03Merged) 10jenkins-bot: Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080673 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:47:21] o_O why does gerrit say merge conflict on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/1080676 [13:47:57] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] "poke zuul, gate-and-submit pls?" 
[extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:47:59] (03CR) 10Lucas Werkmeister (WMDE): "poke zuul, gate-and-submit pls?" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:48:03] (03CR) 10Lucas Werkmeister (WMDE): [C:03+2] Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:48:05] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70159 and previous config saved to /var/cache/conftool/dbconfig/20241016-134805-ladsgroup.json [13:48:21] (03CR) 10Jcrespo: [C:03+2] Revert "cumin: Exclude test-* mariadb sections." 
[puppet] - 10https://gerrit.wikimedia.org/r/1080707 (https://phabricator.wikimedia.org/T374933) (owner: 10Jcrespo) [13:48:27] “Gerrit could not merge the change '1080676' as is and could require a rebase” [13:48:40] (03Merged) 10jenkins-bot: Hard-code LabelCountField::NAME [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080669 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:48:56] (03PS2) 10Lucas Werkmeister (WMDE): Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) [13:49:09] (03PS2) 10Lucas Werkmeister (WMDE): Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) [13:49:13] (03CR) 10Volans: "Done" [cookbooks] - 10https://gerrit.wikimedia.org/r/1077101 (https://phabricator.wikimedia.org/T374026) (owner: 10Volans) [13:49:17] o_O [13:49:21] idk why they needed rebasing [13:49:45] James_F: how much of your window are you going to need? 
[13:49:57] (03Merged) 10jenkins-bot: Remove LabelCountField [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:49:58] (03Merged) 10jenkins-bot: Remove LabelCountField [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:50:10] I think right now I can either try to backport everything I wanted to backport, and definitely run into the wikifunctions window [13:50:19] or only deploy what already merged, and then do the rest later [13:50:37] (I definitely need *some* scap because several of the backports were merged already and so they should be deployed too) [13:52:29] well, I’ll try backporting everything… let me know if I should abort that scap and do the smaller deployment instead [13:52:31] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080670 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:52:32] (03CR) 10TrainBranchBot: "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:52:32] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseCirrusSearch] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080674 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:52:33] Lucas_WMDE: Oh, sorry. None! 
[13:52:38] (03CR) 10TrainBranchBot: "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [13:52:39] Lucas_WMDE: Did a deploy last night instead. [13:52:39] oh, yay! [13:52:43] nice ^^ [13:53:00] Hence why I put the one MW-land patch into the regular window. Thank you for deploying! [13:53:07] np :) [13:53:29] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70160 and previous config saved to /var/cache/conftool/dbconfig/20241016-135328-ladsgroup.json [13:55:13] (03PS1) 10Elukey: tox: align behavior with what we use in Spicerack [cookbooks] - 10https://gerrit.wikimedia.org/r/1080716 [13:55:50] (03PS2) 10Elukey: tox: align behavior with what we use in Spicerack [cookbooks] - 10https://gerrit.wikimedia.org/r/1080716 [13:57:19] (03PS1) 10Jcrespo: mariadb: Remove test-pc1 as a valid section [puppet] - 10https://gerrit.wikimedia.org/r/1080717 (https://phabricator.wikimedia.org/T374933) [13:57:30] (03CR) 10Elukey: [C:03+1] tox: run the multitude of linters only on one python version (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1080687 (owner: 10Giuseppe Lavagetto) [13:59:02] (03CR) 10Jcrespo: "Not super urgent but FYI, was an unexpected addition after merging https://gerrit.wikimedia.org/r/1080707" [puppet] - 10https://gerrit.wikimedia.org/r/1080717 (https://phabricator.wikimedia.org/T374933) (owner: 10Jcrespo) [13:59:04] (03CR) 10Michael Große: [C:03+1] [Growth] beta: Lower batch size for reassignMenteesJob [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080223 (https://phabricator.wikimedia.org/T376124) (owner: 10Urbanecm) [13:59:38] 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T377317#10233633 (10phaultfinder) [13:59:45] (03PS2) 
10Jcrespo: mariadb: Remove test-pc1 as a valid section [puppet] - 10https://gerrit.wikimedia.org/r/1080717 (https://phabricator.wikimedia.org/T374933) [14:00:05] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1400) [14:00:12] * Lucas_WMDE still deploying [14:01:24] (03PS2) 10Vgutierrez: liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) [14:03:12] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70161 and previous config saved to /var/cache/conftool/dbconfig/20241016-140312-ladsgroup.json [14:05:02] (03CR) 10Bking: [C:03+1] rdf-streaming-updater: relax latency/unstability alerts [alerts] - 10https://gerrit.wikimedia.org/r/1079530 (owner: 10DCausse) [14:05:25] (03CR) 10DCausse: [C:03+2] rdf-streaming-updater: relax latency/unstability alerts [alerts] - 10https://gerrit.wikimedia.org/r/1079530 (owner: 10DCausse) [14:06:36] (03Merged) 10jenkins-bot: rdf-streaming-updater: relax latency/unstability alerts [alerts] - 10https://gerrit.wikimedia.org/r/1079530 (owner: 10DCausse) [14:08:36] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70162 and previous config saved to /var/cache/conftool/dbconfig/20241016-140835-ladsgroup.json [14:08:42] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance [14:08:56] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance [14:09:00] 06SRE-OnFire, 06Data-Persistence-SRE, 06DBA, 13Patch-For-Review, 07Sustainability: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication - 
https://phabricator.wikimedia.org/T375144#10233688 (10jcrespo) ` DRY-RUN: MASTER_TO db2230.codfw.wmnet Ignoring MASTER STATU... [14:09:02] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70163 and previous config saved to /var/cache/conftool/dbconfig/20241016-140902-ladsgroup.json [14:09:51] (03CR) 10Ladsgroup: [C:03+1] mariadb: Remove test-pc1 as a valid section [puppet] - 10https://gerrit.wikimedia.org/r/1080717 (https://phabricator.wikimedia.org/T374933) (owner: 10Jcrespo) [14:11:27] CI is almost done… [14:12:25] (03Merged) 10jenkins-bot: Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080671 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [14:12:27] (03Merged) 10jenkins-bot: Drop label_count field (LabelCountField) [extensions/WikibaseMediaInfo] (wmf/1.43.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1080676 (https://phabricator.wikimedia.org/T377226) (owner: 10Lucas Werkmeister (WMDE)) [14:12:31] wheeee [14:13:04] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1080702|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)]], [[gerrit:1080669|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080670|Remove LabelCountField (T377226)]], [[gerrit:1080671|Drop label_count field (LabelCountField) (T377226)]], [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197 [14:13:04] )]], [[gerrit:1080673|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080674|Remove LabelCountField (T377226)]], [[gerrit:1080676|Drop label_count field (LabelCountField) (T377226)]] [14:13:26] T377197: SpecialCentralAuthTest fails when run in a suite with AccountCreationDetailsLookupTest - https://phabricator.wikimedia.org/T377197 [14:13:27] T377226: Remove LabelCountField from WikibaseCirrusSearch - 
https://phabricator.wikimedia.org/T377226 [14:13:31] !log [cont.] )]], [[gerrit:1080673|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080674|Remove LabelCountField (T377226)]], [[gerrit:1080676|Drop label_count field (LabelCountField) (T377226)]] [14:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:48] (03CR) 10Bking: [V:03+2 C:03+2] wdqs: better filtering of monitoring queries [puppet] - 10https://gerrit.wikimedia.org/r/1079522 (owner: 10DCausse) [14:15:18] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for [[gerrit:1080702|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)]], [[gerrit:1080669|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080670|Remove LabelCountField (T377226)]], [[gerrit:1080671|Drop label_count field (LabelCountField) (T377226)]], [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)] [14:15:18] ], [[gerrit:1080673|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080674|Remove LabelCountField (T377226)]], [[gerrit:1080676|Drop label_count field (LabelCountField) (T377226)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [14:15:36] !log [cont.] 
], [[gerrit:1080673|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080674|Remove LabelCountField (T377226)]], [[gerrit:1080676|Drop label_count field (LabelCountField) (T377226)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [14:15:37] testing… [14:15:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:15] (03PS1) 10Giuseppe Lavagetto: Deploy bugfixes [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1080725 [14:16:27] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Deploy bugfixes [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1080725 (owner: 10Giuseppe Lavagetto) [14:16:33] when editing an existing item, the label_count, does *not* vanish from cirrusDump, which I think is expected [14:16:36] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance [14:16:49] (that’s why https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1080332 will be needed) [14:16:50] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance [14:16:57] !log oblivian@cumin1002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002" [14:16:58] !log oblivian@cumin1002 END (FAIL) - Cookbook sre.deploy.hiddenparma (exit_code=99) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002" [14:17:24] Lucas_WMDE: yes, these will vanish on the next re-index after the config patch is merged [14:17:33] sounds good [14:17:36] just testing a new item now [14:18:07] !log oblivian@cumin1002 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002" [14:18:09] !log oblivian@cumin1002 START - Cookbook 
sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002 [14:18:13] * Lucas_WMDE waits for https://test.wikidata.org/wiki/Q236164?action=cirrusDump to be indexed [14:18:19] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70164 and previous config saved to /var/cache/conftool/dbconfig/20241016-141819-ladsgroup.json [14:18:21] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [14:18:22] we also have the builddoc api that might be useful [14:18:34] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [14:18:40] !log oblivian@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002 [14:18:41] !log oblivian@cumin1002 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002" [14:18:54] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856 [14:19:01] (03PS1) 10Herron: jaeger: upgrade images to 1.62 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1080727 (https://phabricator.wikimedia.org/T376904) [14:19:51] dcausse: good point, https://test.wikidata.org/w/api.php?action=query&format=json&prop=cirrusbuilddoc&titles=Q236164&formatversion=2 looks good to me I think [14:20:06] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync [14:20:09] let’s sync then [14:20:09] thanks!
[14:20:15] +1 [14:20:50] (03PS3) 10Arnaudb: sre.mysql.upgrade: add depool/pool logic [cookbooks] - 10https://gerrit.wikimedia.org/r/1080718 (https://phabricator.wikimedia.org/T368881) [14:20:50] (03CR) 10Arnaudb: "This CR can be used to test https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1077101" [cookbooks] - 10https://gerrit.wikimedia.org/r/1080718 (https://phabricator.wikimedia.org/T368881) (owner: 10Arnaudb) [14:21:59] I’m surprised how much longer the indexing(?) seems to take on testwikidata compared to wikidata (where a removed alias got updated within 5-10 seconds I think) [14:22:07] but I guess they have completely separate job queues, so it’s not implausible [14:22:16] or it’s because it’s a new item. idk ^^ [14:22:26] not a big problem I think, it’s a test wiki after all [14:23:26] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [14:23:50] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [14:24:41] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1080702|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)]], [[gerrit:1080669|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080670|Remove LabelCountField (T377226)]], [[gerrit:1080671|Drop label_count field (LabelCountField) (T377226)]], [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T37719 [14:24:41] 7)]], [[gerrit:1080673|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080674|Remove LabelCountField (T377226)]], [[gerrit:1080676|Drop label_count field (LabelCountField) (T377226)]] (duration: 11m 36s) [14:24:54] !log [cont.] 
7)]], [[gerrit:1080673|Hard-code LabelCountField::NAME (T377226)]], [[gerrit:1080674|Remove LabelCountField (T377226)]], [[gerrit:1080676|Drop label_count field (LabelCountField) (T377226)]] (duration: 11m 36s) [14:24:59] T377197: SpecialCentralAuthTest fails when run in a suite with AccountCreationDetailsLookupTest - https://phabricator.wikimedia.org/T377197 [14:24:59] T377226: Remove LabelCountField from WikibaseCirrusSearch - https://phabricator.wikimedia.org/T377226 [14:24:59] T37719: API: list=search&srwhat=nearmatch doesn't work for titles with namespace prefix - https://phabricator.wikimedia.org/T37719 [14:25:01] !log UTC afternoon backport+config window done [14:25:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:25] (03PS5) 10Hnowlan: php-cli: include mercurius in 8.1 multiversion image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) [14:25:29] ugh, that truncated SAL message caused an ancient task to be pinged :( [14:25:32] (03CR) 10Hnowlan: php-cli: include mercurius in 8.1 multiversion image (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) (owner: 10Hnowlan) [14:25:35] that’s very unfortunate [14:26:14] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "all backported now :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080332 (https://phabricator.wikimedia.org/T377226) (owner: 10DCausse) [14:26:42] 10ops-eqiad, 06SRE, 06DC-Ops: ManagementSSHDown - ms-be1077 / logging-hd1005 - https://phabricator.wikimedia.org/T376094#10233782 (10VRiley-WMF) →14Duplicate dup:03T376764 [14:26:44] 10ops-eqiad, 06SRE, 06DC-Ops: ManagementSSHDown - https://phabricator.wikimedia.org/T376764#10233779 (10VRiley-WMF) [14:27:14] 10ops-eqiad, 06SRE, 06DC-Ops: ManagementSSHDown - 
https://phabricator.wikimedia.org/T376764#10233787 (10VRiley-WMF) a:03VRiley-WMF [14:27:16] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10233788 (10Jhancock.wm) [14:28:03] 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: cr1-eqiad: disk failure - https://phabricator.wikimedia.org/T372781#10233790 (10VRiley-WMF) Just wanted to ask, if the disk is showing that it's back, should we still keep this ticket open? Or, is it safe to close? [14:30:14] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Disk (sdv) failed on ms-be1065 - https://phabricator.wikimedia.org/T376775#10233796 (10Jclark-ctr) @MatthewVernon Drive has been replaced but will not let me add new drive. Reboot might be needed I get this error but new drive is listed as ready. I hav... [14:30:39] (03PS2) 10Alexandros Kosiaris: thanos: Add a recording rule for PHP FPM workers [puppet] - 10https://gerrit.wikimedia.org/r/1079453 [14:31:15] hi Lucas_WMDE, are you still deploying? :) [14:32:37] 06SRE-OnFire, 06Data-Persistence-SRE, 06DBA, 13Patch-For-Review, 07Sustainability: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication - https://phabricator.wikimedia.org/T375144#10233820 (10jcrespo) I need to research more line 255 change: ` self._validate_sl... [14:33:05] urbanecm: nope, I’m done [14:33:09] ty! [14:33:34] (03CR) 10Alexandros Kosiaris: [C:03+2] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1079453 (owner: 10Alexandros Kosiaris) [14:33:37] (03PS2) 10Urbanecm: [Growth] beta: Lower batch size for reassignMenteesJob [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080223 (https://phabricator.wikimedia.org/T376124) [14:33:40] (03CR) 10Urbanecm: [C:03+2] [Growth] beta: Lower batch size for reassignMenteesJob [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080223 (https://phabricator.wikimedia.org/T376124) (owner: 10Urbanecm) [14:34:21] (03Merged) 10jenkins-bot: [Growth] beta: Lower batch size for reassignMenteesJob [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080223 (https://phabricator.wikimedia.org/T376124) (owner: 10Urbanecm) [14:35:01] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1080223|[Growth] beta: Lower batch size for reassignMenteesJob (T376124)]] [14:35:33] T376124: Removing a mentor from the list of mentors does not always reassign newcomers - https://phabricator.wikimedia.org/T376124 [14:37:13] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:41:47] !log urbanecm@deploy2002 Finished scap sync-world: Backport for [[gerrit:1080223|[Growth] beta: Lower batch size for reassignMenteesJob (T376124)]] (duration: 06m 46s) [14:41:55] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Disk (sdk) failed on moss-be1002 - https://phabricator.wikimedia.org/T377154#10233868 (10Jclark-ctr) ` Device: /dev/sda ID_SERIAL=TOSHIBA_MG08ADA400NY_11T0A00WFYXG ID_SERIAL_SHORT=11T0A00WFYXG ID_PATH=pci-0000:3b:00.0-scsi-0:0:1:0 ID_PATH_TAG=pci... 
[14:42:11] T376124: Removing a mentor from the list of mentors does not always reassign newcomers - https://phabricator.wikimedia.org/T376124 [14:45:07] (03PS3) 10Vgutierrez: liberica: provide a liberica module [puppet] - 10https://gerrit.wikimedia.org/r/1080708 (https://phabricator.wikimedia.org/T377127) [14:46:58] !log jhancock@cumin2002 START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [14:47:16] !log jhancock@cumin2002 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART [14:47:42] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Disk (sdk) failed on moss-be1002 - https://phabricator.wikimedia.org/T377154#10233915 (10Jclark-ctr) 05Open→03Resolved replaced failed drive [14:53:28] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: cloudgw: add support and enable IPv6 - https://phabricator.wikimedia.org/T374716#10233918 (10aborrero) 05Open→03Resolved We got it all working on 2024-10-11. 
[14:54:29] 10ops-eqiad, 06SRE, 06DC-Ops, 10cloud-services-team (Hardware): cloudgw1002: network interface problem - https://phabricator.wikimedia.org/T376589#10233943 (10Jclark-ctr) Firmware applied for nic and bios [14:57:59] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, October 17 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080332 (https://phabricator.wikimedia.org/T377226) (owner: 10DCausse) [15:02:13] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:05:36] !log ongoing maintenance on mr1-eqsin [15:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:35] (03PS2) 10Gmodena: dse-k8s-services: content_history: version bump image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080245 (https://phabricator.wikimedia.org/T368787) [15:09:29] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70165 and previous config saved to /var/cache/conftool/dbconfig/20241016-150928-ladsgroup.json [15:09:50] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10234047 (10aborrero) [15:16:01] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Disk (sdv) failed on ms-be1065 - https://phabricator.wikimedia.org/T376775#10234107 (10MatthewVernon) @Jclark-ctr I don't know what error you're referring to, but kern.log shows a new disk being added and then removed again: ` Oct 16 14:21:43 ms-be1065 ke... [15:18:42] (03CR) 10Xcollazo: [C:03+1] dse-k8s-services: content_history: version bump image. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080245 (https://phabricator.wikimedia.org/T368787) (owner: 10Gmodena) [15:22:13] FIRING: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:22:22] (03PS1) 10Brouberol: airflow-analytic-test: comment out the postgresql deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) [15:23:42] (03PS2) 10Brouberol: airflow-analytic-test: comment out the postgresql deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) [15:24:36] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70166 and previous config saved to /var/cache/conftool/dbconfig/20241016-152436-ladsgroup.json [15:24:42] (03PS3) 10Brouberol: airflow-analytic-test: comment out the postgresql deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) [15:25:03] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 10Ceph, 06DC-Ops: Disk (sdk) failed on moss-be1002 - https://phabricator.wikimedia.org/T377154#10234171 (10MatthewVernon) @Jclark-ctr Thanks! OSD spun up on the new disk just fine :) [15:25:08] (03CR) 10Tiziano Fogli: "I didn't write those queries. 
I mean, I'm migrating checks from Icinga to Prometheus (take a look at https://gerrit.wikimedia.org/r/c/oper" [alerts] - 10https://gerrit.wikimedia.org/r/1077986 (https://phabricator.wikimedia.org/T370153) (owner: 10Tiziano Fogli) [15:25:53] (03CR) 10Elukey: [C:03+1] "Done" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1077872 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol) [15:26:15] (03CR) 10Elukey: [C:03+1] Disable the priviledged security context of the liveness-prometheus container [deployment-charts] - 10https://gerrit.wikimedia.org/r/1077875 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol) [15:26:23] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06Data-Persistence, 06DC-Ops: Q1:rack/setup/install ms-be208[1-8] - https://phabricator.wikimedia.org/T371400#10234173 (10Jhancock.wm) when I tried to login to the BMC this morning, 2081 and 2082 were unreachable. connected a console and both had their mgmt IPs re... [15:29:29] (03PS4) 10Brouberol: airflow-analytic-test: comment out the postgresql deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) [15:30:51] (03CR) 10Elukey: "Left a nit to understand, thanks a lot for this work!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080032 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol) [15:32:13] RESOLVED: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@eqsin - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:34:04] (03CR) 10Brouberol: ceph-csi-cephfs: replace the ClusterRole by a list of ns-scoped Roles (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080032 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol) [15:36:26] (03CR) 10Elukey: [C:03+1] ceph-csi-cephfs: replace the ClusterRole by a list of ns-scoped Roles (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080032 (https://phabricator.wikimedia.org/T376406) (owner: 10Brouberol) [15:38:49] (03CR) 10Scott French: [C:03+1] php-cli: include mercurius in 8.1 multiversion image (032 comments) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) (owner: 10Hnowlan) [15:39:22] (03CR) 10Elukey: "Adding also Alex and Joe to the party, if they want to review/comment :)" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/1078345 (owner: 10Elukey) [15:39:43] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70167 and previous config saved to /var/cache/conftool/dbconfig/20241016-153943-ladsgroup.json [15:42:21] (03CR) 10Elukey: "After checking a bit more, I'd say that we have these two to compare:" [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/1078345 (owner: 10Elukey) [15:47:58] (03PS1) 10Ebernhardson: Migrate package to opensearch [software/opensearch/plugins] - 10https://gerrit.wikimedia.org/r/1080749 (https://phabricator.wikimedia.org/T372769) [15:52:08] !log maintenance 
on mr1-eqsin complete [15:52:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:51] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70168 and previous config saved to /var/cache/conftool/dbconfig/20241016-155450-ladsgroup.json [15:54:56] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance [15:55:10] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance [15:56:04] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: dns: integrate PTR support for 2a02:ec80:a100::/48 - https://phabricator.wikimedia.org/T376462#10234300 (10cmooney) [15:56:09] 06SRE, 06cloud-services-team, 10Cloud-VPS, 06Infrastructure-Foundations, 10netops: openstack: work out IPv6 and designate integration - https://phabricator.wikimedia.org/T374715#10234301 (10cmooney) [15:59:28] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance [15:59:42] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance [15:59:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70169 and previous config saved to /var/cache/conftool/dbconfig/20241016-155948-ladsgroup.json [16:05:19] (03CR) 10RLazarus: [C:03+2] fix(growthexperiments.pp): correct order of arguments for mwscript [puppet] - 10https://gerrit.wikimedia.org/r/1080453 (https://phabricator.wikimedia.org/T372337) (owner: 10Michael Große) [16:07:57] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70170 and 
previous config saved to /var/cache/conftool/dbconfig/20241016-160756-ladsgroup.json [16:09:25] FIRING: [7x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:12:15] MichaelG_WMF: fyi, merged it in time for the 16:10 run, the logs look good so far :) [16:12:38] thank you! 💚 [16:13:46] * MichaelG_WMF needs to leave ~now though to attend a very important pubquiz, will check results tomorrow morning [16:14:25] FIRING: [7x] SystemdUnitFailed: mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun-eswiki.service on mwmaint2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:21:37] oop yes good luck [16:23:04] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70171 and previous config saved to /var/cache/conftool/dbconfig/20241016-162303-ladsgroup.json [16:23:33] (03CR) 10Volans: [C:04-1] "Small copy/paste error from spicerack" [cookbooks] - 10https://gerrit.wikimedia.org/r/1080716 (owner: 10Elukey) [16:34:25] FIRING: [6x] SystemdUnitFailed: httpbb_kubernetes_mw-web_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:36:27] FIRING: [12x] ProbeDown: Service ripe-atlas-codfw:0 has failed probes (icmp_ripe_atlas_codfw_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - 
https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:38:11] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70172 and previous config saved to /var/cache/conftool/dbconfig/20241016-163810-ladsgroup.json [16:44:55] (03PS6) 10Hnowlan: php-cli: include mercurius in 8.1 multiversion image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) [16:46:39] (03CR) 10Hnowlan: [V:03+2 C:03+2] php-cli: include mercurius in 8.1 multiversion image (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) (owner: 10Hnowlan) [16:47:10] (03CR) 10Hnowlan: [V:03+2 C:03+2] php-cli: include mercurius in 8.1 multiversion image (031 comment) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) (owner: 10Hnowlan) [16:48:18] (03PS7) 10Hnowlan: php-cli: include mercurius in 8.1 multiversion image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) [16:48:56] (03CR) 10Hnowlan: [V:03+2 C:03+2] php-cli: include mercurius in 8.1 multiversion image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1077682 (https://phabricator.wikimedia.org/T371699) (owner: 10Hnowlan) [16:51:12] (03PS2) 10Bking: ATS: add mapping for airflow-analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/1079361 (https://phabricator.wikimedia.org/T374948) [16:52:40] (03PS3) 10Bking: ATS: add mapping for airflow-analytics-test [puppet] - 10https://gerrit.wikimedia.org/r/1079361 (https://phabricator.wikimedia.org/T374948) [16:52:52] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1079361 (https://phabricator.wikimedia.org/T374948) (owner: 10Bking) [16:53:17] !log ladsgroup@cumin1002 dbctl commit 
(dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70173 and previous config saved to /var/cache/conftool/dbconfig/20241016-165317-ladsgroup.json [16:53:23] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance [16:53:37] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance [16:53:44] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70174 and previous config saved to /var/cache/conftool/dbconfig/20241016-165343-ladsgroup.json [16:57:07] !log xcollazo@deploy2002 Started deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a] [17:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1700) [17:03:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70175 and previous config saved to /var/cache/conftool/dbconfig/20241016-170300-ladsgroup.json [17:03:48] (03CR) 10Bking: airflow-analytic-test: comment out the postgresql deployment (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) (owner: 10Brouberol) [17:04:15] (03CR) 10Hnowlan: [C:03+1] Add chart-renderer namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079350 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis) [17:04:21] (03CR) 10Hnowlan: [C:03+1] Add chart-renderer deployment server profile [puppet] - 10https://gerrit.wikimedia.org/r/1079345 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis) [17:06:01] !log xcollazo@deploy2002 Finished deploy [analytics/refinery@f186c94]: Regular analytics weekly train 
[analytics/refinery@f186c94a] (duration: 08m 54s) [17:06:52] !log xcollazo@deploy2002 Started deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a] [17:07:05] (03PS4) 10Volans: sre.switchdc.databases.prepare: add check [cookbooks] - 10https://gerrit.wikimedia.org/r/1074127 (https://phabricator.wikimedia.org/T371351) [17:07:05] (03PS4) 10Volans: sre.switchdc.databases: update Phabricator more [cookbooks] - 10https://gerrit.wikimedia.org/r/1074128 (https://phabricator.wikimedia.org/T371351) [17:07:05] (03PS3) 10Volans: sre.switchdc.databases.prepare: fix heartbeat [cookbooks] - 10https://gerrit.wikimedia.org/r/1079536 (https://phabricator.wikimedia.org/T375144) [17:07:05] (03PS4) 10Volans: sre.switchdc.databases: allow to select a section [cookbooks] - 10https://gerrit.wikimedia.org/r/1079537 (https://phabricator.wikimedia.org/T375144) [17:07:48] 06SRE-OnFire, 06Data-Persistence-SRE, 06DBA, 13Patch-For-Review, 07Sustainability: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication - https://phabricator.wikimedia.org/T375144#10234887 (10Volans) The cause of that dry-run failure was the added check of repli... 
[17:08:17] (03CR) 10Bking: [C:03+2] airflow-analytic-test: comment out the postgresql deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) (owner: 10Brouberol) [17:10:22] (03CR) 10CDanis: [C:03+1] jaeger: upgrade images to 1.62 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1080727 (https://phabricator.wikimedia.org/T376904) (owner: 10Herron) [17:11:40] (03Merged) 10jenkins-bot: airflow-analytic-test: comment out the postgresql deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080742 (https://phabricator.wikimedia.org/T374948) (owner: 10Brouberol) [17:12:04] !log xcollazo@deploy2002 Finished deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a] (duration: 05m 11s) [17:13:08] !log xcollazo@deploy2002 Started deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a] [17:16:52] !log xcollazo@deploy2002 Finished deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a] (duration: 03m 44s) [17:18:07] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70176 and previous config saved to /var/cache/conftool/dbconfig/20241016-171807-ladsgroup.json [17:20:42] !log stevemunene@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye [17:21:14] (03CR) 10Herron: [V:03+2 C:03+2] jaeger: upgrade images to 1.62 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/1080727 (https://phabricator.wikimedia.org/T376904) (owner: 10Herron) [17:29:25] FIRING: [6x] SystemdUnitFailed: httpbb_kubernetes_mw-web_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:33:14] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70177 and previous config saved to /var/cache/conftool/dbconfig/20241016-173314-ladsgroup.json [17:37:21] !log swfrench@cumin2002 START - Cookbook sre.dns.netbox [17:37:33] !log stevemunene@cumin1002 START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye [17:38:33] (03PS1) 10Bking: stat hosts: Permit zRAM swapping [puppet] - 10https://gerrit.wikimedia.org/r/1080769 (https://phabricator.wikimedia.org/T376813) [17:38:54] (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1080769 (https://phabricator.wikimedia.org/T376813) (owner: 10Bking) [17:38:56] !log gmodena@deploy2002 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply [17:39:02] !log gmodena@deploy2002 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply [17:41:45] !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002" [17:42:10] (03PS2) 10C. Scott Ananian: Bump wikimedia/parsoid to 0.20.0-a26 [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080770 (https://phabricator.wikimedia.org/T377287) [17:43:05] (03CR) 10CDanis: [C:03+2] Add chart-renderer deployment server profile [puppet] - 10https://gerrit.wikimedia.org/r/1079345 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis) [17:43:37] (03PS2) 10C. 
Scott Ananian: Bump wikimedia/parsoid to 0.20.0-a26 [core] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080773 (https://phabricator.wikimedia.org/T377287) [17:44:38] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, October 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [core] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080773 (https://phabricator.wikimedia.org/T377287) (owner: 10C. Scott Ananian) [17:46:59] (03PS2) 10Bking: stat hosts: Permit zRAM swapping [puppet] - 10https://gerrit.wikimedia.org/r/1080769 (https://phabricator.wikimedia.org/T376813) [17:48:17] !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002" [17:48:17] !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [17:48:21] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70178 and previous config saved to /var/cache/conftool/dbconfig/20241016-174821-ladsgroup.json [17:48:27] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance [17:48:41] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance [17:48:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70179 and previous config saved to /var/cache/conftool/dbconfig/20241016-174847-ladsgroup.json [17:49:43] (03CR) 10CDanis: [C:03+2] Add chart-renderer namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079350 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis) [17:53:14] (03Merged) 10jenkins-bot: 
Add chart-renderer namespace [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079350 (https://phabricator.wikimedia.org/T372081) (owner: 10CDanis) [17:53:36] (03CR) 10Brouberol: [C:03+1] stat hosts: Permit zRAM swapping [puppet] - 10https://gerrit.wikimedia.org/r/1080769 (https://phabricator.wikimedia.org/T376813) (owner: 10Bking) [17:55:33] !log cdanis@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'. [17:56:43] !log cdanis@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [17:57:37] !log dzahn@cumin2002 START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye [17:57:38] !log cdanis@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [17:57:54] !log dzahn@cumin2002 START - Cookbook sre.hosts.move-vlan for host phab2002 [17:58:34] 06SRE, 06Infrastructure-Foundations, 10netops: Re-IP hosts on codfw row A and B to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T354869#10235144 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host phab2002.codfw.wmnet with OS bullseye [18:00:00] !log cdanis@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [18:00:06] jeena and andre: Your horoscope predicts another MediaWiki train - Utc-7+Utc-0 Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T1800). [18:00:14] !log ongoing maintenance on mr1-ulsfo [18:00:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:51] !log cdanis@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'. [18:01:37] (03CR) 10CI reject: [V:04-1] Bump wikimedia/parsoid to 0.20.0-a26 [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080770 (https://phabricator.wikimedia.org/T377287) (owner: 10C. Scott Ananian) [18:02:05] !log cdanis@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
[18:04:40] 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T377317#10235182 (10phaultfinder) [18:04:47] !log cdanis@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'. [18:05:13] !log cdanis@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'. [18:06:14] !log dzahn@cumin2002 START - Cookbook sre.dns.netbox [18:09:33] (03PS1) 10TrainBranchBot: group1 to 1.43.0-wmf.27 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080777 (https://phabricator.wikimedia.org/T375658) [18:09:35] (03CR) 10TrainBranchBot: [C:03+2] group1 to 1.43.0-wmf.27 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080777 (https://phabricator.wikimedia.org/T375658) (owner: 10TrainBranchBot) [18:10:23] (03Merged) 10jenkins-bot: group1 to 1.43.0-wmf.27 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080777 (https://phabricator.wikimedia.org/T375658) (owner: 10TrainBranchBot) [18:10:58] (03PS1) 10Ejegg: Make wikitech a target for CentralNotice banners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080780 (https://phabricator.wikimedia.org/T377030) [18:11:35] !log gmodena@deploy2002 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply [18:11:37] !log dzahn@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002" [18:11:40] !log gmodena@deploy2002 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply [18:11:42] !log dzahn@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002" [18:11:42] !log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:11:43] !log dzahn@cumin2002 START - Cookbook sre.dns.wipe-cache phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 
4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [18:11:46] !log dzahn@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [18:11:47] !log dzahn@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host phab2002 [18:12:22] !log dzahn@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host phab2002 [18:12:22] !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host phab2002 [18:14:30] (03PS1) 10Dzahn: mariadb: update grants for phab2002 with new IP [puppet] - 10https://gerrit.wikimedia.org/r/1080781 (https://phabricator.wikimedia.org/T377374) [18:15:19] (03PS1) 10Gmodena: servive: page-concent-change-enrich version bump. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080783 (https://phabricator.wikimedia.org/T371874) [18:17:12] (03CR) 10Cstone: [C:03+1] Make wikitech a target for CentralNotice banners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080780 (https://phabricator.wikimedia.org/T377030) (owner: 10Ejegg) [18:17:13] FIRING: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@ulsfo - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:17:24] (03PS3) 10Gmodena: dse-k8s-services: content_history: version bump image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080245 (https://phabricator.wikimedia.org/T368787) [18:17:49] !log jhuneidi@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.27 refs T375658 [18:18:16] T375658: 1.43.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T375658 [18:18:34] (03PS2) 10Gmodena: services: page-content-change-enrich version bump. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080783 (https://phabricator.wikimedia.org/T371874) [18:20:45] !log gmodena@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply [18:20:50] !log gmodena@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply [18:24:08] (03CR) 10Gmodena: [C:03+2] services: page-content-change-enrich version bump. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080783 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [18:25:09] (03Merged) 10jenkins-bot: services: page-content-change-enrich version bump. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080783 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [18:27:13] RESOLVED: JobUnavailable: Reduced availability for job pdu_sentry4 in ops@ulsfo - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:27:37] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, October 16 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080780 (https://phabricator.wikimedia.org/T377030) (owner: 10Ejegg) [18:27:41] !log gmodena@deploy2002 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply [18:27:44] !log gmodena@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply [18:28:29] (03PS4) 10Gmodena: dse-k8s-services: content_history: version bump image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080245 (https://phabricator.wikimedia.org/T368787) [18:28:44] (03CR) 10Gmodena: [C:03+2] dse-k8s-services: content_history: version bump image. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080245 (https://phabricator.wikimedia.org/T368787) (owner: 10Gmodena) [18:29:32] !log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage [18:29:47] (03Merged) 10jenkins-bot: dse-k8s-services: content_history: version bump image. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080245 (https://phabricator.wikimedia.org/T368787) (owner: 10Gmodena) [18:30:19] 06SRE-OnFire, 06Data-Persistence-SRE, 06DBA, 13Patch-For-Review, 07Sustainability: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication - https://phabricator.wikimedia.org/T375144#10235314 (10jcrespo) Thanks Riccardo, as I said on IRC it looked like a minor issu... [18:31:19] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [18:31:25] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [18:32:29] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply [18:32:35] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply [18:33:09] !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage [18:34:16] (03CR) 10Bking: [C:03+2] stat hosts: Permit zRAM swapping [puppet] - 10https://gerrit.wikimedia.org/r/1080769 (https://phabricator.wikimedia.org/T376813) (owner: 10Bking) [18:34:30] FIRING: Device rebooted: Alert for device ps1-c6-eqiad.mgmt.eqiad.wmnet - Device rebooted - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted [18:35:56] !log jhathaway@cumin1002 START - Cookbook sre.hosts.decommission for hosts mx2001.wikimedia.org [18:36:57] !log stevemunene@cumin1002 END 
(FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye [18:39:30] RESOLVED: Device rebooted: Device ps1-c6-eqiad.mgmt.eqiad.wmnet recovered from Device rebooted - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted [18:41:06] !log jhathaway@cumin1002 START - Cookbook sre.dns.netbox [18:42:04] 10ops-eqiad, 06SRE, 10SRE-swift-storage, 06DC-Ops: Disk (sdv) failed on ms-be1065 - https://phabricator.wikimedia.org/T376775#10235365 (10Jclark-ctr) 05Open→03Resolved Rebooted drive and cleared cache. added drive back in. looks good now to me. [18:43:28] FIRING: JobUnavailable: Reduced availability for job mtail in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:43:39] !log maintenance on mr1-ulsfo complete [18:43:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:45:55] !log jhathaway@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002" [18:46:24] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002" [18:46:24] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:46:25] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2001.wikimedia.org [18:46:36] 06SRE, 06Infrastructure-Foundations, 10Mail: Remove Exim based MTAs - https://phabricator.wikimedia.org/T325409#10235387 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jhathaway@cumin1002 for hosts: 
`mx2001.wikimedia.org` - mx2001.wikimedia.org (**PASS**) - Downtimed host on Icinga/Ale... [18:48:28] RESOLVED: JobUnavailable: Reduced availability for job mtail in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [18:49:13] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70181 and previous config saved to /var/cache/conftool/dbconfig/20241016-184912-ladsgroup.json [18:54:36] !log stevemunene@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye [18:56:32] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: eqiad:frack network design, installation and configuration - https://phabricator.wikimedia.org/T377381 (10cmooney) 03NEW p:05Triage→03Medium [18:56:45] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: eqiad:frack network design, installation and configuration - https://phabricator.wikimedia.org/T377381#10235421 (10cmooney) [18:56:46] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435#10235422 (10cmooney) [19:04:20] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70182 and previous config saved to /var/cache/conftool/dbconfig/20241016-190419-ladsgroup.json [19:14:36] !log bking@stat1011 racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813 [19:14:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:54] T376813: Implement non-cgroups-related performance optimizations on stat hosts - https://phabricator.wikimedia.org/T376813 [19:16:01] !log bking@stat1011 racadm>>racadm jobqueue 
create BIOS.Setup.1-1 Commit JID = JID_291241139935 T376813 [19:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:27] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70183 and previous config saved to /var/cache/conftool/dbconfig/20241016-191926-ladsgroup.json [19:19:37] !log brennen@deploy2002 Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 [19:20:08] T377374: reimage collab servers in legacy codfw VLANs - https://phabricator.wikimedia.org/T377374 [19:30:20] !log brennen@deploy2002 Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 10m 42s) [19:30:49] T377374: reimage physical collab servers in legacy codfw VLANs - https://phabricator.wikimedia.org/T377374 [19:34:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70184 and previous config saved to /var/cache/conftool/dbconfig/20241016-193433-ladsgroup.json [19:34:40] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance [19:34:53] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance [19:35:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70185 and previous config saved to /var/cache/conftool/dbconfig/20241016-193500-ladsgroup.json [19:35:27] !log jhathaway@cumin1002 START - Cookbook sre.hosts.decommission for hosts mx1001.wikimedia.org [19:38:11] FIRING: MXQueueNoMetrics: Queue length metrics not found - https://wikitech.wikimedia.org/wiki/Exim - https://grafana.wikimedia.org/d/000000451/mail - https://alerts.wikimedia.org/?q=alertname%3DMXQueueNoMetrics [19:40:16] !log 
jhathaway@cumin1002 START - Cookbook sre.dns.netbox [19:42:43] !log jhathaway@cumin1002 START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org [19:42:57] !log jhathaway@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org [19:43:52] !log jhathaway@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002" [19:44:08] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002" [19:44:08] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [19:44:09] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1001.wikimedia.org [19:44:18] !log jhathaway@cumin1002 START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org [19:44:20] 06SRE, 06Infrastructure-Foundations, 10Mail: Remove Exim based MTAs - https://phabricator.wikimedia.org/T325409#10235508 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jhathaway@cumin1002 for hosts: `mx1001.wikimedia.org` - mx1001.wikimedia.org (**PASS**) - Downtimed host on Icinga/Ale... 
[19:44:21] !log jhathaway@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org [19:44:31] !log jhathaway@cumin1002 START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org [19:44:33] !log jhathaway@cumin1002 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org [19:45:14] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70186 and previous config saved to /var/cache/conftool/dbconfig/20241016-194513-ladsgroup.json [19:45:14] !log jhathaway@cumin1002 START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org [19:49:02] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out2001.wikimedia.org [19:49:38] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10235538 (10cmooney) [19:50:07] !log jhathaway@cumin1002 START - Cookbook sre.hosts.reboot-single for host mx-out1001.wikimedia.org [19:54:02] !log jhathaway@cumin1002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out1001.wikimedia.org [19:54:23] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10235556 (10cmooney) @jclark-ctr I think existing stock was used for the 100G links between the switches in codfw. I know we s... [19:57:26] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10235573 (10cmooney) [20:00:06] RoanKattouw, Urbanecm, cjming, TheresNoTime, and kindrobot: #bothumor When your hammer is PHP, everything starts looking like a thumb. 
Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T2000). [20:00:07] cscott and ejegg: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:00:16] \O/ [20:00:21] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70187 and previous config saved to /var/cache/conftool/dbconfig/20241016-200020-ladsgroup.json [20:01:14] * TheresNoTime cannot deploy this evening, sorry! [20:01:59] I can deploy if needed [20:02:44] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10235582 (10cmooney) [20:04:55] cscott, your patch 1080770 has a build test failure [20:06:47] ejegg: are you available for your backport? [20:09:14] https://www.irccloud.com/pastebin/xpGMradb [20:09:23] Huh that's unrelated to my patch [20:09:55] Looks like community configuration is breaking ci? [20:12:07] hmm, I'm not sure who to contact but I'll ask around [20:12:52] (03PS1) 10Bartosz Dziewoński: Set $wgAllowRawHtmlCopyrightMessages = false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080805 (https://phabricator.wikimedia.org/T375789) [20:13:10] (03CR) 10C. Scott Ananian: "recheck" [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080770 (https://phabricator.wikimedia.org/T377287) (owner: 10C. 
Scott Ananian) [20:13:26] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, October 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080805 (https://phabricator.wikimedia.org/T375789) (owner: 10Bartosz Dziewoński) [20:13:28] hi jeena, sorry to respond late [20:13:33] can we still get that out? [20:13:56] ejegg yeah I can do it now since we're a bit blocked on the other change [20:14:53] great, thanks! [20:15:03] (03CR) 10TrainBranchBot: [C:03+2] "Approved by jhuneidi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080780 (https://phabricator.wikimedia.org/T377030) (owner: 10Ejegg) [20:15:28] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70188 and previous config saved to /var/cache/conftool/dbconfig/20241016-201527-ladsgroup.json [20:15:31] !log phab2002 - manually bootstrapping scap since puppet did not do it due to dependency cycles: sudo -u scap /usr/local/bin/bootstrap-scap-target.sh deploy2002.codfw.wmnet /var/lib/scap T303559 T310740 T377374 [20:15:51] (03Merged) 10jenkins-bot: Make wikitech a target for CentralNotice banners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1080780 (https://phabricator.wikimedia.org/T377030) (owner: 10Ejegg) [20:16:21] !log jhuneidi@deploy2002 Started scap sync-world: Backport for [[gerrit:1080780|Make wikitech a target for CentralNotice banners (T377030)]] [20:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:59] T303559: Use scap to deploy itself to scap targets - https://phabricator.wikimedia.org/T303559 [20:16:59] T310740: scap-o-scap: Bootstrapping a new host fails - https://phabricator.wikimedia.org/T310740 [20:17:00] T377374: reimage physical collab servers in legacy codfw VLANs - https://phabricator.wikimedia.org/T377374 
[20:17:25] !log phab2002 - after manually running bootstrap-scap-target.sh and "Scap from local bullseye wheels successfully installed at /var/lib/scap/scap" still "cannot open `/usr/bin/scap' (No such file or directory)" though. T303559 T310740 T377374 [20:17:26] T377030: Wikitech showing Wikipedia CentralNotice banners - https://phabricator.wikimedia.org/T377030 [20:17:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:24] !log phab2002 - ln -s /var/lib/scap/scap/bin/scap /usr/bin/scap [20:18:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:39] !log jhuneidi@deploy2002 ejegg, jhuneidi: Backport for [[gerrit:1080780|Make wikitech a target for CentralNotice banners (T377030)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [20:18:50] ejegg ready for any tests [20:20:15] ok, I'm logged in to the CentralNotice admin page with debug on [20:20:21] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CommunityConfiguration/+/1064361 seems to be what broke CI [20:20:22] will just take a look at the project list [20:20:54] yep, wikitech is on there as a target! [20:20:54] 10ops-eqiad, 06SRE, 10Cassandra, 06DC-Ops: Degraded RAID on aqs1013 - https://phabricator.wikimedia.org/T362033#10235675 (10Eevans) Hi @VRiley-WMF, Per our chat on IRC, the affected SSD is `/dev/sde` (serial no. !!S4KVNA0MB03305!!). It should be the first SSD on the second controller. ` *-disk:0... [20:21:05] thanks jeena, looks fine to go out to all the servers [20:21:11] thanks will do [20:21:18] !log jhuneidi@deploy2002 ejegg, jhuneidi: Continuing with sync [20:24:28] cscott: I just saw your message, so I guess we are waiting on a reply from them? [20:25:15] and/or trying to debug it myself. I could make a revert for the patch in question, and backport that as well, but that seems like overkill maybe. 
[20:25:26] yeah [20:26:23] !log jhuneidi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1080780|Make wikitech a target for CentralNotice banners (T377030)]] (duration: 10m 02s) [20:26:38] ejegg: all done [20:26:47] i'm testing right now, but i think all patches to the mediawiki-vendor repository are going to fail CI right now. [20:26:50] T377030: Wikitech showing Wikipedia CentralNotice banners - https://phabricator.wikimedia.org/T377030 [20:29:13] !log gmodena@deploy2002 helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply [20:29:19] !log gmodena@deploy2002 helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply [20:29:30] FIRING: Device rebooted: Alert for device ps1-d7-eqiad.mgmt.eqiad.wmnet - Device rebooted - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted [20:30:28] !log brennen@deploy2002 Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 [20:30:35] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70189 and previous config saved to /var/cache/conftool/dbconfig/20241016-203034-ladsgroup.json [20:30:36] (03PS1) 10Gmodena: services: page-content-change-enrich version bump. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080810 (https://phabricator.wikimedia.org/T371874) [20:30:36] !log brennen@deploy2002 Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 00m 08s) [20:31:03] T377374: reimage physical collab servers in legacy codfw VLANs - https://phabricator.wikimedia.org/T377374 [20:32:43] (03CR) 10Gmodena: [C:03+2] "The previous bump did not update to a recent enough jre." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080810 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [20:33:38] (03Merged) 10jenkins-bot: services: page-content-change-enrich version bump. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080810 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [20:34:28] (03PS1) 10C. Scott Ananian: DNP: no-op patch to test CI [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080811 [20:34:30] RESOLVED: Device rebooted: Device ps1-d7-eqiad.mgmt.eqiad.wmnet recovered from Device rebooted - https://alerts.wikimedia.org/?q=alertname%3DDevice+rebooted [20:34:59] jeena: in any case i don't think we can/should backport until the CI issue is fixed, so I think you're off the hook. [20:35:23] okay, maybe we can tomorrow [20:36:10] (03PS1) 10Gmodena: dse-k8s-services: content_history: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080813 (https://phabricator.wikimedia.org/T371874) [20:37:18] !log gmodena@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply [20:37:23] !log gmodena@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply [20:37:54] jhathaway: I am seeing an email delivery error I usually never see. About mail from cloud VPS. Now wondering if any relation is possible to decom of mx1001 [20:38:11] FIRING: [2x] MXQueueNoMetrics: Queue length metrics not found - https://wikitech.wikimedia.org/wiki/Exim - https://grafana.wikimedia.org/d/000000451/mail - https://alerts.wikimedia.org/?q=alertname%3DMXQueueNoMetrics [20:38:42] hmm, this one too I guess [20:39:40] !log gmodena@deploy2002 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply [20:39:43] !log gmodena@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply [20:41:37] (03CR) 10Gmodena: "The previous version bump did not update jre to a recent enough version." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080813 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [20:41:43] (03CR) 10Gmodena: [C:03+2] dse-k8s-services: content_history: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080813 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [20:42:26] filed T377391. Oddly enough, it only affects the wmf.27 branch of mediawiki-vendor. I don't know why. [20:42:26] T377391: CommunityConfiguration extension breaking CI for all patches to mediawiki-vendor - https://phabricator.wikimedia.org/T377391 [20:42:40] (03Merged) 10jenkins-bot: dse-k8s-services: content_history: version bump [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080813 (https://phabricator.wikimedia.org/T371874) (owner: 10Gmodena) [20:43:29] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [20:43:35] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply [20:44:27] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply [20:44:33] !log gmodena@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply [20:49:03] mutante: definitely could be, perhaps I missed moving them to the new servers? [20:49:53] 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T377317#10235808 (10phaultfinder) [20:50:58] jhathaway: so, what this is is trying to deliver TO: To: root [20:51:24] I see it is received by mx-in1001 [20:51:34] (03CR) 10CI reject: [V:04-1] DNP: no-op patch to test CI [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080811 (owner: 10C. 
Scott Ananian) [20:51:49] but then there is also "The recipient server did not accept our requests to connect" [20:52:08] those are these automated emails when cloud VPS machines have puppet problems for example [20:52:57] it sends those to root@ which then somehow resolves to all admins of the Horizon project, afaict [20:55:18] what did you grep for in the mx-in1001 logs, mutante? [20:55:23] I don't see that example [20:55:45] (03PS1) 10Clare Ming: Metrics Platform Instrument Configuration: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080816 (https://phabricator.wikimedia.org/T373967) [20:58:25] jhathaway: I only looked at the source of the mail which has a "Received: from mx-in1001.wikimedia.org" line [20:58:34] I got this bounce mail from Google [20:59:04] mutante: does it provide the message-id from mx-in1001.wikimedia.org? [20:59:39] ESMTPS id 6a1803df08f44-6cc22b50e2csi47034226d6.435.2024.10.16.13.08.24 [21:00:05] Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241016T2100) [21:00:49] jhathaway: oooh, I think this is not what I thought, sorry! [21:01:01] (03PS1) 10Clare Ming: Metrics Platform Instrument Configuration: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080817 (https://phabricator.wikimedia.org/T373967) [21:01:25] 1 day ago I _responded_ to one of those emails. It is telling me 24 hours later it wasn't able to deliver my response. 
[21:02:10] nod okay, no problem at all mutante [21:12:35] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance [21:12:38] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance [21:13:08] (03PS1) 10Dzahn: phabricator: add numeric group ID for vcs systemuser [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) [21:17:04] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance [21:17:07] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance [21:29:25] FIRING: [5x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:34:47] 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T377317#10235938 (10phaultfinder) [21:40:25] (03CR) 10Dzahn: [C:04-2] "eh, this does not have a gid parameter. 
https://puppet-compiler.wmflabs.org/output/1080818/4312/phab2002.codfw.wmnet/change.phab2002.codfw" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:43:07] (03PS2) 10Dzahn: phabricator: add numeric UID/GID for vcs systemuser [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) [21:49:26] (03CR) 10Dzahn: "on current prod server:" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:49:49] (03CR) 10Dzahn: [C:03+1] "https://puppet-compiler.wmflabs.org/output/1080818/4313/phab1004.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:51:23] (03CR) 10Dzahn: [C:03+2] phabricator: add numeric UID/GID for vcs systemuser [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:52:29] (03CR) 10Dzahn: [C:03+2] "cc: @mmuhlenhoff I feel like we didn't need this in the past, but it does make sense since we want reserved global UID. and it matches the" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:53:52] (03CR) 10Santiago Faci: [C:03+2] "Looks good!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080817 (https://phabricator.wikimedia.org/T373967) (owner: 10Clare Ming) [21:54:01] (03CR) 10Santiago Faci: [C:03+2] "Looks good!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/1080816 (https://phabricator.wikimedia.org/T373967) (owner: 10Clare Ming) [21:55:00] (03Merged) 10jenkins-bot: Metrics Platform Instrument Configuration: Deploying to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080817 (https://phabricator.wikimedia.org/T373967) (owner: 10Clare Ming) [21:55:04] (03Merged) 10jenkins-bot: Metrics Platform Instrument Configuration: Deploying to staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1080816 (https://phabricator.wikimedia.org/T373967) (owner: 10Clare Ming) [21:55:43] (03CR) 10Dzahn: [C:03+2] "--- /etc/sysusers.d/vcs.conf 2024-01-21 00:09:42.046162426 +0000" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:56:57] (03CR) 10Cwhite: "There are a few comments in the old puppet files that it'd be worth to bring over here. They have guidance and intent details that it'd b" [alerts] - 10https://gerrit.wikimedia.org/r/1077986 (https://phabricator.wikimedia.org/T370153) (owner: 10Tiziano Fogli) [21:57:46] (03CR) 10Dzahn: [C:03+2] "now: Execution of '/usr/sbin/groupadd -g 497 -r vcs' returned 4: groupadd: GID '497' already exists" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [21:57:49] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10236007 (10cmooney) We should also use SFP28 modules for the 'fabric-link' between the two pfw's if possible, two of these (on... 
[21:58:39] (03CR) 10Dzahn: [C:03+2] "sigh, it's because 497 was taken by group "aphlict"" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [22:00:32] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance [22:00:46] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance [22:00:53] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70191 and previous config saved to /var/cache/conftool/dbconfig/20241016-220053-ladsgroup.json [22:07:08] (03PS1) 10Dzahn: aphlict: create system user with systemd:sysuser and reserved UID/GID [puppet] - 10https://gerrit.wikimedia.org/r/1080823 (https://phabricator.wikimedia.org/T377374) [22:07:27] (03CR) 10Dzahn: [C:03+2] "https://gerrit.wikimedia.org/r/1080823" [puppet] - 10https://gerrit.wikimedia.org/r/1080818 (https://phabricator.wikimedia.org/T377374) (owner: 10Dzahn) [22:11:25] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70192 and previous config saved to /var/cache/conftool/dbconfig/20241016-221125-ladsgroup.json [22:26:32] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70193 and previous config saved to /var/cache/conftool/dbconfig/20241016-222632-ladsgroup.json [22:41:39] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70194 and previous config saved to /var/cache/conftool/dbconfig/20241016-224139-ladsgroup.json [22:41:40] (03PS3) 10Scott French: P:trafficserver: extend x-wikimedia-debug-routing for mwdebug-next [puppet] - 
10https://gerrit.wikimedia.org/r/1072638 (https://phabricator.wikimedia.org/T372605) [22:53:09] (03CR) 10Scott French: "Many thanks for offering to review, Valentin!" [puppet] - 10https://gerrit.wikimedia.org/r/1072638 (https://phabricator.wikimedia.org/T372605) (owner: 10Scott French) [22:56:46] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70195 and previous config saved to /var/cache/conftool/dbconfig/20241016-225646-ladsgroup.json [22:56:53] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance [22:57:06] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance [22:57:07] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance [22:57:10] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance [22:57:17] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70196 and previous config saved to /var/cache/conftool/dbconfig/20241016-225716-ladsgroup.json [22:57:36] (03PS1) 10C. Scott Ananian: tests: ensure maintenance base class has always been requierd [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080828 (https://phabricator.wikimedia.org/T377391) [22:58:21] (03PS2) 10C. 
Scott Ananian: DNP: no-op patch to test CI [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080811 [23:05:42] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70197 and previous config saved to /var/cache/conftool/dbconfig/20241016-230541-ladsgroup.json [23:12:30] 10ops-eqiad, 06SRE, 06DC-Ops, 10fundraising-tech-ops, and 2 others: Frack eqiad network upgrade: design, installation and configuration - https://phabricator.wikimedia.org/T377381#10236211 (10cmooney) @RobH so looking at the options after discussion I think we need to do an fs.com (or alternative but with... [23:20:49] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70198 and previous config saved to /var/cache/conftool/dbconfig/20241016-232048-ladsgroup.json [23:35:56] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70199 and previous config saved to /var/cache/conftool/dbconfig/20241016-233555-ladsgroup.json [23:38:35] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1080835 [23:38:35] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1080835 (owner: 10TrainBranchBot) [23:39:04] (03Abandoned) 10C. Scott Ananian: DNP: no-op patch to test CI [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080811 (owner: 10C. 
Scott Ananian) [23:39:54] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, October 17 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/CommunityConfiguration] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080828 (https://phabricator.wikimedia.org/T377391) (owner: 10C. Scott Ananian) [23:40:21] (03PS3) 10Scott French: mw-(api-ext|web): create "next" releases [deployment-charts] - 10https://gerrit.wikimedia.org/r/1079572 (https://phabricator.wikimedia.org/T377040) [23:43:00] (03PS3) 10C. Scott Ananian: Bump wikimedia/parsoid to 0.20.0-a26 [vendor] (wmf/1.43.0-wmf.27) - 10https://gerrit.wikimedia.org/r/1080770 (https://phabricator.wikimedia.org/T377287) [23:51:03] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70200 and previous config saved to /var/cache/conftool/dbconfig/20241016-235102-ladsgroup.json [23:51:09] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance [23:51:22] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance [23:51:29] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70201 and previous config saved to /var/cache/conftool/dbconfig/20241016-235129-ladsgroup.json [23:59:50] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70202 and previous config saved to /var/cache/conftool/dbconfig/20241016-235950-ladsgroup.json