[00:00:13] (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1037501 (owner: TrainBranchBot) [00:02:53] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:02:56] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:03:45] FIRING: [3x] CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate [00:04:50] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:04:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:06:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:06:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:08:54] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:08:56] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:10:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:10:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:10:44] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:12:46] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:12:49] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [00:13:41] PROBLEM - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [00:21:23] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:21:26] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:23:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:23:24] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:23:45] RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_search_eqiad in eqiad (k8s): ... [00:23:45] fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-search - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate [00:25:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:25:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:27:54] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:27:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:29:52] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:29:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:32:45] FIRING: Traffic bill over quota: Alert for device cr2-drmrs.wikimedia.org - Traffic bill over quota got worse - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota 
[00:35:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:35:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:37:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:37:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:37:45] FIRING: [5x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [00:45:15] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:45:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:49:52] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:49:56] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:51:50] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:51:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:52:45] FIRING: [5x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota [00:53:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:53:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:55:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:55:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [00:57:45] RESOLVED: [4x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota 
[00:58:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [00:58:05] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:02:49] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:02:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:06:06] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:06:09] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:08:33] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:08:36] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:08:43] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:10:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:10:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:13:41] RECOVERY - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [01:14:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:14:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:16:03] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:16:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:17:50] !log @deploy1002 helmfile 
[eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:17:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:19:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:19:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:21:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:21:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:24:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:24:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:27:08] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:27:11] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:33:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:33:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:34:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:34:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:36:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:36:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:38:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:38:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:41:21] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63841 and previous config saved to 
/var/cache/conftool/dbconfig/20240602-014121-marostegui.json [01:41:25] T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069 [01:42:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:42:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:44:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:44:30] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:46:24] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:46:27] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:48:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:48:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:50:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:50:31] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:52:24] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:52:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [01:56:29] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63842 and previous config saved to /var/cache/conftool/dbconfig/20240602-015629-marostegui.json [01:59:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [01:59:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:01:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:01:51] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [02:03:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:03:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:04:52] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63843 and previous config saved to /var/cache/conftool/dbconfig/20240602-020451-ladsgroup.json [02:04:55] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [02:06:11] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:06:14] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:08:58] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:09:01] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:10:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:10:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:11:39] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63844 and previous config saved to /var/cache/conftool/dbconfig/20240602-021137-marostegui.json [02:13:02] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:13:05] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:14:58] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:15:02] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:16:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:16:58] !log 
@deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:20:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63845 and previous config saved to /var/cache/conftool/dbconfig/20240602-021959-ladsgroup.json [02:21:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:21:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:23:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:23:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:25:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:25:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:26:47] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63846 and previous config saved to /var/cache/conftool/dbconfig/20240602-022646-marostegui.json [02:26:49] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance [02:26:50] T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069 [02:27:02] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance [02:27:14] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63847 and previous config saved to /var/cache/conftool/dbconfig/20240602-022710-marostegui.json [02:27:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:27:48] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [02:29:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:29:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:31:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:31:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:33:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:33:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:35:08] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63848 and previous config saved to /var/cache/conftool/dbconfig/20240602-023507-ladsgroup.json [02:35:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:35:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:37:29] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:37:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:38:43] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:43:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:43:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:49:33] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:49:36] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [02:50:16] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63849 and previous config saved to /var/cache/conftool/dbconfig/20240602-025015-ladsgroup.json [02:50:18] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance [02:50:19] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [02:50:31] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance [02:50:31] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63850 and previous config saved to /var/cache/conftool/dbconfig/20240602-025031-marostegui.json [02:50:35] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [02:50:39] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63851 and previous config saved to /var/cache/conftool/dbconfig/20240602-025039-ladsgroup.json [02:53:30] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [02:53:33] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [02:55:44] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:00:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:00:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:02:14] !log @deploy1002 
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:02:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:03:43] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:05:39] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63852 and previous config saved to /var/cache/conftool/dbconfig/20240602-030539-marostegui.json [03:16:30] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:16:33] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:19:57] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:20:00] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:20:47] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63853 and previous config saved to /var/cache/conftool/dbconfig/20240602-032047-marostegui.json [03:21:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:21:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:27:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:27:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:31:08] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:31:11] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:35:55] !log 
marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63854 and previous config saved to /var/cache/conftool/dbconfig/20240602-033555-marostegui.json [03:35:58] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance [03:35:58] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [03:36:11] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance [03:36:19] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63855 and previous config saved to /var/cache/conftool/dbconfig/20240602-033618-marostegui.json [03:36:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:36:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:42:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:42:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:45:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:45:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:47:15] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:47:18] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:52:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:52:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:55:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:55:42] !log @deploy1002 
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [03:58:08] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [03:58:11] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:02:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:02:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:04:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:04:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:06:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:06:42] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:14:05] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:14:08] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:16:02] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:16:05] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:17:58] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:18:02] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:36:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:36:49] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:53:23] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:53:26] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:56:40] !log @deploy1002 helmfile [eqiad] START 
helmfile.d/services/cirrus-streaming-updater: apply [04:56:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [04:58:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [04:58:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:00:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:00:37] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:02:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:02:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:04:28] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:04:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:06:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:06:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:08:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:08:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:10:49] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:10:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:13:06] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:13:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:15:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:15:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply 
[05:16:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:16:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:18:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:18:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:27:05] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:27:09] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:29:23] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:29:26] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:31:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:31:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:34:56] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:35:00] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:37:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:37:37] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:39:30] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:39:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:42:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:42:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:44:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:44:08] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [05:47:39] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1003), Fresh: 143 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [05:47:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:47:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:50:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:50:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:53:46] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:53:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:55:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:55:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:57:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:57:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [05:59:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [05:59:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:04:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues [06:08:05] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:08:08] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: 
apply [06:09:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues [06:10:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:10:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:12:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:12:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:14:05] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:14:08] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:15:52] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:15:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:20:44] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:21:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:21:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:23:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:23:30] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:29:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:29:07] 
!log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:39:10] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:39:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:41:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:41:11] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:45:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:45:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:47:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:47:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:49:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:49:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:57:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:57:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [06:59:13] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [06:59:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240602T0700) [07:06:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:06:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:09:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:09:51] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:24:24] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:24:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:29:57] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P63856 and previous config saved to /var/cache/conftool/dbconfig/20240602-072956-root.json [07:30:09] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change [07:30:22] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change [07:38:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:38:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:40:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:40:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:42:15] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:42:18] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:44:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:44:16] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [07:46:09] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:46:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:47:56] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:47:59] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:50:23] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:50:26] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:56:49] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:56:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [07:59:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [07:59:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:01:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:01:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:02:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:02:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:04:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:04:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:06:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:06:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:08:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply 
[08:08:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:13:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:13:51] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:15:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:15:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:17:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:17:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:19:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:19:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:21:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:21:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:23:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:23:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:24:17] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63857 and previous config saved to /var/cache/conftool/dbconfig/20240602-082417-marostegui.json [08:25:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:25:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:27:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:27:30] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply 
[08:29:24] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:29:27] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:31:21] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:31:24] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:33:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:33:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:37:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:37:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:39:25] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63858 and previous config saved to /var/cache/conftool/dbconfig/20240602-083925-marostegui.json [08:39:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:39:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:41:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:41:42] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:43:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:43:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:45:52] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:45:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [08:54:33] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to 
https://phabricator.wikimedia.org/P63859 and previous config saved to /var/cache/conftool/dbconfig/20240602-085433-marostegui.json [08:59:49] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [08:59:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:07:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:07:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:09:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:09:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:09:41] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63860 and previous config saved to /var/cache/conftool/dbconfig/20240602-090941-marostegui.json [09:09:44] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance [09:09:44] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [09:09:57] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance [09:10:00] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance [09:10:14] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance [09:10:22] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63861 and previous config saved to /var/cache/conftool/dbconfig/20240602-091021-marostegui.json [09:18:50] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:18:53] 
!log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:25:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:25:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:31:13] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:31:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:42:10] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:42:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:44:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:44:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:47:39] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 144 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [09:57:53] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:57:57] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [09:59:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [09:59:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:05:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:05:31] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:08:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:08:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:10:21] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:10:24] !log @deploy1002 
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:12:08] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:12:11] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:14:15] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:14:18] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:16:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:16:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:20:44] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:22:49] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:22:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:24:56] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:24:59] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:26:52] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:26:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:30:28] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:30:31] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:30:49] (PS3) Gergő Tisza: varnish: Copy value of X-Wikimedia-Debug cookie to header [puppet] - https://gerrit.wikimedia.org/r/1030591 (https://phabricator.wikimedia.org/T350094) [10:32:25]
!log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:32:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:34:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:34:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:36:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:36:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:39:48] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63862 and previous config saved to /var/cache/conftool/dbconfig/20240602-103948-marostegui.json [10:39:52] T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069 [10:41:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:41:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:43:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:43:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:47:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:47:42] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:49:36] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:49:39] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:52:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:52:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:54:56] !log 
marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63863 and previous config saved to /var/cache/conftool/dbconfig/20240602-105456-marostegui.json [10:56:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:56:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [10:58:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [10:58:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:10:04] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63864 and previous config saved to /var/cache/conftool/dbconfig/20240602-111004-marostegui.json [11:12:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:12:40] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:14:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:14:37] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:14:48] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63865 and previous config saved to /var/cache/conftool/dbconfig/20240602-111447-ladsgroup.json [11:14:50] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [11:17:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:17:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:19:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:19:20] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [11:23:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:23:37] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:25:12] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63866 and previous config saved to /var/cache/conftool/dbconfig/20240602-112512-marostegui.json [11:25:15] T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069 [11:29:11] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:29:14] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:29:55] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63867 and previous config saved to /var/cache/conftool/dbconfig/20240602-112955-ladsgroup.json [11:29:56] (CR) Gergő Tisza: varnish: Copy value of X-Wikimedia-Debug cookie to header (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1030591 (https://phabricator.wikimedia.org/T350094) (owner: Gergő Tisza) [11:31:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:31:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:36:06] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:36:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:39:13] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:39:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:41:50] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:41:54]
!log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:45:03] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63868 and previous config saved to /var/cache/conftool/dbconfig/20240602-114503-ladsgroup.json [11:48:58] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:49:01] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:54:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:54:18] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:56:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:56:15] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [11:58:09] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [11:58:12] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:00:11] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63869 and previous config saved to /var/cache/conftool/dbconfig/20240602-120010-ladsgroup.json [12:00:13] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance [12:00:22] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [12:00:26] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance [12:00:34] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63870 and previous config saved to 
/var/cache/conftool/dbconfig/20240602-120033-ladsgroup.json [12:02:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:02:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:36:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:36:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:38:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:38:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:39:58] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:40:01] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:41:02] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63871 and previous config saved to /var/cache/conftool/dbconfig/20240602-124102-root.json [12:41:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:41:56] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance [12:41:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:42:09] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance [12:43:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:43:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:45:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:45:41] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [12:47:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:47:39] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:49:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:49:46] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:51:29] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:51:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:53:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:53:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [12:56:08] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63872 and previous config saved to /var/cache/conftool/dbconfig/20240602-125608-root.json [12:59:53] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [12:59:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:01:59] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:02:03] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:07:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:07:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:09:03] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:09:06] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:11:15] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 
(re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63873 and previous config saved to /var/cache/conftool/dbconfig/20240602-131114-root.json [13:13:29] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:13:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:15:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:15:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:17:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:17:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:19:10] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:19:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:21:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:21:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:25:53] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:25:56] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:26:20] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63874 and previous config saved to /var/cache/conftool/dbconfig/20240602-132620-root.json [13:27:50] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:27:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:29:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:29:50] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [13:41:27] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63875 and previous config saved to /var/cache/conftool/dbconfig/20240602-134126-root.json [13:47:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:47:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:49:29] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [13:49:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [13:56:33] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63876 and previous config saved to /var/cache/conftool/dbconfig/20240602-135632-root.json [14:03:34] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63877 and previous config saved to /var/cache/conftool/dbconfig/20240602-140334-marostegui.json [14:03:38] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [14:04:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:04:30] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:06:23] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:06:27] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:11:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:11:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:11:40] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Repooling', diff 
saved to https://phabricator.wikimedia.org/P63878 and previous config saved to /var/cache/conftool/dbconfig/20240602-141139-root.json [14:12:57] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:13:00] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:14:54] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:14:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:17:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:17:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:18:44] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63879 and previous config saved to /var/cache/conftool/dbconfig/20240602-141843-marostegui.json [14:18:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:18:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:19:52] (PS2) Gergő Tisza: [WIP][POC] Handle sso.wikimedia.org domain [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) [14:19:52] (PS1) Gergő Tisza: multiversion: Support beta for upload hostname check [mediawiki-config] - https://gerrit.wikimedia.org/r/1037929 [14:19:52] (PS1) Gergő Tisza: multiversion: Make MWMultiVersion::error() unittest-friendly [mediawiki-config] - https://gerrit.wikimedia.org/r/1037930 [14:20:30] (CR) CI reject: [V:-1] [WIP][POC] Handle sso.wikimedia.org domain [mediawiki-config] - https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: Gergő Tisza) [14:20:44] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 -
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:21:03] (03PS3) 10Gergő Tisza: [WIP][POC] Handle sso.wikimedia.org domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) [14:21:38] (03CR) 10CI reject: [V:04-1] [WIP][POC] Handle sso.wikimedia.org domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: 10Gergő Tisza) [14:29:45] FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ... [14:29:45] fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate [14:33:53] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63880 and previous config saved to /var/cache/conftool/dbconfig/20240602-143352-marostegui.json [14:34:45] RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ... 
[14:34:45] fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate [14:38:43] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:40:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:40:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:41:43] FIRING: VarnishUnavailable: varnish-text has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable [14:41:44] FIRING: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [14:43:54] here [14:44:30] grafana graphs seem to recover again [14:44:36] !incidents [14:44:37] 4719 (UNACKED) VarnishUnavailable global sre (varnish-text thanos-rule) [14:44:37] 4720 (UNACKED) HaproxyUnavailable cache_text global sre (thanos-rule) [14:44:59] !ack 4719 [14:45:00] 4719 (ACKED) VarnishUnavailable global sre (varnish-text thanos-rule) [14:45:05] !ack 4720 [14:45:06] 4720 (ACKED) HaproxyUnavailable cache_text global sre (thanos-rule) [14:46:43] RESOLVED: VarnishUnavailable: varnish-text has reduced HTTP availability 
#page - https://wikitech.wikimedia.org/wiki/Varnish#Diagnosing_Varnish_alerts - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DVarnishUnavailable [14:46:44] RESOLVED: HaproxyUnavailable: HAProxy (cache_text) has reduced HTTP availability #page - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyUnavailable [14:48:02] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:48:05] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:49:01] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63881 and previous config saved to /var/cache/conftool/dbconfig/20240602-144900-marostegui.json [14:49:03] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance [14:49:05] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [14:49:16] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance [14:49:24] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63882 and previous config saved to /var/cache/conftool/dbconfig/20240602-144924-marostegui.json [14:49:59] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [14:50:02] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [14:55:44] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - 
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:06:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:06:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:08:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:08:15] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:10:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:10:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:13:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:13:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:15:13] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:15:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:17:10] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:17:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:18:57] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:19:00] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:20:53] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:20:57] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:26:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:26:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:30:08] !log @deploy1002 
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:30:12] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:32:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:32:18] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:41:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:41:15] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:43:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:43:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:45:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:45:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:47:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:47:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:49:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:49:24] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:51:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:51:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:53:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:53:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:55:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:55:34] !log @deploy1002 helmfile [eqiad] DONE 
helmfile.d/services/cirrus-streaming-updater: apply [15:57:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:57:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [15:59:15] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [15:59:18] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:01:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:01:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:10:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:10:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:12:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:12:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:14:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:14:55] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:33:00] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:33:03] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:34:57] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:34:59] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:37:03] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:37:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:39:10] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply 
[16:39:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:41:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:41:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:46:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:46:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:50:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:50:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:52:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:52:26] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [16:57:59] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [16:58:02] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:01:06] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:01:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:04:03] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:04:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:13:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:13:44] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:16:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:16:40] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:18:34] !log @deploy1002 helmfile [eqiad] START 
helmfile.d/services/cirrus-streaming-updater: apply [17:18:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:20:21] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:20:24] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:22:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:22:22] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:31:46] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:31:49] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:33:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:33:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:35:29] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:35:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:37:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:37:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:39:13] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:39:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:41:09] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:41:13] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:52:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:52:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply 
[17:54:13] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:54:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:56:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:56:24] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [17:58:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [17:58:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:03:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:03:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:05:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:05:24] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:06:28] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63883 and previous config saved to /var/cache/conftool/dbconfig/20240602-180627-ladsgroup.json [18:06:31] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [18:07:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:07:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:09:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:09:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:11:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:11:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply 
[18:13:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:13:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:19:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:19:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:20:44] FIRING: [2x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:21:36] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63884 and previous config saved to /var/cache/conftool/dbconfig/20240602-182135-ladsgroup.json [18:21:54] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:21:57] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:23:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:23:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:25:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:25:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:27:33] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:27:37] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:29:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:29:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:31:18] !log @deploy1002 
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:31:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:33:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:33:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:35:11] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:35:14] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:36:44] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63885 and previous config saved to /var/cache/conftool/dbconfig/20240602-183643-ladsgroup.json [18:37:58] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:38:01] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:39:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:39:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:41:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:41:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:45:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:45:51] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:46:37] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:46:59] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:47:09] PROBLEM - 
mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:47:49] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 52065 bytes in 0.105 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:47:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:47:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:47:59] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 13 Aug 2024 12:55:14 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:48:27] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8616 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [18:50:45] FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ... 
[18:50:45] fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate [18:51:52] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63886 and previous config saved to /var/cache/conftool/dbconfig/20240602-185151-ladsgroup.json [18:51:54] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance [18:51:55] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010 [18:52:07] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance [18:52:15] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63887 and previous config saved to /var/cache/conftool/dbconfig/20240602-185215-ladsgroup.json [18:52:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:52:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:54:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:54:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [18:56:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [18:56:30] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:00:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:00:37] !log @deploy1002 helmfile 
[eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:02:30] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:02:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:03:41] PROBLEM - Check unit status of httpbb_hourly_appserver on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:05:45] RESOLVED: [2x] CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate [19:07:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:07:30] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:08:43] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:09:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:09:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:11:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:11:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:13:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:13:40] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:14:58] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after 
maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63888 and previous config saved to /var/cache/conftool/dbconfig/20240602-191458-marostegui.json [19:15:02] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [19:20:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:20:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:23:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:23:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:26:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:26:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:28:35] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:28:38] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:30:06] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63889 and previous config saved to /var/cache/conftool/dbconfig/20240602-193006-marostegui.json [19:30:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:30:35] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:32:18] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:32:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:34:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:34:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:36:12] !log @deploy1002 helmfile [eqiad] START 
helmfile.d/services/cirrus-streaming-updater: apply [19:36:16] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:41:59] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:42:02] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:45:14] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63890 and previous config saved to /var/cache/conftool/dbconfig/20240602-194514-marostegui.json [19:47:06] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:47:09] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:48:53] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:48:56] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:50:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:50:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:53:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:53:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:55:43] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:55:46] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [19:58:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [19:58:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:00:15] (PS1) Ebrahim: Enable numeric sorting for Persian in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940
(https://phabricator.wikimedia.org/T329440) [20:00:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:00:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:00:23] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63891 and previous config saved to /var/cache/conftool/dbconfig/20240602-200021-marostegui.json [20:00:25] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance [20:00:31] T364299: Make rc_id a bigint - https://phabricator.wikimedia.org/T364299 [20:00:38] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance [20:00:38] (PS2) Ladsgroup: beta: Enable numeric sorting for Persian [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) (owner: Ebrahim) [20:00:46] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63892 and previous config saved to /var/cache/conftool/dbconfig/20240602-200046-marostegui.json [20:00:53] (CR) Ladsgroup: [C:+2] beta: Enable numeric sorting for Persian [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) (owner: Ebrahim) [20:01:02] (PS3) Ebrahim: Enable numeric sorting for Persian in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) [20:01:51] (CR) Ladsgroup: Enable numeric sorting for Persian in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) (owner: Ebrahim) [20:02:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:02:17]
!log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:02:27] (PS4) Ladsgroup: beta: Enable numeric sorting for Persian [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) (owner: Ebrahim) [20:02:30] (CR) Ladsgroup: [C:+2] beta: Enable numeric sorting for Persian [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) (owner: Ebrahim) [20:03:08] (Merged) jenkins-bot: beta: Enable numeric sorting for Persian [mediawiki-config] - https://gerrit.wikimedia.org/r/1037940 (https://phabricator.wikimedia.org/T329440) (owner: Ebrahim) [20:03:41] RECOVERY - Check unit status of httpbb_hourly_appserver on cumin2002 is OK: OK: Status of the systemd unit httpbb_hourly_appserver https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [20:03:43] FIRING: [3x] SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:04:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:04:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:06:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:06:40] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:12:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:12:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:15:41] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:15:44] !log @deploy1002 helmfile [eqiad] DONE
helmfile.d/services/cirrus-streaming-updater: apply [20:17:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:17:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:19:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:19:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:21:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:21:45] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:23:29] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:23:32] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:25:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:25:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:30:43] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:30:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:31:52] (PS1) Ebrahim: Enable numeric sorting for Persian [mediawiki-config] - https://gerrit.wikimedia.org/r/1037942 (https://phabricator.wikimedia.org/T329440) [20:32:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:32:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:34:28] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:34:31] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:38:15] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:38:18] !log @deploy1002
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:40:02] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:40:05] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:41:59] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:42:02] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:43:56] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:43:59] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:44:20] PROBLEM - MariaDB Replica SQL: s5 #page on db1213 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table page_props is corrupt: try to repair it on query. Default database: dewiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [20:46:23] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:46:26] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:47:20] !log taavi@cumin1002 dbctl commit (dc=all): 'depool db1213', diff saved to https://phabricator.wikimedia.org/P63893 and previous config saved to /var/cache/conftool/dbconfig/20240602-204719-taavi.json [20:48:30] i depooled db1213, opening a task for DBAs to look at monday [20:49:55] thanks taavi [20:51:15] T366429 [20:51:16] T366429: db1213 replication broken (Index for table dewiki.page_props is corrupt) - https://phabricator.wikimedia.org/T366429 [20:51:19] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:51:19] and acked on victorops [20:51:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:52:20] PROBLEM - MariaDB Replica Lag: s5 #page on db1213 is 
CRITICAL: CRITICAL slave_sql_lag Replication lag: 646.09 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [20:52:35] i will downtime it [20:53:16] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:53:19] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:53:43] !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues [20:53:56] !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues [20:55:03] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:55:06] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:57:20] RECOVERY - MariaDB Replica SQL: s5 #page on db1213 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [20:57:57] replication is back but lets keep the host depooled [20:59:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [20:59:42] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [20:59:44] and it goes down again [21:01:36] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:01:39] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:12:32] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:12:36] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:14:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:14:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: 
apply [21:21:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:21:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:23:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:23:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:25:01] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:25:04] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:26:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:26:51] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:28:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:28:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:31:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:31:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:33:09] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:33:12] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:35:26] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:35:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:36:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:36:47] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:38:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:38:34] !log @deploy1002 helmfile [eqiad] 
DONE helmfile.d/services/cirrus-streaming-updater: apply [21:40:27] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:40:31] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:42:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:42:28] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:44:22] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:44:25] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:46:39] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:46:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:48:36] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:48:39] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:53:03] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:53:06] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:54:59] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:55:03] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [21:58:17] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [21:58:20] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:00:14] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:00:17] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:03:33] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: 
apply [22:03:36] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:07:00] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:07:03] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:09:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:09:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:13:39] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:13:59] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:14:09] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:14:29] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8616 bytes in 0.868 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:14:49] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 52066 bytes in 0.225 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:14:59] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 13 Aug 2024 12:55:14 AM GMT +0000. 
https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [22:15:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:15:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:17:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:17:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:19:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:19:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:21:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:21:37] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:23:31] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:23:34] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:25:28] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:25:31] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:30:55] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:30:58] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:34:02] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:34:05] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:35:48] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:35:52] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:37:46] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: 
apply [22:37:49] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:39:42] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:39:46] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:52:00] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:52:03] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [22:59:07] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [22:59:10] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:01:34] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:01:36] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:02:41] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [23:03:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:03:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:04:01] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [23:04:33] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8616 bytes in 0.276 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [23:04:53] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 52065 bytes in 0.070 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [23:06:30] (03PS4) 10Gergő Tisza: [POC] Handle sso.wikimedia.org domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 
(https://phabricator.wikimedia.org/T365162) [23:06:57] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:07:00] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:07:07] (03CR) 10CI reject: [V:04-1] [POC] Handle sso.wikimedia.org domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) (owner: 10Gergő Tisza) [23:08:54] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:08:57] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:11:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:11:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:13:37] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:13:40] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:14:41] (03PS2) 10Gergő Tisza: multiversion: Support beta for upload hostname check [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1037929 [23:14:41] (03PS2) 10Gergő Tisza: multiversion: Make MWMultiVersion::error() unittest-friendly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1037930 [23:14:41] (03PS5) 10Gergő Tisza: [POC] Handle sso.wikimedia.org domain [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1036245 (https://phabricator.wikimedia.org/T365162) [23:15:24] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:15:27] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:17:20] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:17:23] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:19:17] !log @deploy1002 
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:19:21] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:21:04] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:21:07] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:28:11] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:28:14] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:28:26] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance [23:28:39] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance [23:28:47] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63894 and previous config saved to /var/cache/conftool/dbconfig/20240602-232847-marostegui.json [23:28:50] T364069: Rebuild pagelinks tables - https://phabricator.wikimedia.org/T364069 [23:30:06] (03Abandoned) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1037501 (owner: 10TrainBranchBot) [23:35:38] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:35:42] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:37:25] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:37:29] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:38:15] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1037504 [23:38:15] (03CR) 10TrainBranchBot: 
[C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1037504 (owner: 10TrainBranchBot) [23:39:12] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:39:15] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:41:50] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:41:53] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:42:43] (03PS3) 10Wargo: Assign applychangetags right to group "all" on plwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1031174 (https://phabricator.wikimedia.org/T363638) [23:43:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:43:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:45:43] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:45:46] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:47:40] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:47:43] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:49:47] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:49:50] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:51:44] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:51:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:54:51] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:54:54] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:56:38] 
!log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:56:41] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [23:58:45] !log @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [23:58:45] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1037504 (owner: 10TrainBranchBot) [23:58:48] !log @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply