[00:10:14] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1065574 (owner: 10TrainBranchBot) [00:19:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [00:23:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs1023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [01:03:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs1023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [01:49:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [01:52:40] FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:19:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [02:29:27] FIRING: [12x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://wikitech.wikimedia.org/wiki/Puppet#Debugging - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [02:31:03] FIRING: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [02:39:27] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:59:27] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:02:25] RESOLVED: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:09:26] RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [03:17:26] FIRING: ProbeDown: Service upload-https:443 has failed probes (http_upload-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#upload-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [03:20:44] FIRING: [14x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [03:22:26] RESOLVED: ProbeDown: Service upload-https:443 has failed probes (http_upload-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#upload-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [03:24:27] FIRING: [14x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [03:29:26] FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [03:39:26] RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [03:47:45] FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_producer_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=producer - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnstable [03:52:45] RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_producer_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=producer - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnstable [03:55:40] FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [03:57:56] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [04:22:56] RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [04:32:10] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [05:12:10] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [06:03:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues [06:06:35] I'll [06:06:49] I'm not around unfortunately [06:08:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues [06:12:07] !log oblivian@cumin1002 dbctl commit (dc=all): 'depooling db1161, broken replica', diff saved to https://phabricator.wikimedia.org/P67744 and previous config saved to /var/cache/conftool/dbconfig/20240825-061206-oblivian.json [06:12:23] <_joe_> !incidents [06:12:24] 5109 (UNACKED) db1161 (paged)/MariaDB Replica SQL: s5 (paged) [06:12:24] 5110 (UNACKED) db1161 (paged)/MariaDB Replica Lag: s5 (paged) [06:12:24] 5106 (RESOLVED) ProbeDown sre (198.35.26.112 ip4 upload-https:443 probes/service http_upload-https_ip4 ulsfo) [06:12:40] <_joe_> !ack 5109 [06:12:40] 5109 (ACKED) db1161 (paged)/MariaDB Replica SQL: s5 (paged) [06:12:46] <_joe_> !ack 5110 [06:12:47] 5110 (ACKED) db1161 (paged)/MariaDB Replica Lag: s5 (paged) [06:17:25] FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:31:03] FIRING: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [06:53:56] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [06:57:45] <_joe_> !log repairing mgwiktionary.pagelinks on db1161 [06:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240825T0700) [07:00:33] <_joe_> !restarted replication on db1161 [07:05:23] !log oblivian@cumin1002 dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67745 and previous config saved to /var/cache/conftool/dbconfig/20240825-070522-oblivian.json [07:20:28] !log oblivian@cumin1002 dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67746 and previous config saved to /var/cache/conftool/dbconfig/20240825-072027-oblivian.json [07:24:27] FIRING: [12x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://wikitech.wikimedia.org/wiki/Puppet#Debugging - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [07:35:33] !log oblivian@cumin1002 dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67747 and previous config saved to /var/cache/conftool/dbconfig/20240825-073533-oblivian.json [07:42:25] FIRING: SystemdUnitFailed: user@499.service on wdqs1023:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:47:25] RESOLVED: SystemdUnitFailed: user@499.service on wdqs1023:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:50:39] !log oblivian@cumin1002 dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67748 and previous config saved to /var/cache/conftool/dbconfig/20240825-075038-oblivian.json [07:52:11] RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [07:56:25] FIRING: [2x] SystemdUnitFailed: systemd-timedated.service on wdqs1023:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:05:44] !log oblivian@cumin1002 dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Replication fixed', diff saved to https://phabricator.wikimedia.org/P67749 and previous config saved to /var/cache/conftool/dbconfig/20240825-080544-oblivian.json [08:11:25] RESOLVED: [2x] SystemdUnitFailed: systemd-timedated.service on wdqs1023:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:17:55] FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:27:55] RESOLVED: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:33:55] FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:35:48] RESOLVED: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [08:41:26] FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [08:43:55] RESOLVED: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:44:55] FIRING: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1022:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:49:55] FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:59:55] FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:04:55] FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:14:55] RESOLVED: [3x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:26:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [09:28:25] FIRING: SystemdUnitFailed: systemd-timedated.service on wdqs2024:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:38:25] RESOLVED: SystemdUnitFailed: systemd-timedated.service on wdqs2024:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:43:40] FIRING: [2x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:57:55] FIRING: [2x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:01:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [10:02:55] RESOLVED: [2x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:07:55] FIRING: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:08:40] RESOLVED: [4x] SystemdUnitFailed: systemd-timedated.service on wdqs1021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:11:26] RESOLVED: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [10:17:40] FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:29:27] FIRING: [14x] ProbeDown: Service kubestagemaster1003:6443 has failed probes (http_staging_eqiad_kube_apiserver_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [10:59:26] FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [11:31:48] FIRING: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [11:49:26] RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [12:39:26] FIRING: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [13:04:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [13:09:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [14:17:40] FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:19:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [14:29:27] FIRING: [12x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://wikitech.wikimedia.org/wiki/Puppet#Debugging - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:39:27] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:46:23] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67750 and previous config saved to /var/cache/conftool/dbconfig/20240825-144623-marostegui.json [14:46:27] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856 [15:01:31] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P67751 and previous config saved to /var/cache/conftool/dbconfig/20240825-150130-marostegui.json [15:04:27] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [15:16:37] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P67752 and previous config saved to /var/cache/conftool/dbconfig/20240825-151637-marostegui.json [15:31:45] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367856)', diff saved to https://phabricator.wikimedia.org/P67753 and previous config saved to /var/cache/conftool/dbconfig/20240825-153144-marostegui.json [15:31:46] !log marostegui@cumin1002 START - Cookbook sre.hosts.downtime for 2 days, 7:00:00 on db2165.codfw.wmnet with reason: Maintenance [15:31:48] T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856 [15:32:00] !log marostegui@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 7:00:00 on db2165.codfw.wmnet with reason: Maintenance [15:32:03] FIRING: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [15:32:07] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2165 (T367856)', diff saved to https://phabricator.wikimedia.org/P67754 and previous config saved to /var/cache/conftool/dbconfig/20240825-153206-marostegui.json [15:59:26] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [16:46:31] 06SRE, 06Infrastructure-Foundations, 06Traffic: NetworkProbeLimit cookie should set samesite attribute - https://phabricator.wikimedia.org/T342624#10089982 (10Bugreporter) [17:46:05] (03PS1) 10PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/1065911 [18:14:26] RESOLVED: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [18:17:40] FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:23:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on wdqs1023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [18:29:27] FIRING: [12x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://wikitech.wikimedia.org/wiki/Puppet#Debugging - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [18:33:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on wdqs1023:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [19:07:17] Hi, I don't know if this is the right channel to ask, I need a user rename on phabricator, please PM for details. Thanks [19:09:24] legoktm: or quiddity: might be [19:10:40] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [19:18:32] (03PS2) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [19:18:33] (03PS1) 10Andrew Bogott: Openstack magnum: override default db settings to increase connection count [puppet] - 10https://gerrit.wikimedia.org/r/1065934 (https://phabricator.wikimedia.org/T373227) [19:18:45] (03PS2) 10Andrew Bogott: Openstack magnum: override default db settings to increase connection count [puppet] - 10https://gerrit.wikimedia.org/r/1065934 (https://phabricator.wikimedia.org/T373227) [19:18:46] (03PS3) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [19:20:15] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 23.91% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [19:25:15] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 23.91% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [19:30:12] (03CR) 10Aude: [C:03+1] "this has what we need to use the service on beta and appropriate configs for the extension" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1055984 (https://phabricator.wikimedia.org/T369945) (owner: 10Catrope) [19:32:03] FIRING: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [19:32:20] (03PS3) 10Andrew Bogott: Openstack magnum: override default db settings to increase connection count [puppet] - 10https://gerrit.wikimedia.org/r/1065934 (https://phabricator.wikimedia.org/T373227) [19:32:20] (03PS4) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [19:33:08] (03PS4) 10Andrew Bogott: Openstack magnum: override default db settings to increase connection count [puppet] - 10https://gerrit.wikimedia.org/r/1065934 (https://phabricator.wikimedia.org/T373227) [19:33:08] (03PS5) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [19:33:08] M7: Probably creating a Phabricator ticket about it will be easier [19:33:10] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1065934 (https://phabricator.wikimedia.org/T373227) (owner: 10Andrew Bogott) [19:35:11] marostegui: thanks [19:38:05] 10SRE-swift-storage: Unable to upload to Commons: uploadstash-file-not-found: Key "187kyl5ozj74.xtav8j.51508.djvu" not found in stash - https://phabricator.wikimedia.org/T278104#10090158 (10TheDJ) 05Open→03Resolved a:03TheDJ Several big issues in chunk assembling (T335243 and T358308), were resolved th... [20:20:41] RESOLVED: RoutinatorRsyncErrors: Routinator rsync fetching issue in eqiad - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [20:30:56] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [21:05:56] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [21:19:24] (03PS6) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [21:19:25] (03PS1) 10Andrew Bogott: Magnum: hack in fixes to sqlalchemy use [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) [21:19:57] (03CR) 10CI reject: [V:04-1] Magnum: hack in fixes to sqlalchemy use [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) (owner: 10Andrew Bogott) [21:20:04] (03Abandoned) 10Andrew Bogott: Openstack magnum: override default db settings to increase connection count [puppet] - 10https://gerrit.wikimedia.org/r/1065934 (https://phabricator.wikimedia.org/T373227) (owner: 10Andrew Bogott) [21:21:48] (03PS2) 10Andrew Bogott: Magnum: hack in fixes to sqlalchemy use [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) [21:21:48] (03PS7) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [21:22:14] (03CR) 10CI reject: [V:04-1] Magnum: hack in fixes to sqlalchemy use [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) (owner: 10Andrew Bogott) [21:24:00] (03PS3) 10Andrew Bogott: Magnum: hack in fixes to sqlalchemy use [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) [21:24:01] (03PS8) 10Andrew Bogott: Remove obsolete files for openstack v. antelope [puppet] - 10https://gerrit.wikimedia.org/r/1065235 [21:27:39] (03CR) 10Andrew Bogott: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) (owner: 10Andrew Bogott) [21:55:56] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [22:01:23] (03CR) 10Andrew Bogott: [C:03+2] Magnum: hack in fixes to sqlalchemy use [puppet] - 10https://gerrit.wikimedia.org/r/1065988 (https://phabricator.wikimedia.org/T373227) (owner: 10Andrew Bogott) [22:17:40] FIRING: SystemdUnitFailed: docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:29:27] FIRING: [12x] ProbeDown: Service puppetmaster1001:8140 has failed probes (http_puppetmaster1001_eqiad_wmnet_https_ip4) - https://wikitech.wikimedia.org/wiki/Puppet#Debugging - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [22:45:56] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [23:32:03] FIRING: PuppetFailure: Puppet has failed on wdqs1024:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [23:38:31] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1066131 [23:38:31] (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1066131 (owner: 10TrainBranchBot)