[00:38:06] RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:38:30] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/949225
[00:38:36] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/949225 (owner: 10TrainBranchBot)
[00:42:20] PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_ganeti_esams_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:42:46] RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:53:28] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/949225 (owner: 10TrainBranchBot)
[01:51:52] (03CR) 10Ladsgroup: [C: 03+1] "I'll deploy this later" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/950132 (owner: 10Zabe)
[02:06:40] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:14:49] (ConfdResourceFailed) firing: (120) confd resource _srv_config-master_pybal_codfw_ncredir-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[02:31:40] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:39:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[02:49:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
[03:11:04] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:12:24] PROBLEM - Router interfaces on cr2-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 70, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:05:00] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1006:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[04:10:54] PROBLEM - Backup freshness on backup1001 is CRITICAL: All failures: 1 (bast3007), Fresh: 128 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[04:31:10] PROBLEM - Work requests waiting in Zuul Gearman server on contint2002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[04:58:10] RECOVERY - Work requests waiting in Zuul Gearman server on contint2002 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[05:11:16] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 129 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:31:48] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 142, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:32:24] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 211, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:44:30] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 143, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:45:04] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 212, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:14:50] (ConfdResourceFailed) firing: (120) confd resource _srv_config-master_pybal_codfw_ncredir-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[06:41:05] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230819T0700)
[07:11:05] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[07:44:51] (03PS1) 10Aklapper: mariadb: grant user 'phstats' additional select on phabricator_repository db [puppet] - 10https://gerrit.wikimedia.org/r/950683 (https://phabricator.wikimedia.org/T344513)
[07:45:12] 10SRE, 10SRE-Access-Requests, 10DBA, 10Patch-For-Review: mariadb: grant user 'phstats' additional select on phabricator_repository DB - https://phabricator.wikimedia.org/T344513 (10Aklapper) 05Stalled→03Open
[08:05:00] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1006:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[08:06:30] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1006:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[08:19:33] (03PS2) 10Zabe: manage-dblist: Add lang to langlist if not present [mediawiki-config] - 10https://gerrit.wikimedia.org/r/950132
[08:26:09] !log cmooney@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 16 hosts with reason: Downtime esams hosts prior to cr1-esams reboot
[08:26:33] !log cmooney@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 16 hosts with reason: Downtime esams hosts prior to cr1-esams reboot
[08:26:45] 10SRE, 10ops-knams, 10DC-Ops, 10Patch-For-Review: Main Tracking Task for ESAMS Migration to KNAMS - https://phabricator.wikimedia.org/T329219 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7e93d2ff-5842-45cb-8ae6-d620952951e6) set by cmooney@cumin1001 for 2:00:00 on 16 host(s) and thei...
[08:37:15] 10SRE, 10Infrastructure-Foundations, 10netops: cr1-esams listing transport ospf interfaces multiple times - https://phabricator.wikimedia.org/T344546 (10cmooney) p:05Triage→03Medium
[08:37:22] !log downtiming esams hosts ahead of core router (cr1-esams) reboot T344546
[08:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:26] T344546: cr1-esams listing transport ospf interfaces multiple times - https://phabricator.wikimedia.org/T344546
[08:38:04] RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:38:09] !log cmooney@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
[08:38:42] !log cmooney@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
[08:38:47] 10SRE, 10Infrastructure-Foundations, 10netops: cr1-esams listing transport ospf interfaces multiple times - https://phabricator.wikimedia.org/T344546 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=9daa4455-f5dd-4a7a-8558-107a3e4842a2) set by cmooney@cumin1001 for 2:00:00 on 29 host(s) an...
[08:42:28] PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_ganeti_esams_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:55:04] RECOVERY - BGP status on asw1-by27-esams.mgmt is OK: BGP OK - up: 8, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:09:18] RECOVERY - BFD status on cr1-esams is OK: UP: 6 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:09:20] RECOVERY - OSPF status on cr1-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[09:09:48] RECOVERY - BFD status on cr2-eqiad is OK: UP: 18 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[09:16:35] 10SRE, 10Infrastructure-Foundations, 10netops: cr1-esams listing transport ospf interfaces multiple times - https://phabricator.wikimedia.org/T344546 (10cmooney) 05Open→03Resolved @ayounsi advised to try running a "commit full" on the router rather than a reboot. Apparently this command forces a full co...
[09:33:48] 10SRE, 10Infrastructure-Foundations, 10netops: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (10cmooney) p:05Triage→03Low
[09:38:59] 10SRE, 10Infrastructure-Foundations, 10netops: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (10cmooney)
[09:40:05] 10SRE, 10Infrastructure-Foundations, 10netops: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (10cmooney)
[09:42:36] 10SRE, 10Infrastructure-Foundations, 10netops: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (10cmooney) In terms of Pybal or Anycast routes etc. we can possibly not send those. If the overall approach seems good we can weigh up exactly what is useful to s...
[09:51:05] (SwiftTooManyMediaUploads) firing: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[09:51:30] 10SRE, 10Infrastructure-Foundations, 10netops: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (10cmooney) This should also result in more optimal routing as switches will send traffic to the CR with the best path to the remote internal destinations. i.e. if...
[10:14:51] (ConfdResourceFailed) firing: (120) confd resource _srv_config-master_pybal_codfw_ncredir-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[10:21:05] (SwiftTooManyMediaUploads) resolved: (2) Too many eqiad mediawiki originals uploads - https://wikitech.wikimedia.org/wiki/Swift/How_To#mediawiki_originals_uploads - https://alerts.wikimedia.org/?q=alertname%3DSwiftTooManyMediaUploads
[11:35:46] PROBLEM - Check systemd state on thanos-be1003 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:47:42] PROBLEM - Disk space on thanos-be1003 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdn1 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=thanos-be1003&var-datasource=eqiad+prometheus/ops
[12:06:45] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1006:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[12:38:06] RECOVERY - Check systemd state on netbox1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:42:24] PROBLEM - Check systemd state on netbox1002 is CRITICAL: CRITICAL - degraded: The following units failed: netbox_ganeti_esams_sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:06:40] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:13:28] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[14:13:34] RECOVERY - Router interfaces on cr2-eqdfw is OK: OK: host 208.80.153.198, interfaces up: 71, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:14:57] (ConfdResourceFailed) firing: (120) confd resource _srv_config-master_pybal_codfw_ncredir-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[14:16:40] (JobUnavailable) resolved: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:11:30] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1006:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[16:52:02] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:53:28] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:53:40] PROBLEM - mailman list info ssl expiry on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:54:58] RECOVERY - mailman list info ssl expiry on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 18 Oct 2023 03:52:32 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:00:30] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50420 bytes in 0.105 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:00:30] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:48:37] topranks: the cr1-esams alert is just expired?
[17:48:50] cr1-esams page was it firing again after 24h - everything seems ok I’ll do a quick check in an hour or so when I’m at my computer
[17:49:13] jelto: yes, acked yesterday and wasn’t resolved after so firing again
[17:49:14] Great thanks
[17:49:18] Sorry for the noise
[17:49:23] np
[17:50:39] I can confirm, alert fired yesterday Aug. 18 - 7:41 PM, so 24h ago
[18:15:06] (ConfdResourceFailed) firing: (120) confd resource _srv_config-master_pybal_codfw_ncredir-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[18:34:28] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:45:56] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:48:16] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:48:26] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:55:14] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50420 bytes in 0.067 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[18:55:26] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.283 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:11:30] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1006:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[22:15:11] (ConfdResourceFailed) firing: (120) confd resource _srv_config-master_pybal_codfw_ncredir-https.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed