[00:02:34] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:25:50] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:39:18] PROBLEM - Varnish traffic drop between 30min ago and now at esams on alert1001 is CRITICAL: 57.12 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:39:42] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is CRITICAL: 54.75 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:41:40] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:43:10] RECOVERY - Varnish traffic drop between 30min ago and now at esams on alert1001 is OK: (C)60 le (W)70 le 84.38 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[00:59:36] PROBLEM - Logstash Elasticsearch indexing errors #o11y on alert1001 is CRITICAL: 549 ge 480 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/3283cc1372b7df18f26128163125cf45 https://grafana.wikimedia.org/dashboard/db/logstash
[01:01:34] RECOVERY - Logstash Elasticsearch indexing errors #o11y on alert1001 is OK: (C)480 ge (W)60 ge 7 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/3283cc1372b7df18f26128163125cf45 https://grafana.wikimedia.org/dashboard/db/logstash
[01:02:04] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[01:02:38] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:05:58] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 8 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[01:25:54] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:02:44] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:26:04] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:35:40] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[02:50:30] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2056-production-search-omega-codfw on elastic2056 is CRITICAL: 104.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2056&panelId=37
[02:55:04] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[03:02:48] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:20:16] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is OK: (C)100 gt (W)80 gt 54.92 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[03:22:16] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[03:26:08] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:30:02] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[03:44:50] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2056-production-search-omega-codfw on elastic2056 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2056&panelId=37
[04:01:06] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:08:08] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2056-production-search-omega-codfw on elastic2056 is CRITICAL: 476.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2056&panelId=37
[04:26:16] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:43:48] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2059-production-search-omega-codfw on elastic2059 is CRITICAL: 106.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2059&panelId=37
[04:53:36] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 143.4 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[05:01:18] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2059-production-search-omega-codfw on elastic2059 is OK: (C)100 gt (W)80 gt 53.9 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2059&panelId=37
[05:01:22] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:26:42] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:44:12] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is OK: (C)100 gt (W)80 gt 72.2 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[05:55:12] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2056-production-search-omega-codfw on elastic2056 is OK: (C)100 gt (W)80 gt 67.12 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2056&panelId=37
[06:01:42] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:26:58] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210822T0700)
[07:01:47] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[07:01:47] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[07:01:48] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:02:16] o.O
[07:02:38] yo
[07:03:21] mr1-esams, false alert...
[07:03:59] * sobanski goes back to breakfast then
[07:04:52] ack
[07:05:22] ok
[07:06:01] I'll open a task on monday for the management routers not to page for that alert
[07:06:47] (Primary outbound port utilisation over 80% #page) resolved: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[07:06:47] (Primary outbound port utilisation over 80% #page) resolved: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[07:08:51] (Processor usage over 85%) firing: Processor usage over 85% - https://alerts.wikimedia.org
[07:13:28] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 667.1 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[07:13:51] (Processor usage over 85%) resolved: Processor usage over 85% - https://alerts.wikimedia.org
[07:17:20] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[07:27:02] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:42:30] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[07:50:16] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[07:54:54] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: / (spec from root) is CRITICAL: Test spec from root returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[07:58:48] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[08:01:56] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:15:34] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is OK: (C)100 gt (W)80 gt 64.07 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[08:27:12] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:28:20] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[08:30:16] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 6 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[08:36:04] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[08:38:02] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 7 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[08:42:44] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[08:54:28] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 126.1 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[08:56:22] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[09:02:12] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:19:42] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[09:19:48] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 116.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[09:25:36] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:45:04] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is OK: (C)100 gt (W)80 gt 74.24 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[10:02:28] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:25:42] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:02:30] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:17:58] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[11:25:28] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[11:25:32] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:00:44] (PS3) Majavah: Take OS codename into account for grid compatibility [software/tools-manifest] - https://gerrit.wikimedia.org/r/713663 (https://phabricator.wikimedia.org/T278748)
[12:01:08] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:01:18] (PS4) Majavah: Do not compare OS versions [software/tools-manifest] - https://gerrit.wikimedia.org/r/713663 (https://phabricator.wikimedia.org/T278748)
[12:21:28] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[12:23:26] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[12:26:18] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:46:10] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2056-production-search-omega-codfw on elastic2056 is CRITICAL: 344.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2056&panelId=37
[12:50:22] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2059-production-search-omega-codfw on elastic2059 is CRITICAL: 239 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2059&panelId=37
[13:02:00] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:27:02] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:42:30] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2059-production-search-omega-codfw on elastic2059 is OK: (C)100 gt (W)80 gt 65.08 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2059&panelId=37
[13:43:54] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2056-production-search-omega-codfw on elastic2056 is OK: (C)100 gt (W)80 gt 62.03 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2056&panelId=37
[14:01:58] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:15:42] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 115.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[14:17:24] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2059-production-search-omega-codfw on elastic2059 is CRITICAL: 582.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2059&panelId=37
[14:27:12] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:38:58] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is OK: (C)100 gt (W)80 gt 66.1 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[15:01:08] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=rails site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:02:04] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:03:04] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:13:42] RECOVERY - Rate of JVM GC Old generation-s runs - elastic2059-production-search-omega-codfw on elastic2059 is OK: (C)100 gt (W)80 gt 39.66 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-omega-codfw&var-instance=elastic2059&panelId=37
[15:24:10] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[15:27:08] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:28:02] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 5 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[16:02:08] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:25:34] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:02:32] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:25:54] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:01:42] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:27:08] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:41:34] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[18:43:30] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[19:02:04] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:26:32] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:35:11] (PS4) H.krishna123: [WIP]: Add basic tox config [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404)
[19:35:47] (CR) H.krishna123: "recheck" [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404) (owner: H.krishna123)
[19:39:23] (PS5) H.krishna123: bernard: Add basic tox config [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404)
[19:39:37] (CR) H.krishna123: "recheck" [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404) (owner: H.krishna123)
[19:43:57] (CR) H.krishna123: "Yay, CI is ready, it runs unit tests for both api_db and web_app, and some flake8 + mypy + py3-format which I lifted from wmfdbbackups. I " [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404) (owner: H.krishna123)
[20:01:58] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:06:38] (PS1) H.krishna123: bernard: Add basic tox config [software/bernard] - https://gerrit.wikimedia.org/r/714206 (https://phabricator.wikimedia.org/T284404)
[20:07:44] (Abandoned) H.krishna123: bernard: Add basic tox config [software/bernard] - https://gerrit.wikimedia.org/r/714206 (https://phabricator.wikimedia.org/T284404) (owner: H.krishna123)
[20:09:25] (PS6) H.krishna123: bernard: Add basic tox config [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404)
[20:09:35] (CR) H.krishna123: "recheck" [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404) (owner: H.krishna123)
[20:12:00] (CR) H.krishna123: "CI code ready for review + merge." [software/bernard] - https://gerrit.wikimedia.org/r/713604 (https://phabricator.wikimedia.org/T284404) (owner: H.krishna123)
[20:26:44] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:01:54] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:27:14] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:02:12] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:12:36] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[22:14:34] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 6 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[22:25:32] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:02:32] RECOVERY - Check systemd state on planet1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:12:26] PROBLEM - Rate of JVM GC Old generation-s runs - elastic2053-production-search-psi-codfw on elastic2053 is CRITICAL: 187.1 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-codfw&var-instance=elastic2053&panelId=37
[23:14:52] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[23:16:50] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 9 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[23:25:38] PROBLEM - Check systemd state on planet1002 is CRITICAL: CRITICAL - degraded: The following units failed: planet-update-en.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:37:37] (PS1) Platonides: Add Wikimedia ES to $wgCopyUploadsDomains whitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/714211 (https://phabricator.wikimedia.org/T289446)