[00:12:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:25:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.04, 19.99, 18.96
[00:29:56] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.76, 19.28, 18.92
[00:38:48] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.01, 20.78, 19.71
[00:44:44] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.89, 19.48, 19.59
[00:48:40] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.03, 21.02, 20.20
[00:52:37] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.23, 19.67, 19.84
[00:56:33] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.58, 20.64, 20.28
[00:58:32] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.14, 20.09, 20.12
[01:18:00] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[01:18:18] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.78, 20.70, 20.29
[01:20:00] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[01:22:15] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.82, 19.47, 19.88
[01:33:06] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.76, 20.19, 19.87
[01:35:05] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.60, 19.05, 19.53
[01:48:54] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.53, 21.22, 19.96
[01:49:11] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.49, 20.27, 18.69
[01:50:52] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.67, 20.23, 19.74
[01:51:08] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.98, 19.47, 18.57
[02:00:43] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.77, 21.13, 20.46
[02:08:37] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.89, 20.32, 20.35
[02:12:33] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.16, 20.83, 20.52
[02:14:32] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.49, 22.08, 21.01
[02:16:31] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.72, 20.57, 20.58
[02:17:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:22:27] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.49, 19.71, 20.23
[02:29:20] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.07, 21.79, 20.86
[02:31:19] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.61, 20.25, 20.40
[02:52:02] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.81, 20.68, 19.83
[02:59:57] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.32, 20.30, 20.23
[03:03:53] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.31, 21.08, 20.58
[03:04:57] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:05:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.26, 3.34, 1.36
[03:05:52] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.11, 19.90, 20.20
[03:06:09] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:06:51] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.058 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:07:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:09:45] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:10:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[03:11:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.56, 3.56, 2.20
[03:14:44] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:15:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.01, 4.19, 2.73
[03:15:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:20:25] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:22:30] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 9.632 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:23:46] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[03:25:48] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[03:27:02] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.38, 3.19, 3.16
[03:30:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:30:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:33:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.92, 3.89, 3.49
[03:35:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.50, 4.08, 3.60
[03:35:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:37:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.48, 3.50, 3.42
[03:38:04] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:39:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.63, 3.00, 3.26
[03:39:29] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.20, 21.89, 19.82
[03:41:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:43:03] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.82, 5.74, 4.33
[03:43:49] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.222 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:46:16] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.48, 21.02, 19.45
[03:48:13] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.31, 21.72, 19.83
[03:50:11] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.58, 21.62, 20.06
[03:51:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.61, 3.72, 4.00
[03:52:08] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.63, 23.68, 20.98
[03:53:04] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:53:39] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.90, 19.48, 16.82
[03:55:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.10, 3.84, 3.95
[03:55:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:55:38] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 25.08, 22.02, 18.09
[03:55:39] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 22.75, 19.52, 16.36
[03:55:51] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.39, 21.68, 17.98
[03:56:55] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.28, 20.73, 16.92
[03:57:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.60, 3.55, 3.84
[03:57:38] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.50, 21.48, 18.36
[03:58:34] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.45, 21.13, 17.51
[03:59:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.38, 4.18, 4.00
[03:59:39] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 14.82, 18.71, 16.86
[03:59:51] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 17.35, 21.21, 18.77
[04:00:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:00:34] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 14.66, 18.37, 16.93
[04:00:55] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 14.74, 18.74, 17.10
[04:01:03] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.22, 3.49, 3.76
[04:01:36] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 12.27, 17.79, 17.71
[04:01:51] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 13.39, 18.37, 18.02
[04:01:54] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.01, 23.02, 22.61
[04:03:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.22, 22.42, 23.71
[04:05:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.29, 4.10, 3.90
[04:07:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.75, 3.28, 3.61
[04:09:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.44, 2.81, 3.38
[04:11:39] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 11.73, 16.09, 19.54
[04:13:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.64, 3.73, 3.70
[04:13:29] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.37, 16.37, 20.05
[04:15:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:18:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:19:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.46, 4.33, 3.87
[04:24:28] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:26:23] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.068 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:28:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:29:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.70, 3.77, 4.00
[04:30:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:31:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.58, 4.37, 4.18
[04:32:38] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:34:33] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.072 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:35:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:36:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:38:57] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:40:51] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.076 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:41:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:42:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:52:50] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:55:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:56:51] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:22] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.219 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[05:01:03] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[05:02:50] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:04:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:12:46] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 4 datacenters are down: 38.46.223.205/cpweb, 38.46.223.206/cpweb, 2602:294:0:b13::110/cpweb, 2602:294:0:b23::112/cpweb
[05:12:54] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 60%, RTA = 205.01 ms
[05:13:04] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 7 backends are down. mw151 mw152 mw161 mw162 mw172 mw181 mw182
[05:13:05] PROBLEM - ping6 on cp41 is CRITICAL: PING CRITICAL - Packet loss = 60%, RTA = 170.96 ms
[05:13:05] PROBLEM - cp41 HTTPS on cp41 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10003 milliseconds with 0 bytes received
[05:13:14] PROBLEM - ping6 on ns2 is CRITICAL: PING CRITICAL - Packet loss = 100%
[05:13:27] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 46.250.240.167/cpweb
[05:13:46] PROBLEM - cp51 HTTPS on cp51 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10000 milliseconds with 0 bytes received
[05:13:49] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[05:13:51] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 9 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mediawiki
[05:14:28] PROBLEM - cp51 HTTP 4xx/5xx ERROR Rate on cp51 is WARNING: WARNING - NGINX Error Rate is 45%
[05:15:12] RECOVERY - cp41 HTTPS on cp41 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3843 bytes in 9.055 second response time
[05:17:56] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:18:41] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[05:19:17] RECOVERY - cp51 HTTP 4xx/5xx ERROR Rate on cp51 is OK: OK - NGINX Error Rate is 36%
[05:19:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:19:21] RECOVERY - ping6 on cp41 is OK: PING OK - Packet loss = 0%, RTA = 103.55 ms
[05:19:27] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[05:19:28] RECOVERY - ping6 on ns2 is OK: PING OK - Packet loss = 0%, RTA = 141.02 ms
[05:19:54] RECOVERY - cp51 HTTPS on cp51 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3843 bytes in 1.290 second response time
[05:19:58] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[05:19:59] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 6.140 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[05:21:17] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[05:21:18] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 162.13 ms
[05:21:20] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online
[05:21:49] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[05:25:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.39, 17.60, 14.50
[05:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:31:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.51, 2.99, 3.86
[05:31:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:35:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.02, 3.38, 3.77
[05:35:29] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.87, 23.10, 18.61
[05:37:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.09, 23.82, 19.48
[05:38:10] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:39:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.87, 2.94, 3.50
[05:39:29] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.81, 24.17, 20.11
[05:41:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.94, 22.47, 20.01
[05:42:21] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:43:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.09, 3.39, 3.54
[05:44:50] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o