[00:01:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:10:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:11:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.71, 2.89, 3.82
[00:15:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.11, 4.03, 3.98
[00:17:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.60, 3.65, 3.87
[00:20:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:21:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.11, 3.58, 3.78
[00:21:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:25:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.60, 3.60, 3.74
[00:26:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:27:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.53, 4.30, 3.98
[00:29:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:31:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:32:55] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:35:17] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 2.403 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:36:41] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[00:38:22] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:39:07] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures
[00:40:21] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:42:55] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:46:42] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:51:42] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:53:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.80, 3.15, 3.79
[00:56:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:57:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.34, 3.69, 3.82
[01:01:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.42, 3.57, 3.73
[01:01:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:03:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.86, 4.50, 4.06
[01:03:41] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:04:00] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[01:05:59] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[01:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:09:45] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.064 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:12:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:17:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:18:15] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:24:04] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:24:41] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[01:24:45] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[01:31:12] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:32:57] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 15 minutes ago with 0 failures
[01:32:58] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[01:33:11] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:33:15] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:37:38] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:38] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:46:50] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.071 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:48:15] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:49:15] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:54:15] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:57:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:03:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:03:49] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[02:03:52] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:05:49] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[02:05:51] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:07:34] [mw-config] BlankEclair opened pull request #5629: T12425: Add custom footer for rainversewiki - https://github.com/miraheze/mw-config/pull/5629
[02:07:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:09:14] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.075 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:11:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:12:22] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:13:35] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:15:29] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.055 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:16:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:16:34] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:20:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:21:51] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:23:01] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:23:46] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.229 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:25:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:27:09] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:31:14] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:35:15] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 1.972 second response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:41:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:42:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:47:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:51:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.95, 3.04, 3.86
[02:53:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.13, 3.65, 3.96
[02:55:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.64, 3.21, 3.74
[02:59:02] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.68, 2.52, 3.35
[03:00:27] RECOVERY - db171 Backups SQL on db171 is OK: FILE_AGE OK: /var/log/sql-backup.log is 26 seconds old and 0 bytes
[03:02:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:03:27] PROBLEM - mon181 Backups Grafana on mon181 is WARNING: FILE_AGE WARNING: /var/log/grafana-backup.log is 864188 seconds old and 93 bytes
[03:03:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:05:53] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.70, 4.52, 3.89
[03:07:02] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[03:07:06] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[03:07:46] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all
[03:09:18] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 23 minutes ago with 0 failures
[03:09:19] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[03:09:48] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present]
[03:12:07] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 37%, RTA = 210.75 ms
[03:12:19] PROBLEM - ping6 on ns2 is CRITICAL: PING CRITICAL - Packet loss = 100%
[03:13:20] PROBLEM - ping6 on cp41 is CRITICAL: PING CRITICAL - Packet loss = 90%, RTA = 166.55 ms
[03:13:30] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 5 backends are down. mw152 mw161 mw162 mw171 mw181
[03:13:32] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.26, 2.80, 3.64
[03:13:46] PROBLEM - cp51 HTTPS on cp51 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10000 milliseconds with 0 bytes received
[03:13:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:13:54] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 2 backends are down. mw161 mw181
[03:14:09] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 162.08 ms
[03:14:20] RECOVERY - ping6 on ns2 is OK: PING OK - Packet loss = 0%, RTA = 141.84 ms
[03:15:21] RECOVERY - ping6 on cp41 is OK: PING OK - Packet loss = 0%, RTA = 102.74 ms
[03:15:27] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[03:15:27] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.16, 1.95, 3.23
[03:15:43] RECOVERY - cp51 HTTPS on cp51 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3843 bytes in 1.081 second response time
[03:15:54] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[03:47:25] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:52:25] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:43:26] PROBLEM - cp36 HTTP 4xx/5xx ERROR Rate on cp36 is CRITICAL: CRITICAL - NGINX Error Rate is 100%
[04:43:27] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 38.46.223.205/cpweb, 38.46.223.206/cpweb, 2602:294:0:b13::110/cpweb, 2602:294:0:b23::112/cpweb, 109.123.230.163/cpweb, 2400:d320:2161:9775::1/cpweb
[04:43:42] PROBLEM - wiki.nowchess.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:47] PROBLEM - 321nails.crpteam.club - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:47] PROBLEM - cp36 Varnish Backends on cp36 is CRITICAL: 9 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mediawiki
[04:43:49] PROBLEM - www.durawiki.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:49] PROBLEM - urbanshade.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:49] PROBLEM - wiki.cdntennis.ca - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:53] PROBLEM - franchise.franchising.org.ua - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:53] PROBLEM - wiki.luemir.xyz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:53] PROBLEM - wiki.cubestudios.xyz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:54] PROBLEM - wiki.omegabuild.uk - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:56] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 9 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mediawiki
[04:44:01] PROBLEM - wiki.buryland.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:03] PROBLEM - cp37 HTTPS on cp37 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 7 - Failed to connect to cp37.wikitide.net port 443 after 0 ms: Couldn't connect to server
[04:44:07] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 8 datacenters are down: 46.250.240.167/cpweb, 2407:3641:2161:9774::1/cpweb, 38.46.223.205/cpweb, 38.46.223.206/cpweb, 2602:294:0:b13::110/cpweb, 2602:294:0:b23::112/cpweb, 109.123.230.163/cpweb, 2400:d320:2161:9775::1/cpweb
[04:44:24] PROBLEM - cp36 Current Load on cp36 is WARNING: LOAD WARNING - total load average: 6.82, 4.26, 2.11
[04:44:41] PROBLEM - wiki.meregos.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:41] PROBLEM - tl.awiki.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:43] PROBLEM - cp41 HTTP 4xx/5xx ERROR Rate on cp41 is CRITICAL: CRITICAL - NGINX Error Rate is 100%
[04:44:44] PROBLEM - cp37 HTTP 4xx/5xx ERROR Rate on cp37 is CRITICAL: CRITICAL - NGINX Error Rate is 100%
[04:44:46] PROBLEM - allthetropes.orain.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:47] PROBLEM - poserdazfreebies.orain.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:47] PROBLEM - housing.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:47] PROBLEM - wiki.knowledgerevolution.eu - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:50] PROBLEM - rct.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:53] PROBLEM - cp37 Nginx Backend for mwtask171 on cp37 is CRITICAL: connect to address localhost and port 8161: Connection refused
[04:44:57] PROBLEM - www.johanloopmans.nl - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:57] PROBLEM - pyramidgames.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:57] PROBLEM - gimkit.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:57] PROBLEM - aman.awiki.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:58] PROBLEM - resources.africanvision.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:58] PROBLEM - antiguabarbudacalypso.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:01] PROBLEM - wiki.18t.rip - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
PROBLEM - www.dariawiki.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:04] PROBLEM - wiki.mahdiruiz.line.pm - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:05] PROBLEM - cp37 Nginx Backend for mw161 on cp37 is CRITICAL: connect to address localhost and port 8115: Connection refused
[04:45:05] PROBLEM - cp37 Nginx Backend for swiftproxy161 on cp37 is CRITICAL: connect to address localhost and port 8206: Connection refused
[04:45:05] PROBLEM - cp41 HTTPS on cp41 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Connection timed out after 10004 milliseconds
[04:45:08] PROBLEM - cp37 Nginx Backend for mw151 on cp37 is CRITICAL: connect to address localhost and port 8113: Connection refused
[04:45:09] PROBLEM - wiki.junkstore.xyz - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:09] PROBLEM - cp36 HTTPS on cp36 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Connection timed out after 10003 milliseconds
[04:45:10] PROBLEM - cp37 Nginx Backend for phorge171 on cp37 is CRITICAL: connect to address localhost and port 8202: Connection refused
[04:45:15] PROBLEM - cp37 Nginx Backend for mw172 on cp37 is CRITICAL: connect to address localhost and port 8118: Connection refused
[04:45:23] PROBLEM - rosettacode.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:23] PROBLEM - wiki.openhatch.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:23] PROBLEM - wiki.walkscape.app - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:27] PROBLEM - wiki.limaru.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:27] PROBLEM - cp37 Nginx Backend for mw181 on cp37 is CRITICAL: connect to address localhost and port 8119: Connection refused
[04:45:28] PROBLEM - cp37 Nginx Backend for mw171 on cp37 is CRITICAL: connect to address localhost and port 8117: Connection refused
[04:45:28] PROBLEM - cp37 Nginx Backend for mw152 on cp37 is CRITICAL: connect to address localhost and port 8114: Connection refused
[04:45:28] PROBLEM - wiki.cube-conflict.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:31] PROBLEM - wiki.sheepservermc.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:32] PROBLEM - cp51 Current Load on cp51 is CRITICAL: LOAD CRITICAL - total load average: 10.54, 9.89, 4.55
[04:45:32] PROBLEM - infectowiki.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:33] PROBLEM - wiki.arsrobotics.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:36] PROBLEM - cp37 Nginx Backend for reports171 on cp37 is CRITICAL: connect to address localhost and port 8205: Connection refused
[04:45:38] PROBLEM - wiki.villagecollaborative.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:38] PROBLEM - wiki.thunis.eu - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:45:39] PROBLEM - cp51 HTTPS on cp51 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Connection timed out after 10004 milliseconds
[04:45:42] PROBLEM - cp37 Nginx Backend for mon181 on cp37 is CRITICAL: connect to address localhost and port 8201: Connection refused
[04:45:45] PROBLEM - cp37 Nginx Backend for mwtask181 on cp37 is CRITICAL: connect to address localhost and port 8160: Connection refused
[04:45:47] PROBLEM - cp37 Nginx Backend for matomo151 on cp37 is CRITICAL: connect to address localhost and port 8203: Connection refused
[04:45:48] PROBLEM - wiki.potabi.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:00] PROBLEM - cp37 Nginx Backend for mw162 on cp37 is CRITICAL: connect to address localhost and port 8116: Connection refused
[04:46:16] PROBLEM - wiki.orivium.io - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:17] PROBLEM - wiki.aridia.space - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:17] PROBLEM - wiki.cutefame.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:18] PROBLEM - iceria.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:23] PROBLEM - yokaiwatchwiki.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:23] PROBLEM - wiki.mikrodev.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:24] PROBLEM - cp37 Nginx Backend for test151 on cp37 is CRITICAL: connect to address localhost and port 8181: Connection refused
RECOVERY - cp36 Current Load on cp36 is OK: LOAD OK - total load average: 5.63, 4.83, 2.59
[04:46:24] PROBLEM - nonciclopedia.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:24] PROBLEM - wiki.msnld.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:24] PROBLEM - n64brew.dev - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:26] PROBLEM - cp37 Nginx Backend for puppet181 on cp37 is CRITICAL: connect to address localhost and port 8204: Connection refused
[04:46:34] PROBLEM - wiki.ff6worldscollide.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:34] PROBLEM - wiki.cyberfurs.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:36] PROBLEM - cp37 Varnish Backends on cp37 is CRITICAL: 9 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mediawiki
[04:46:36] PROBLEM - largedu.eu.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:36] PROBLEM - ff8.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:36] PROBLEM - en.religiononfire.mar.in.ua - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:40] PROBLEM - cp37 Nginx Backend for mw182 on cp37 is CRITICAL: connect to address localhost and port 8120: Connection refused
[04:46:40] PROBLEM - cp37 Nginx Backend for swiftproxy171 on cp37 is CRITICAL: connect to address localhost and port 8207: Connection refused
[04:46:41] PROBLEM - wiki.ciptamedia.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:42] PROBLEM - wiki.meeusen.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:53] PROBLEM - worldsanskrit.net - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:58] PROBLEM - wiki.jill-jimmy.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:02] PROBLEM - history.sdtef.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:02] PROBLEM - kagaga.jp - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:02] PROBLEM - www.permanentfuturelab.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:05] RECOVERY - cp36 HTTPS on cp36 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3815 bytes in 0.069 second response time
[04:47:10] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 1 backends are down. mw171
[04:47:11] PROBLEM - wikislamica.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:11] PROBLEM - alternatewiki.tombricks.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:12] PROBLEM - www.thegreatwar.uk - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:19] PROBLEM - cp51 HTTP 4xx/5xx ERROR Rate on cp51 is WARNING: WARNING - NGINX Error Rate is 46%
[04:47:26] RECOVERY - cp36 HTTP 4xx/5xx ERROR Rate on cp36 is OK: OK - NGINX Error Rate is 4%
[04:47:32] PROBLEM - cp51 Current Load on cp51 is WARNING: LOAD WARNING - total load average: 2.50, 7.37, 4.30
[04:47:33] PROBLEM - lgbtqia.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:47:35] RECOVERY - cp51 HTTPS on cp51 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3819 bytes in 1.051 second response time
[04:47:47] RECOVERY - cp36 Varnish Backends on cp36 is OK: All 19 backends are healthy
[04:48:18] PROBLEM - files.petrawiki.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:19] PROBLEM - grayzonewarfare.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:23] PROBLEM - bobobay.wiki - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:25] PROBLEM - revi.wiki - PositiveSSLDV on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:25] PROBLEM - podpedia.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:25] PROBLEM - issue-tracker.wikitide.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:31] PROBLEM - mwcosmos.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:35] PROBLEM - familiacorsi.com - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:40] PROBLEM - kunwok.org - LetsEncrypt on sslhost is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:49:09] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[04:49:19] RECOVERY - cp51 HTTP 4xx/5xx ERROR Rate on cp51 is OK: OK - NGINX Error Rate is 13%
[04:49:32] RECOVERY - cp51 Current Load on cp51 is OK: LOAD OK - total load average: 0.53, 5.03, 3.81
[04:52:31] PROBLEM - cp41 Nginx Backend for mw162 on cp41 is CRITICAL: connect to address localhost and port 8116: Connection refused
[04:52:33] PROBLEM - cp41 Nginx Backend for mw182 on cp41 is CRITICAL: connect to address localhost and port 8120: Connection refused
[04:52:34] PROBLEM - cp41 Nginx Backend for test151 on cp41 is CRITICAL: connect to address localhost and port 8181: Connection refused
[04:52:42] PROBLEM - cp41 Nginx Backend for phorge171 on cp41 is CRITICAL: connect to address localhost and port 8202: Connection refused
[04:52:46] PROBLEM - cp41 Nginx Backend for mw172 on cp41 is CRITICAL: connect to address localhost and port 8118: Connection refused
[04:53:01] PROBLEM - cp41 Nginx Backend for swiftproxy171 on cp41 is CRITICAL: connect to address localhost and port 8207: Connection refused
[04:53:11] PROBLEM - cp41 Nginx Backend for reports171 on cp41 is CRITICAL: connect to address localhost and port 8205: Connection refused
[04:53:17] PROBLEM - cp41 Nginx Backend for mw151 on cp41 is CRITICAL: connect to address localhost and port 8113: Connection refused
[04:53:37] PROBLEM - cp41 Nginx Backend for mwtask171 on cp41 is CRITICAL: connect to address localhost and port 8161: Connection refused
[04:53:40] PROBLEM - cp41 Nginx Backend for puppet181 on cp41 is CRITICAL: connect to address localhost and port 8204: Connection refused
[04:53:43] PROBLEM - cp41 Nginx Backend for mon181 on cp41 is CRITICAL: connect to address localhost and port 8201: Connection refused
[04:53:43] PROBLEM - cp41 Nginx Backend for mw181 on cp41 is CRITICAL: connect to address localhost and port 8119: Connection refused
[04:53:46] PROBLEM - cp41 Nginx Backend for matomo151 on cp41 is CRITICAL: connect to address localhost and port 8203: Connection refused
[04:53:56] PROBLEM - cp41 Nginx Backend for mwtask181 on cp41 is CRITICAL: connect to address localhost and port 8160: Connection refused
[04:54:05] PROBLEM - cp41 Nginx Backend for mw161 on cp41 is CRITICAL: connect to address localhost and port 8115: Connection refused
[04:54:16] PROBLEM - cp41 Nginx Backend for swiftproxy161 on cp41 is CRITICAL: connect to address localhost and port 8206: Connection refused
[04:54:21] PROBLEM - cp41 Nginx Backend for mw152 on cp41 is CRITICAL: connect to address localhost and port 8114: Connection refused
[04:54:25] PROBLEM - cp41 Nginx Backend for mw171 on cp41 is CRITICAL: connect to address localhost and port 8117: Connection refused
[04:59:12] !tech cp37 and cp41 are down (see above from icinga, and i get connection refused on port 443 when i try to connect to both of them)
[04:59:36] RECOVERY - cp37 Nginx Backend for reports171 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8205
[04:59:42] RECOVERY - cp37 Nginx Backend for mon181 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8201
[04:59:45] RECOVERY - cp37 Nginx Backend for mwtask181 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8160
[04:59:47] RECOVERY - cp37 Nginx Backend for matomo151 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8203
[05:00:00] RECOVERY - cp37 Nginx Backend for mw162 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8116
[05:00:24] RECOVERY - cp37 Nginx Backend for test151 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8181
[05:00:26] RECOVERY - cp37 Nginx Backend for puppet181 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8204
[05:00:36] RECOVERY - cp37 Varnish Backends on cp37 is OK: All 19 backends are healthy
[05:00:40] RECOVERY - cp37 Nginx Backend for mw182 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8120
[05:00:40] RECOVERY - cp37 Nginx Backend for swiftproxy171 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8207
[05:00:44] RECOVERY - cp37 HTTP 4xx/5xx ERROR Rate on cp37 is OK: OK - NGINX Error Rate is 2%
[05:00:53] RECOVERY - cp37 Nginx Backend for mwtask171 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8161
[05:01:05] RECOVERY - cp37 Nginx Backend for swiftproxy161 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8206
[05:01:05] RECOVERY - cp37 Nginx Backend for mw161 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8115
[05:01:08] RECOVERY - cp37 Nginx Backend for mw151 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8113
[05:01:10] RECOVERY - cp37 Nginx Backend for phorge171 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8202
[05:01:15] RECOVERY - cp37 Nginx Backend for mw172 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8118
[05:01:27] RECOVERY - cp37 Nginx Backend for mw181 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8119
[05:01:28] RECOVERY - cp37 Nginx Backend for mw152 on cp37 is OK: TCP OK - 0.001 second response time on localhost port 8114
[05:01:28] RECOVERY - cp37 Nginx Backend for mw171 on cp37 is OK: TCP OK - 0.000 second response time on localhost port 8117
[05:01:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:02:00] RECOVERY - cp37 HTTPS on cp37 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3820 bytes in 0.095 second response time
[05:05:11] RECOVERY - cp41 Nginx Backend for reports171 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8205
[05:05:17] RECOVERY - cp41 Nginx Backend for mw151 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8113
[05:05:27] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online
[05:05:37] RECOVERY - cp41 Nginx Backend for mwtask171 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8161
[05:05:40] RECOVERY - cp41 Nginx Backend for puppet181 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8204
[05:05:43] RECOVERY - cp41 Nginx Backend for mon181 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8201
[05:05:43] RECOVERY - cp41 Nginx Backend for mw181 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8119
[05:05:46] RECOVERY - cp41 Nginx Backend for matomo151 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8203
[05:05:51] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online
[05:05:56] RECOVERY - cp41 Nginx Backend for mwtask181 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8160
[05:05:58] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[05:06:05] RECOVERY - cp41 Nginx Backend for mw161 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8115
[05:06:16] RECOVERY - cp41 Nginx Backend for swiftproxy161 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8206
[05:06:21] RECOVERY - cp41 Nginx Backend for mw152 on cp41 is OK: TCP OK - 0.000 second response time on 
localhost port 8114 [05:06:25] RECOVERY - cp41 Nginx Backend for mw171 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8117 [05:06:31] RECOVERY - cp41 Nginx Backend for mw162 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8116 [05:06:32] RECOVERY - cp41 Nginx Backend for mw182 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8120 [05:06:34] RECOVERY - cp41 Nginx Backend for test151 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8181 [05:06:42] RECOVERY - cp41 Nginx Backend for phorge171 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8202 [05:06:43] RECOVERY - cp41 HTTP 4xx/5xx ERROR Rate on cp41 is OK: OK - NGINX Error Rate is 2% [05:06:46] RECOVERY - cp41 Nginx Backend for mw172 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8118 [05:06:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:07:01] RECOVERY - cp41 Nginx Backend for swiftproxy171 on cp41 is OK: TCP OK - 0.000 second response time on localhost port 8207 [05:07:01] RECOVERY - cp41 HTTPS on cp41 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3842 bytes in 0.728 second response time [05:12:55] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:12:59] RECOVERY - wiki.nowchess.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.nowchess.org' will expire on Wed 09 Oct 2024 09:12:07 PM GMT +0000. [05:13:09] RECOVERY - 321nails.crpteam.club - LetsEncrypt on sslhost is OK: OK - Certificate '321nails.crpteam.club' will expire on Fri 11 Oct 2024 12:53:39 PM GMT +0000. [05:13:09] RECOVERY - urbanshade.org - LetsEncrypt on sslhost is OK: OK - Certificate 'urbanshade.org' will expire on Sat 26 Oct 2024 03:54:50 PM GMT +0000. 
[05:13:09] RECOVERY - www.durawiki.com - LetsEncrypt on sslhost is OK: OK - Certificate 'www.durawiki.com' will expire on Wed 09 Oct 2024 02:26:27 AM GMT +0000. [05:13:09] RECOVERY - wiki.cdntennis.ca - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cdntennis.ca' will expire on Wed 09 Oct 2024 07:33:32 PM GMT +0000. [05:13:18] RECOVERY - wiki.cubestudios.xyz - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cubestudios.xyz' will expire on Sat 14 Sep 2024 11:36:09 AM GMT +0000. [05:13:18] RECOVERY - wiki.luemir.xyz - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.luemir.xyz' will expire on Thu 10 Oct 2024 04:56:38 PM GMT +0000. [05:13:18] RECOVERY - franchise.franchising.org.ua - LetsEncrypt on sslhost is OK: OK - Certificate 'franchise.franchising.org.ua' will expire on Wed 09 Oct 2024 06:49:43 PM GMT +0000. [05:13:25] RECOVERY - wiki.omegabuild.uk - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.omegabuild.uk' will expire on Tue 27 Aug 2024 04:06:14 PM GMT +0000. [05:13:26] RECOVERY - wiki.meregos.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.meregos.com' will expire on Thu 10 Oct 2024 01:35:57 AM GMT +0000. [05:13:34] RECOVERY - allthetropes.orain.org - LetsEncrypt on sslhost is OK: OK - Certificate 'orain.org' will expire on Wed 30 Oct 2024 09:58:01 PM GMT +0000. [05:13:36] RECOVERY - poserdazfreebies.orain.org - LetsEncrypt on sslhost is OK: OK - Certificate 'orain.org' will expire on Wed 30 Oct 2024 09:58:01 PM GMT +0000. [05:13:37] RECOVERY - housing.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'housing.wiki' will expire on Thu 10 Oct 2024 04:26:31 PM GMT +0000. [05:13:41] RECOVERY - wiki.knowledgerevolution.eu - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.knowledgerevolution.eu' will expire on Tue 22 Oct 2024 04:32:42 PM GMT +0000. [05:13:47] RECOVERY - rct.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'rct.wiki' will expire on Thu 10 Oct 2024 11:11:19 AM GMT +0000. 
[05:13:56] RECOVERY - gimkit.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'gimkit.wiki' will expire on Mon 07 Oct 2024 04:03:17 PM GMT +0000. [05:13:56] RECOVERY - pyramidgames.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'pyramidgames.wiki' will expire on Sun 20 Oct 2024 04:56:18 PM GMT +0000. [05:13:56] RECOVERY - www.johanloopmans.nl - LetsEncrypt on sslhost is OK: OK - Certificate 'www.johanloopmans.nl' will expire on Sun 20 Oct 2024 05:00:59 PM GMT +0000. [05:13:56] RECOVERY - wiki.buryland.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.buryland.net' will expire on Mon 02 Sep 2024 01:02:22 AM GMT +0000. [05:13:57] RECOVERY - aman.awiki.org - LetsEncrypt on sslhost is OK: OK - Certificate 'aman.awiki.org' will expire on Wed 09 Oct 2024 07:48:25 PM GMT +0000. [05:13:57] RECOVERY - antiguabarbudacalypso.com - LetsEncrypt on sslhost is OK: OK - Certificate 'antiguabarbudacalypso.com' will expire on Thu 10 Oct 2024 07:27:39 PM GMT +0000. [05:13:58] RECOVERY - resources.africanvision.org - LetsEncrypt on sslhost is OK: OK - Certificate 'resources.africanvision.org' will expire on Fri 11 Oct 2024 04:33:40 PM GMT +0000. [05:14:05] RECOVERY - www.dariawiki.org - LetsEncrypt on sslhost is OK: OK - Certificate 'dariawiki.org' will expire on Sun 20 Oct 2024 04:52:09 PM GMT +0000. [05:14:05] RECOVERY - wiki.18t.rip - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.18t.rip' will expire on Sun 01 Sep 2024 01:47:13 PM GMT +0000. [05:14:19] RECOVERY - wiki.junkstore.xyz - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.junkstore.xyz' will expire on Sun 27 Oct 2024 06:28:42 PM GMT +0000. [05:14:33] RECOVERY - tl.awiki.org - LetsEncrypt on sslhost is OK: OK - Certificate 'tl.awiki.org' will expire on Sun 20 Oct 2024 04:51:42 PM GMT +0000. 
[05:14:44] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.71, 4.27, 2.67 [05:14:49] RECOVERY - wiki.openhatch.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.openhatch.org' will expire on Wed 09 Oct 2024 09:11:14 PM GMT +0000. [05:14:49] RECOVERY - wiki.walkscape.app - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.walkscape.app' will expire on Tue 29 Oct 2024 11:14:23 AM GMT +0000. [05:14:50] RECOVERY - wiki.mahdiruiz.line.pm - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.mahdiruiz.line.pm' will expire on Tue 27 Aug 2024 03:54:13 PM GMT +0000. [05:14:51] RECOVERY - rosettacode.org - LetsEncrypt on sslhost is OK: OK - Certificate 'rosettacode.org' will expire on Mon 26 Aug 2024 03:10:07 AM GMT +0000. [05:15:02] RECOVERY - wiki.cube-conflict.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cube-conflict.com' will expire on Tue 22 Oct 2024 03:22:26 PM GMT +0000. [05:15:06] RECOVERY - wiki.cutefame.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cutefame.net' will expire on Wed 21 Aug 2024 06:44:54 PM GMT +0000. [05:15:06] RECOVERY - wiki.orivium.io - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.orivium.io' will expire on Sun 15 Sep 2024 10:10:11 PM GMT +0000. [05:15:06] RECOVERY - wiki.aridia.space - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.aridia.space' will expire on Thu 10 Oct 2024 01:53:22 AM GMT +0000. [05:15:07] RECOVERY - wiki.sheepservermc.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.sheepservermc.net' will expire on Sat 05 Oct 2024 09:12:55 PM GMT +0000. [05:15:08] RECOVERY - iceria.org - LetsEncrypt on sslhost is OK: OK - Certificate 'www.iceria.org' will expire on Wed 09 Oct 2024 05:40:21 PM GMT +0000. [05:15:16] RECOVERY - wiki.limaru.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.limaru.net' will expire on Fri 11 Oct 2024 07:14:25 PM GMT +0000. 
[05:15:18] RECOVERY - wiki.villagecollaborative.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.villagecollaborative.net' will expire on Wed 09 Oct 2024 09:20:12 PM GMT +0000. [05:15:19] RECOVERY - wiki.thunis.eu - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.thunis.eu' will expire on Wed 09 Oct 2024 09:11:22 AM GMT +0000. [05:15:19] RECOVERY - yokaiwatchwiki.com - LetsEncrypt on sslhost is OK: OK - Certificate 'yokaiwatchwiki.com' will expire on Fri 11 Oct 2024 06:09:00 PM GMT +0000. [05:15:19] RECOVERY - wiki.mikrodev.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.mikrodev.com' will expire on Wed 09 Oct 2024 09:38:25 PM GMT +0000. [05:15:22] RECOVERY - nonciclopedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'nonciclopedia.org' will expire on Wed 09 Oct 2024 08:01:13 AM GMT +0000. [05:15:22] RECOVERY - wiki.msnld.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.msnld.com' will expire on Sun 20 Oct 2024 04:47:34 PM GMT +0000. [05:15:22] RECOVERY - n64brew.dev - LetsEncrypt on sslhost is OK: OK - Certificate 'n64brew.dev' will expire on Wed 09 Oct 2024 07:43:10 PM GMT +0000. [05:15:24] RECOVERY - infectowiki.com - LetsEncrypt on sslhost is OK: OK - Certificate 'infectowiki.com' will expire on Wed 09 Oct 2024 04:22:42 PM GMT +0000. [05:15:27] RECOVERY - wiki.arsrobotics.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.arsrobotics.org' will expire on Mon 09 Sep 2024 10:43:12 AM GMT +0000. [05:15:38] RECOVERY - wiki.potabi.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.potabi.com' will expire on Thu 10 Oct 2024 09:13:23 AM GMT +0000. [05:15:44] RECOVERY - wiki.ff6worldscollide.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.ff6worldscollide.com' will expire on Thu 05 Sep 2024 07:56:54 PM GMT +0000. [05:15:44] RECOVERY - wiki.cyberfurs.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.cyberfurs.org' will expire on Thu 10 Oct 2024 12:18:16 AM GMT +0000. 
[05:15:44] RECOVERY - ff8.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'ff8.wiki' will expire on Thu 19 Sep 2024 02:16:08 AM GMT +0000. [05:15:45] RECOVERY - largedu.eu.org - LetsEncrypt on sslhost is OK: OK - Certificate 'largedu.eu.org' will expire on Fri 11 Oct 2024 01:08:40 PM GMT +0000. [05:15:45] RECOVERY - en.religiononfire.mar.in.ua - LetsEncrypt on sslhost is OK: OK - Certificate 'en.religiononfire.mar.in.ua' will expire on Thu 31 Oct 2024 02:08:48 PM GMT +0000. [05:15:58] RECOVERY - wiki.ciptamedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.ciptamedia.org' will expire on Wed 09 Oct 2024 09:31:29 PM GMT +0000. [05:15:59] RECOVERY - wiki.meeusen.net - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.meeusen.net' will expire on Fri 11 Oct 2024 09:45:07 AM GMT +0000. [05:16:29] RECOVERY - worldsanskrit.net - LetsEncrypt on sslhost is OK: OK - Certificate 'worldsanskrit.net' will expire on Thu 10 Oct 2024 09:37:26 AM GMT +0000. [05:16:39] RECOVERY - www.permanentfuturelab.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'permanentfuturelab.wiki' will expire on Sun 20 Oct 2024 04:53:18 PM GMT +0000. [05:16:39] RECOVERY - kagaga.jp - LetsEncrypt on sslhost is OK: OK - Certificate 'kagaga.jp' will expire on Tue 22 Oct 2024 06:13:40 PM GMT +0000. [05:16:41] RECOVERY - history.sdtef.org - LetsEncrypt on sslhost is OK: OK - Certificate 'history.sdtef.org' will expire on Wed 09 Oct 2024 04:01:19 PM GMT +0000. [05:16:53] RECOVERY - wiki.jill-jimmy.com - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.jill-jimmy.com' will expire on Fri 04 Oct 2024 09:26:34 PM GMT +0000. [05:16:54] RECOVERY - wikislamica.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wikislamica.org' will expire on Sun 13 Oct 2024 11:59:16 PM GMT +0000. [05:16:54] RECOVERY - alternatewiki.tombricks.com - LetsEncrypt on sslhost is OK: OK - Certificate 'alternatewiki.tombricks.com' will expire on Thu 10 Oct 2024 04:46:42 PM GMT +0000. 
[05:16:58] RECOVERY - www.thegreatwar.uk - LetsEncrypt on sslhost is OK: OK - Certificate 'thegreatwar.uk' will expire on Wed 09 Oct 2024 07:05:29 PM GMT +0000. [05:17:25] RECOVERY - lgbtqia.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'lgbtqia.wiki' will expire on Wed 30 Oct 2024 11:12:15 PM GMT +0000. [05:17:41] RECOVERY - grayzonewarfare.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'grayzonewarfare.wiki' will expire on Tue 27 Aug 2024 09:44:27 PM GMT +0000. [05:17:42] RECOVERY - files.petrawiki.org - LetsEncrypt on sslhost is OK: OK - Certificate 'files.petrawiki.org' will expire on Sat 12 Oct 2024 12:57:58 PM GMT +0000. [05:17:50] RECOVERY - bobobay.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'bobobay.wiki' will expire on Mon 16 Sep 2024 03:23:44 PM GMT +0000. [05:17:53] RECOVERY - revi.wiki - PositiveSSLDV on sslhost is OK: OK - Certificate 'revi.wiki' will expire on Sun 29 Dec 2024 11:59:59 PM GMT +0000. [05:17:53] RECOVERY - podpedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'podpedia.org' will expire on Wed 09 Oct 2024 10:18:01 AM GMT +0000. [05:17:53] RECOVERY - issue-tracker.wikitide.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wikitide.org' will expire on Wed 30 Oct 2024 11:03:09 PM GMT +0000. [05:17:55] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:18:25] RECOVERY - kunwok.org - LetsEncrypt on sslhost is OK: OK - Certificate 'kunwok.org' will expire on Wed 09 Oct 2024 09:43:14 AM GMT +0000. [05:18:30] RECOVERY - mwcosmos.com - LetsEncrypt on sslhost is OK: OK - Certificate 'mwcosmos.com' will expire on Fri 04 Oct 2024 01:22:23 AM GMT +0000. [05:18:33] RECOVERY - familiacorsi.com - LetsEncrypt on sslhost is OK: OK - Certificate 'familiacorsi.com' will expire on Tue 22 Oct 2024 05:16:02 PM GMT +0000. 
[05:18:35] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.18, 3.68, 2.79 [05:20:29] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.97, 3.20, 2.73 [05:24:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:27:14] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.50, 4.42, 3.44 [05:29:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:33:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.66, 3.81, 3.51 [05:35:02] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.87, 3.18, 3.32 [05:36:44] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:41:55] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.79, 5.14, 4.10 [05:44:08] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:45:45] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:47:40] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.060 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:49:49] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). 
[05:51:44] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:52:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:57:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:58:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:03:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:05:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.21, 3.12, 3.74 [06:05:54] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [06:07:54] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [06:09:02] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.21, 2.25, 3.26 [06:10:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:10:57] PROBLEM - swiftac171 Current Load on swiftac171 is CRITICAL: LOAD CRITICAL - total load average: 17.61, 9.22, 4.75 [06:12:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.66, 20.22, 17.05 [06:12:55] PROBLEM - swiftac171 Current Load on swiftac171 is WARNING: LOAD WARNING - total load average: 10.80, 10.68, 5.91 [06:13:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load 
average: 3.88, 4.22, 3.91 [06:14:55] RECOVERY - swiftac171 Current Load on swiftac171 is OK: LOAD OK - total load average: 3.55, 7.91, 5.47 [06:15:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.53, 3.73, 3.77 [06:15:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:17:05] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.92, 5.07, 4.21 [06:17:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:18:25] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.44, 19.54, 17.93 [06:22:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:23:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:33:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:37:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:39:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.62, 3.31, 3.97 [06:41:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.11, 4.33, 4.27 [06:41:18] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:42:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:43:13] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is
OK: DNS OK: 0.236 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:47:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:49:28] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:52:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:53:28] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.087 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:57:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.21, 3.26, 3.81 [06:57:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:02:20] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:03:02] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.69, 2.68, 3.34 [07:04:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:05:29] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:07:03] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 9.20, 5.67, 4.37 [07:07:24]
RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.238 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:09:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:10:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:11:45] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:12:08] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:15:44] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.064 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:16:16] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [07:20:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:22:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:32:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:32:55] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:37:55] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:42:14] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:42:55] [Grafana] !tech FIRING: The mediawiki job queue has more than 500
unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:46:14] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.063 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [07:53:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.43, 3.22, 3.91 [07:55:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.68, 5.03, 4.49 [08:02:55] [Grafana] !tech RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:05:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.94, 3.18, 3.79 [08:09:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.39, 3.96, 3.95 [08:09:11] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:11:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.93, 3.55, 3.78 [08:14:11] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:15:06] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[08:17:03] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.01, 4.04, 3.85 [08:17:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:21:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.45, 3.97, 3.92 [08:22:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:23:02] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.33, 3.72, 3.81 [08:23:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:28:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:29:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.80, 3.30, 3.65 [08:29:46] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [08:29:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:31:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.13, 4.48, 4.05 [08:31:48] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [442 system event log (SEL) entries present] [08:33:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.65, 3.74, 3.81 [08:34:50] [Grafana] !tech RESOLVED: MediaWiki Exception 
Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:36:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:37:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.42, 4.10, 3.87 [08:39:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.35, 3.56, 3.70 [08:41:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:42:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:42:54] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [08:43:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.77, 3.86, 3.75 [08:47:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.78, 3.78, 3.79 [08:47:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:49:03] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.68, 4.19, 3.91 [08:49:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:53:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.54, 3.55, 3.74 [08:54:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:57:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:59:03] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.96, 4.21, 3.87 [09:02:50] [Grafana] !tech 
RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:03:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.20, 3.56, 3.71 [09:04:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:05:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.93, 4.64, 4.10 [09:09:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:10:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:15:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.33, 3.66, 3.94 [09:15:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:17:01] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.53, 4.03, 4.04 [09:17:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [09:20:19] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o