[00:02:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.70, 23.21, 22.87
[00:05:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:06:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.47, 6.55, 5.94
[00:06:58] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:23] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.064 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:08:49] PROBLEM - mwtask181 Disk Space on mwtask181 is WARNING: DISK WARNING - free space: / 24055MiB (10% inode=94%);
[00:10:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.98, 17.71, 20.32
[00:11:12] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:11:46] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:12:49] RECOVERY - mwtask181 Disk Space on mwtask181 is OK: DISK OK - free space: / 43939MiB (19% inode=94%);
[00:17:53] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.069 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:19:47] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.77, 20.23, 20.44
[00:21:43] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.25, 21.90, 21.03
[00:23:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.04, 21.95, 21.17
[00:27:34] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.56, 23.26, 21.78
[00:29:31] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.05, 23.06, 21.90
[00:38:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:42:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.18, 22.73, 23.60
[00:43:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:46:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.67, 23.18, 23.57
[00:48:58] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.08, 18.98, 20.24
[00:51:18] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:53:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:54:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.52, 23.63, 23.66
[00:56:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.86, 24.87, 24.08
[00:57:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.74, 21.24, 20.70
[00:59:36] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.13, 22.03, 21.03
[01:01:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.96, 21.13, 20.82
[01:04:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.79, 22.89, 23.74
[01:05:26] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.76, 19.15, 20.07
[01:14:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.36, 23.37, 23.23
[01:16:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.22, 22.83, 23.04
[01:26:21] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:26:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:27:00] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.21, 22.24, 22.34
[01:28:25] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[01:28:25] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:29:00] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.55, 22.17, 22.33
[01:30:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 8.952 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:30:41] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 51 packages available for upgrade (0 critical updates).
[01:32:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.27, 22.37, 22.29
[01:34:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.39, 22.37, 22.31
[01:40:29] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.11, 3.17, 3.84
[01:42:28] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.01, 4.01, 4.08
[01:46:26] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.38, 3.42, 3.85
[01:48:26] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.23, 4.64, 4.25
[01:52:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.05, 21.33, 21.39
[01:53:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:54:22] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.84, 3.69, 4.00
[01:54:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.70, 22.00, 21.63
[01:58:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:59:43] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[02:00:20] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 21.66, 19.69, 16.39
[02:01:40] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[02:02:16] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 15.02, 18.02, 16.18
[02:06:16] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.69, 2.75, 3.32
[02:10:13] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.77, 4.12, 3.78
[02:11:17] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:12:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.53, 22.29, 21.58
[02:13:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 8.216 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:18:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:22:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.50, 23.43, 23.13
[02:23:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:28:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:30:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:36:17] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.108 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:36:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.66, 24.14, 23.24
[02:38:49] PROBLEM - mwtask181 Disk Space on mwtask181 is WARNING: DISK WARNING - free space: / 23872MiB (10% inode=94%);
[02:40:13] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:40:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:42:49] RECOVERY - mwtask181 Disk Space on mwtask181 is OK: DISK OK - free space: / 43755MiB (19% inode=94%);
[02:44:20] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:44:42] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.476 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:46:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.91, 22.00, 22.97
[02:48:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:51:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.15, 2.07, 3.55
[02:53:53] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.09, 1.42, 3.13
[03:00:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 22.87, 22.78
[03:02:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.70, 22.06, 22.45
[03:04:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.78, 22.80, 22.63
[03:18:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 3.81, 5.88, 6.71
[03:19:17] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:19:42] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.83, 5.10, 3.54
[03:19:49] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:21:13] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.080 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:21:47] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[03:22:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.37, 6.79, 6.91
[03:25:39] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.98, 3.64, 3.37
[03:27:39] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.54, 3.05, 3.18
[03:33:37] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.25, 4.26, 3.62
[03:33:41] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:35:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 1.329 second response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:39:33] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.67, 3.55, 3.61
[03:43:31] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.26, 2.67, 3.25
[03:49:29] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.50, 5.05, 4.07
[03:53:00] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:55:25] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.54, 3.65, 3.80
[04:01:21] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.53, 4.13, 3.89
[04:02:30] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.33, 18.76, 15.44
[04:02:46] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[04:03:12] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.23, 20.11, 15.98
[04:03:19] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 2 backends are down. mw162 mw182
[04:03:19] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.81, 3.43, 3.66
[04:03:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 26.69, 20.11, 16.23
[04:03:49] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 22.05, 19.56, 15.67
[04:04:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.24, 22.51, 17.30
[04:04:45] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[04:04:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.23, 22.22, 23.96
[04:05:08] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.80, 18.63, 13.74
[04:05:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.38, 20.89, 16.83
[04:05:19] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[04:05:32] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 20.68, 20.12, 16.72
[04:05:48] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 18.53, 19.55, 16.16
[04:06:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 14.52, 19.11, 16.60
[04:06:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 14.73, 19.54, 16.87
[04:06:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.65, 24.47, 24.57
[04:07:08] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 9.87, 15.27, 13.10
[04:07:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 14.59, 18.71, 16.54
[04:07:18] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.27, 4.53, 4.04
[04:07:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 14.66, 17.79, 16.26
[04:07:48] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:08:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 6.64, 6.44, 6.76
[04:08:19] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:09:45] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[04:12:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 6.92, 6.83, 6.87
[04:12:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.075 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:13:15] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.45, 3.76, 3.92
[04:14:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 6.07, 6.47, 6.73
[04:17:12] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.05, 4.44, 4.12
[04:18:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:20:40] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 4.213 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:23:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:27:13] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.35, 6.68, 6.51
[04:27:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:31:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.09, 3.22, 3.89
[04:34:58] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.79, 4.90, 4.34
[04:37:08] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 5.36, 6.27, 6.55
[04:38:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.00, 20.45, 19.22
[04:38:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.17, 3.66, 3.96
[04:40:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.98, 20.10, 19.28
[04:42:52] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.34, 3.79, 3.90
[04:46:36] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:38] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:47:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:49:42] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[04:51:56] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 51 packages available for upgrade (0 critical updates).
[04:52:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:52:46] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.576 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:52:49] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[04:57:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:00:49] PROBLEM - mwtask181 Disk Space on mwtask181 is WARNING: DISK WARNING - free space: / 23941MiB (10% inode=94%);
[05:02:38] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.24, 2.99, 3.88
[05:03:22] PROBLEM - wiki.jsal.org - LetsEncrypt on sslhost is CRITICAL: No address associated with hostname HTTP CRITICAL - Unable to open TCP socket
[05:04:49] RECOVERY - mwtask181 Disk Space on mwtask181 is OK: DISK OK - free space: / 43900MiB (19% inode=94%);
[05:06:18] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.64, 22.90, 20.30
[05:06:35] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.04, 1.37, 3.01
[05:06:46] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[05:07:01] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 28.18, 19.75, 16.25
[05:07:06] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 1 backends are down. mw181
[05:07:12] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 29.17, 20.77, 16.36
[05:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:07:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 24.88, 20.05, 16.13
[05:08:30] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 24.00, 20.80, 16.76
[05:08:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.51, 21.78, 17.59
[05:08:45] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[05:10:30] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.91, 23.06, 18.11
[05:10:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.80, 21.54, 18.03
[05:10:58] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 19.95, 21.45, 17.81
[05:11:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 15.87, 19.91, 17.12
[05:11:32] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 17.09, 20.88, 17.53
[05:12:08] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.70, 23.16, 21.64
[05:12:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 14.33, 19.54, 17.41
[05:12:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.56, 18.43, 17.31
[05:12:57] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 17.39, 19.83, 17.65
[05:13:03] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[05:13:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 13.14, 18.16, 16.94
[05:17:18] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[05:18:26] PROBLEM - wiki.jsal.org - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for wiki.jsal.org could not be found
[05:20:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.05, 19.15, 20.28
[05:24:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.43, 20.98, 20.79
[05:26:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.86, 19.66, 20.32
[05:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:43:39] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[05:50:30] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o