[00:02:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.70, 23.21, 22.87
[00:05:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:06:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.47, 6.55, 5.94
[00:06:58] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:23] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.064 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:08:49] PROBLEM - mwtask181 Disk Space on mwtask181 is WARNING: DISK WARNING - free space: / 24055MiB (10% inode=94%);
[00:10:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.98, 17.71, 20.32
[00:11:12] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:11:46] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[00:12:49] RECOVERY - mwtask181 Disk Space on mwtask181 is OK: DISK OK - free space: / 43939MiB (19% inode=94%);
[00:17:53] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.069 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[00:19:47] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.77, 20.23, 20.44
[00:21:43] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.25, 21.90, 21.03
[00:23:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.04, 21.95, 21.17
[00:27:34] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.56, 23.26, 21.78
[00:29:31] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.05, 23.06, 21.90
[00:38:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:42:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.18, 22.73, 23.60
[00:43:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[00:46:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.67, 23.18, 23.57
[00:48:58] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.08, 18.98, 20.24
[00:51:18] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:53:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[00:54:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.52, 23.63, 23.66
[00:56:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.86, 24.87, 24.08
[00:57:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.74, 21.24, 20.70
[00:59:36] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.13, 22.03, 21.03
[01:01:32] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.96, 21.13, 20.82
[01:04:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.79, 22.89, 23.74
[01:05:26] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.76, 19.15, 20.07
[01:14:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.36, 23.37, 23.23
[01:16:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.22, 22.83, 23.04
[01:26:21] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:26:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[01:27:00] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.21, 22.24, 22.34
[01:28:25] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[01:28:25] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[01:29:00] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.55, 22.17, 22.33
[01:30:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 8.952 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[01:30:41] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 51 packages available for upgrade (0 critical updates).
[01:32:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.27, 22.37, 22.29
[01:34:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.39, 22.37, 22.31
[01:40:29] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.11, 3.17, 3.84
[01:42:28] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.01, 4.01, 4.08
[01:46:26] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.38, 3.42, 3.85
[01:48:26] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.23, 4.64, 4.25
[01:52:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.05, 21.33, 21.39
[01:53:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:54:22] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.84, 3.69, 4.00
[01:54:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.70, 22.00, 21.63
[01:58:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[01:59:43] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[02:00:20] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 21.66, 19.69, 16.39
[02:01:40] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[02:02:16] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 15.02, 18.02, 16.18
[02:06:16] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.69, 2.75, 3.32
[02:10:13] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.77, 4.12, 3.78
[02:11:17] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:12:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.53, 22.29, 21.58
[02:13:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 8.216 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:18:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:22:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.50, 23.43, 23.13
[02:23:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:28:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:30:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:36:17] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.108 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:36:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.66, 24.14, 23.24
[02:38:49] PROBLEM - mwtask181 Disk Space on mwtask181 is WARNING: DISK WARNING - free space: / 23872MiB (10% inode=94%);
[02:40:13] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:40:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[02:42:49] RECOVERY - mwtask181 Disk Space on mwtask181 is OK: DISK OK - free space: / 43755MiB (19% inode=94%);
[02:44:20] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[02:44:42] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.476 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[02:46:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.91, 22.00, 22.97
[02:48:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[02:51:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.15, 2.07, 3.55
[02:53:53] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.09, 1.42, 3.13
[03:00:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.09, 22.87, 22.78
[03:02:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.70, 22.06, 22.45
[03:04:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.78, 22.80, 22.63
[03:18:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 3.81, 5.88, 6.71
[03:19:17] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:19:42] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.83, 5.10, 3.54
[03:19:49] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:21:13] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.080 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:21:47] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[03:22:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.37, 6.79, 6.91
[03:25:39] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.98, 3.64, 3.37
[03:27:39] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.54, 3.05, 3.18
[03:33:37] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.25, 4.26, 3.62
[03:33:41] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[03:35:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 1.329 second response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[03:39:33] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.67, 3.55, 3.61
[03:43:31] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.26, 2.67, 3.25
[03:49:29] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.50, 5.05, 4.07
[03:53:00] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[03:55:25] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.54, 3.65, 3.80
[04:01:21] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.53, 4.13, 3.89
[04:02:30] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.33, 18.76, 15.44
[04:02:46] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[04:03:12] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.23, 20.11, 15.98
[04:03:19] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 2 backends are down. mw162 mw182
[04:03:19] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.81, 3.43, 3.66
[04:03:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 26.69, 20.11, 16.23
[04:03:49] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 22.05, 19.56, 15.67
[04:04:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.24, 22.51, 17.30
[04:04:45] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[04:04:59] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.23, 22.22, 23.96
[04:05:08] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.80, 18.63, 13.74
[04:05:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.38, 20.89, 16.83
[04:05:19] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[04:05:32] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 20.68, 20.12, 16.72
[04:05:48] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 18.53, 19.55, 16.16
[04:06:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 14.52, 19.11, 16.60
[04:06:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 14.73, 19.54, 16.87
[04:06:59] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.65, 24.47, 24.57
[04:07:08] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 9.87, 15.27, 13.10
[04:07:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 14.59, 18.71, 16.54
[04:07:18] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.27, 4.53, 4.04
[04:07:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 14.66, 17.79, 16.26
[04:07:48] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:08:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 6.64, 6.44, 6.76
[04:08:19] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:09:45] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[04:12:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 6.92, 6.83, 6.87
[04:12:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.075 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:13:15] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.45, 3.76, 3.92
[04:14:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 6.07, 6.47, 6.73
[04:17:12] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.05, 4.44, 4.12
[04:18:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:20:40] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 4.213 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:23:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:27:13] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.35, 6.68, 6.51
[04:27:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:31:01] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.09, 3.22, 3.89
[04:34:58] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.79, 4.90, 4.34
[04:37:08] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 5.36, 6.27, 6.55
[04:38:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.00, 20.45, 19.22
[04:38:55] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.17, 3.66, 3.96
[04:40:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.98, 20.10, 19.28
[04:42:52] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.34, 3.79, 3.90
[04:46:36] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:46:38] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:47:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:49:42] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[04:51:56] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 51 packages available for upgrade (0 critical updates).
[04:52:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[04:52:46] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.576 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206
[04:52:49] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
[04:57:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:00:49] PROBLEM - mwtask181 Disk Space on mwtask181 is WARNING: DISK WARNING - free space: / 23941MiB (10% inode=94%);
[05:02:38] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.24, 2.99, 3.88
[05:03:22] PROBLEM - wiki.jsal.org - LetsEncrypt on sslhost is CRITICAL: No address associated with hostname HTTP CRITICAL - Unable to open TCP socket
[05:04:49] RECOVERY - mwtask181 Disk Space on mwtask181 is OK: DISK OK - free space: / 43900MiB (19% inode=94%);
[05:06:18] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.64, 22.90, 20.30
[05:06:35] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.04, 1.37, 3.01
[05:06:46] PROBLEM - cp26 Varnish Backends on cp26 is CRITICAL: 1 backends are down. mw181
[05:07:01] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 28.18, 19.75, 16.25
[05:07:06] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 1 backends are down. mw181
[05:07:12] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 29.17, 20.77, 16.36
[05:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:07:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 24.88, 20.05, 16.13
[05:08:30] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 24.00, 20.80, 16.76
[05:08:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.51, 21.78, 17.59
[05:08:45] RECOVERY - cp26 Varnish Backends on cp26 is OK: All 19 backends are healthy
[05:10:30] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.91, 23.06, 18.11
[05:10:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.80, 21.54, 18.03
[05:10:58] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 19.95, 21.45, 17.81
[05:11:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 15.87, 19.91, 17.12
[05:11:32] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 17.09, 20.88, 17.53
[05:12:08] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.70, 23.16, 21.64
[05:12:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 14.33, 19.54, 17.41
[05:12:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.56, 18.43, 17.31
[05:12:57] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 17.39, 19.83, 17.65
[05:13:03] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[05:13:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 13.14, 18.16, 16.94
[05:17:18] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[05:18:26] PROBLEM - wiki.jsal.org - reverse DNS on sslhost is WARNING: rDNS WARNING - reverse DNS entry for wiki.jsal.org could not be found
[05:20:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.05, 19.15, 20.28
[05:24:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.43, 20.98, 20.79
[05:26:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.86, 19.66, 20.32
[05:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[05:43:39] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[05:50:30] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ]
[05:54:41] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.001588791609 secs
[05:59:12] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ]
[06:01:22] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:02:08] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.00, 22.17, 19.43
[06:02:30] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 29.06, 21.70, 17.02
[06:02:41] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 28.84, 21.26, 16.74
[06:02:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 29.59, 22.58, 17.80
[06:03:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 21.43, 21.09, 17.26
[06:03:23] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.0004749000072 secs
[06:03:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 25.42, 21.84, 17.31
[06:04:05] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.11, 21.60, 19.58
[06:04:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 18.82, 20.30, 17.08
[06:04:41] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.62, 21.08, 17.24
[06:04:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.21, 21.60, 18.05
[06:05:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 17.06, 19.54, 17.15
[06:05:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 16.28, 20.07, 17.25
[06:06:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.77, 20.28, 19.37
[06:06:41] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 14.47, 18.31, 16.67
[06:08:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.80, 18.19, 17.49
[06:10:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.21, 20.75, 19.96
[06:12:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.69, 20.30, 19.89
[06:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[06:46:27] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[06:48:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.10, 20.77, 19.86
[06:52:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.53, 18.91, 19.34
[06:58:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.06, 20.48, 19.85
[07:00:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.10, 22.59, 20.66
[07:01:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.06, 18.78, 15.99
[07:01:19] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 28.71, 22.60, 18.28
[07:01:19] PROBLEM - cp51 Varnish Backends on cp51 is CRITICAL: 4 backends are down. mw152 mw161 mw181 mw182
[07:01:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 29.08, 20.98, 16.79
[07:02:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.66, 23.59, 21.31
[07:02:41] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.78, 22.60, 17.74
[07:02:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.84, 22.69, 17.89
[07:03:13] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 19.33, 22.03, 18.64
[07:03:19] RECOVERY - cp51 Varnish Backends on cp51 is OK: All 19 backends are healthy
[07:03:32] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 19.01, 21.03, 17.39
[07:04:41] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 20.23, 20.91, 17.68
[07:04:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.29, 21.88, 18.15
[07:05:07] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 17.66, 20.34, 18.40
[07:05:13] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 19.10, 19.50, 17.05
[07:05:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 15.06, 19.42, 17.27
[07:05:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.61, 3.49, 1.60
[07:06:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.15, 23.30, 21.62
[07:06:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 12.27, 18.85, 17.55
[07:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[07:07:46] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.54, 2.36, 1.41
[07:08:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.30, 21.92, 21.36
[07:08:41] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 15.30, 18.57, 17.56
[07:09:08] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ]
[07:11:08] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0005466341972 secs
[07:15:40] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[07:18:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.15, 22.88, 21.50
[07:20:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.31, 20.95, 20.99
[07:22:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1
[07:26:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 21.34, 20.98
[07:28:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.11, 21.76, 21.22
[07:32:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 15.39, 18.44, 19.98
[07:46:23] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[08:00:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.73, 23.41, 20.25
[08:00:14] PROBLEM - cp36 Varnish Backends on cp36 is CRITICAL: 1 backends are down. mw181
[08:00:30] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.33, 21.73, 17.91
[08:00:35] PROBLEM - cp41 Varnish Backends on cp41 is CRITICAL: 1 backends are down. mw182
[08:00:41] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.05, 21.31, 17.20
[08:00:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.32, 20.76, 17.14
[08:01:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 19.54, 21.21, 18.14
[08:02:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.83, 21.80, 20.05
[08:02:14] RECOVERY - cp36 Varnish Backends on cp36 is OK: All 19 backends are healthy
[08:02:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 16.91, 19.51, 17.56
[08:02:35] RECOVERY - cp41 Varnish Backends on cp41 is OK: All 19 backends are healthy
[08:02:41] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 17.80, 20.25, 17.36
[08:02:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 20.20, 19.70, 17.15
[08:04:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.77, 23.78, 20.99
[08:05:12] PROBLEM - mw162 Current Load on mw162 is
CRITICAL: LOAD CRITICAL - total load average: 25.27, 22.34, 19.16 [08:06:30] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 16.65, 20.61, 18.67 [08:07:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 15.10, 19.44, 18.49 [08:08:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.33, 22.23, 21.10 [08:08:17] PROBLEM - os162 Current Load on os162 is WARNING: LOAD WARNING - total load average: 7.30, 6.45, 5.85 [08:08:30] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 13.99, 18.64, 18.20 [08:10:17] PROBLEM - os162 Current Load on os162 is CRITICAL: LOAD CRITICAL - total load average: 8.06, 6.95, 6.11 [08:12:17] RECOVERY - os162 Current Load on os162 is OK: LOAD OK - total load average: 6.45, 6.66, 6.10 [08:14:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.47, 23.39, 21.75 [08:18:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.17, 23.94, 22.47 [08:22:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [08:34:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.79, 19.20, 20.37 [08:38:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.86, 19.97, 20.46 [08:46:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.07, 23.28, 21.61 [08:48:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.10, 21.71, 21.25 [08:50:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.76, 23.32, 21.87 [08:52:05] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 29.06, 22.50, 18.43 [08:52:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 29.14, 23.35, 18.80 [08:52:54] PROBLEM - cp37 Varnish Backends on cp37 is CRITICAL: 3 backends are down. mw172 mw181 mw182 [08:53:08] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.17, 17.90, 13.75 [08:53:12] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 24.56, 19.81, 16.75 [08:53:13] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 26.85, 21.93, 18.28 [08:53:32] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 28.99, 23.11, 18.52 [08:53:33] PROBLEM - cp27 Varnish Backends on cp27 is CRITICAL: 1 backends are down. 
mw181 [08:54:54] RECOVERY - cp37 Varnish Backends on cp37 is OK: All 19 backends are healthy [08:55:07] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.14, 21.17, 18.45 [08:55:08] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 14.04, 16.53, 13.77 [08:55:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 18.57, 19.72, 17.13 [08:55:32] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 19.71, 21.97, 18.69 [08:56:02] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 19.36, 22.07, 19.30 [08:56:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.82, 22.95, 19.82 [08:57:30] RECOVERY - cp27 Varnish Backends on cp27 is OK: All 19 backends are healthy [08:57:32] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 18.48, 20.29, 18.45 [08:58:09] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.84, 22.97, 19.93 [08:58:56] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 29.55, 23.18, 19.68 [09:00:07] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 15.57, 21.49, 19.88 [09:00:50] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 15.90, 20.19, 19.00 [09:02:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.44, 23.29, 23.32 [09:02:06] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 14.16, 19.07, 19.20 [09:02:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 15.72, 18.97, 19.22 [09:04:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 24.82, 23.86 [09:08:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.11, 22.94, 23.38 [09:08:43] 
PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.25, 3.79, 1.87 [09:10:41] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.17, 3.39, 1.95 [09:11:48] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [09:12:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.87, 24.13, 23.65 [09:13:31] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.49, 19.09, 18.10 [09:13:48] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0009436905384 secs [09:15:25] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 24.50, 21.32, 19.03 [09:17:19] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 15.14, 18.95, 18.44 [09:26:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.78, 23.13, 23.98 [09:28:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.95, 24.37, 24.31 [09:30:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.72, 23.14, 23.89 [09:34:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.50, 22.15, 23.13 [09:34:22] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.14, 5.30, 3.58 [09:35:31] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.83, 20.71, 18.81 [09:35:52] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 27.82, 21.03, 18.59 [09:36:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.63, 22.85, 23.30 [09:37:25] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total
load average: 21.96, 21.12, 19.20 [09:37:51] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.41, 21.05, 18.90 [09:39:15] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:39:50] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 17.38, 19.35, 18.52 [09:41:14] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 18.14, 19.81, 19.13 [09:42:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.19, 23.28, 23.40 [09:43:28] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [09:43:46] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 27.17, 21.38, 18.85 [09:44:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.07, 23.89, 23.66 [09:44:51] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.60, 21.31, 19.40 [09:45:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 19.03, 21.45, 20.09 [09:45:40] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 21.05, 21.12, 19.07 [09:45:44] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 26.08, 22.37, 19.96 [09:46:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.94, 24.21, 23.77 [09:46:14] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.62, 3.74, 3.70 [09:46:47] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 16.90, 20.12, 19.22 [09:47:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 17.21, 20.35, 19.88 [09:47:26] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[09:47:34] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 19.26, 20.30, 19.01 [09:47:42] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.88, 21.85, 20.06 [09:50:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.77, 23.99, 23.83 [09:50:12] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.03, 4.32, 3.88 [09:51:39] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 18.69, 19.59, 19.50 [09:52:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.78, 24.10, 23.88 [09:52:12] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.41, 3.53, 3.65 [09:54:03] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 28.86, 23.30, 20.36 [09:54:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.45, 22.47, 20.23 [09:55:11] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.89, 22.65, 20.19 [09:55:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 19.66, 20.73, 20.22 [09:55:59] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 22.36, 22.44, 20.39 [09:56:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.08, 21.58, 20.19 [09:57:05] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 14.69, 19.80, 19.46 [09:57:54] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 17.20, 20.08, 19.74 [09:58:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.27, 3.68, 3.64 [09:58:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.66, 23.18, 20.92 [10:00:06] 
PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.82, 3.08, 3.41 [10:00:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.82, 22.57, 20.95 [10:01:12] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 25.90, 22.36, 20.95 [10:02:04] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.39, 2.73, 3.26 [10:02:44] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.29, 23.33, 21.39 [10:03:43] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 22.99, 21.65, 20.52 [10:03:47] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 19.68, 21.63, 20.39 [10:04:44] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 17.78, 21.82, 21.13 [10:05:12] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.62, 22.56, 21.44 [10:05:38] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 16.79, 20.18, 20.14 [10:05:41] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 16.75, 19.67, 19.80 [10:08:44] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 18.16, 19.44, 20.28 [10:09:12] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 16.05, 19.20, 20.30 [10:10:03] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.79, 22.70, 23.95 [10:12:03] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 21.59, 23.22, 24.03 [10:17:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:18:19] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [10:19:52] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.02, 5.16, 3.89 [10:22:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:23:05] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [10:25:06] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0007360875607 secs [10:25:48] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.81, 3.97, 3.81 [10:27:46] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.33, 4.56, 4.05 [10:29:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.26, 3.87, 3.85 [10:30:01] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 20.58, 20.58, 19.98 [10:33:48] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.01, 4.74, 4.12 [10:33:49] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 11.67, 17.71, 19.16 [10:35:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.95, 3.77, 3.83 [10:39:31] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total
load average: 26.71, 21.59, 20.15 [10:39:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.26, 4.95, 4.26 [10:40:16]