[00:04:29] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:04:29] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.13, 21.35, 19.75 [00:04:36] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:06:27] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.67, 22.17, 20.20 [00:06:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.23, 20.52, 18.45 [00:06:49] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:06:52] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:08:24] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.23, 23.03, 20.79 [00:08:38] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.42, 21.58, 19.06 [00:10:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.13, 21.41, 19.28 [00:11:42] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). 
[00:11:45] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 26 minutes ago with 0 failures [00:12:48] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [00:13:02] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.077 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [00:16:14] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.64, 20.16, 20.28 [00:16:39] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.52, 23.53, 20.82 [00:17:29] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:19:28] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.068 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [00:20:06] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.33, 21.89, 21.06 [00:20:39] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.99, 23.67, 21.58 [00:24:39] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.17, 24.39, 22.28 [00:25:58] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.16, 24.45, 22.32 [00:45:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.26, 22.81, 23.78 [00:46:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.47, 22.52, 23.61 [00:47:29] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.73, 3.25, 3.88 [00:49:28] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.05, 3.79, 3.98 
[00:52:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:53:25] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.98, 3.77, 3.95 [00:56:38] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.97, 22.42, 22.57 [00:57:26] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.15, 4.98, 4.29 [00:57:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.32, 22.19, 22.29 [00:59:24] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.17, 3.98, 4.00 [01:02:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.83, 21.91, 22.36 [01:05:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.45, 23.29, 23.08 [01:07:17] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.55, 2.50, 3.37 [01:12:38] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 16.72, 18.21, 20.19 [01:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.03, 23.60, 23.30 [01:15:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.07, 22.79, 23.00 [01:17:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:19:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.02, 23.49, 23.21 [01:21:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.94, 21.69, 22.57 [01:24:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:29:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.16, 24.09, 23.03 [01:31:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.79, 23.33, 22.90 [01:36:43] PROBLEM - aryavartpedia.online - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'aryavartpedia.online' expires in 15 day(s) (Sun 25 Aug 2024 01:07:41 AM GMT +0000). [01:36:54] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/8877c2a7cd6d...d3204826561a [01:36:56] [ssl] WikiTideSSLBot d320482 - Bot: Update SSL cert for aryavartpedia.online [01:37:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.31, 22.78, 22.64 [01:45:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.86, 23.57, 23.25 [01:50:45] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.78, 20.43, 19.72 [01:51:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.31, 23.70, 23.44 [01:52:41] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 17.44, 19.72, 19.58 [01:53:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.43, 22.96, 23.24 [02:06:32] RECOVERY - aryavartpedia.online - LetsEncrypt on sslhost is OK: OK - Certificate 'aryavartpedia.online' will expire on Thu 07 Nov 2024 12:38:18 AM GMT +0000. 
[02:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.80, 23.11, 22.38 [02:14:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:15:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.14, 22.74, 22.32 [02:19:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.56, 23.97, 22.81 [02:21:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.08, 23.90, 22.97 [02:23:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.49, 25.09, 23.53 [02:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:41:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 14.79, 20.46, 23.06 [02:49:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.54, 17.56, 20.39 [03:04:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 14.96, 19.34, 22.83 [03:04:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.43, 3.99, 1.65 [03:05:30] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [03:06:01] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [03:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:07:32] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). 
[03:07:58] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures [03:10:35] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 11.90, 14.22, 19.39 [03:11:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:12:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:13:54] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.365 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:23:30] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:27:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.091 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:32:34] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:33] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [03:55:18] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.83, 22.33, 18.01 [03:58:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.63, 3.09, 3.78 [03:59:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.69, 21.89, 15.98 [04:04:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.40, 4.36, 3.97 [04:06:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.33, 3.80, 3.84 [04:10:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.10, 4.67, 4.11 [04:11:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 15.06, 22.74, 20.70 [04:12:07] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.12, 24.07, 21.46 [04:15:34] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.79, 22.03, 18.47 [04:17:30] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.04, 20.59, 18.34 [04:18:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [04:18:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.25, 3.54, 3.87 [04:19:26] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 15.96, 
18.86, 17.97 [04:19:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.47, 23.73, 22.31 [04:20:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.17, 4.53, 4.19 [04:21:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.86, 25.59, 23.18 [04:25:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.67, 23.59, 22.92 [04:27:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:28:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.97, 3.51, 3.97 [04:29:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.04, 25.08, 23.60 [04:33:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.15, 22.73, 23.11 [04:34:48] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.13, 3.36, 3.68 [04:35:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.37, 24.01, 23.55 [04:36:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.89, 3.05, 3.53 [04:37:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.11, 23.47, 23.41 [04:38:48] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.98, 2.65, 3.33 [04:42:48] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.62, 4.66, 4.05 [04:47:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:48:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.54, 3.59, 3.84 [04:50:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.25, 4.38, 4.11 [04:51:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.52, 22.04, 22.35 [04:52:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.56, 3.76, 3.92 [04:54:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.94, 4.60, 4.21 [04:55:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.35, 20.35, 21.65 [04:58:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.47, 3.37, 3.84 [05:00:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.89, 3.99, 4.00 [05:02:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.61, 3.23, 3.71 [05:04:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.74, 4.19, 4.01 [05:05:13] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:07:24] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:07:30] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. 
[05:07:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.94, 18.56, 20.40 [05:09:20] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 23 minutes ago with 0 failures [05:09:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.074 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:09:26] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). [05:12:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.25, 3.02, 3.88 [05:16:46] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.25, 1.50, 3.05 [05:17:34] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_dns] [05:22:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:23:23] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:30] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.000176936388 secs [05:45:47] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:15:49] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[06:26:08] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.54, 20.29, 18.51 [06:30:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.41, 19.83, 18.81 [06:42:50] PROBLEM - crustypedia.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'crustypedia.org' expires in 15 day(s) (Sun 25 Aug 2024 06:28:13 AM GMT +0000). [06:43:02] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/d3204826561a...a5d71e37affe [06:43:04] [ssl] WikiTideSSLBot a5d71e3 - Bot: Update SSL cert for crustypedia.org [06:44:46] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:54:58] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:56] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.00133985281 secs [07:11:40] RECOVERY - crustypedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'crustypedia.org' will expire on Thu 07 Nov 2024 05:44:26 AM GMT +0000. [07:42:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:54:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.63, 22.26, 23.62 [07:56:35] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.33, 23.33, 23.85 [08:40:52] PROBLEM - os162 APT on os162 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [08:42:30] PROBLEM - kafka181 APT on kafka181 is CRITICAL: APT CRITICAL: 38 packages available for upgrade (1 critical updates). [08:42:34] PROBLEM - bast181 APT on bast181 is CRITICAL: APT CRITICAL: 53 packages available for upgrade (1 critical updates). [08:43:15] PROBLEM - db161 APT on db161 is CRITICAL: APT CRITICAL: 59 packages available for upgrade (1 critical updates). 
[08:43:18] PROBLEM - mw151 APT on mw151 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:43:19] PROBLEM - mw171 APT on mw171 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:02] PROBLEM - mw162 APT on mw162 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:05] PROBLEM - mw161 APT on mw161 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:06] PROBLEM - mw172 APT on mw172 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:07] PROBLEM - cloud18 APT on cloud18 is CRITICAL: APT CRITICAL: 103 packages available for upgrade (1 critical updates). [08:44:29] PROBLEM - mw182 APT on mw182 is CRITICAL: APT CRITICAL: 66 packages available for upgrade (1 critical updates). [08:44:34] PROBLEM - cp36 APT on cp36 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:44:35] PROBLEM - graylog161 APT on graylog161 is CRITICAL: APT CRITICAL: 56 packages available for upgrade (1 critical updates). [08:44:38] PROBLEM - db171 APT on db171 is CRITICAL: APT CRITICAL: 59 packages available for upgrade (1 critical updates). [08:44:56] PROBLEM - mw181 APT on mw181 is CRITICAL: APT CRITICAL: 66 packages available for upgrade (1 critical updates). [08:45:00] PROBLEM - eventgate181 APT on eventgate181 is CRITICAL: APT CRITICAL: 45 packages available for upgrade (1 critical updates). [08:45:05] PROBLEM - swiftobject181 APT on swiftobject181 is CRITICAL: APT CRITICAL: 53 packages available for upgrade (1 critical updates). [08:45:16] PROBLEM - swiftobject171 APT on swiftobject171 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:45:38] PROBLEM - cloud15 APT on cloud15 is CRITICAL: APT CRITICAL: 110 packages available for upgrade (1 critical updates). 
[08:45:46] PROBLEM - cp26 APT on cp26 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:45:56] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [08:46:14] PROBLEM - mw152 APT on mw152 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:46:22] PROBLEM - cloud17 APT on cloud17 is CRITICAL: APT CRITICAL: 102 packages available for upgrade (1 critical updates). [08:46:25] PROBLEM - mwtask171 APT on mwtask171 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [08:46:36] PROBLEM - mwtask181 APT on mwtask181 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [08:46:39] PROBLEM - cp41 APT on cp41 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:46:56] PROBLEM - cp51 APT on cp51 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:47:14] PROBLEM - cp37 APT on cp37 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:47:18] PROBLEM - mem151 APT on mem151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:47:55] PROBLEM - db151 APT on db151 is CRITICAL: APT CRITICAL: 60 packages available for upgrade (1 critical updates). [08:47:58] PROBLEM - os151 APT on os151 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [08:48:07] PROBLEM - swiftproxy161 APT on swiftproxy161 is CRITICAL: APT CRITICAL: 41 packages available for upgrade (1 critical updates). [08:48:08] PROBLEM - jobchron171 APT on jobchron171 is CRITICAL: APT CRITICAL: 62 packages available for upgrade (1 critical updates). [08:48:32] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). 
[08:48:40] PROBLEM - swiftproxy171 APT on swiftproxy171 is CRITICAL: APT CRITICAL: 55 packages available for upgrade (1 critical updates). [08:49:03] PROBLEM - mon181 APT on mon181 is CRITICAL: APT CRITICAL: 58 packages available for upgrade (1 critical updates). [08:49:17] PROBLEM - db181 APT on db181 is CRITICAL: APT CRITICAL: 60 packages available for upgrade (1 critical updates). [08:50:07] PROBLEM - swiftobject151 APT on swiftobject151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:50:23] PROBLEM - ns1 APT on ns1 is CRITICAL: APT CRITICAL: 51 packages available for upgrade (1 critical updates). [08:50:23] PROBLEM - db182 APT on db182 is CRITICAL: APT CRITICAL: 60 packages available for upgrade (1 critical updates). [08:50:40] PROBLEM - ldap171 APT on ldap171 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:50:45] PROBLEM - rdb151 APT on rdb151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:51:05] PROBLEM - os161 APT on os161 is CRITICAL: APT CRITICAL: 53 packages available for upgrade (1 critical updates). [08:51:35] PROBLEM - reports171 APT on reports171 is CRITICAL: APT CRITICAL: 64 packages available for upgrade (1 critical updates). [08:51:59] PROBLEM - mem161 APT on mem161 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:52:11] PROBLEM - cp27 APT on cp27 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:52:24] PROBLEM - graphite151 APT on graphite151 is CRITICAL: APT CRITICAL: 40 packages available for upgrade (1 critical updates). [08:52:27] PROBLEM - phorge171 APT on phorge171 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [08:53:56] PROBLEM - swiftac171 APT on swiftac171 is CRITICAL: APT CRITICAL: 41 packages available for upgrade (1 critical updates). 
[08:54:11] PROBLEM - swiftobject161 APT on swiftobject161 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:54:16] PROBLEM - cloud16 APT on cloud16 is CRITICAL: APT CRITICAL: 102 packages available for upgrade (1 critical updates). [08:54:32] PROBLEM - matomo151 APT on matomo151 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [08:54:32] PROBLEM - puppet181 APT on puppet181 is CRITICAL: APT CRITICAL: 61 packages available for upgrade (3 critical updates). [08:55:40] PROBLEM - bast161 APT on bast161 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:55:48] PROBLEM - test151 APT on test151 is CRITICAL: APT CRITICAL: 75 packages available for upgrade (1 critical updates). [08:57:05] PROBLEM - bots171 APT on bots171 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [09:15:50] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:58:27] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.38, 21.29, 18.92 [11:02:22] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.55, 20.30, 19.11 [11:10:06] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.00, 21.02, 20.04 [11:15:58] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.32, 21.88, 20.68 [11:17:55] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.34, 21.18, 20.59 [11:21:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.74, 19.34, 20.03 [11:28:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [11:29:29] 
PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 9.22, 5.65, 3.72 [11:29:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.53, 21.53, 20.73 [11:30:21] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:30:54] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [11:31:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.78, 20.24, 20.34 [11:32:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [11:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:32:50] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures [11:33:08] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 5.893 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [11:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:37:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.84, 20.74, 20.45 [11:39:24] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.97, 3.87, 3.99 [11:39:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.56, 19.67, 20.13 [11:41:22] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.78, 5.15, 4.46 [11:43:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.76, 21.32, 20.73 [11:45:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.56, 20.12, 20.38 [11:46:49] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[11:55:16] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.05, 2.85, 3.87 [11:56:48] PROBLEM - db151 Backups SQL on db151 is CRITICAL: FILE_AGE CRITICAL: /var/log/sql-backup.log is 1209610 seconds old and 141321 bytes [11:59:13] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.08, 2.16, 3.37 [12:07:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:08:05] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.01, 3.51, 3.51 [12:10:05] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.64, 3.10, 3.36 [12:10:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.75, 19.47, 23.65 [12:17:31] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:18:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 2.56, 4.34, 3.89 [12:24:02] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o