[00:04:29] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:04:29] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.13, 21.35, 19.75 [00:04:36] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:06:27] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.67, 22.17, 20.20 [00:06:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.23, 20.52, 18.45 [00:06:49] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:06:52] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:08:24] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.23, 23.03, 20.79 [00:08:38] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.42, 21.58, 19.06 [00:10:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.13, 21.41, 19.28 [00:11:42] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[00:11:45] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 26 minutes ago with 0 failures [00:12:48] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [00:13:02] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.077 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [00:16:14] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.64, 20.16, 20.28 [00:16:39] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.52, 23.53, 20.82 [00:17:29] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:19:28] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.068 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [00:20:06] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.33, 21.89, 21.06 [00:20:39] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.99, 23.67, 21.58 [00:24:39] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.17, 24.39, 22.28 [00:25:58] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.16, 24.45, 22.32 [00:45:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.26, 22.81, 23.78 [00:46:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.47, 22.52, 23.61 [00:47:29] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.73, 3.25, 3.88 [00:49:28] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.05, 3.79, 3.98 
[00:52:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:53:25] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.98, 3.77, 3.95 [00:56:38] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.97, 22.42, 22.57 [00:57:26] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.15, 4.98, 4.29 [00:57:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.32, 22.19, 22.29 [00:59:24] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.17, 3.98, 4.00 [01:02:38] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.83, 21.91, 22.36 [01:05:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.45, 23.29, 23.08 [01:07:17] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.55, 2.50, 3.37 [01:12:38] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 16.72, 18.21, 20.19 [01:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.03, 23.60, 23.30 [01:15:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.07, 22.79, 23.00 [01:17:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:19:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.02, 23.49, 23.21 [01:21:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.94, 21.69, 22.57 [01:24:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers.
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:29:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.16, 24.09, 23.03 [01:31:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.79, 23.33, 22.90 [01:36:43] PROBLEM - aryavartpedia.online - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'aryavartpedia.online' expires in 15 day(s) (Sun 25 Aug 2024 01:07:41 AM GMT +0000). [01:36:54] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/8877c2a7cd6d...d3204826561a [01:36:56] [ssl] WikiTideSSLBot d320482 - Bot: Update SSL cert for aryavartpedia.online [01:37:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.31, 22.78, 22.64 [01:45:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.86, 23.57, 23.25 [01:50:45] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.78, 20.43, 19.72 [01:51:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.31, 23.70, 23.44 [01:52:41] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 17.44, 19.72, 19.58 [01:53:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.43, 22.96, 23.24 [02:06:32] RECOVERY - aryavartpedia.online - LetsEncrypt on sslhost is OK: OK - Certificate 'aryavartpedia.online' will expire on Thu 07 Nov 2024 12:38:18 AM GMT +0000.
[02:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.80, 23.11, 22.38 [02:14:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:15:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.14, 22.74, 22.32 [02:19:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.56, 23.97, 22.81 [02:21:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.08, 23.90, 22.97 [02:23:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.49, 25.09, 23.53 [02:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:41:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 14.79, 20.46, 23.06 [02:49:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.54, 17.56, 20.39 [03:04:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 14.96, 19.34, 22.83 [03:04:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.43, 3.99, 1.65 [03:05:30] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [03:06:01] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [03:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:07:32] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates).
[03:07:58] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures [03:10:35] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 11.90, 14.22, 19.39 [03:11:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:12:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:13:54] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.365 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:23:30] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:27:38] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.091 seconds response time.
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:32:34] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:34:33] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [03:55:18] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 28.83, 22.33, 18.01 [03:58:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.63, 3.09, 3.78 [03:59:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.69, 21.89, 15.98 [04:04:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.40, 4.36, 3.97 [04:06:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.33, 3.80, 3.84 [04:10:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.10, 4.67, 4.11 [04:11:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 15.06, 22.74, 20.70 [04:12:07] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.12, 24.07, 21.46 [04:15:34] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.79, 22.03, 18.47 [04:17:30] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.04, 20.59, 18.34 [04:18:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [04:18:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.25, 3.54, 3.87 [04:19:26] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 15.96, 
18.86, 17.97 [04:19:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.47, 23.73, 22.31 [04:20:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.17, 4.53, 4.19 [04:21:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.86, 25.59, 23.18 [04:25:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.67, 23.59, 22.92 [04:27:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:28:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.97, 3.51, 3.97 [04:29:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.04, 25.08, 23.60 [04:33:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.15, 22.73, 23.11 [04:34:48] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.13, 3.36, 3.68 [04:35:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.37, 24.01, 23.55 [04:36:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.89, 3.05, 3.53 [04:37:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.11, 23.47, 23.41 [04:38:48] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.98, 2.65, 3.33 [04:42:48] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.62, 4.66, 4.05 [04:47:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:48:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.54, 3.59, 3.84 [04:50:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.25, 4.38, 4.11 [04:51:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.52, 22.04, 22.35 [04:52:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.56, 3.76, 3.92 [04:54:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.94, 4.60, 4.21 [04:55:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.35, 20.35, 21.65 [04:58:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.47, 3.37, 3.84 [05:00:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.89, 3.99, 4.00 [05:02:47] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.61, 3.23, 3.71 [05:04:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.74, 4.19, 4.01 [05:05:13] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:07:24] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:07:30] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[05:07:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.94, 18.56, 20.40 [05:09:20] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 23 minutes ago with 0 failures [05:09:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.074 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:09:26] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 52 packages available for upgrade (0 critical updates). [05:12:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.25, 3.02, 3.88 [05:16:46] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.25, 1.50, 3.05 [05:17:34] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_dns] [05:22:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:23:23] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:27:30] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.000176936388 secs [05:45:47] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:15:49] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[06:26:08] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.54, 20.29, 18.51 [06:30:03] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.41, 19.83, 18.81 [06:42:50] PROBLEM - crustypedia.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'crustypedia.org' expires in 15 day(s) (Sun 25 Aug 2024 06:28:13 AM GMT +0000). [06:43:02] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/d3204826561a...a5d71e37affe [06:43:04] [ssl] WikiTideSSLBot a5d71e3 - Bot: Update SSL cert for crustypedia.org [06:44:46] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:54:58] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:56] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.00133985281 secs [07:11:40] RECOVERY - crustypedia.org - LetsEncrypt on sslhost is OK: OK - Certificate 'crustypedia.org' will expire on Thu 07 Nov 2024 05:44:26 AM GMT +0000. [07:42:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:54:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.63, 22.26, 23.62 [07:56:35] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.33, 23.33, 23.85 [08:40:52] PROBLEM - os162 APT on os162 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [08:42:30] PROBLEM - kafka181 APT on kafka181 is CRITICAL: APT CRITICAL: 38 packages available for upgrade (1 critical updates). [08:42:34] PROBLEM - bast181 APT on bast181 is CRITICAL: APT CRITICAL: 53 packages available for upgrade (1 critical updates). [08:43:15] PROBLEM - db161 APT on db161 is CRITICAL: APT CRITICAL: 59 packages available for upgrade (1 critical updates).
[08:43:18] PROBLEM - mw151 APT on mw151 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:43:19] PROBLEM - mw171 APT on mw171 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:02] PROBLEM - mw162 APT on mw162 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:05] PROBLEM - mw161 APT on mw161 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:06] PROBLEM - mw172 APT on mw172 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:44:07] PROBLEM - cloud18 APT on cloud18 is CRITICAL: APT CRITICAL: 103 packages available for upgrade (1 critical updates). [08:44:29] PROBLEM - mw182 APT on mw182 is CRITICAL: APT CRITICAL: 66 packages available for upgrade (1 critical updates). [08:44:34] PROBLEM - cp36 APT on cp36 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:44:35] PROBLEM - graylog161 APT on graylog161 is CRITICAL: APT CRITICAL: 56 packages available for upgrade (1 critical updates). [08:44:38] PROBLEM - db171 APT on db171 is CRITICAL: APT CRITICAL: 59 packages available for upgrade (1 critical updates). [08:44:56] PROBLEM - mw181 APT on mw181 is CRITICAL: APT CRITICAL: 66 packages available for upgrade (1 critical updates). [08:45:00] PROBLEM - eventgate181 APT on eventgate181 is CRITICAL: APT CRITICAL: 45 packages available for upgrade (1 critical updates). [08:45:05] PROBLEM - swiftobject181 APT on swiftobject181 is CRITICAL: APT CRITICAL: 53 packages available for upgrade (1 critical updates). [08:45:16] PROBLEM - swiftobject171 APT on swiftobject171 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:45:38] PROBLEM - cloud15 APT on cloud15 is CRITICAL: APT CRITICAL: 110 packages available for upgrade (1 critical updates). 
[08:45:46] PROBLEM - cp26 APT on cp26 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:45:56] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. [08:46:14] PROBLEM - mw152 APT on mw152 is CRITICAL: APT CRITICAL: 65 packages available for upgrade (1 critical updates). [08:46:22] PROBLEM - cloud17 APT on cloud17 is CRITICAL: APT CRITICAL: 102 packages available for upgrade (1 critical updates). [08:46:25] PROBLEM - mwtask171 APT on mwtask171 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [08:46:36] PROBLEM - mwtask181 APT on mwtask181 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [08:46:39] PROBLEM - cp41 APT on cp41 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:46:56] PROBLEM - cp51 APT on cp51 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:47:14] PROBLEM - cp37 APT on cp37 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:47:18] PROBLEM - mem151 APT on mem151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:47:55] PROBLEM - db151 APT on db151 is CRITICAL: APT CRITICAL: 60 packages available for upgrade (1 critical updates). [08:47:58] PROBLEM - os151 APT on os151 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [08:48:07] PROBLEM - swiftproxy161 APT on swiftproxy161 is CRITICAL: APT CRITICAL: 41 packages available for upgrade (1 critical updates). [08:48:08] PROBLEM - jobchron171 APT on jobchron171 is CRITICAL: APT CRITICAL: 62 packages available for upgrade (1 critical updates). [08:48:32] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). 
[08:48:40] PROBLEM - swiftproxy171 APT on swiftproxy171 is CRITICAL: APT CRITICAL: 55 packages available for upgrade (1 critical updates). [08:49:03] PROBLEM - mon181 APT on mon181 is CRITICAL: APT CRITICAL: 58 packages available for upgrade (1 critical updates). [08:49:17] PROBLEM - db181 APT on db181 is CRITICAL: APT CRITICAL: 60 packages available for upgrade (1 critical updates). [08:50:07] PROBLEM - swiftobject151 APT on swiftobject151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:50:23] PROBLEM - ns1 APT on ns1 is CRITICAL: APT CRITICAL: 51 packages available for upgrade (1 critical updates). [08:50:23] PROBLEM - db182 APT on db182 is CRITICAL: APT CRITICAL: 60 packages available for upgrade (1 critical updates). [08:50:40] PROBLEM - ldap171 APT on ldap171 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:50:45] PROBLEM - rdb151 APT on rdb151 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:51:05] PROBLEM - os161 APT on os161 is CRITICAL: APT CRITICAL: 53 packages available for upgrade (1 critical updates). [08:51:35] PROBLEM - reports171 APT on reports171 is CRITICAL: APT CRITICAL: 64 packages available for upgrade (1 critical updates). [08:51:59] PROBLEM - mem161 APT on mem161 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:52:11] PROBLEM - cp27 APT on cp27 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:52:24] PROBLEM - graphite151 APT on graphite151 is CRITICAL: APT CRITICAL: 40 packages available for upgrade (1 critical updates). [08:52:27] PROBLEM - phorge171 APT on phorge171 is CRITICAL: APT CRITICAL: 54 packages available for upgrade (1 critical updates). [08:53:56] PROBLEM - swiftac171 APT on swiftac171 is CRITICAL: APT CRITICAL: 41 packages available for upgrade (1 critical updates). 
[08:54:11] PROBLEM - swiftobject161 APT on swiftobject161 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:54:16] PROBLEM - cloud16 APT on cloud16 is CRITICAL: APT CRITICAL: 102 packages available for upgrade (1 critical updates). [08:54:32] PROBLEM - matomo151 APT on matomo151 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [08:54:32] PROBLEM - puppet181 APT on puppet181 is CRITICAL: APT CRITICAL: 61 packages available for upgrade (3 critical updates). [08:55:40] PROBLEM - bast161 APT on bast161 is CRITICAL: APT CRITICAL: 52 packages available for upgrade (1 critical updates). [08:55:48] PROBLEM - test151 APT on test151 is CRITICAL: APT CRITICAL: 75 packages available for upgrade (1 critical updates). [08:57:05] PROBLEM - bots171 APT on bots171 is CRITICAL: APT CRITICAL: 63 packages available for upgrade (1 critical updates). [09:15:50] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [10:58:27] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.38, 21.29, 18.92 [11:02:22] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.55, 20.30, 19.11 [11:10:06] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.00, 21.02, 20.04 [11:15:58] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.32, 21.88, 20.68 [11:17:55] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.34, 21.18, 20.59 [11:21:50] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.74, 19.34, 20.03 [11:28:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [11:29:29] 
PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 9.22, 5.65, 3.72 [11:29:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.53, 21.53, 20.73 [11:30:21] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:30:54] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [11:31:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.78, 20.24, 20.34 [11:32:18] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [11:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:32:50] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures [11:33:08] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 5.893 seconds response time.
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [11:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:37:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.84, 20.74, 20.45 [11:39:24] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.97, 3.87, 3.99 [11:39:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.56, 19.67, 20.13 [11:41:22] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.78, 5.15, 4.46 [11:43:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.76, 21.32, 20.73 [11:45:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.56, 20.12, 20.38 [11:46:49] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle.
[11:55:16] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.05, 2.85, 3.87 [11:56:48] PROBLEM - db151 Backups SQL on db151 is CRITICAL: FILE_AGE CRITICAL: /var/log/sql-backup.log is 1209610 seconds old and 141321 bytes [11:59:13] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.08, 2.16, 3.37 [12:07:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:08:05] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.01, 3.51, 3.51 [12:10:05] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.64, 3.10, 3.36 [12:10:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.75, 19.47, 23.65 [12:17:31] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:18:56] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 2.56, 4.34, 3.89 [12:24:02] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [12:24:35] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.01, 22.95, 22.38 [12:26:13] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:26:35] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [12:28:09] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0008962452412 secs [12:28:20] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [12:28:34] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.249 seconds response time.
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [12:30:44] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 12 minutes ago with 0 failures [12:34:43] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [12:38:42] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 29.03, 23.04, 18.44 [12:41:32] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [12:43:03] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 23.92, 19.50, 15.10 [12:43:31] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.063 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [12:43:56] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 21.15, 19.65, 15.28 [12:44:06] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 24.17, 19.84, 15.39 [12:45:49] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 22.78, 21.71, 17.29 [12:45:55] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 19.05, 19.06, 15.58 [12:46:06] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.87, 20.69, 16.26 [12:47:03] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 26.97, 22.34, 17.16 [12:47:47] PROBLEM - mw162 Current Load on mw162 is CRITICAL: LOAD CRITICAL - total load average: 26.74, 23.18, 18.34 [12:48:06] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 18.88, 20.03, 16.57 [12:49:03] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.28, 21.51, 17.50 [12:49:45] PROBLEM - mw162 Current Load on 
mw162 is WARNING: LOAD WARNING - total load average: 24.00, 23.20, 18.92 [12:51:17] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.71, 20.48, 16.70 [12:53:03] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 16.86, 19.73, 17.73 [12:53:14] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 18.23, 19.40, 16.75 [12:53:38] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:54:06] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 23.51, 21.28, 18.21 [12:55:47] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [12:56:06] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 20.27, 20.21, 18.16 [12:57:45] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset 0.0003691911697 secs [12:59:36] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 19.54, 20.30, 19.57 [13:03:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:06:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.18, 2.06, 3.78 [13:08:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:08:46] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.14, 1.43, 3.34 [13:10:00] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.78, 22.82, 23.84 [13:19:47] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.38, 24.70, 24.10 [13:25:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.86, 22.37, 23.49 [13:27:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.87, 23.32, 23.69 [13:41:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.99, 22.51, 23.54 [13:43:00] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:43:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.82, 24.29, 24.04 [13:45:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.63, 23.15, 23.70 [13:46:42] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_dns] [13:47:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.24, 24.95, 24.33 [14:01:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.51, 21.60, 23.60 [14:13:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.74, 22.89, 22.71 [14:15:05] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.64, 22.85, 22.71 [14:28:00] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [14:45:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.84, 19.89, 20.37 [14:45:42] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[14:53:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.79, 20.93, 20.64 [14:55:13] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [14:57:22] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.36, 22.61, 21.26 [14:59:40] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [15:01:38] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.001175165176 secs [15:03:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.79, 22.94, 21.96 [15:05:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.75, 23.67, 22.35 [15:05:43] PROBLEM - ping6 on ns2 is WARNING: PING WARNING - Packet loss = 0%, RTA = 205.87 ms [15:07:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [15:11:40] RECOVERY - ping6 on ns2 is OK: PING OK - Packet loss = 0%, RTA = 196.60 ms [15:14:07] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:15:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.80, 23.71, 23.18 [15:17:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [15:23:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.73, 23.22, 23.04 [15:27:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.70, 23.12, 23.17 [15:35:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 13.71, 16.40, 19.89 [15:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [15:50:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.76, 20.62, 22.69 [15:54:07] PROBLEM - ping6 on cp41 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 120.68 ms [15:54:35] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.30, 22.95, 23.06 [15:58:35] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.54, 23.11, 23.12 [16:00:35] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.76, 24.61, 23.66 [16:02:07] RECOVERY - ping6 on cp41 is OK: PING OK - Packet loss = 0%, RTA = 141.50 ms [16:07:40] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.55, 21.15, 18.42 [16:09:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.95, 20.75, 18.61 [16:13:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.81, 19.66, 18.66 [16:14:58] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[16:28:27] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.00, 19.60, 18.57 [16:30:24] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.84, 19.36, 18.60 [16:32:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [16:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [16:39:12] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [16:41:10] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.001276344061 secs [16:43:31] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:45:58] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.15, 21.18, 20.04 [16:47:55] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.38, 18.82, 19.33 [16:49:00] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 1 datacenter is down: 2400:d320:2161:9775::1/cpweb [16:49:39] PROBLEM - cp41 HTTPS on cp41 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [16:51:01] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [16:51:37] RECOVERY - cp41 HTTPS on cp41 is OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3821 bytes in 0.789 second response time [16:53:45] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.44, 19.57, 19.36 [17:01:40] RECOVERY - mw182 Current Load on mw182 is OK: 
LOAD OK - total load average: 16.94, 20.05, 20.04 [17:04:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.96, 3.17, 1.43 [17:08:47] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.34, 2.71, 1.63 [17:20:40] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [17:20:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.65, 5.44, 3.37 [17:20:58] PROBLEM - ns2 NTP time on ns2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:39] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.063 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [17:22:57] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.001318752766 secs [17:26:46] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.33, 3.43, 3.24 [17:28:46] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.01, 3.11, 3.15 [17:33:40] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.97, 20.99, 19.77 [17:35:40] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.73, 20.15, 19.64 [17:37:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [17:37:42] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.65, 3.47, 3.27 [17:39:42] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.94, 4.07, 3.49 [17:42:25] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [17:48:58] PROBLEM - wiki.tmyt105.leyhp.com - reverse DNS on sslhost is WARNING: LifetimeTimeout: The resolution lifetime expired after 5.404 seconds: Server 2606:4700:4700::1111 UDP port 53 answered The DNS operation timed out.; Server 2606:4700:4700::1111 UDP port 53 answered The DNS operation timed out.; Server 2606:4700:4700::1111 UDP port 53 answered The DNS operation timed out. [18:03:28] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.49, 3.33, 3.93 [18:07:25] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [18:09:24] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.46, 2.40, 3.34 [18:10:22] PROBLEM - wiki.tmyt105.leyhp.com - LetsEncrypt on sslhost is CRITICAL: Temporary failure in name resolutionHTTP CRITICAL - Unable to open TCP socket [18:12:32] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [18:13:22] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 10.84, 6.10, 4.57 [18:14:31] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.064 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [18:18:42] PROBLEM - wiki.tmyt105.leyhp.com - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - wiki.tmyt105.leyhp.com All nameservers failed to answer the query. [18:19:47] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.07, 20.19, 18.72 [18:21:45] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.71, 18.53, 18.26 [18:23:17] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.66, 3.26, 3.79 [18:25:15] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.00, 4.73, 4.27 [18:26:03] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [18:28:09] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 5.496 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [18:28:12] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [18:28:39] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [18:32:36] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [18:33:17] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 18 minutes ago with 0 failures [18:33:39] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [18:34:34] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.070 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [18:40:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [18:40:53] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:00] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [18:42:09] [mediawiki-repos] AverageHelper opened pull request #30: T12447: Add GoogleForms - https://github.com/miraheze/mediawiki-repos/pull/30 [18:42:31] [mediawiki-repos] coderabbitai[bot] commented on pull request #30: T12447: Add GoogleForms - https://github.com/miraheze/mediawiki-repos/pull/30#issuecomment-2278535348 [18:42:42] [mw-config] AverageHelper opened pull request #5633: T12447: Add GoogleForms - https://github.com/miraheze/mw-config/pull/5633 [18:42:50] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [18:42:59] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.066 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [18:43:45] miraheze/mw-config - AverageHelper the build passed. [18:45:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [18:46:06] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_dns] [18:48:23] RECOVERY - wiki.tmyt105.leyhp.com - reverse DNS on sslhost is OK: SSL OK - wiki.tmyt105.leyhp.com reverse DNS resolves to cp36.wikitide.net - CNAME OK [18:55:13]