[00:01:29] PROBLEM - mem161 Current Load on mem161 is CRITICAL: LOAD CRITICAL - total load average: 7.35, 3.74, 1.59 [00:03:09] PROBLEM - mem161 APT on mem161 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:03:12] PROBLEM - mem161 Puppet on mem161 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [00:06:46] RECOVERY - osdev.wiki - LetsEncrypt on sslhost is OK: OK - Certificate 'osdev.wiki' will expire on Sun 24 Nov 2024 10:39:44 PM GMT +0000. [00:07:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.15, 23.17, 22.29 [00:08:50] RECOVERY - mem161 Puppet on mem161 is OK: OK: Puppet is currently enabled, last run 41 minutes ago with 0 failures [00:08:51] RECOVERY - mem161 APT on mem161 is OK: APT OK: 51 packages available for upgrade (0 critical updates). [00:13:28] RECOVERY - mem161 Current Load on mem161 is OK: LOAD OK - total load average: 0.35, 3.16, 3.16 [00:15:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:17:46] [mw-config] Universal-Omega closed pull request #5635: Update mediawiki/mediawiki-codesniffer requirement from 43.0.0 to 44.0.0 - https://github.com/miraheze/mw-config/pull/5635 [00:17:49] [mw-config] Universal-Omega pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/mw-config/compare/c02ca1698d8a...2c40d0f2e62f [00:17:51] [mw-config] dependabot[bot] 2c40d0f - Update mediawiki/mediawiki-codesniffer requirement from 43.0.0 to 44.0.0 (#5635) [00:17:52] [mw-config] Universal-Omega deleted branch dependabot/composer/mediawiki/mediawiki-codesniffer-44.0.0 - https://github.com/miraheze/mw-config [00:17:55] [mw-config] Universal-Omega deleted branch dependabot/composer/mediawiki/mediawiki-codesniffer-44.0.0 [00:18:44] miraheze/mw-config - Universal-Omega the build passed. 
[00:23:17] !log [@test151] starting deploy of {'config': True} to test151 [00:23:18] !log [@test151] finished deploy of {'config': True} to test151 - SUCCESS in 0s [00:23:23] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:23:30] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:24:57] RECOVERY - coffeewiki.net - LetsEncrypt on sslhost is OK: OK - Certificate 'coffeewiki.net' will expire on Sun 24 Nov 2024 10:57:49 PM GMT +0000. [00:32:35] !log [@mwtask181] starting deploy of {'config': True} to all [00:32:45] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:32:51] !log [@mwtask181] finished deploy of {'config': True} to all - SUCCESS in 15s [00:32:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:37:13] !log [@mwtask171] starting deploy of {'config': True} to all [00:37:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:37:24] !log [@mwtask171] finished deploy of {'config': True} to all - SUCCESS in 11s [00:37:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:37:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.36, 22.53, 23.88 [00:39:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.23, 23.10, 23.87 [00:41:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.57, 22.40, 23.52 [00:51:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.83, 23.35, 23.09 [00:53:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 17.77, 21.58, 22.50 [01:04:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [01:05:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 
6.57, 3.92, 1.69 [01:07:07] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.063 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [01:07:12] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.49, 3.13, 1.70 [01:10:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:13:31] PROBLEM - mem151 PowerDNS Recursor on mem151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [01:13:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.73, 21.52, 21.82 [01:14:29] PROBLEM - mem151 Current Load on mem151 is CRITICAL: LOAD CRITICAL - total load average: 7.54, 4.02, 1.76 [01:15:33] RECOVERY - mem151 PowerDNS Recursor on mem151 is OK: DNS OK: 3.844 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [01:15:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.35, 21.70, 21.82 [01:16:06] PROBLEM - mem151 APT on mem151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [01:17:24] PROBLEM - mem151 Puppet on mem151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [01:22:29] RECOVERY - mem151 Puppet on mem151 is OK: OK: Puppet is currently enabled, last run 19 minutes ago with 0 failures [01:23:52] RECOVERY - mem151 APT on mem151 is OK: APT OK: 51 packages available for upgrade (0 critical updates). 
[01:26:28] RECOVERY - mem151 Current Load on mem151 is OK: LOAD OK - total load average: 0.34, 3.28, 3.34 [01:33:20] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 22.28, 18.93, 16.02 [01:35:20] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 17.76, 17.75, 15.90 [01:36:28] PROBLEM - mem151 Current Load on mem151 is CRITICAL: LOAD CRITICAL - total load average: 6.59, 4.24, 3.35 [01:37:24] PROBLEM - mem151 Puppet on mem151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [01:38:33] PROBLEM - mem151 APT on mem151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [01:40:38] RECOVERY - mem151 APT on mem151 is OK: APT OK: 51 packages available for upgrade (0 critical updates). [01:41:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.02, 21.90, 21.58 [01:42:14] RECOVERY - mem151 Puppet on mem151 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:42:57] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 25.47, 20.29, 16.97 [01:43:52] PROBLEM - mw161 Current Load on mw161 is CRITICAL: LOAD CRITICAL - total load average: 24.76, 20.49, 16.79 [01:44:28] PROBLEM - mw162 Current Load on mw162 is WARNING: LOAD WARNING - total load average: 23.61, 21.25, 18.18 [01:44:28] PROBLEM - mem151 Current Load on mem151 is WARNING: LOAD WARNING - total load average: 0.27, 2.93, 3.45 [01:44:57] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 20.51, 20.94, 17.65 [01:45:20] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 21.96, 20.22, 17.49 [01:45:42] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.70, 20.20, 17.38 [01:45:52] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 19.25, 20.15, 17.15 [01:46:28] RECOVERY - mem151 Current Load 
on mem151 is OK: LOAD OK - total load average: 0.19, 2.02, 3.05 [01:47:20] PROBLEM - mw172 Current Load on mw172 is CRITICAL: LOAD CRITICAL - total load average: 27.28, 23.14, 18.90 [01:47:42] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 16.26, 19.51, 17.53 [01:48:57] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 18.74, 20.10, 18.11 [01:49:20] PROBLEM - mw172 Current Load on mw172 is WARNING: LOAD WARNING - total load average: 19.46, 21.92, 18.99 [01:50:28] RECOVERY - mw162 Current Load on mw162 is OK: LOAD OK - total load average: 18.59, 19.81, 18.59 [01:51:20] RECOVERY - mw172 Current Load on mw172 is OK: LOAD OK - total load average: 15.79, 20.11, 18.71 [02:01:52] PROBLEM - mw161 Current Load on mw161 is WARNING: LOAD WARNING - total load average: 18.95, 20.58, 18.90 [02:03:52] RECOVERY - mw161 Current Load on mw161 is OK: LOAD OK - total load average: 17.80, 19.87, 18.87 [02:06:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:23:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.44, 20.92, 23.22 [02:35:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.52, 22.01, 22.06 [02:37:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.65, 21.51, 21.86 [02:47:39] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.92, 17.84, 19.82 [02:54:29] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.19, 20.51, 20.41 [02:56:25] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.59, 22.46, 21.12 [03:00:19] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 18.31, 20.98, 20.89 [03:01:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:02:15] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 13.54, 18.73, 20.11 [03:17:45] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.20, 21.18, 19.55 [03:19:42] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.52, 19.68, 19.22 [03:26:27] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:27:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.73, 4.49, 2.99 [03:28:21] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.067 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [03:29:07] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.24, 3.76, 2.92 [03:30:29] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.75, 21.48, 20.12 [03:31:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.94, 4.16, 3.16 [03:31:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:32:25] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.86, 22.61, 20.68 [03:33:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.61, 3.99, 3.25 [03:37:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.23, 3.65, 3.25 [03:39:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.74, 3.96, 3.44 [03:40:12] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.79, 22.58, 21.68 [03:42:09] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.16, 24.12, 22.33 [03:43:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.05, 3.55, 3.35 [03:43:18] PROBLEM - mw171 Current Load on mw171 is CRITICAL: LOAD CRITICAL - total load average: 29.02, 22.90, 18.88 [03:45:07] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.36, 3.14, 3.20 [03:45:14] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 19.05, 21.54, 18.91 [03:47:11] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 17.22, 19.65, 18.51 [03:47:58] PROBLEM - mw182 Current Load on mw182 is WARNING: 
LOAD WARNING - total load average: 18.95, 22.78, 22.62 [03:49:55] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.19, 25.19, 23.53 [03:51:52] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 20.04, 23.17, 23.00 [03:57:42] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.41, 23.79, 23.22 [03:59:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.08, 23.07, 23.03 [04:03:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.73, 3.51, 3.23 [04:04:54] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [04:06:56] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 8.214 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [04:07:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.65, 3.58, 3.38 [04:07:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.42, 22.15, 22.37 [04:09:06] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.64, 3.14, 3.26 [04:09:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 16.49, 19.79, 21.47 [04:19:39] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.59, 18.10, 19.93 [04:26:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:29:29] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.88, 3.29, 3.09 [04:31:23] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.57, 2.86, 2.95 [04:35:12] PROBLEM - prometheus151 Current Load on 
prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.10, 4.65, 3.61 [04:36:20] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:39:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.43, 3.51, 3.44 [04:41:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.13, 3.90, 3.59 [04:41:20] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [04:43:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.50, 3.41, 3.42 [04:45:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.55, 4.71, 3.91 [04:45:28] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [04:47:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.30, 3.75, 3.65 [04:49:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.48, 4.02, 3.77 [04:49:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.79, 20.23, 18.23 [04:51:30] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.070 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [04:51:39] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 16.17, 18.63, 17.89 [04:53:08] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.96, 3.81, 3.80 [04:55:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.13, 4.78, 4.16 [04:57:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.47, 3.95, 3.92 [05:05:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.47, 4.56, 4.03 [05:05:48] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:07:43] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.649 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:08:20] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:13:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.74, 3.48, 3.94 [05:13:20] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:15:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.46, 4.10, 4.10 [05:29:07] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.15, 3.35, 3.96 [05:31:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.93, 4.40, 4.28 [05:32:04] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:33:58] RECOVERY - prometheus151 PowerDNS 
Recursor on prometheus151 is OK: DNS OK: 0.064 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:34:45] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.60, 20.67, 19.41 [05:42:31] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.35, 23.45, 21.37 [05:44:19] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:46:13] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.101 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:46:25] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 26.07, 23.65, 21.83 [05:48:21] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.27, 23.15, 21.90 [05:49:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:51:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.98, 3.16, 3.88 [05:55:06] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.65, 3.70, 3.93 [05:59:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.68, 2.91, 3.53 [05:59:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:01:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.14, 4.23, 3.95 [06:01:12] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:01:24] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:57] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [06:04:05] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [06:04:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:05:10] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.241 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:05:21] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [06:05:53] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 51 packages available for upgrade (0 critical updates). 
[06:06:06] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 20 minutes ago with 0 failures [06:07:48] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.36, 18.78, 20.23 [06:09:30] [Grafana] FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:15:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.28, 20.46, 20.36 [06:17:39] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 18.32, 20.04, 20.25 [06:21:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.10, 3.33, 3.83 [06:21:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 24.52, 22.76, 21.32 [06:21:58] PROBLEM - franchise.franchising.org.ua - reverse DNS on sslhost is CRITICAL: rDNS CRITICAL - franchise.franchising.org.ua All nameservers failed to answer the query. 
[06:23:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 19.02, 21.60, 21.09 [06:24:30] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:25:06] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.76, 2.50, 3.37 [06:27:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 27.25, 23.27, 21.77 [06:29:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.52, 22.18, 21.56 [06:31:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.38, 3.67, 3.58 [06:33:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.98, 3.89, 3.65 [06:37:35] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.43, 3.73, 3.66 [06:37:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 28.10, 23.60, 22.20 [06:39:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 21.71, 23.22, 22.27 [06:41:23] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.39, 2.70, 3.24 [06:45:12] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.50, 4.14, 3.67 [06:47:39] PROBLEM - mw182 Current Load on mw182 is CRITICAL: LOAD CRITICAL - total load average: 25.95, 22.84, 22.30 [06:49:39] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.75, 22.72, 22.33 [06:53:06] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.31, 3.53, 3.69 [06:55:39] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o