[00:07:28] RECOVERY - cp171 Disk Space on cp171 is OK: DISK OK - free space: / 57129MiB (12% inode=99%);
[00:07:33] RECOVERY - cp191 Disk Space on cp191 is OK: DISK OK - free space: / 57529MiB (12% inode=99%);
[00:08:13] RECOVERY - cp201 Disk Space on cp201 is OK: DISK OK - free space: / 56807MiB (12% inode=99%);
[03:16:47] PROBLEM - ping6 on mattermost1 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 197.80 ms
[03:28:59] PROBLEM - ping6 on mattermost1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 198.06 ms
[03:32:54] PROBLEM - db182 Disk Space on db182 is WARNING: DISK WARNING - free space: / 46371MiB (10% inode=99%);
[03:33:03] PROBLEM - ping6 on mattermost1 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 199.79 ms
[03:39:14] PROBLEM - ping6 on mattermost1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 198.80 ms
[03:43:20] PROBLEM - ping6 on mattermost1 is CRITICAL: PING CRITICAL - Packet loss = 37%, RTA = 200.45 ms
[03:47:25] PROBLEM - ping6 on mattermost1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 201.01 ms
[03:50:54] PROBLEM - db182 Disk Space on db182 is CRITICAL: DISK CRITICAL - free space: / 26303MiB (5% inode=99%);
[04:30:54] RECOVERY - db182 Disk Space on db182 is OK: DISK OK - free space: / 141329MiB (31% inode=99%);
[05:32:20] [Grafana] FIRING: The mediawiki job queue has more than 100000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1756355510000&orgId=1&to=1756359140014
[06:02:20] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?from=1756355510000&orgId=1&to=1756360910000
[06:56:59] PROBLEM - ping6 on mattermost1 is CRITICAL: PING CRITICAL - Packet loss = 28%, RTA = 187.73 ms
[07:01:04] PROBLEM - ping6 on mattermost1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 194.68 ms
[07:10:13] PROBLEM - cp201 Disk Space on cp201 is WARNING: DISK WARNING - free space: / 49842MiB (10% inode=99%);
[07:13:14] PROBLEM - ping6 on mattermost1 is CRITICAL: PING CRITICAL - Packet loss = 28%, RTA = 193.70 ms
[07:23:32] PROBLEM - ping6 on mattermost1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 190.06 ms
[07:27:36] PROBLEM - ping6 on mattermost1 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 186.26 ms
[08:12:20] [Grafana] FIRING: The mediawiki job queue has more than 100000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1756365110000&orgId=1&to=1756368740011
[08:42:20] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?from=1756365110000&orgId=1&to=1756370510000
[08:43:33] PROBLEM - cp191 Disk Space on cp191 is WARNING: DISK WARNING - free space: / 49941MiB (10% inode=99%);
[09:11:27] PROBLEM - cp171 Disk Space on cp171 is WARNING: DISK WARNING - free space: / 49884MiB (10% inode=99%);
[11:04:58] RECOVERY - ping6 on mattermost1 is OK: PING OK - Packet loss = 0%, RTA = 148.05 ms
[12:22:33] [RequestCustomDomain] translatewiki pushed 1 new commit to main https://github.com/miraheze/RequestCustomDomain/commit/7d7bbcc9cf5f76995dd15fdb4d2673bd04ef6332
[12:22:33] RequestCustomDomain/main translatewiki.net 7d7bbcc Localisation updates from https://translatewiki.net.
[12:39:12] !log [somerandomdeveloper@test151] starting deploy of {'versions': ['1.43', '1.44'], 'upgrade_extensions': 'MassEditRegex'} to test151
[12:39:14] !log [somerandomdeveloper@test151] finished deploy of {'versions': ['1.43', '1.44'], 'upgrade_extensions': 'MassEditRegex'} to test151 - SUCCESS in 2s
[12:39:21] !log [somerandomdeveloper@mwtask181] starting deploy of {'versions': ['1.43', '1.44'], 'upgrade_extensions': 'MassEditRegex'} to all
[12:39:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[12:39:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[12:39:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[12:40:17] !log [somerandomdeveloper@mwtask181] finished deploy of {'versions': ['1.43', '1.44'], 'upgrade_extensions': 'MassEditRegex'} to all - SUCCESS in 56s
[12:40:20] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[12:42:32] miraheze/RequestCustomDomain - translatewiki the build passed.
[12:45:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[12:45:59] [ssl] WikiTideBot pushed 1 new commit to main https://github.com/miraheze/ssl/commit/37680a7f7932882e6ca4aa9fd4a52a942610a36b
[12:45:59] ssl/main WikiTideBot 37680a7 Bot: Auto-update domain lists
[13:39:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[13:45:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[13:49:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[13:55:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[14:16:08] PROBLEM - ping6 on mattermost1 is WARNING: PING WARNING - Packet loss = 0%, RTA = 150.12 ms
[14:19:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[14:25:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[14:28:52] [mw-config] dependabot[bot] created dependabot/composer/phpunit/phpunit-12.3.7 (+1 new commit) https://github.com/miraheze/mw-config/commit/f3571ab8a530
[14:28:52] mw-config/dependabot/composer/phpunit/phpunit-12.3.7 dependabot[bot] f3571ab Update phpunit/phpunit requirement from 12.3.6 to 12.3.7…
[14:28:53] [mw-config] dependabot[bot] added the label 'dependencies' to pull request #6075 (Update phpunit/phpunit requirement from 12.3.6 to 12.3.7) https://github.com/miraheze/mw-config/pull/6075
[14:28:53] [mw-config] dependabot[bot] added the label 'php' to pull request #6075 (Update phpunit/phpunit requirement from 12.3.6 to 12.3.7) https://github.com/miraheze/mw-config/pull/6075
[14:28:55] [mw-config] dependabot[bot] opened pull request #6075: Update phpunit/phpunit requirement from 12.3.6 to 12.3.7 (main...dependabot/composer/phpunit/phpunit-12.3.7) https://github.com/miraheze/mw-config/pull/6075
[14:28:57] [mw-config] dependabot[bot] added the label 'dependencies' to pull request #6075 (Update phpunit/phpunit requirement from 12.3.6 to 12.3.7) https://github.com/miraheze/mw-config/pull/6075
[14:28:59] [mw-config] dependabot[bot] added the label 'php' to pull request #6075 (Update phpunit/phpunit requirement from 12.3.6 to 12.3.7) https://github.com/miraheze/mw-config/pull/6075
[14:29:49] miraheze/mw-config - dependabot[bot] the build passed.
[14:39:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[14:45:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[14:57:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[15:05:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[15:09:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[15:15:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[16:03:19] RECOVERY - ping6 on mattermost1 is OK: PING OK - Packet loss = 0%, RTA = 147.01 ms
[16:37:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[16:37:55] Why you so grumpy today
[16:55:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[18:37:20] [Grafana] FIRING: The mediawiki job queue has more than 100000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1756402610000&orgId=1&to=1756406240017
[19:07:20] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?from=1756402610000&orgId=1&to=1756408010000
[20:27:37] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.43/maintenance/run.php MirahezeMagic:GenerateMirahezeSitemap --wiki=soulframewiki (END - exit=0)
[20:27:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
[21:21:40] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push
[21:25:40] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push
[21:54:52] quick nap huh
[22:00:34] PROBLEM - mw171 HTTPS on mw171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:36] PROBLEM - mw193 HTTPS on mw193 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:38] PROBLEM - mw203 HTTPS on mw203 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:40] PROBLEM - mw201 MediaWiki Rendering on mw201 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.012 second response time
[22:00:44] PROBLEM - mw201 HTTPS on mw201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:49] PROBLEM - mw172 HTTPS on mw172 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:51] PROBLEM - mw171 MediaWiki Rendering on mw171 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.012 second response time
[22:00:51] PROBLEM - mw183 HTTPS on mw183 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:52] PROBLEM - mw192 MediaWiki Rendering on mw192 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.018 second response time
[22:00:52] PROBLEM - mw203 MediaWiki Rendering on mw203 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.011 second response time
[22:00:57] PROBLEM - mw162 HTTPS on mw162 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:00:59] uhoh
[22:00:59] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:01:03] PROBLEM - mw202 HTTPS on mw202 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:01:04] PROBLEM - mw191 MediaWiki Rendering on mw191 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.012 second response time
[22:01:20] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:01:22] PROBLEM - mw163 MediaWiki Rendering on mw163 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.012 second response time
[22:01:23] @Infrastructure Specialists i think they're a bit sad
[22:01:25] PROBLEM - mw161 HTTPS on mw161 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:01:28] PROBLEM - mw192 HTTPS on mw192 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:01:31] PROBLEM - mw182 MediaWiki Rendering on mw182 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.011 second response time
[22:01:33] PROBLEM - cp161 HTTPS on cp161 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:01:34] PROBLEM - mw183 MediaWiki Rendering on mw183 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.011 second response time
[22:01:35] PROBLEM - mw202 MediaWiki Rendering on mw202 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.012 second response time
[22:01:36] PROBLEM - mw153 MediaWiki Rendering on mw153 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.014 second response time
[22:01:37] PROBLEM - mw153 HTTPS on mw153 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:01:41] PROBLEM - db161 Current Load on db161 is CRITICAL: LOAD CRITICAL - total load average: 175.93, 88.20, 35.52
[22:01:48] PROBLEM - cp161 Varnish Backends on cp161 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki
[22:01:52] PROBLEM - mw193 MediaWiki Rendering on mw193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:01:52] PROBLEM - cp171 Varnish Backends on cp171 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki
[22:01:54] PROBLEM - mw161 MediaWiki Rendering on mw161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:01:54] PROBLEM - mw162 MediaWiki Rendering on mw162 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:01:59] PROBLEM - mw151 HTTPS on mw151 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received
[22:02:02] PROBLEM - mw152 HTTPS on mw152 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10001 milliseconds with 0 bytes received
[22:02:03] PROBLEM - mw181 MediaWiki Rendering on mw181 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:03] PROBLEM - cp201 Varnish Backends on cp201 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki
[22:02:04] PROBLEM - mw151 MediaWiki Rendering on mw151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:07] PROBLEM - mw181 HTTPS on mw181 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received
[22:02:12] PROBLEM - mw163 HTTPS on mw163 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received
[22:02:14] PROBLEM - mw182 HTTPS on mw182 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received
[22:02:15] PROBLEM - mw172 MediaWiki Rendering on mw172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:20] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:02:22] PROBLEM - mw173 MediaWiki Rendering on mw173 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:25] PROBLEM - mw191 HTTPS on mw191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received
[22:02:26] PROBLEM - mw173 HTTPS on mw173 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10003 milliseconds with 0 bytes received
[22:02:28] PROBLEM - mw152 MediaWiki Rendering on mw152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:41] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki
[22:02:45] PROBLEM - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is CRITICAL: CRITICAL - NGINX Error Rate is 80%
[22:03:58] I think db161 is not having it
[22:04:03] PROBLEM - db161 APT on db161 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[22:04:46] nah she's fine wdym
[22:05:13] RECOVERY - mw162 HTTPS on mw162 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 9.050 second response time
[22:05:15] PROBLEM - db161 Puppet on db161 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.
[22:05:18] RECOVERY - mw202 HTTPS on mw202 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 8.174 second response time
[22:05:20] RECOVERY - mw191 MediaWiki Rendering on mw191 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 9.115 second response time
[22:05:22] uhh
[22:05:27] wtf?
[22:05:29] RECOVERY - mw163 MediaWiki Rendering on mw163 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.174 second response time
[22:05:30] RECOVERY - mw161 HTTPS on mw161 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.064 second response time
[22:05:36] RECOVERY - mw192 HTTPS on mw192 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.070 second response time
[22:05:37] RECOVERY - mw182 MediaWiki Rendering on mw182 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.241 second response time
[22:05:38] girls you can't just do that
[22:05:40] RECOVERY - mw153 MediaWiki Rendering on mw153 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.181 second response time
[22:05:42] RECOVERY - mw183 MediaWiki Rendering on mw183 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.197 second response time
[22:05:42] RECOVERY - mw153 HTTPS on mw153 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.061 second response time
[22:05:45] RECOVERY - mw202 MediaWiki Rendering on mw202 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.195 second response time
[22:05:49] RECOVERY - mw193 MediaWiki Rendering on mw193 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.202 second response time
[22:05:55] RECOVERY - mw161 MediaWiki Rendering on mw161 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.177 second response time
[22:05:55] RECOVERY - mw162 MediaWiki Rendering on mw162 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.170 second response time
[22:05:59] RECOVERY - mw151 HTTPS on mw151 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.060 second response time
[22:06:01] RECOVERY - mw151 MediaWiki Rendering on mw151 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.176 second response time
[22:06:03] RECOVERY - cp201 Varnish Backends on cp201 is OK: All 31 backends are healthy
[22:06:08] RECOVERY - mw152 HTTPS on mw152 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.057 second response time
[22:06:10] RECOVERY - mw181 MediaWiki Rendering on mw181 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.187 second response time
[22:06:11] RECOVERY - mw181 HTTPS on mw181 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.064 second response time
[22:06:13] PROBLEM - db161 MariaDB Connections on db161 is UNKNOWN: PHP Fatal error: Uncaught mysqli_sql_exception: Too many connections in /usr/lib/nagios/plugins/check_mysql_connections.php:66 Stack trace: #0 /usr/lib/nagios/plugins/check_mysql_connections.php(66): mysqli_real_connect(Object(mysqli), 'db161.fsslc.wtn...', 'icinga', Object(SensitiveParameterValue), NULL, NULL, NULL, true) #1 {main} thrown in /usr/lib/nagios/plugins/check_mysql_conne
[22:06:13] on line 66 Fatal error: Uncaught mysqli_sql_exception: Too many connections in /usr/lib/nagios/plugins/check_mysql_connections.php:66 Stack trace: #0 /usr/lib/nagios/plugins/check_mysql_connections.php(66): mysqli_real_connect(Object(mysqli), 'db161.fsslc.wtn...', 'icinga', Object(SensitiveParameterValue), NULL, NULL, NULL, true) #1 {main} thrown in /usr/lib/nagios/plugins/check_mysql_connections.php on line 66
RECOVERY - mw163 HTTPS on mw163 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.059 second response time
[22:06:17] PROBLEM - db161 MariaDB on db161 is CRITICAL: Too many connections
[22:06:19] RECOVERY - mw182 HTTPS on mw182 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.066 second response time
[22:06:20] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4161 bytes in 0.065 second response time
[22:06:23] RECOVERY - mw172 MediaWiki Rendering on mw172 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.169 second response time
[22:06:26] RECOVERY - mw152 MediaWiki Rendering on mw152 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.197 second response time
[22:06:30] RECOVERY - mw173 MediaWiki Rendering on mw173 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.168 second response time
[22:06:32] RECOVERY - mw173 HTTPS on mw173 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.077 second response time
[22:06:35] RECOVERY - mw191 HTTPS on mw191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 6.137 second response time
[22:06:40] RECOVERY - mw193 HTTPS on mw193 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.068 second response time
[22:06:41] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy
[22:06:42] RECOVERY - mw171 HTTPS on mw171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.062 second response time
[22:06:43] RECOVERY - mw203 HTTPS on mw203 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.067 second response time
[22:06:44] RECOVERY - mw201 MediaWiki Rendering on mw201 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.186 second response time
[22:06:45] PROBLEM - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is WARNING: WARNING - NGINX Error Rate is 57%
[22:06:55] RECOVERY - mw183 HTTPS on mw183 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.066 second response time
[22:06:57] RECOVERY - mw201 HTTPS on mw201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.071 second response time
[22:06:57] RECOVERY - mw171 MediaWiki Rendering on mw171 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.165 second response time
[22:06:58] gg
[22:06:59] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4161 bytes in 0.056 second response time
[22:06:59] wp
[22:06:59] RECOVERY - mw203 MediaWiki Rendering on mw203 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.188 second response time
[22:07:06] RECOVERY - mw172 HTTPS on mw172 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.058 second response time
[22:07:07] RECOVERY - mw192 MediaWiki Rendering on mw192 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.190 second response time
[22:07:20] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4161 bytes in 0.061 second response time
[22:07:33] RECOVERY - cp161 HTTPS on cp161 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4216 bytes in 0.075 second response time
[22:07:48] RECOVERY - cp161 Varnish Backends on cp161 is OK: All 31 backends are healthy
[22:07:52] RECOVERY - cp171 Varnish Backends on cp171 is OK: All 31 backends are healthy
[22:08:31] it recovered after this message lmao
[22:08:43] skye magic strikes again 😎
[22:08:45] RECOVERY - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is OK: OK - NGINX Error Rate is 27%
[22:11:07] [1/2] I've got the feeling this is still not optimal
[22:11:07] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1410748559032058027/image.png?ex=68b2257a&is=68b0d3fa&hm=44d72bbb3903fcddc15274e8d5b3755aa91d75a1641068e2e490a07db6010c6d&
[22:13:01] just a few hundreds, can't hurt to do a little cardio
[22:17:09] ^ @cosmicalpha
[22:17:21] might be worth restarting db161
[22:17:41] ... thats just a little high...
[22:17:43] looking
[22:18:41] Why is db161 and 181 suddenly causing issues... previously it was more only 171
[22:18:44] you missed the part where it had a stroke lol
[22:19:01] https://cdn.discordapp.com/attachments/808001911868489748/1410750547430084688/image.png?ex=68b22754&is=68b0d5d4&hm=9da17006c2240955c4ec16cadaca124ecc900b3f36f3cfb7346eb67a8a9e5c31&
[22:19:28] I wonder if we need more RAM on db
[22:20:01] the RAM usage didn't really change before the outage happened
[22:20:07] 9gb free
[22:20:55] [1/2] though, what is happening here? (last 7 days)
[22:20:56] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1410751026612670585/image.png?ex=68b227c7&is=68b0d647&hm=f6304c025afe472964221c9b9d6409187bf5de999c335095bda21ebf7a16b3b5&
[22:21:12] wtf
[22:21:15] looks like a memory leak
[22:27:43] [1/2] error: 'Received error packet before completion of TLS handshake. The authenticity of the following error cannot be verified: 1040 - Too many connections
[22:27:43] [2/2] was today's error. I restarted db161 but before I did that I managed to dump SHOW FULL PROCESSLIST to my home directory so I can examine that later.
[22:28:20] PROBLEM - db161 ferm_active on db161 is CRITICAL: connect to address 10.0.16.128 port 5666: Connection refusedconnect to host 10.0.16.128 port 5666: Connection refused
[22:28:36] PROBLEM - db161 PowerDNS Recursor on db161 is CRITICAL: connect to address 10.0.16.128 port 5666: Connection refusedconnect to host 10.0.16.128 port 5666: Connection refused
[22:28:43] PROBLEM - db161 Check unit status of sql-backup on db161 is CRITICAL: connect to address 10.0.16.128 port 5666: Connection refusedconnect to host 10.0.16.128 port 5666: Connection refused
[22:29:23] RECOVERY - db161 APT on db161 is OK: APT OK: 85 packages available for upgrade (0 critical updates).
[22:29:40] RECOVERY - db161 Current Load on db161 is OK: LOAD OK - total load average: 1.31, 0.35, 0.12
[22:30:13] RECOVERY - db161 MariaDB Connections on db161 is OK: OK connection usage: 47.6% Current connections: 476
[22:30:16] RECOVERY - db161 MariaDB on db161 is OK: Uptime: 58 Threads: 476 Questions: 6801 Slow queries: 1 Opens: 264 Open tables: 258 Queries per second avg: 117.258
[22:30:23] RECOVERY - db161 ferm_active on db161 is OK: OK ferm input default policy is set
[22:30:38] RECOVERY - db161 PowerDNS Recursor on db161 is OK: DNS OK: 2.497 seconds response time. db161.fsslc.wtnet returns 10.0.16.128
[22:30:44] RECOVERY - db161 Check unit status of sql-backup on db161 is OK: OK: Status of the systemd unit sql-backup
[22:31:28] @abaddriverlol load is skyrocketing again.
[22:31:32] after rebooting
[22:31:38] oh
[22:31:41] that is not good
[22:31:42] That’s normal
[22:31:50] https://cdn.discordapp.com/attachments/808001911868489748/1410753774028591237/33e0f5d11dbae3823e9700d931a7eac7385c0daf.webp?ex=68b22a56&is=68b0d8d6&hm=6aeea185fd4e60c9832880cf0da3c4909745b7843415b984476cdbe5f9ce559d&
[22:31:52] Set wgDatabaseClusertMaintenance for a bit
[22:32:01] it should help fix it
[22:32:09] PROBLEM - mw163 HTTPS on mw163 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:32:09] PROBLEM - mw182 HTTPS on mw182 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502
[22:32:10] It was at 900 for a second
[22:32:27] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:32:29] PROBLEM - mw172 MediaWiki Rendering on mw172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:29] PROBLEM - mw191 HTTPS on mw191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10003 milliseconds with 0 bytes received
[22:32:30] pre-reboot
[22:32:32] PROBLEM - mw152 MediaWiki Rendering on mw152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:32] PROBLEM - mw173 HTTPS on mw173 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10001 milliseconds with 0 bytes received
[22:32:41] PROBLEM - mw193 HTTPS on mw193 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10000 milliseconds with 0 bytes received
[22:32:41] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki
[22:32:41] PROBLEM - mw203 HTTPS on mw203 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10002 milliseconds with 0 bytes received
[22:32:41] PROBLEM - mw173 MediaWiki Rendering on mw173 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:44] PROBLEM - mw171 HTTPS on mw171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10003 milliseconds with 0 bytes received
[22:32:46] PROBLEM - mw201 MediaWiki Rendering on mw201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:50] PROBLEM - mw201 HTTPS on mw201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10003 milliseconds with 0 bytes received
[22:32:50] @abaddriverlol do you mind doing this?
[22:32:54] PROBLEM - mw171 MediaWiki Rendering on mw171 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:54] PROBLEM - mw203 MediaWiki Rendering on mw203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:58] PROBLEM - mw183 HTTPS on mw183 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received
[22:32:59] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:33:01] PROBLEM - mw172 HTTPS on mw172 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10003 milliseconds with 0 bytes received
[22:33:02] PROBLEM - mw192 MediaWiki Rendering on mw192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:33:04] yeah, gimme a sec
[22:33:13] RECOVERY - db161 Puppet on db161 is OK: OK: Puppet is currently enabled, last run 48 minutes ago with 0 failures
[22:33:20] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:33:33] PROBLEM - cp161 HTTPS on cp161 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503
[22:33:40] PROBLEM - db161 Current Load on db161 is CRITICAL: LOAD CRITICAL - total load average: 202.07, 169.97, 73.11
[22:34:05] RECOVERY - mw163 HTTPS on mw163 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.060 second response time
[22:34:07] RECOVERY - mw182 HTTPS on mw182 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.058 second response time
[22:34:22] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4161 bytes in 0.065 second response time
[22:34:26] RECOVERY - mw152 MediaWiki Rendering on mw152 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.187 second response time
[22:34:26] RECOVERY - mw191 HTTPS on mw191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.065 second response time
[22:34:28] RECOVERY - mw172 MediaWiki Rendering on mw172 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.180 second response time
[22:34:30] RECOVERY - mw173 HTTPS on mw173 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.062 second response time
[22:34:35] RECOVERY - mw193 HTTPS on mw193 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.062 second response time
[22:34:37] RECOVERY - mw203 HTTPS on mw203 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.069 second response time
[22:34:40] RECOVERY - mw171 HTTPS on mw171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.059 second response time
[22:34:40] RECOVERY - mw201 MediaWiki Rendering on mw201 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.204 second response time
[22:34:41] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy
[22:34:41] RECOVERY - mw173 MediaWiki Rendering on mw173 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.179 second response time
[22:34:47] RECOVERY - mw201 HTTPS on mw201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.066 second response time
[22:34:49] RECOVERY - mw171 MediaWiki Rendering on mw171 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.170 second response time
[22:34:50] RECOVERY - mw203 MediaWiki Rendering on mw203 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.191 second response time
[22:34:52] RECOVERY - mw183 HTTPS on mw183 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.068 second response time
[22:34:53] can't find that setting, is it ?
[22:34:53] [GitHub] [miraheze/mw-config] Database.php @ 2224d24fe8828a3392c940728523ad4d8a00bb98 | L88: 'readOnlyBySection' => [
[22:34:59] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4161 bytes in 0.068 second response time
[22:34:59] RECOVERY - mw192 MediaWiki Rendering on mw192 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.193 second response time
[22:35:00] RECOVERY - mw172 HTTPS on mw172 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4112 bytes in 0.064 second response time
[22:35:14] I think its in LocalSettings towards the very top
[22:35:20] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4161 bytes in 0.064 second response time
[22:35:31] Anyway its going down now @abaddriverlol
[22:35:33] RECOVERY - cp161 HTTPS on cp161 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4216 bytes in 0.077 second response time
[22:35:43] @abaddriverlol https://github.com/miraheze/mw-config/blob/main/LocalSettings.php#L78
[22:35:43] [GitHub] [miraheze/mw-config] LocalSettings.php @ main | L78: $wgDatabaseClustersMaintenance = [];
[22:35:48] load dropped from 800 to 36
[22:36:09] alr nice
[22:36:12] ty
[22:36:24] let's hope it stays that way
[22:36:49] we should put this in the tech docs
[22:37:07] I saw pre reboot load on db161 at 980 in htop.
[22:37:21] Thats about the highest ive ever seen.
[22:37:41] should consider consolidating some of the stuff we have strewn across the tech namespace
[22:38:25] yep
[22:38:48] I use static tech docs almost always as to me its easier to find and navigate lol.
[22:39:28] Amazing how load goes from 980 to 3 lol
[22:39:29] ironic
[23:03:40] PROBLEM - db161 Current Load on db161 is WARNING: LOAD WARNING - total load average: 1.57, 1.56, 11.64
[23:05:04] please no we already cured you
[23:05:39] [ssl] WikiTideBot pushed 1 new commit to main https://github.com/miraheze/ssl/commit/7ef69f40b6449e83c47d8de2ad27b5000abc1db0
[23:05:39] ssl/main WikiTideBot 7ef69f4 Bot: Auto-update domain lists
[23:07:40] RECOVERY - db161 Current Load on db161 is OK: LOAD OK - total load average: 1.00, 1.22, 9.19
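
A note on the 22:06:13 UNKNOWN above: when db161 hit max_connections (MariaDB error 1040), the Icinga plugin at /usr/lib/nagios/plugins/check_mysql_connections.php died with an uncaught mysqli_sql_exception instead of reporting CRITICAL. The sketch below is not that plugin; it is a minimal, hypothetical illustration of the same kind of connection-usage check with the failure path handled. The hostname, credentials, and the 80%/90% thresholds are placeholders, not production values; the output format mirrors the "OK connection usage: 47.6% Current connections: 476" recovery seen at 22:30:13.

<?php
// Hedged sketch of a Nagios-style MariaDB connection-usage check.
// Assumptions: placeholder host/user, password from an env var, 80/90% thresholds.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

$host = 'db161.example.internal'; // placeholder, not the real internal name
$user = 'icinga';                 // monitoring user, as in the log above
$pass = getenv('MYSQL_MONITOR_PASS') ?: '';

try {
    $db = new mysqli($host, $user, $pass);
} catch (mysqli_sql_exception $e) {
    // "Too many connections" (1040) lands here and becomes a CRITICAL,
    // rather than an uncaught fatal that Icinga reports as UNKNOWN.
    echo 'CRITICAL - cannot connect: ' . $e->getMessage() . "\n";
    exit(2);
}

// Compare current threads against the server limit.
$max     = (int) $db->query("SHOW GLOBAL VARIABLES LIKE 'max_connections'")->fetch_row()[1];
$current = (int) $db->query("SHOW GLOBAL STATUS LIKE 'Threads_connected'")->fetch_row()[1];
$usage   = $max > 0 ? round($current / $max * 100, 1) : 0.0;

if ($usage >= 90) {           // assumed critical threshold
    echo "CRITICAL connection usage: {$usage}% Current connections: {$current}\n";
    exit(2);
}
if ($usage >= 80) {           // assumed warning threshold
    echo "WARNING connection usage: {$usage}% Current connections: {$current}\n";
    exit(1);
}
echo "OK connection usage: {$usage}% Current connections: {$current}\n";
exit(0);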
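
The remediation that worked in the 22:31–22:36 exchange was flipping $wgDatabaseClustersMaintenance in miraheze/mw-config (LocalSettings.php, linked at 22:35:43). Since the chat asks for this to go into the tech docs, here is a minimal sketch of the change, assuming a hypothetical cluster name 'c1'; the real cluster names live in Database.php in mw-config, and the exact effect (per the discussion, wikis on the listed cluster go into maintenance so new connections stop piling onto the primary) should be confirmed against the repo.

// In mw-config LocalSettings.php — sketch only; 'c1' is a placeholder cluster name.
// Listing a cluster here appears to put its wikis into maintenance, which is
// why db161's connection count and load dropped once it was set.
$wgDatabaseClustersMaintenance = [ 'c1' ];

// Revert to the default once load settles:
$wgDatabaseClustersMaintenance = [];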