[00:00:47] RECOVERY - mwtask181 Check unit status of update-static-tech-docs on mwtask181 is OK: OK: Status of the systemd unit update-static-tech-docs [00:02:25] [02statichelp] 07WikiTideBot pushed 1 new commit to 03main 13https://github.com/miraheze/statichelp/commit/0f91f4f737b5379fcd68f0edd7714860fe1f0706 [00:02:25] 02statichelp/03main 07WikiTideBot 030f91f4f Bot: Auto-update Tech namespace pages 2025-11-11 00:02:23 [00:26:18] RECOVERY - cp201 Disk Space on cp201 is OK: DISK OK - free space: / 61670MiB (13% inode=99%); [00:39:33] PROBLEM - mw192 Current Load on mw192 is WARNING: LOAD WARNING - total load average: 22.59, 20.47, 18.95 [00:41:29] RECOVERY - mw192 Current Load on mw192 is OK: LOAD OK - total load average: 18.99, 19.90, 18.94 [00:43:19] PROBLEM - mw201 Current Load on mw201 is WARNING: LOAD WARNING - total load average: 20.54, 19.84, 18.52 [00:45:14] RECOVERY - mw201 Current Load on mw201 is OK: LOAD OK - total load average: 17.05, 19.18, 18.46 [00:49:20] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762818530000&orgId=1&to=1762822160037 [01:04:37] PROBLEM - mw201 Current Load on mw201 is WARNING: LOAD WARNING - total load average: 21.40, 20.32, 18.97 [01:06:37] RECOVERY - mw201 Current Load on mw201 is OK: LOAD OK - total load average: 17.30, 19.22, 18.74 [01:09:25] [02RequestCustomDomain] 07dependabot[bot] created 03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 (+1 new commit) 13https://github.com/miraheze/RequestCustomDomain/commit/efae88555a47 [01:09:25] 02RequestCustomDomain/03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 07dependabot[bot] 03efae885 Bump eslint-config-wikimedia from 0.31.0 to 0.32.0… [01:09:25] [02RequestCustomDomain] 07dependabot[bot] added the label 'javascript' to pull request #97 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/RequestCustomDomain/pull/97 [01:09:25] [02RequestCustomDomain] 07dependabot[bot] added the label 'dependencies' to pull request #97 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/RequestCustomDomain/pull/97 [01:09:27] [02RequestCustomDomain] 07dependabot[bot] opened pull request #97: Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (03main...03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0) 13https://github.com/miraheze/RequestCustomDomain/pull/97 [01:09:29] [02RequestCustomDomain] 07dependabot[bot] added the label 'dependencies' to pull request #97 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/RequestCustomDomain/pull/97 [01:09:31] [02RequestCustomDomain] 07dependabot[bot] added the label 'javascript' to pull request #97 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/RequestCustomDomain/pull/97 [01:09:37] [02RequestCustomDomain] 07coderabbitai[bot] commented on pull request #97: --- […] 13https://github.com/miraheze/RequestCustomDomain/pull/97#issuecomment-3514545111 [01:19:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762818530000&orgId=1&to=1762823720000 [01:19:55] PROBLEM - mw193 Current Load on mw193 is WARNING: LOAD WARNING - total load average: 22.13, 18.98, 16.48 [01:20:27] PROBLEM - mw203 Current Load on mw203 is WARNING: LOAD WARNING - total load average: 21.62, 19.27, 17.31 [01:21:10] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.91, 19.51, 17.95 [01:21:42] PROBLEM - mw182 
Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.70, 19.79, 17.96 [01:21:55] RECOVERY - mw193 Current Load on mw193 is OK: LOAD OK - total load average: 19.18, 19.42, 16.98 [01:22:27] RECOVERY - mw203 Current Load on mw203 is OK: LOAD OK - total load average: 18.09, 18.70, 17.34 [01:22:37] PROBLEM - mw201 Current Load on mw201 is WARNING: LOAD WARNING - total load average: 20.62, 19.55, 18.70 [01:23:10] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.58, 19.42, 18.11 [01:23:42] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 19.43, 19.41, 18.04 [01:24:37] RECOVERY - mw201 Current Load on mw201 is OK: LOAD OK - total load average: 19.93, 19.65, 18.84 [01:25:15] PROBLEM - db171 Current Load on db171 is CRITICAL: LOAD CRITICAL - total load average: 11.71, 12.48, 9.21 [01:27:09] PROBLEM - mw192 Current Load on mw192 is WARNING: LOAD WARNING - total load average: 21.97, 20.03, 18.06 [01:27:15] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 6.82, 10.14, 8.73 [01:27:53] [Grafana] FIRING: The estimated time for the MediaWiki JobQueue to clear is excessively high (8 hours) for an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762820840000&orgId=1&to=1762824473360 [01:29:06] RECOVERY - mw192 Current Load on mw192 is OK: LOAD OK - total load average: 19.34, 19.90, 18.26 [01:29:58] miraheze/RequestCustomDomain - dependabot[bot] the build passed. [01:34:37] PROBLEM - mw201 Current Load on mw201 is WARNING: LOAD WARNING - total load average: 21.92, 20.17, 19.17 [01:36:37] RECOVERY - mw201 Current Load on mw201 is OK: LOAD OK - total load average: 16.77, 18.58, 18.70 [01:37:53] [Grafana] RESOLVED: MediaWiki JobQueue is stalled https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762821260000&orgId=1&to=1762824860000 [01:39:02] PROBLEM - mw192 Current Load on mw192 is WARNING: LOAD WARNING - total load average: 21.87, 20.25, 18.71 [01:41:02] RECOVERY - mw192 Current Load on mw192 is OK: LOAD OK - total load average: 15.17, 18.57, 18.30 [01:44:25] why do they always get an error when it's night now [01:47:53] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762821950000&orgId=1&to=1762825673363 [01:48:01] PROBLEM - db171 Current Load on db171 is WARNING: LOAD WARNING - total load average: 10.95, 10.20, 8.48 [01:49:58] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 5.68, 8.60, 8.11 [01:57:53] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762821950000&orgId=1&to=1762826180000 [02:52:20] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762825910000&orgId=1&to=1762829540038 [03:02:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762825910000&orgId=1&to=1762830020000 [03:29:30] PROBLEM - mw192 Current Load on mw192 is WARNING: LOAD WARNING - total load average: 20.93, 19.09, 17.28 [03:31:20] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers.
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762828250000&orgId=1&to=1762831880037 [03:31:26] RECOVERY - mw192 Current Load on mw192 is OK: LOAD OK - total load average: 15.91, 17.82, 17.03 [03:41:23] PROBLEM - mw202 Current Load on mw202 is WARNING: LOAD WARNING - total load average: 20.57, 18.11, 15.98 [03:43:19] RECOVERY - mw202 Current Load on mw202 is OK: LOAD OK - total load average: 16.97, 17.61, 16.06 [03:51:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762828250000&orgId=1&to=1762832960000 [04:03:10] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.46, 18.92, 17.24 [04:05:10] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.84, 17.84, 17.07 [04:24:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762831460000&orgId=1&to=1762835090041 [04:29:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762831760000&orgId=1&to=1762835360000 [04:57:47] PROBLEM - mw192 Current Load on mw192 is WARNING: LOAD WARNING - total load average: 23.05, 19.09, 17.25 [04:59:43] RECOVERY - mw192 Current Load on mw192 is OK: LOAD OK - total load average: 16.57, 17.99, 17.07 [05:13:36] PROBLEM - db161 Current Load on db161 is WARNING: LOAD WARNING - total load average: 10.52, 11.61, 8.59 [05:17:36] RECOVERY - db161 Current Load on db161 is OK: LOAD OK - total load average: 8.25, 9.89, 8.58 [05:35:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762835720000&orgId=1&to=1762839350038 [05:39:09] PROBLEM - mw202 Current Load on mw202 is WARNING: LOAD WARNING - total load average: 23.48, 19.74, 17.08 [05:41:09] RECOVERY - mw202 Current Load on mw202 is OK: LOAD OK - total load average: 13.54, 17.96, 16.79 [06:16:18] PROBLEM - cp201 Disk Space on cp201 is WARNING: DISK WARNING - free space: / 49741MiB (10% inode=99%); [06:20:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762838210000&orgId=1&to=1762841810000 [09:09:48] PROBLEM - db171 Current Load on db171 is CRITICAL: LOAD CRITICAL - total load average: 17.12, 12.37, 7.93 [09:11:45] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 5.03, 9.42, 7.39 [09:14:34] PROBLEM - cp191 Disk Space on cp191 is WARNING: DISK WARNING - free space: / 49857MiB (10% inode=99%); [09:52:27] PROBLEM - cp191 health.wikitide.net HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received [10:01:40] ` [11:08:29] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received [11:13:34] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [11:14:23] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.068 second response time [11:15:29] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4295 bytes in 0.062 second response time [11:21:34] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 
443: HTTP/2 503 [11:22:06] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [11:22:36] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received [11:26:04] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4295 bytes in 3.090 second response time [11:29:02] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push [11:29:41] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 5.039 second response time [11:33:43] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [11:34:42] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.316 second response time [11:35:02] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push [11:35:47] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 9.245 second response time [11:36:17] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [11:39:02] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push [11:39:59] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10002 milliseconds with 0 bytes received [11:45:44] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [11:47:39] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.075 second response time [11:50:26] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 1.214 second response time [11:51:49] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10002 milliseconds with 0 bytes received [11:53:46] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.067 second response time [11:54:19] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [12:00:19] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.664 second response time [12:00:42] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [12:04:11] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [12:05:02] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push [12:06:08] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line 
output matched "HTTP/2 410" - 4295 bytes in 0.097 second response time [12:07:20] cp201 is critical [12:07:29] cp171 keeps flapping between critical and ok [12:07:43] not sure about the other ones [12:07:52] I can reboot them [12:07:56] thx [12:08:38] PROBLEM - cp201 Current Load on cp201 is CRITICAL: connect to address 10.0.20.166 port 5666: Connection refusedconnect to host 10.0.20.166 port 5666: Connection refused [12:09:02] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push [12:09:26] !log reboot cp171/201 [12:09:29] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4295 bytes in 1.019 second response time [12:09:51] apparently 191 was critical as well [12:09:58] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:09:58] It says ok now [12:09:59] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 7 - Failed to connect to cp171.wikitide.net port 443 after 0 ms: Couldn't connect to server [12:10:00] yep [12:10:13] Hopefully they reboot in a few minutes [12:10:28] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.068 second response time [12:10:32] RECOVERY - cp201 Current Load on cp201 is OK: LOAD OK - total load average: 10.89, 3.63, 1.28 [12:10:37] [1/2] wtf [12:10:38] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1437776529991143547/image.png?ex=6914793d&is=691327bd&hm=a99c2465b98fadd180f3bb2048a26d9cd8060dbd383f2bd3eb2527e64bdc5bf1& [12:10:45] cp171 has been critical since yesterday? [12:10:54] Hmm [12:11:54] I need to go back to work [12:11:57] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.069 second response time [12:12:09] alr [12:12:17] situation seems mostly fine now, at least most requests work for me [12:12:22] if not all [12:12:28] RECOVERY - cp171 health.wikitide.net HTTPS on cp171 is OK: HTTP OK: HTTP/2 200 - 112 bytes in 0.011 second response time [12:13:20] PROBLEM - swiftproxy161 Current Load on swiftproxy161 is CRITICAL: LOAD CRITICAL - total load average: 12.37, 7.04, 3.03 [12:14:13] PROBLEM - swiftproxy161 HTTPS on swiftproxy161 is CRITICAL: connect to address 10.0.16.135 and port 443: Connection refusedHTTP CRITICAL - Unable to open TCP socket [12:14:21] PROBLEM - swiftproxy171 Check unit status of swift_dispersion_stats on swiftproxy171 is CRITICAL: CRITICAL: Status of the systemd unit swift_dispersion_stats [12:14:27] why is it swiftproxy now [12:14:37] PROBLEM - swiftproxy161 Swift NGINX SSL check on swiftproxy161 is CRITICAL: connect to address localhost and port 443: Connection refused [12:15:20] RECOVERY - swiftproxy161 Current Load on swiftproxy161 is OK: LOAD OK - total load average: 1.68, 4.72, 2.67 [12:17:02] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push [12:19:15] PROBLEM - db171 Current Load on db171 is CRITICAL: LOAD CRITICAL - total load average: 19.85, 11.23, 5.72 [12:21:15] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 7.59, 9.30, 5.68 [12:26:11] RECOVERY - swiftproxy161 HTTPS on swiftproxy161 is OK: HTTP OK: Status line output matched "HTTP/1.1 404" - 352 bytes in 1.654 second response time [12:26:37] RECOVERY - 
swiftproxy161 Swift NGINX SSL check on swiftproxy161 is OK: OK - Certificate 'wikitide.net' will expire on Mon Dec 29 15:38:37 2025 +0000.TCP OK - 0.010 second response time on localhost port 443 [12:26:38] !log restart nginx/swift-proxy on swiftproxy161 [12:27:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:30:18] PROBLEM - cp201 Disk Space on cp201 is CRITICAL: DISK CRITICAL - free space: / 27183MiB (5% inode=99%); [12:41:15] PROBLEM - db171 Current Load on db171 is CRITICAL: LOAD CRITICAL - total load average: 12.30, 10.82, 7.96 [12:42:21] RECOVERY - swiftproxy171 Check unit status of swift_dispersion_stats on swiftproxy171 is OK: OK: Status of the systemd unit swift_dispersion_stats [12:44:53] [02puppet] 07paladox created 03paladox-patch-6 (+1 new commit) 13https://github.com/miraheze/puppet/commit/7f9f12baf9e0 [12:44:53] 02puppet/03paladox-patch-6 07paladox 037f9f12b nginx: Comment out tcp_nopush directive… [12:44:57] [02puppet] 07paladox opened pull request #4594: nginx: Comment out tcp_nopush directive (03main...03paladox-patch-6) 13https://github.com/miraheze/puppet/pull/4594 [12:45:04] [02puppet] 07coderabbitai[bot] commented on pull request #4594:
[…] 13https://github.com/miraheze/puppet/pull/4594#issuecomment-3516746541 [12:45:15] PROBLEM - db171 Current Load on db171 is WARNING: LOAD WARNING - total load average: 9.24, 11.18, 8.89 [12:45:21] [02ssl] 07WikiTideBot pushed 1 new commit to 03main 13https://github.com/miraheze/ssl/commit/20ac9e8042c40bfe94029cbdfa8c079ec1931edd [12:45:21] 02ssl/03main 07WikiTideBot 0320ac9e8 Bot: Auto-update domain lists [12:45:49] [02puppet] 07paladox merged pull request #4594: nginx: Comment out tcp_nopush directive (03main...03paladox-patch-6) 13https://github.com/miraheze/puppet/pull/4594 [12:45:49] [02puppet] 07paladox pushed 1 new commit to 03main 13https://github.com/miraheze/puppet/commit/7c67e495601ea1a62d0e14ce2eb7ff63f9d4341e [12:45:50] 02puppet/03main 07paladox 037c67e49 nginx: Comment out tcp_nopush directive (#4594)… [12:45:52] [02puppet] 07paladox 04deleted 03paladox-patch-6 at 037f9f12b 13https://api.github.com/repos/miraheze/puppet/commit/7f9f12b [12:49:15] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 6.57, 9.04, 8.57 [12:54:58] !log restart cp191 - varnish restart was hanging [12:55:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:57:35] RECOVERY - cp201 Disk Space on cp201 is OK: DISK OK - free space: / 61194MiB (13% inode=99%); [13:13:21] [02ManageWiki] 07paladox merged 07dependabot[bot]'s pull request #749: Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (03main...03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0) 13https://github.com/miraheze/ManageWiki/pull/749 [13:13:22] [02ManageWiki] 07paladox pushed 1 new commit to 03main 13https://github.com/miraheze/ManageWiki/commit/2b9375d4100d12d3b22fb406a66184729dc59acd [13:13:22] 02ManageWiki/03main 07dependabot[bot] 032b9375d Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (#749)… [13:13:23] [02ManageWiki] 07paladox 04deleted 03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 at 03c17de54 13https://api.github.com/repos/miraheze/ManageWiki/commit/c17de54 [13:13:48] [02CreateWiki] 07paladox merged 07dependabot[bot]'s pull request #786: Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (03main...03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0) 13https://github.com/miraheze/CreateWiki/pull/786 [13:13:48] [02CreateWiki] 07paladox pushed 1 new commit to 03main 13https://github.com/miraheze/CreateWiki/commit/8fd5a6cfb513c32607d8276592c1471ea0839e4a [13:13:48] 02CreateWiki/03main 07dependabot[bot] 038fd5a6c Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (#786)… [13:13:49] [02CreateWiki] 07paladox 04deleted 03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 at 031583e5a 13https://api.github.com/repos/miraheze/CreateWiki/commit/1583e5a [13:13:56] [02IncidentReporting] 07paladox merged 07dependabot[bot]'s pull request #111: Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (03main...03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0) 13https://github.com/miraheze/IncidentReporting/pull/111 [13:13:56] [02IncidentReporting] 07paladox pushed 1 new commit to 03main 13https://github.com/miraheze/IncidentReporting/commit/386637fb13ea775c45d484d89ba68e7a384e74c1 [13:13:56] 02IncidentReporting/03main 07dependabot[bot] 03386637f Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (#111)… [13:13:57] [02IncidentReporting] 07paladox 04deleted 03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 at 03be7467d 13https://api.github.com/repos/miraheze/IncidentReporting/commit/be7467d [13:18:45] miraheze/IncidentReporting - paladox the build passed. 
[13:19:15] miraheze/ManageWiki - paladox the build passed. [13:30:26] [02WikiTideDebug] 07dependabot[bot] created 03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 (+1 new commit) 13https://github.com/miraheze/WikiTideDebug/commit/625483f0b0ed [13:30:26] 02WikiTideDebug/03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 07dependabot[bot] 03625483f Bump eslint-config-wikimedia from 0.31.0 to 0.32.0… [13:30:27] [02WikiTideDebug] 07dependabot[bot] added the label 'dependencies' to pull request #25 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/WikiTideDebug/pull/25 [13:30:27] [02WikiTideDebug] 07dependabot[bot] added the label 'javascript' to pull request #25 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/WikiTideDebug/pull/25 [13:30:29] [02WikiTideDebug] 07dependabot[bot] opened pull request #25: Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (03main...03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0) 13https://github.com/miraheze/WikiTideDebug/pull/25 [13:30:31] [02WikiTideDebug] 07dependabot[bot] added the label 'dependencies' to pull request #25 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/WikiTideDebug/pull/25 [13:30:33] [02WikiTideDebug] 07dependabot[bot] added the label 'javascript' to pull request #25 (Bump eslint-config-wikimedia from 0.31.0 to 0.32.0) 13https://github.com/miraheze/WikiTideDebug/pull/25 [13:30:36] [02WikiTideDebug] 07coderabbitai[bot] commented on pull request #25: --- […] 13https://github.com/miraheze/WikiTideDebug/pull/25#issuecomment-3516946632 [13:33:58] are all outages finally fixed [13:36:27] miraheze/CreateWiki - paladox the build passed. [13:44:42] [02WikiTideDebug] 07paladox merged 07dependabot[bot]'s pull request #25: Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (03main...03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0) 13https://github.com/miraheze/WikiTideDebug/pull/25 [13:44:43] [02WikiTideDebug] 07paladox pushed 1 new commit to 03main 13https://github.com/miraheze/WikiTideDebug/commit/6b8e6484d7ead125ab1fb875d949ed91e253a0f1 [13:44:45] 02WikiTideDebug/03main 07dependabot[bot] 036b8e648 Bump eslint-config-wikimedia from 0.31.0 to 0.32.0 (#25)… [13:44:50] [02WikiTideDebug] 07dependabot[bot] 04deleted 03dependabot/npm_and_yarn/eslint-config-wikimedia-0.32.0 at 03625483f 13https://api.github.com/repos/miraheze/WikiTideDebug/commit/625483f [13:54:22] RECOVERY - cp191 health.wikitide.net HTTPS on cp191 is OK: HTTP OK: HTTP/2 200 - 112 bytes in 0.010 second response time [14:41:15] PROBLEM - db171 Current Load on db171 is WARNING: LOAD WARNING - total load average: 9.31, 10.27, 8.35 [14:43:15] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 5.50, 8.49, 7.93 [15:33:23] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.34, 18.89, 15.97 [15:35:20] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.70, 17.59, 15.86 [15:41:09] PROBLEM - mw202 Current Load on mw202 is WARNING: LOAD WARNING - total load average: 21.53, 19.15, 16.36 [15:43:09] RECOVERY - mw202 Current Load on mw202 is OK: LOAD OK - total load average: 18.99, 18.91, 16.61 [15:48:27] PROBLEM - mw203 Current Load on mw203 is WARNING: LOAD WARNING - total load average: 21.47, 21.01, 18.77 [15:48:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762872500000&orgId=1&to=1762876130037 [15:50:24] PROBLEM - db171 Current Load on db171 is WARNING: LOAD WARNING - total load average: 11.61, 11.61, 9.11 [15:50:27] RECOVERY - mw203 Current Load on mw203 is OK: LOAD OK - total load average: 13.87, 18.96, 18.34 [15:54:18] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 6.75, 9.08, 8.64 [16:03:42] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 23.07, 20.24, 17.92 [16:09:42] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 14.28, 18.75, 18.33 [16:13:10] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.61, 19.41, 18.41 [16:17:10] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.17, 18.07, 18.19 [16:38:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762875470000&orgId=1&to=1762879070000 [16:39:28] !log upgrade phorge on phorge171 [16:39:39] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:55:58] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.82, 18.46, 16.41 [16:57:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.50, 18.60, 16.73 [17:09:20] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762877330000&orgId=1&to=1762880960038 [17:29:20] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762878470000&orgId=1&to=1762882070000 [17:33:10] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.69, 19.04, 17.44 [17:35:10] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.42, 17.43, 17.06 [18:05:10] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.20, 18.46, 16.39 [18:05:39] PROBLEM - db171 Current Load on db171 is WARNING: LOAD WARNING - total load average: 11.77, 9.85, 8.44 [18:07:10] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.92, 17.58, 16.32 [18:07:36] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 8.21, 9.16, 8.36 [18:07:42] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.27, 19.84, 17.07 [18:09:42] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.35, 19.05, 17.14 [18:10:39] [1/2] huh [18:10:40] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1437867135526371459/image.png?ex=6914cd9f&is=69137c1f&hm=91b44035a4fcddf5ac454a0261b5e6b57c20380e28cc613e54400faf7143c0c1& [18:11:03] this is so DANG old [18:11:12] they're still using the twitter link [18:11:17] lmao [18:11:54] nother 503 [18:11:56] PROBLEM - mw153 MediaWiki Rendering on mw153 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.011 second response time [18:12:13] nvm now 502 [18:12:15] https://cdn.discordapp.com/attachments/808001911868489748/1437867536212557977/image.png?ex=6914cdfe&is=69137c7e&hm=3d9f0803182d4fcc83ab43c7e35a13e3a5be3f5efc4b38bf1c432abeb3d9e4dc& [18:12:20] @blankeclair @pskyechology [18:12:21] PROBLEM - db161 SSH on db161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:21] PROBLEM - db161 Check unit status of sql-backup on db161 is CRITICAL: connect to address 10.0.16.128 port 5666: No route to hostconnect to host 
10.0.16.128 port 5666: No route to host [18:12:30] db161 [18:12:31] PROBLEM - mw203 MediaWiki Rendering on mw203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:32] Lovely [18:12:35] I'm about to eat [18:12:37] PROBLEM - db161 ferm_active on db161 is CRITICAL: connect to address 10.0.16.128 port 5666: Connection refusedconnect to host 10.0.16.128 port 5666: Connection refused [18:13:03] PROBLEM - mw183 HTTPS on mw183 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502 [18:13:07] https://cdn.discordapp.com/attachments/808001911868489748/1437867752999227433/videoA.mp4?ex=6914ce32&is=69137cb2&hm=06b735b2f4f42bd30c790afd169ed69b35014ba892f96ee73a8ab1eb3aa0d636& [18:13:31] PROBLEM - mw191 MediaWiki Rendering on mw191 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 8191 bytes in 0.011 second response time [18:13:34] PROBLEM - cp201 HTTPS on cp201 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 502 [18:13:38] PROBLEM - db161 Current Load on db161 is CRITICAL: LOAD CRITICAL - total load average: 169.98, 42.03, 14.04 [18:13:56] https://cdn.discordapp.com/attachments/808001911868489748/1437867958293762218/TRAININGPLEASE_1.mov?ex=6914ce63&is=69137ce3&hm=206b5d72b4b194fd141b560765b2dee5da6fe7f75d49a913cac2d8dd9e0c6467& [18:13:56] PROBLEM - cp171 Varnish Backends on cp171 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki [18:13:56] PROBLEM - cp201 Varnish Backends on cp201 is CRITICAL: 19 backends are down. mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki [18:13:56] fuck [18:14:02] i'm in beddddddddd [18:14:15] yea uhh rhinos might be dealing with it [18:14:18] RECOVERY - db161 Check unit status of sql-backup on db161 is OK: OK: Status of the systemd unit sql-backup [18:14:18] MIGHT [18:14:19] PROBLEM - cp191 HTTPS on cp191 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [18:14:19] Infra are on it [18:14:21] PROBLEM - cp171 HTTPS on cp171 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/2 503 [18:14:21] @agentisai specifically [18:14:22] RECOVERY - db161 SSH on db161 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u7 (protocol 2.0) [18:14:24] PROBLEM - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is WARNING: WARNING - NGINX Error Rate is 55% [18:14:24] oki cool [18:14:24] PROBLEM - db161 APT on db161 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [18:14:26] nini [18:14:27] RECOVERY - mw203 MediaWiki Rendering on mw203 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 1.723 second response time [18:14:34] Rhinos explicitly said he isn't available to deal with it [18:14:38] RECOVERY - db161 ferm_active on db161 is OK: OK ferm input default policy is set [18:14:42] PROBLEM - cp161 HTTPS on cp161 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10001 milliseconds with 0 bytes received [18:14:47] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 19 backends are down. 
mw151 mw152 mw161 mw162 mw171 mw172 mw181 mw182 mw153 mw163 mw173 mw183 mw191 mw192 mw193 mw201 mw202 mw203 mediawiki [18:14:59] wait there are two rhinos [18:15:03] RECOVERY - mw183 HTTPS on mw183 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4224 bytes in 0.067 second response time [18:15:22] ok were back [18:15:29] RECOVERY - cp201 HTTPS on cp201 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.068 second response time [18:15:29] RECOVERY - mw191 MediaWiki Rendering on mw191 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 4.248 second response time [18:15:38] its really fucking sluggish but i can access wikis again [18:15:39] @djomla81 Watch your language. [18:15:40] 🔥 [18:15:44] shut up [18:15:57] Load is going down [18:15:59] Give it a moment [18:16:04] ok [18:16:18] RECOVERY - cp171 HTTPS on cp171 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.070 second response time [18:16:19] RECOVERY - cp191 HTTPS on cp191 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4273 bytes in 0.057 second response time [18:16:21] RECOVERY - db161 APT on db161 is OK: APT OK: 127 packages available for upgrade (0 critical updates). [18:16:24] RECOVERY - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is OK: OK - NGINX Error Rate is 17% [18:16:34] next time this happens whoever restarts the db should dump the process list so we can look into what's causing it [18:16:37] RECOVERY - cp161 HTTPS on cp161 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4328 bytes in 0.087 second response time [18:16:38] CA did that some time ago [18:16:48] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy [18:16:59] wheres the puppet server [18:17:03] i find that nam,e funbny [18:17:47] (cc @paladox @agentisai) [18:17:56] RECOVERY - cp171 Varnish Backends on cp171 is OK: All 31 backends are healthy [18:17:56] RECOVERY - cp201 Varnish Backends on cp201 is OK: All 31 backends are healthy [18:17:58] RECOVERY - mw153 MediaWiki Rendering on mw153 is OK: HTTP OK: HTTP/1.1 200 OK - 8191 bytes in 0.186 second response time [18:18:05] I dumped it and it was MirahezeFunctions [18:18:09] oh [18:18:12] generateDatabaseList [18:18:17] On db161? [18:18:33] on 171, it also died for a moment [18:18:34] can you put the dump in a private phorge paste or sth [18:18:38] Oh [18:18:46] 161 was just collateral [18:19:44] [1/2] still waiting [18:19:45] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1437869421862457404/image.png?ex=6914cfc0&is=69137e40&hm=927f62c98b26fea78a0e8424c4e13d6e443fbaa13bc03c5e6e719a4c50a0a7ce& [18:20:01] whatever error that is, it’s unrelated [18:20:01] link @bartomelow ? [18:20:07] it's probably monaco [18:20:25] https://stopitslender.miraheze.org/wiki/Main_Page [18:20:38] I don't get this error [18:20:42] weird [18:20:43] i do [18:20:50] it's your personal monaco toolbox [18:20:51] hold on [18:21:06] the hell [18:21:09] it might be [18:21:12] my personal monaco toolbox is doing that??? [18:21:21] yep [18:21:24] damn i was just trying to see what it does 😭 [18:21:40] [1/2] yup, it's monaco [18:21:41] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1437869908061982752/image.png?ex=6914d034&is=69137eb4&hm=737312c8944a69bd8cf1ccac3e2e444b848b66cf54f18b2ca8385e9933f1f436& [18:21:56] is monaco broken or something? 
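On the 18:16:34 suggestion above (dump the database process list before restarting a struggling db host, as the 18:18:05 dump did when it pointed at MirahezeFunctions' generateDatabaseList): a minimal sketch of such a capture is below. The hostname, credentials and output path are placeholders, not production values, and this is not an existing Miraheze script.

```php
<?php
// Sketch only: snapshot SHOW FULL PROCESSLIST on an overloaded database host
// (e.g. db171) before restarting it, so the queries driving the load spike can
// be reviewed afterwards. Host, credentials and output path are placeholders.
$mysqli = new mysqli( 'db171.internal.example', 'monitor', 'placeholder-password' );
if ( $mysqli->connect_error ) {
	fwrite( STDERR, "Connection failed: {$mysqli->connect_error}\n" );
	exit( 1 );
}

$rows = [];
$result = $mysqli->query( 'SHOW FULL PROCESSLIST' );
while ( $row = $result->fetch_assoc() ) {
	$rows[] = $row;
}

// Keep the snapshot somewhere private; it may contain query text with user data.
$outFile = '/root/processlist-' . gmdate( 'Ymd-His' ) . '.json';
file_put_contents( $outFile, json_encode( $rows, JSON_PRETTY_PRINT ) . "\n" );
echo 'Captured ' . count( $rows ) . " threads to $outFile\n";
```

The same data is visible interactively with SHOW FULL PROCESSLIST in the MariaDB client; writing it to a file just preserves the evidence once the server is restarted, so it can be shared afterwards (for example as the private Phorge paste asked for at 18:18:34).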
[18:22:05] it's unmaintained and old [18:22:29] https://issue-tracker.miraheze.org/T14508 [18:22:30] sobbing [18:23:37] using ?useskin=vector seems to "fix" it [18:23:50] i can't look into a fix rn but deleting your monaco toolbox page should solve it [18:24:07] where can i find Names.php for the supported languages [18:24:27] ? [18:24:33] thankis [18:24:46] i deleted it [18:25:04] seems like my personal toolbox page was doing that [18:25:19] For the version of MediaWiki Miraheze is running, it's btw [18:26:27] even after all that the original monaco-toolbox has still not been updated 😬 [18:26:52] rip [18:27:15] [1/2] forever stuckl with fyi this is broken [18:27:15] [2/2] https://cdn.discordapp.com/attachments/808001911868489748/1437871312117698570/image.png?ex=6914d183&is=69138003&hm=de4d11091a9b41d96758663160dd78c6068b93aa162bab9daea47aa4fb1b5301& [18:27:21] can you delete ? Blanking doesn't fix it and I can't delete it myself lol [18:27:38] deleted [18:28:16] thx [18:30:51] np [18:41:36] PROBLEM - db161 Current Load on db161 is WARNING: LOAD WARNING - total load average: 3.90, 4.79, 11.09 [18:42:18] PROBLEM - cp201 Disk Space on cp201 is WARNING: DISK WARNING - free space: / 49737MiB (10% inode=99%); [18:45:36] RECOVERY - db161 Current Load on db161 is OK: LOAD OK - total load average: 4.53, 4.82, 9.69 [18:47:23] PROBLEM - db171 Current Load on db171 is CRITICAL: LOAD CRITICAL - total load average: 9.50, 12.11, 9.00 [18:49:20] PROBLEM - db171 Current Load on db171 is WARNING: LOAD WARNING - total load average: 8.78, 10.86, 8.89 [18:51:17] RECOVERY - db171 Current Load on db171 is OK: LOAD OK - total load average: 5.97, 9.27, 8.55 [19:39:02] PROBLEM - mw192 Current Load on mw192 is WARNING: LOAD WARNING - total load average: 21.27, 19.24, 17.25 [19:41:02] RECOVERY - mw192 Current Load on mw192 is OK: LOAD OK - total load average: 16.73, 18.67, 17.32 [19:47:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762886840000&orgId=1&to=1762890470039 [19:49:02] PROBLEM - puppet181 Check unit status of listdomains_github_push on puppet181 is CRITICAL: CRITICAL: Status of the systemd unit listdomains_github_push [19:51:46] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.44, 18.12, 16.35 [19:53:44] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.21, 16.80, 16.09 [19:57:02] RECOVERY - puppet181 Check unit status of listdomains_github_push on puppet181 is OK: OK: Status of the systemd unit listdomains_github_push [19:57:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1762887410000&orgId=1&to=1762891010000 [20:15:38] PROBLEM - swiftac171 APT on swiftac171 is CRITICAL: APT CRITICAL: 125 packages available for upgrade (1 critical updates). [20:16:32] PROBLEM - swiftobject181 APT on swiftobject181 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:17:32] PROBLEM - db161 APT on db161 is CRITICAL: APT CRITICAL: 128 packages available for upgrade (1 critical updates). [20:17:33] PROBLEM - cloud16 APT on cloud16 is CRITICAL: APT CRITICAL: 185 packages available for upgrade (1 critical updates). [20:17:39] PROBLEM - cloud15 APT on cloud15 is CRITICAL: APT CRITICAL: 188 packages available for upgrade (1 critical updates). [20:17:55] PROBLEM - os201 APT on os201 is CRITICAL: APT CRITICAL: 83 packages available for upgrade (1 critical updates). 
[20:18:08] PROBLEM - os151 APT on os151 is CRITICAL: APT CRITICAL: 113 packages available for upgrade (1 critical updates). [20:18:18] PROBLEM - mem151 APT on mem151 is CRITICAL: APT CRITICAL: 118 packages available for upgrade (1 critical updates). [20:18:31] PROBLEM - mw193 APT on mw193 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:19:27] PROBLEM - ns1 APT on ns1 is CRITICAL: APT CRITICAL: 116 packages available for upgrade (1 critical updates). [20:20:18] PROBLEM - kafka181 APT on kafka181 is CRITICAL: APT CRITICAL: 1 packages available for upgrade (1 critical updates). [20:21:14] PROBLEM - db171 APT on db171 is CRITICAL: APT CRITICAL: 120 packages available for upgrade (1 critical updates). [20:22:29] PROBLEM - cloud17 APT on cloud17 is CRITICAL: APT CRITICAL: 185 packages available for upgrade (1 critical updates). [20:22:37] PROBLEM - mw171 APT on mw171 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:23:14] PROBLEM - mw153 APT on mw153 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:23:21] PROBLEM - db182 APT on db182 is CRITICAL: APT CRITICAL: 83 packages available for upgrade (1 critical updates). [20:23:28] PROBLEM - mw191 APT on mw191 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:23:38] PROBLEM - cloud19 APT on cloud19 is CRITICAL: APT CRITICAL: 4 packages available for upgrade (1 critical updates). [20:24:42] PROBLEM - cloud18 APT on cloud18 is CRITICAL: APT CRITICAL: 185 packages available for upgrade (1 critical updates). [20:25:36] PROBLEM - bots171 APT on bots171 is CRITICAL: APT CRITICAL: 116 packages available for upgrade (1 critical updates). [20:26:05] PROBLEM - rdb151 APT on rdb151 is CRITICAL: APT CRITICAL: 118 packages available for upgrade (1 critical updates). [20:26:44] PROBLEM - cp161 APT on cp161 is CRITICAL: APT CRITICAL: 119 packages available for upgrade (1 critical updates). [20:27:03] PROBLEM - cp191 APT on cp191 is CRITICAL: APT CRITICAL: 109 packages available for upgrade (1 critical updates). [20:27:37] PROBLEM - mem191 APT on mem191 is CRITICAL: APT CRITICAL: 110 packages available for upgrade (1 critical updates). [20:28:01] PROBLEM - changeprop201 APT on changeprop201 is CRITICAL: APT CRITICAL: 1 packages available for upgrade (1 critical updates). [20:28:34] PROBLEM - swiftobject171 APT on swiftobject171 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:28:40] PROBLEM - os162 APT on os162 is CRITICAL: APT CRITICAL: 113 packages available for upgrade (1 critical updates). [20:28:41] PROBLEM - mw192 APT on mw192 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:28:59] PROBLEM - graylog161 APT on graylog161 is CRITICAL: APT CRITICAL: 116 packages available for upgrade (1 critical updates). [20:29:02] PROBLEM - os202 APT on os202 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:29:41] PROBLEM - mw163 APT on mw163 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:29:54] PROBLEM - mw151 APT on mw151 is CRITICAL: APT CRITICAL: 132 packages available for upgrade (1 critical updates). [20:30:38] PROBLEM - db181 APT on db181 is CRITICAL: APT CRITICAL: 120 packages available for upgrade (1 critical updates). [20:30:57] PROBLEM - mattermost1 APT on mattermost1 is CRITICAL: APT CRITICAL: 113 packages available for upgrade (1 critical updates). 
[20:32:22] PROBLEM - os191 APT on os191 is CRITICAL: APT CRITICAL: 111 packages available for upgrade (1 critical updates). [20:32:25] PROBLEM - swiftobject151 APT on swiftobject151 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:32:34] PROBLEM - mw181 APT on mw181 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:32:37] PROBLEM - mw201 Current Load on mw201 is WARNING: LOAD WARNING - total load average: 23.22, 18.27, 15.06 [20:32:56] PROBLEM - reports171 APT on reports171 is CRITICAL: APT CRITICAL: 114 packages available for upgrade (1 critical updates). [20:33:07] PROBLEM - mwtask181 APT on mwtask181 is CRITICAL: APT CRITICAL: 134 packages available for upgrade (1 critical updates). [20:33:09] PROBLEM - ns2 APT on ns2 is CRITICAL: APT CRITICAL: 115 packages available for upgrade (1 critical updates). [20:33:10] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.22, 18.82, 16.68 [20:33:27] PROBLEM - test151 APT on test151 is CRITICAL: APT CRITICAL: 118 packages available for upgrade (1 critical updates). [20:33:32] PROBLEM - db151 APT on db151 is CRITICAL: APT CRITICAL: 128 packages available for upgrade (1 critical updates). [20:33:45] PROBLEM - mw203 Current Load on mw203 is WARNING: LOAD WARNING - total load average: 22.38, 19.11, 17.23 [20:33:48] PROBLEM - swiftobject161 APT on swiftobject161 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:34:32] PROBLEM - mw201 APT on mw201 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:34:37] RECOVERY - mw201 Current Load on mw201 is OK: LOAD OK - total load average: 15.51, 17.49, 15.20 [20:34:44] PROBLEM - ldap171 APT on ldap171 is CRITICAL: APT CRITICAL: 110 packages available for upgrade (1 critical updates). [20:34:54] PROBLEM - os161 APT on os161 is CRITICAL: APT CRITICAL: 111 packages available for upgrade (1 critical updates). [20:34:59] PROBLEM - mon181 APT on mon181 is CRITICAL: APT CRITICAL: 14 packages available for upgrade (1 critical updates). [20:35:10] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.97, 18.04, 16.69 [20:35:20] PROBLEM - cloud20 APT on cloud20 is CRITICAL: APT CRITICAL: 189 packages available for upgrade (1 critical updates). [20:35:42] RECOVERY - mw203 Current Load on mw203 is OK: LOAD OK - total load average: 17.40, 18.47, 17.23 [20:35:54] PROBLEM - mw152 APT on mw152 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:35:55] PROBLEM - bast181 APT on bast181 is CRITICAL: APT CRITICAL: 110 packages available for upgrade (1 critical updates). [20:36:05] PROBLEM - mw183 APT on mw183 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:37:31] PROBLEM - mem201 APT on mem201 is CRITICAL: APT CRITICAL: 118 packages available for upgrade (1 critical updates). [20:37:36] PROBLEM - mw182 APT on mw182 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:37:59] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: APT CRITICAL: 114 packages available for upgrade (1 critical updates). [20:38:14] PROBLEM - matomo151 APT on matomo151 is CRITICAL: APT CRITICAL: 73 packages available for upgrade (1 critical updates). [20:39:41] PROBLEM - swiftproxy161 APT on swiftproxy161 is CRITICAL: APT CRITICAL: 127 packages available for upgrade (1 critical updates). 
[20:39:53] PROBLEM - cp201 APT on cp201 is CRITICAL: APT CRITICAL: 80 packages available for upgrade (1 critical updates). [20:40:31] PROBLEM - db172 APT on db172 is CRITICAL: APT CRITICAL: 93 packages available for upgrade (1 critical updates). [20:41:01] PROBLEM - mw162 APT on mw162 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:41:02] PROBLEM - mw161 APT on mw161 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:41:29] PROBLEM - mw172 APT on mw172 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:41:31] PROBLEM - mw202 APT on mw202 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:41:32] PROBLEM - mw203 Current Load on mw203 is WARNING: LOAD WARNING - total load average: 22.76, 20.70, 18.47 [20:42:01] PROBLEM - eventgate181 APT on eventgate181 is CRITICAL: APT CRITICAL: 1 packages available for upgrade (1 critical updates). [20:42:01] PROBLEM - cp171 APT on cp171 is CRITICAL: APT CRITICAL: 109 packages available for upgrade (1 critical updates). [20:42:04] PROBLEM - mwtask171 APT on mwtask171 is CRITICAL: APT CRITICAL: 134 packages available for upgrade (1 critical updates). [20:42:29] PROBLEM - puppet181 APT on puppet181 is CRITICAL: APT CRITICAL: 77 packages available for upgrade (1 critical updates). [20:42:42] PROBLEM - bast161 APT on bast161 is CRITICAL: APT CRITICAL: 110 packages available for upgrade (1 critical updates). [20:42:52] PROBLEM - llm191 APT on llm191 is CRITICAL: APT CRITICAL: 50 packages available for upgrade (3 critical updates). [20:42:54] PROBLEM - mwtask161 APT on mwtask161 is CRITICAL: APT CRITICAL: 134 packages available for upgrade (1 critical updates). [20:42:56] PROBLEM - swiftproxy171 APT on swiftproxy171 is CRITICAL: APT CRITICAL: 127 packages available for upgrade (1 critical updates). [20:42:57] PROBLEM - swiftobject201 APT on swiftobject201 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:43:09] PROBLEM - swiftobject191 APT on swiftobject191 is CRITICAL: APT CRITICAL: 74 packages available for upgrade (1 critical updates). [20:43:10] PROBLEM - mw203 APT on mw203 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:43:12] PROBLEM - mem161 APT on mem161 is CRITICAL: APT CRITICAL: 118 packages available for upgrade (1 critical updates). [20:43:29] RECOVERY - mw203 Current Load on mw203 is OK: LOAD OK - total load average: 17.89, 19.72, 18.39 [20:44:13] PROBLEM - mw173 APT on mw173 is CRITICAL: APT CRITICAL: 130 packages available for upgrade (1 critical updates). [20:44:47] PROBLEM - mwtask151 APT on mwtask151 is CRITICAL: APT CRITICAL: 134 packages available for upgrade (1 critical updates). [20:45:27] PROBLEM - phorge171 APT on phorge171 is CRITICAL: APT CRITICAL: 75 packages available for upgrade (1 critical updates). 
[21:00:58] !log ran "sudo -u www-data php /srv/mediawiki/1.44/maintenance/run.php CreateWiki:ManageInactiveWikis --wiki=metawiki > ~/manageInactiveWikisOutput.txt" (I didn't use mwscript so I can dump the output to a file) [21:01:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:18:41] [02CreateWiki] 07SomeMWDev created 03DNM-dblist-debug-logging (+1 new commit) 13https://github.com/miraheze/CreateWiki/commit/a23b567b7574 [21:18:41] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 03a23b567 Add debug logging to CreateWikiDataStore::resetDatabaseLists [21:26:09] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [21:26:10] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [21:26:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:26:21] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:28:05] [02CreateWiki] 07SomeMWDev pushed 1 new commit to 03DNM-dblist-debug-logging 13https://github.com/miraheze/CreateWiki/commit/e757bee701309bb63e4c493e1d4dc129b73daca0 [21:28:06] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 03e757bee Fix channel name [21:28:59] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [21:29:00] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [21:29:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:29:08] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [21:30:27] PROBLEM - mw203 Current Load on mw203 is WARNING: LOAD WARNING - total load average: 21.42, 16.99, 14.66 [21:30:34] PROBLEM - mw171 Current Load on mw171 is WARNING: LOAD WARNING - total load average: 22.28, 17.43, 13.96 [21:31:29] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.93, 18.76, 15.72 [21:31:42] PROBLEM - mw182 Current Load on mw182 is WARNING: LOAD WARNING - total load average: 22.06, 18.04, 14.91 [21:32:27] RECOVERY - mw203 Current Load on mw203 is OK: LOAD OK - total load average: 18.53, 17.84, 15.29 [21:32:34] RECOVERY - mw171 Current Load on mw171 is OK: LOAD OK - total load average: 18.04, 17.37, 14.36 [21:35:25] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.99, 18.15, 16.27 [21:35:42] RECOVERY - mw182 Current Load on mw182 is OK: LOAD OK - total load average: 17.38, 18.84, 16.05 [21:56:43] !log restarted nginx & opensearch on os151 [21:57:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:03:09] [02CreateWiki] 07SomeMWDev pushed 1 new commit to 03DNM-dblist-debug-logging 13https://github.com/miraheze/CreateWiki/commit/b960c6c5121631356046937c947a3c18814add62 [22:03:09] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 03b960c6c Log mtime and timestamp in syncCache() [22:03:52] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [22:03:53] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [22:03:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:04:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:15:37] [02CreateWiki] 07SomeMWDev 04force-pushed 
03DNM-dblist-debug-logging from 03b960c6c to 035e6457f 13https://api.github.com/repos/miraheze/CreateWiki/commits/DNM-dblist-debug-logging [22:15:37] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 035e6457f Log mtime and timestamp in syncCache() [22:16:21] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [22:16:22] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [22:16:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:16:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:29:05] [02CreateWiki] 07SomeMWDev pushed 1 new commit to 03DNM-dblist-debug-logging 13https://github.com/miraheze/CreateWiki/commit/2fcb278d4fa4859adfd671032af609daa8385da6 [22:29:05] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 032fcb278 more logging [22:29:38] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [22:29:39] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [22:29:42] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:29:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:50:02] [02CreateWiki] 07SomeMWDev pushed 1 new commit to 03DNM-dblist-debug-logging 13https://github.com/miraheze/CreateWiki/commit/567cc5558f1f5df0ec7db48d33f73763d4f6c467 [22:50:02] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 03567cc55 try debugging difference between mtime and cached mtime [22:50:29] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [22:50:30] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [22:50:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:50:38] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:57:18] [02CreateWiki] 07SomeMWDev pushed 1 new commit to 03DNM-dblist-debug-logging 13https://github.com/miraheze/CreateWiki/commit/07507f1dde10853264a22a7df0c2b54dfb3dc7b8 [22:57:18] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 0307507f1 debug writeToFile [22:57:37] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [22:57:38] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [22:57:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:57:45] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:04:34] PROBLEM - cp191 Disk Space on cp191 is CRITICAL: DISK CRITICAL - free space: / 27185MiB (5% inode=99%); [23:06:45] [02CreateWiki] 07SomeMWDev pushed 1 new commit to 03DNM-dblist-debug-logging 13https://github.com/miraheze/CreateWiki/commit/fd5e9f249f9f4267b188018687b05d591c7b936b [23:06:45] 02CreateWiki/03DNM-dblist-debug-logging 07SomeRandomDeveloper 03fd5e9f2 invalidate opcache for dblist [23:07:02] !log [somerandomdeveloper@test151] starting deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 [23:07:03] !log [somerandomdeveloper@test151] finished deploy of {'folders': '1.45/extensions/CreateWiki'} to test151 - SUCCESS in 0s [23:07:06] Logged the message at 
https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:07:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:11:17] [02CreateWiki] 07SomeMWDev created 03invalidate-dblist-opcache (+1 new commit) 13https://github.com/miraheze/CreateWiki/commit/650e7000e08a [23:11:17] 02CreateWiki/03invalidate-dblist-opcache 07SomeRandomDeveloper 03650e700 Invalidate opcache after writing to dblist… [23:11:33] [02CreateWiki] 07SomeMWDev opened pull request #787: Invalidate opcache after writing to dblist (03main...03invalidate-dblist-opcache) 13https://github.com/miraheze/CreateWiki/pull/787 [23:11:38] [02CreateWiki] 07coderabbitai[bot] commented on pull request #787:
[…] 13https://github.com/miraheze/CreateWiki/pull/787#issuecomment-3519073981 [23:14:34] RECOVERY - cp191 Disk Space on cp191 is OK: DISK OK - free space: / 70435MiB (15% inode=99%); [23:15:45] miraheze/CreateWiki - SomeMWDev the build has errored. [23:19:31] !log [somerandomdeveloper@mwtask181] starting deploy of {'folders': '1.44/extensions/CreateWiki'} to all [23:19:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:19:53] !log [somerandomdeveloper@mwtask181] finished deploy of {'folders': '1.44/extensions/CreateWiki'} to all - SUCCESS in 22s [23:19:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:20:05] !log deployed CreateWiki commit 650e7000e08a04d9c06201b101545969b369e655 on prod [23:20:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log
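Context for PR #787 above ("Invalidate opcache after writing to dblist"): the DNM-dblist-debug-logging commits were chasing a mismatch between a dblist file's mtime and the value readers saw, which is the classic symptom of opcache serving a stale compiled copy after the file on disk has been rewritten. A minimal sketch of the idea, with an illustrative path and payload rather than CreateWiki's actual writeToFile() code:

```php
<?php
// Sketch of the idea behind CreateWiki PR #787 ("Invalidate opcache after writing
// to dblist"). The path and contents here are illustrative, not the extension's code.
$path = '/srv/mediawiki/cache/databases.php';

$contents = "<?php\n\nreturn " . var_export( [
	'databases' => [ 'metawiki', 'examplewiki' ],
], true ) . ";\n";

// Write to a temporary file and rename it into place so readers never include a
// half-written list.
$tmp = $path . '.tmp';
file_put_contents( $tmp, $contents );
rename( $tmp, $path );

// Without this, PHP-FPM workers can keep executing the previously cached compile of
// the file (especially when the mtime looks unchanged), so changes to the list are
// not picked up until opcache revalidates on its own schedule.
if ( function_exists( 'opcache_invalidate' ) ) {
	opcache_invalidate( $path, true );
}
```

Whether the merged change does exactly this is a question for the PR diff; the sketch only shows why invalidating after the write addresses the stale-dblist behaviour the debug commits were logging.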