[00:02:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.79, 22.32, 22.43 [00:03:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.29, 23.16, 22.84 [00:12:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.74, 24.74, 23.56 [00:13:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.23, 23.51, 23.30 [00:18:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 9.45, 15.89, 20.28 [00:44:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [00:44:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.97, 20.18, 18.43 [00:46:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 415 system event log (SEL) entries present] [00:46:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.37, 20.16, 18.63 [00:57:28] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.42, 21.44, 19.78 [00:59:24] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.79, 21.58, 20.06 [01:01:20] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.09, 24.25, 21.20 [01:05:12] PROBLEM - mw181 Current Load on mw181 is 
WARNING: LOAD WARNING - total load average: 21.45, 23.30, 21.59 [01:11:00] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.82, 17.85, 19.72 [01:19:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:29:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:30:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [01:32:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 417 system event log (SEL) entries present] [01:33:28] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.15, 20.55, 20.14 [01:36:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.76, 19.38, 17.81 [01:38:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.61, 18.55, 17.72 [01:41:12] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.68, 19.71, 20.06 [01:42:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.81, 19.51, 18.26 [01:44:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.65, 19.16, 18.27 [02:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.23, 20.12, 19.18 [02:08:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.88, 20.15, 19.56 [02:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.11, 21.68, 20.28 [02:16:38] PROBLEM - 
mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.44, 24.58, 21.51 [02:20:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [02:22:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 419 system event log (SEL) entries present] [02:28:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:38:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:40:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.69, 23.27, 23.87 [02:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.35, 22.45, 22.53 [02:58:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.67, 21.78, 22.20 [03:04:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [03:06:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 421 system event log (SEL) entries present] [03:10:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.47, 22.50, 22.01 [03:12:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load 
average: 20.18, 22.04, 21.94 [03:20:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.61, 22.43, 21.71 [03:28:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.28, 20.57, 18.68 [03:30:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.38, 19.03, 18.38 [03:32:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.84, 23.27, 23.49 [03:36:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:46:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:46:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.33, 21.91, 22.09 [03:52:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.11, 23.24, 22.82 [03:53:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 11.27, 18.50, 23.41 [03:54:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [03:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.53, 23.55, 22.93 [03:56:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 423 system event log (SEL) entries present] [03:56:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 24.00, 23.31, 22.90 [03:57:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.09, 22.56, 23.80 [03:58:38] 
PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.53, 23.55, 23.03 [03:59:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.96, 22.22, 23.57 [04:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.87, 23.69, 23.32 [04:04:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.51, 23.71, 23.35 [04:05:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.67, 24.54, 23.99 [04:08:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.87, 23.04, 23.21 [04:10:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 23.77, 23.44 [04:11:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.72, 23.87, 23.93 [04:15:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.20, 23.90, 23.80 [04:16:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.52, 23.08, 23.27 [04:18:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.13, 23.21, 23.24 [04:21:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.10, 23.07, 23.62 [04:23:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.38, 23.24, 23.56 [04:26:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.87, 22.32, 23.15 [04:27:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.80, 23.37, 23.57 [04:29:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.74, 23.58, 23.61 [04:32:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.03, 23.09, 23.05 [04:34:38] PROBLEM - 
mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.23, 22.48, 22.88 [04:42:40] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [04:43:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 10.85, 19.16, 22.42 [04:44:38] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 425 system event log (SEL) entries present] [04:45:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 22.82, 23.36 [04:50:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.43, 21.92, 21.79 [04:52:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.54, 21.22, 21.53 [04:58:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.77, 22.87, 21.98 [05:00:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.20, 22.42, 21.95 [05:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:03:45] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.02, 3.31, 1.33 [05:05:39] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.79, 3.24, 1.57 [05:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:12:25] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:12:38] 
RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.99, 18.18, 20.23 [05:13:38] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.24, 4.22, 2.54 [05:15:37] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.13, 3.73, 2.57 [05:17:25] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:19:37] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.08, 4.01, 2.93 [05:19:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:21:08] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:21:44] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:03] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.086 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:23:38] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [05:24:22] PROBLEM - hcw.tomat.dev - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'hcw.tomat.dev' expires in 15 day(s) (Wed 14 Aug 2024 04:57:34 AM GMT +0000). 
[05:24:35] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/b4fb9eedefaa...78b137846035 [05:24:37] [ssl] WikiTideSSLBot 78b1378 - Bot: Update SSL cert for hcw.tomat.dev [05:24:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:25:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:25:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 12.68, 19.50, 22.66 [05:26:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [05:28:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [426 system event log (SEL) entries present] [05:29:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.90, 20.89, 22.28 [05:31:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.81, 21.41, 22.28 [05:33:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.70, 22.37, 22.52 [05:35:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:35:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.66, 3.59, 3.67 [05:37:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:37:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.23, 5.36, 4.30 [05:42:56] 
PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.05, 21.15, 19.78 [05:44:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.62, 20.61, 18.31 [05:46:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.77, 19.21, 18.10 [05:47:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:49:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:50:40] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.84, 20.03, 19.88 [05:54:15] RECOVERY - hcw.tomat.dev - LetsEncrypt on sslhost is OK: OK - Certificate 'hcw.tomat.dev' will expire on Sun 27 Oct 2024 04:24:28 AM GMT +0000. [05:55:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.54, 3.33, 3.96 [05:56:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.85, 23.31, 21.16 [05:59:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.14, 3.65, 3.91 [06:01:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.17, 3.64, 3.89 [06:03:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.05, 3.58, 3.82 [06:04:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:05:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.48, 3.59, 3.82 [06:07:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.75, 4.15, 3.98 [06:08:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:10:48] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.145 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:11:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.18, 3.88, 3.95 [06:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.06, 22.30, 22.79 [06:14:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:16:04] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[06:18:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.35, 23.80, 23.26 [06:19:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.09, 2.75, 3.39 [06:20:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.01, 22.65, 22.90 [06:22:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.72, 24.85, 23.69 [06:23:39] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.88, 4.60, 3.91 [06:25:38] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.89, 3.96, 3.78 [06:30:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.86, 22.63, 23.55 [06:31:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.45, 3.67, 3.66 [06:33:37] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.57, 3.04, 3.42 [06:35:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.98, 4.12, 3.77 [06:36:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.95, 19.74, 17.91 [06:37:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.16, 3.63, 3.66 [06:38:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.26, 18.61, 17.72 [06:43:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.99, 2.61, 3.22 [06:44:00] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:50:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.16, 23.26, 22.31 [06:52:38] PROBLEM - mw181 Current Load on mw181 
is WARNING: LOAD WARNING - total load average: 18.52, 21.04, 21.59 [06:54:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [06:54:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:56:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.37, 4.31, 3.62 [06:56:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [428 system event log (SEL) entries present] [06:58:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.38, 3.63, 3.43 [06:59:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:59:56] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.82, 3.20, 3.31 [07:04:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.47, 23.21, 22.30 [07:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.07, 21.28, 21.66 [07:09:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:10:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.36, 22.99, 22.15 [07:12:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.17, 21.69, 21.76 [07:18:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.30, 23.33, 22.32 [07:20:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.74, 21.64, 21.79 [07:22:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.48, 22.30, 21.96 [07:24:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.89, 22.48, 22.06 [07:25:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.55, 23.18, 23.90 [07:28:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.26, 23.17, 22.43 [07:29:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.63, 24.65, 24.29 [07:32:43] PROBLEM - db181 Backups SQL on db181 is WARNING: FILE_AGE WARNING: /var/log/sql-backup.log is 864188 seconds old and 73191 bytes [07:35:59] PROBLEM - db161 Backups SQL on db161 is WARNING: FILE_AGE WARNING: /var/log/sql-backup.log is 864170 seconds old and 68452 bytes [07:38:30] PROBLEM - cloud15 IPMI Sensors on cloud15 is 
UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [07:38:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.31, 23.35, 23.47 [07:39:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 14.51, 19.82, 22.81 [07:40:29] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [429 system event log (SEL) entries present] [07:41:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.00, 23.26, 23.72 [07:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.49, 22.25, 22.39 [07:56:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.91, 20.83, 21.87 [07:58:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.80, 22.51, 22.34 [08:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.04, 23.04, 22.65 [08:16:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.41, 21.90, 21.88 [08:24:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [08:26:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [430 system event log (SEL) entries present] [08:26:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD 
WARNING - total load average: 16.83, 22.66, 23.05 [08:28:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.85, 23.42, 23.26 [08:30:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.83, 22.58, 22.94 [08:44:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.43, 18.15, 20.03 [08:56:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.04, 21.28, 20.15 [08:58:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.23, 21.68, 20.45 [09:02:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.69, 22.49, 20.95 [09:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.35, 23.53, 21.87 [09:13:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.39, 22.19, 23.71 [09:14:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [09:16:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [431 system event log (SEL) entries present] [09:16:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.97, 17.84, 19.77 [09:17:06] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[09:17:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.18, 21.02, 22.63 [09:19:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.07, 22.27, 22.94 [09:21:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.38, 22.75, 23.01 [09:23:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.63, 22.93, 23.04 [09:25:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.90, 23.30, 23.15 [09:27:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.07, 22.17, 22.73 [09:39:36] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.36, 23.89, 22.74 [09:45:08] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.80, 23.36, 23.47 [09:53:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.46, 21.94, 22.54 [09:55:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.56, 21.14, 22.20 [10:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.42, 20.37, 19.28 [10:03:35] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.67, 17.39, 20.00 [10:04:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [10:04:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load 
average: 26.66, 22.47, 20.17 [10:06:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [432 system event log (SEL) entries present] [10:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.87, 21.72, 20.24 [10:09:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 29.26, 21.65, 20.56 [10:11:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.93, 22.14, 20.91 [10:15:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.20, 22.02, 20.98 [10:16:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.23, 19.64, 20.01 [10:17:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.19, 21.08, 20.79 [10:21:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.01, 23.68, 21.91 [10:22:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.84, 20.82, 20.33 [10:24:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.63, 19.34, 19.83 [10:27:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.19, 22.52, 22.04 [10:31:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.80, 24.44, 22.87 [10:41:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.92, 23.06, 23.60 [10:43:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.62, 23.02, 23.47 [10:49:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 12.51, 21.30, 23.14 [10:50:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ID | Date | Time | Name | Type | State | Event1 | Jun-10-2024 | 21:58:11 | SEL | Event Logging Disabled | Nominal | Log Area Reset/Cleared2 3 4 5 6 7 8 9 10 
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 8 [10:50:27] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 ipmi_sel_parse: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [10:52:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [433 system event log (SEL) entries present] [10:53:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.82, 23.64, 23.54 [10:56:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.15, 20.08, 18.08 [10:58:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.76, 18.48, 17.74 [11:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:03:38] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.31, 3.11, 1.25 [11:05:37] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.64, 2.62, 1.29 [11:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:17:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:19:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.66, 3.45, 2.35 [11:21:38] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.37, 
3.94, 2.64 [11:22:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:23:37] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.30, 3.26, 2.56 [11:27:25] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:29:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.21, 3.79, 2.95 [11:31:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.18, 3.34, 2.87 [11:32:25] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:34:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:39:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:44:40] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.51, 19.91, 19.06 [11:46:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:46:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.88, 19.12, 18.90 [11:47:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.82, 3.97, 3.27 [11:50:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.15, 21.96, 20.13 [11:51:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:51:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.47, 3.14, 3.08 [11:52:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.51, 21.35, 20.18 [11:54:20] [Grafana] !tech FIRING: There has been a 
rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.79, 22.49, 20.70 [11:56:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.97, 22.76, 21.04 [11:59:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:02:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.82, 19.85, 20.26 [12:13:38] [CreateWiki] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/CreateWiki/compare/0e1b3a9cd50e...17d0f8166c57 [12:13:41] [CreateWiki] translatewiki 17d0f81 - Localisation updates from https://translatewiki.net. [12:13:42] [ImportDump] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ImportDump/compare/3efbd8731e06...6056caa52dea [12:13:44] [ImportDump] translatewiki 6056caa - Localisation updates from https://translatewiki.net. [12:13:47] [IncidentReporting] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/IncidentReporting/compare/822a1a7b2c3e...0e23f84e356d [12:13:50] [IncidentReporting] translatewiki 0e23f84 - Localisation updates from https://translatewiki.net. [12:13:51] [DataDump] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/DataDump/compare/afe0c3712afa...cb6de0ab21c4 [12:13:52] [DataDump] translatewiki cb6de0a - Localisation updates from https://translatewiki.net. [12:13:54] [SpriteSheet] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/SpriteSheet/compare/2d68df532fa3...6e9bc31bfb94 [12:13:56] [SpriteSheet] translatewiki 6e9bc31 - Localisation updates from https://translatewiki.net. 
[12:13:59] [RottenLinks] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/RottenLinks/compare/b2a23a38f2ea...ee4f2dde768d [12:14:00] [RottenLinks] translatewiki ee4f2dd - Localisation updates from https://translatewiki.net. [12:14:02] [MirahezeMagic] translatewiki pushed 1 commit to master [+0/-0/±6] https://github.com/miraheze/MirahezeMagic/compare/6dc7f6fe84cb...559257078e1f [12:14:03] [MirahezeMagic] translatewiki 5592570 - Localisation updates from https://translatewiki.net. [12:14:04] [WikiDiscover] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/WikiDiscover/compare/d2c71a911108...20e70d0d7a53 [12:14:06] [WikiDiscover] translatewiki 20e70d0 - Localisation updates from https://translatewiki.net. [12:14:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:15:08] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.44, 20.18, 18.89 [12:16:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.71, 22.14, 20.52 [12:17:06] miraheze/CreateWiki - translatewiki the build has errored. [12:17:19] miraheze/SpriteSheet - translatewiki the build has errored. [12:17:37] miraheze/MirahezeMagic - translatewiki the build has errored. [12:18:04] miraheze/DataDump - translatewiki the build passed. [12:18:12] miraheze/RottenLinks - translatewiki the build passed. [12:18:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.28, 21.85, 20.61 [12:18:42] miraheze/IncidentReporting - translatewiki the build passed. [12:18:46] miraheze/WikiDiscover - translatewiki the build passed. 
[12:18:57] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.80, 23.15, 20.35 [12:19:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:20:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.38, 22.99, 21.14 [12:20:51] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.33, 21.81, 20.19 [12:22:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [12:22:46] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 11.55, 18.14, 19.04 [12:24:04] miraheze/ImportDump - translatewiki the build passed. 
[12:24:19] !log [@test151] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to test151 [12:24:20] !log [@test151] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to test151 - SUCCESS in 0s [12:24:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [434 system event log (SEL) entries present] [12:24:35] !log [@test151] starting deploy of {'folders': '1.43/extensions/MirahezeMagic'} to test151 [12:24:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:24:36] !log [@test151] finished deploy of {'folders': '1.43/extensions/MirahezeMagic'} to test151 - SUCCESS in 0s [12:24:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:25:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:25:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:26:35] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.75, 20.75, 19.81 [12:26:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.48, 23.52, 22.10 [12:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:28:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [12:28:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.68, 24.15, 22.46 [12:28:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.36, 4.50, 3.35 [12:30:05] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.073 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [12:30:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.42, 20.96, 20.18 [12:30:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.04, 21.69, 21.77 [12:30:41] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.12, 3.51, 3.12 [12:31:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:31:50] PROBLEM - db181 PowerDNS Recursor on db181 is CRITICAL: CRITICAL - Plugin timed out while executing system call [12:31:58] PROBLEM - db181 Current Load on db181 is CRITICAL: LOAD CRITICAL - total load average: 43.42, 20.11, 7.94 [12:32:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.74, 19.98, 19.90 [12:32:35] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.64, 4.30, 3.46 [12:32:49] !log [@mwtask181] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all [12:32:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:33:03] !log [@mwtask181] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all - SUCCESS in 14s [12:33:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:33:51] PROBLEM - db181 Puppet on db181 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [12:34:00] PROBLEM - db181 SSH on db181 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:35:50] RECOVERY - db181 Puppet on db181 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [12:35:50] RECOVERY - db181 PowerDNS Recursor on db181 is OK: DNS OK: 0.066 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [12:35:54] RECOVERY - db181 SSH on db181 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [12:36:23] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.50, 3.87, 3.48 [12:36:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.27, 20.46, 20.18 [12:36:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:37:37] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 185.54 ms [12:37:55] !log [@mwtask171] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all [12:38:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:38:07] !log [@mwtask171] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all - SUCCESS in 11s [12:38:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:38:17] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.88, 3.29, 3.30 [12:38:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.36, 19.61, 19.90 [12:38:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.11, 19.50, 20.40 [12:39:58] PROBLEM - db181 Current Load on db181 is WARNING: LOAD WARNING - total load average: 0.45, 11.64, 10.76 [12:40:49] PROBLEM - cp51 HTTPS on cp51 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10002 milliseconds with 0 bytes received [12:41:57] RECOVERY - db181 Current Load on db181 is OK: LOAD OK - total load average: 0.31, 7.89, 9.49 [12:42:45] RECOVERY - cp51 HTTPS on cp51 is 
OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3820 bytes in 1.294 second response time [12:43:45] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 184.36 ms [12:44:00] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.59, 4.20, 3.67 [12:45:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.33, 3.62, 3.52 [12:46:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1[Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:47:48] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.98, 3.01, 3.30 [12:50:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:53:14] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.43, 21.05, 20.03 [12:55:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:57:03] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.84, 20.16, 19.99 [12:57:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:57:28] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.94, 20.85, 20.11 [12:57:43] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [12:58:17] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.27, 3.79, 3.35 [12:59:24] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 14.81, 18.74, 19.45 [13:00:11] RECOVERY - prometheus151 Current Load on prometheus151 is OK: 
LOAD OK - total load average: 2.83, 3.12, 3.14 [13:00:52] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.13, 21.89, 20.71 [13:01:47] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.00169968605 secs [13:03:59] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.13, 3.57, 3.40 [13:05:53] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.39, 2.48, 3.01 [13:06:35] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.58, 23.55, 21.97 [13:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:11:02] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.56, 21.29, 20.11 [13:12:20] [Grafana] !tech FIRING: The mediawiki job queue has more than 500 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [13:14:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 11.40, 16.87, 19.59 [13:16:50] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.02, 23.51, 21.30 [13:22:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.29, 22.29, 21.55 [13:38:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.99, 19.63, 20.30 [13:45:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.67, 21.16, 23.86 [13:45:36] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.comUsage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o ] [13:47:27] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 22.17, 20.07, 18.84 [13:47:33] RECOVERY - ns2 NTP time on ns2 is OK: NTP OK: Offset -0.001676052809 secs [13:48:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING 
- total load average: 22.42, 21.06, 20.54 [13:49:21] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.35, 19.15, 18.69 [13:52:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.28, 18.83, 19.81 [13:57:36] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.62, 22.63, 22.96 [14:01:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 14.38, 20.01, 22.03 [14:04:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [14:05:35] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 9.61, 14.20, 19.19 [14:06:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [436 system event log (SEL) entries present] [15:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.40, 19.73, 16.23 [15:18:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.00, 19.34, 16.86 [15:19:06] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[15:45:29] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:46:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ID | Date | Time | Name | Type | State | Event1 | Jun-10-2024 | 21:58:11 | SEL | Event Logging Disabled | Nominal | Log Area Reset/Cleared2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 8 [15:46:27] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 [15:46:27] 5 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 ipmi_sel_parse: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [15:48:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [438 system event log (SEL) entries present] [15:52:17] PROBLEM - wiki.denby.tech - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.denby.tech' expires in 15 day(s) (Wed 14 Aug 2024 03:30:57 PM GMT +0000). 
[15:52:29] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/78b137846035...3d5e8ae4b263 [15:52:32] [ssl] WikiTideSSLBot 3d5e8ae - Bot: Update SSL cert for wiki.denby.tech [16:04:18] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.69, 20.86, 16.99 [16:05:20]