[00:02:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.79, 22.32, 22.43 [00:03:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.29, 23.16, 22.84 [00:12:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.74, 24.74, 23.56 [00:13:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [00:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.23, 23.51, 23.30 [00:18:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 9.45, 15.89, 20.28 [00:44:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [00:44:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.97, 20.18, 18.43 [00:46:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 415 system event log (SEL) entries present] [00:46:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.37, 20.16, 18.63 [00:57:28] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.42, 21.44, 19.78 [00:59:24] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.79, 21.58, 20.06 [01:01:20] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.09, 24.25, 21.20 [01:05:12] PROBLEM - mw181 Current Load on mw181 is 
WARNING: LOAD WARNING - total load average: 21.45, 23.30, 21.59 [01:11:00] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.82, 17.85, 19.72 [01:19:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:29:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [01:30:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [01:32:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 417 system event log (SEL) entries present] [01:33:28] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.15, 20.55, 20.14 [01:36:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.76, 19.38, 17.81 [01:38:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 16.61, 18.55, 17.72 [01:41:12] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.68, 19.71, 20.06 [01:42:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.81, 19.51, 18.26 [01:44:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.65, 19.16, 18.27 [02:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.23, 20.12, 19.18 [02:08:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 19.88, 20.15, 19.56 [02:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.11, 21.68, 20.28 [02:16:38] PROBLEM - 
mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 29.44, 24.58, 21.51 [02:20:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [02:22:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 419 system event log (SEL) entries present] [02:28:30] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:38:30] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [02:40:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.69, 23.27, 23.87 [02:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.35, 22.45, 22.53 [02:58:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.67, 21.78, 22.20 [03:04:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [03:06:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 421 system event log (SEL) entries present] [03:10:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.47, 22.50, 22.01 [03:12:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load 
average: 20.18, 22.04, 21.94 [03:20:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.61, 22.43, 21.71 [03:28:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.28, 20.57, 18.68 [03:30:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.38, 19.03, 18.38 [03:32:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 19.84, 23.27, 23.49 [03:36:00] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:46:00] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [03:46:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.33, 21.91, 22.09 [03:52:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.11, 23.24, 22.82 [03:53:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 11.27, 18.50, 23.41 [03:54:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [03:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.53, 23.55, 22.93 [03:56:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 423 system event log (SEL) entries present] [03:56:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 24.00, 23.31, 22.90 [03:57:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.09, 22.56, 23.80 [03:58:38] 
PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.53, 23.55, 23.03 [03:59:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.96, 22.22, 23.57 [04:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.87, 23.69, 23.32 [04:04:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.51, 23.71, 23.35 [04:05:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.67, 24.54, 23.99 [04:08:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.87, 23.04, 23.21 [04:10:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.52, 23.77, 23.44 [04:11:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.72, 23.87, 23.93 [04:15:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.20, 23.90, 23.80 [04:16:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.52, 23.08, 23.27 [04:18:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.13, 23.21, 23.24 [04:21:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.10, 23.07, 23.62 [04:23:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.38, 23.24, 23.56 [04:26:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.87, 22.32, 23.15 [04:27:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.80, 23.37, 23.57 [04:29:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.74, 23.58, 23.61 [04:32:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.03, 23.09, 23.05 [04:34:38] PROBLEM - 
mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.23, 22.48, 22.88 [04:42:40] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [04:43:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 10.85, 19.16, 22.42 [04:44:38] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [Inlet Temp = Critical, 425 system event log (SEL) entries present] [04:45:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.62, 22.82, 23.36 [04:50:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.43, 21.92, 21.79 [04:52:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.54, 21.22, 21.53 [04:58:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.77, 22.87, 21.98 [05:00:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.20, 22.42, 21.95 [05:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:03:45] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.02, 3.31, 1.33 [05:05:39] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.79, 3.24, 1.57 [05:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:12:25] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:12:38] 
RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.99, 18.18, 20.23 [05:13:38] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.24, 4.22, 2.54 [05:15:37] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.13, 3.73, 2.57 [05:17:25] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:19:37] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.08, 4.01, 2.93 [05:19:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:21:08] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:21:44] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:03] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.086 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [05:23:38] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [05:24:22] PROBLEM - hcw.tomat.dev - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'hcw.tomat.dev' expires in 15 day(s) (Wed 14 Aug 2024 04:57:34 AM GMT +0000). 
[05:24:35] [ssl] WikiTideSSLBot pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ssl/compare/b4fb9eedefaa...78b137846035 [05:24:37] [ssl] WikiTideSSLBot 78b1378 - Bot: Update SSL cert for hcw.tomat.dev [05:24:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:25:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:25:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 12.68, 19.50, 22.66 [05:26:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [05:28:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [426 system event log (SEL) entries present] [05:29:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.90, 20.89, 22.28 [05:31:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.81, 21.41, 22.28 [05:33:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.70, 22.37, 22.52 [05:35:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:35:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.66, 3.59, 3.67 [05:37:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:37:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 8.23, 5.36, 4.30 [05:42:56] 
PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.05, 21.15, 19.78 [05:44:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.62, 20.61, 18.31 [05:46:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.77, 19.21, 18.10 [05:47:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:49:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [05:50:40] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.84, 20.03, 19.88 [05:54:15] RECOVERY - hcw.tomat.dev - LetsEncrypt on sslhost is OK: OK - Certificate 'hcw.tomat.dev' will expire on Sun 27 Oct 2024 04:24:28 AM GMT +0000. [05:55:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 1.54, 3.33, 3.96 [05:56:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.85, 23.31, 21.16 [05:59:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.14, 3.65, 3.91 [06:01:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.17, 3.64, 3.89 [06:03:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.05, 3.58, 3.82 [06:04:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:05:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.48, 3.59, 3.82 [06:07:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.75, 4.15, 3.98 [06:08:53] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:10:48] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.145 seconds response time. wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [06:11:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.18, 3.88, 3.95 [06:14:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.06, 22.30, 22.79 [06:14:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:16:04] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[06:18:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.35, 23.80, 23.26 [06:19:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.09, 2.75, 3.39 [06:20:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.01, 22.65, 22.90 [06:22:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.72, 24.85, 23.69 [06:23:39] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.88, 4.60, 3.91 [06:25:38] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.89, 3.96, 3.78 [06:30:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.86, 22.63, 23.55 [06:31:43] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.45, 3.67, 3.66 [06:33:37] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.57, 3.04, 3.42 [06:35:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.98, 4.12, 3.77 [06:36:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.95, 19.74, 17.91 [06:37:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.16, 3.63, 3.66 [06:38:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.26, 18.61, 17.72 [06:43:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.99, 2.61, 3.22 [06:44:00] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:50:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.16, 23.26, 22.31 [06:52:38] PROBLEM - mw181 Current Load on mw181 
is WARNING: LOAD WARNING - total load average: 18.52, 21.04, 21.59 [06:54:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [06:54:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:56:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.37, 4.31, 3.62 [06:56:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [428 system event log (SEL) entries present] [06:58:02] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.38, 3.63, 3.43 [06:59:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [06:59:56] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.82, 3.20, 3.31 [07:04:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.47, 23.21, 22.30 [07:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.07, 21.28, 21.66 [07:09:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [07:10:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.36, 22.99, 22.15 [07:12:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.17, 21.69, 21.76 [07:18:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 27.30, 23.33, 22.32 [07:20:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.74, 21.64, 21.79 [07:22:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.48, 22.30, 21.96 [07:24:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.89, 22.48, 22.06 [07:25:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.55, 23.18, 23.90 [07:28:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.26, 23.17, 22.43 [07:29:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.63, 24.65, 24.29 [07:32:43] PROBLEM - db181 Backups SQL on db181 is WARNING: FILE_AGE WARNING: /var/log/sql-backup.log is 864188 seconds old and 73191 bytes [07:35:59] PROBLEM - db161 Backups SQL on db161 is WARNING: FILE_AGE WARNING: /var/log/sql-backup.log is 864170 seconds old and 68452 bytes [07:38:30] PROBLEM - cloud15 IPMI Sensors on cloud15 is 
UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [07:38:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.31, 23.35, 23.47 [07:39:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 14.51, 19.82, 22.81 [07:40:29] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [429 system event log (SEL) entries present] [07:41:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.00, 23.26, 23.72 [07:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.49, 22.25, 22.39 [07:56:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 17.91, 20.83, 21.87 [07:58:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.80, 22.51, 22.34 [08:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 23.04, 23.04, 22.65 [08:16:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.41, 21.90, 21.88 [08:24:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [08:26:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [430 system event log (SEL) entries present] [08:26:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD 
WARNING - total load average: 16.83, 22.66, 23.05 [08:28:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.85, 23.42, 23.26 [08:30:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.83, 22.58, 22.94 [08:44:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 15.43, 18.15, 20.03 [08:56:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.04, 21.28, 20.15 [08:58:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.23, 21.68, 20.45 [09:02:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.69, 22.49, 20.95 [09:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.35, 23.53, 21.87 [09:13:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.39, 22.19, 23.71 [09:14:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [09:16:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [431 system event log (SEL) entries present] [09:16:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 13.97, 17.84, 19.77 [09:17:06] PROBLEM - ns2 Puppet on ns2 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. It might be a dependency cycle. 
[09:17:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.18, 21.02, 22.63 [09:19:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.07, 22.27, 22.94 [09:21:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.38, 22.75, 23.01 [09:23:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 23.63, 22.93, 23.04 [09:25:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.90, 23.30, 23.15 [09:27:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.07, 22.17, 22.73 [09:39:36] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 28.36, 23.89, 22.74 [09:45:08] RECOVERY - ns2 Puppet on ns2 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:47:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 16.80, 23.36, 23.47 [09:53:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 25.46, 21.94, 22.54 [09:55:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.56, 21.14, 22.20 [10:02:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 21.42, 20.37, 19.28 [10:03:35] RECOVERY - mw152 Current Load on mw152 is OK: LOAD OK - total load average: 13.67, 17.39, 20.00 [10:04:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [10:04:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load 
average: 26.66, 22.47, 20.17 [10:06:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [432 system event log (SEL) entries present] [10:06:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.87, 21.72, 20.24 [10:09:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 29.26, 21.65, 20.56 [10:11:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 22.93, 22.14, 20.91 [10:15:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.20, 22.02, 20.98 [10:16:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.23, 19.64, 20.01 [10:17:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 19.19, 21.08, 20.79 [10:21:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 26.01, 23.68, 21.91 [10:22:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.84, 20.82, 20.33 [10:24:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 17.63, 19.34, 19.83 [10:27:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 21.19, 22.52, 22.04 [10:31:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.80, 24.44, 22.87 [10:41:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 20.92, 23.06, 23.60 [10:43:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 24.62, 23.02, 23.47 [10:49:35] PROBLEM - mw152 Current Load on mw152 is WARNING: LOAD WARNING - total load average: 12.51, 21.30, 23.14 [10:50:27] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ID | Date | Time | Name | Type | State | Event1 | Jun-10-2024 | 21:58:11 | SEL | Event Logging Disabled | Nominal | Log Area Reset/Cleared2 3 4 5 6 7 8 9 10 
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 8 [10:50:27] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 ipmi_sel_parse: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [10:52:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [433 system event log (SEL) entries present] [10:53:35] PROBLEM - mw152 Current Load on mw152 is CRITICAL: LOAD CRITICAL - total load average: 27.82, 23.64, 23.54 [10:56:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 21.15, 20.08, 18.08 [10:58:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 15.76, 18.48, 17.74 [11:02:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:03:38] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 7.31, 3.11, 1.25 [11:05:37] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.64, 2.62, 1.29 [11:07:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:17:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:19:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.66, 3.45, 2.35 [11:21:38] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.37, 
3.94, 2.64 [11:22:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:23:37] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.30, 3.26, 2.56 [11:27:25] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:29:36] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 4.21, 3.79, 2.95 [11:31:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 3.18, 3.34, 2.87 [11:32:25] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:34:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:39:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:44:40] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.51, 19.91, 19.06 [11:46:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:46:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 16.88, 19.12, 18.90 [11:47:36] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.82, 3.97, 3.27 [11:50:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.15, 21.96, 20.13 [11:51:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:51:36] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.47, 3.14, 3.08 [11:52:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.51, 21.35, 20.18 [11:54:20] [Grafana] !tech FIRING: There has been a 
rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [11:54:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 25.79, 22.49, 20.70 [11:56:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.97, 22.76, 21.04 [11:59:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:02:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 18.82, 19.85, 20.26 [12:13:38] [CreateWiki] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/CreateWiki/compare/0e1b3a9cd50e...17d0f8166c57 [12:13:41] [CreateWiki] translatewiki 17d0f81 - Localisation updates from https://translatewiki.net. [12:13:42] [ImportDump] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/ImportDump/compare/3efbd8731e06...6056caa52dea [12:13:44] [ImportDump] translatewiki 6056caa - Localisation updates from https://translatewiki.net. [12:13:47] [IncidentReporting] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/IncidentReporting/compare/822a1a7b2c3e...0e23f84e356d [12:13:50] [IncidentReporting] translatewiki 0e23f84 - Localisation updates from https://translatewiki.net. [12:13:51] [DataDump] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/DataDump/compare/afe0c3712afa...cb6de0ab21c4 [12:13:52] [DataDump] translatewiki cb6de0a - Localisation updates from https://translatewiki.net. [12:13:54] [SpriteSheet] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/SpriteSheet/compare/2d68df532fa3...6e9bc31bfb94 [12:13:56] [SpriteSheet] translatewiki 6e9bc31 - Localisation updates from https://translatewiki.net. 
[12:13:59] [RottenLinks] translatewiki pushed 1 commit to master [+0/-0/±2] https://github.com/miraheze/RottenLinks/compare/b2a23a38f2ea...ee4f2dde768d [12:14:00] [RottenLinks] translatewiki ee4f2dd - Localisation updates from https://translatewiki.net. [12:14:02] [MirahezeMagic] translatewiki pushed 1 commit to master [+0/-0/±6] https://github.com/miraheze/MirahezeMagic/compare/6dc7f6fe84cb...559257078e1f [12:14:03] [MirahezeMagic] translatewiki 5592570 - Localisation updates from https://translatewiki.net. [12:14:04] [WikiDiscover] translatewiki pushed 1 commit to master [+0/-0/±1] https://github.com/miraheze/WikiDiscover/compare/d2c71a911108...20e70d0d7a53 [12:14:06] [WikiDiscover] translatewiki 20e70d0 - Localisation updates from https://translatewiki.net. [12:14:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:15:08] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 20.44, 20.18, 18.89 [12:16:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 24.71, 22.14, 20.52 [12:17:06] miraheze/CreateWiki - translatewiki the build has errored. [12:17:19] miraheze/SpriteSheet - translatewiki the build has errored. [12:17:37] miraheze/MirahezeMagic - translatewiki the build has errored. [12:18:04] miraheze/DataDump - translatewiki the build passed. [12:18:12] miraheze/RottenLinks - translatewiki the build passed. [12:18:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.28, 21.85, 20.61 [12:18:42] miraheze/IncidentReporting - translatewiki the build passed. [12:18:46] miraheze/WikiDiscover - translatewiki the build passed. 
[12:18:57] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.80, 23.15, 20.35 [12:19:50] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:20:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.38, 22.99, 21.14 [12:20:51] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.33, 21.81, 20.19 [12:22:26] PROBLEM - cloud15 IPMI Sensors on cloud15 is UNKNOWN: ipmi_sdr_cache_open: /root/.freeipmi/sdr-cache/sdr-cache-cloud15.localhost: internal IPMI error-> Execution of /usr/sbin/ipmi-sel failed with return code 1.-> /usr/sbin/ipmi-sel was executed with the following parameters: sudo /usr/sbin/ipmi-sel --output-event-state --interpret-oem-data --entity-sensor-names --sensor-types=all [12:22:46] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 11.55, 18.14, 19.04 [12:24:04] miraheze/ImportDump - translatewiki the build passed. 
[12:24:19] !log [@test151] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to test151 [12:24:20] !log [@test151] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to test151 - SUCCESS in 0s [12:24:28] PROBLEM - cloud15 IPMI Sensors on cloud15 is CRITICAL: IPMI Status: Critical [434 system event log (SEL) entries present] [12:24:35] !log [@test151] starting deploy of {'folders': '1.43/extensions/MirahezeMagic'} to test151 [12:24:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:24:36] !log [@test151] finished deploy of {'folders': '1.43/extensions/MirahezeMagic'} to test151 - SUCCESS in 0s [12:24:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:25:00] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:25:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:26:35] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 24.75, 20.75, 19.81 [12:26:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 22.48, 23.52, 22.10 [12:26:50] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:28:10] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [12:28:38] PROBLEM - mw181 Current Load on mw181 is CRITICAL: LOAD CRITICAL - total load average: 26.68, 24.15, 22.46 [12:28:47] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 6.36, 4.50, 3.35 [12:30:05] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.073 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [12:30:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.42, 20.96, 20.18 [12:30:38] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 18.04, 21.69, 21.77 [12:30:41] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.12, 3.51, 3.12 [12:31:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:31:50] PROBLEM - db181 PowerDNS Recursor on db181 is CRITICAL: CRITICAL - Plugin timed out while executing system call [12:31:58] PROBLEM - db181 Current Load on db181 is CRITICAL: LOAD CRITICAL - total load average: 43.42, 20.11, 7.94 [12:32:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.74, 19.98, 19.90 [12:32:35] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 5.64, 4.30, 3.46 [12:32:49] !log [@mwtask181] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all [12:32:54] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:33:03] !log [@mwtask181] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all - SUCCESS in 14s [12:33:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:33:51] PROBLEM - db181 Puppet on db181 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [12:34:00] PROBLEM - db181 SSH on db181 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:35:50] RECOVERY - db181 Puppet on db181 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [12:35:50] RECOVERY - db181 PowerDNS Recursor on db181 is OK: DNS OK: 0.066 seconds response time. 
wikitide.net returns 2602:294:0:b13::110,2602:294:0:b23::112,38.46.223.205,38.46.223.206 [12:35:54] RECOVERY - db181 SSH on db181 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) [12:36:23] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.50, 3.87, 3.48 [12:36:25] PROBLEM - mw151 Current Load on mw151 is WARNING: LOAD WARNING - total load average: 19.27, 20.46, 20.18 [12:36:50] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:37:37] PROBLEM - ping6 on cp51 is CRITICAL: PING CRITICAL - Packet loss = 16%, RTA = 185.54 ms [12:37:55] !log [@mwtask171] starting deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all [12:38:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:38:07] !log [@mwtask171] finished deploy of {'folders': '1.42/extensions/MirahezeMagic'} to all - SUCCESS in 11s [12:38:14] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:38:17] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 2.88, 3.29, 3.30 [12:38:25] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 18.36, 19.61, 19.90 [12:38:38] RECOVERY - mw181 Current Load on mw181 is OK: LOAD OK - total load average: 20.11, 19.50, 20.40 [12:39:58] PROBLEM - db181 Current Load on db181 is WARNING: LOAD WARNING - total load average: 0.45, 11.64, 10.76 [12:40:49] PROBLEM - cp51 HTTPS on cp51 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10002 milliseconds with 0 bytes received [12:41:57] RECOVERY - db181 Current Load on db181 is OK: LOAD OK - total load average: 0.31, 7.89, 9.49 [12:42:45] RECOVERY - cp51 HTTPS on cp51 is 
OK: HTTP OK: HTTP/2 404 - Status line output matched "HTTP/2 404" - 3820 bytes in 1.294 second response time [12:43:45] RECOVERY - ping6 on cp51 is OK: PING OK - Packet loss = 0%, RTA = 184.36 ms [12:44:00] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 3.59, 4.20, 3.67 [12:45:54] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 2.33, 3.62, 3.52 [12:46:50] [Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:47:48] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.98, 3.01, 3.30 [12:50:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:53:14] PROBLEM - mw151 Current Load on mw151 is CRITICAL: LOAD CRITICAL - total load average: 25.43, 21.05, 20.03 [12:55:20] [Grafana] !tech RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:57:03] RECOVERY - mw151 Current Load on mw151 is OK: LOAD OK - total load average: 17.84, 20.16, 19.99 [12:57:20] [Grafana] !tech FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?orgId=1 [12:57:28] PROBLEM - mw181 Current Load on mw181 is WARNING: LOAD WARNING - total load average: 20.94, 20.85, 20.11 [12:57:43] PROBLEM - ns2 NTP time on ns2 is UNKNOWN: check_ntp_time: Invalid hostname/address - time.cloudflare.com Usage: check_ntp_time -H [-4|-6] [-w ] [-c ] [-v verbose] [-o