[00:00:51] [statichelp] WikiTideBot pushed 1 new commit to main https://github.com/miraheze/statichelp/commit/bf0cc1b0d932afa8b9ce79e90a23c2afd2ebad41 [00:00:51] statichelp/main WikiTideBot bf0cc1b Bot: Auto-update Tech namespace pages 2026-02-27 00:00:44 [00:34:17] PROBLEM - mwtask181 Check unit status of mediawiki_job_generate-sitemap-index on mwtask181 is CRITICAL: CRITICAL: Status of the systemd unit mediawiki_job_generate-sitemap-index [00:35:01] !log [blankeclair@mwtask171] Starting import for animatorvsanimationwiki (XML: None; Images: images) (START) [00:35:02] !log [blankeclair@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php importImages --wiki=animatorvsanimationwiki --sleep=1 '--comment=Importing images from https://animatorvsanimation.fandom.com ([[phorge:T15023|T15023]])' -- images (START) [00:35:04] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [00:35:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [01:21:35] [CreateWiki] codecov[bot] commented on pull request #808: :x: Patch coverage is `0%` with `1 line` in your changes missing coverage. Please review.
[…] https://github.com/miraheze/CreateWiki/pull/808#issuecomment-3970175148 [02:18:32] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php CreateWiki:DeleteWiki --wiki=loginwiki --delete --deletewiki armlessdetectivewiki (END - exit=0) [02:18:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:20:32] !log c4: DROP DATABASE armlessdetectivewiki; [02:20:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:24:44] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php MirahezeMagic:RenameDatabase --wiki=loginwiki --rename --old=armlessdetectiverobloxwiki --new=armlessdetectivewiki --user=Skye (END - exit=0) [02:24:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:25:33] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php CreateWiki:SetContainersAccess --wiki=armlessdetectivewiki (END - exit=0) [02:25:35] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:28:13] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php MirahezeMagic:RenameDatabase --wiki=loginwiki --rename --old=compediumglaguswiki --new=compendiumglaguswiki --user=Skye (END - exit=0) [02:28:15] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:28:57] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php CreateWiki:SetContainersAccess --wiki=compendiumglaguswiki (END - exit=0) [02:28:59] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [02:30:26] [ssl] WikiTideBot pushed 1 new commit to main https://github.com/miraheze/ssl/commit/2c78ca06dca95d8972e427882bbdaf54a3ee040b [02:30:26] ssl/main WikiTideBot 2c78ca0 Bot: Auto-update domain lists [04:04:25] !log [blankeclair@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php importImages
--wiki=animatorvsanimationwiki --sleep=1 '--comment=Importing images from https://animatorvsanimation.fandom.com ([[phorge:T15023|T15023]])' -- images (END - exit=0) [04:04:26] !log [blankeclair@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php initSiteStats --wiki=animatorvsanimationwiki --update (START) [04:04:27] !log [blankeclair@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php initSiteStats --wiki=animatorvsanimationwiki --update (END - exit=0) [04:04:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:04:28] !log [blankeclair@mwtask171] Finished import for animatorvsanimationwiki (XML: None; Images: images) (END - exit=0) [04:04:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:04:31] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [04:04:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [05:02:38] [Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772164880000&orgId=1&to=1772168558162 [05:03:08] [puppet] Honoka55 opened pull request #4798: CSP: add fastly.jsdelivr.net to connect-src (miraheze:main...PGW-MH:T15025) https://github.com/miraheze/puppet/pull/4798 [05:03:56] [puppet] Universal-Omega merged Honoka55's pull request #4798: CSP: add fastly.jsdelivr.net to connect-src (miraheze:main...PGW-MH:T15025) https://github.com/miraheze/puppet/pull/4798 [05:03:56] [puppet] Universal-Omega pushed 1 new commit to main https://github.com/miraheze/puppet/commit/7bcb8a81c0ed319cb1c043ace24cb3e814c10c5d [05:03:56] puppet/main Honoka55 7bcb8a8 CSP: add fastly.jsdelivr.net to connect-src (#4798)… [05:07:38] [Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772164880000&orgId=1&to=1772168630000 [05:09:59] [Grafana]
FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165300000&orgId=1&to=1772168999148 [05:14:59] [Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772169299149[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772169299149[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165300000&orgId=1&to=1772169140 [05:19:39] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 11.10, 7.61, 4.66 [05:20:08] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:20:08] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [05:20:36] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:24:51] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:24:59] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166110000&orgId=1&to=1772169899151[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772169899151[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from [05:24:59] 0000&orgId=1&to=1772169899151[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165720000&orgId=1&to=1772169890000 [05:25:19] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.338 seconds response time. prometheus151.fsslc.wtnet returns 10.0.15.116 [05:25:21] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [05:25:21] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 20 minutes ago with 0 failures [05:26:45] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [05:27:09] [puppet] Universal-Omega pushed 1 new commit to prometheus-drop-path https://github.com/miraheze/puppet/commit/298bb1b0d9baaa186b6909f5a746b37d506bc23e [05:27:10] puppet/prometheus-drop-path CosmicAlpha 298bb1b Include both [05:27:25] [puppet] Universal-Omega merged pull request #4793: prometheus: drop path metrics in statsd_exporter (main...prometheus-drop-path) https://github.com/miraheze/puppet/pull/4793 [05:27:25] [puppet] Universal-Omega pushed 1 new commit to main https://github.com/miraheze/puppet/commit/794fa10ce5cc8a8049ecdb65389aa1caac17d2d9 [05:27:25] puppet/main CosmicAlpha 794fa10 prometheus: drop path metrics in statsd_exporter [05:27:25] [puppet] Universal-Omega deleted prometheus-drop-path at 298bb1b https://api.github.com/repos/miraheze/puppet/commit/298bb1b [05:29:53] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 7.31, 7.90, 6.90 [05:29:59] [Grafana] RESOLVED: MediaWiki Exception Rate
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166410000&orgId=1&to=1772170010000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772170199151[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1 [05:29:59] 51[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165720000&orgId=1&to=1772170070000[Grafana] RESOLVED: DatasourceNoData https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166380000&orgId=1&to=1772170070000 [05:31:51] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 11.25, 9.10, 7.46 [05:34:59] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166890000&orgId=1&to=1772170490000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772170499152[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1 [05:34:59] 52[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166560000&orgId=1&to=1772170499152[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166890000&orgId=1&to=1772170499152 [05:36:21] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 3.28, 7.99, 7.67 [05:38:48] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.35, 5.49, 6.78 [05:39:59] [Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166890000&orgId=1&to=1772170490000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772170799152[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1 [05:39:59] 52[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166560000&orgId=1&to=1772170580000[Grafana] RESOLVED: DatasourceNoData https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772166890000&orgId=1&to=1772170610000 [05:43:52] prometheus pls dont explode idk how to fix you [05:55:42] PROBLEM - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is WARNING: WARNING - NGINX Error Rate is 40% [05:59:42] RECOVERY - cp161 HTTP 4xx/5xx ERROR Rate on cp161 is OK: OK - NGINX Error Rate is 20% [06:14:59] [Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772172660000[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772165460000&orgId=1&to=1772172660000 [06:42:41] [puppet] dependabot[bot] created dependabot/github_actions/actions/upload-artifact-7 (+1 new commit) https://github.com/miraheze/puppet/commit/e1c3c906453c [06:42:41] puppet/dependabot/github_actions/actions/upload-artifact-7 dependabot[bot] e1c3c90 build(deps): bump actions/upload-artifact from 6 to 7… [06:42:42] [puppet] dependabot[bot] added the label 'github_actions' to pull request #4799 (build(deps): bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/puppet/pull/4799 [06:42:42] [puppet] dependabot[bot] added the label 'dependencies' to pull request #4799 (build(deps): bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/puppet/pull/4799 [06:42:44] [puppet] dependabot[bot] opened pull request #4799: build(deps): bump actions/upload-artifact from 6 to 7 (main...dependabot/github_actions/actions/upload-artifact-7) https://github.com/miraheze/puppet/pull/4799 [06:42:46] [puppet] dependabot[bot] added
the label 'dependencies' to pull request #4799 (build(deps): bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/puppet/pull/4799 [06:42:48] [puppet] dependabot[bot] added the label 'github_actions' to pull request #4799 (build(deps): bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/puppet/pull/4799 [06:44:27] miraheze/puppet - dependabot[bot] the build passed. [08:08:22] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 1 backends are down. mw153 [08:08:39] PROBLEM - cp171 Varnish Backends on cp171 is CRITICAL: 1 backends are down. mw153 [08:08:40] PROBLEM - mw153 php-fpm on mw153 is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [08:09:01] PROBLEM - cp201 Varnish Backends on cp201 is CRITICAL: 1 backends are down. mw153 [08:09:03] PROBLEM - mw153 SSH on mw153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:06] PROBLEM - mw153 HTTPS on mw153 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: cURL returned 28 - Operation timed out after 10004 milliseconds with 0 bytes received [08:09:23] PROBLEM - mw153 PowerDNS Recursor on mw153 is CRITICAL: CRITICAL - Plugin timed out while executing system call [08:09:32] PROBLEM - cp161 Varnish Backends on cp161 is CRITICAL: 1 backends are down.
mw153 [08:09:34] PROBLEM - mw153 Current Load on mw153 is CRITICAL: LOAD CRITICAL - total load average: 56.62, 33.66, 16.86 [08:10:38] RECOVERY - mw153 php-fpm on mw153 is OK: PROCS OK: 25 processes with command name 'php-fpm8.4' [08:10:39] RECOVERY - cp171 Varnish Backends on cp171 is OK: All 31 backends are healthy [08:10:58] RECOVERY - mw153 SSH on mw153 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [08:11:01] RECOVERY - cp201 Varnish Backends on cp201 is OK: All 31 backends are healthy [08:11:02] RECOVERY - mw153 HTTPS on mw153 is OK: HTTP OK: HTTP/2 410 - Status line output matched "HTTP/2 410" - 4308 bytes in 0.057 second response time [08:11:22] RECOVERY - mw153 PowerDNS Recursor on mw153 is OK: DNS OK: 0.046 seconds response time. mw153.fsslc.wtnet returns 10.0.15.140 [08:11:32] RECOVERY - cp161 Varnish Backends on cp161 is OK: All 31 backends are healthy [08:12:18] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy [08:13:26] RECOVERY - mw153 Current Load on mw153 is OK: LOAD OK - total load average: 5.63, 18.20, 14.31 [09:03:30] [Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772183010496 [09:07:04] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [09:07:16] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [09:07:25] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 24.23, 11.81, 5.17 [09:07:26] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. 
[09:07:47] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:08:30] [Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772183300000 [09:09:37] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179670000&orgId=1&to=1772183377025 [09:10:08] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php namespaceDupes --wiki=zeroerawiki --fix (END - exit=0) [09:10:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [09:10:45] PROBLEM - prometheus151 conntrack_table_size on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [09:14:37] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179670000&orgId=1&to=1772183677026[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772183677026[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from [09:14:37] 0000&orgId=1&to=1772183677026[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772183660000 [09:15:55] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [09:15:55] RECOVERY - prometheus151 conntrack_table_size on prometheus151 is OK: OK: nf_conntrack is 0 % full [09:15:58] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 40 minutes ago with 0 failures [09:16:32] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.057 seconds response time. 
prometheus151.fsslc.wtnet returns 10.0.15.116 [09:17:56] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [09:19:37] [Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179670000&orgId=1&to=1772183977027[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772183977027[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from [09:19:37] 0000&orgId=1&to=1772183977027[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772183977028 [09:19:56] PROBLEM - prometheus151 Prometheus on prometheus151 is CRITICAL: connect to address 10.0.15.116 and port 9090: Connection refused [09:22:32] PROBLEM - prometheus151 Current Load on prometheus151 is WARNING: LOAD WARNING - total load average: 0.04, 5.10, 7.31 [09:24:29] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.00, 3.41, 6.42 [09:34:37] [Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772181110000&orgId=1&to=1772184877029[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179670000&orgId=1&to=1772184877030[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/G [09:34:37] rom=1772179860000&orgId=1&to=1772184877030[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772184877030[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772184877030 [09:44:51] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php namespaceDupes --wiki=zeroerawiki --fix (END - exit=0) [09:44:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [09:49:52] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 1 backends are down. mw191 [09:50:43] PROBLEM - mw191 Current Load on mw191 is WARNING: LOAD WARNING - total load average: 18.90, 20.63, 12.61 [09:51:52] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy [09:52:43] RECOVERY - mw191 Current Load on mw191 is OK: LOAD OK - total load average: 10.44, 16.77, 12.16 [10:04:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772186677034[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772181110000&orgId=1&to=1772186677034[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk? [10:04:37] 79670000&orgId=1&to=1772186677034[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772186677034[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772186677034[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs [10:04:37] over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772186677034 [10:53:19] [mw-config] lihaohong6 merged YTFGolf's pull request #6329: T13977: install TableProgressTracking extension (miraheze:main...YTFGolf:patch-T13977) https://github.com/miraheze/mw-config/pull/6329 [10:53:19] [mw-config] lihaohong6 pushed 1 new commit to main https://github.com/miraheze/mw-config/commit/758f454bdc18fdaba7c69b8afdb2dad6c8cae5fc [10:53:19] mw-config/main YTFGolf 758f454 T13977: install TableProgressTracking extension (#6329) [10:54:08] !log [petramagna@mwtask181] starting deploy of {'pull': 'config', 'config': True, 'l10n': True, 'extension_list': True, 'versions': '1.45', 'upgrade_extensions': 'TableProgressTracking'} to all [10:54:10] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [10:54:28] miraheze/mw-config - lihaohong6 the build passed.
[11:03:56] !log [petramagna@mwtask181] finished deploy of {'pull': 'config', 'config': True, 'l10n': True, 'extension_list': True, 'versions': '1.45', 'upgrade_extensions': 'TableProgressTracking'} to all - SUCCESS in 587s [11:03:57] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [11:04:35] [puppet] pskyechology opened pull request #4800: MW DB: Add bucketuser (miraheze:main...pskyechology:T14235) https://github.com/miraheze/puppet/pull/4800 [11:11:37] [mw-config] pskyechology approved pull request #6309 https://github.com/miraheze/mw-config/pull/6309#pullrequestreview-3866363071 [11:11:47] [mw-config] pskyechology merged YTFGolf's pull request #6309: T14950: Add $wgVectorNightMode and $wgMinervaNightMode to ManageWiki (miraheze:main...YTFGolf:patch-night) https://github.com/miraheze/mw-config/pull/6309 [11:11:47] [mw-config] pskyechology pushed 1 new commit to main https://github.com/miraheze/mw-config/commit/6db519ce0f2a7ce0256e23ecd113b6dc88251627 [11:11:47] mw-config/main YTFGolf 6db519c T14950: Add $wgVectorNightMode and $wgMinervaNightMode to ManageWiki (#6309)… [11:12:07] !log [skye@mwtask181] starting deploy of {'pull': 'config', 'config': True} to all [11:12:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [11:12:30] !log [skye@mwtask181] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 23s [11:12:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [11:13:06] miraheze/mw-config - pskyechology the build passed. [11:36:08] [mw-config] pskyechology commented on pull request #6310: I assume we haven't merged this because we would get more warnings than Prometheus had paths?
https://github.com/miraheze/mw-config/pull/6310#issuecomment-3972460258 [11:44:20] [mw-config] pskyechology pushed 1 new commit to main https://github.com/miraheze/mw-config/commit/424358a36790f0239c2cf880fb10c1ec3257b009 [11:44:20] mw-config/main Skye 424358a Add sitenotice for 2026/03/05 cloud upgrades [11:44:27] !log [skye@mwtask181] starting deploy of {'pull': 'config', 'config': True} to all [11:44:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [11:44:50] !log [skye@mwtask181] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 22s [11:44:52] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [11:45:39] miraheze/mw-config - pskyechology the build passed. [12:39:31] [mw-config] Aeywoo opened pull request #6331: Undeploy MobileTabsPlugin per T15029 (miraheze:main...Aeywoo:main) https://github.com/miraheze/mw-config/pull/6331 [12:40:44] miraheze/mw-config - Aeywoo the build passed.
[12:41:37] [mediawiki-repos] Aeywoo opened pull request #127: Undeploy MobileTabsPlugin per T15029 (miraheze:main...Aeywoo:main) https://github.com/miraheze/mediawiki-repos/pull/127 [12:41:50] [mw-config] pskyechology approved pull request #6331: once other steps are done https://github.com/miraheze/mw-config/pull/6331#pullrequestreview-3866751708 [12:42:54] [mediawiki-repos] BlankEclair approved pull request #127 https://github.com/miraheze/mediawiki-repos/pull/127#pullrequestreview-3866756650 [12:43:02] [mw-config] Aeywoo commented on pull request #6331: Thanks :3 https://github.com/miraheze/mw-config/pull/6331#issuecomment-3972766344 [12:44:27] [mw-config] pskyechology pushed 1 new commit to main https://github.com/miraheze/mw-config/commit/13cb3ccfe96c726a0b65427427d3bbb105591bb0 [12:44:27] mw-config/main Skye 13cb3cc T15029: Globally disable mobiletabsplugin [12:44:30] !log [skye@mwtask181] starting deploy of {'pull': 'config', 'config': True} to all [12:44:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:44:59] !log [skye@mwtask181] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 28s [12:45:01] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:45:39] miraheze/mw-config - pskyechology the build passed. [12:46:32] !log [skye@mwtask181] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php ManageWiki:ToggleExtension --wiki=loginwiki --name=mobiletabsplugin --disable --all-wikis --execute (END - exit=0) [12:46:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:50:02] miraheze/mw-config - Aeywoo the build passed. [12:50:36] miraheze/mw-config - Aeywoo the build passed.
[12:51:43] [mw-config] pskyechology merged Aeywoo's pull request #6331: Undeploy MobileTabsPlugin per T15029 (miraheze:main...Aeywoo:main) https://github.com/miraheze/mw-config/pull/6331 [12:51:44] [mw-config] pskyechology pushed 1 new commit to main https://github.com/miraheze/mw-config/commit/3b79ce14ce04ebadac6042eaad01635912c53b23 [12:51:45] mw-config/main Aeywoo 3b79ce1 Undeploy MobileTabsPlugin per T15029 (#6331)… [12:51:59] !log [skye@mwtask181] starting deploy of {'pull': 'config', 'config': True} to all [12:52:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:52:24] !log [skye@mwtask181] finished deploy of {'pull': 'config', 'config': True} to all - SUCCESS in 25s [12:52:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [12:52:31] [mediawiki-repos] BlankEclair merged Aeywoo's pull request #127: Undeploy MobileTabsPlugin per T15029 (miraheze:main...Aeywoo:main) https://github.com/miraheze/mediawiki-repos/pull/127 [12:52:31] [mediawiki-repos] BlankEclair pushed 1 new commit to main https://github.com/miraheze/mediawiki-repos/commit/d5ce9fdb8add4ba951efcc0a71ab30647eba00d7 [12:52:31] mediawiki-repos/main Aeywoo d5ce9fd Undeploy MobileTabsPlugin per T15029 (#127)… [12:52:59] miraheze/mw-config - pskyechology the build passed.
[12:59:38] [mediawiki-repos] Tali64 opened pull request #128: Add WikiPoints (miraheze:main...Tali64:patch-1) https://github.com/miraheze/mediawiki-repos/pull/128 [13:01:03] !log mwtask181: sudo -u www-data rm -rf /srv/mediawiki-staging/1.45/extensions/MobileTabsPlugin/ [13:01:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:02:12] [mw-config] Aeywoo closed pull request #6305: install RefreshSpecial per T12835 (again) (miraheze:main...Aeywoo:Aeywoo-patch-1) https://github.com/miraheze/mw-config/pull/6305 [13:03:43] miraheze/mw-config - Aeywoo the build passed. [13:04:44] !log test151: sudo -u www-data rm -rf /srv/mediawiki-staging/1.45/extensions/MobileTabsPlugin/ [13:04:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:05:24] !log [skye@mwtask181] starting deploy of {'pull': 'config', 'config': True, 'world': True, 'l10n': True, 'extension_list': True, 'versions': '1.45'} to all [13:05:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:05:31] !log [skye@test151] starting deploy of {'pull': 'config', 'config': True, 'world': True, 'l10n': True, 'extension_list': True, 'versions': '1.45'} to test151 [13:05:33] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:05:51] perfect opportunity to use somerandomdeveloper memes [13:06:04] https://cdn.discordapp.com/attachments/867365805674201091/1449887061325775019/togif.gif [13:07:29] [mw-config] Tali64 opened pull request #6332: Add WikiPoints (miraheze:main...Tali64:main) https://github.com/miraheze/mw-config/pull/6332 [13:08:40] miraheze/mw-config - Tali64 the build passed.
[13:11:38] !log [skye@test151] finished deploy of {'pull': 'config', 'config': True, 'world': True, 'l10n': True, 'extension_list': True, 'versions': '1.45'} to test151 - SUCCESS in 366s [13:11:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:14:02] now get back to work [13:17:37] !log [skye@mwtask181] finished deploy of {'pull': 'config', 'config': True, 'world': True, 'l10n': True, 'extension_list': True, 'versions': '1.45'} to all - SUCCESS in 733s [13:17:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [13:44:49] [WikiTideDebug] dependabot[bot] created dependabot/github_actions/actions/upload-artifact-7 (+1 new commit) https://github.com/miraheze/WikiTideDebug/commit/019cf3101c3a [13:44:49] WikiTideDebug/dependabot/github_actions/actions/upload-artifact-7 dependabot[bot] 019cf31 Bump actions/upload-artifact from 6 to 7… [13:44:50] [WikiTideDebug] dependabot[bot] added the label 'dependencies' to pull request #31 (Bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/WikiTideDebug/pull/31 [13:44:52] [WikiTideDebug] dependabot[bot] added the label 'github_actions' to pull request #31 (Bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/WikiTideDebug/pull/31 [13:44:54] [WikiTideDebug] dependabot[bot] opened pull request #31: Bump actions/upload-artifact from 6 to 7 (main...dependabot/github_actions/actions/upload-artifact-7) https://github.com/miraheze/WikiTideDebug/pull/31 [13:44:56] [WikiTideDebug] dependabot[bot] added the label 'dependencies' to pull request #31 (Bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/WikiTideDebug/pull/31 [13:44:58] [WikiTideDebug] dependabot[bot] added the label 'github_actions' to pull request #31 (Bump actions/upload-artifact from 6 to 7) https://github.com/miraheze/WikiTideDebug/pull/31 [14:04:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000
unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772201077071[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772181110000&orgId=1&to=1772201077071[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk? [14:04:37] 79670000&orgId=1&to=1772201077071[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772201077071[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772201077071[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs [14:04:37] over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772201077071 [14:04:51] [02mw-config] 07pskyechology approved pull request #6307 13https://github.com/miraheze/mw-config/pull/6307#pullrequestreview-3867119277 [14:05:11] [02mw-config] 07pskyechology pushed 1 new commit to 03main 13https://github.com/miraheze/mw-config/commit/aab0285a0e605e78e6ffc8ad21d5b872a0b81919 [14:05:11] 02mw-config/03main 07Aeywoo 03aab0285 Re-enable SimpleBlogPage per T13252 (#6307)… [14:05:11] [02mw-config] 07pskyechology merged 07Aeywoo's pull request #6307: Re-enable SimpleBlogPage per T13252 (07miraheze:03main...07Aeywoo:03Aeywoo-patch-2) 13https://github.com/miraheze/mw-config/pull/6307 [14:10:39] [02mediawiki-repos] 07pskyechology approved pull request #128 13https://github.com/miraheze/mediawiki-repos/pull/128#pullrequestreview-3867145690 [14:11:30] [02mediawiki-repos] 07pskyechology merged 07Tali64's pull request #128: Add WikiPoints (07miraheze:03main...07Tali64:03patch-1) 13https://github.com/miraheze/mediawiki-repos/pull/128 [14:11:30] [02mediawiki-repos] 07pskyechology pushed 1 new commit to 03main 
13https://github.com/miraheze/mediawiki-repos/commit/8e41ec712c617c566d48dc668878c860c159f1b9 [14:11:31] 02mediawiki-repos/03main 07Tali64 038e41ec7 Add WikiPoints (#128)… [14:23:09] [02mw-config] 07pskyechology approved pull request #6332 13https://github.com/miraheze/mw-config/pull/6332#pullrequestreview-3867205486 [14:23:26] [02mw-config] 07pskyechology pushed 1 new commit to 03main 13https://github.com/miraheze/mw-config/commit/3709b6f02abcad3535a550c4b38f8136cf46204a [14:23:26] 02mw-config/03main 07Tali64 033709b6f Add WikiPoints (#6332)… [14:23:26] [02mw-config] 07pskyechology merged 07Tali64's pull request #6332: Add WikiPoints (07miraheze:03main...07Tali64:03main) 13https://github.com/miraheze/mw-config/pull/6332 [14:38:47] !log [skye@test151] starting deploy of {'world': True, 'extension_list': True, 'versions': '1.45'} to test151 [14:38:49] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:39:10] !log [skye@test151] finished deploy of {'world': True, 'extension_list': True, 'versions': '1.45'} to test151 - SUCCESS in 22s [14:39:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:49:17] !log [skye@test151] starting deploy of {'pull': 'config', 'config': True, 'l10n': True, 'extension_list': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': ['SimpleBlogPage', 'WikiPoints']} to test151 [14:49:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [14:55:16] !log [skye@test151] finished deploy of {'pull': 'config', 'config': True, 'l10n': True, 'extension_list': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': ['SimpleBlogPage', 'WikiPoints']} to test151 - SUCCESS in 358s [14:55:18] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:01:48] [02mw-config] 07pskyechology pushed 1 new commit to 03main 13https://github.com/miraheze/mw-config/commit/386c12ecf563e17b26cdfc372ceeb5692e045de3 [15:01:48] 
02mw-config/03main 07Skye 03386c12e Revert "Add WikiPoints (#6332)"… [15:03:02] miraheze/mw-config - pskyechology the build passed. [15:08:18] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus] [15:09:07] i forgot to put my damn glasses on after lifting [15:09:17] (its actually a skill issue) [15:11:58] [02mw-config] 07pskyechology pushed 2 new commits to 03main 13https://github.com/miraheze/mw-config/compare/386c12ecf563...feef09ff330a [15:11:58] 02mw-config/03main 07Skye 03dc651d1 Reapply "Add WikiPoints (#6332)"… [15:11:59] 02mw-config/03main 07Skye 03feef09f Load OOJSPlus with SimpleBlogPage [15:12:24] !log [skye@test151] starting deploy of {'pull': 'config', 'config': True} to test151 [15:12:25] !log [skye@test151] finished deploy of {'pull': 'config', 'config': True} to test151 - SUCCESS in 0s [15:12:26] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:12:28] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [15:13:12] miraheze/mw-config - pskyechology the build passed. 
[15:25:09] [02MediaWikiDebugJS] 07dependabot[bot] created 03dependabot/github_actions/actions/upload-artifact-7 (+1 new commit) 13https://github.com/miraheze/MediaWikiDebugJS/commit/95bfeab4f380 [15:25:09] 02MediaWikiDebugJS/03dependabot/github_actions/actions/upload-artifact-7 07dependabot[bot] 0395bfeab Bump actions/upload-artifact from 6 to 7… [15:25:10] [02MediaWikiDebugJS] 07dependabot[bot] added the label 'github_actions' to pull request #33 (Bump actions/upload-artifact from 6 to 7) 13https://github.com/miraheze/MediaWikiDebugJS/pull/33 [15:25:10] [02MediaWikiDebugJS] 07dependabot[bot] added the label 'dependencies' to pull request #33 (Bump actions/upload-artifact from 6 to 7) 13https://github.com/miraheze/MediaWikiDebugJS/pull/33 [15:25:12] [02MediaWikiDebugJS] 07dependabot[bot] opened pull request #33: Bump actions/upload-artifact from 6 to 7 (03main...03dependabot/github_actions/actions/upload-artifact-7) 13https://github.com/miraheze/MediaWikiDebugJS/pull/33 [15:25:14] [02MediaWikiDebugJS] 07dependabot[bot] added the label 'dependencies' to pull request #33 (Bump actions/upload-artifact from 6 to 7) 13https://github.com/miraheze/MediaWikiDebugJS/pull/33 [15:25:16] [02MediaWikiDebugJS] 07dependabot[bot] added the label 'github_actions' to pull request #33 (Bump actions/upload-artifact from 6 to 7) 13https://github.com/miraheze/MediaWikiDebugJS/pull/33 [15:36:18] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [15:36:52] [02mediawiki-repos] 07pskyechology pushed 1 new commit to 03main 13https://github.com/miraheze/mediawiki-repos/commit/9034c2285eb87e47f2478874c38af91f16b17ea2 [15:36:52] 02mediawiki-repos/03main 07Skye 039034c22 +Composer for SimpleBlogPage [16:29:38] !log [skye@test151] starting deploy of {'folders': '1.45/extensions/SimpleBlogPage'} to test151 [16:29:39] !log [skye@test151] finished deploy of {'folders': '1.45/extensions/SimpleBlogPage'} to test151 - 
SUCCESS in 0s [16:29:40] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:29:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [16:55:53] !log removed 3 usergroups from mw_permissions (luntikfanonwiki) [16:55:56] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:08:17] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[prometheus] [17:31:23] !log [skye@test151] starting deploy of {'pull': 'config', 'config': True, 'l10n': True, 'extension_list': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': 'SimpleBlogPage'} to test151 [17:31:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:32:16] !log [skye@test151] finished deploy of {'pull': 'config', 'config': True, 'l10n': True, 'extension_list': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': 'SimpleBlogPage'} to test151 - SUCCESS in 55s [17:32:19] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [17:36:17] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:41:10] [02puppet] 07SomeMWDev left a file comment in pull request #4800 0327ede94: The default PHP-side limits in Bucket are 500ms per query and 10s per page parse. 
`MAX_STATEMENT_TIME 600` means 10 minutes per query (per https://mariadb.com/docs/server/server-management/variables-and-modes/server-system- […] 13https://github.com/miraheze/puppet/pull/4800#discussion_r2865459997 [17:52:02] [02mediawiki-repos] 07pskyechology pushed 1 new commit to 03main 13https://github.com/miraheze/mediawiki-repos/commit/6b8a4007cc7a1a9dd9f71cf352d64f5c0b782098 [17:52:03] 02mediawiki-repos/03main 07Skye 036b8a400 +merge-plugin for SimpleBlogPage [18:01:00] !log [skye@test151] starting deploy of {'world': True, 'versions': '1.45', 'upgrade_extensions': 'SimpleBlogPage'} to test151 [18:01:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:01:06] [02puppet] 07Universal-Omega merged 07dependabot[bot]'s pull request #4799: build(deps): bump actions/upload-artifact from 6 to 7 (03main...03dependabot/github_actions/actions/upload-artifact-7) 13https://github.com/miraheze/puppet/pull/4799 [18:01:06] [02puppet] 07Universal-Omega pushed 1 new commit to 03main 13https://github.com/miraheze/puppet/commit/27280fef4c6ebf796f2b37dcc59b0e34cb068f9b [18:01:06] 02puppet/03main 07dependabot[bot] 0327280fe build(deps): bump actions/upload-artifact from 6 to 7 (#4799)… [18:01:07] [02puppet] 07Universal-Omega 04deleted 03dependabot/github_actions/actions/upload-artifact-7 at 03e1c3c90 13https://api.github.com/repos/miraheze/puppet/commit/e1c3c90 [18:01:21] !log [skye@test151] finished deploy of {'world': True, 'versions': '1.45', 'upgrade_extensions': 'SimpleBlogPage'} to test151 - SUCCESS in 21s [18:01:24] [02WikiTideDebug] 07Universal-Omega merged 07dependabot[bot]'s pull request #31: Bump actions/upload-artifact from 6 to 7 (03main...03dependabot/github_actions/actions/upload-artifact-7) 13https://github.com/miraheze/WikiTideDebug/pull/31 [18:01:24] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:01:24] [02WikiTideDebug] 07Universal-Omega pushed 1 new commit to 03main 
13https://github.com/miraheze/WikiTideDebug/commit/32c81ef5c2ac3827789b1d0cf3f538f8a87fe2a6 [18:01:24] 02WikiTideDebug/03main 07dependabot[bot] 0332c81ef Bump actions/upload-artifact from 6 to 7 (#31)… [18:01:25] [02WikiTideDebug] 07Universal-Omega 04deleted 03dependabot/github_actions/actions/upload-artifact-7 at 03019cf31 13https://api.github.com/repos/miraheze/WikiTideDebug/commit/019cf31 [18:01:46] [02MediaWikiDebugJS] 07Universal-Omega merged 07dependabot[bot]'s pull request #33: Bump actions/upload-artifact from 6 to 7 (03main...03dependabot/github_actions/actions/upload-artifact-7) 13https://github.com/miraheze/MediaWikiDebugJS/pull/33 [18:01:46] [02MediaWikiDebugJS] 07Universal-Omega pushed 1 new commit to 03main 13https://github.com/miraheze/MediaWikiDebugJS/commit/8b6524f1839c048e1f9a8f2e58238b291c538e7f [18:01:46] 02MediaWikiDebugJS/03main 07dependabot[bot] 038b6524f Bump actions/upload-artifact from 6 to 7 (#33)… [18:01:47] [02MediaWikiDebugJS] 07Universal-Omega 04deleted 03dependabot/github_actions/actions/upload-artifact-7 at 0395bfeab 13https://api.github.com/repos/miraheze/MediaWikiDebugJS/commit/95bfeab [18:02:30] miraheze/puppet - Universal-Omega the build passed. [18:03:53] !log [skye@test151] starting deploy of {'world': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': 'SimpleBlogPage'} to test151 [18:03:55] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:04:09] !log [skye@test151] finished deploy of {'world': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': 'SimpleBlogPage'} to test151 - SUCCESS in 22s [18:04:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:04:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772215477106[Grafana] FIRING: Some MediaWiki Appservers are running out of PHP-FPM workers. 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772181110000&orgId=1&to=1772215477106[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk? [18:04:37] 79670000&orgId=1&to=1772215477106[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772215477106[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772215477106[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs [18:04:37] over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772215477106 [18:10:19] !log [skye@test151] starting deploy of {'world': True, 'versions': '1.45'} to test151 [18:10:22] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:10:44] !log [skye@test151] finished deploy of {'world': True, 'versions': '1.45'} to test151 - SUCCESS in 21s [18:10:46] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:25:00] [02puppet] 07pskyechology left a file comment in pull request #4800 0327ede94: 15 oughta give us enough space, I suppose. 
13https://github.com/miraheze/puppet/pull/4800#discussion_r2865618410 [18:29:26] !log [skye@mwtask181] starting deploy of {'pull': 'config', 'config': True, 'world': True, 'l10n': True, 'extension_list': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': ['SimpleBlogPage', 'WikiPoints']} to all [18:29:29] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [18:40:30] !log [skye@mwtask181] finished deploy of {'pull': 'config', 'config': True, 'world': True, 'l10n': True, 'extension_list': True, 'force_upgrade': True, 'versions': '1.45', 'upgrade_extensions': ['SimpleBlogPage', 'WikiPoints']} to all - SUCCESS in 664s [18:40:32] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:42:09] !log [skye@mwtask181] starting deploy of {'versions': '1.45', 'upgrade_extensions': 'WikiPoints'} to all [19:42:11] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [19:42:34] !log [skye@mwtask181] finished deploy of {'versions': '1.45', 'upgrade_extensions': 'WikiPoints'} to all - SUCCESS in 24s [19:42:36] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:01:39] !log [skye@mwtask171] Starting import for gunvoltwiki (XML: None; Images: .) (START) [20:01:40] !log [skye@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php importImages --wiki=gunvoltwiki --sleep=1 '--comment=Importing images from https://azurestrikergunvolt.fandom.com/wiki/Azure_Striker_Gunvolt_Wiki ([[phorge:T15031|T15031]])' -- . (START) [20:01:41] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:01:43] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [20:10:26] PROBLEM - mw201 Current Load on mw201 is WARNING: LOAD WARNING - total load average: 20.94, 14.60, 8.96 [20:10:39] PROBLEM - cp171 Varnish Backends on cp171 is CRITICAL: 1 backends are down. 
mw183 [20:11:01] PROBLEM - cp201 Varnish Backends on cp201 is CRITICAL: 1 backends are down. mw183 [20:11:07] @paladox@abaddriverlol [20:11:32] PROBLEM - cp161 Varnish Backends on cp161 is CRITICAL: 1 backends are down. mw183 [20:11:36] whats going on [20:11:52] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 1 backends are down. mw183 [20:11:54] PROBLEM - mw183 SSH on mw183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:12:01] why is grafana down [20:12:10] prometheus has been down since 10 hours ago 💀 [20:12:18] why is there no UBN task for that [20:12:19] PROBLEM - mw183 Current Load on mw183 is CRITICAL: LOAD CRITICAL - total load average: 56.01, 32.32, 16.33 [20:12:26] RECOVERY - mw201 Current Load on mw201 is OK: LOAD OK - total load average: 17.09, 15.13, 9.84 [20:12:39] i was blissfully unaware [20:13:23] prometheus is down [20:13:31] we really need to add monitoring for it [20:13:32] RECOVERY - cp161 Varnish Backends on cp161 is OK: All 31 backends are healthy [20:13:40] anyways > Feb 27 20:05:44 prometheus151 prometheus[3816453]: ts=2026-02-27T20:05:44.102Z caller=main.go:527 level=error msg="Error loading config (--config.file=/etc/prometheus/prometheus.yml)" file=/etc/prometheus/prometheus.yml err="parsing YAML file > [20:13:47] cc @cosmicalpha [20:13:47] why is there high load on some servers though [20:13:49] db/mw [20:13:52] RECOVERY - mw183 SSH on mw183 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [20:13:52] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy [20:14:12] > err="parsing YAML file /etc/prometheus/prometheus.yml: labeldrop action requires only 'regex', and no other fields" [20:14:16] PROBLEM - db161 Current Load on db161 is CRITICAL: LOAD CRITICAL - total load average: 34.37, 17.55, 8.50 [20:14:39] RECOVERY - cp171 Varnish Backends on cp171 is OK: All 31 backends are healthy [20:14:43] I thought I fixed that yesterday. Did I forget to update the commit when I pushed to puppet? 
[20:14:54] https://github.com/miraheze/puppet/blob/27280fef4c6ebf796f2b37dcc59b0e34cb068f9b/modules/role/manifests/prometheus.pp#L323 not sure [20:14:54] [GitHub] [miraheze/puppet] modules/role/manifests/prometheus.pp @ 27280fef4c6ebf796f2b37dcc59b0e34cb068f9b | L323:  'action' => 'labeldrop', [20:14:56] It seems I did lol. [20:15:01] RECOVERY - cp201 Varnish Backends on cp201 is OK: All 31 backends are healthy [20:15:02] not again lmfao [20:15:24] I fixed locally but when I merged to puppet it wasnt fixed. [20:15:51] We have no monitoring at all for prometheus? [20:16:15] RECOVERY - mw183 Current Load on mw183 is OK: LOAD OK - total load average: 10.78, 20.32, 15.06 [20:16:19] it should scream more after dying, i completely missed this https://discord.com/channels/407504499280707585/808001911868489748/1476871468406018173 [20:16:24] seems not [20:16:43] https://github.com/miraheze/puppet/blob/main/modules/prometheus/manifests/init.pp#L94 [20:16:43] [GitHub] [miraheze/puppet] modules/prometheus/manifests/init.pp @ main | L94:  monitoring::services { 'Prometheus': [20:16:45] we have this [20:16:56] oh hold on https://monitoring.wikitide.net/dashboard#!/monitoring/service/show?host=prometheus151&service=prometheus151%20Prometheus [20:18:06] I am fixing now. 
[20:19:56] it was hidden that's why i didn't see it (behind the other alerts) [20:22:16] PROBLEM - db161 Current Load on db161 is WARNING: LOAD WARNING - total load average: 5.39, 10.25, 9.15 [20:22:38] [02puppet] 07Universal-Omega pushed 1 new commit to 03main 13https://github.com/miraheze/puppet/commit/659d54a5d0db3b26fac8cb4d6b9ead74e9a4d657 [20:22:39] 02puppet/03main 07CosmicAlpha 03659d54a prometheus: fix labeldrop [20:24:16] RECOVERY - db161 Current Load on db161 is OK: LOAD OK - total load average: 4.25, 8.68, 8.74 [20:25:19] RECOVERY - prometheus151 Prometheus on prometheus151 is OK: TCP OK - 0.000 second response time on 10.0.15.116 port 9090 [20:25:24] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Exec[prometheus-reload] [20:25:56] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [20:29:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772224177127[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772220410000&orgId=1&to=1772224010000[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772220410000&orgId=1&to=1772224010000[Grafana] FIRING: [20:29:37] ly high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772224177127[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772224177127[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179310000&orgId=1&to=1772224070000[Grafana] FIRING: The med [20:29:37] Queue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772220380000&orgId=1&to=1772224177127 [20:30:08] it's oom'ing @cosmicalpha [20:30:40] just frozen [20:31:32] oh nvm [20:31:42] seems the high load was unrelated and the freezing? [20:31:50] last oom was at 9am [20:32:02] which is around the time the thing stopped working [20:33:00] keeps freezing [20:33:36] you'll have to maybe drop MW using the statsd [20:33:38] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:33:47] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:34:07] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:34:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772224477128[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772224477128[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1X [20:34:37] 72179860000&orgId=1&to=1772224477128[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772220530000&orgId=1&to=1772224430000[Grafana] RESOLVED: DatasourceNoData https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772220380000&orgId=1&to=1772224220000 [20:34:50] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:34:59] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:35:09] PROBLEM - prometheus151 ferm_active on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:36:33] @cosmicalpha ^ [20:36:42] PROBLEM - prometheus151 conntrack_table_size on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:37:05] RECOVERY - prometheus151 ferm_active on prometheus151 is OK: OK ferm input default policy is set [20:37:07] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 1 packages available for upgrade (0 critical updates). [20:37:07] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures [20:37:47] Looking more now. [20:38:29] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.289 seconds response time. 
prometheus151.fsslc.wtnet returns 10.0.15.116 [20:38:34] ok it OOM'd [20:38:40] RECOVERY - prometheus151 conntrack_table_size on prometheus151 is OK: OK: nf_conntrack is 0 % full [20:39:01] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 1.68, 0.53, 0.19 [20:39:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772224777128[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221130000&orgId=1&to=1772224730000[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772220890000&orgId=1&to=1772224730000[Grafana] FIRING: [20:39:37] ly high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772224777129[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772224777129[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikiti [20:39:37] txbP1Xnk?from=1772220530000&orgId=1&to=1772224777129[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221100000&orgId=1&to=1772224777129 [20:39:49] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [20:42:45] PROBLEM - llm191 Puppet on llm191 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Service[openwebui] [20:44:06] PROBLEM - prometheus151 SSH on prometheus151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:44:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772225077129[Grafana] RESOLVED: PHP-FPM Worker Usage High https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221130000&orgId=1&to=1772224730000[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221130000&orgId=1&to=1772224730000[Grafana] FIRING: [20:44:37] ly high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772225077129[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772225077129[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221220000&orgId=1&to=1772225030000[Grafana] RESOLVED: Datas [20:44:37] a https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221100000&orgId=1&to=1772224910000 [20:44:58] PROBLEM - prometheus151 PowerDNS Recursor on prometheus151 is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:45:50] PROBLEM - prometheus151 Current Load on prometheus151 is CRITICAL: LOAD CRITICAL - total load average: 21.30, 12.43, 5.46 [20:45:59] PROBLEM - prometheus151 APT on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:46:00] PROBLEM - prometheus151 Puppet on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. [20:47:47] PROBLEM - prometheus151 conntrack_table_size on prometheus151 is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds. 
[20:49:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772225377130[Grafana] FIRING: There has been a rise in the MediaWiki exception rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221610000&orgId=1&to=1772225377130[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/Gt [20:49:37] om=1772179860000&orgId=1&to=1772225377130[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772225377130[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221220000&orgId=1&to=1772225300000 [20:51:42] https://github.com/wikimedia/operations-puppet/commit/922eb63c9ba61208f5598f886802be7be68841b2 - maybe this? but idk [20:51:42] [GitHub] [wikimedia/operations-puppet] david-caro: prometheus: add memorymax parameter… | 5 changes in 3 files | Authored 5 months, 9 days ago | Committed 5 months, 9 days ago [20:51:50] RECOVERY - prometheus151 Current Load on prometheus151 is OK: LOAD OK - total load average: 0.56, 0.12, 0.04 [20:52:11] RECOVERY - prometheus151 SSH on prometheus151 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [20:52:44] RECOVERY - prometheus151 conntrack_table_size on prometheus151 is OK: OK: nf_conntrack is 0 % full [20:53:14] RECOVERY - prometheus151 PowerDNS Recursor on prometheus151 is OK: DNS OK: 0.085 seconds response time. prometheus151.fsslc.wtnet returns 10.0.15.116 [20:53:31] RECOVERY - prometheus151 APT on prometheus151 is OK: APT OK: 1 packages available for upgrade (0 critical updates). 
[20:53:31] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 27 minutes ago with 0 failures [20:54:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772225677130[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221610000&orgId=1&to=1772225570000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000& [20:54:37] =1772225677131[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772225677131[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221220000&orgId=1&to=1772225677131[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by mo [20:54:37] 0 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221940000&orgId=1&to=1772225677131 [20:57:32] PROBLEM - prometheus151 Puppet on prometheus151 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 3 minutes ago with 0 failures [20:59:11] i added the max mem 90% temp to prometheus (local hack) to see how that helps but it's trying to use all the ram @cosmicalpha [20:59:37] [Grafana] FIRING: The mediawiki job queue has more than 500,000 unclaimed jobs https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772225977131[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772222090000&orgId=1&to=1772225690000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000& [20:59:37] =1772225977131[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772225977131[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221220000&orgId=1&to=1772225750000[Grafana] RESOLVED: DatasourceNoData https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772221940000&orgId=1&to=1772225690000 [21:04:37] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772182910000&orgId=1&to=1772226110000[Grafana] RESOLVED: MediaWiki Exception Rate https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772222090000&orgId=1&to=1772225690000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772226277132[Grafana] FI [21:04:37] diaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772226277132 [21:06:13] [02puppet] 07paladox created 03paladox-patch-1 (+1 new commit) 13https://github.com/miraheze/puppet/commit/bc647959b3ad [21:06:13] 02puppet/03paladox-patch-1 07paladox 03bc64795 prometheus: Set MemoryMax to 90% in systemd service file [21:06:16] [02puppet] 07paladox opened pull request #4801: prometheus: Set MemoryMax to 90% in systemd service file (03main...03paladox-patch-1) 13https://github.com/miraheze/puppet/pull/4801 [21:10:33] [02puppet] 07paladox merged pull request #4801: prometheus: Set MemoryMax to 90% in systemd service file (03main...03paladox-patch-1) 13https://github.com/miraheze/puppet/pull/4801 [21:10:33] [02puppet] 07paladox pushed 1 new commit to 03main 13https://github.com/miraheze/puppet/commit/6c24f9e691402ec4b9f53b1393334a4917548738 [21:10:33] 02puppet/03main 07paladox 036c24f9e prometheus: Set MemoryMax to 90% in systemd service file (#4801) [21:10:34] [02puppet] 07paladox 
04deleted 03paladox-patch-1 at 03bc64795 13https://api.github.com/repos/miraheze/puppet/commit/bc64795 [21:10:44] for now i've done https://github.com/miraheze/puppet/pull/4801 @cosmicalpha [21:10:44] [GitHub] [miraheze/puppet #4801] merged PR by paladox, created 4 minutes, 30 seconds ago: prometheus: Set MemoryMax to 90% in systemd service file | (empty comment) [21:10:53] RECOVERY - llm191 Puppet on llm191 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:10:54] may want to see why it's trying to use all the RAM [21:11:02] and then revert this if you find the fix [21:11:26] RECOVERY - prometheus151 Puppet on prometheus151 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:13:48] yep works with me. Still looking into it. [21:14:37] [Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772226877133[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772226877133[Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time p [21:14:37] s://grafana.wikitide.net/d/GtxbP1Xnk?from=1772223200000&orgId=1&to=1772226877133 [21:19:17] hmm i wonder if it causes grafana webpage not to load if it uses all the max mem we give to it @cosmicalpha [21:19:28] it's currently just stalled loading and it says it has 22mb left [21:19:37] [Grafana] RESOLVED: High Job Queue Backlog https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772223410000&orgId=1&to=1772227010000[Grafana] FIRING: An unusually high number of threats are being reported by CloudFlare! 
https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772227177133[Grafana] FIRING: A MediaWiki pool is sick according to CloudFlare https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=177 [21:19:37] [Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772223200000&orgId=1&to=1772226920000 [21:20:02] If prometheus151 is unresponsive it can but not if just OOM I think? [21:20:12] seems to have been [21:20:21] no high load, no OOMs [21:20:47] and like when it's an OOM the page loads, but shows a refresh icon spinning. This is just a white page and stalled loading [21:21:52] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 1 backends are down. mw173 [21:22:10] PROBLEM - mw173 SSH on mw173 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:22:39] PROBLEM - cp171 Varnish Backends on cp171 is CRITICAL: 1 backends are down. mw173 [21:23:24] PROBLEM - mw173 Current Load on mw173 is WARNING: LOAD WARNING - total load average: 20.08, 22.35, 12.58 [21:23:52] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy [21:24:09] RECOVERY - mw173 SSH on mw173 is OK: SSH OK - OpenSSH_10.0p2 (protocol 2.0) [21:24:39] RECOVERY - cp171 Varnish Backends on cp171 is OK: All 31 backends are healthy [21:25:24] RECOVERY - mw173 Current Load on mw173 is OK: LOAD OK - total load average: 5.23, 15.96, 11.43 [21:34:37] [Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772227860000[Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772179860000&orgId=1&to=1772227860000 [21:38:37] [02ManageWiki] 07lihaohong6 opened pull request #774: T14616: Index the canonical name of extensions (07miraheze:03main...07lihaohong6:03index) 13https://github.com/miraheze/ManageWiki/pull/774 [21:44:46] miraheze/ManageWiki - lihaohong6 the build passed.
[21:47:39] PROBLEM - puppet181 Puppet on puppet181 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki-patches-private] [21:49:10] miraheze/ManageWiki - lihaohong6 the build passed. [22:00:37] [02ssl] 07WikiTideBot pushed 1 new commit to 03main 13https://github.com/miraheze/ssl/commit/784d5b4b32accaae0c67b15424c5c970b6a494a5 [22:00:37] 02ssl/03main 07WikiTideBot 03784d5b4 Bot: Auto-update domain lists [22:13:39] RECOVERY - puppet181 Puppet on puppet181 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [22:34:01] !log [skye@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php importImages --wiki=gunvoltwiki --sleep=1 '--comment=Importing images from https://azurestrikergunvolt.fandom.com/wiki/Azure_Striker_Gunvolt_Wiki ([[phorge:T15031|T15031]])' -- . (END - exit=0) [22:34:02] !log [skye@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php initSiteStats --wiki=gunvoltwiki --update (START) [22:34:03] !log [skye@mwtask171] sudo -u www-data php /srv/mediawiki/1.45/maintenance/run.php initSiteStats --wiki=gunvoltwiki --update (END - exit=0) [22:34:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:34:04] !log [skye@mwtask171] Finished import for gunvoltwiki (XML: None; Images: .) 
(END - exit=0) [22:34:05] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:34:07] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [22:34:09] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log [23:18:29] [Grafana] FIRING: The mediawiki JobQueue backlog is increasing by more than 100 jobs a minute over an extended time period https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772230670000&orgId=1&to=1772234309162 [23:23:29] [Grafana] RESOLVED: DatasourceError https://grafana.wikitide.net/d/GtxbP1Xnk?from=1772230670000&orgId=1&to=1772234360000 [23:53:16] PROBLEM - cp201 Varnish Backends on cp201 is CRITICAL: 1 backends are down. mw183 [23:53:37] [02puppet] 07pskyechology drafted pull request #4802: T15032: Add WWR to bastion and mediawiki-test-admins (07miraheze:03main...07pskyechology:03T15032) 13https://github.com/miraheze/puppet/pull/4802 [23:53:52] PROBLEM - cp191 Varnish Backends on cp191 is CRITICAL: 1 backends are down. mw183 [23:54:04] PROBLEM - mw183 Current Load on mw183 is CRITICAL: LOAD CRITICAL - total load average: 27.90, 21.65, 11.01 [23:55:11] RECOVERY - cp201 Varnish Backends on cp201 is OK: All 31 backends are healthy [23:55:52] RECOVERY - cp191 Varnish Backends on cp191 is OK: All 31 backends are healthy [23:56:02] RECOVERY - mw183 Current Load on mw183 is OK: LOAD OK - total load average: 6.98, 15.71, 10.07