[01:20:18] Majavah: to look at the logs for the beta cluster, I previously used "$ ssh deployment-fluorine02.deployment-prep.eqiad.wmflabs" but I understand you've been working on changing the logs for beta? That command no longer works for me - is there an updated url I can use to see the logs?
[01:33:29] that's not a uri, it's a hostname
[01:33:40] /FQDN
[01:33:51] deployment-mwlog01
[01:34:23] s/uri/url/
[02:11:49] (PS1) 20after4: Fix a few typescript warnings. [releng/phatality] - https://gerrit.wikimedia.org/r/671367
[02:38:50] Continuous-Integration-Config, Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, and 6 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (DannyS712) I forgot about this task when...
[05:02:30] DannyS712: use `ssh deployment-mwlog01.deployment-prep.eqiad1.wikimedia.cloud`, and if it doesn't work, make sure your .ssh/config is configured for .wikimedia.cloud FQDNs using the wikitech tutorial to access Cloud VPS machines
[12:37:41] Beta-Cluster-Infrastructure: Beta SWIFT seems to be broken - https://phabricator.wikimedia.org/T276179 (AlexisJazz) Trying to view https://upload.beta.wmflabs.org/wikipedia/en/f/fb/Green_Park_tube_station.jpeg using https://hidester.com/proxy/ results in a form: >401 >Authorization Required >The site https:...
[17:00:28] has someone around here broken gerrit
[17:01:31] See -operations, Ree.dy knows
[17:02:39] PROBLEM - Gerrit Health Check on gerrit.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus
[17:02:45] PROBLEM - Gerrit JSON on gerrit.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring
[17:03:05] slow icinga-wm is slow
[17:03:32] Gerrit, Release-Engineering-Team: Gerrit Apache out of workers - https://phabricator.wikimedia.org/T277127 (Majavah) This is happening again :-(
[17:03:54] Gerrit, Release-Engineering-Team: Gerrit Apache out of workers - https://phabricator.wikimedia.org/T277127 (Majavah)
[17:05:21] from that task: "I have notice there is some bot from WMCS which hammers gitiles as fast as it can. The requests are using format=TEXT which emits some base64 payload" :(
[17:05:23] hmm
[17:05:30] gerrit down it seems
[17:05:41] twentyafterfour: yeah, we noticed before icinga :P
[17:05:53] nvm my last comment, just cherry picking things to read, seems to have other causes as well
[17:06:10] * twentyafterfour will take a look
[17:06:18] thanks!
[17:07:06] not overloaded, the machine looks idle
[17:07:37] twentyafterfour: https://grafana.wikimedia.org/d/L0-l1o0Mz/apache?orgId=1&refresh=1m&var-host=gerrit1001&var-port=9117 and https://phabricator.wikimedia.org/T277127 are probably related
[17:07:49] that dashboard says that apache2 is running out of workers
[17:10:08] !log restart apache on gerrit1001
[17:10:11] RECOVERY - Gerrit JSON on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 21661 bytes in 0.040 second response time https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring
[17:10:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:11:05] that isn't really a fix but maybe it'll keep things up for now
[17:12:09] thanks for the links Majavah, that helped a lot :)
[17:12:35] RECOVERY - Gerrit Health Check on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 904 bytes in 0.028 second response time https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus
[17:19:03] Gerrit, Release-Engineering-Team: Gerrit Apache out of workers - https://phabricator.wikimedia.org/T277127 (mmodell) Restarting `apache2` service on `gerrit1001` resolved the immediate problem, however, it's likely to recur if we don't address the root cause. I might be able to do something to reduce th...
[22:02:59] phabricator maintenance bot, User-Ladsgroup: Remove #Patch-For-Review when patch is abandoned in Gerrit - https://phabricator.wikimedia.org/T276390 (Ladsgroup)
[22:07:28] Phabricator, Project-Admins, phabricator maintenance bot (legacy), User-Ladsgroup: #phabricator_maintenance_bot should be under #tools, not #toolforge - https://phabricator.wikimedia.org/T273273 (Ladsgroup) Open→Resolved a:Ladsgroup I made a new project and archived the old one. I mov...
[22:13:11] phabricator maintenance bot: Add a VPS-project-Wikistats tasks when creating a new project - https://phabricator.wikimedia.org/T270547 (Ladsgroup)
[22:15:11] Phabricator, Project-Admins, phabricator maintenance bot (legacy), User-Ladsgroup: #phabricator_maintenance_bot should be under #tools, not #toolforge - https://phabricator.wikimedia.org/T273273 (bd808) Thank you @Ladsgroup
[22:33:40] Gerrit, Release-Engineering-Team: Gerrit Apache out of workers - https://phabricator.wikimedia.org/T277127 (hashar) Thanks for the restart @mmodell! The Apache view in Grafana: {F34158565}
[22:43:31] Gerrit, Release-Engineering-Team: Gerrit Apache out of workers - https://phabricator.wikimedia.org/T277127 (mmodell) A possible cause or contributing factor for the extra high load on gerrit could be this: Periodically (maybe a couple of times per year?) Phabricator gets hit by a spider that doesn't hav...
[23:31:55] Gerrit, Release-Engineering-Team: Gerrit Apache out of workers - https://phabricator.wikimedia.org/T277127 (Legoktm) >>! In T277127#6904496, @hashar wrote: > I can't tell what went wrong. I have notice there is some bot from WMCS which hammers gitiles as fast as it can. The requests are using `format=TE...
[23:43:16] Release-Engineering-Team (Unit & Int & System Tooling), Pywikibot: Lint test to match function signature and documentation - https://phabricator.wikimedia.org/T277396 (Huji)