[09:00:14] !log urbanecm@tools-bastion-13 tools.sal $ mv service.template service.template.bak # seemingly invalid file, argument ACTION: invalid choice
[09:00:23] !log urbanecm@tools-bastion-13 tools.sal $ webservice restart # tool was down
[09:00:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[09:00:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[10:58:06] urbanecm: uh oh, that sounds like potential fallout from https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/59 😬 could you create a phabricator task with the error you saw?
[10:58:13] (re sal tool’s service.template)
[11:01:26] lucaswerkmeister: ^ are you looking into that? I can give it a lookt if you don't have time
[11:01:29] *look
[11:08:22] not looking at the moment, no
[11:09:43] ack, I'll give it a try then
[11:14:36] dcaro: lucaswerkmeister : filled T379903. it was not a _permission_ error though, just a syntax one
[11:14:38] T379903: webservice restart suddenly stopped working - https://phabricator.wikimedia.org/T379903
[11:16:14] fwiw, that file did not get recreated
[11:16:59] thanks, yep, that file is created by hand only
[11:17:57] it sets the "defaults" when running `toolforge webservice` so you don't have to pass the options every time (like `--health-check-path`)
[11:18:16] ah, gotcha. so i basically deleted the health check, good to know
[11:18:50] the health check didn’t really work properly anyway so not much of value was lost there I’d say
[11:18:55] (though I never figured out why it didn’t work)
[11:19:27] it actually is still there, as restart does not reset that option
[11:19:32] https://www.irccloud.com/pastebin/IsVlHtk3/
[11:19:55] restart uses the `service.manifest` stored data (that one is created automatically)
[11:20:22] (it's a rather clever setup)
[15:15:21] !log lucaswerkmeister@tools-bastion-13 tools.codex-playground deployed 4918453fd3 (Codex 1.16.0)
[15:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.codex-playground/SAL
[16:41:42] !log bd808@tools-bastion-12 tools.sal Restored service.template; changed health check to hit a PHP page
[16:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[16:54:53] ohhh
[16:54:55] that might help, yeah…
[16:59:26] @lucaswerkmeister: my hunch is that most lockups of any lighttpd+php webservice are at some level caused by the php-fcgi process running out of threads or lighttpd losing contact with the process.
[16:59:40] ok, so toolinfo.json could still have been served…
[16:59:46] I didn’t consider that, good point
[16:59:55] I just wanted to keep the health check lightweight ^^
[17:00:25] I have it hitting a page that is just ` hehe, nice
[17:00:50] still better than hitting elasticsearch for one of the real pages
[17:02:38] yeah, a full end-to-end test as a health check would probably cause more harm than good
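The exchange around 11:17 to 11:20 explains why moving `service.template` aside did not drop the health check: the hand-written template only supplies defaults when the webservice is started, while `webservice restart` reuses the settings recorded in the automatically created `service.manifest`. The sketch below models that precedence as plain dictionaries. It is an illustration under stated assumptions, not the tools-webservice implementation: the function names, the dict-based representation, and the `/health.php` value are hypothetical; only the `health-check-path` option name comes from the log.

```python
from typing import Optional

# Hypothetical model of the precedence described in the log; this is NOT the
# real tools-webservice code. Function names and the "/health.php" value are
# illustrative assumptions.


def start_options(cli_args: dict, template: Optional[dict]) -> dict:
    """Options used at start: service.template defaults, overridden by CLI args.

    CLI arguments left unset (None) do not override the template defaults.
    A missing template (None) simply contributes no defaults.
    """
    options = dict(template or {})
    options.update({key: value for key, value in cli_args.items() if value is not None})
    return options


def restart_options(manifest: dict) -> dict:
    """Options used at restart: whatever was stored in the manifest at start time.

    The template is not consulted here, so deleting or renaming it later
    has no effect on a restart.
    """
    return dict(manifest)


if __name__ == "__main__":
    template = {"health-check-path": "/health.php"}  # hypothetical defaults
    manifest = start_options({"mem": None}, template)
    # Moving service.template aside afterwards does not touch the stored
    # manifest, so a restart still carries the health check path.
    print(restart_options(manifest))  # {'health-check-path': '/health.php'}
```

Keeping restart independent of the template is what makes the setup robust to a broken or missing template, at the cost that editing the template has no effect until the next fresh start.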