[00:36:47] notconfusing: hey!
[00:36:48] around?
[00:43:37] yuvipanda yep
[00:43:55] notconfusing: which webservice didn't come back up? also the jobs, were they continuous?
[00:44:01] continuous jobs get rescheduled
[00:44:15] ok, but webservice start didn't come back up
[00:44:40] good to know about continuous jobs though
[00:45:04] actually one of the things i have is that in the webservice page i have a button which can submit the job to queue
[00:45:16] so that users can restart the bot if they need to
[00:45:28] so the most important thing is webservice
[00:48:29] notconfusing: which webservice is this?
[00:48:34] notconfusing: they should've automatically come up
[00:48:46] lighttpd
[00:48:55] notconfusing: no, name of the tool
[00:49:00] recitation-bot
[00:49:50] yuvipanda|maybe, it's also totally possible that these going down is incidental to the last outage
[00:51:13] notconfusing: did you start it back up manually?
[00:53:37] yuvipanda|maybe, yes
[00:54:02] notconfusing: hmm, so normally that shouldn't happen. I'm going to kill your webgrid manually, it should be back up in 10s, let me check
[00:54:30] and it just did
[01:01:21] yuvipanda|maybe ok, mysterious... something else must have happened
[01:02:00] notconfusing: yeah :|
[01:02:40] what happens if the entire system that's powering labs and the SGE goes down?
[01:02:47] has that ever happened?
[01:05:06] yeah, SGE keeps state in the file system
[01:05:09] so it should come back up too
[01:05:42] notconfusing: it's possible that yesterday's outage took down the systems that keep the webservices restarting and you restarted it manually before they could kick in?
[01:16:18] yuvipand_ ok that must have just been it then
[01:18:05] notconfusing: yeah.
[01:18:26] notconfusing: sorry about that - they were redundant but require manual failover which didn't happen this time, I think
[01:18:48] yuvipand_ thanks for figuring it out with me
[01:19:04] notconfusing: yw!
also in the future, the labs-l list is probably going to help more than wikitech-l
[15:12:11] Quarry, Analytics-Kanban: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1296360 (Milimetric) a:Milimetric>None