[00:01:10] Hi, can help me?
[00:01:17] White_Master: with?
[00:02:35] A problem with commands on WinSCP in tools-login
[00:02:43] (Sorry for my bad english)
[00:02:54] Ryan_Lane, ^
[00:04:54] well, I don't really know winscp, but maybe I can help. which commands?
[00:05:24] maintainer@tools-login:~$ become toolaccount
[00:05:41] Error: -bash: line 6: maintainer@tools-login:~$: command not found
[00:05:56] maintainer@tools-login:~$
[00:05:57] ^^
[00:06:01] that's not a command
[00:06:37] Ryan_Lane, huh.
[00:06:48] it's an example of a user's prompt (user maintainer, on tools-login, in current directory ~ which is the home directory)
[00:07:12] the $ says this is a normal user, rather than root
[00:07:30] become is the command
[00:07:53] toolaccount is a placeholder for your tool account
[00:08:26] so, if you made a tool user called catscan, you'd run: become catscan
[00:09:50] Oh, perfect.
[00:10:46] Ryan_Lane, However, when trying to run a command, the terminal does not respond and finally jumps an error message indicated that the server is not responding
[00:12:42] White_Master: you're using putty, right? you don't want to use winscp for this
[00:12:49] winscp is for copying files
[00:25:07] Ryan_Lane, for this: Disconnected: No supported authentication methods available (server sent: publickey)
[00:26:11] did you add your public ssh key to your wikitech preferences?
[00:27:40] Ryan_Lane, yes.
[00:42:13] !ping
[00:42:13] !pong
[00:43:38] !ping
[00:43:38] !pong
[00:44:01] labs-morebots: ...
[00:44:01] I am a logbot running on tools-exec-06.
[00:44:01] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[00:44:01] To log a message, type !log <msg>.
[00:44:06] * Ryan_Lane sighs
[00:46:09] !ping
[00:46:10] !pong
[00:46:15] \o/
[00:47:05] Ryan_Lane, couldn't connect to "tools-login.wmflabs.org" reason: "Connection timed out (110)"
[00:47:07] Pff.
[01:01:17] Coren, ping
[01:56:26] Any can help me?
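The prompt explanation at 00:06:48 can be made concrete with a short Python sketch. It splits a line copied from the documentation into its prompt part (user, host, current directory, and `$` for a normal user or `#` for root) and the command that should actually be typed. It assumes the default `user@host:cwd$` bash prompt shape seen in the log. `split_prompt` and `PromptLine` are hypothetical names, not part of any Tool Labs tooling.

```python
import re
from typing import NamedTuple, Optional

# Default bash prompt shape: user@host:cwd followed by $ (normal user) or # (root),
# then whatever the user typed after it.
PROMPT_RE = re.compile(
    r"^(?P<user>[^@\s]+)@(?P<host>[^:\s]+):(?P<cwd>[^$#\s]*)(?P<sigil>[$#])\s*(?P<command>.*)$"
)


class PromptLine(NamedTuple):
    user: str
    host: str
    cwd: str
    is_root: bool
    command: str


def split_prompt(line: str) -> Optional[PromptLine]:
    """Return the prompt parts and the real command, or None if there is no prompt."""
    m = PROMPT_RE.match(line.strip())
    if m is None:
        return None
    return PromptLine(m["user"], m["host"], m["cwd"], m["sigil"] == "#", m["command"])
```

Only the `command` field belongs on the command line (here `become toolaccount`, with `toolaccount` replaced by the real tool name). Pasting the whole line is what produced the `command not found` error in the log.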
[02:35:54] I've just been testing the new account in tools
[02:35:54] An IRC bot, then to try and "get out" of the terminal, the bot does not stay connected. there a way to keep active bots/tools that are configured in tools?
[03:20:47] Coren: not reading the scrollback now but maybe you want to look at docker
[03:21:23] White_Master: sounds like you need nohup. and maybe a basic unix tutorial. sorry i can't give you one right now
[06:00:31] Is there any easy trick to debug HTTP/500's with cgi scripts? There seems to be no general error log available...
[06:00:53] valhallasw: if it's python, import cgitb; cgitb.enable()
[06:01:07] legoktm: let me try that...
[06:02:31] [bz] (UNCONFIRMED - created by: bgwhite, priority: Unprioritized - critical) [Bug 55498] Webserver is down - https://bugzilla.wikimedia.org/show_bug.cgi?id=55498
[06:02:55] valhallasw: also that ^ maybe
[06:03:02] -_-'
[06:03:14] OK, good.
[06:04:35] At least my app works using the built-in flask webserver
[06:04:38] result: https://gerrit.wikimedia.org/r/#/c/88688/
[06:04:54] web-based patch uploader \o/
[06:04:54] ooh :D
[06:07:57] https://github.com/valhallasw/gerrit-patch-uploader/blob/master/app.py
[06:08:09] it's not going to win any prizes for beauty, soon, but hey, it works.
[06:13:45] and it should be available at http://tools.wmflabs.org/gerrit-reviewer-bot/patchuploader/index.py once the web server clears. Won't be the final URL, though - I should make a new project for it.
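legoktm's `cgitb` tip was the usual answer in 2013, but `cgitb` was deprecated in Python 3.11 and removed in 3.13 (PEP 594). A rough stdlib-only stand-in for debugging the HTTP 500s valhallasw describes follows. It assumes a plain CGI script that writes its whole response to stdout, and wraps the script's entry point so that any uncaught exception is reported as a page containing the traceback. `run_cgi` and `example_handler` are illustrative names, not an existing API. Like `cgitb`, this shows internals to whoever hits the error, so it is for debugging only.

```python
import html
import io
import sys
import traceback
from typing import Callable, TextIO


def run_cgi(handler: Callable[[TextIO], None], out: TextIO = sys.stdout) -> bool:
    """Run a CGI handler, turning an uncaught exception into a readable 500 page.

    The handler writes its complete response (headers, blank line, body) into a
    buffer. Only if it finishes cleanly is the buffer copied to `out`; otherwise
    the partial output is discarded and a page with the traceback is sent
    instead. Returns True on success, False if the handler raised.
    """
    buffer = io.StringIO()
    try:
        handler(buffer)
    except Exception:
        tb = traceback.format_exc()
        sys.stderr.write(tb)  # still reaches the server's error log, if there is one
        out.write("Status: 500 Internal Server Error\r\n")
        out.write("Content-Type: text/html; charset=utf-8\r\n\r\n")
        out.write("<h1>CGI script error</h1>\n<pre>" + html.escape(tb) + "</pre>\n")
        return False
    out.write(buffer.getvalue())
    return True


def example_handler(out: TextIO) -> None:
    # Hypothetical script body; the real application goes here.
    out.write("Content-Type: text/plain\r\n\r\n")
    out.write("hello from cgi\n")
```

In a script such as `index.py`, the last line would be `run_cgi(main)`, with `main` being the script's own handler. Buffering the output is what keeps a half-written response from being mixed with the error page.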
[09:30:14] [bz] (PATCH_TO_REVIEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 38524] [OPS] puppet has undefined function get_var - https://bugzilla.wikimedia.org/show_bug.cgi?id=38524
[10:17:22] @replag
[10:17:23] Replication lag is approximately 00:00:01.3443900
[10:31:53] [bz] (UNCONFIRMED - created by: bgwhite, priority: Unprioritized - critical) [Bug 55498] Webserver is down - https://bugzilla.wikimedia.org/show_bug.cgi?id=55498
[11:45:41] Tools webserver is giving Error 500 on php-sites again.
[11:48:21] [Bug 55498] https://bugzilla.wikimedia.org/show_bug.cgi?id=55498
[12:17:24] legoktm: remember my request?
[13:29:24] petan: around?
[13:29:32] yep
[13:29:39] any chance you can fix webserver-01
[13:29:45] let me check
[13:29:54] or at least switch everything that is using it onto the other 2 webservers?
[13:30:05] -01 seems to be the only one throwing error 500
[13:30:48] it doesn't to me :/
[13:31:35] interesting... It was before my meeting :P But isn't any more
[13:32:04] aha
[13:32:56] I was just looking into this. Logs show there was a burst of activity with glamtools that lasted a couple of hours.
[13:33:18] Some of the glamtools processes were up for, literally, hours.
[13:33:33] ahh Coren you're here also :>
[13:33:58] I wonder if there would be a way to slightly automate the distribution of load across the webservers
[13:35:23] I just got here.
[13:36:07] addshore: That'd just make the problem worse, IMO, when a tool is misbehaving. I'm looking into partitioning resources per tool; probably with vhosts.
[13:37:58] hmm indeed
[13:38:44] That's going to require a *lot* of manual tweaking over time though.
[13:39:00] (Which is why I really didn't favor that option at first)
[13:40:14] That, or Ryan suggested per-tool webservers in containers (which would work great), but that'll need Yuvi's proxy to work right.
[13:42:38] [bz] (ASSIGNED - created by: Beta16, priority: Unprioritized - normal) [Bug 54962] Missing or wrong information in meta_p.wiki table - https://bugzilla.wikimedia.org/show_bug.cgi?id=54962
[15:08:46] WHOOOOOO
[15:08:47] addshore: ok, http://tools.wmflabs.org/gerrit-reviewer-bot/patchuploader/ is working now :-)
[15:08:52] https://gerrit.wikimedia.org/r/#/c/88749/
[15:08:54] *victory dance*
[15:10:19] it should add the commiter as reviewer, though
[15:12:00] +t
[15:32:14] xD
[15:40:27] [bz] (RESOLVED - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 38524] [OPS] puppet has undefined function get_var - https://bugzilla.wikimedia.org/show_bug.cgi?id=38524
[15:46:35] Oh! I have a *much* easier solution I hadn't even considered!
[15:46:48] Kill *all* the things?
[15:46:49] Every CGI invocation goes through the one script; I can choke there.
[15:47:16] per-uid, and put ulimits.
[15:47:50] That might help. Do you have any idea why the toolserver did not encounter this issue?
[15:48:16] I think it uses fcgi for most things, but it should not be too different otherwise
[15:48:38] I'm not privy to their configuration; I expect they put in place some limits as well, one way or another. Chances are, they ran into exactly the same problem themselves in the elder days.
[15:50:41] IIRC processes on the Toolserver were limited to 1G of memory, for instance, to prevent runaway memory usage
[15:51:33] they had a daemon killing processes that went over
[15:51:56] Yeah, I got something similar for the grid, it's really just the httpd that's an issue right now.
[15:52:25] Any tool can consume all the worker threads and hold them, causing the others to starve and 500 out.
[15:52:42] Apache isn't all that friendly to per-tenant worker thread allocation.
[15:53:10] (There's a module for it, but it's not part of core and doesn't seem all that well-supported)
[15:53:37] Nor all that flexible.
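The idea at 15:46 to 15:47 is to choke every CGI invocation at the one wrapper script, per uid, with ulimits. It can be sketched in Python with the standard `resource` and `subprocess` modules (Unix only). The limits are applied in the child between `fork` and `exec`, so they cover only the script being launched. The numbers are placeholders, loosely based on the 1G Toolserver memory cap mentioned in the log. `run_limited` and `DEFAULT_LIMITS` are hypothetical names. A real wrapper would also switch to the tool's uid first (suexec-style), which needs root and is left out here. The per-uid part comes from `RLIMIT_NPROC`, which the kernel counts across all processes of the real user ID.

```python
import resource
import subprocess
from typing import Dict, List, Optional

# Placeholder limits for one CGI invocation (hypothetical values).
DEFAULT_LIMITS: Dict[int, int] = {
    resource.RLIMIT_CPU: 30,          # CPU seconds before the kernel signals the script
    resource.RLIMIT_AS: 1024 ** 3,    # 1 GiB of address space, like the Toolserver cap
    resource.RLIMIT_NPROC: 32,        # counted per real uid, not per request
}


def _apply_limits(limits: Dict[int, int]) -> None:
    """Set each limit to `value`, never above the existing hard limit (runs in the child)."""
    for res, value in limits.items():
        _soft, hard = resource.getrlimit(res)
        # An unprivileged process may lower its hard limit but never raise it.
        cap = value if hard == resource.RLIM_INFINITY else min(value, hard)
        resource.setrlimit(res, (cap, cap))


def run_limited(argv: List[str],
                limits: Optional[Dict[int, int]] = None,
                timeout: Optional[float] = None) -> subprocess.CompletedProcess:
    """Run one CGI script under resource limits and capture its output."""
    chosen = DEFAULT_LIMITS if limits is None else limits
    return subprocess.run(
        argv,
        preexec_fn=lambda: _apply_limits(chosen),  # applied after fork, before exec
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

This only bounds what each script can consume. It does not by itself stop a slow tool from holding Apache worker threads, which is the starvation problem described at 15:52.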
[15:55:00] Another alternative is to use a light httpd in a container, per tool.
[15:56:56] * Coren ponders.
[15:57:36] Actually, I don't even need containers. Hm.
[15:57:40] I haz a inspiration!
[15:57:49] Coren: I have seen hosts do lighttpd + nginx as reverse proxy. It has the added advantage (or disadvantage, depending on your POV) that users can change the server settings
[15:58:28] I know *exactly* what I'm going to do.
[15:58:36] And this gives us FCGI for free.
[16:03:12] The only trickery will be making it backward compatible.
[16:04:08] And getting people to use the new fancy method and not the then-deprecated method ;-)
[16:04:30] No, that's what I want to avoid; it's going to be automagic.
[16:05:14] Wikimedia Labs | https://www.mediawiki.org/wiki/Wikimedia_Labs | Channel logs: https://bit.ly/11GZvbS | #wikimedia-labs-nagios | #wikimedia-labs-requests | #wikimedia-labs-offtopic | bots-bsql01 server is going to be deleted | Help: https://bit.ly/15vsyRj.
[16:05:26] 24. 9. 2013 is over
[16:06:43] Ooo. And that gives users their error logs to boot.
[16:07:07] * Coren invents a time machine, hoping to have done this that way in the first place.
[16:09:40] I simply start a suitably limited lighttpd per-tool on the grid (with some avoidance of empty public_htmls for efficiency), and use the proxy to point at the right place. Since I start the httpd myself, I automatically know for sure where it is and what its port is.
[16:11:03] The httpd itself runs with suitable ulimits, and as the tool user; so I can actually allow users to tweak its config if they want.
[16:13:42] I'm going to make it strictly elective at first; then deprecate the apaches and make a fully automatic system for the stragglers with non-empty public_htmls
[16:15:13] That is SO much better as a model. I wish I had thought of it first.
[16:21:08] <^d> :)
[16:28:55] bots-bsql01 server = deleted?
[16:33:26] Coren^^
[16:33:52] Steinsplitter: I know it was planned. The right person to ask for that is petan
[16:34:21] ok
[16:34:36] Ping: Petan (see above)
[18:32:33] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Highest - major) [Bug 48501] [OPS] beta: get SSL certificates - https://bugzilla.wikimedia.org/show_bug.cgi?id=48501
[18:43:34] ok, it's possible with some logging fiddling
[18:43:44] and now also documented: https://wikitech.wikimedia.org/wiki/Setting_up_Flask_cgi_app_as_a_tool#Debugging
[18:45:22] I really wish we had a logging service
[19:57:50] hey andrewbogott.
[19:58:01] just came back online temporarily. I'll probably be out for a bit more :(
[19:58:04] YuviPanda: step away from the keyboard!
[19:58:08] OK, glad to hear :)
[19:58:08] sorry!
[19:58:12] heh yea
[19:58:17] typing in one hand
[19:58:22] so not too bad
[19:58:23] I mean, not glad that you're injured, but glad you're taking time
[19:58:31] heh yeah
[19:58:53] andrewbogott: doc's pretty sure it is CTS. Tests tomorrow to confirm
[19:58:59] Did your days of rest help at all?
[19:59:04] Oh, damn.
[19:59:40] andrewbogott: on painkillers and sucj
[20:00:22] *such
[20:01:29] YuviPanda: https://www.youtube.com/watch?v=qm3HQsEQxa8#t=0m20s
[20:01:58] That's what I assume the CTS test looks like
[20:02:29] andrewbogott: heh. Electro something
[20:02:31] forgot its name
[20:39:07] Oooo. This new scheme is going 20 bazillion times better.
[21:14:20] * Coren dances around the channel.
[21:20:10] Coren, is everything fixed? Can we all take the next eight months off?
[21:20:32] No, but my new scheme is working so well I'm unreasonably pleased with myself. :-0
[21:21:20] * andrewbogott has high expectations
[21:36:55] * Nettrom sends Coren some virtual champagne for celebrations
[22:57:48] Ryan_Lane: I just went through and cleared out all the outstanding shell requests and noticed that there were three users with requests that don't actually exist -- any ideas on how that would happen?
[23:02:30] mwalker: yep
[23:02:44] mwalker: there's an issue with the shell request creation
[23:03:03] a user can fail to be created (like spam bots), but the shell request will still occur
[23:25:56] Hey guys, how do I add someone to a project?
[23:31:28] I figured it out, nvm, thanks
[23:32:25] Yeay! I haz a success!!
[23:32:47] So: new scheme for web services that is over 9000!! better.
[23:32:59] (opt-in atm, will eventually deprecate the current scheme)
[23:33:38] If you have a nonempty public_html, it creates a job on a special queue that starts a lighttpd just for you. :-)
[23:37:26] Bonus: fcgi for free. :-)
[23:41:42] Ryan_Lane: I expected some enthusiasm. ^^
[23:41:51] Extra bonus: users get their error log. :-)
[23:42:42] Booo! Nobody loves my incredible super mega improvement. QQ
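Coren's per-tool scheme starts a lighttpd only when `public_html` is non-empty, runs it as the tool user, and points the proxy at whatever host and port it was started on. The bookkeeping side of that might look roughly like the Python sketch below. Everything in it is an assumption for illustration: the function names, the `ROUTES` table, the php-cgi path and socket location, and the choice to strip the `/<tool>/` prefix before forwarding. It covers deciding whether a tool needs a server, picking a port, writing a minimal lighttpd config with its own error log and FastCGI for PHP, and mapping request paths to upstreams.

```python
import os
import socket
from typing import Dict, Optional, Tuple

# Hypothetical routing table: tool name -> (exec host, port) of its lighttpd.
# Whatever launches each server records where it ended up, so the proxy never guesses.
ROUTES: Dict[str, Tuple[str, int]] = {}


def needs_webservice(public_html: str) -> bool:
    """Only tools with a non-empty public_html get a server of their own."""
    if not os.path.isdir(public_html):
        return False
    with os.scandir(public_html) as entries:
        return any(True for _ in entries)


def pick_free_port() -> int:
    """Ask the kernel for an unused TCP port by binding to port 0.

    Another process could take the port before lighttpd binds it, so a real
    launcher would retry if the server fails to start.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


def lighttpd_config(tool: str, docroot: str, port: int, errorlog: str,
                    php_cgi: str = "/usr/bin/php-cgi") -> str:
    """Minimal per-tool lighttpd config: own port, own error log, PHP via FastCGI."""
    return "\n".join([
        'server.modules = ( "mod_fastcgi" )',
        f'server.document-root = "{docroot}"',
        f"server.port = {port}",
        f'server.errorlog = "{errorlog}"',
        'index-file.names = ( "index.php", "index.html" )',
        'fastcgi.server = ( ".php" => (( "bin-path" => "%s", '
        '"socket" => "/tmp/%s-php.socket" )) )' % (php_cgi, tool),
    ]) + "\n"


def register(tool: str, host: str, port: int) -> None:
    """Record where a tool's lighttpd is listening."""
    ROUTES[tool] = (host, port)


def upstream_for(path: str) -> Optional[str]:
    """Map /<tool>/rest to that tool's lighttpd, or None if it has none."""
    tool, _, rest = path.lstrip("/").partition("/")
    if tool not in ROUTES:
        return None
    host, port = ROUTES[tool]
    return f"http://{host}:{port}/{rest}"
```

The grid job itself would then run `lighttpd -D -f <config file>` as the tool user with the ulimits applied, which is also what lets users read their own error log and, as suggested at 16:11, tweak the config.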