[00:03:13] Argh.
[00:15:23] Damn it broke.
[00:15:27] Hm.
[00:15:37] * SigmaWP prods Damianz for help >_>
[00:17:31] Define broke
[00:19:11] 2014-02-09 00:15:05: (mod_fastcgi.c.3356) response not received, request sent: 912 on socket: unix:/tmp/fastcgi.python.socket-0 for /sigma/foo.py?u=1&j=2, closing connection
[00:20:04] What does foo.py look like?
[00:20:58] http://pastebin.com/A7keVSKZ
[00:21:19] For now, I have reverted to cgi
[00:26:10] Question: how long does it usually take for new projects to be approved?
[00:56:39] (PS1) Adamw: Add a target channel for Education Program notifications [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112308
[01:07:41] (CR) Legoktm: Add a target channel for Education Program notifications (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112308 (owner: Adamw)
[01:08:09] Thanks, everyone
[02:11:23] why is it that when i tail -f the output of my bot on labs after jsub'ing it, that the lines are often only half printed
[02:12:35] like however jsub writes the output just occasionally does a chunk of bytes without taking into consideration the lines
[02:12:53] i'm printing to stdout from python, and then lines are delivered chopped
[02:44:29] (CR) Yuvipanda: [C: 1] "Someone more sober than me should merge and deploy :)" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112311 (owner: Adamw)
[04:26:06] (CR) Legoktm: [C: 2] Add a target channel for Education Program notifications [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112308 (owner: Adamw)
[04:26:18] sigh.
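On the 02:11 question about half-printed lines: a minimal sketch, assuming the cause is Python block-buffering stdout when it is redirected to a file rather than a terminal (the log does not confirm this, and nothing here is specific to jsub). With block buffering, output reaches the file in fixed-size chunks that can end mid-line, which is what a tail on that file would show. The helper name `emit` is hypothetical.

```python
import sys


def emit(line, stream=None):
    """Write exactly one complete line, then flush it straight away.

    Flushing after every line means a reader following the output file
    sees whole lines instead of buffer-sized chunks.
    """
    stream = sys.stdout if stream is None else stream
    # Normalise to a single trailing newline so each call is one line.
    stream.write(line.rstrip("\n") + "\n")
    stream.flush()


if __name__ == "__main__":
    for i in range(3):
        emit("bot heartbeat %d" % i)
```

Running the interpreter with `python -u`, or setting `PYTHONUNBUFFERED=1` in the job's environment, is another way to get unbuffered output without touching the code; which one fits depends on how the job is submitted.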
[04:26:19] (CR) Legoktm: [C: 2] Add a target channel for Education Program notifications [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112308 (owner: Adamw)
[04:27:29] !log tools rebooting grrrit-wm - https://gerrit.wikimedia.org/r/#/c/112308
[04:27:31] Logged the message, Master
[10:59:46] (PS5) Tim Landscheidt: Simple tool to simplify using the backup snapshots [labs/toollabs] - https://gerrit.wikimedia.org/r/76313 (owner: Platonides)
[11:11:47] (PS6) Tim Landscheidt: Simple tool to simplify using the backup snapshots [labs/toollabs] - https://gerrit.wikimedia.org/r/76313 (owner: Platonides)
[11:12:31] (CR) Tim Landscheidt: "IMHO ready for merge." [labs/toollabs] - https://gerrit.wikimedia.org/r/76313 (owner: Platonides)
[12:00:34] Does anyone know how I can determine whether a certain database has been migrated from Toolserver to WMFLabs? I'm specifically interested in the u_kolossos database from the ptolemy server
[12:02:35] cff: Databases have not been migrated in general.
[12:02:50] cff: You have to ask Kolossos for that.
[12:07:58] scfc_de: ok, thanks, I'll try to contact him
[15:12:51] the tools webserver seems non-responsive.
[15:13:05] valhallasw: indeed
[15:14:39] Hi
[15:14:48] Is something wrong?
[15:15:40] I can't log into any Tools instance?
[15:15:56] (same here)
[15:16:22] Reedy any idea?
[15:16:25] The remote host is down or non responsive: ssh_exchange_identification: Connection closed by remote host
[15:16:31] cff: same here
[15:17:06] andrewbogott_afk: All Tools instances seem to be stuck after "If you are having access problems" and the webserver is down as well.
[15:18:38] toolsbeta-login.pmtpa.wmflabs is stuck as well, but I can log into bastion{,2,3}.wmflabs.org.
[15:21:00] scfc_de: whom can we poke about this?
[15:22:22] benestar: Hi.
[15:22:32] Do you know why Labs broke?
[15:22:54] I would mail andrewbogott_afk and Coren.
[15:23:06] incognitus: Could you do that?
[15:23:07] incognitus: hey :)
[15:23:12] I have no idea :/
[15:24:15] maybe petan knows something?
[15:26:09] scfc_de: emailed them both, can you do labs-l ?
[15:27:32] incognitus: Unfortunately, I broke my mailer an hour ago and am still picking up the pieces :-). So I'd rather not; could you do that as well?
[15:27:46] Okay.
[15:33:33] http://lists.wikimedia.org/pipermail/labs-l/2014-February/002077.html scfc_de
[15:34:25] incognitus: Thanks!
[15:35:11] incognitus: does tools-dev work?
[15:35:25] benestar: idk, can you check?
[15:38:02] anyone called coren?
[15:38:10] incognitus: doesn't work either
[15:38:27] :/
[15:45:33] Betacommand: any idea why Labs is down?
[15:45:42] i.e., what's causing the outage
[15:45:46] incognitus: No
[15:46:09] I would suspect a file system outage given the previous issues
[15:46:10] I think it's only Tool Labs actually
[15:46:29] incognitus: it's also some of the exec nodes in tools
[15:56:17] ...
[15:56:51] Is there a status page for the tools operations?
[15:57:19] cff: there might be, but it's probably on tools ...
[15:57:41] http://status.wikimedia.org/
[15:57:44] that works
[15:58:02] unfortunately has no labs info
[15:59:28] cff: https://gdash.wikimedia.org/ doesn't either... where would it be?
[16:00:37] cff: is it in https://ganglia.wikimedia.org/latest/ ?
[16:03:45] incognitus: not sure
[16:03:59] It should be in ganglia.wmflabs.org or icinga.wmflabs.org.
[16:04:35] https://ganglia.wikimedia.org/latest/?c=MySQL%20eqiad&h=labsdb1001.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2
[16:05:18] scfc_de: is icinga private?
[16:05:28] there are multiple hosts, labsdb1002, 1003
[16:05:55] cff: These are just the replica DB servers.
[16:06:09] incognitus: icinga.wikimedia.org is; icinga.wmflabs.org should be open.
[16:06:19] scfc_de: ah, thanks
[16:06:37] But diagnosing the problem isn't very effective, as no one seems to have the power to fix it :-).
[16:07:35] gnah, this is really annoying ...
[16:21:56] benestar: is anyone from WMF here on Sunday?
[16:22:13] incognitus: don't know
[16:22:37] obviously not in this channel
[16:23:13] maybe #wikimedia-tech
[16:38:01] guillom: http://lists.wikimedia.org/pipermail/labs-l/2014-February/002077.html
[16:38:18] also http://lists.wikimedia.org/pipermail/labs-l/2014-February/002078.html
[16:47:50] petan: given the chit chat in here I think it is all broke :>
[16:51:28] addshore: do you have any idea how to fix it?
[16:52:54] Working on the labs issue. XFS woes again. Shouldn
[16:53:00] Shouldn't be too long.
[16:53:04] :) have fun Coren :)
[16:53:15] benestar: I think that answers your question ;p
[16:53:18] thx, came here looking for info why tools.wmflabs.org is down
[16:53:36] Coren: thanks a lot :)
[16:53:45] Blahma: XFS issues.. again :D should be back soon :)
[16:56:40] thanks Coren :-)
[17:05:35] i see recoveries in nagios :P
[17:06:00] thumbs up Coren :>
[17:12:24] Coren: great, now I get 400 Bad Request ;)
[17:13:43] Recovery should be gradual, but ongoing
[17:14:25] was it the same issue as before coming to surprise you on a sunday? :P
[17:14:25] bbiab to check on things
[17:14:56] Pretty much. Recent versions of XFS suck (which explains why the eqiad file server is ext4)
[17:15:09] :d
[17:15:10] :D
[17:16:34] Webserver is up for me now
[17:16:40] Coren: is there a way I can configure newweb to auto start?
[17:18:15] My tool is working again, thanks for quick help.
[17:18:59] great, thx Coren :)
[17:26:41] Betacommand: It was intended to do so, but the migration has stolen all my time. :-( It autostarts by default in eqiad
[17:27:00] (In fact, there is no longer an apache default at all there)
[17:27:22] Coren: I've seen several times where my webservice just dies
[17:29:03] Betacommand: Can you tell me what the croak message in your error log is? That really shouldn
[17:29:10] shouldn't happen
[17:29:17] 17:28:14 up 116 days, 20:27, 44 users, load average: 0.41, 17.37, 381.58
[17:29:19] Coren: next time it happens I'll take a look
[17:29:49] 17:28:14 up 116 days, 20:27, 44 users, load average: 0.41, 17.37, 381.58 <-- impressively stuck
[17:31:19] :>
[17:31:36] On the plus side, my students at the Facebook Open Academy thing are working on a distributed cron thingy, so if they succeed we'll now have redundancy at that level too. :-)
[17:32:01] They named it 'Megacron'. More than meets the eye, I guess. :-)
[17:34:09] Coren: when did you get the message about the outage?
[17:35:17] About 30 minutes ago. I'm travelling atm so my email access is irregular. (Well, I'm at the venue now)
[17:35:32] Coren: it's fine
[17:36:02] Coren: Figured
[18:12:31] lolr
[18:12:53] Anyone who has access to lolrrit-wm?
[18:13:12] Gloria, legoktm ... ?
[18:13:12] yuvi's generally the one to ask.
[18:13:15] incognitus: yup
[18:13:16] what's up?
[18:13:17] he is apparently away
[18:13:29] legoktm: it keeps joining and quitting
[18:13:41] lemme reboot
[18:14:14] !log tools restarting grrrit-wm, it keeps joining and quitting
[18:14:16] Logged the message, Master
[18:15:10] ok
[18:15:16] let's see if gerrit-to-redis is still working
[18:16:57] legoktm: do you remember the .orig file from yesterday?
[18:17:07] yes
[18:17:13] looks like everything is up
[18:17:19] apparently it needs git config --global mergetool.keepBackup false
[18:17:25] but I'm not an expert
[18:42:51] Hello.
[18:42:53] I'm here.
[18:47:19] * Nemo_bis heard a whisper and looks around wondering if it was a ghost.
[18:57:26] legoktm: is grrrit-wm alright?
[18:57:33] now yes
[18:57:36] YuviPanda: yeah, I just restarted it
[18:57:41] thanks!
[19:13:03] I tried restarting.
[19:13:13] and it worked!
[19:18:08] Does it really need to begin with "Topic for #wikimedia-labs" ?
[19:30:32] ok, so I created a tool, and when I want to run become, I get "sudo: sorry, a password is required to run sudo"
[19:31:20] dungodung: try logging out and in again
[19:31:49] ah, that did the trick :)
[19:36:58] Coren: how do I fix read-only filesystem for gluster fs
[19:37:16] on bots-labs /data/project became read only
[19:38:42] petan: not chmod -R ?
[19:39:14] incognitus: read-only filesystem has nothing to do with chmod
[19:39:14] it requires remount
[19:39:26] which isn't really simple in case of gluster
[19:42:19] [2014-02-09 19:42:01.283063] W [fuse-bridge.c:1726:fuse_create_cbk] 0-glusterfs-fuse: 354: /public_html/petrb/logs/#semantic-mediawiki/20140209.txt => -1 (Read-only file system)
[20:53:43] ... even though https://www.mediawiki.org/wiki/Wikimedia_Labs looks fairly outdated.
[20:53:59] wikitech link
[20:54:01] ?
[20:55:20] Yeah, but which? https://wikitech.wikimedia.org/wiki/Main_Page links to https://wikitech.wikimedia.org/wiki/Help:Contents, but that doesn't mention Tools at all (and I wouldn't want people trying to set up a tool request a project instead).
[20:59:47] you could update the mw page//
[20:59:51] ..
[21:01:56] If I had a good idea with what, yes ... :-)
[22:33:09] Coren: It looks as if the outage has confused SGE: qstat for local-meetbot shows job 2475808 as "r", while on the exec node there is no process.
[22:34:41] Coren: I'll take that back, the continuous job is in the "sleep" part.
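On the 19:39 point that a read-only filesystem has nothing to do with chmod: a rough sketch, assuming a Unix host, of telling the two situations apart from Python. It checks the mount flags reported by `os.statvfs` before looking at permissions. The function names are hypothetical, and a FUSE filesystem whose backend returns "Read-only file system" errors may not set the read-only mount flag at all, so a "writable" result here is not proof that writes will succeed.

```python
import os


def flags_read_only(f_flag):
    """True if the statvfs mount flags include the read-only bit."""
    return bool(f_flag & os.ST_RDONLY)


def is_read_only_mount(path):
    """True if the filesystem holding `path` is mounted read-only."""
    return flags_read_only(os.statvfs(path).f_flag)


def diagnose(path):
    """Classify why writing under `path` might fail."""
    if is_read_only_mount(path):
        # Permission bits are irrelevant on a read-only mount.
        return "read-only mount: needs a remount, chmod will not help"
    if not os.access(path, os.W_OK):
        return "permission problem: chmod or chown may help"
    return "writable"


if __name__ == "__main__":
    print(diagnose(os.getcwd()))
```

The mount flag is checked first because a read-only mount rejects writes regardless of what the permission bits say.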