[00:24:37] looks like gluster 3.4 and 3.3.2 were released a couple days ago [00:38:30] god damn it [00:38:35] gluster seems to have a gid limit [00:38:43] guess who's over the limit? :) [00:39:40] 3.4 doesn't have this issue [00:39:54] also, the nfs server is giving me shit on tools-login [00:40:17] Coren: ^^ [00:40:52] Ryan_Lane: Yeah, it wedged again. It should clear up in 30-60 secs. [00:41:02] in dmesg: nfsd: peername failed (err 107)! [00:41:19] Googling further, this seems to be a known issue with ESX since Nov 2012 [00:41:26] ESX? [00:41:29] nfsd peername == symptom [00:41:41] Yeah, they gots a linux kernel underneath, remember? :-) [00:42:14] ah. right. ESXi uses the bsd kernel [00:43:41] Hm. From what I can tell, it's about the kernel never getting interrupts for completed commands under some circumstances; so the driver times out, reaps manually, and soft resets. [00:44:08] (Speaking of, the resets took place, NFS should be back) [00:44:25] it is [00:44:48] http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&h=labstore3.pmtpa.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Labs+NFS+cluster+pmtpa [00:44:56] Orange spikes == wedged. [00:45:44] It was quiet during the (UTC-4) day; and has picked up a few hours ago. Definitely driven by traffic. [00:46:59] (The small spikes over the blue bumps aren't the controller being wedged -- they are puppet runs) [00:52:45] Ryan_Lane: According to MegaCli, everything is full of joy on the hardware side. [00:52:55] heh [00:53:18] maybe we should go with a stable kernel and drop the thin provisioning? [00:53:40] or is it already too late for that? [00:55:15] Ryan_Lane: It's doable, but would require dump/restore. [00:55:42] It'd be one hell of a regression though if it affects 3.5 /and/ 3.8 kernels. [00:56:15] I'm thinking something else might trigger the issue. Ima do some comparison with labstore1 [00:56:15] * Ryan_Lane nods [00:56:29] Has an H800 too, right? [00:56:34] this didn't happen in 3.5 did it? [00:56:39] * Coren nods. [00:56:50] But it was completely overshadowed by the 14-day problem. [00:57:07] it doesn't look like the same issue... [00:57:12] * Coren checks the older syslogs [00:57:18] I checked the older syslogs; there were resets in there too. [00:57:20] this completely blocks filesystem actions [00:57:39] Yeah; different problems, though I wouldn't be stunned if one precipitated the other. [00:57:55] and yeah, they both have the same controller [00:58:23] There is also the dumb possibility of an actual hardware problem with the controller or the shelf. :-) [01:03:29] dear gluster, I fucking hate you [01:03:38] it's such a gigantic piece of shit [01:04:19] tried to delete and recreate a volume [01:04:23] god help you if you try that [01:13:08] Aha! [01:13:21] http://forums.freenas.org/threads/lsi-megaraid-sas-9261-8i-timeout-with-write-back.5960/ [01:13:36] labstore3 is in write-back; labstore1 is in write-through [01:14:25] Wrong OS, but same basic hardware. [01:16:41] Thankfully, that's easy to test: I just switched all devices to WriteThrough. [01:17:26] Ryan_Lane: ^^ [01:17:31] Now I sit back and wait. [01:18:18] heh [01:18:49] WMF ENGINEER ACTS RECKLESS WITH HARDWARE SETTINGS! LET US FORK ENGLISH WIKI!!!!1 [01:18:50] :) [01:19:19] well, this would be a more conservative setting ;) [01:19:35] with worse write performance [01:19:56] who cares?! I wasn't informed! [01:20:47] :) [01:22:11] Holy... [01:22:17] You have GOT to be shitting me.
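The cache-policy flip Coren describes ("I just switched all devices to WriteThrough") is done live through MegaCli. Below is a minimal sketch of the idea in Python, not the exact commands he ran: the install path is an assumption, and the flag spellings are the commonly documented ones, so verify them against your MegaCli build.

```python
#!/usr/bin/env python
# Hedged sketch: inspect an LSI/Dell MegaRAID controller (the H800 is
# MegaRAID-based) and flip every logical drive to write-through.
import subprocess

MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"  # assumption: typical install path

def megacli(*args):
    """Run MegaCli with the given arguments and return its text output."""
    out = subprocess.check_output([MEGACLI] + list(args) + ["-NoLog"])
    return out.decode("utf-8", "replace")

# Adapter/BBU/disk summary -- the "everything is full of joy" check:
print(megacli("-AdpAllInfo", "-aAll"))

# Current cache policy of every logical drive on every adapter:
print(megacli("-LDGetProp", "-Cache", "-LAll", "-aAll"))

# The actual change: write-through ("WT") instead of write-back ("WB"):
print(megacli("-LDSetProp", "WT", "-LAll", "-aAll"))
```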
[01:22:30] I really don't understand why this one project is giving me issues with gluster and no other ones are [01:22:35] Coren: ? [01:22:36] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=labstore3.pmtpa.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Labs+NFS+cluster+pmtpa [01:22:46] Look at the bunch of tiny spikes every 30s [01:22:57] My flush_dirty interval is set at 30s. [01:23:18] So now it does a bit of iowait when buffers are flushed. What you'd expect with writethrough right? [01:23:53] ... what if the timeout was caused by the controller doing all of /its/ writeback flush at once when its own cache got full? [01:24:17] That'd explain the decreasing interval as write traffic increases. [01:24:39] And, being over the time limit, would cause a soft reset (that'd politely wait until everything was flushed). [01:25:29] s/30s/60s/ [01:25:34] 600 centiseconds. [01:26:49] If setting writethrough fixes things, I'm going to be /very/ cross at whoever wrote that firmware. [01:27:19] YuviPanda: Where is that fork enwp thread anyways? I could use a laugh. [01:27:31] Coren: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28proposals%29#Fork_the_wiki [01:27:43] Coren: When did you change the setting? Or does it need a restart? [01:27:52] scfc_de: I did, it doesn't. [01:27:54] Coren: they found our secret (WMF has other 'clients' we 'sell to', apparently) :) [01:28:28] scfc_de: If you look at the graph I pointed to, you can see the change of setting at the last blue spike after the big orange lump. [01:28:31] Is there anyone here that can restart Helpmebot by chance? retsreklawts seems to be asleep already for the night. [01:31:10] heh. seems I hit gluster's 64 gid limit [01:31:14] sad that I was in that many groups [01:32:16] in 3.4 the 64 gid limit is gone too [01:32:23] maybe I should upgrade right now [01:32:24] :D [01:32:29] * Ryan_Lane is kidding [01:32:35] Ryan_Lane: I'm pretty sure you can't make things worse. :-) [01:32:39] hahaha [01:32:57] gluster responding and serving filesystem is better than it shitting itself [01:33:02] * Coren would rather continue struggling with issues on an NFS server than attempt to wrestle gluster. :-) [01:33:20] I'm just going to wait until we replace it with NFS [01:34:18] But really, if that effing firmware implements writeback as "wait until my buffers are all full then flush them all at once", it wins the Gluster Prize of Excellence! :-) [01:34:45] * Coren decides 'Gluster Prize of Excellence' needs to be awarded to other truly horrid pieces of software. :-) [01:35:43] * YuviPanda awards Gluster Prize of Excellence to Mediawiki [01:35:58] Coren: The graph looks like a Rorschach test to me :-). The "Wait" line seems to be a bit more prominent after 1:15Z. [01:37:16] scfc_de: That's to be expected; setting the controller to writeback means that when the OS issues a write, it'll get the OK only once the block has actually landed on the rust; the orange spikes are the flush processes waiting on that IO [01:38:29] They used to be much flatter because then it only had to wait for the block to hit the controller's cache instead. [01:39:18] Coren: gluster would probably be better than NFS for us, if we didn't need multi-tenancy [01:39:31] alas [01:40:20] Ryan_Lane: I'm not giving up on the idea of a DFS, but we need to hammer it out seriously before we put stuff people expect to keep working on it. 
:-) [01:40:39] gluster fails because it can't handle the number of volumes we're running [01:41:05] can't do multi-tenancy without running multiple volumes [01:41:05] Coren: Really? I googled for H800, and for example the excerpt in http://en.community.dell.com/support-forums/servers/f/906/t/19398489.aspx defined write-back = "controller cache has received all the data in a transaction", write-through = "disk subsystem has received all the data in a transaction". "Disk subsystem" = superset? [01:42:10] OS buffers -> controller -> cache -> disk hardware -> disks [01:42:57] Coren: But then "setting the controller to writeback means that when the OS issues a write, it'll get the OK only once the block has actually landed on the rust" isn't true? [01:43:04] "disk subsystem", in that context, probably means the hardware that actually talks to the disks including the disks themselves. [01:44:11] That terminology is wonky though. It probably was written neutrally enough to avoid presuming that what's on the other end of the SAS cable is an actual disk. [01:46:16] Coren: So you meant to say "That's to be expected; setting the controller to *writethrough* means that when the OS issues a write, it'll get the OK only once the block has actually landed on the rust"? [01:47:00] ... I didn't notice that error, even though you pointed it out. :-P Yes, of course. [01:47:55] uuuuuggggghhhh. role::puppet::self regenerates the client key when you specify a central puppet master? [01:48:13] * Ryan_Lane stabs [01:48:16] * Ryan_Lane stabs hard [01:48:43] Coren: As long as the controller understands you correctly ... :-) [01:48:50] Good night everybody. [02:07:41] hm. these instructions for multi-node role:puppet:self doesn't work [02:09:10] *don't [04:08:59] [bz] (8NEW - created by: 2Ryan Lane, priority: 4Normal - 6normal) [Bug 51581] Deployment-prep deploys from master and uses a submodule with submodules - https://bugzilla.wikimedia.org/show_bug.cgi?id=51581 [05:15:18] Why is tools-login responding so slow? [05:16:06] because it's having nfs server issues [05:24:39] Ryan_Lane: again? [05:24:49] more like still [06:04:46] Ryan_Lane: you're probably aware en.wikipedia.beta.wmflabs.org is down again. Not a problem for me, just letting you know. It was working OK around 90 minutes ago. [06:05:04] spagewmf: I don't actually have anything to do with beta [06:06:22] Ryan_Lane: sorry, the .wmflabs.org in the hostname steered me wrong. I know hashar deals with it, should I let ops know? [06:06:40] hashar is the one that maintains it [06:06:50] k, thanks [06:06:51] no one in ops does as far as I know [06:07:43] I'm digging the gluster comments in the scrollback [06:14:18] Ryan_Lane: this is horribly slow [06:14:30] spagewmf: it's not gluster [06:14:35] spagewmf: it's the nfs server [06:14:41] deployment-prep switched to nfs [06:23:10] Ryan_Lane: better now [06:24:09] zhuyifei1999: see: https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Labs+NFS+cluster+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [06:26:23] Ryan_Lane: thanks [06:26:28] yw [07:18:32] Ryan_Lane: why did it happen? [07:23:32] Hi everyone. [07:23:38] For a little bit at least. [07:23:48] I'm in Germany now. [07:24:15] Cyberpower678: you are in germany? who? [07:24:19] *hi [07:24:42] who? [07:24:46] Hi! [07:24:54] *where? [07:25:05] Gross-Gerau [07:26:01] And jetlagged. [07:26:20] * Cyberpower678 might be going to sleep in a moment to catch up on sleep. [07:27:24] Cyberpower678: Why Germany?
[07:27:56] Because it is my birthplace and my nationality, my citizenship, and it's where my family lives. [07:28:32] I'm on a temporary mobile connection right now, so I will go offline later. [07:28:58] Just running some tests on my new script now. [07:31:52] :) [07:32:08] Steinsplitter, where are you? [07:33:28] North-Italy [07:33:51] Oh. [07:34:01] ca. 4/5 H from Munich [07:34:26] I'm near Frankfurt [07:35:27] :) [07:36:41] I just landed. [07:41:37] I sincerely hope that's merely a spike on the load and not permanently going up again. [07:42:26] Yes it is. [07:49:59] . [08:10:56] zhuyifei1999, >:( [08:39:58] Coren: on toolabs the hu database is most likely missing revision_userindex [08:42:18] i just wrote https://wikitech.wikimedia.org/wiki/User:Legoktm/pywikibot_on_tools_lab, i think someone in here was asking for help with it the other day [08:50:17] Coren: what do you think about https://bugzilla.wikimedia.org/show_bug.cgi?id=51310? [09:20:41] [bz] (8VERIFIED - created by: 2Aude, priority: 4Unprioritized - 6normal) [Bug 48743] wikidata_singlenode puppet manifest is broken - https://bugzilla.wikimedia.org/show_bug.cgi?id=48743 [11:38:23] T13|sleeps: ^^ [11:38:28] @ping [11:38:34] !ping [11:38:35] pong [11:58:50] !log bots installed nano on bots-labs :> [11:59:05] >.< [12:01:18] Logged the message, Master [12:01:52] Coren: did the person having SMTP issues eventually get it fixed? [12:01:56] forgot who that was :| [12:10:59] addshore why did you install nano on bots-labs? [12:11:09] cause i like nano! [12:11:25] what happened to wm-bot [12:11:31] it died :> [12:11:38] how come [12:11:55] unresponsiveness :D [12:12:04] you don't need to restart bouncers to fix that [12:12:27] hmm [12:13:47] that's weird [12:13:53] !pin [12:13:53] pong [12:14:11] petan: just scrolled up in my console [12:14:21] as far as I remember i didn't kill the bouncers :P [12:14:29] you didn't? [12:14:36] how did you restart the bot [12:14:42] *checks* im the wrong user actually [12:15:41] started by killing wmib.exe and restarting restart.sh [12:16:22] nothing happened :> so i killed the bouncers and restarted them also [12:19:07] petan: i also fixed your docs ;p https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Bots/Documentation/wm-bot&diff=77756&oldid=77520 [12:24:13] Morning petan [12:24:25] addshore: https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots/Documentation/wm-bot#How_to_fix_1_or_more_disconnected_instances_for_bot_which_is_running [12:25:05] :> [12:25:09] you should put that at the top ;p [12:25:24] BB [12:25:37] oh wait, thats new! :O [12:41:14] !log deployment-prep Text cache was not in wgSquidNoPurge, that caused all requests to be interpreted as coming from the text cache causing misc issues (such as throttling account creation for everyone). [12:41:17] Logged the message, Master [12:42:05] http://bots.wmflabs.org/~wm-bot/dump/%23wikimedia-labs.htm [12:43:52] !addshore [12:43:52] fail [12:44:01] lol ha.. [12:44:02] :< [12:44:22] I didn't do it, but find it funny.. [12:44:45] hehe i know :P [12:44:45] !accountreq [12:44:45] in case you want to have an account on labs please read here: https://labsconsole.wikimedia.org/wiki/Help:Access#Access_FAQ [12:46:22] !Coren [12:46:22] Coren is dead. petan killed him. He now roams about as a zombie. [12:46:32] !Cyberpower678 [12:46:33] addshore, how do you rollback? with the rollback button? :D [12:46:42] !demon [12:46:42] <^demon> Docs exist solely for developers to go "omg you didn't read the docs!" when people ask common questions.
In practice, nobody reads docs before asking questions. [12:46:57] So true!!!! [12:47:35] !Cyberpower678 [12:47:36] addshore, how do you rollback? with the rollback button? :D [12:48:26] :> [12:50:33] T13|needsCoffee: I don't think this channel is for playing with bots [12:52:09] zhuyifei1999: just reading the stuff already there.. :) [12:53:03] T13|needsCoffee: #wikimedia-labs-offtopic [12:59:53] @trusted [12:59:53] I trust: petan!.*@wikimedia/Petrb (2admin), .*@wikimedia/.* (2trusted), .*@mediawiki/.* (2trusted), .*@wikimedia/Ryan-lane (2admin), .*@wikipedia/.* (2trusted), .*@nightshade.toolserver.org (2trusted), .*@wikimedia/Krinkle (2admin), .*@[Ww]ikimedia/.* (2trusted), .*@wikipedia/Cyberpower678 (2admin), .*@wirenat2\.strw\.leidenuniv\.nl (2trusted), .*@unaffiliated/valhallasw (2trusted), .*@mediawiki/yuvipanda (2admin), [13:08:52] meh [13:09:19] @channels [13:09:36] !ping [13:09:36] pong [13:09:45] zhuyifei1999, that's right. So don't make @kick you. :p [13:09:50] I am in 92 channels in this moment [13:10:02] wm-bot sure gets around. [13:10:23] petan: is that with both nicks? [13:10:38] http://bots.wmflabs.org/~petrb/db/systemdata.htm [13:10:38] Cyberpower678: ? [13:10:52] wm-bot Online in 50 channels 6667 [13:10:53] wm-bot2 Online in 42 channels 6668 [13:11:05] @trusted always pings me zhuyifei1999. I'm an admin in this channel. [13:11:05] I trust: petan!.*@wikimedia/Petrb (2admin), .*@wikimedia/.* (2trusted), .*@mediawiki/.* (2trusted), .*@wikimedia/Ryan-lane (2admin), .*@wikipedia/.* (2trusted), .*@nightshade.toolserver.org (2trusted), .*@wikimedia/Krinkle (2admin), .*@[Ww]ikimedia/.* (2trusted), .*@wikipedia/Cyberpower678 (2admin), .*@wirenat2\.strw\.leidenuniv\.nl (2trusted), .*@unaffiliated/valhallasw (2trusted), .*@mediawiki/yuvipanda (2admin), [13:11:09] oops [13:11:10] OMG OMG [13:11:13] can you just stop using it [13:11:40] @trustdel petan!.*@wikimedia/Petrb [13:11:40] User was deleted from access list [13:11:41] petan, sorry. :p [13:11:46] here we go [13:12:13] Cyberpower678: petan is root [13:12:23] so don't worry [13:12:26] actually I am Peter [13:12:28] :3 [13:12:38] I am root and addshore is stump [13:12:51] zhuyifei1999, I know. [13:13:14] petan, and I'm the branches. [13:13:17] Cyberpower678: What's @trusted always pings me zhuyifei1999. I'm an admin in this channel. [13:13:20] :3 [13:13:48] zhuyifei1999, ?? [13:14:02] petan: whatever, but it's your nick [13:14:03] zhuyifei1999, when you use @ trusted, you ping me every time. [13:14:21] Cyberpower678: ok [13:16:11] I am stump! [13:16:29] :P [13:16:36] addshore: you're trusted [13:16:39] better than root [13:16:46] at least cute chicks can sit on you [13:16:53] O_o [13:16:55] xD [13:16:55] petan: I thought you were Petr? :p [13:16:55] XD [13:17:03] Peter == Petr [13:17:06] how much beer have you had today petan ? :P [13:17:11] none yet [13:17:15] I am in office :P [13:17:19] Yet... [13:17:19] not that I couldn't have it [13:17:25] thats no reason not to have beer ;p [13:17:25] but I prefer coke in work [13:17:44] i drink so much water at work xD [13:17:54] This is seriously offtopic [13:18:14] @kick petan stop posting this off topic shit plz [13:18:14] !!log [13:18:17] petan needs a new hobby :P [13:18:24] :< [13:19:02] What timing... [13:19:40] petan: wm-bot kicked you then said you need a new hobby.. :p [13:19:40] petan: [13:19:42] (13:18:15) ChanServ!ChanServ@services.
changed mode +o wm-bot [13:19:43] (13:18:16) User wm-bot!~wm-bot@wikimedia/bot/wm-bot kicked petan from channel with reason: stop posting this off topic shit plz [13:19:44] (13:18:17) petan needs a new hobby :P [13:19:46] (13:18:23) :< [13:19:47] (13:18:39) petan!~pidgeon@wikimedia/Petrb just joined the channel [13:20:04] :> [13:20:08] Priceless.. [13:21:29] addshore how big is your flat :D [13:21:35] not very big [13:21:37] october is sooon [13:21:39] XD [13:21:42] but i dont seem to spend much time there xD [13:21:48] you better make some party there lol [13:21:56] haha, i dont think thats possible :P [13:21:59] :( [13:22:08] maybe party somewhere else and collapse on the floor at the flat? :P [13:22:24] !addshore [13:22:25] fail [13:22:25] yes that works :P [13:22:29] lol [13:23:04] addshore you got many friends in there already? how are these WMDE folks [13:23:24] they are lovely :) [13:23:34] this reminds me I wanted to take andre klapper to some pub :D [13:23:37] he lives in prague lol [13:25:19] !addshore | addshore [13:25:19] addshore: addshore is no longer fail! [13:25:31] :O [13:25:43] what a trick [13:25:48] * prick :D [13:25:51] :D [13:26:00] hush now :) [13:28:56] !zhuyifei1999 [13:28:56] 9991iefiyuhz [13:30:09] !delete is petan deleted me once, and then andrewbogott came and deleted whole my server by accident. I tell you people, deleting software is evil and should be illegal. If you don't like some program, don't delete it, just shoot yourself or something... [13:30:09] Key was added [13:30:22] !delete [13:30:22] petan deleted me once, and then andrewbogott came and deleted whole my server by accident. I tell you people, deleting software is evil and should be illegal. If you don't like some program, don't delete it, just shoot yourself or something... [13:42:36] That server is cursed! Cursed I tell you! [13:50:02] nfs? :P [13:50:27] ok, gluster suck, nfs suck, is there anything else we can switch to [13:50:50] * petan quiets YuviPanda before he starts about SMB-crap [13:51:00] we could use SMB [13:51:04] :P [13:51:07] ... [13:52:57] Well, NFS doesn't suck. In fact, it's been incredibly robust given the fact that the @&#^ controller flakes out every 20 minutes. [13:53:46] And, BTW, it's called CIFS; SMB is the predecessor that sucked even more but that nobody has used for ~10 years. :-) [13:54:04] YuviPanda: What's SMB? Some Magic Bean? [13:54:14] LOL [13:54:21] !smb is YuviPanda: What's SMB? Some Magic Bean? [13:54:21] Key was added [13:54:38] Coren: more people know 'SMB', hence it works better :) [13:54:47] and I am pretty sure everyone agrees that anything is better than SMB [13:54:51] here he is. the gut with a magic bean... [13:54:55] * guy [13:55:28] there is some fuckshitcrap [13:55:56] I am trying to find out how to call it [13:56:12] Coren forbade me saying "there is load over 20" so I will just say there is fuckshitcrap on grid [13:56:43] hm... but it's pretty constant [13:56:48] maybe nfs is not borked this time [13:57:05] nah [13:57:17] <^demon> Friendly reminder: Gerrit is coming down in a few minutes for some hardware work. Please don't panic. [13:57:30] NOOOOOOOOOOOOOOOO what are we going to do eeeeeeek [13:57:38] <^demon> Go outside and get some fresh air :D [14:01:18] What is an "outside"? [14:01:56] <^demon> Good question. I've only heard tales... [14:03:27] I was out once and I can tell you it's fucking scary [14:04:30] [10:04] DEBUG Exception in module Feed: The remote server returned an error: (502) Bad Gateway.
last input was petan chan: #wikimedia-labs I was out once and I can tell you it's fucking scary [14:04:38] ... [14:04:41] I saw that [14:06:23] <^demon> petan: Your profanity scared gerrit-wm away! ;-) [14:08:12] ǃlog [14:09:33] !hashar [14:09:33] [10:15:12] !log WMFLabs seems to have recovered now [14:10:00] ǃhashar [14:10:23] !log ? [14:10:23] Message missing. Nothing logged. [14:10:28] <^demon> !me [14:10:29] bleh [14:10:38] !^demon [14:10:57] !demon [14:10:57] <^demon> Docs exist solely for developers to go "omg you didn't read the docs!" when people ask common questions. In practice, nobody reads docs before asking questions. [14:11:13] ^^^ True story... [14:12:24] <^demon> Technical_13: This is why FAQ stands for "asked" questions and not "answered" questions ;-) [14:12:49] ǃlog [14:12:54] I think I am blacklisted by the bot [14:13:02] must have been throttled [14:26:05] [bz] (8NEW - created by: 2spage, priority: 4Unprioritized - 6normal) [Bug 51580] configure beta labs for SUL2 - https://bugzilla.wikimedia.org/show_bug.cgi?id=51580 [14:32:52] <^demon> Ok, gerrit's back up. Things might be a little slow for a bit while the disks finish sync'ing, but nothing to worry about. [14:42:23] um... Coren: an issue with fcgi again [14:42:49] JohannesK_WMDE: please to give details. [14:43:13] typing! :) i checked whether the request function is actually called without the interpreter on subsequent requests, and it isn't. [14:43:45] JohannesK_WMDE: did you get your SMTP issue solved, btw? [14:43:48] the script is always executed from the start, like a normal CGI script. i'm using flup. anything special i need to do to activate fcgi? [14:43:49] I'm not sure I understand what you mean. [14:44:11] Actually, I'm sure I don't understand. :-) [14:44:11] Do we support fcgi at all? [14:44:11] YuviPanda: yes, it seems to work now. [14:44:13] I don't think so [14:44:43] No FCGI support atm. I'd have been done with WSGI by now if it hadn't been for the NFS server's controller giving me issues. [14:44:53] yeah, that'll explain JohannesK_WMDE's problem [14:44:57] :) [14:45:00] ohhhhhh. okay. well that explains it :p [14:45:16] :P [14:45:29] i faintly remember you saying something about fcgi partly working, Coren [14:45:31] FCGI is on the plan, after WSGI, which is a more requested feature. :-) [14:45:59] JohannesK_WMDE: No, I said FCGI would be easier to implement in this setup than WSGI, but the latter is waited on by more people. :-) [14:47:26] Coren: can you have a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=51310 and tell me what you think? [14:48:46] well, i want to have the requests served without an interpreter being started each time. that is possible with wsgi, right Coren? i don't care about the first letter really... [14:48:52] giftpflanze: I /can/ create a dedicated instance with its own queue, but why do you think it's going to be necessary? [14:49:13] JohannesK_WMDE: Yes, WSGI is pretty much "FCGI on steroids" [14:51:48] Coren: hm, let me first try implementing it with the infrastructure given [14:52:00] but the Thread package would be nice [14:52:05] so, how much work is it approximately to make wsgi work Coren? [14:52:45] JohannesK_WMDE: About a day's worth, but right now I'm delayed by the NFS hardware/driver issues. [14:53:07] giftpflanze: I see no reason why that wouldn't work; is there a missing package? [14:54:07] it is named tclthread [15:02:57] Coren: I tried resetting my password on en.wikipedia, put in my email, and it never got sent to me. Is this something you could investigate for me?
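For context on what JohannesK_WMDE is after: under plain CGI the interpreter starts and the whole script runs on every hit, while under FCGI/WSGI one process persists and only the request function runs per hit. With flup, the library he mentions, that looks roughly like the sketch below. Per Coren, the Tool Labs web servers did not yet speak FCGI or WSGI at this point, so this is what it would look like once that support landed.

```python
#!/usr/bin/env python
# Minimal flup-based FCGI app: the process stays alive between requests,
# so module-level state persists and the interpreter is not restarted
# per hit (which is exactly what JohannesK_WMDE was not seeing).
from flup.server.fcgi import WSGIServer

REQUEST_COUNT = [0]  # survives across requests once the worker persists

def application(environ, start_response):
    REQUEST_COUNT[0] += 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    # Under one-shot CGI this would always say 1; under FCGI it climbs.
    body = "Request number %d in this process\n" % REQUEST_COUNT[0]
    return [body.encode("utf-8")]

if __name__ == "__main__":
    WSGIServer(application).run()
```

The same application callable would also run unchanged under a WSGI container, which is part of what Coren means by "FCGI on steroids".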
[15:07:49] rachel99: Not unless I could positively establish your identity, sorry. [15:11:25] coren: I really want to make it be one of the accounts for my SUL, but it won't work for the en.wikipedia, as apparently I have the wrong password for it. What do you suggest I do to fix this? [15:12:04] rachel99: Do you currently own the SUL account? [15:12:09] yes, I do [15:12:41] rachel99: Then you need not worry, when SUL finalization takes place, the enwp account will be renamed out of the way. [15:13:28] Coren: Oh, ok. When will that be? As of now, i can't log into enwp. [15:14:23] It is scheduled for mid-August, IIRC. In the meantime, you can create a new enwp account and request it be renamed/merged into your SUL account after finalization. [15:15:01] coren: Ok, good idea. I will try it! [15:16:54] rachel99: I had a similar problem not long ago for Selenium_user for some reason. Also, can you try the password-reset option for your existing account and see if you get the proper email? [15:17:45] chrismcmahon: I already tried the pw-reset option and did not get email. I've tried it before too, and never got email. [15:30:52] [bz] (8NEW - created by: 2Chris McMahon, priority: 4Unprioritized - 6major) [Bug 51616] beta commons fatal 503 from UserLogin and Varnish - https://bugzilla.wikimedia.org/show_bug.cgi?id=51616 [16:19:36] * Coren rages at the stoopid controller. [16:21:22] !Coren [16:21:22] Coren is dead. petan killed him. He now roams about as a zombie. [16:31:25] ?? [16:40:50] [bz] (8NEW - created by: 2Chris Steipp, priority: 4Unprioritized - 6normal) [Bug 51622] Add loginwiki to beta - https://bugzilla.wikimedia.org/show_bug.cgi?id=51622 [17:46:36] *** Brief outage of NFS (3-4 minutes) while I reboot to try a different driver setting *** [17:52:05] [bz] (8ASSIGNED - created by: 2Yuvi Panda, priority: 4High - 6normal) [Bug 49058] Support WSGI for Running Python Scripts - https://bugzilla.wikimedia.org/show_bug.cgi?id=49058 [17:59:48] :'( [18:03:13] Is tools broken? Can't log in, web tools forbidden. [18:03:49] tb4: (07:46:36 PM) Coren: *** Brief outage of NFS (3-4 minutes) while I reboot to try a different driver setting *** [18:04:02] Ah cool, ta [18:04:16] i had trouble logging in since a few hours, probably related... [18:04:20] Yeah, should be back in 1 min. Took a bit longer than I would have hoped. [18:06:36] Bah! I'll need a cluster-wide reboot. [18:07:18] No load, yet tools is down? [18:07:24] Coren, ^ [18:07:33] *** Brief outage of NFS (3-4 minutes) while I reboot to try a different driver setting *** [18:07:35] Cyberpower678: Check backscroll. [18:07:35] Cyberpower678: ^ [18:07:52] Cyberpower678: Also, "load" is not an indication of anything except "load". :-) [18:08:13] Coren, I don't have a backscroll [18:08:19] I just joined. [18:11:18] There we go. Problem fixed, tools should return in ~40s [18:14:45] Coren, Coren, he's our man... ah nevermind.. too flippin hot here... [18:31:49] why are reboots not announced ahead? [18:31:59] so one wouldn't lose his data [18:32:27] Danny_B: NFS crashed, there wasn't any option [18:33:21] hmm, seems nfs is actually evil (maybe "Not a File System"?) - it creates issues everywhere, same on ts [18:33:23] what's the link to see currently running jobs? [18:33:38] Betacommand: Jobs aren't yet back up, will be shortly. [18:33:46] ah [18:33:48] Danny_B: That was an unplanned outage.
[18:34:19] On the plus side, it also seems to have been a working fix for the controller issue on the NFS server *knock on wood* [18:34:27] omg, it crashed again [18:36:28] Danny_B: What did? [18:36:59] Database servers seem to have gone the way of the Norwegian blue. [18:38:40] tb4: I'm on enwiki_p fine; which database are you trying to connect to? [18:40:25] Coren: Alive now (enwiki.labsdb). Slow to come up after reboot perhaps. Ta. [19:03:27] Note: The gridengine master instance is currently having problems booting (known issue). We're on it. [19:10:02] any eta on getting jobs back up? [19:24:45] Gridmaster is back up, jobs have restarted. [19:25:05] And thus ends today's installment of "Sysadmin panicking because that shouldn't happen" [19:26:25] Coren, Coren, he's our man... ah nevermind.. too flippin hot here... [19:27:25] (For the curious, the instance was actually stuck at the grub menu waiting for someone to hit enter on the keyboard that doesn't exist) :-) [19:31:09] <^demon> Coren: Maybe we should add keyboards to labs instances [19:31:23] SGE just restarted? [19:31:34] that's nice, apparently one of the bugs with one of my scripts was that it needed a restart :) [19:31:35] YuviPanda: Some minutes ago. [19:31:51] Heh. Accidental fix FTW! :-) [19:32:19] Some Magic Beans FTW [19:32:21] ... [19:33:07] hehe :D [19:33:15] Coren: it also spammed marktraceur with a lot of emails though. [19:33:29] Coren: that's actually heartening, since that means my redis based queueing system *works*, and no data is ever lost [19:34:04] That is teh cool. I'm going to be looking at reddis once the time comes to replace cron. [19:35:29] Coren: :) [19:35:52] Coren: poke me when you start, I'm interested too. Or even generally describe the problem at some time so I can think of it :) [19:37:32] Coren: redis to replace cron? :) [19:37:50] Coren: for putting stuff into the job queue? [19:43:34] Ryan_Lane: https://github.com/dotcloud/hipache uses Redis in a very very nice / interesting way, and is something I'd like us to use at some point [19:43:40] to let 'real time' stuff work [19:44:05] Ryan_Lane: I need some sort of scalable distributed cron-like thing. Reddis seems like a good place to stuff schedules. [19:44:25] *Redis [19:44:32] YuviPanda: that looks like a load balancer to me [19:44:59] Ryan_Lane: not primarily. It lets you proxy things to multiple services that keep popping up elsewhere [19:46:25] Actually, all that trouble does point out that the Tool Labs setup is fairly robust. [19:46:40] Although cron annoys me because it's a single point of failure. [19:46:46] Ryan_Lane: so we can spin up node.js stuff, twisted, etc on SGE, and have the web proxy to them [19:46:46] YuviPanda: this is actually pretty neat [19:46:49] Ryan_Lane: very! [19:46:51] Incidentally, can reddis be distributed? [19:46:52] it's built for multi-tenancy [19:46:54] Ryan_Lane: their usage of Redis is wonderful [19:46:56] I assume it can. [19:46:58] Ryan_Lane: indeed. nails our SGE usecase [19:47:02] and can be reconfigured while running [19:47:06] exactly! [19:47:14] YuviPanda: it could be multi-tenant, too [19:47:24] so it could actually be used as a proxy for labs [19:47:24] Ryan_Lane: if we can replace our apache proxy with this, it'll be very very neat. [19:48:00] * Coren points out that the proxy also does header rewrite that cannot be avoided.
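YuviPanda's claim above, that his redis-backed queue came through the gridengine restart with "no data ever lost", is the defining property of the classic Redis reliable-queue pattern: jobs are moved atomically to an in-flight list and only removed once handled. A minimal sketch of that pattern; every name here, including the Redis host, is invented for illustration, and this is not his tool's actual code:

```python
import redis

r = redis.StrictRedis(host="tools-redis")  # hypothetical host name
PENDING, WORKING = "mytool:pending", "mytool:working"  # hypothetical keys

def process(job):
    print("working on %r" % job)  # stand-in for the tool's real work

def worker():
    while True:
        # Atomically move one job to the in-flight list and return it.
        # If this process dies mid-job, the job is still on WORKING and
        # can be re-queued, so nothing is ever silently dropped.
        job = r.brpoplpush(PENDING, WORKING, timeout=0)
        process(job)
        # Acknowledge only after success. (Careful: lrem's argument
        # order differs between redis-py releases; this is 3.x order.)
        r.lrem(WORKING, 0, job)

def recover():
    # On startup, push back any jobs a crashed worker left in flight.
    while r.rpoplpush(WORKING, PENDING):
        pass
```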
if we added a simple API with keystone support in front of this it could be used as a proxy [19:48:59] I wonder if this is a single npm [19:49:07] if so, we could package it easy enough [19:49:58] Coren: this can easily do that too, I am sure [19:50:01] Ryan_Lane: keystone? [19:50:02] * YuviPanda looks [19:51:19] YuviPanda: centralized authn/z for openstack [19:51:32] oh? why would we need that? does our apache proxy do that right now? [19:51:49] YuviPanda: I meant if we were to use this as a proxy as a service [19:51:52] yeah, ProxyAsAService. We already have a bug for one of these [19:52:12] Ryan_Lane: aaah, so people can register via their tool accounts? [19:52:13] nice [19:52:16] so that users could configure a proxy through the web interface (or the cli in the future) [19:52:32] yeah [19:52:33] or wikitech could do it automatically in cases it knows [19:52:55] almost every public IP address in use could be freed up if we had this [19:53:19] oooh, you're talking about labs in general, not just toollabs [19:53:22] yep [19:53:25] oooh, yes. that too. [19:53:32] thus the need for keystone and an api :) [19:53:37] I didn't realize the wider implications :D [19:53:57] hm. does this have ssl support? [19:54:05] Ryan_Lane: yeah [19:54:14] nice [19:55:06] Coren: any reason my scripts didn't restart? [19:55:25] I guess we could just have wikitech write directly to redis, but then we'd never be able to have a cli for this [19:56:04] Betacommand: I don't see errored out jobs. Were they on the continuous queue? [19:56:12] yeah [19:56:54] You wouldn't happen to have your last job numbers handy, would you? [19:57:02] (I can search for them, just faster if you do) [19:57:03] one sec [19:57:12] YuviPanda: heh. it only supports a single ssl cert/key eh? [19:57:37] I guess we could limit the proxy to *.wmflabs.org and *.tools.wmflabs.org [19:57:49] and modify the * cert for future needs [19:57:59] Ryan_Lane: we could have two of them running, one for tools and one for everything else [19:58:07] Ryan_Lane: Happy fun joy: reset happened. [19:58:10] Ryan_Lane: and why can't we have a CLI later if we write to redis? [19:58:12] * Coren curses. [19:58:17] Coren: :( [19:58:19] 552239 [19:58:34] Coren: no need to have two running [19:58:37] err [19:58:42] YuviPanda: ^^ [19:58:59] Ryan_Lane: hmm, right. Plus this will give us all websockets! \o/ [19:59:12] YuviPanda: you would have a * cert with a SAL of another * [19:59:17] *SAN [19:59:27] Betacommand: end_time Thu Jul 18 17:59:26 2013 with exit_status 0 [19:59:28] [19:59:31] that flew over my head, since I've not fully understood how Certs work yet. Sorry :( [19:59:46] Coren: it never edits [19:59:49] *ends [19:59:51] cn=*.wmflabs.org; SubjectAltName: *.tools.wmflabs.org [19:59:58] Betacommand: As far as I can tell, it exited rather than die. Odd. [20:00:05] Betacommand: Lemme correlate with the logs. [20:00:16] YuviPanda: when I say by cli, I mean letting any projectadmin modify the config by cli [20:00:21] and only their configs [20:00:28] and no one else's [20:00:34] Ryan_Lane: ah, right. But can't we enforce that with some sort of a shared secret? [20:00:38] we'd need that anyway with Redis [20:00:47] Ryan_Lane: have something in front of Redis [20:00:48] we wouldn't need that with redis [20:00:57] if we control redis fully [20:01:02] right. put something in front [20:01:06] and have authn/z in front [20:01:13] (hence an api with keystone support ;) )
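Ryan's "have wikitech write directly to redis" works because hipache reads its routing table live out of Redis: per its own documentation, each virtual host is a list under frontend:<hostname> whose first element is an identifier and whose remaining elements are backend URLs. Re-pointing a host is therefore just a few writes, with no proxy restart, which is the "reconfigured while running" property discussed above. The hostname and backend below are made up:

```python
import redis

r = redis.StrictRedis(host="localhost")  # wherever hipache's Redis lives

host = "mytool.wmflabs.org"              # hypothetical virtual host
key = "frontend:" + host

r.delete(key)                            # start from a clean slate
r.rpush(key, "mytool")                   # first element: an identifier
r.rpush(key, "http://10.4.0.42:8000")    # backends; add more to balance

print(r.lrange(key, 0, -1))              # hipache picks this up immediately
```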
[20:01:20] how hard is it to implement keystone support? should be relatively trivial if it's written in python [20:02:02] wsgi + openstack-oslo (common libraries) [20:02:29] hmm, what do we have for all of labs right now? I remember hearing about there being a proxy [20:02:38] there's an instance-proxy [20:02:48] it has a fixed hostname [20:02:58] .instance-proxy.wmflabs.org [20:03:06] aah [20:03:06] right [20:03:18] Betacommand: That may be my fault actually; when I restarted NFS the first time I got the sequence wrong and for a very brief period of time (~12s) the NFS was up with no visible files. [20:03:37] Ryan_Lane: are you going to look into it at some point in the very near future? :) [20:03:40] Because I was doing manual changes on the fly. [20:03:42] Coren: grrr [20:03:47] ipv4 addresses are a national treasure! [20:03:50] heh [20:03:53] YuviPanda did you see !paste [20:03:55] * Betacommand goes to restart them [20:03:56] well, it's not the highest thing on my priority list [20:03:59] !paste [20:04:00] http://tools.wmflabs.org/paste/ [20:04:10] Betacommand: That timestamp is exactly in that brief period. Sorry about this. [20:04:24] this seems like an *excellent* volunteer task [20:04:26] Ryan_Lane: if it isn't, I can try experimenting with it just on toollabs. [20:04:41] petan: nice, but I like mine cleaner :) [20:04:51] YuviPanda which [20:05:03] petan: dpaste.de [20:05:04] usually [20:05:06] heh. I wanted to make a labs paste [20:05:26] this one has real time syntax highlight [20:05:28] one that worked like gist [20:05:29] <3 [20:05:37] Ryan_Lane: exactly. +Oauth :) [20:05:40] with git support [20:05:50] YuviPanda: at minimum + OpenID :) [20:05:57] +OAuth would be nice too [20:05:59] Ryan_Lane: indeed. If I were evil I'd store this as userpages even :P [20:06:18] (the non-temp ones, at least) [20:06:30] petan: if you're typing code into a pastebin, you are doin it wrong :) [20:07:08] Ryan_Lane: so... can I get a public IP for, say, tools-hi.wmflabs.org at some point (before/during WM maybe) to experiment with this just for toollabs? [20:07:27] does that instance already exist? [20:07:33] in labsconsole is there a way to define a variable that will be passed to a role? it looks like it defines variables in top scope or something. [20:07:46] I guess I can allocate an IP and add the dns name [20:07:52] Ryan_Lane: not yet, I'll create it right before. Can't work on it for another few weeks at least. [20:08:10] manybubbles: you can define them via "Manage puppet groups" [20:08:16] manybubbles: per-project [20:08:25] then you can add them via "configure" [20:08:34] Ryan_Lane: let me try that again.... [20:08:51] you need to add a group, then you add a variable to it [20:09:07] it's one of the more terrible interfaces in wikitech [20:09:59] <^demon> Ryan_Lane: Oh speaking of wikitech UI stuff, I deleted my first instance today since the ajaxy stuff went in. That's pretty snazzy, thanks for that :) [20:10:30] ^demon: yw :) [20:10:43] right after I deployed it I added a bug for it too [20:11:07] instance deletion is async. a successful deletion api action just changes the instance's state [20:11:25] it should update the state unless the instance deletion occurs fast enough, then it should delete the instance from the row :) [20:12:58] * Damianz appears [20:13:08] !ping [20:13:08] meh [20:13:08] pong [20:14:21] YuviPanda: added tools-hi dns to a newly allocated IP [20:14:30] Ryan_Lane: oh, nice.
no instance yet though :) [20:14:33] that's fine [20:14:40] when you have one it just needs to be associated [20:14:40] Ryan_Lane: ty :) [20:14:42] yeah [20:15:12] Coren: why does tools.wmflabs.org also have an alias of betaweb? [20:15:54] ... oh! That was for testing Apache 2.4. No longer relevant. [20:15:58] * Coren removes it. [20:16:01] thanks :) [20:16:38] why's apache 2.4 no longer relevant? [20:17:07] Ryan_Lane: It will become relevant again eventually, but not in that context. :-) [20:17:19] ah ok [20:17:28] I.e.: it's not Apache 2.4 that isn't relevant, it's my testing of it this way. [20:17:55] ahhh. ok :) [20:18:06] * Coren sent an email to labs-l with outage info. [20:18:44] yep. saw [20:18:55] Ryan_Lane: Next step for me, I'm going to finish configuring labstore4 now that it has working ram, and switchover to see if the controller itself is flaky. [20:19:07] Prolly Monday. [20:19:51] If that fails, then it's definitely the driver having issues in kernels >3.2 so I'll remove thin provisioning and return to 3.2 [20:20:07] * Ryan_Lane nods [20:20:38] * Coren is out of settings to tweak to make the recent kernel not have the issue. [20:20:54] did you try switching it to SMB yet? [20:21:01] I swear that is the last time I'm going to say that :) [20:21:19] if NFS was the problem, then maybe SMB would be an option ;) [20:21:46] did we try switching to USB3 drives? [20:32:37] [bz] (8NEW - created by: 2Chris McMahon, priority: 4Unprioritized - 6major) [Bug 50622] Special:NewPagesFeed intermittently fails on beta cluster; causes test failure - https://bugzilla.wikimedia.org/show_bug.cgi?id=50622 [20:35:22] [bz] (8NEW - created by: 2Michelle Grover, priority: 4Unprioritized - 6major) [Bug 51635] Beta labs is not loading editor.js though the file exist - https://bugzilla.wikimedia.org/show_bug.cgi?id=51635 [20:45:24] rtng [20:48:39] Coren: I... am cloning mediawiki/core on NFS now. Hopefully that doesn't kill it [20:48:53] (in toollabs) [20:49:02] let me know if it is fucking with debugging. [20:49:08] I don't think it can be /killed/, per se. Worst that can happen in the current situation is a stall. [20:49:35] more like giving you a heads-up so you don't have to debug another apparently random occurrence :) [21:10:57] Coren: still firefighting? [21:11:15] Raging and fuming more like it. [21:11:27] ok, np, i'll wait, no rush [21:11:39] Danny_B: Yeah, I'll be able to do the query later tonight; I'll email you the result. [21:11:50] thx [21:12:50] * Damianz throws some cooling water over Coren [21:15:41] * Nettrom if Coren.likes(cool_beer): give(Coren, cool_beer) [21:18:38] On the plus side, the number of stalls is currently very low. Once in ~3h is the best we've had to date. [21:18:56] * Coren starts wrangling with Dell support. This is going to be fun. [21:19:16] Coren, can you riddle me this? [21:19:36] All 10 of my continuous tasks halted and never rebooted. [21:23:12] Cyberpower678: Please to check what I told Betacommand earlier. I fumbled a restart in such a way that for ~12s all the files seemed gone to user processes. [21:24:55] Some jobs may have failed if they tried accessing files during that interval. [21:25:17] Coren, why didn't they autorestart? [21:26:16] Cyberpower678: Because the script /itself/ would have appeared gone while restarting. It's a freak occurrence, and I know to watch out for it in the future. [21:26:33] AH [21:27:16] Well I just restarted them. I created a list of commands to simply paste into the terminal in events like these.
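Scripted, Cyberpower678's "list of commands to paste" might look like the sketch below: resubmit any continuous task that gridengine no longer knows about. The job names and commands are invented, and it assumes the Tool Labs qstat/jstart wrappers (jstart submits a continuous, auto-restarted job); treat it as a sketch rather than a tested tool.

```python
#!/usr/bin/env python
# Hypothetical post-outage restart list: resubmit continuous jobs that
# no longer appear in gridengine's job table.
import subprocess

TASKS = {  # invented job names and commands
    "cyberbot-1": ["python", "task1.py"],
    "cyberbot-2": ["python", "task2.py"],
}

running = subprocess.check_output(["qstat"]).decode("utf-8", "replace")

for name, command in TASKS.items():
    if name not in running:  # crude check; note qstat truncates long names
        print("restarting %s" % name)
        subprocess.check_call(["jstart", "-N", name] + command)
```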
[21:49:59] [bz] (8NEW - created by: 2Ryan Lane, priority: 4Unprioritized - 6normal) [Bug 51642] Replace SMW/SRF/SF with wikidata + lua - https://bugzilla.wikimedia.org/show_bug.cgi?id=51642 [21:55:02] Coren, the local db host is tools-db. Is it not? [21:55:12] * Coren nods. [21:55:32] your name, is Cyberpower678, is it not? [21:55:34] Then why the hell aren't my queries working. [21:55:53] It's like the database doesn't exist. [21:57:38] I'm ditching SQL [21:57:46] it never works right for me. [21:58:05] Cyberpower678: I like how your problem reports are so detailed and specific. [21:58:27] $dblocal = new Database( 'tools-db', $toolserver_username, $toolserver_password, 'cyberbot' ); [21:58:28] I have no issue connecting to and using tools-db. What /exactly/ is the issue you have? [21:58:49] $dblocal->insert( "blacklisted_links", array( 'url'=>$page['el_to'], 'page'=>$page['el_from'] ) ); [21:59:49] ... that doesn't tell me much. I don't know what that Database class is, what those variables contain, or have any hint of what errors you might be getting. [22:01:03] That function sends INSERT INTO blacklisted_links (`url`,`page`) VALUES ('the url','pageid'); [22:01:19] It keeps returning nothing. [22:01:41] ... why would an insert return something? [22:01:43] When I use that exact generated SQL query in the terminal, it works just fine. [22:02:03] returning = doing sorry [22:02:27] Do you have the _exact_ query text it sends? [22:02:41] Not really. [22:02:58] It only sends a url and pageid through. [22:03:07] It fails every time. [22:03:44] hey Coren [22:03:49] Well, you'll need to know what it really tried to do, and what the error was. The former is indispensable. Otherwise, "dunno" is the only answer you could possibly get. [22:03:50] so we got everything running and it's great [22:03:56] thanks a lot for the info on labsdb setup [22:03:57] milimetric: Yeay! [22:04:05] now there's a performance issue though :) [22:04:19] And it's not a fault with the database class since the other queries for the replicated tables work. [22:04:24] looks like labsdb doesn't have any indices on some tables [22:04:44] for example, the revision table [22:04:46] Coren, so I'm thinking that I'm not connecting to my db correctly. [22:05:15] milimetric: Tables where suppression can occur have alternatives with indexes and missing rows; this is probably what you want. [22:05:23] milimetric: What's the query that hits you like? [22:05:46] Cyberpower678: Well, surely your Database class has provision for showing you the errors it gets? [22:05:48] well, we're going to be doing a of different queries [22:06:03] *a lot of [22:06:09] Oh wait. It's not using the old .my.cnf file. [22:06:22] but i tried a very simple one like count(*) from revision where rev_user = something [22:06:28] I've got an older local set up. [22:06:48] Ah. Whenever you have a where clause on _user or _user_text you should use the revision_userindex view instead. [22:07:02] huh, interesting [22:07:09] milimetric: ^^ That view has the indices on users, at the cost of not having the rows where the user was suppressed. [22:07:10] ok, will try [22:07:24] Coren, how do I view a file in the terminal again? [22:07:26] i'm not sure what "suppressed" means [22:07:26] (Suppression complicates matters) [22:08:15] milimetric: aka "Oversight" (although that's no longer an accurate term) [22:09:25] sorry Coren, I'm not very familiar with how mediawiki works [22:09:29] I know! right? :) [22:10:36] milimetric: Heh. It suffices to say that if you have any where clauses on rev_user or rev_user_text you should be using revision_userindex. :-) [22:10:48] cool, works for me [22:10:54] but the revision table has no indices [22:11:05] at least none that show in "show indexes from revision;" [22:11:18] so i would definitely never want to use that :)
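The two threads above condense into a few lines: authenticate with the credentials in the current ~/replica.my.cnf (a stale local copy of it is exactly what bit Cyberpower678), and route any rev_user/rev_user_text filter through revision_userindex so a user index is actually used. A sketch, assuming the python-mysqldb package is available and using an invented username:

```python
#!/usr/bin/env python
import os
import MySQLdb  # assumption: python-mysqldb is installed

# read_default_file picks up the always-current credentials, so there is
# no password to copy around (or to let go stale).
conn = MySQLdb.connect(
    host="enwiki.labsdb",
    db="enwiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
cur = conn.cursor()

# revision_userindex carries the user indexes, at the cost of omitting
# revisions whose user was suppressed; the bare revision view does not.
cur.execute(
    "SELECT COUNT(*) FROM revision_userindex WHERE rev_user_text = %s",
    ("ExampleUser",),
)
print(cur.fetchone()[0])
```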
[22:11:18] Coren: minor 1 line patch to exec_environ, can you +2? https://gerrit.wikimedia.org/r/74536 [22:11:31] but thanks a bundle, the revision_userindex view works lightning fast [22:12:05] That's a two-line patch. [22:12:33] Did you intend to add mono-runtime explicitly like this? [22:12:52] Coren: hmm? [22:12:54] Coren: grr, rebase fail [22:12:56] Coren: let me fix [22:13:46] Coren: new patchset uploaded [22:13:52] thanks for catching that [22:19:30] Coren: ty [22:21:53] Coren|Away, I know what the problem is. The replica.my.cnf doesn't work if you have an older db. [22:22:52] Well I'm off to bed. [22:22:55] hi guys [22:23:23] who could create a shell account for me (moving from old SVN)? [22:23:36] i wanted to create my account at wikitech [22:23:53] malafaya: what's your svn username? [22:23:58] and what wikitech username do you want? [22:24:04] it's "malafaya" [22:24:05] and which email address do you want to use? [22:24:42] thanks, Ryan_Lane [22:26:01] you want malafaya as your wikitech username? [22:26:12] yes please [22:26:14] it'll also be your git name [22:26:16] ok [22:27:55] malafaya: done. go to wikitech and say you forgot your password [22:28:02] it'll email you a new one [22:28:08] ok, thank you [22:28:11] yw [22:28:21] stupid script I have for this doesn't work anymore [22:28:26] I can't remember the last time I ran that :D [22:28:57] Ryan_Lane: Should of used integration testing for that [22:28:58] * Damianz troll [22:29:06] Damianz: it's missing a library [22:29:11] must have been removed from puppet [22:29:32] Can't decide if I hate puppet right now or not [22:29:42] the answer to that is "always" [22:30:04] Trying to sell using puppet/vagrant to a software team, seem to have buy-in but now have to explain puppet's weirdness that defies logic [22:30:35] :d [22:30:36] err [22:30:37] :D [22:30:41] there's always salty-vagrant [22:31:53] Kinda don't want to use salt for this - want to have every level of the scale, clone yourself and use scripts that run puppet apply for local setup to full blown puppet master with vagrant in the middle.... though salt is sexy and I want to play with that [22:32:44] Oh btw devstack is cool, but it seems to really really really be for dev only (like a vg on a loop device that's on a file), which is a shame... but openstack is cool, even if sucky and breaky in parts [22:33:02] yes. devstack is for development [22:33:07] and for ci [22:34:01] I have a feeling I'd have less problems with openstack if I actually used 2 network interfaces because of the bridge setup... but for now kvm alone will have to do. [22:34:13] * Damianz dislikes using a desktop pc because his servers haven't arrived but he needs to do work [22:35:04] oh. yeah. it's a pain with a single network device [22:45:42] is the user space on toollabs limited? [22:48:34] Do we have unlimited hard drives? [22:50:18] we can, if we bought enough USB3 Hard Drives and used them over SMB/CIFS [22:54:29] * Damianz orders 500 YB from YuviPanda [22:54:44] * YuviPanda ships him a suspiciously large but light looking container from India [22:54:55] diz vary good! cheep prize!
[22:55:13] We totally need a Yottabyte sized vm for labs [22:55:21] we're secretly the NSA [22:55:38] The NSA probably just use network based raid [22:56:04] No Such Agency [22:56:51] IIRC gchq (pretty much the uk nsa) uses DRAID for storage [22:57:24] properly setting up a sartoris environment is a pain in the ass [22:57:51] well, in labs anyway, thanks to salt [22:58:21] I need to make salt purge the cached master public key when switching masters [23:01:35] heh [23:01:48] * Damianz wonders, if he set up Tahoe on a bunch of labs instances, how long it would take someone to notice [23:02:01] Tahoe? [23:02:14] It's a distributed, encrypted filesystem [23:02:19] https://tahoe-lafs.org/trac/tahoe-lafs ? [23:02:24] mhm [23:02:28] it's pretty cool [23:02:39] and facebook sucks for storing my photos :P [23:02:41] I'd likely notice [23:03:12] not that I'm taking that as a challenge, but there's usually some key indicators of abuse ;) [23:03:28] Like 2 tb of random looking data appearing on your nfs servers... [23:03:35] that [23:03:42] and external bandwidth use [23:03:52] Obviously I'd only be storing commons data... which you can't check lol [23:03:54] and NFS dying :P [23:03:55] and a spike in disk ip [23:03:57] *io [23:04:02] * Damianz wouldn't really, but it would be amusing [23:04:19] if it's open data it doesn't need to be encrypted ;) [23:05:17] !damianz [23:05:17] some weirdo around here [23:05:55] Hmmm thinking of encryption, I wonder what happens if someone from a weird country downloads crypto stuff from labs, who gets sued... since exporting out of the US is apparently bad and may be used in nukes... [23:06:12] Technical_13: It's good being weird, you can look at humanity and laugh