[00:00:06] iSCSI taget framework, centos6. [00:00:27] ah [00:00:37] Though I'm kinda leaning towards just using ubuntu server and getting a cluster up for internal stuff over trying to test the rpms in epel-testing as the ubuntu ones seem kinda stable and supported. [00:00:37] I'm not using iscsi [00:00:46] why not just use gluster? [00:00:56] already have a reliable iscsi server? [00:01:24] I probably would have used iscsi if our netapp didn't want to charge us insane amounts of money to use it [00:01:35] I probably will use gluster as we're kinda thinking about dtiching the sans. We use local storage for vms with off site iscsi mounts for some people currently. [00:01:41] * Ryan_Lane nods [00:01:45] it'll be slower, likely [00:01:55] cheaper, though [00:02:00] Yeah we have some nice dell sans that cost about 60k and are a beast to admin :( [00:02:20] eww. dell ones? [00:02:27] how's the failover for those? [00:02:51] I don't think they are setup for failover currently as well... when they where the failover failed. [00:03:03] * Ryan_Lane twitches [00:03:06] For sans I'm more of a DRDB+pacemaker fan. [00:03:10] than gluster may actually be more reliable [00:03:13] But I'd rather not tough iscsi with a broom. [00:03:59] I'm just worried about losing more than two gluster nodes and having a partial outage on whichever instances only exist on those two nodes [00:04:09] Currently playing with gluster for some storage, not 100% sure on vms but I'd rather not have a big probablySPOF in regards to a san. [00:04:18] though, if two nodes go down, then we lost 60+ instances right there anyway [00:04:31] and with the new hardware 200+ instances per node [00:04:32] iSCSI mounts do weird things when they loose network access or the network lags out =/ [00:04:37] yep [00:05:09] we're splitting our IO [00:05:43] one for space, another to try to divide the IO into OS and application specific traffic [00:05:43] The only thing I have about gluster is if a node is down then comes back up, re-stating the files to sync it is quite extensive and there is no simple way to check the sync status (I don't believe) so I couldn't say ah yeah if we loose 2 nodes right now the remainings ones are only 70% in sync. [00:06:02] it's easy to make it re-sync [00:06:09] with the newer version of cluster [00:06:13] same to make it rebalance [00:06:20] it's a single command [00:06:33] Oooh really [00:06:36] yeah [00:06:45] I added a node and rebalanced it in a couple hours [00:06:49] I've not properly tried gluster in a while but it use to be a case of find /data and :( [00:06:55] well, the command took a couple hours to run [00:07:07] But then it was also a huge manual re-specifying replicaes on new nodes. [00:07:10] I think the find is still necessary [00:07:33] but that usually takes no time at all [00:07:45] rebalances take a while, but that makes sense, since it needs to transfer data [00:07:48] Does the cluster use server side replication or rely on the clients to do it? [00:08:01] rebalances are server side [00:08:07] the find requires a client [00:08:16] so, you mount the volumes on the server and do a find [00:08:52] that said, when a node comes back up, if a client tries to access a file, it'll be sync'd then [00:09:02] Yeah [00:09:03] which means requests for those files will be slow [00:09:24] easiest to just do it when the node comes up [00:09:27] you can script it [00:09:30] My concern is when you get into the issues on loosing 2 nodes and suddenly you have your redundant part out of sync. 
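A minimal sketch of the re-sync and rebalance commands discussed above, assuming a GlusterFS 3.2-era CLI and a hypothetical volume named `instances`:

```bash
# Rebalance after adding a brick/node (a single command, runs server side).
gluster volume rebalance instances start
gluster volume rebalance instances status

# Trigger self-heal after a node comes back: stat every file from a client
# mount so stale replicas are re-synced up front instead of on first access.
mount -t glusterfs localhost:/instances /mnt/instances
find /mnt/instances -noleaf -print0 | xargs --null stat >/dev/null
```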
Then you loose something and bam weird data loss issues. [00:09:47] and even with hundreds of volumes, a find shouldn't take ages, since it just needs to scan metadata [00:09:49] But it can be worked around pretty easily. [00:10:03] well, you'd have to fully lose those two nodes [00:10:14] I'm using raid underneath gluster [00:10:29] two raid 6 arrays LVM'd together, then shared via gluster [00:10:36] it's lossy, but safer [00:11:08] on the instance storage, every node has a raid 10, shared via gluster [00:11:09] Any reason for raid6 over say raid 10 and just bang a big zfs or xfs fs on there? Though zfs still ins't support in linux I don't think which is a shame as its snapshotting is pretty cool. [00:11:23] raid 6 for volume storage, and raid 10 for instance storage [00:11:39] raid 6 because raid 10 costs too much for volume storage [00:11:53] it eats *way* too many disks [00:12:10] raid6 is raid5 with an extra spare, right? [00:12:13] yeah [00:12:23] I worry about a 12 disk raid 5 [00:12:31] Yeah... [00:12:47] raid 6 is slower, but it makes me sleep better at night :) [00:13:12] One of our raid6 sans blew a disk the other day, raid5 would have caused arghness. [00:13:22] yeah [00:13:39] I honestly don't trust raid 5 anymore, with these much larger disks [00:14:06] the chance of getting a silent write error is basically guaranteed with raid 5 [00:16:16] Raid 5 takes long enough to rebuild with like a 300gb disks, wouldn't even try it with a 2tb. Thats like building something over 2tb on ext3 then hitting a 2month fsck xD [00:44:27] ssmollett: your change went through everywhere and is working properly :) [01:03:04] !log deployment-prep reconfiguring the web server instances to remove puppet classes that no longer exist [01:03:07] Logged the message, Master [01:33:23] * Damianz gives ssmollett a cookie [01:33:30] Also I FUCKING hate dc people [02:24:03] Damianz: hahaha. why's that? [02:27:21] 02/22/2012 - 02:27:21 - Creating a project directory for orgcharts [02:37:25] 02/22/2012 - 02:37:24 - Creating a home directory for marktraceur at /export/home/orgcharts/marktraceur [02:38:11] 02/22/2012 - 02:38:10 - Creating a home directory for marktraceur at /export/home/bastion/marktraceur [02:38:23] 02/22/2012 - 02:38:22 - Updating keys for marktraceur [02:39:10] 02/22/2012 - 02:39:10 - Updating keys for marktraceur [09:44:33] I'd like to add a jdk and tomcat appserver to the puppet config menu how do I proceed [09:47:46] Hey, gadgets aren't working at hi-wp deployment [09:47:56] anyone know why? [09:48:39] Change on 12mediawiki a page Wikimedia Labs was modified, changed by OrenBochman link https://www.mediawiki.org/w/index.php?diff=502555 edit summary: /* Proposals */ [09:48:49] *mean at http://hi.wikipedia.beta.wmflabs.org ofcourse [09:58:44] anyone? [10:06:21] anyone here? [10:07:45] anyone? [13:45:43] Why aren't gadgets working at the test site? [13:46:07] ? [13:46:20] Sid-G: could you be more specific? which test site? [13:46:46] sumanah: I'm not getting gadgets at hi-wp labs [13:46:54] at http://hi.wikipedia.beta.wmflabs.org [13:47:34] sumanah:No gadgets tab in the prefs anymore (it was there yesterday) [13:47:43] That is strange! [13:48:24] sumanah: Yeah, and the gadgets I had enabled dont show up anymore either [13:48:47] The script doesn't load (checked with the chrome console) [13:48:54] Sid-G: let's talk about this in #wikimedia-dev as I think some folks will be there who care about this.... 
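A rough sketch of the "two RAID 6 arrays LVM'd together, then shared via gluster" layout described above, assuming Linux md software RAID and hypothetical device names (the boxes in question may well use hardware controllers instead):

```bash
# Two 12-disk RAID 6 arrays (double parity, so any two disks can fail).
mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[b-m]
mdadm --create /dev/md1 --level=6 --raid-devices=12 /dev/sd[n-y]

# Join them into one volume group and carve a single large logical volume.
pvcreate /dev/md0 /dev/md1
vgcreate gluster /dev/md0 /dev/md1
lvcreate -l 100%FREE -n export gluster

# Filesystem on top, mounted where the gluster bricks will live.
mkfs.xfs /dev/gluster/export
mount /dev/gluster/export /export
```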
[13:49:44] ok [14:41:10] 02/22/2012 - 14:41:10 - Updating keys for marktraceur [14:41:22] 02/22/2012 - 14:41:21 - Updating keys for marktraceur [15:05:08] 02/22/2012 - 15:05:07 - Updating keys for hydriz [15:05:17] 02/22/2012 - 15:05:17 - Updating keys for hydriz [18:13:45] hi Ryan_Lane do you have a couple minutes for a labs question? I'm trying to ssh to my labs instance from bastion and failing. [18:13:53] sure [18:13:57] are you forwarding your key? [18:14:01] which instance? [18:14:07] *forwarding your agent? [18:14:46] from bastion can you ssh to bastion1? [18:15:00] (it's the same system, but lets me know if forwarding is working correctly. [18:15:12] s/\./)/ heh [18:15:48] I have an instance named "firstinstance". I ssh to bastion just fine. according to https://labsconsole.wikimedia.org/wiki/Access, it seems I should be able to ssh from bastion to "firstinstance" but I get "Permission denied (publickey)." [18:16:33] actually bastion1, I'm "cmcmahon@bastion1" at the shell prompt [18:18:08] <^demon|away> Ryan_Lane: I think my gitweb redirect was wrong. https://gerrit.wikimedia.org/gitweb/operations/puppet.git gives a 404, and just /gitweb/ exposes the perl script :p [18:20:05] yeah [18:20:25] chrismcmahon: yeah, but try sshing to bastion1 from bastion [18:20:37] it lets me know your agent is forwarding properly [18:21:21] Ryan_Lane: I do "ssh -A cmcmahon@bastion.wmflabs.org" and end up with a shell prompt on bastion1 [18:21:28] ..... [18:21:38] now ssh to bastion1, from bastion1 [18:21:57] if your agent is forwarded properly it'll work, otherwise it wont [18:22:22] which project is this instance in? [18:22:31] weird. "cmcmahon@bastion1:~$ ssh bastion1" "Permission denied (publickey)" [18:22:57] type: ssh-add -l [18:23:15] does it show a key? [18:23:16] my "firstinstance" instance is in project "quality-assurance" [18:23:27] ok. gimme a sec, I need to add myself to it [18:23:39] $ ssh-add -l [18:23:39] The agent has no identities. [18:23:44] actually, no I don't [18:23:51] you're instance works properly [18:24:00] hi Ryan_Lane [18:24:04] you need to get agent forwarding working properly [18:24:15] chrismcmahon: https://labsconsole.wikimedia.org/wiki/Help:Access#Using_agent_forwarding [18:24:22] koolhead12: howdy [18:24:24] * Damianz stretches out [18:24:36] Ryan_Lane, am good. 1 more to go. :) [18:24:52] PHP + file permissions = sad person [18:24:53] chrismcmahon: this is a common problem ;) [18:24:59] Damianz: heh [18:25:03] koolhead12: 1 more to go? [18:25:34] Ryan_Lane, multi node automation of diablo :) [18:25:39] ahhhh ok [18:27:17] although the isolating the nova-network from API is still not working for me. have to dig deeper inside :) [18:27:36] yeah, I had to put the api on the network node [18:28:04] Ryan_Lane: seems to me that "ssh -A" should just dtrt. not sure where the breakdown is. [18:28:59] otherwise there's a need for some awkward networking rules [18:29:12] I think in essex they are moving the metadata server to the compute service [18:29:24] then the API can be separate [18:29:37] chrismcmahon: on your local system: ssh-add -l [18:29:49] you want to ensure your agent has it locally, then also has it on bastion [18:29:57] Isn't essex due for release soon? [18:29:58] you don't want to forward for agent past bastion, in general [18:30:06] Damianz: yeah, in weeks, likely [18:30:11] Damianz, yes in few months!! :) [18:30:14] I need to defer a bug of mine :( [18:30:45] I'm wondering how smooth the update path is going to be. [18:30:56] no clue. 
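The agent-forwarding flow being debugged above, spelled out as a sketch (usernames, key path and instance name are taken from the conversation or illustrative):

```bash
# On the local workstation: start an agent, load the key, confirm it's there.
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa      # whichever key labsconsole knows about
ssh-add -l                 # should list the key, not "The agent has no identities."

# Forward the agent only as far as bastion, then hop onward.
ssh -A cmcmahon@bastion.wmflabs.org
ssh-add -l                 # on bastion: the forwarded key should appear here too
ssh firstinstance          # now authenticates with the forwarded key
```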
I had a terrible, terrible time for diablo [18:31:06] but I also kind of fucked myself with facebook mysql libraries [18:31:22] lol [18:31:28] I had an outage because of that [18:31:30] it wasn't pleasant [18:31:34] Damianz, if your using keystone u will be safe [18:31:38] I really hope they don't get stuck in a 'you have to update all nodes at once' place [18:31:42] else you have to take same route [18:31:55] apparently openstack dislikes changing the default charset [18:32:01] or, well, sqlalchemy does [18:32:05] ha. got it. Ryan_Lane I didn't realize "eval 'ssh-agent'" wasn't a permanent condition. I'm on my instance now. [18:32:27] chrismcmahon: ah, yeah, whenever you open a new shell your environment won't have the agent [18:32:38] you can re-connect to your agent in other shells, though [18:32:49] btw Damianz hellos :) [18:32:59] I have scripts that write the environment variables out to a file, then I source the file in other shells [18:33:01] * Damianz waves at koolhead12 [18:33:11] Ryan_Lane: been some time since I've been deep in ssh, it's coming back to me, thanks [18:33:13] I run two agents, one for production and another for labs [18:33:15] Damianz, are you also a stacker :) [18:33:24] good to see openstack people in the channel :) [18:33:24] * koolhead12 is still clueless of swift [18:33:32] * Ryan_Lane is also mostly clueless of swift [18:33:41] I need to learn it soon enough, though [18:33:43] I'm more confused about switft... [18:33:47] hahaha [18:33:53] Damianz: thanks for the cookie; glad the ldap library change worked. [18:34:04] ben will be out of town for the openstack conference. I may need to give the swift talk for us. [18:35:15] my next week target is to start swimming inside it --> swift [18:35:30] Ryan_Lane, and how will one loadbalance nova-api? [18:35:38] by keeping multiple zones [18:35:44] ssmollett: a few instances still need puppet fixed and run [18:35:55] I've been doing that. it's only about 5-6 more right now [18:36:04] this is a good way of ensuring systems are running puppet properly :) [18:37:09] is it worth having monitoring tell us if puppet hasn't run recently or has run but failed? [18:37:18] yes, definitely [18:37:28] we do that in production using an snmptrap [18:37:36] Bah stupid home internets [18:37:40] if you could add that to the nagios instance, that would rock [18:37:53] it probably just needs to be slightly modified in puppet to work [18:38:29] o.0 [18:38:31] * Damianz doesn't trust snmp traps to be reliable enough [18:38:53] Damianz: well, if the snmptrap isn't working, then all nodes show puppet as not running [18:39:05] then we know the snmptrap is broken [18:39:27] Ah, so you have the thingy option on the pasive check that if it isn't updated in xmin nagios moans? [18:39:41] I believe that's how it works [18:39:51] I didn't set up the trap [18:41:54] Hmm I really should fix my nagios box, was kinda leaning towards having puppet manage it though over my lovely mysql+perl solution *shudder* [18:42:17] New patchset: Sara; "First iteration of adding ganglia for labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2157 [18:42:36] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/2157 [18:44:00] Ryan_Lane: those ganglia changes are all minor. i still need to work out the gmond stuff. maybe we can sit down and figure that out friday? [18:44:16] Damianz: it's going to suck. puppet managament of nagios is really, really painful [18:44:22] ssmollett: sure. 
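A hedged guess at the "passive check that moans if it isn't updated in X minutes" idea: a hypothetical Nagios 3 service definition (file path, hostgroup, threshold and the check_dummy command definition are all assumptions), with the puppet result assumed to arrive as a passive check via the snmptrap handler:

```bash
cat > /etc/nagios3/conf.d/puppet-freshness.cfg <<'EOF'
define service {
    use                     generic-service
    hostgroup_name          labs-instances
    service_description     Puppet freshness
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     3600   ; no passive result for an hour -> stale
    ; only runs when the result goes stale; check_dummy just raises an alert
    check_command           check_dummy!2!"no puppet run reported in the last hour"
}
EOF
```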
[18:44:48] Hmm [18:46:17] ssmollett: basically, you'll want to change the cluster name to the project, the udp_send_channel to use unicast, rather than multicast, where the address is that of the ganglia instance [18:46:29] and then set the port by the gid [18:46:52] I think alll of the "accept" stuff isn't needed, since none of the instances will be acting as aggregators [18:47:04] accept/recv [18:47:41] ... [18:47:48] I can put ubuntu on a driod o.0 [18:47:56] but do you *want* to? :D [18:48:09] Nope, I have an iPhone :P [18:48:21] heh [18:48:22] Though I do want an andriod. [18:48:48] I may eventually get one. I'm not buying a new phone for a while, though [18:49:02] but I want a non-skinned android, that isn't fucking massive [18:49:16] all of these tablet sized phones can die in a fire [18:49:35] I mainly use my phone for music and email so kinda don't want to pay for a new one. [18:50:39] hm. how am I going to sanely manage the project storage till the volume driver is ready? [18:51:05] this is gonna need caching of come variety. heh [18:51:08] Free for all! [18:51:34] well… the problem is that the number of directories is based on the number of users and projects [18:51:51] though, I guess it's based on users in a project, so it's not as bad as I'm making it out [18:52:07] I guess I'll just modify my home directory script for now [18:52:33] it creates an exports file right now. I need to change it to manage gluster volumes instead [18:53:15] I also need to work out a directory structure for volumes [18:53:35] andrewbogott_afk: I should likely coordinate this with you, so that our structures match [18:54:18] It will get more complex when you take into account projects having blocks of storage that arn't related to users. [18:54:34] nah, that'll be quota'd by project [18:54:50] right now it's volume/project [18:55:08] so, home/bastion/laner, home/bastion/damianz, etc. [18:55:49] but, the gluster volume driver would manage home/bastion, home/mail, home/feeds, etc. and a script will add the home directories underneath that volume [18:56:05] even the home directories per project will have a project specific quota, not per-user [18:56:08] it makes things easier [18:56:45] I'll make the script add a default quota, and I can manually adjust them for now [18:56:57] Makes sense [18:57:11] though I could always do project/volume, too [18:57:21] not sure which is easier [18:57:41] I think likely volume/project, since it's what I'm already doing :) [18:57:44] Puppet modules that come with install instructions past 'put it in the modules dir' confuse me :( [18:57:49] heh [18:58:03] well, some properly require server side things [18:58:09] we have a few puppet things that do [18:58:47] For some reason this module actually comes with instructions of how to install puppet... [18:59:02] hahaha [18:59:54] * koolhead12 is scared of Ruby [19:00:21] I wish cfengine was more highlevelish [19:00:42] It's just a more reliable way of what I do now and writing bash scripts to validate stuff other bash scripts did just ends in pain [19:00:43] ruby is easy enough, it's just managed poorly [19:00:47] Damianz, don`t we have a configuration management system in python [19:00:56] unfortunately, no [19:00:58] they are all ruby [19:01:03] That would be nice [19:01:07] cfengine sucks [19:01:15] well, it's outdated, really [19:01:24] Ryan_Lane, it was pain for me to install it on my ubuntu box :P [19:01:33] Ubuntu is easy... 
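A minimal sketch of the gmond changes just described: cluster named after the project, unicast send channel pointing at the project's aggregator instance, port derived from the project's gid, and no accept/recv sections on ordinary nodes. The hostname, port and project name here are made up:

```bash
cat > /etc/ganglia/gmond.conf <<'EOF'
cluster {
  name = "bastion"                  /* the labs project name */
}
udp_send_channel {
  host = aggregator1.pmtpa.wmflabs  /* the project's ganglia instance */
  port = 50104                      /* e.g. derived from the project's gid */
}
/* no udp_recv_channel / tcp_accept_channel: this node never aggregates */
EOF
```

(In practice these stanzas would come from the puppet template rather than a heredoc.)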
[19:01:38] Damianz, i still prefer old bash for my deployments [19:01:40] You should try making puppetmaster work on c5 [19:01:57] Damianz, yeah. puppet is easily configured in Ubuntu [19:02:03] bash is evil [19:02:04] Random bash scripts don't work when you have 500 servers that might be up/down/lagged out etc that you'd rather be in one state. [19:02:20] I rarely write bash scripts anymore [19:02:39] Ryan_Lane, so using puppet most of your work? [19:02:47] $(perl -e "") :D [19:02:49] almost exclusively [19:02:55] and python scripts for a lot too [19:02:57] Damianz, your scaring me ;P [19:03:01] and adding support to openstack for other things [19:03:39] I really want openstack to be fully extendable. it's not right now :( [19:03:42] I'm stuck with perl for a bunch of stuff due to the python version on centos sucking but python is nice for lager stuff that requires mroe than like 2k lines in one file. [19:03:50] Damianz: are you going to the openstack conference? [19:03:53] koolhead12: or you? [19:04:01] Ryan_Lane, no. am in india :( [19:04:04] ah [19:04:27] Hmm maybe, dunno [19:04:36] i am learning python, still learning :D [19:04:39] have you submitted code to openstack at all? [19:04:45] invites right now are limited to contributors [19:04:48] I'm in the UK so it's a trek but... I also have half my team in the us so I could find an excuse for the flight. [19:04:53] heh [19:04:54] Ryan_Lane: been looking around and not finding: is there some canonical procedure to install Mediawiki on a labs instance? [19:05:05] And nope, I keep meaning to but been busy. [19:05:12] chrismcmahon, juju :P [19:05:19] i can help if needed :) [19:05:21] chrismcmahon: unfortunately it's from scratch until someone makes an all-in-one puppet manifest for it [19:05:24] no juju! [19:05:27] heh [19:05:29] Ryan_Lane, :D [19:05:36] puppet only, please ;) [19:05:40] looks like a lot of yaks from here :) [19:05:51] Ryan_Lane, gosh!! you really want me to learn puppet. ok sir [19:05:52] :) [19:06:00] chrismcmahon, hehe [19:06:58] koolhead12: we have no plans to use anything other than puppet or openstack right now [19:07:03] koolhead12: if you could get me started down the road to install MW on an instance, I am sure it would save me time exploring dead ends. [19:07:25] chrismcmahon: documentation on mediawiki.org for installing on ubuntu should be appropriate [19:07:34] don't install from the mediawiki package, though [19:07:35] chrismcmahon, i can do that. right away if needed [19:07:37] You need docs? [19:07:51] Don't you just like download it and add a mysql db, run the maint script and done? [19:07:56] basically, yes [19:07:59] Damianz, yes [19:08:10] you need to install apache, mysql, php, apc, memcache, and a few other things [19:08:13] memcache isn't necessary [19:08:24] neither is apc, but you'd be crazy to not install apc [19:08:30] and create upload directory with right permission [19:08:30] memcache makes it faster though... not that mw is ever going to be fast. [19:09:04] often we install in /srv for one-off installs, but /mnt may be more appropriate on labs [19:10:13] heh, oren made an instance called dumpster1? :D [19:10:15] I really hate people that have like 25 quotes in email replies... also top posting is confusing [19:10:37] lol [19:10:51] people name instances the strangest things [19:11:07] at least we haven't been hitting naming conflicts yet :) [19:11:32] these docs? 
http://www.mediawiki.org/wiki/Manual:Installation_requirements http://www.mediawiki.org/wiki/Manual:Installing_MediaWiki [19:11:34] Is that possible? I thought all the references where to the stupid openstack generated hostname. [19:11:52] there's two names [19:11:58] instance name, and instance id [19:12:03] both need to be unique [19:12:12] and mediawiki enforces it [19:12:48] so people can't name two things the same, but they can try [19:12:55] and no one has complained yet. heh [19:13:09] i mean you guys might find it funny but am 4 experimenting with internal infra, i used Bash and 5 nodes, cobbler :P [19:13:28] cobbler is kinda annoying [19:14:14] cobbler is fine for a regular build server [19:14:21] Damianz, indeed. changes from version to version, espacially the whole preseed rule gets screwed [19:14:41] Ryan_Lane, that post_install script is cool. it does most automation [19:14:55] we just use puppet for that too [19:15:01] like injecting public key and stuff [19:15:05] Ryan_Lane, got it :P [19:15:40] Damianz, you also a openstack dev? [19:15:52] I used to do a lot of stuff via post-inst and simple config files, when I used red hat satellite server [19:15:55] nope [19:15:57] puppet is more powerful [19:16:51] hmm i will dig into it soon. [19:17:21] i installed it and passed simple stuff like changing file permission and installing apache :) [19:18:00] I have puppet forcing ntpd on my test servers atm, currently figuring out making it manage openstack though D: Though there is an openstack module in puppetlabs. [19:19:02] our repo has puppet manifests for openstack [19:19:08] I don't think I recommend mine, though ;) [19:19:18] they aren't modules, for one [19:19:42] I'm in 2 minds about modules [19:19:55] I can see why they are good but I'm also taking others peoples ideas of how my configs should look. [19:20:12] Though I guess I could just change the templates and then screw pulling updates from upstream. [19:20:44] yeah [19:20:53] we basically write everything from scratch [19:21:22] Ryan_Lane, any wiki/doc you would suggest for n00b like me who knows no ruby :) [19:21:28] for puppet heads up [19:21:42] I was reading about facter or w/e it's called the other day which looks interesting. Hope I can pull the data out of that to use in manifests and set values in templates based on things like ram in a box. [19:23:17] koolhead12: well, it's a puppet dsl, not ruby [19:23:30] koolhead12: read the tutorial and intro documents for puppet [19:23:33] they are a good start [19:23:44] Damianz: yeah, we use facter quite a bit [19:24:01] Ryan_Lane, factor uses python i suppose :P [19:24:08] Damianz: I'd *really* love to pull the metadata information for instances, then have it available as facter info [19:24:11] does it? [19:24:14] I thought it was also ruby [19:24:34] Facter is ruby I'm pretty sure [19:24:50] it's scripts are defintitely ruby [19:24:58] its* [19:25:00] Instance data would be cool. [19:25:14] You could do like location based puppet things based on which zone it's in. [19:25:29] we do that now, by populating ldap with that info [19:25:50] but we want to eventually move away from using ldap for node info [19:27:44] Hmm [19:28:46] I'd prefer to keep it in the metadata [19:29:12] hmm, no gcc on instances it seems [19:29:24] Is openstacks metadata distributed or on the controller? [19:29:30] apt-get install gcc? [19:29:36] Damianz, Ryan_Lane yes ruby :P [19:29:39] Why do you need gcc though [19:29:47] Compiling stuff = PITA the maintain. 
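A rough from-scratch sequence for the "install MediaWiki on an instance" question above, assuming Ubuntu, the tarball rather than the distro package (as advised), and illustrative version numbers, paths and passwords:

```bash
# LAMP stack plus APC; memcached is optional but helps.
sudo apt-get install apache2 mysql-server php5 php5-mysql php-apc memcached

# Unpack MediaWiki somewhere web-served, e.g. /srv (or /mnt on labs).
cd /srv
sudo wget http://download.wikimedia.org/mediawiki/1.18/mediawiki-1.18.1.tar.gz
sudo tar xzf mediawiki-1.18.1.tar.gz && sudo mv mediawiki-1.18.1 mediawiki

# Create the database and run the command-line installer (writes LocalSettings.php).
mysql -u root -p -e 'CREATE DATABASE wiki;'
cd /srv/mediawiki
sudo php maintenance/install.php --dbname wiki --dbuser root --dbpass secret \
    --pass adminpass "Test Wiki" admin

# Give the web server write access to the upload directory.
sudo chown -R www-data:www-data images/
```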
[19:29:48] Damianz: installing Apache [19:29:54] apt-get install apache2? [19:29:55] Damianz: it's on the API server right now [19:30:08] Damianz: I think in essex it's on every compute node [19:30:22] the data is stored in the database, thouguh [19:30:23] *though [19:30:44] chrismcmahon: no. installs are base, on purpose :) [19:30:54] there are puppet classes for build systems, though [19:31:05] you can add that by reconfiguring the instance, then force running puppet [19:31:16] some puppet classes may cause puppet to stop running, though... [19:31:17] no sudo, so apt-get install fails [19:31:24] you have sudo [19:31:26] You should have sudo. [19:31:28] it's your wiki password [19:31:34] you don't have it on bastion [19:32:11] aha. wrong password :) [19:32:17] I'm in 2 minds about the openstack structure, like a single controller is pointless spreading the databases but then after that you still have to keep all the dbs in replication etc. Tempting to just mirror the single controller than have to build a cluster to manage the cluster =/ [19:33:22] Bah I just remembered I need to fix my ldap server, hate corrupt slapd dbs :( [19:33:35] heh [19:33:46] another reason I hate openldap [19:33:54] its database gets corrupted far too easily [19:34:10] somehow opendj uses a bdb, but its database *never* gets corrupted [19:34:27] sun directory server also never corrupted itself [19:34:34] opendj is java isn't it? [19:34:59] And yeah, it's like the 3rd time though this last time I blame a certain person for powercycling the host in a not very nice way ... [19:35:32] yeah, java [19:35:50] and the bdb implementation is java too [19:35:58] Hmm probably easier to update than openldap then, I assume it's just a jar you run? [19:36:08] well.... [19:36:16] it's not that easy, unfortunately [19:36:23] Nothing ever is :P [19:36:29] I plan on upgrading a replica, then upgrading the other one [19:36:35] then I don't need to run the upgrade scripts [19:36:50] I'll just totally reinitialize the replica [19:37:03] Hmmm [19:37:10] then, yes, it's a matter of updating the jar ;) [19:37:12] Does it support master/master or is it just one way? [19:37:16] master/master [19:37:25] up to some large number of nodes [19:38:02] configuration of it is slightly more painful, though [19:38:17] it's reliable, which is why I prefer it [19:38:24] Openldap is painful enough lol [19:38:28] I also like that ACIs are stored directly in the DIT, which is replicated [19:38:44] Though saying that... RADIUS was more of a PITA than ldap ever was. [19:38:48] yeah [19:38:53] I don't plan on ever using radius :) [19:39:05] I need to setup kerberos at some point [19:39:12] Sadly switches and routers generally don't play nicely with ldap directly. [19:39:13] integration there is going to suck, so I don't really want to [19:39:19] yeah [19:39:47] Krb would be nice for labs. [19:39:57] yes [19:40:05] setting the password is the problematic part [19:40:21] and managing the keytabs and such [19:40:46] Mhm, but it would mean I could never enter my passwords on labs. Get the keytab locally then use that for all ssh/sudo etc. [19:41:30] Ryan_Lane, i will need an account too or the labs :P [19:41:56] you don't already have one? [19:42:04] nopes :( [19:42:27] * Damianz waits for the bot [19:43:06] !account-questions | akhanna [19:43:06] akhanna: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. 
Your SVN account name, or your preferred shell account name, if you do not have SVN access. 3. Your preferred email address. [19:43:12] !account-questions | koolhead12 [19:43:13] koolhead12: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your SVN account name, or your preferred shell account name, if you do not have SVN access. 3. Your preferred email address. [19:43:19] koolhead12: what are you going to work on in labs [19:43:20] ? [19:43:33] Ryan_Lane, openstack related stuff [19:44:08] oh? what did you want to do with openstack? [19:44:42] Ryan_Lane, depends on whats help is needed from my side :) [19:45:12] hm [19:45:19] do you know php? [19:45:27] I wish i didn't :P [19:45:32] Ryan_Lane, :( no [19:45:34] I need to switch us to using the nova api, rather than the ec2 api in openstackmanage extension for mediawiki [19:45:57] do you know python? :) [19:46:04] Ryan_Lane: We don't currently have a controller etc setup for development things do we? [19:46:10] Damianz: we do [19:46:13] Ryan_Lane, yes. but not l33t :) [19:46:20] Damianz: we've been using devstack, though [19:46:28] :o [19:46:37] well, all the openstack stuff is either going to be python or php ;) [19:46:42] * Damianz might hack some of Ryan_Lane's hacked up php [19:46:51] heh [19:46:52] Ryan_Lane, why not custom deployment script [19:46:55] my php isn't hacked up :) [19:47:02] All php is hacked up :P [19:47:03] it's fully OO and everything [19:47:08] true [19:47:22] koolhead12: custom deployment script? for what? [19:47:28] I'll just take your string and make it into an int which acts like a bool... [19:47:45] i think devstack script uses github repo for every pkg [19:49:49] devstack is basically what everyone uses for development [19:50:16] Devstack is just a bunch of scripts that will turn any old bit of kit into an openstack setup that works. [19:50:18] umm. ok [19:50:45] My mac needs more ram D: [19:50:54] i never used it, i preffered installing manually from the ubuntu repo [19:50:55] !initial-login | akhanna [19:50:55] akhanna: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in [19:50:56] No Nova credentials found for your account. [19:51:00] DIE IN A BALL OF FIRE [19:51:02] heh [19:51:10] Damianz: what did you do to trigger that? [19:51:19] Clicked on manage instances... [19:51:22] did you click any redlinks, or visit any special pages other than that one? [19:51:27] Nope [19:51:31] Home page -> manage instances [19:51:38] fuck. I really want to know why this is happening [19:52:26] It isn't happening to me [19:52:37] I haven't had it happen since the last fix I pushed [19:54:06] I might install devstack on my desktop and see if mediawiki will play nicely with my lcaptop. [19:55:54] * Damianz thinks Ryan_Lane should use git as he doesn't have a svn account :P [19:55:58] Also it's lunch time. [19:56:12] 02/22/2012 - 19:56:11 - Creating a home directory for akhanna at /export/home/bastion/akhanna [19:56:20] 02/22/2012 - 19:56:20 - Creating a home directory for akhanna at /export/home/outreach/akhanna [19:56:28] koolhead12: I created an account for you [19:56:32] koolhead12: it should have sent an email [19:56:51] thanks got the mail [19:57:10] ah. good. 
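For reference, "devstack is just a bunch of scripts" amounts to roughly this on a throwaway Ubuntu box (repo URL and the minimal localrc are as of the Essex era and may have changed since):

```bash
git clone https://github.com/openstack-dev/devstack.git
cd devstack

# Minimal localrc: set passwords, let everything else default.
cat > localrc <<'EOF'
ADMIN_PASSWORD=secret
MYSQL_PASSWORD=secret
RABBIT_PASSWORD=secret
SERVICE_PASSWORD=secret
SERVICE_TOKEN=some-random-token
EOF

./stack.sh   # installs and starts a single-node OpenStack dev environment
```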
glad to see the new account creation process is working :) [19:57:11] 02/22/2012 - 19:57:11 - Updating keys for akhanna [19:58:48] BTW google really wants me to remove/delete all my info from there server [19:59:10] after they allowed/bent against my govt to give account access of any user :( [19:59:32] eh? what do you mean? [19:59:40] which server? [19:59:55] yeah, your government's policies are problematic [20:00:32] we've had to think about things like panic credentials and such [20:00:49] so far we just haven't given access out to production from anyone there [20:01:29] well got to know about google, allowing indian govt to access any gmaila ccount [20:01:38] or wtever [20:51:30] !accountreq [20:51:30] in case you want to have an account on labs, please contact someone who is in charge of doing that: Ryan.Lane, m.utante or ssmolle.tt [20:52:24] sumanah: ping [20:52:29] hi ashish_d [20:52:51] ashish_d: what is your preferred wiki username ? this will be used on https://labsconsole.wikimedia.org [20:52:53] sumanah: Hey [20:53:05] dash1291 [20:53:17] ok, and pm me your preferred email address ? [20:53:47] got it. [20:53:49] ok, just a moment. [20:53:56] cool [20:54:33] sumanah: it's !account-questions [20:54:39] Lol [20:54:40] !account-questions | ashish_d [20:54:40] ashish_d: I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your SVN account name, or your preferred shell account name, if you do not have SVN access. 3. Your preferred email address. [20:54:46] Shh Ryan_Lane|away you're away [20:54:46] @search account [20:54:46] :P [20:54:46] Results (found 4): credentials, account-questions, account, accountreq, [20:54:52] I'm back! :) [20:54:57] I'm front! [20:55:11] 02/22/2012 - 20:55:10 - Creating a home directory for ashishd at /export/home/bastion/ashishd [20:55:24] ashish_d: https://wikitech.wikimedia.org/view/Gerrit [20:55:39] ashish_d: specifically, look for the phrase "Then the user must do the following:" [20:56:02] and do those things :-) [20:56:10] 02/22/2012 - 20:56:10 - Updating keys for ashishd [20:56:29] I thought they didn't have to now? As in the signup thing did the email reset etc already [20:56:31] Ryan_Lane: I have now successfully created a Labs account for someone who already had SVN access. [20:57:04] Ryan_Lane: is the wiki name (also git username) changeable? [20:58:20] not easily [20:58:27] especially once they log into gerrit [20:58:43] if they do it before then, we can rename it in ldap (modify-ldap-user), then modify it in the wiki [20:59:01] if they log into gerrit, they then need to have painful mysql queries run to rename them [20:59:10] Ryan_Lane: also, what's the process for giving Labs accounts to people who don't have SVN access? [20:59:11] 02/22/2012 - 20:59:11 - Updating keys for ashishd [20:59:21] same as creating svn accounts, then linking them [20:59:39] it's also possible to create accounts via the wiki, if they don't already have an svn account [20:59:41] ok, so, make an LDAP user on formey, simply don't add them to any SVN groups [20:59:45] yep [20:59:46] Ryan_Lane: oh? how? 
[20:59:51] use a fake key too [21:00:05] I need to add a create user permission to the wiki for a group, then add you guys [21:00:08] so that you don't need wiki admin [21:00:15] lemme see how to do that, then I'll show you how [21:01:22] "it's also possible to create accounts via the wiki" - show me on Labsconsole where I link a wiki username to an LDAP account? [21:01:34] can't link accounts there [21:01:39] sumanah: Done with it [21:01:40] it's only for brand new accounts [21:01:55] createaccount seems to be the right permission [21:03:17] ashish_d: great. enjoy Labs! [21:03:29] Thank you! [21:03:32] sumanah: who else should I add to the accountcreators group? [21:03:34] ashish_d: here's hoping you can show off your collaborative editor stuff there as it grows [21:03:35] I added you [21:03:48] Ryan_Lane: Can you tell me more about directory structure for volumes? [21:04:10] andrewbogott: I don't have one planned just yet, wanted to talk with you about it first :) [21:04:13] I think that what I'm working on now is pretty much agnostic about directory structure, but that makes me think that I'm not doing what you think/want [21:04:15] so that our implementations will match [21:04:21] oh [21:04:22] hm [21:04:30] sumanah: Looking forward to it :) [21:04:32] I guess it makes sense that it wouldn't [21:04:36] Do you mean structure within the created volumes, or where the virtual volumes are w/respect to the host structure? [21:04:54] I mean where the volume's directories exist [21:05:15] so, when a user creates a volume, where does it live on the filesystem? [21:05:26] is it volume/project, or is it project/volume [21:05:29] Ryan_Lane: ok, updated https://wikitech.wikimedia.org/view/Gerrit#Giving_users_Labs_access.2C_if_they_don.27t_already_have_SVN_access . Add RobLa, Mark Hershberger, Sam Reed, Gabriel, and Chad? [21:05:31] I'm assuming the latter is easier [21:05:49] sumanah: https://labsconsole.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup [21:06:00] that's the interface for creating an account for someone [21:06:15] Ryan_Lane: ok. those buttons at the end, which should one choose? [21:06:28] by email [21:06:31] it'll send them a password [21:06:34] Oh, ok. At the moment I have things set up so that that's a global nova flag. But maybe we need to specify it per volume. It might be that the code I have causes collisions; not sure if I've tried creating >1 volume at a time :) [21:06:38] which also makes !initial-login less necessary [21:07:05] andrewbogott: yeah. we may have conflicts if we let users specify any location they want [21:07:18] I'd imagine the global would specify the top level they'll exist from [21:07:23] so, /export, for instance [21:07:29] then /export/bastion [21:07:35] /export/bastion/home (for home volume) [21:07:46] /export/bastion/data (for default data volume) [21:08:05] then volumes can't conflict between projects, even if they have the same name [21:08:19] So... looking at 'gluster volume create' [21:08:29] It just takes the location of the bricks. [21:08:35] And the name of the volume to create... [21:09:00] volume create [21:09:00] Ryan_Lane: does an instance have an internet-accessible IP addr or URL? 
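A sketch of the per-project volume layout being discussed, using the export paths from the conversation and hypothetical brick hostnames:

```bash
# Replicated "home" volume for the bastion project across two bricks.
gluster volume create bastion-home replica 2 transport tcp \
    labstore1:/export/bastion/home labstore2:/export/bastion/home
gluster volume start bastion-home
```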
[21:09:05] I presume that where that volume is mounted in the instance is handled by nova code elsewhere [21:09:08] chrismcmahon: only if we give it one [21:09:30] chrismcmahon: but, unless the instance needs to be publically accessible, you should use a socks-proxy [21:09:39] !socks-proxy | chrismcmahon [21:09:39] chrismcmahon: ssh @bastion.wmflabs.org -D ; # [21:09:55] andrewbogott: I'm not sure if we can make the instance mount the volume [21:10:04] Right, and the definitions include a path component, but I'm presuming that the set of bricks is the same for the entire gluster install. Is that wrong? [21:10:11] I think the nova-volume code assumes libvirt is going to mount it [21:10:26] no, you can define anything you want there [21:10:32] ok. [21:10:37] so, brick:/export/bastion/home [21:11:14] // [21:11:48] So, sorry, not sure by what your 'no' means. You're saying that /each/ volume needs a different set of brick definitions? [21:11:58] I believe so [21:12:10] you can choose any combination of bricks [21:12:23] I think we should be always using all bricks, though [21:12:34] unless you can think of some sane way of handling it :) [21:12:44] I'd imagine we'd have the brick list as a global as well [21:12:53] Right. And if you create two different volumes with the same list of bricks "brick1:/glusterstuff brick2/glusterstuff" they'll coexist or conflict? [21:13:12] 02/22/2012 - 21:13:12 - Updating keys for akhanna [21:13:16] coexist, as long as the volume name is different, I think [21:13:19] I haven't tried that [21:13:21] 02/22/2012 - 21:13:21 - Updating keys for akhanna [21:13:35] but if we use //, we can ensure they never conflict [21:13:54] the only hard thing, is that it's then impossible to share storage cross-project [21:14:27] we don't really *need* that right now, though [21:14:40] and we can always manually create volumes for that, if we need it [21:14:41] Sorry, it feels like you're contradicting yourself, which I think means we're using conflicting terminology. [21:14:47] oh. sorry [21:14:51] Change on 12mediawiki a page Wikimedia Labs was modified, changed by Sumanah link https://www.mediawiki.org/w/index.php?diff=502751 edit summary: Got the power to make an account [21:15:00] lemme pastebin what I'm thinking [21:15:13] Right now, my code lets you define a flag, e.g. "bricklist=[host1:/place1 host2:/place2 host3:/place3] [21:15:28] And then, it calls gluster create volume $bricklist [21:15:39] So, bricklist is global, nova-wide. [21:16:18] lemme see what that does quick [21:16:28] http://pastebin.com/afMcY0fC [21:18:36] it would then also need to share the volume with all instances of the project, which may actually be kind of difficult [21:18:53] since it would need to re-do that when instances were created or deleted [21:19:19] I can't see a way around making the brick-list global [21:19:32] unless we're going to let users define the bricks, but I think that's a bad idea [21:20:11] I was confused because I've been using 'brick list' to mean hostnames and paths, and you were using it mean just hosts. So now I think I understand what's happening. [21:20:25] the brick list is the gluster peers [21:20:26] I think having it be global is good, there's no reason for users to know about the set of file servers. [21:20:30] yeah [21:20:39] In fact, I sort of wish that gluster abstracted all that away into a config file someplace. 
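The socks-proxy recipe the bot abbreviates above, spelled out with a hypothetical username and port:

```bash
# Open a dynamic (SOCKS) forward through bastion; instances on the private
# 10.4.0.x network are then reachable from a locally configured browser.
ssh cmcmahon@bastion.wmflabs.org -D 8080
# ...then point the browser at SOCKS5 proxy localhost:8080 and browse to the
# instance's internal hostname or 10.4.0.x address.
```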
[21:20:53] yeah, they want you to be able to mix and match volume styles, though [21:21:04] I think we should globally allow either striped or replicated [21:21:28] and I think we should default to replicated (because only crazy people, or people that only want scratch data use striped) [21:21:42] Yep, I have flags for mode and #. [21:21:45] ah. cool [21:22:02] transport could be interesting, for those using infiniband [21:22:36] I wonder if the driver knows the project name though... [21:22:57] oh. hm. I'd hope it does [21:23:02] Transport is just tcp or rdma. I don't think I understand when/if someone would use rdma [21:23:20] people would use rdma if they were using some very fast networking mode that supported it, like infiniband [21:25:36] doesn't hurt to have the option, defaulting to tcp [21:25:47] * andrewbogott nods [21:25:59] hm. it'll be a pain if it doesn't know about the project [21:26:37] Well, it may be that nova does something sensible about creating unique volume ids automatically. I will test that shortly. [21:26:43] * Ryan_Lane nods [21:27:15] Ideally, the location of the volumes is hidden from everyone but nova, right? [21:27:17] o.0 [21:27:27] I think we'll need people to know the volume names [21:27:31] or they won't be able to mount them [21:27:42] because I don't think nova will be able to automatically mount then [21:27:43] *them [21:27:57] hm, ok. [21:28:08] we're going a little outside the volume driver's normal mode of operation [21:28:51] usually, it creates a volume, then an instance mounts it, which causes nova to tell libvirt to mount the drive and make it available as /dev/vdb, or something like that [21:29:03] but libvirt can't mount this [21:29:21] of course, this is all manageable by the driver [21:30:01] I was going to use automount with wildcards [21:30:33] so if a user goes to /usr/data/ it'll try brick:/exports// [21:30:49] so that the instance doesn't actually need to know if something exists [21:31:02] as long as the gluster server is exporting to the node, it'll exist [21:32:14] I must not know enough about mounting. I sort of thought that once volumes are created they are more-or-less similar from the point of view of mounting. [21:32:43] "I must not know enough about mounting." [21:32:45] well, a lot of the volume drivers inject a device into the instance, using libvirt [21:33:04] so it just makes a new hard drive available [21:33:17] we're making a network accessible drive [21:33:29] *making a network drive accessible [21:33:47] so, funny thing, is either way, the instance still needs to handle the mounting [21:34:00] Ah, ok, I follow. [21:34:23] this is slightly more convenient for instance users, but probably harder for us [21:34:30] since we need to keep the share list up to date [21:35:09] I wonder if that's even possible..... [21:38:07] So I take it 'attaching' is different from mounting? [21:38:16] yeah, this is when creating the volume [21:39:09] gluster volume set auth.allow [21:39:18] otherwise the default is to share to the world [21:39:22] which would be bad :) [21:39:56] this is where having the project name in the dns name would be helpful [21:40:23] when an instance is created, it would need to be added to the allow list [21:40:32] when one is deleted, it would need to be removed from the allow list [21:40:55] Hmm why does passwd/sudo rely on openldap :( [21:41:07] Damianz: what do you mean? [21:41:32] To install passwd/sudo you have to install ldap lol [21:41:42] on centos? 
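A rough sketch of the two mechanisms mentioned above: limiting a volume to a project's instances with auth.allow, and the wildcard automount idea. Volume name, mount point, map file and addresses are all assumptions, and autofs-over-glusterfs is untested here:

```bash
# Restrict who may mount the volume (the default is to share to the world).
gluster volume set bastion-home auth.allow '10.4.0.11,10.4.0.12'

# Wildcard autofs map: accessing /usr/data/<name> attempts to mount the
# matching gluster volume on demand, so instances need no static fstab entry.
echo '/usr/data /etc/auto.gluster' >> /etc/auto.master
echo '* -fstype=glusterfs labstore1:/bastion-&' > /etc/auto.gluster
service autofs reload
```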
[21:41:48] that's my fault, actually [21:41:52] Yeah [21:42:09] I'm the one who convinced red hat to use sudo-ldap by default [21:42:21] you're welcome ;) [21:42:28] * Damianz shoots Ryan_Lane [21:42:39] they didn't want two package names [21:42:41] * Ryan_Lane shrugs [21:42:45] heh [21:42:47] lol [21:42:52] and I needed sudo-ldap [21:42:55] so, there you go [21:43:06] I can't say I ever manage sudo via ldap, feels a little dodgy [21:43:10] I do [21:43:12] it works well [21:44:27] though we're actually using puppet for nearly everything right now [21:44:32] sudo for ops is managed by ldap [21:46:53] Bah. I don't suppose it's possible to add RAM to an existing instance? [21:47:17] nope :( [21:47:20] My poor test machine creaks to a halt anytime it tries to run an instance (which is not surprising since it's running a VM that is as big as... itself.) [21:47:24] maybe with openstack api [21:47:50] I wish disk resizing on kvm boxes worked [21:48:11] it should in essex, right? [21:48:25] they were aiming at feature parity, I believe [21:49:07] I think so [21:49:45] andrewbogott: ah, yeah. that makes sense [21:49:46] At least it's not as bad as xen which will randomly not let you mount blocks to running vms. [21:49:49] heh [21:50:12] another reason I don't want to mount blocks, and instead mount a network filesystem [21:50:37] I really hope it's possible to share to instances without needing to tell the instance to mount it [21:50:49] because that'll be a bitch [21:51:24] I can do this by scripting the access, but that's painful [21:51:39] though that's what I plan on doing right now [21:52:09] andrewbogott: have you looked at quota support yet? [21:52:19] might be good to have a default quota setable [21:53:01] I'm taking the 'size' val that's passed in by nova and using that as a gluster size quota. [21:53:08] ah. cool [21:53:20] would that allow users to set their own size? [21:53:44] I totally forget ec2/nova let you choose any size you want [21:53:49] it also lets you resize too [21:54:09] I'm not sure about resizing, but otherwise it should just get relayed from nova. [21:54:15] * Ryan_Lane nods [21:54:26] One thing that troubles me a bit is there's a lot of support for snapshots in the driver spec, but gluster does not support snapshots. [21:54:44] I think that doesn't matter, but I'm worried that I'm going to discover some essential nova feature that relies on snapshot [21:55:58] ah [21:55:59] hm [21:56:27] In horizon at least it looks like snapshots are just used for... snapshots. [21:56:58] yeah [21:57:02] we may be able to ignore them [21:57:33] * andrewbogott sets up yet another test machine, this time with ALL the ram [21:57:43] :D [21:57:50] * Damianz watches andrewbogott explode [21:57:57] some large instances have been failing to build :( [21:58:01] I hope it works [21:58:39] Yeah, I presume if I create an xlarge instance that'll spoil the party for everyone [21:59:53] heh [21:59:58] maybe [22:00:08] when we have the new hardware it'll be fine [22:03:46] I think this is likely another situation where nova hook points would be useful :) [22:06:24] Ryan_Lane: I think that a lot of what we want to do with hooks can be done with a notification handler listening to rabbit. [22:06:37] Certainly creation/destruction of instances should show up there. [22:07:34] oh. cool [22:07:39] that'll work then [22:07:55] maybe that'll all the hook kind of functionality we'll need [22:08:09] I've gotten as far as receiving notifications but not as far as parsing them. 
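A sketch of the per-volume size quota being described (the 'size' value relayed from nova), using the GlusterFS 3.x quota commands; the volume name and limit are illustrative:

```bash
# Enable quota on the project volume, then cap the whole volume at the size
# requested through the nova volume API.
gluster volume quota bastion-home enable
gluster volume quota bastion-home limit-usage / 50GB

# Check usage against the limit.
gluster volume quota bastion-home list
```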
[22:08:09] seems like a hacky way of extending :) [22:08:15] In theory they have a standard format. [22:10:22] * Ryan_Lane nods [22:16:40] I really need to add more ganglia stuff for the virtual hosts [22:16:50] number of instances running on each would be good to know [22:17:44] Over 9000. [22:18:01] heh [22:18:17] virt2 has 38 instances [22:18:24] virt1 only has 12 [22:18:30] I *really* need to change schedulers [22:19:14] It would be nice if live migration worked reliable and you could gangliafy re-balencing the load. [22:19:27] it works reliably if you do one instance at a time [22:20:18] so I need to write a script that'll balance [22:22:58] I really wish it didn't need to pause instances to move them [22:23:20] o.0 [22:23:44] it's "live" migration. heh [22:25:03] Doesn't libvirt support 'live' migration in the sense that it drops about 2 packets when it does the final memory sync. [22:25:16] you'd think so, but it isn't how it works with nova [22:25:29] it pauses the instance, transfers it, and unpauses it [22:26:07] That sucks [22:26:13] and it's taking fucking forever for the instance I just ran [22:26:32] ah. seems it failed [22:26:34] of course [22:27:31] Does opendj really hav to log every search LOL [22:27:36] no [22:27:45] but I log it so that I can debug [22:28:54] It seems nicer than openldap so far anyway.... the backup stuff is cleaner lol. [22:29:06] yeah, the backup is easy [22:29:09] I have it scripted [22:29:37] 0 * * * * /usr/local/opendj/bin/backup -a -d /root/opendj-backups/$(date +%d-%m-%y-%H:%M)/ [22:30:24] shit [22:30:33] I transferred my nova instance [22:30:41] and I have uncommitted changes. a ton of them [22:30:43] and now it failed [22:30:48] this is pissing me off [22:31:02] I assume failed = a right pain in the ass to get working again? [22:31:13] well, kind of [22:35:58] wow. it's done [22:36:02] that took fucking forever [22:36:23] though it still says it's running on virt4 [22:36:27] lol [22:36:28] which is wrong [22:37:19] * Ryan_Lane groans [22:37:29] fucking broken live migration [22:38:49] it pisses me off so much [22:43:00] well, the next one worked fine [22:48:26] we have an instance for labs tools? :P [22:48:34] I've seen in a log... [22:49:14] I meant bots [22:49:32] perhaps I should move wm-bot there [22:50:05] Damianz: now you can download all logs from irc in one file, you don't need to grep my $HOME [22:50:06] :P [22:54:17] Ryan_Lane [22:54:22] ? [22:54:31] Any guess why mysql and apache are taking, like, 15 minutes to restart on my new instance? [22:54:38] hm [22:54:39] Seems like they're timing out. [22:54:43] I'm doing migrations.... [22:54:50] maybe the host is getting overloaded [22:54:53] Ah, so maybe just cpu-bound? [22:54:55] that's weird, though [22:54:58] I'll try to be patient :) [22:56:02] it definitely shouldn't be taking that long [22:56:07] this is a devstack instance? [22:56:10] what's the instance name? [22:56:18] driver-dev-large [22:56:38] Um... be warned that mysql and apache are not puppetized, they're getting installed by devstack. [22:56:51] yeah. that's no big deal [22:57:48] it's on virt4, [22:57:50] hm [22:57:55] lemme look at resource allocation there [22:57:59] I may migrate it to virt1 [22:58:15] 4GB of swap on that system [22:58:22] you're probably knee deep in swap. heh [22:58:30] it's going to take quite a bit to migrate, I apologize for that [22:58:36] Mmm swappy nodes is what you want [23:00:56] actually, that was pretty quick [23:01:04] oh. wait. 
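A one-liner sketch of the "number of instances running on each virt host" ganglia metric mentioned above, using gmetric and libvirt; the metric name and cron wiring are assumptions:

```bash
# Run from cron on each compute node: count running domains and push to ganglia.
count=$(virsh list --all | grep -c running)
gmetric -n instances_running -v "$count" -t uint16 -u instances
```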
no [23:01:06] wrong instance [23:01:30] it's done too [23:02:00] what's up with this instance? [23:02:09] I can't even run top [23:02:29] lol [23:02:46] vmstat 2 is taking way longer than 2 seconds [23:03:11] there's something seriously fucked up [23:03:40] hm. ldap issues [23:04:09] I can delete that instance if you need the resources for wiggle room [23:04:13] nah. it's not that [23:04:51] ok [23:05:13] stupid pdns ldap backend causes issues on the pdns servers [23:08:56] your instance has an ldap issue, though [23:09:50] gonna force run puppet [23:09:59] something isn't configured right, maybe [23:10:28] o [23:10:29] ok [23:12:39] ldap is working now [23:12:53] nslcd was broken [23:13:04] I tried to restart it and it failed [23:13:08] then I started it and it worked [23:13:13] I cleared the nscd cache [23:13:27] it's bad news if this is failing on instance build [23:13:34] this is the first time I've seen it fail, though [23:14:09] top still doesn't want to run [23:14:25] I'm going to reboot the instance [23:14:32] I think a lot of crap is hung [23:14:40] wow, reboot doesn't want to work either [23:14:45] well, via nova :) [23:16:39] let me guess, it isn't going to reboot -_- [23:17:09] oh, wait. is this oneiric? [23:17:19] I wonder if I added the wrong image [23:17:54] trying to reboot it again [23:17:56] this is annoying [23:18:17] yeah, oneiric [23:18:28] it should still be able to reboot :( [23:18:32] you've rebooted other ones, right? [23:18:42] Yeah, I reboot frequently. [23:18:44] it isn't coming back up [23:19:24] * Ryan_Lane sighs [23:19:26] Shall I delete it and make a new one? There's only 5 minute's worth of config invested in it. [23:19:29] yeah [23:19:40] it may be due to migration, or something [23:19:50] You should make Ryan_Lane sweat for 6hours to get your 5min of work back ;) [23:20:27] i-0000012d also doesn't reboot [23:20:28] Hey, I offered to delete it 20 minutes ago :) [23:20:40] heh [23:21:05] I probably should get around to taking a mysqldump and rsync of my bots right around now... [23:21:06] schmir: which project is that in? [23:21:13] @search resource [23:21:13] No results found! :| [23:21:21] @search instance [23:21:21] Results (found 4): instancelist, instance-json, access, instance, [23:21:22] Ryan_Lane: pediapress [23:22:01] that instance looks up to me [23:22:18] or do you mean the reboot command doesn't do anything? [23:23:22] "ssh: connect to host 10.4.0.90 port 22: No route to host" for at least 8 minutes after rebooting... [23:23:30] I just ssh'd into it [23:23:35] did you delete it and recreate it? [23:23:42] no [23:23:53] hmm [23:23:56] instance name of bob? [23:25:08] schmir: I just ssh'd into it [23:25:10] bob according to the labsconsole interface [23:25:20] Ryan_Lane: yes, it seems to be working again [23:25:27] ok [23:25:47] maybe the instances take a really long time to reboot, which would be weird [23:25:55] I'm seeing the same with one I just tested, though [23:26:48] and now it's u[p [23:26:54] I wonder why the reboots are taking a while [23:27:02] maybe it's taking a while to reach the dhcp server? [23:27:18] oh. I know [23:27:20] Console say out? 
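The nslcd/nscd recovery just described, as commands (Ubuntu service names assumed):

```bash
# nslcd (the LDAP NSS daemon) had died; restart refused, a plain start worked.
service nslcd restart || service nslcd start

# Flush nscd's now-stale cached passwd/group lookups.
nscd -i passwd
nscd -i group
```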
[23:27:40] stupid live-migrate is causing issues as usual [23:27:59] seems the nova-compute services are having issues due to the migrations [23:28:06] fail [23:28:21] so, when a reboot happens, it doesn't quickly fix the iptables rules [23:28:33] then, when the service finally comes back up, it handles it [23:29:18] there's also the possibility that a puppet run is being forced before the instance starts ssh [23:30:17] no. user scripts only run once. [23:31:52] well, either way, the driver-dev-large instance wasn't booting at all [23:32:00] the other ones boot, just take a while [23:32:06] different issues [23:33:58] Ryan_Lane: So... I think that labsconsole is offline. You know this? [23:34:02] is it? [23:34:07] http://www.downforeveryoneorjustme.com/labsconsole.wikimedia.org [23:34:08] 404 [23:34:08] o.O [23:34:23] puppet is running [23:34:26] I wonder if it broke something [23:34:35] It's up here but /wiki is a 404 [23:35:24] wtf [23:36:35] * Damianz finds Ryan_Lane some coffee [23:36:50] seems php is broken somehow? [23:36:59] -_- [23:37:08] ldap is broken, somehow [23:37:12] I concur, php is broken. [23:37:37] puppet is probably reinstalling the ldap libraries [23:37:50] and puppet is running slow as shit [23:38:12] :( [23:38:51] well, this is seriously annoying [23:39:03] I thought you did the whole ldap thing yesterday? [23:39:07] only formey [23:39:19] and it seems puppet was broken on virt0 [23:39:19] oh [23:39:45] hm, this has no ldap libraries installed at all? [23:39:54] hahahaa [23:39:59] now I see the problem [23:40:10] we're ensuring libnss-ldap isn't installed [23:40:18] That sounds fail [23:40:22] it probably runs before libnss-ldapd installs [23:40:26] there should be a require there [23:40:45] it shouldn't cause these kinds of problems, though [23:40:55] hm. no nscd either [23:41:35] wait, it doesn't need those [23:41:42] something else must be wrong [23:52:43] Bah [23:52:58] screw you freenode [23:55:38] * Damianz waits for the server to catch up and respond [23:56:37] stupid apache needed to be fully stopped and then started [23:56:41] I wonder what caused that [23:58:05] So... would you expect my instance to work now? [23:58:15] I really hope so :) [23:58:20] did you delete/recreate? [23:58:55] Yeah, made a new one named driver-dev-big, I can't ssh to it. [23:59:00] oh? [23:59:03] The log looks ok to me... [23:59:28] hm. I can't either [23:59:59] it never ran puppet
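A hedged guess at the packaging fix implied by "there should be a require there": make the removal of libnss-ldap wait for libnss-ldapd, so an instance is never left with no LDAP NSS library at all. Resource names mirror the conversation, not the actual Wikimedia manifests; the puppet fragment is shown inside a heredoc purely for illustration:

```bash
cat > nss-ldap-ordering.pp <<'EOF'
package { 'libnss-ldapd':
    ensure => latest,
}
package { 'libnss-ldap':
    ensure  => absent,
    require => Package['libnss-ldapd'],   # don't strip the old library first
}
EOF
```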