[00:17:23] Ryan_Lane: I haven't been following in detail, but you mentioned a jruby project earlier… Subbu (who works on the visual editor parser) is deeply involved in the jruby project. So he might be a useful resource if you/I/someone ends up having to do development there.
[00:17:44] On the other hand, if you just want to shout about how much you hate jruby, subb is safely out of earshot (he's here in MN.)
[00:17:53] s/subb/Subbu/
[00:22:12] heh
[00:22:21] andrewbogott: well, we are looking at logstash
[00:22:25] which happens to be jruby
[00:22:31] it's easy to install, which is nice
[00:23:54] I watched my brother install a ruby framework last week, and it went pretty well… I couldn't at all tell what was happening though.
[00:24:22] * andrewbogott -> dinner
[00:52:25] ehh, I'm having trouble connecting to the instance via the SOCKS proxy. I'm doing it as written here: labsconsole.wikimedia.org/wiki/Help:Access#Accessing_services_using_port_forwarding, using the FoxyProxy addon for Firefox. If I go to whatismyipaddres.com it shows the Wikimedia Foundation's IP address, 208.80.153.194. In a terminal, after I "ssh mshavlovsky@bastion.wmflabs.org -D 8080", I get messages like: "channel 4: open
[00:52:26] failed: connect failed: Connection timed out". On the instance itself I can restart apache, so apache is running and at least the default index.html in /var/www should work. How can I troubleshoot this?
[00:54:56] MichaelShavlovsk: what's the URL you're trying to visit?
[00:55:08] http://i-0000039e.pmtpa.wmflabs/
[00:55:27] hrmmm
[00:55:36] maybe you need to enable remote DNS
[00:55:41] i did
[00:55:55] maybe i need to read the instructions
[00:56:10] i see it there
[00:57:09] even this way, "ssh @bastion.wmflabs.org -L 8080::80" and connecting to localhost:8080, it does not work. I just don't know how to troubleshoot it.
[00:57:26] well what does work?
[00:57:34] i can't get to it with curl from the inside
[00:57:43] doesn't even accept the connection
[00:58:11] Oh
[00:58:14] one sec, I will get the error again
[00:58:20] Did you set up the security group such that port 80 is accessible?
[00:58:25] no
[00:58:28] okay
[00:58:29] Right, that'll be it
[00:58:38] thanks, I will try it
[04:53:57] quiet here today ;)
[05:08:38] * jeremyb wants wheezy on labs ;)
[10:22:28] has the certificate warning issue with puppet not been fixed?
[10:42:43] laggy labs: gi sohw EHAD
[11:08:41] Ryan_Lane: There? Can you fix the gluster mounting issue for the dumps project (all three instances)? I keep getting "Transport endpoint is not connected" issues with it, despite rebooting the instances. Thanks. :)
[15:00:06] Hey folks. I'm struggling to connect to a new instance. Should I be able to SSH directly to bastion.wmflabs.org using my key?
[15:01:08] yes
[15:01:39] Did you add your key on the wiki/see wm-bot yell about your account in here at some point?
[15:04:12] Ori helped me set things up and suggested I should be able to SSH right in. I did add my key to the wiki, yes.
[15:04:26] I did not interact with wm-bot.
[15:04:51] Hmm, are you a member of the bastion project?
[15:05:01] I'm not sure.
[15:05:06] Looks like you're not
[15:05:11] What's your username on labsconsole?
[15:05:16] Halfak
[15:05:29] OK there you go, I added you to bastion
[15:05:38] In a few minutes a bot will report that it's created a home directory for you
[15:05:42] Once that's happened, you should be able to ssh in
[15:05:48] Thank you!
[15:08:58] Roan++
[15:12:06] RoanKattouw: I just logged into bastion. Thanks again for your help!
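For reference, the port-forwarding troubleshooting earlier in the log can be sanity-checked with a small script once the security group allows port 80. This is a hedged sketch, not part of any labs tooling; it reuses the instance and user names that appear in the log and assumes key-based SSH access to the bastion.

```python
# Minimal sketch: open the same local forward as "ssh ... -L 8080:<instance>:80"
# from the log, then fetch the instance's default Apache page through it.
# Assumes the project's security group now allows inbound port 80.
import subprocess
import time
import urllib.request

INSTANCE = "i-0000039e.pmtpa.wmflabs"       # instance named in the log
BASTION = "mshavlovsky@bastion.wmflabs.org"

tunnel = subprocess.Popen(["ssh", "-N", "-L", f"8080:{INSTANCE}:80", BASTION])
try:
    time.sleep(5)  # crude wait for the tunnel to come up
    # A timeout or "connection refused" here usually means port 80 is still
    # blocked by the security group, or Apache is not listening on the instance.
    with urllib.request.urlopen("http://localhost:8080/", timeout=10) as resp:
        print(resp.status, resp.read(100))
finally:
    tunnel.terminate()
```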
[15:12:24] Hmm
[15:12:25] !pin
[15:12:25] pong
[15:12:27] !ping
[15:12:27] pong
[15:12:32] Eh
[15:12:35] w/e
[15:12:46] maybe ryan switched the home dirs to use pam magic already
[15:12:54] !p
[15:12:54] There are multiple keys, refine your input: pageant, password, pastebin, pathconflict, petan, ping, pl, pong, port-forwarding, project-access, project-discuss, projects, puppet, puppetmaster::self, puppet-variables, putty, pxe,
[15:12:56] !pi
[15:12:57] pong
[15:13:03] Oh so wm-bot does partial matching
[15:13:05] That's weird
[17:16:10] andrewbogott: i got a broken instance, "waiting for metadata service at http://169.254.169.254/2009-04-04/meta-data/instance-id" --> "No route to host" --> "url error [timed out]"
[17:16:45] mutante: While creating a new instance? Where is that message showing up?
[17:17:13] andrewbogott: no, an existing one. i see that in the console output
[17:18:41] andrewbogott: getting "no route to host" when trying to ssh into it, so i rebooted and checked the console output, and then i saw that. just wanted to report it to see if that IP it is trying is still correct
[17:18:53] mutante: Hm, OK. I don't quite know where to start with that but I will have a look soon. Ryan_Lane might have a more immediate answer.
[17:19:05] I think that IP is defined by OpenStack, not an external machine. So it should always be right.
[17:19:14] But maybe the nova metadata service has crashed :(
[17:19:23] mutante: what's the instance number
[17:19:25] ?
[17:19:39] i-00000042
[17:19:41] the IP comes from the network service
[17:20:42] mutante: was this after an instance reboot?
[17:20:58] Ryan_Lane: before and after, actually
[17:21:15] ok
[17:22:16] Ryan_Lane: it's not necessarily a new thing that started today. i did not log in there in a while. might have happened during the migration, i guess
[17:22:42] but yea, i saw the same error in the console output before rebooting it, and it stayed after the reboot
[17:25:02] let me try migrating it to another node
[17:25:27] thanks
[17:25:35] sometimes their network definitions get screwed up on migration
[17:25:46] <^demon> Ryan_Lane: You can't work from home today, I've scheduled 4 meetings for us ;-)
[17:26:02] <^demon> Out of...SPITE
[17:26:03] is that serious, or a joke? :)
[17:26:16] That would be a totally asshat thing to do
[17:26:58] <^demon> Ryan_Lane: I'm joking. No meetings :)
[17:27:03] heh
[17:27:04] ok
[17:27:11] <^demon> Damianz: Yes, and? You're talking to /me/, you know.
[17:27:21] video conferencing?
[17:27:28] There is that
[17:27:29] :D
[17:27:44] mutante: that didn't help
[17:29:16] hm
[17:29:21] it says it's on virt3.. hmm
[17:29:22] it seems to have the same ip as another instance
[17:29:26] ignore the wiki
[17:29:28] it's wrong
[17:29:31] ok
[17:31:02] Ryan_Lane: i can just fire up a new instance and call it a test to see if it is all puppetized
[17:31:16] no. I think I found the problem
[17:31:21] ah :)
[17:31:51] or not
[17:32:06] oh
[17:32:41] one more reboot should hopefully fix it
[17:33:24] ok, trying
[17:35:22] not yet
[17:35:47] init: plymouth main process (47) killed by SEGV signal ?
[17:37:03] https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/571258
[17:38:56] that's not really an issue
[17:39:07] the issue is that libvirt has the IP defined incorrectly
[17:39:41] and I don't know how
[17:41:09] hmm, that is in XML, right?
[17:44:01] yeah. I'm trying to track down why it's redefining it incorrectly
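The fix Ryan describes in the next few messages amounts to correcting the domain XML on disk and re-registering it with libvirt, so the stale in-memory definition stops winning. A rough sketch of those steps, assuming the corrected XML has already been edited by hand; the XML path follows nova's usual per-instance layout and is an assumption, not quoted from the log:

```python
# Hedged sketch of the libvirt recovery steps described just below.
import subprocess

DOMAIN = "i-00000042"                                       # instance from the log
XML_PATH = f"/var/lib/nova/instances/{DOMAIN}/libvirt.xml"  # assumed location

# Drop libvirt's (stale) in-memory definition of the domain...
subprocess.run(["virsh", "undefine", DOMAIN], check=True)
# ...then re-register it from the corrected XML on disk, so the fixed IP
# definition is the one libvirt (and nova) use from now on.
subprocess.run(["virsh", "define", XML_PATH], check=True)
```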
[17:44:33] fixed
[17:44:48] I needed to edit the xml file located with the instance
[17:44:55] then undefine the xml file in virsh
[17:45:05] then redefine it using the xml file on disk
[17:45:19] otherwise, nova always took the one that was in memory
[17:45:38] funny enough, I edited the one in memory, so it should have worked without all of those steps
[17:45:43] * Ryan_Lane shrugs
[17:45:47] libvirt is weird
[17:51:59] Ryan_Lane: thanks!
[17:52:05] yw
[18:31:45] andrewbogott: can you help me debug something?
[18:31:57] Sure -- what's up?
[18:31:58] it's a python with paramiko issue
[18:32:23] labstore2, as the glustermanager user, runs a script called manage-volumes
[18:32:52] since the gluster upgrade, when it runs "sudo gluster " the call never returns
[18:33:04] it's running that over ssh using paramiko
[18:33:45] so, project storage's gluster shares aren't being updated
[18:33:51] Have you tried ssh'ing by hand using the same account, in order to make sure known_hosts is updated?
[18:33:55] yep
[18:34:03] and it returns nearly instantly
[18:34:31] maybe an EOF isn't being sent the same way, or the call isn't returning in a way needd
[18:34:33] *needed
[18:34:49] it may be possible to remove all gluster commands from going over ssh
[18:35:00] since it's running on a gluster node
[18:35:32] * andrewbogott visits labstore2
[18:35:54] I recommend making manage-volumes2 or something
[18:36:01] and modifying it till it works :)
[18:36:03] sure
[18:36:05] thanks
[18:36:42] I'm really close to finishing the OSM upgrade :)
[19:24:14] Ryan_Lane: yt?
[19:24:28] yes, what's up?
[19:25:11] so, i tried adding phabricator.wmflabs.org to i-000003a2
[19:25:24] it said okay, but the hostname magically turned into "integration.wmflabs.org"
[19:25:36] o.O
[19:25:42] i thought, okay, odd. tried adding it again, didn't work
[19:26:04] *did* manage to assign phab.wmflabs.org to i-000003a2, but that resolves to some integration wiki machine
[19:26:16] gimme a sec
[19:26:32] np, thanks for checking
[19:27:21] Ryan_Lane: Does this script live in git anywhere?
[19:27:33] it's in puppet, probably
[19:27:35] or in svn
[19:28:00] why? were you able to get it working? :)
[19:28:00] heh
[19:28:10] these 500 errors are killing me
[19:28:54] I think so… just a second.
[19:29:54] wow this dns entry is really fucked up
[19:30:02] I can't wait to switch dns code
[19:32:16] ori-l: ok. I fixed it manually
[19:32:21] Ryan_Lane: thanks
[19:32:24] yw
[19:32:35] no clue how those two got added to the same entry
[19:32:52] ~glustermanager/managetest/manage-volumes2 seems to work.
[19:33:07] I wish I understood this better… the previous code was blocking when trying to get the exit code.
[19:33:20] yeah
[19:33:23] well
[19:33:23] Pulling everything off of stdout causes the remote command to close, and then we can get the exit code.
[19:33:32] ah
[19:33:37] by doing recv?
[19:33:40] So, probably something changed with gluster that made it more verbose, and flooded us, which caused a block.
[19:33:42] then checking the exit code?
[19:33:56] Yeah, I just made the recv loop happen every time, and then just throw away stdout if the caller doesn't ask for it.
[19:34:03] * Ryan_Lane nods
[19:34:06] cool
[19:34:26] The old code used recv_ready incorrectly, but I don't think that was causing actual problems.
[19:34:33] lemme find where that code exists
[19:34:41] ** server can't find phabricator.wmflabs.org: NXDOMAIN
[19:34:45] delay?
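The manage-volumes2 fix andrewbogott describes above (always run the recv loop so a chatty remote command can't fill the output buffer and block, then read the exit status, and sleep for a fraction of a second rather than a whole second, as comes up a bit further down) translates roughly into the pattern below. This is a hedged sketch, not the actual script; the host, user, and command are placeholders.

```python
import time
import paramiko

def run_remote(host, user, command, want_output=False, nap=0.01):
    """Run `command` on `host` over SSH and return its exit status.

    Keeps draining the channel so a verbose remote command (e.g. a chatty
    "sudo gluster ...") cannot fill the buffer and block forever.
    """
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user)

    chan = client.get_transport().open_session()
    chan.set_combine_stderr(True)   # stderr goes down the same pipe, so it can't block either
    chan.exec_command(command)

    chunks = []
    # Pull everything off stdout; once the output is drained the remote
    # command can finish and the exit status becomes available.
    while not chan.exit_status_ready() or chan.recv_ready():
        if chan.recv_ready():
            chunks.append(chan.recv(4096))
        else:
            time.sleep(nap)         # short nap rather than a whole second
    status = chan.recv_exit_status()
    client.close()

    if want_output:
        return status, b"".join(chunks)
    return status                   # stdout is simply thrown away
```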
[19:34:49] Anyway, you can check the diff, and then do whatever to move it into production if it looks OK.
[19:35:15] * Ryan_Lane nods
[19:35:22] ori-l: sec
[19:35:31] My code now takes whole-second naps, so if you need this script to be fast then that should change.
[19:35:49] it sleeps?
[19:35:58] I'll check it
[19:36:05] ideally it shouldn't sleep
[19:36:19] it ran in under a second previously, if there were no changes to be made
[19:36:23] and it runs every minute
[19:37:18] ori-l: works for me now, it may be negatively cached in the office dns, though
[19:37:28] I purged the negative cache on the dns server
[19:37:38] Ryan_Lane: much obliged!
[19:37:41] yw
[19:41:28] Ryan_Lane: It sleeps if the connection is still open but stdout isn't ready yet. That seems to happen frequently.
[19:41:41] But there's probably some way to sleep .01 seconds rather than 1.0 seconds. I'll look...
[19:41:41] ah
[19:42:14] hm. I don't see any output from the command when I run it
[19:42:19] ah
[19:42:20] heh, the command I'm already using takes a float. I will adjust!
[19:42:24] I totally know what the problem is
[19:42:26] Doesn't it only output when it has work to do?
[19:42:32] it's not the script
[19:42:40] it's an instance missing an arecord, I bet
[19:42:44] I got output on the first working run but not subsequently.
[19:43:32] that is indeed the problem
[19:43:34] damn it
[19:43:42] I need to live-hack openstackmanager
[19:43:52] hm
[19:44:01] or I'll just make a maintenance script for now
[19:48:38] live hack it is
[19:54:58] yep
[19:55:07] now that I fixed the arecords, the script is outputting
[19:55:22] there were a ton of broken instance records
[19:55:40] I put in a live hack so that the job will run 10 times before it gives up
[19:56:09] yikes :)
[19:56:37] I hate that it runs as a job to begin with
[19:56:59] I'm considering putting in a blocking loop in instance creation so that I can kill it off
[19:57:18] that's also a shitty solution, though
[19:57:34] the real solution is to move to the new dns code
[19:59:11] * andrewbogott really hopes the new dns code actually works
[19:59:45] if not we'll fix it :)
[20:08:52] How's the OSM work going with regard to migrating to the latest OS version with its chunks of bug fixes?
[20:09:52] eh?
[20:10:08] the osm upgrade only relates to essex and nova api compatibility
[20:11:01] I thought you were dropping ec2 and adding horizon support in the next upgrade?
[20:14:27] horizon?
[20:14:32] that's the web interface
[20:14:43] I'm dropping ec2 and adding the openstack api
[20:14:53] and adding keystone support
[20:15:13] err yeah I meant keystone
[20:15:17] too many random names
[20:21:45] heh
[20:21:46] yeah
[20:33:44] Ryan_Lane: so what's the verdict for the log server? :)
[20:34:12] let's keep testing it out
[20:34:53] log server? logstash thing?
[20:34:58] * Ryan_Lane sighs
[20:35:02] No scribe+hadoop *sadelephant*
[20:35:04] gluster issues
[20:35:22] 'gluster' implies 'issues' when in the context of io or scale
[20:37:51] * Damianz goes back to looking at how much a topbox is
[20:42:17] Need a bunch of random logs thrown towards it though. Looks like I would need some "volunteer log donors" :)
[20:42:39] what type of logs
[20:44:05] i wonder if mutante ever got sorted out
[20:44:58] Anything would work. I just need to have an idea about the load, the space requirements for the elasticsearch indices, etc.
[20:45:35] having issues with auth.allow and gluster
[20:45:43] I think my bot logs would suck as, on api errors, they dump the whole article, so they are no longer line separated
[20:45:50] * jeremyb moves back to scrollback
[20:50:48] Damianz: does not matter. Just need a huge number of logs/sec for at least a week or two. Then, I need to check with random searches on them
[21:00:06] ok. worked out the gluster issues
[21:04:37] Ryan_Lane, is there some way to do an interactive puppet run? So that I can check the state after each step before it proceeds?
[21:04:54] nope
[21:06:06] Hmph. Do you have a moment to look at something? In generic::mysql::server, the debian package seems to be getting installed before my.cnf is created. But the dependencies look right to me.
[21:06:46] the package should get installed first
[21:07:11] it needs to be installed before the file can be added
[21:08:49] Ryan_Lane: When you say 'should' do you mean that the manifest says that, or that you think it would be correct for that to happen?
[21:09:04] that's the only correct way
[21:09:13] Because I'm pretty sure I just demonstrated in a test that if you create the config file first, it alters the debian package's behavior.
[21:09:25] Specifically, the debian installer sets up the mysql datadir based on the cfg file.
[21:09:40] yeah
[21:09:45] changing the value in the cfg file after the fact doesn't work because… the dir needs to be set up in various fancy ways.
[21:09:54] why?
[21:10:01] hm
[21:10:07] because mysql starts automatically on install
[21:10:10] fucking debian
[21:10:22] that's seriously one of the most annoying things about debian/ubuntu
[21:10:29] Oh, yeah, maybe it's the first run that sets up the dir, rather than the dpkg.
[21:10:34] paravoid: can you tell them to stop doing that?
[21:10:39] yeah
[21:10:39] it is
[21:10:55] you can in Debian, not sure in Ubuntu (with upstart et al)
[21:11:14] the package itself makes the daemon start
[21:11:17] Anyway, I probably get puppet to move the mysql directory after it's set up in the wrong place. Just seems nicer to have the config file in place beforehand so it all just works.
[21:11:29] *can probably
[21:11:33] all maint scripts (postinst etc.) are mandated to not call initscripts directly
[21:11:37] but rather call invoke-rc.d
[21:11:39] ah
[21:11:53] so, we can change its behavior?
[21:11:58] and invoke-rc.d calls /usr/bin/policy-rc.d first to check if it should start services or not
[21:11:58] it's nice to have a debian person with us :)
[21:12:11] if policy-rc.d returns 101, it doesn't do anything
[21:12:23] however, invoke-rc.d is probably replaced by something in upstart
[21:12:31] and I have no idea how it'll work with that
[21:13:20] * paravoid googles
[21:30:56] paravoid: Upstart happens separately from puppet, though, doesn't it? (Maybe I'm confused by what 'upstart' is.) Couldn't we just have a puppet rule like "Before you do anything else, turn off auto-starts"?
[21:31:54] dpkg runs stuff before and after installing packages
[21:32:10] they're called "maintainer scripts"
[21:32:29] the gory graphs of how they work are in http://wiki.debian.org/MaintainerScripts
[21:33:00] the important part is that these scripts are responsible for stopping/starting daemons, and that happens independently of puppet
[21:42:38] paravoid: Sure… but we can disable that by hacking invoke-rc.d? (I thought that's what you were saying before, maybe I misunderstood.)
[21:43:33] invoke-rc.d has a builtin mechanism for telling it not to do that
[21:43:40] but upstart doesn't use invoke-rc.d
[21:43:47] (it seems so, not sure yet)
[21:44:26] in any case, I'm not sure how I'd feel about enforcing that policy, it has the potential of breaking a lot of stuff
[21:46:23] I agree. It might make sense to set the policy at the start of a puppet run and then unset it at the end...
[21:46:34] Presuming that no one is monkeying with apt at the same time :(
[22:02:40] what's the right way to do an ACPI reboot?
[22:02:58] restart from the CLI doesn't work IIRC
[22:03:16] and idk if the labsconsole restart is forced or what
[22:12:38] jeremyb: just reboot doesn't work?
[22:12:50] jeremyb: you want magic sysrq requests?
[22:13:09] mutante: iirc if i rebooted from the CLI then it never came back up
[22:13:24] echo b > /proc/sysrq-trigger
[22:13:38] i just tried that and it came back
[22:14:12] but i would have also expected a regular reboot to do that.. (and yea, the labsconsole restart is forced)
[22:15:12] well, using "b" is also not waiting for filesystems or anything
[22:15:43] http://en.wikipedia.org/wiki/Magic_SysRq_key#Commands
[22:43:56] paravoid: ok, opinion...
[22:44:10] there's no such thing as global roles anymore
[22:44:29] in the interface I've completely eliminated the need for an admin user
[22:44:52] however, if I don't add an admin user with full permissions in every project and every role, then it'll be a pain for us to manage vms
[22:45:03] we'd have to add ourselves into the projects and roles before we could admin them
[22:45:05] thoughts?
[22:45:52] leave out the admin user (which is safer), or add in the admin user?
[22:46:21] <^demon> What if you automatically add $admins when creating a project?
[22:46:49] <^demon> So you would still be doing the latter idea, but with less manual-ness.
[22:46:57] I am automatically adding whoever creates the project right now
[22:48:39] well, there's an admin user
[22:48:50] hmm.
[22:48:51] It doesn't necessarily need to be all of the admins
[22:49:01] but I'm keen on deleting the admin user
[22:49:10] or at least limiting its use where possible
[22:49:48] I think it's more important for us to be able to manage vms
[22:49:51] indeed
[22:49:57] I'll keep that in, then
[22:50:15] yeah
[22:50:24] it'd be too much of a pain to add ourselves after the fact
[22:50:52] yeah
[22:51:01] also, this version of the nova commandline requires you to use a password
[22:51:13] that user's password is on the filesystem anyway
[22:51:24] since it needs to be in the config files
[22:53:06] paravoid: the bad thing is that users can remove that user from the roles/groups
[22:53:17] maybe I can hide that from the interface
[22:53:52] I really want global roles back
[22:54:20] in the interface we'll need to add/remove ourselves from projects like we previously have been
[22:54:29] but from the commandline on virt0 we can use the admin user
[22:56:19] mutante: it didn't require a token when you logged in?
[22:56:58] well, holy shit. that's broken
[22:57:51] Ryan_Lane: no, it didn't when i tried logging out and back in right after activating
[22:58:01] so i assumed it remembers for some time
[23:00:39] so, wait, openstack is not adding global roles again?
[23:00:44] how's that supposed to work with other clouds?
[23:00:59] public clouds don't use global roles
[23:01:16] there's a blueprint in for adding global roles back in
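Coming back to paravoid's policy-rc.d explanation and andrewbogott's idea of setting the policy only for the duration of a puppet run: the conventional mechanism is a policy-rc.d script that exits 101 ("action forbidden by policy"), which invoke-rc.d consults before starting or restarting daemons. A hedged sketch follows; the /usr/sbin path is the conventional location (an assumption here, the log says /usr/bin), and as discussed above it may not affect upstart-managed services.

```python
import os

POLICY = "/usr/sbin/policy-rc.d"   # conventional path consulted by invoke-rc.d; an assumption

def deny_service_starts():
    """Install a policy-rc.d that exits 101, so invoke-rc.d skips starting
    or restarting daemons while packages are being installed."""
    with open(POLICY, "w") as f:
        f.write("#!/bin/sh\nexit 101\n")
    os.chmod(POLICY, 0o755)

def allow_service_starts():
    """Remove the policy file again, restoring normal behaviour."""
    if os.path.exists(POLICY):
        os.remove(POLICY)

# Rough usage, per the idea above:
#   deny_service_starts(); <run puppet / install packages>; allow_service_starts()
```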
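And on the /proc/sysrq-trigger reboot above: as mutante notes, "b" reboots without waiting for filesystems. Per the SysRq command list linked in the log, a slightly gentler sequence syncs and remounts read-only first. A sketch, assuming root and a kernel with magic SysRq enabled; it will reboot the machine when run.

```python
import time

def sysrq(command: str) -> None:
    # Writing a single command letter to this file triggers the
    # corresponding magic SysRq action (requires root).
    with open("/proc/sysrq-trigger", "w") as trigger:
        trigger.write(command)

sysrq("s")      # sync mounted filesystems
time.sleep(2)   # give the sync a moment
sysrq("u")      # remount filesystems read-only
time.sleep(2)
sysrq("b")      # then reboot immediately, as in the one-liner above
```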