[00:05:48] PROBLEM host: driver-dev2 is DOWN address: driver-dev2 check_ping: Invalid hostname/address - driver-dev2
[00:10:41] * mmovchin goes to sleep. Good night.
[02:40:35] RECOVERY Free ram is now: OK on nagios 127.0.0.1 output: OK: 21% free memory
[02:41:25] RECOVERY Free ram is now: OK on bots-sql3 bots-sql3 output: OK: 20% free memory
[02:44:35] PROBLEM Free ram is now: CRITICAL on puppet-lucid puppet-lucid output: Critical: 3% free memory
[02:58:35] PROBLEM Free ram is now: WARNING on nagios 127.0.0.1 output: Warning: 19% free memory
[02:59:25] PROBLEM Free ram is now: WARNING on bots-sql3 bots-sql3 output: Warning: 18% free memory
[03:04:50] Ryan_Lane: I can't get sg to work on a labs instance. That's not a special labs security thing, is it?
[03:05:14] sg?
[03:05:36] Oh, so that's just as obscure for you as for me, huh?
[03:05:45] It sets group id.
[03:05:49] ahhhh. ok
[03:06:05] yeah, it's known to be broken right now
[03:06:20] sara fixed it the other day, I'll test it and push it out to everything tomorrow
[03:06:35] it's an issue with the ldap libraries
[03:06:36] cool, that explains a lot :)
[03:07:06] yeah. it's subtly breaking things
[03:08:03] Well, I'm about done for the night anyway.
[03:08:44] cool. yeah. the fix could possibly break all the instances, so I'm a little hesitant to do it without testing it well :)
[03:29:35] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory
[06:29:25] PROBLEM Current Load is now: WARNING on deployment-web deployment-web output: WARNING - load average: 11.97, 16.09, 7.52
[06:30:35] PROBLEM Current Load is now: WARNING on deployment-webs1 deployment-webs1 output: WARNING - load average: 3.72, 13.11, 7.22
[06:39:29] RECOVERY Current Load is now: OK on deployment-web deployment-web output: OK - load average: 0.05, 2.23, 3.97
[06:40:15] RECOVERY Current Load is now: OK on deployment-webs1 deployment-webs1 output: OK - load average: 0.06, 2.02, 3.94
[10:17:27] PROBLEM Free ram is now: WARNING on deployment-webs1 deployment-webs1 output: Warning: 12% free memory
[18:27:08] Ryan_Lane: Do you have an eta for the setgid fix?
[18:29:34] when I get into the office I'll start working on it :)
[18:29:37] what's the problem?
[18:29:40] heading there soon
[18:29:50] libnss-ldap library is broken when using ssl/tls
[18:31:03] which breaks setgid calls
[18:31:24] so, we are switching to a different library, libnss-ldapd
[18:31:59] oh, that's the reason why it acted as if you weren't in the group, and newgrp didn't work?
[18:32:06] good
[18:32:53] yeah
[18:33:46] it's breaking a lot of things subtly
[18:33:51] I thought it was a NFS problem
[18:34:25] I thought so as well, at first
[18:34:32] preilly found the actual issue
[18:34:46] when he was trying to get screen sharing working
[19:47:16] ok… let's see about this ldap fic
[19:47:17] *fix
[19:51:29] just pinged the #gerrit room with our questions
[19:51:30] we'll see...
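The fix discussed above swaps libnss-ldap for libnss-ldapd, which moves LDAP lookups out of the calling process and into the nslcd daemon, so setgid-using tools like sg and newgrp stop tripping over the in-process LDAP/TLS code. A minimal post-fix sanity check might look like the sketch below; it is an assumption-laden example, not the actual procedure used here: it substitutes the current user's primary group (so it runs anywhere), and it only attempts `sg` when that tool is installed.

```shell
#!/bin/sh
# Sketch of a post-switch sanity check. For a meaningful test, replace
# "$GROUP" with a group that actually lives in LDAP; here we use the
# current primary group so the commands run on any box.
GROUP="$(id -gn)"

# 1. NSS can resolve the group (through nslcd once libnss-ldapd is in place):
getent group "$GROUP"

# 2. sg exercises setgid(), the call that was failing under libnss-ldap+TLS;
#    guarded because sg is not installed everywhere:
if command -v sg >/dev/null 2>&1; then
  sg "$GROUP" -c 'id -gn'
fi
```

If step 1 returns nothing for a group you know exists in LDAP, name resolution (nsswitch.conf / nslcd) is still broken before setgid even enters the picture.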
[20:04:50] Ryan_Lane
[20:05:00] re push notification without git-review
[20:05:14] I've been pointed to this by someone in #gerrit: http://gerrit-documentation.googlecode.com/svn/Documentation/2.2.2/config-hooks.html#_ref_updated
[21:07:52] andrewbogott: travel for openstack conference is approved
[21:08:03] andrewbogott: Laura will get with you about arrangements
[21:08:06] great.
[21:08:24] ottomata1: seems like an appropriate hook
[21:08:41] Ryan_Lane: When is the OS DS again?
[21:08:49] I know it's in SF and it's soonish...
[21:08:59] April 16-20
[21:09:02] did you want to go?
[21:09:04] Ah, OK
[21:09:06] No, not really
[21:09:13] ok
[21:09:16] But it's useful to know when it is
[21:09:24] was going to have to try to get you in. they always completely book
[21:09:36] Maybe I'll work more closely with the #openstack-infra people again, like a few weeks ago
[21:09:41] cool
[21:09:45] In that case it would've been nice to drop by for half a day or so
[21:09:47] Ryan_Lane so what do you need from me to install that? should I find (or write) a hook that emails out? I can probably hack one of the post-receive hooks I've used to take those args
[21:10:55] well, we want to make sure only refs that aren't changes send emails from that
[21:11:40] RoanKattouw_away: they may open attendance, like they have before
[21:12:00] basically it means if you don't have a badge, you can be in the room, but have to give up your seat for someone with a badge
[21:14:03] only refs that aren't changes?
[21:14:08] yep
[21:14:18] ah as in, changes (um) 'merged' by gerrit?
[21:14:35] well, everything that comes in is a ref
[21:14:36] as in, only those that come directly from git push and bypass review?
[21:15:00] yep
[21:15:20] aye, ha, mmmk, will ask #gerrit again, maybe they know how to do that
[21:17:39] heh. I was in there too :)
[21:17:44] nice :)
[21:20:43] ok. let's see if I can break all of labs
[21:22:27] hah, for the hook?
[21:22:44] nah. fixing the setgid brokenness
[21:22:54] by changing out the LDAP libraries
[21:23:50] aye
[21:24:12] New patchset: Ryan Lane; "Applying LDAP fix to all instances" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2693
[21:24:20] * Ryan_Lane crosses fingers
[21:24:25] gonna be a long day if this doesn't work
[21:24:56] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2693
[21:24:56] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/2693
[21:27:00] might as well break it well and kill bastion
[21:27:19] \o/ it didn't break!
[21:27:22] heh
[21:28:12] How do I pick up that change on my instance? Puppet refresh?
[21:28:19] yep
[21:29:24] looks good so far.
[21:29:28] great
[21:31:12] I need to force run puppet on all of the instances
[21:31:30] dear SMW powered ddsh
[21:40:33] ottomata1: well, seems we might need to wait for that patch to land
[21:43:38] aye, hmm, ok
[21:46:50] unless we can figure out how to use that hook
[21:48:20] well, do we know that that hook will even do what we think it will?
[21:48:26] sounds like mfick_ wasn't really sure
[21:48:35] i can't really tell from the doc
[21:48:50] is a ref updated only by gerrit? or by git?
[21:51:35] I think everything is a ref
[21:51:42] including whatever bypasses review
[21:51:56] but it will be everything, including all of the other stuff too
[21:52:17] do we have a gerrit instance running?
[21:52:34] I think I made the project and still haven't made the instance
[21:53:37] aye
[21:53:41] yeah that would be a good way to check
[21:55:15] so. many. broken. instances.
[22:35:38] * Damianz eyes Ryan_Lane
[22:35:49] is your instance broken?
[22:35:58] The bots are still running
[22:36:06] can't log into it, though?
[22:36:08] * Damianz will try and login later
[22:36:10] which instance?
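On the notification question in the thread above: the `ref-updated` hook from the linked Gerrit 2.2.2 documentation fires for every ref update, including Gerrit's own review refs, so the hook itself has to filter out anything that did not come from a direct push. The sketch below assumes the argument names shown on that doc page; `notify()` is a placeholder for real mail delivery (swap in `mail`/`sendmail`), and the list of refs to ignore is a first guess, not confirmed against a live Gerrit.

```shell
#!/bin/sh
# Hypothetical ref-updated hook: report only refs that bypassed review.
# Per the 2.2.2 docs, Gerrit invokes it roughly as:
#   ref-updated --oldrev <sha> --newrev <sha> --refname <ref> \
#               --project <name> --submitter <who>
notify() {  # stand-in for real mail delivery
  echo "direct push to $1 in $2 by $3"
}
handle_ref_updated() {
  REFNAME= PROJECT= SUBMITTER=
  while [ $# -gt 0 ]; do
    case "$1" in
      --refname)   REFNAME=$2;   shift 2 ;;
      --project)   PROJECT=$2;   shift 2 ;;
      --submitter) SUBMITTER=$2; shift 2 ;;
      *)           shift ;;
    esac
  done
  case "$REFNAME" in
    refs/changes/*|refs/meta/*) return 0 ;;  # Gerrit-managed refs: stay quiet
  esac
  # Anything else is a branch ref updated directly, i.e. a push that
  # bypassed review (or a merge Gerrit performed; see caveat above).
  notify "$REFNAME" "$PROJECT" "$SUBMITTER"
}
handle_ref_updated "$@"
```

One caveat the channel already spotted: a change merged by Gerrit also updates the branch ref, so this filter alone cannot distinguish a review merge from a direct push; comparing the submitter against Gerrit's own identity would be one way to tell them apart.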
[22:36:32] I'd like to fix it now, rather than later
[22:36:35] I can login to bots-cb
[22:36:44] And sudo
[22:36:49] So not so broken for once
[22:36:51] hm. it's working for me
[22:41:07] Damianz: is there some instance not working?
[22:46:38] Just checked mine and they are, I just feel like eyeing you closely when you say broken and instance in the same sentence, as it usually ends up in my talk page getting spammed from people :P
[22:47:21] I assume you just pushed the ldap libs fix that was randomly breaking things like changing groups?
[22:48:01] heh
[22:48:04] yeah
[22:48:11] newgrp should work now
[22:48:21] and anything else that uses setgid
[22:58:10] !log deployment-prepbackup stopped mysql to free up enough memory to run puppet
[22:59:12] damn it
[22:59:14] where's the bot :(
[22:59:37] maybe I need to move it somewhere it won't keep dying
[23:02:42] Ryan_Lane: /me waves to bots-cb :P
[23:02:57] eh?
[23:03:07] "[22:59:37] maybe I need to move it somewhere it won't keep dying"
[23:03:14] !log deployment-prepbackup stopped mysql to free up enough memory to run puppet
[23:03:15] Logged the message, Master
[23:03:28] !log deployment-prepbackup brought mysql back up
[23:03:28] Logged the message, Master
[23:04:02] I might just make an instance for irc bots
[23:04:11] Ryan_Lane: We could probably do with another instance generally
[23:04:17] yeah
[23:04:21] Though an instance for labs tools would be good
[23:04:28] * Ryan_Lane nods
[23:05:32] methecooldude: CB is playing better since I replaced the code that was causing it to shoot itself in the head with fork spam when the irc connection got dropped :D
[23:05:53] Damianz: Awesome
[23:06:06] What about 3?
[23:06:31] Still need to work on the core more though, threading would really help. Kinda waiting until we have mysql access and can drop the stupid toolserver call as that's the slowest and most unreliable part.
[23:06:36] 3 is still meh
[23:06:37] dies a bit
[23:07:47] "Where is cobi? -- Preceding unsigned comment added by Thetechexpert (talk | contribs) 05:29, 8 February 2012 (UTC)
[23:07:47] Everywhere. -- Cobi(t|c|b) 12:22, 8 February 2012 (UTC)
[23:07:47] But nowhere, he is omnipresent :) - Rich(MTCD)T|C|E-Mail 12:14, 21 February 2012 (UTC)"
[23:08:10] lol I saw that
[23:10:29] bahahaha. the banner for the deployment-prep instances is hilarious
[23:20:06] Does it give you a huge list of things not to break?
[23:21:11] it tells you not to run maintenance scripts
[23:21:24] and has a big "deployment cluster" ascii art banner
[23:21:41] I'm having to reconfigure some of these to run puppet
[23:21:48] I hope it doesn't cause issues
[23:21:59] Yeah... dunno how they keep the dbs up to date with svn
[23:22:23] via other maintenance scripts
[23:22:31] we have cluster-specific ones
[23:22:35] the mediawiki ones break things
[23:23:04] we really need nagios alerts for broken puppet
[23:23:52] ottomata1: we'll likely run into a similar issue as you guys in labs
[23:23:58] with git/gerrit
[23:24:31] we're going to be running each project from a separate gerrit branch, and will need to move changes from the test branch, and merge them into all of the project branches
[23:24:48] and will need to move changes from production, into test, then into the branches
[23:24:54] and vice versa
[23:26:22] I wonder if branches should be throw-away, and rather than projects always running in a branch, they can make a short-lived feature branch, push into test, get the change code-reviewed, then switch back to using test
[23:26:34] then delete the branch
[23:26:49] it would make the merge process less problematic
[23:33:39] ottomata1: btw: https://labsconsole.wikimedia.org/wiki/Gerrit_bugs_that_matter#Issue_118:_Direct_push_to_a_branch_doesn.27t_send_email_to_subscribers
[23:37:56] Personally I think all test vms should run in test; for a new feature it should get branched, tested, merged back into test, then deleted.
[23:38:10] yeah, that sounds reasonable to me too
[23:38:20] otherwise we'll have instances running really old versions of puppet
[23:38:31] Or we could do it the kernel way and just email you patch files to merge :D
[23:38:40] we should have a nagios check for instances that run a feature branch for too long
[23:38:54] that's basically how it works right now, and it makes me want to strangle myself
[23:39:01] it also means test is almost always broken
[23:39:13] and terribly out of sync with production
[23:39:22] it's a nightmare
[23:39:34] !account-questions
[23:39:34] I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your SVN account name, or your preferred shell account name, if you do not have SVN access. 3. Your preferred email address.
[23:39:44] Imo test and production should be mergeable constantly.
[23:39:50] heh. copy/paste into email. yes, I'm lazy.
[23:39:54] Damianz: +1
[23:40:16] The test branch should effectively be production for the test cluster and then picked into the live production branch :D
[23:40:25] yes
[23:41:13] we need a way to easily create/delete branches in gerrit
[23:41:29] since we really need all instances in a project to be able to create a branch, and switch into it
[23:42:19] How will we handle updating puppet.conf though? Not manage it via puppet, or have a local include folder we can drop stuff into?
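The Nagios check floated above ("instances that run a feature branch for too long") could classify each instance's puppet checkout by branch name and last-commit age. The sketch below is illustrative only: the 14-day threshold, the branch name `test` as the sanctioned branch, and the way the inputs are gathered are all assumptions; the git commands that would feed the classifier on a real instance are shown in comments.

```shell
#!/bin/sh
# Hypothetical Nagios-style classifier for a puppet checkout.
# On an instance, the inputs would come from the repo, e.g.:
#   branch=$(git rev-parse --abbrev-ref HEAD)
#   age_days=$(( ( $(date +%s) - $(git log -1 --format=%ct) ) / 86400 ))
classify() {
  branch=$1; age_days=$2; max_days=$3
  if [ "$branch" = "test" ]; then
    echo "OK: on branch test"; return 0          # Nagios OK
  fi
  if [ "$age_days" -gt "$max_days" ]; then
    echo "CRITICAL: feature branch $branch is $age_days days old"; return 2
  fi
  echo "WARNING: on feature branch $branch ($age_days days old)"; return 1
}

# Demo with made-up inputs (branch name borrowed from the chat):
classify wibblewobblemyfeature 30 14
classify test 0 14
```

Return codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL), so the function could be dropped into a plugin wrapper once the inputs are wired up.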
[23:42:39] we are moving away from a central puppet master
[23:42:47] each instance will have a full checkout of the repo
[23:43:02] node info can either stay in puppet, or live on the system
[23:43:07] hell, we could keep the node info in LDAP
[23:43:18] and have puppet pull it locally before runs
[23:43:26] err
[23:43:34] s/LDAP/mediawiki
[23:43:48] though we're considering keeping it in nova
[23:43:50] Yeah, but I mean if I switch from test to the branch 'wibblewobblemyfeature', currently puppet.conf is updated from puppet, so when puppet ran it would switch the config back?
[23:43:55] in the metadata server
[23:44:13] oh. the puppet.conf stays the same
[23:44:20] no need for it to change
[23:44:33] we aren't using environments
[23:44:44] Ryan_Lane, yeah, it seems anytime people want to use branches, gerrit might get a bit annoying
[23:45:15] Also puppet's suggested layout != mwf layout, which = confusing :P
[23:45:17] well, I think branches are inherently problematic, over long periods of time
[23:45:28] since they diverge, quickly
[23:45:39] Damianz: what do you mean?
[23:45:47] Damianz: that we don't use puppet modules?
[23:46:03] it would be ideal for us to move to using puppet modules, in the long run.
[23:46:10] yeah, basically
[23:46:41] Puppet modules are kinda weird though, for me anyway, as I have certain files that should only be accessible by certain ip ranges and the fileserver acls are a bit meh.
[23:46:58] yeah
[23:47:37] for labs I'm not terribly worried about the security issues
[23:47:43] production bothers me more
[23:48:55] Though... you should try having perl scripts that generate htaccess files on cron ;) That makes for a world of fun.
[23:49:07] * Ryan_Lane twitches
[23:49:24] well, with git though, that's kinda the point
[23:49:33] it's ok if they diverge, and people (or you) can delete old ones
[23:49:34] but
[23:49:42] they shouldn't really diverge that much
[23:49:46] if people merge back from master often
[23:49:50] that should be part of their workflow
[23:49:58] yeah, we could do that with gerrit branches too
[23:49:58] The entire point of a git branch is that you can merge it back or you pull often from upstream.
[23:50:02] even multiple times a day, they should merge back from master
[23:50:04] it's just that people don't do it
[23:50:06] to get any new master changes
[23:50:13] Though you can trash the content in a branch and use it like a folder in svn.
[23:50:14] svn branches are supposed to work that way too
[23:50:16] and then you get branches that are ancient and won't easily merge
[23:50:34] this is the problem we have with our test branch for puppet
[23:50:37] not really, git should be nice about it
[23:50:44] even if it is ancient
[23:50:49] oh, it handles most of the merging fairly well
[23:50:53] git merge master should only conflict with changes the person has in the branch
[23:50:59] We really need someone that will merge prod back into test often, as well as test branches into test - perhaps a tidy-up of abandoned test branches getting deleted after so many days too.
[23:51:00] that's the thing :)
[23:51:11] we change a lot of stuff, really often, in both production and test
[23:51:28] test is so fucked right now that we are planning on scrapping it and recreating it from master
[23:52:03] part of the problem is that changes aren't tested well enough before going into test, so we can't merge them to production
[23:52:14] this is why our solution is more branches, not less :)
[23:52:17] Also - is there any reason we can't have a more production-like clone in test? Like, say, nagios
[23:52:40] Nagios currently is just over *THERE* and adding stuff is a) pointless and b) untested.
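The "merge back from master often" routine the channel describes can be sketched as a small helper; the branch name echoes the example used in the chat, the upstream defaults to `master`, and the push step is left as a comment since it depends on the remote setup. This is a sketch of the generic git workflow being discussed, not any project's actual tooling.

```shell
#!/bin/sh
# Hypothetical helper: keep a feature branch fresh by merging upstream often,
# so "git merge master" only ever conflicts with the branch's own changes.
refresh_branch() {
  branch=$1
  upstream=${2:-master}   # the branch to stay in sync with
  git checkout "$branch" &&
  git merge "$upstream"
  # after resolving any conflicts and testing:
  #   git push origin "$branch"
}
```

Run daily (or before each work session), the merge stays small; skipped for weeks, the branch becomes one of the "ancient" ones that won't merge cleanly.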
[23:52:44] actually, nagios in puppet won't work with puppet in labs, when we get rid of the puppet master
[23:53:11] the way it's currently done is slightly better than in production. we just need a way of adding checks
[23:53:19] mhm
[23:53:37] production uses puppet external resources
[23:53:41] Ryan_Lane: I'm very interested in commits going to a usable test env quickly.
[23:53:54] It would also be nice to have a way of quickly creating and destroying instances rather than having to use the web interface :(
[23:54:15] chrismcmahon: the idea is each project gets its own branch, you test the changes directly in your project, then push to test
[23:54:19] test is required to always be stable
[23:54:28] then we try to quickly move the changes from test to production
[23:54:35] we also often merge from production to test
[23:54:42] Ryan_Lane: and I see myself working in test more than anywhere else.
[23:54:55] yeah. I almost always work in test
[23:55:01] most ops work directly in production
[23:55:18] thinking of that… I need to merge some test changes into production right now
[23:55:23] It will make more sense when we have the labs production side, as test is kinda production right now, with test hashed on a little for bots at least.
[23:55:52] yeah
[23:55:58] we've been doing new projects in labs
[23:56:04] just not all of the old ones, yet
[23:56:10] it's a slow process
[23:56:44] it's way better than before, when we tested everything in production, though ;)
[23:56:51] lol
[23:57:24] gah
[23:57:25] Btw do we have any ideas on packages - for example for bots, where production can't use 3rd-party repos (well, it can if ops are supplied cookies and cake)?
[23:57:30] of *course* this cherry pick conflicts
[23:57:37] yes
[23:57:42] we want an automated build server
[23:57:55] per-project, and labs-wide
[23:57:55] :D
[23:58:06] so, projects can push anything they want into their own repo
[23:58:06] That would be nice
[23:58:13] then can have it code-reviewed to make it labs-wide
[23:58:25] then we can move the packages into the production repo, after testing
[23:58:44] if you haven't noticed, I'm big on the multi-tenant ideal of openstack :)
[23:59:21] I kinda wish zones were supported better though... oh, and tgt support in c6
[23:59:35] yeah, zone support sucks right now
[23:59:39] tgt?
[23:59:42] c6?