[00:03:52] PROBLEM Current Load is now: CRITICAL on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [00:04:32] PROBLEM Current Users is now: CRITICAL on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [00:04:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 19% free memory [00:04:57] there's one new compute node [00:05:14] PROBLEM Disk Space is now: CRITICAL on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [00:05:54] PROBLEM Free ram is now: CRITICAL on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [00:06:04] you know pretty sure we're going to have so many compute nodes that storage is going to be crappy as hell if we lose one [00:06:18] what do you mean? [00:06:20] storage is local [00:06:33] if we lose a compute node we lose all the instances on it too [00:06:54] I thought you had that on gluster too or is that the one you got rid of due to the hatred of images? [00:07:16] I got rid of that over a year ago ;) [00:07:24] PROBLEM Total processes is now: CRITICAL on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [00:07:41] totally don't keep track of time [00:07:47] heh [00:07:54] PROBLEM dpkg-check is now: CRITICAL on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: Connection refused by host [00:08:05] puppet takes so fucking long to run [00:08:11] It's ruby *shrug* [00:08:25] it takes 403 seconds to run on a new instance [00:08:36] that's absurd [00:08:40] That's redic [00:08:54] since we're already imaging a new instance should be done in like <2min [00:08:55] RECOVERY Current Load is now: OK on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: OK - load average: 0.71, 0.92, 0.52 [00:09:34] RECOVERY Current Users is now: OK on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: USERS OK - 0 users currently logged in [00:09:56] that users logged in check is silly [00:10:07] yeah [00:10:07] agreed [00:10:13] RECOVERY Disk Space is now: OK on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: DISK OK [00:10:26] maybe I could clone a fully puppetized system [00:10:30] and use that cloned image [00:10:40] keys and stuff could be problematic [00:10:46] all the state is stored locally [00:10:47] what keys? [00:10:53] that's over nfs [00:10:57] on gluster [00:11:03] RECOVERY Free ram is now: OK on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: OK: 900% free memory [00:11:05] puppet key [00:11:07] ok ssl shiz [00:11:08] ah [00:11:08] right [00:11:28] I'd need to wipe some stuff out before cloning
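A rough sketch of the pre-clone cleanup being discussed here — assuming default Debian/Ubuntu paths for a puppet agent of that era; the exact ssldir can differ:

  # run on the instance before snapshotting it as a base image
  service puppet stop
  rm -rf /var/lib/puppet/ssl        # agent cert/keys; a new cert is requested (and must be signed) on first run
  rm -f /etc/ssh/ssh_host_*         # per-host ssh keys
  dpkg-reconfigure openssh-server   # regenerate host keys now, or defer to first boot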
[00:12:07] hmm hashar's pep8 check doesn't agree with me... I should talk to him about just using make test since that works [00:12:23] RECOVERY Total processes is now: OK on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: PROCS OK: 84 processes [00:12:53] RECOVERY dpkg-check is now: OK on testing-virt9.pmtpa.wmflabs 10.4.1.74 output: All packages OK [00:12:57] New patchset: DamianZaremba; "I couldn't give a flying monkey turd" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45080 [00:13:40] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45080 [00:13:59] something is still causing really slow logins *sigh* [00:15:23] right.. that's that check gone... just leaves fixing puppet freshness which I have no idea how I can do right now hmm... I could re-write the snmp trap to do a ldap lookup I guess [00:15:41] Damianz: is it? [00:15:45] I'm not seeing slow logins [00:15:56] it hangs after banner for a few seconds randomly for me [00:16:10] oh. you mean slightly slow logins [00:16:13] not like yesterday logins [00:16:20] the ldap servers are slightly overloaded [00:16:25] I mean like - it's not instant, this makes me sad [00:17:00] there's some things I can do to speed up lookups som [00:17:01] *some [00:17:10] probably doesn't help we killed caching [00:17:10] some options in nslcd that didn't exist in nssldap [00:17:19] that's only negative caching we killed [00:17:24] true [00:17:32] hmm like /etc/wmflabs-instancename exists, is there one for region the instance is in? [00:18:10] hm [00:18:12] I guess I could pull it from salt but wow shit is that hacky... [00:18:17] heh [00:18:19] or facter [00:18:24] actually... [00:18:32] I can do this straight in puppet... sorta [00:18:37] puppet knows [00:18:47] custom facter to pull the instance name and then replace the snmptrap call if in realm labs [00:18:48] is it $site? [00:18:50] boom [00:19:09] Also... CAN WE JUST USE ONE GOD DAMN FWDN [00:19:13] s/W/Q/ [00:19:21] what do you mean? [00:19:25] get rid of i-xxx? [00:19:41] i-0000030d.pmtpa.wmflabs nagios-main.pmtpa.wmflabs, we should use the latter [00:19:46] indeed [00:19:48] that's in the plans [00:20:01] for now... let's just hack it up [00:20:02] it's not amazingly easy [00:20:26] have to ensure uniqueness in a bunch of places it didn't exist before [00:20:32] have to immediately delete old keys [00:20:53] It's more fun when you have duplicate hostnames [00:20:56] openstack (in this release) doesn't care about unique hostnames [00:21:06] I think andrew fixed that in grizzly [00:21:18] you can do unique names globally or per project [00:21:38] I really kind of wish we went with ...wmflabs [00:21:40] so if we'd just gone with ..wmflabs :D [00:22:13] we still can, but it's going to be painful [00:22:21] nuke all the things [00:22:30] it's just a matter of changing their dns names [00:22:38] but it's probably going to break a lot of people's stuff [00:23:01] another compute node in [00:23:01] well for a start you gotta update host files or you're gonna break acls, clear caches, ensure people aren't retarded etc [00:23:12] host files? [00:23:27] I don't think the hostname is in the file [00:23:32] oh yeah... you not seen really weird behaviour when your hostname doesn't resolve to 127.0.0.1 [00:23:37] it isn't [00:23:47] actually it's not [00:23:56] if I try that on redhat or with pgsql it freaks out [00:24:00] heh [00:24:03] PROBLEM Current Load is now: CRITICAL on testing-virt10.pmtpa.wmflabs 10.4.0.82 output: Connection refused by host [00:24:08] it's a problem with a lot of java services too [00:24:13] RECOVERY Free ram is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: OK: 20% free memory [00:24:25] java hates everyone [00:24:37] $::site yay
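$::site settled it here, but the custom-fact idea for the instance name also works without writing ruby — a sketch, assuming a Facter version with external facts.d support; the script name is illustrative, the file it reads is the one mentioned above:

  # /etc/facter/facts.d/instancename.sh — must be executable
  #!/bin/sh
  echo "instancename=$(cat /etc/wmflabs-instancename)"

Puppet manifests would then see it as $::instancename, alongside $::site.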
[00:24:40] !log testlabs shutting labs-nfs1 instance down [00:24:42] Logged the message, Master [00:24:51] \o/ [00:24:53] PROBLEM Disk Space is now: CRITICAL on testing-virt10.pmtpa.wmflabs 10.4.0.82 output: Connection refused by host [00:24:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory [00:25:00] hasn't been any network connection on there for a while [00:25:13] I'm going to kill it in a few days [00:25:20] delete it, that is [00:25:34] I think I may have a celebratory drink when I do [00:25:59] get the 15 year old single malt out? [00:26:17] probably the 18yo bourbon [00:26:30] hmmm bourbon... american [00:26:58] I drink bourbon, rye and scotch [00:27:09] I don't have any scotch on my desk right now, though [00:27:41] Never had rye, I found a nice bottle of Caol Ila next to my Laphroaig earlier though [00:27:44] btw, check out the awesome change: https://gerrit.wikimedia.org/r/#/c/44948/2 [00:27:59] Krenair is great :) [00:28:05] * Damianz tickles Krenair [00:28:08] we're going to have proper notifications! [00:28:24] :D [00:28:27] snmptt is the most retarded bit of software ever [00:28:33] Damianz: yes, it is [00:30:44] hmm I could totally write a facter for this or just cat the file... lets cat the file [00:32:14] PROBLEM Free ram is now: WARNING on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: Warning: 19% free memory [00:33:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [00:35:31] * Damianz waits [00:36:38] well snmptt is borked but https://gerrit.wikimedia.org/r/#/c/45081/ needs to get merged before it will work anyway [00:37:39] Ryan_Lane: Has anyone ever considered writing a custom puppet reporter module to talk to nagios? it would be far easier, more reliable and not need snmp =\ [00:38:17] well, it's checking to see if puppet ran ;) [00:38:26] if puppet doesn't run, it can't report, right? [00:38:29] oh. right [00:38:31] ... [00:38:33] if it doesn't then it times out [00:38:35] :D [00:38:37] ignore me [00:38:41] * Damianz ignores you [00:38:48] o.o [00:38:52] yes, that would be a better solution than an snmp trap [00:38:57] though you could also make it warn/error in real time if it failed a run hard [00:39:09] indeed. that would be much nicer [00:39:17] https://github.com/DamianZaremba/sentry-puppet/blob/master/lib/puppet/reports/sentry.rb < I send mine to Sentry [00:39:33] and yuck ruby is horrid [00:39:34] RECOVERY Free ram is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: OK: 22% free memory [00:39:38] yep [00:39:54] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 27% free memory [00:40:19] * Damianz gives mimi a cookie [00:40:41] thanks Damianz :o [00:40:42] Damianz: did you see the scheduler in salt 0.12.0? http://docs.saltstack.org/en/latest/topics/releases/0.12.0.html [00:40:56] * mimi eats [00:40:58] like a cron for the network :D [00:41:24] nope but that would be really useful... I was thinking of pushing metrics to graphite via it [00:41:51] pure python dsl renderer could be useful too [00:42:14] I'm hoping the xp build for windows gets fixed... 7/08 works awesomely [00:42:33] is the one in this release not ok? [00:42:36] * mimi brings cake [00:42:39] :o [00:42:41] someone want [00:42:45] A few weeks back it wasn't... I have a bug open for it [00:42:50] Does weird cpu things on xp [00:42:54] ah [00:43:20] https://f.cloud.github.com/assets/142120/50392/892b2fae-59a3-11e2-90e9-604d277f5cd8.png < [00:43:29] hahahaha [00:43:57] I really need group based acls so I can jam it into ldap for users then use peer runs and stuff to do monitoring :D [00:46:08] yep [00:46:11] that would be nice [00:46:24] nicer than nrpe/nsclient++ [00:46:29] indeed [00:46:53] I was trying to use a custom c# scheduler with powershell modules for graphing system metrics on windows... it turns out powershell sucks ass compared to python [00:47:23] heh [00:47:27] just use python, then ;) [00:47:57] Means I have to install it a few hundred times... but if I can push out salt then that's my excuse and I can just use modules
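On the pushing-metrics-to-graphite idea above: carbon's plaintext listener just takes "metric value timestamp" lines on TCP 2003, so a data point can be pushed even without salt. A sketch — the graphite hostname is made up, and -q is the GNU/Debian netcat flag (other nc flavours differ):

  echo "labs.bots-1.load.shortterm 0.71 $(date +%s)" | nc -q1 graphite.pmtpa.wmflabs 2003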
[00:48:39] I'm still learning that real time distributed metric collection is hard when you're polling hundreds of thousands of data points :( [00:49:41] multicast udp ;) [00:51:01] Damianz: merged your change through [00:51:47] Mmmm udp, if only it was a bit more clingy over crappy links [00:52:01] Yay... now just to fix snmptt... maybe tomorrow [00:52:31] I wonder if I can get a commit into production puppet with the line 'couldn't give a flying monkey turd in it'.... I think this should be a goal for a refactor [00:52:50] Krenair: are you ready for that change to be merged in? [00:52:56] I reviewed it and it looks fine to me [00:52:58] no [00:53:00] ok [00:53:14] it works with or without echo enabled, which is nice :) [00:53:15] there's still some stuff missing like preference messages [00:53:17] ah [00:53:18] ok [00:53:35] it does? I was about to go and probably break the labsconsole test to check that :) [00:53:39] we need openstack's gerrit change for "work-in-progress" [00:53:42] I just tested it [00:53:46] never give users choices, just force it upon them! [00:54:12] people yell when you use gerrit for wip [00:54:42] I only have 45 revisions of bots up which sent like 2 messages to ops channel for each :D [00:55:44] That maint script is 100% untested [00:55:55] ah. right [00:56:24] will fit right into mediawiki [00:56:26] * Damianz ducks [00:56:35] oh, shouldn't that be in a separate change? [00:57:25] I developed it all together... Kinda regret it now because it's going to be a pain to split up [00:58:28] well, you can remove the file from the change by doing an amended change [00:59:02] I hate that about gerrit [00:59:19] so, I started using github for something recently [00:59:23] and I *hate* it [00:59:24] Much prefer the feature branch, merge to master workflow... compressing to 1 commit loses so much context [01:00:30] for group workflow github sucks [01:00:36] GH wouldn't work very well for the likes of puppet [01:00:45] I can't modify someone else's pull request [01:00:48] For 1/2/3 people managing a project it's awesome [01:00:49] I have to ask them to change it [01:00:56] I could just stick echo "This script isn't ready for use yet.\n"; die(); at the top [01:01:03] You can pull their branch, modify it and submit a PR [01:01:16] die is evil... [01:01:19] Damianz: then I steal their change [01:01:23] which is bullshit [01:01:32] Not really [01:01:43] Commit history would still be ,,,,, [01:01:51] yes [01:02:22] but the github interface would show someone else as adding the pull-request [01:02:35] dealing with rebases is annoying too [01:02:43] with git review, it'll rebase automatically for you [01:02:51] gerrit has a button to rebase dependencies [01:02:52] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 9% free memory [01:02:57] in github you have to do it manually [01:03:04] Hmm I don't really care who does the PR as long as the code is good [01:03:36] the person with the original pull request may care [01:04:01] rebasing automatically can be dodgy...
if you actually edit the same file over multiple branches and it can't merge [01:04:04] people use github as their personal resume now-a-days [01:04:26] gerrit will tell you if it fails the rebase [01:04:34] My resume is on github, a large proportion of the code there is crap/random and not stuff I use professionally :D [01:04:36] and git review will bring you into a mode to fix it [01:04:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [01:05:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [01:07:00] Gerrit could be a lot nicer ux wise with the same workflow... like the github/travisci integration is sweet [01:07:24] Ryan_Lane, if I add that to the top of the maint script would it be okay? [01:07:33] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 19% free memory [01:07:43] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 174 processes [01:08:07] Krenair: well, it would really be better to split the file out. how are you pushing the change in right now? [01:08:32] Damianz: travisci? [01:08:48] I commit amend, run git review. Then to test, kill the branch on nova-precise2 and run git fetch, git checkout, etc. again [01:08:59] Krenair: you can create a new branch, add the file in the other branch [01:09:04] then rm the file from the current branch [01:09:13] then do an amended patchset [01:09:52] hosted ci -> https://travis-ci.org/DamianZaremba/labsnagiosbuilder for example [01:10:06] ah, for people who don't use jenkins? :) [01:10:20] mhm [01:10:22] easier to setup [01:10:26] yeah [01:10:27] jenkins is really a bitch [01:10:51] That sounds difficult [01:10:56] Spent like 30min trying to make it build my site, gave up, wrote 10 lines of bash, stuck it in cron and it just works (tm) [01:11:13] I'm just going to delete it for now. It'll remain in the Git history and I'll keep a backup [01:12:09] git stash ftw!
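Ryan's split-the-file-out recipe above, spelled out as a sketch; the branch and file names are illustrative, and gerrit's commit-msg hook gives the new commit its own Change-Id:

  git checkout -b maint-script origin/master     # new branch for the second change
  git checkout my-change -- maintenance/foo.php  # bring the file over from the original branch
  git commit -m "Split out maintenance script"
  git checkout my-change
  git rm maintenance/foo.php
  git commit --amend                             # amended patchset, minus the file
  git review                                     # run once per branch to push both for review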
[01:12:42] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 97 processes [01:16:19] I need to go now. gnight. [01:18:53] Krenair: good night [01:34:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [02:04:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [02:17:31] Ryan_Lane, did you update that puppet config? [02:17:40] which one? [02:17:45] the bots one? [02:17:49] ya [02:17:49] no. I'd imagine that Damianz did [02:17:59] but it isn't applied currently anyway [02:18:06] you need to manually install the packages for now [02:18:26] well sudo is kinda disabled on that server... xD [02:19:18] which packages is it? I'll install [02:19:33] this is on bots-3? [02:19:43] nr2 [02:20:50] openjdk-7-jdk openjdk-7-jre [02:21:30] installing [02:22:43] done [02:22:50] awesome, thanks :) [02:24:00] yw [02:24:02] PROBLEM Free ram is now: CRITICAL on testing-virt11.pmtpa.wmflabs 10.4.0.82 output: Connection refused by host [02:25:33] PROBLEM Total processes is now: CRITICAL on testing-virt11.pmtpa.wmflabs 10.4.0.82 output: Connection refused by host [02:26:16] \o/ all compute nodes in now [02:27:11] hmm Ryan_Lane i gave you the wrong packages i think cause of errors, can you uninstall those and install openjdk-6-jdk openjdk-6-jre [02:28:01] compatibility does not exist in java [02:28:24] done [02:30:05] still same issue, let me see what i had on bots-1 for the old fbot... [02:34:13] PROBLEM Current Load is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: WARNING - load average: 4.10, 5.65, 5.11 [02:34:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [02:39:12] RECOVERY Current Load is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: OK - load average: 4.34, 5.03, 4.96 [02:42:11] i guess ill just build it on my laptop [02:55:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory [03:04:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [03:09:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory [03:36:23] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [03:55:11] hmm. is bastion having issues? [03:55:33] yes [03:55:42] the ldap change I pushed caused issues [03:55:47] kk [03:55:51] fixing now [03:56:57] ok. working again [03:58:09] the changes should ideally make things faster and cause less load on the ldap servers [04:06:23] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [04:28:04] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 31% free memory [04:33:43] PROBLEM Free ram is now: CRITICAL on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: Critical: 5% free memory [04:36:24] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [04:38:44] RECOVERY Free ram is now: OK on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: OK: 35% free memory [04:38:54] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 22% free memory [04:41:02] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 19% free memory [04:41:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [04:54:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 16% free memory [04:56:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [05:07:12] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [05:21:24] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 31% free memory [05:22:34] PROBLEM Total processes is now: WARNING on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: PROCS WARNING: 152 processes [05:27:32] RECOVERY Total processes is now: OK on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: PROCS OK: 148 processes [05:37:14] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable
(10.4.0.13) [06:07:42] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [06:07:52] PROBLEM Current Load is now: WARNING on orgcharts-dev.pmtpa.wmflabs 10.4.0.122 output: WARNING - load average: 6.00, 5.79, 5.37 [06:31:02] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 155 processes [06:35:53] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 147 processes [06:37:43] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [06:44:48] petan, around? :) [06:53:53] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-mobile01.pmtpa.wmflabs 10.4.1.82 output: DPKG CRITICAL dpkg reports broken packages [06:58:53] RECOVERY dpkg-check is now: OK on deployment-cache-mobile01.pmtpa.wmflabs 10.4.1.82 output: All packages OK [07:09:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [07:32:02] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory [07:39:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [07:51:02] addshore sure [07:51:26] any chance I can have a mysql account e.t.c :) [07:51:28] on bots [07:51:31] sure [07:51:33] on which server [07:51:45] what are my choices? [07:51:54] @labs-resolve bots-sql [07:51:54] I don't know this instance - aren't you are looking for: I-000000af (bots-sql2), I-000000b4 (bots-sql3), I-000000b5 (bots-sql1), [07:52:44] 1 or 3 I think (they have more ram) ;p [07:52:57] ok, but sql2 has more storage :P [07:53:10] I shouldn't need the space :) [07:53:22] doubt i will ever use over 10mb [07:53:22] xD [07:53:23] 3 is better ;) [07:53:33] * addshore would like 3 then :) [07:53:35] petan: is deployment-prep-backup sql instance doing stuff still? [07:53:59] Ryan_Lane not much, but it still holds a lot of db backups [07:54:03] ok [07:54:07] for whole beta [07:54:13] I need to move it from virt6 to another host [07:54:20] should be ok to shut it down, right? [07:54:20] no problem, you can even shut it down [07:54:29] sure [07:54:36] does it just keep backup files? [07:54:51] sometimes I run backup script which clones the beta sql in there [07:55:01] so it also runs mysql server [07:55:04] if so, any reason not to keep the files in gluster? [07:55:04] ah [07:55:16] it actually runs the db too [07:55:16] ok [07:55:17] I don't copy files, I use sql commands for that [07:55:23] * Ryan_Lane nods [07:55:28] makes sense [07:55:32] ok. I'll move that tomorrow [07:55:37] copying files on the fly is something btrfs can do, but not ext3 :P [07:55:38] we're getting low on space on virt6 [07:55:57] like, they are huge and being modified when you copy them [07:56:13] * Ryan_Lane nods [07:56:42] I guess purging nscd cache on every system isn't going to help the load of the LDAP server any [07:56:47] heh [07:56:59] oh well, it should help when they fill back up [07:57:12] stupid nscd has to be fully purged when you change most of its settings [08:00:00] @search phpmy [08:00:00] No results were found, remember, the bot is searching through content of keys and their names [08:00:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [08:00:32] 87% and 57% hitrate on bastion1. up from 1% and 0% [08:01:25] addshore did you receive my pm?
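Those hit rates come straight from nscd's own counters; they can be checked on any instance with (root needed):

  nscd -g    # prints per-database statistics, including the passwd/group cache hit rates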
[08:04:42] Damianz: thanks for your fix of the instance names in the nagios notifications \O/ [08:04:52] nom [08:05:00] snmptt is still broken though :( [08:05:26] Damianz: is the Nagios host receiving any traps at all? [08:05:48] port 162 udp [08:06:07] so you could tcpdump and find out if the nagios is at least receiving them [08:06:37] Oh it's receiving them, snmptt is just not calling the script to submit passive results... didn't get time to fix it last night [08:07:41] ahh [08:08:46] Should get chance to sort it today maybe... hmm, just pushed a change to use vars for the instance name in puppet also [08:11:09] merged [08:11:33] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [08:11:42] * Damianz pats Ryan_Lane [08:20:53] PROBLEM Free ram is now: WARNING on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: Warning: 19% free memory [08:22:13] I hate java [08:22:17] period [08:39:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [08:39:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 22% free memory [08:40:54] RECOVERY Free ram is now: OK on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: OK: 21% free memory [08:41:34] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [08:43:34] hashar +1 [08:44:03] I hate oracle, java is just an arm [08:52:24] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 16% free memory [08:53:54] PROBLEM Free ram is now: WARNING on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: Warning: 18% free memory [09:04:23] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 15% free memory [09:12:12] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [09:13:52] Damianz: if you are still around, do you know if we have a nagios check for ircecho bot ?
[09:14:36] misc::ircecho apparently does not have any [09:17:54] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [09:18:48] ahh [09:20:52] https://gerrit.wikimedia.org/r/#/c/45097/ :-] [09:29:02] PROBLEM Current Load is now: CRITICAL on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: Connection refused by host [09:29:52] PROBLEM Disk Space is now: CRITICAL on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: Connection refused by host [09:30:43] PROBLEM Free ram is now: CRITICAL on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: Connection refused by host [09:32:13] PROBLEM Total processes is now: CRITICAL on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: Connection refused by host [09:33:03] PROBLEM dpkg-check is now: CRITICAL on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: Connection refused by host [09:34:03] RECOVERY Current Load is now: OK on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: OK - load average: 0.66, 0.90, 0.50 [09:34:53] RECOVERY Disk Space is now: OK on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: DISK OK [09:35:43] RECOVERY Free ram is now: OK on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: OK: 2853% free memory [09:37:13] RECOVERY Total processes is now: OK on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: PROCS OK: 100 processes [09:38:03] RECOVERY dpkg-check is now: OK on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: All packages OK [09:42:13] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [10:03:03] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory [10:12:13] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [10:14:33] PROBLEM Total processes is now: WARNING on etherpad-lite.pmtpa.wmflabs 10.4.0.87 output: PROCS WARNING: 151 processes [10:16:02] PROBLEM dpkg-check is now: CRITICAL on wikidata-testclient.pmtpa.wmflabs 10.4.0.23 output: DPKG CRITICAL dpkg reports broken packages [10:36:03] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [10:42:42] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [11:12:42] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [11:42:42] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [12:12:43] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [12:37:24] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [12:39:24] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 21% free memory [12:41:03] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 22% free memory [12:42:43] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [13:04:03] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [13:05:43] PROBLEM Free ram is now: WARNING on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Warning: 19% free memory [13:10:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory [13:12:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 12% free 
memory [13:14:02] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [13:36:32] PROBLEM dpkg-check is now: CRITICAL on extrev1.pmtpa.wmflabs 10.4.0.210 output: DPKG CRITICAL dpkg reports broken packages [13:38:53] PROBLEM dpkg-check is now: CRITICAL on grail.pmtpa.wmflabs 10.4.0.239 output: DPKG CRITICAL dpkg reports broken packages [13:44:44] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [14:14:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [14:44:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [15:08:53] PROBLEM Free ram is now: CRITICAL on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: Critical: 5% free memory [15:16:34] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [15:18:54] RECOVERY Free ram is now: OK on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: OK: 133% free memory [15:22:22] PROBLEM Current Load is now: WARNING on etherpad-lite.pmtpa.wmflabs 10.4.0.87 output: WARNING - load average: 6.51, 6.46, 5.58 [15:39:32] PROBLEM Total processes is now: CRITICAL on etherpad-lite.pmtpa.wmflabs 10.4.0.87 output: PROCS CRITICAL: 211 processes [15:41:05] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 19% free memory [15:47:13] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [16:08:57] Ryan_Lane, I think https://gerrit.wikimedia.org/r/#/c/44948/ is ready for review now, unless you want reboot/build split out into separate changes [16:17:42] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [16:37:22] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [16:39:02] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory [16:40:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [16:41:02] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 23% free memory [16:47:43] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [16:58:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory [17:05:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 12% free memory [17:07:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory [17:17:43] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [17:29:02] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 19% free memory [17:31:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory [17:39:04] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 23% free memory [17:47:43] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [18:17:44] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [18:19:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory [18:27:04] PROBLEM Free ram is now: WARNING on 
swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 19% free memory [18:49:49] bleh hashar isn't around [18:49:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [18:49:54] we need a memo function [19:13:43] PROBLEM Disk Space is now: CRITICAL on kubo.pmtpa.wmflabs 10.4.0.19 output: DISK CRITICAL - free space: /mnt 556 MB (2% inode=95%): [19:19:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [19:34:44] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 7.96, 7.59, 5.86 [19:49:52] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [19:56:42] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 9.77, 7.48, 5.81 [20:00:23] andrewbogott: hey, let's discuss https://bugzilla.wikimedia.org/show_bug.cgi?id=39788 [20:01:11] Ryan_Lane: on IRC for eavesdroppers or do you want me to come stand next to you? [20:01:49] on here is good [20:02:09] it would be nice to tackle this bug because having to use a password is annoying and a little insecure [20:02:22] there's only one big issue here [20:02:22] It's a pretty tiny change isn't it? [20:02:32] krb! [20:02:38] we need to modify all sudo policies and need to modify OpenStackManager [20:02:40] ok sledge hammer... nut... [20:02:43] we use "ALL" for users [20:02:43] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 5.18, 6.48, 5.77 [20:02:46] that's bad [20:02:55] it's fine with passwords, but incredibly dangerous without [20:03:05] Damianz: indeed :) [20:03:12] Damianz: and that would still require a password [20:03:14] you can pass ldap groups to sudo policy not too hard [20:03:19] Damianz: yep [20:03:23] well yeah but I could kinit locally and use my ticket on labs [20:03:24] :D [20:03:29] I trust my own pc [20:03:30] Ryan_Lane: When you say 'ALL' you mean that we permit all possible commands? [20:03:41] from all users [20:03:41] so, we need to modify sudo policies from using "ALL" for users to "project-group" [20:03:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 4.86, 5.66, 5.11 [20:03:54] well, whatever the proper format for groups is [20:03:57] simple puppet change *boom* [20:04:03] it's not a puppet change [20:04:05] this is in LDAP [20:04:22] and is per-project [20:04:31] yeah... you suck for using sudo-ldap :D
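For context, the shape of the change being discussed: in sudo-ldap a policy is an LDAP entry, so swapping ALL for the project group plus a no-password option might look something like this — the DN and group name are illustrative, not the real labs layout:

  # was: sudoUser: ALL — which matches system users too
  dn: cn=default,ou=sudoers,cn=bots,ou=projects,dc=wikimedia,dc=org
  sudoUser: %project-bots
  sudoHost: ALL
  sudoCommand: ALL
  # !authenticate is sudo-ldap's equivalent of NOPASSWD
  sudoOption: !authenticate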
[20:04:44] it was the much better solution available [20:05:01] suggest something better that allows user management of policies per-project ;) [20:05:27] I was thinking you could use puppet to update /etc/sudoers for the ldap acls for the group statically, then as soon as they're a member they have access etc [20:05:35] that's what we used to do [20:05:38] When you introduce 'open' projects I'll troll your pain [20:05:42] that assumes instances properly run puppet [20:06:16] not even all of the instances keep connected to salt [20:06:19] which is the alternative [20:06:34] they all have to connect to ldap, or they are broken ;) [20:06:48] it also allows completely instance changes [20:06:51] *instant [20:07:04] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 21% free memory [20:07:10] anyway, we're not discussing changing that right now :) [20:07:21] andrewbogott: basically, the change is three part [20:07:25] Just for my security education… can you back up a bit and explain why enabling it for all users is bad? Is it because it gives permissions to an account that was created outside of ldap? [20:07:31] ah. ok [20:07:40] so, ALL includes system users [20:07:51] with a password it's fine. they don't have passwords [20:07:53] they can't sudo [20:08:21] system users, ok. [20:08:21] with no password, now any web application shell execution vulnerability is also a remote root vulnerability [20:08:24] andrewbogott: lunch? [20:08:27] yep! [20:08:36] ok. we'll bring this back up when back [20:08:53] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 2.65, 4.31, 4.74 [20:14:46] well instant - cache :D [20:15:44] I'm having problems pushing to Gerrit from a labs instance. I have agent forwarding enabled and I'm using ssh -A from bastion yet I'm still receiving "Permission denied (publickey)" [20:16:24] what does `ssh-agent` say [20:16:30] MaxSem: can you check "ssh-add -l"? [20:16:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 7.00, 6.13, 5.38 [20:16:55] saper, locally or on bastion? [20:17:11] MaxSem: labs instances, and then backwards till it shows the key [20:17:53] MaxSem: second question, using screen(1) or something like that? [20:18:14] or script(1)? or anything that changes/removes the environment like "env -" ? [20:18:16] no [20:18:51] ssh to the gerrit port and not port 22 (I know stupid, but happens) [20:19:28] mhm, I see my key in bastion's ssh-add -l [20:19:53] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [20:20:35] try ssh gerrit.wikimedia.org:29418 [20:21:07] Oh it's ssh -p 29418 gerrit.wikimedia.org, isn't it... [20:21:55] MaxSem: I get ssh: connect to host gerrit.wikimedia.org port 22: No route to host [20:22:01] mhm [20:22:13] ah ok [20:22:21] **** Welcome to Gerrit Code Review **** [20:22:21] maxsem@mobile-osm:/var/lib/git/operations/puppet$ ssh-add -l [20:22:21] Could not open a connection to your authentication agent. [20:22:41] saper@bastion1:~$ more ~/.ssh/config [20:22:41] Host gerrit-dev [20:22:41] ForwardAgent yes [20:22:55] MaxSem: that was the only change required in my case [20:22:58] maxsem@mobile-osm:/var/lib/git/operations/puppet$ ssh-agent [20:22:58] mkdtemp: private socket dir: No space left on device [20:23:35] eee [20:24:08] why run ssh-agent on the instance?
saper, ForwardAgent yes is the same as ssh -A which I'm using [20:24:21] MaxSem: but that does not work for some reason [20:24:25] Oh, are you sure you're pushing to USERNAME@gerrit.wm.o? [20:24:47] saper, because MaxSem: labs instances, and then backwards till it shows the key [20:24:48] different usernames - I don't think since auth is the same? [20:25:11] MaxSem: but ssh-agent should only run on your local computer, neither bastion nor labs [20:25:29] So for example I need to do krenair@gerrit.wikimedia.org [20:25:42] Unless I update my .ssh/config [20:25:51] Krenair: and there is a different username to bastion in your case? [20:25:56] saper, ssh-add -l on bastion shows that agent forwarding works [20:26:40] Same username in labs actually. I just found that I can't connect to gerrit from my home computer (username alex) without specifying the remote username [20:27:15] Actually thinking that through, my suggestion is probably nonsense. Ignore me. [20:27:20] MaxSem: only to bastion... next step is from bastion to the instance (ssh -A instance or .ssh/config with ForwardAgent) [20:27:52] Krenair: yeah, you need only ssh krenair@bastion.wmflabs.org, it should be smooth from there [20:30:02] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 19% free memory [20:30:07] i got several emails in the last days saying that a process of mine was killed but i don't understand why [20:33:37] saper, I'm already doing ssh -A, no effect [20:38:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [20:38:38] one thing maybe related to this No space left on device problem... [20:39:03] on what box? [20:39:47] MaxSem: SSH_AUTH_SOCK on labs instance should show /tmp/ssh-SOMETHING/agent.something - is it set? does it exist? if /tmp/ is full it might have failed to create /tmp/ssh-SOMETHING directory
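A sketch of the client-side setup this all boils down to — agent forwarding through bastion, or the proxycommand approach mentioned just below, which skips forwarding entirely; host patterns are illustrative:

  # ~/.ssh/config on your own machine — the agent should only ever run there
  Host bastion.wmflabs.org
      ForwardAgent yes
  # or tunnel straight through, with no agent exposure on bastion at all
  # (-W needs OpenSSH 5.4+; older clients can use: ProxyCommand ssh bastion.wmflabs.org nc %h %p)
  Host *.pmtpa.wmflabs
      ProxyCommand ssh -a -W %h:%p bastion.wmflabs.org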
[20:39:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory [20:40:12] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 22% free memory [20:40:22] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 21% free memory [20:43:21] meh [20:43:30] maxsem@mobile-osm:~$ env|grep -i ssh [20:43:30] SSH_CLIENT=10.4.0.54 51288 22 [20:43:30] SSH_TTY=/dev/pts/0 [20:43:30] SSH_CONNECTION=10.4.0.54 51288 10.4.0.226 22 [20:51:33] PROBLEM host: labs-nfs1.pmtpa.wmflabs is DOWN address: 10.4.0.13 CRITICAL - Host Unreachable (10.4.0.13) [20:52:14] A better way to handle repos is to push changes into gerrit, then pull them onto your instance [20:52:23] agent forwarding to instances past gerrit is dangerous [20:52:26] err [20:52:29] past bastion [20:52:35] in fact, I've considered disabling it [20:52:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory [20:53:04] agent forwarding is dangerous :) [20:53:12] you can also push and pull directly to/from repos using proxycommand [20:53:15] gerrit hurts for that [20:53:15] there you go [20:53:24] I develop locally, and push into gerrit, then pull [20:53:40] sometimes I push directly into the repo, though, if I'm unsure I'm going to push my change in [20:53:41] New patchset: DamianZaremba; "RYAN IS A MURDER YAY" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45230 [20:53:42] but labs is used for dev sometimes [20:53:47] you can set up multiple remotes for that [20:54:26] git remote set-url labs my-instance.pmtpa.wmflabs:/mnt/myrepo [20:54:27] New patchset: DamianZaremba; "RYAN IS A MURDERER YAY" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45230 [20:54:34] o.O [20:54:42] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45230 [20:54:46] boed [20:54:52] s/oe/ore/ [20:54:58] heh [20:55:06] yeah. I killed labs-nfs1 [20:55:11] going to delete it soonish [20:55:17] my list of 'I don't give a shit' instances grows once more [20:55:19] then I'm going to celebrate [20:56:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 16% free memory [20:56:35] 20....30...40 LOG ME IN [20:56:40] I hate your servers [20:57:13] I swear it's storage mounting that's slow [20:57:19] Damianz: it is [20:57:25] :( [20:57:26] I could change the timeout [20:57:31] Damianz: there was an issue with ldap [20:57:36] I saw [20:57:55] it's way faster right now than it was [20:58:06] If this is fast I don't wanna know slow [20:58:10] :D [20:58:15] which instance is this slow? [20:58:25] nagios... no one logs in regularly so it unmounts [20:58:44] it takes less than 2 seconds to log into bastion [20:58:44] ah [20:58:44] yeah [20:58:44] that's the reason, then [20:59:03] bots is fast... well ~3/4 seconds [20:59:04] (less than 2 seconds to log into bastion, when my latency is 80 ms) [20:59:17] it's going to be slower for you, since you're in europe [20:59:20] the stupid banner causes it to be slower [20:59:23] yep [20:59:29] but the banner is necessary [20:59:37] really [20:59:41] isn't RTFM obvious [21:00:01] the number of access questions I get went down dramatically after adding the banner [21:00:21] sounds like a gap in new users workflow... [21:00:28] also hashar's change makes no sense [21:00:32] which change?
https://gerrit.wikimedia.org/r/#/c/45097/ [21:00:59] check_procs_generic is local not nrpe [21:01:08] as far as I can see in this config anyway [21:01:46] that would need to be nrpe [21:01:48] the reason nrpe_local is so huge is a generic check_procs would require remote command arguments... yay to getting hackorzed [21:01:52] there's no way for the master to check that [21:02:23] I wish nrpe_local could be built from the monitoring_service calls... so adding that to a host can then be collected in its config [21:02:28] don't think you can with puppet though [21:02:43] exported resources? [21:02:45] PROBLEM Current Load is now: CRITICAL on orgcharts-dev.pmtpa.wmflabs 10.4.0.122 output: CRITICAL - load average: 20.93, 20.42, 20.12 [21:02:53] @labs-info i-000000e8 [21:02:53] [Name i-000000e8 doesn't exist but resolves to I-000000e8] I-000000e8 is Nova Instance with name: bots-4, host: virt6, IP: 10.4.0.64 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: lucid-server-cloudimg-amd64.img [21:02:59] mhm [21:03:05] I've completely disabled exported resources in labs, though [21:03:15] oh yeah I know, which is why nagios does other stuff [21:03:21] to speed up the puppet server and the instance build process [21:03:24] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 12% free memory [21:03:45] an initial puppet run takes 6 minutes [21:03:48] *6* minutes [21:03:49] slow [21:03:57] yes. it's absurd [21:04:06] probably 4min is ruby eating ram [21:04:12] probably [21:04:32] modules might help though... less code loaded for the state generation [21:04:55] https://www.facebook.com/wikipedia/posts/10151342422383346 [21:05:02] "No idea what this means. Yay." [21:05:17] Damianz: yeah. modules would help with lots of things [21:05:22] ENVS! [21:05:46] ue[ [21:05:48] *yep [21:06:15] Hmm I dunno how you did it... I'd have done the new site as dr, tripped a failover then reversed the replication so you know failover works and stuff isn't inter-dependent [21:10:31] I didn't [21:10:36] asher, mark and peter did [21:11:43] :P [21:12:06] I hate our puppet repo [21:12:17] so many duplicates with just patch changes PUPPET HAS VARS FOR A REASON [21:13:41] Ryan_Lane: So, you were saying: …the change is three part [21:13:55] andrewbogott: ah. right [21:14:03] bleh [21:14:09] how do you review someone else's code -.- [21:14:18] Damianz: in gerrit? [21:14:39] mhm [21:14:52] andrewbogott: 1. modify openstackmanager to use project-group rather than ALL for users [21:15:04] 2. Modify all sudo policies to use project-group rather than ALL [21:15:25] 3. modify all sudo policies to allow nopasswd [21:15:40] I guess also have that as an option in the interface [21:15:45] Damianz: click review? [21:15:48] -.- [21:15:54] Ryan_Lane: per-project, you mean? [21:15:55] I'm not sure what you mean [21:15:57] No... I want to edit their code [21:15:59] andrewbogott: yep [21:16:02] fuck it I'll just make a new change [21:16:03] Damianz: pull their change [21:16:07] I did [21:16:07] Damianz: edit it [21:16:09] I did [21:16:10] amended commit [21:16:12] I did [21:16:15] git review [21:16:17] review says no different [21:16:58] No changes between HEAD and gerrit/production. [21:16:59] Submitting for review would be pointless. [21:16:59] and !
[remote rejected] HEAD -> refs/for/review (branch review not found) [21:17:04] I dislike gerrit today [21:17:09] oh [21:17:15] you're trying to push into a branch [21:17:22] called review [21:17:51] that normally gets around git review being stupid [21:17:56] or maybe I'm just remembering it wrong [21:18:03] probably remembering it wrong [21:18:12] git push origin production HEAD:refs/for/production works [21:18:13] yeah [21:18:14] branch [21:18:16] derp [21:18:18] heh [21:18:19] review still sucks [21:18:24] ! [rejected] production -> production (non-fast-forward) [21:18:24] There are multiple keys, refine your input: !log, $realm, $site, *, :), access, account, account-questions, accountreq, addresses, addshore, afk, alert, amend, ask, b, bang, bastion, beta, blehlogging, blueprint-dns, bot, botrestart, bots, botsdocs, broken, bug, bz, cmds, console, cookies, credentials, cs, damianz, damianz's-reset, db, del, demon, deployment-beta-docs-1, deployment-prep, docs, documentation, domain, epad, etherpad, extension, -f, forwarding, gerrit, gerritsearch, gerrit-wm, ghsh, git, git-branches, git-puppet, gitweb, google, group, hashar, help, hexmode, home, htmllogs, hyperon, info, initial-login, instance, instance-json, instancelist, instanceproject, keys, labs, labsconf, labsconsole, labsconsole.wiki, labs-home-wm, labs-morebots, labs-nagios-wm, labs-project, labswiki, leslie's-reset, link, linux, load, load-all, logs, mac, magic, mail, manage-projects, meh, mobile-cache, monitor, morebots, msys, msys-git, nagios, nagios.wmflabs.org, nagios-fix, newgrp, new-labsuser, new-ldapuser, nova-resource, op_on_duty, openstack-manager, origin/test, os-change, osm-bug, pageant, password, pastebin, pathconflict, petan, ping, pl, pong, port-forwarding, project-access, project-discuss, projects, puppet, puppetmaster::self, puppetmasterself, puppet-variables, putty, pxe, python, q1, queue, quilt, report, requests, resource, revision, rights, rt, Ryan, ryanland, sal, SAL, say, search, security, security-groups, sexytime, single-node-mediawiki, socks-proxy, ssh, sshkey, start, stucked, sudo, sudo-policies, sudo-policy, svn, terminology, test, Thehelpfulone, tunnel, unicorn, whatIwant, whitespace, wiki, wikitech, wikiversity-sandbox, windows, wl, wm-bot, [21:18:24] ARGH [21:18:28] :D [21:18:29] PRs are far simpler [21:18:32] wm-bot: DIAF [21:18:32] Hi Damianz, there is some error, I am a stupid bot and I am not intelligent enough to hold a conversation with you :-) [21:18:34] I disagree [21:18:54] I've been using github for the past week and I fucking hate PRs [21:19:01] I can't modify someone else's PR at all [21:19:13] I can't modify someone else's gerrit change at all [21:19:15] what you're currently trying in gerrit doesn't exist at all in gerrit [21:19:17] err [21:19:22] doesn't exist in github [21:19:29] I do it all the time [21:19:35] I just did it like two hours ago [21:19:48] I cherry-picked in the change [21:19:52] then edited the files [21:19:55] then amended [21:19:58] meh. I ended up exposing the git checkout via http [21:19:58] then did git review [21:20:08] I could cherry pick your git pr into my git pr [21:20:15] MaxSem: why not use proxycommand? [21:20:23] and push or pull from it directly? [21:20:24] also this way [21:20:25] Author: Antoine Musso [21:20:28] is no longer true [21:20:40] yeah that is me [21:20:47] Damianz: it's a multi-owner patchset now
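The working sequence for picking up someone else's change, recapped as a sketch — the patchset number in the ref is illustrative, and gerrit matches the amended commit back to the change via its Change-Id footer:

  git fetch gerrit refs/changes/97/45097/1 && git checkout FETCH_HEAD
  # edit, then:
  git commit --amend                          # keep the Change-Id line intact
  git push gerrit HEAD:refs/for/production    # or let git-review do the push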
[21:21:05] -.- [21:21:07] it's possible to abandon his change [21:21:09] and make your own [21:21:21] hashar: Btw can I have gerrit do make test rather than pep [21:21:31] Ryan_Lane: Oh you mean like on github? :D [21:21:45] Damianz: yes. notice I didn't recommend it [21:22:33] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 5% free memory [21:22:40] Damianz: i am not there for the rest of the week, do fill a bug please and I will do that later on :/ [21:22:53] hashar: aww, haz funz [21:23:04] 'Submitting for review would be pointless.' git review has life summed up [21:23:05] Damianz: if you want to change a commit author, git commit --amend --author="John Doe " [21:23:25] andrewbogott: the changes shouldn't be too bad, it's just a little laborious [21:23:26] Yeah... I don't mind, just want gerrit to admit I changed the code and make a new patch set [21:23:42] This way you get the bug reports [21:23:53] Where as in the github module we'd both get them for the code we touched [21:24:03] Ryan_Lane: Yep. Step one (and the interface) I think I know how to do… I'll check in with you before I start actually changing ldap. [21:24:09] cool [21:24:10] thanks [21:24:18] I think this will make a lot of people happy :) [21:25:16] It'll reduce IRC questions by 10%. "What is my sudo password?" [21:25:24] yep [21:25:30] I also hate typing my password [21:25:32] :) [21:25:41] andrewbogott: We could store them in plaintext then be able to answer that question :D [21:25:44] well copy/pasting my password from my database to be more accurate [21:26:26] git push gerrit HEAD:refs/changes/45097 < That was totally obuvios [21:28:24] I dunno [21:28:28] git review just makes this work [21:28:35] or just git push gerrit HEAD:refs/for/production [21:28:40] or sometimes it doesn't [21:28:44] also, as long as the changeid is in the change it should just work [21:28:48] and gerrit will figure out the change based on the Change-Id: field in your commit message [21:28:53] need to update from pip anyway [21:29:01] maybe that's the issue [21:29:02] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 19% free memory [21:29:16] I don't know. I rarely have troubles with git review [21:29:25] chad hates git review, though [21:29:26] bed time *waves* [21:29:29] so maybe I'm alone there [21:29:31] hashar: see ya [21:29:40] I use git-review myself :-D [21:29:49] and saper contributes to git-review upstream [21:29:52] so you are not alone! [21:30:00] Chad likes java so hating git must be a whole other level of rage [21:30:34] Ryan_Lane: https://gerrit.wikimedia.org/r/45097 make more sense to you? [21:32:39] New patchset: DamianZaremba; "Adding irc echo service - for when 45097 gets merged" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45238 [21:33:23] Damianz: yes [21:33:33] is that going to work? [21:34:48] is what gonna work [21:35:59] the check for ircecho [21:36:00] puppetClass: misc::ircecho < should work for labs, prod I guss so - everything else works [21:36:03] it's a pythin script that's run [21:36:09] *python [21:36:14] I'm not sure how that nagios check works [21:36:32] does it check for any process that has that in the command? [21:37:43] root@i-000000d2:~# ps aux | grep ircecho | grep -v grep [21:37:43] nobody 1713 0.0 0.4 61600 8700 ? 
Sl 2012 59:29 python /usr/ircecho/bin/ircecho --infile=/var/log/logmsg #wikimedia-labs beta-logmsgbot irc.freenode.net [21:37:46] root@i-000000d2:~# /usr/lib/nagios/plugins/check_procs -w 1:3 -c 1:20 -a ircecho [21:37:49] PROCS OK: 1 process with args 'ircecho' [21:37:52] root@i-000000d2:~# /etc/init.d/ircecho stop [21:37:55] root@i-000000d2:~# ps aux | grep ircecho | grep -v grep [21:37:56] ah, so it just does a grep [21:37:57] root@i-000000d2:~# /usr/lib/nagios/plugins/check_procs -w 1:3 -c 1:20 -a ircecho [21:37:57] ok [21:38:00] PROCS CRITICAL: 0 processes with args 'ircecho' [21:38:09] ok, this change is good, then [21:38:25] want a merge? [21:38:41] sure [21:38:58] then I can enable my shiz on labs and prove the theory that it works :D [21:38:58] done [21:39:04] awesome [21:39:40] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45238 [21:44:27] New patchset: DamianZaremba; "Wrong class fix" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45241 [21:45:18] * Damianz looks at chad [21:45:28] he should be in here so I can yell without changing channel [21:45:46] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45241 [21:46:45] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 6.07, 4.56, 4.87 [21:46:45] file written ok... service added... lets see if it checks fine... should do then we're golden [21:48:16] http://nagios.wmflabs.org/cgi-bin/nagios3/extinfo.cgi?type=2&host=deployment-dbdump.pmtpa.wmflabs&service=IRC+Echo+Process < yay [21:48:29] petan hashar fyi ^ [21:51:52] PROBLEM IRC Echo Process is now: CRITICAL on bots-labs.pmtpa.wmflabs 10.4.0.75 output: NRPE: Command check_ircecho not defined [21:51:52] PROBLEM IRC Echo Process is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: PROCS WARNING: 4 processes with args ircecho [21:52:04] yeah, yeah puppet hasn't run yet [21:52:15] I'm impatient [21:53:39] Hello late night folks! :) [21:57:03] late night? it's like 22:00 or well I guess 23 for germany [21:57:19] Silke_WMDE: howdy [21:57:55] :) [21:58:12] * Silke_WMDE had some wine already [21:58:23] PROBLEM Free ram is now: CRITICAL on sube.pmtpa.wmflabs 10.4.0.245 output: Connection refused by host [21:59:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 6.25, 5.79, 5.29 [22:04:52] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 3.48, 4.64, 4.98 [22:06:52] RECOVERY IRC Echo Process is now: OK on bots-labs.pmtpa.wmflabs 10.4.0.75 output: PROCS OK: 3 processes with args ircecho [22:07:43] Damianz: yes 23 [22:16:13] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 3.37, 4.27, 4.90 [22:23:24] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 10% free memory [22:23:31] What's missing: Speech input for git.
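The ircecho check above is an active NRPE check; the snmptt half that's still broken is the passive side. Per the earlier discussion, traps can be confirmed on the wire with tcpdump, and the trap handler ultimately just has to write a passive result into nagios's external command file. A sketch, with illustrative host/service names and the stock Debian nagios3 path:

  tcpdump -ni any udp port 162    # confirm the traps are even arriving
  # submit a passive service check result (0=OK 1=WARNING 2=CRITICAL)
  echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;bots-labs.pmtpa.wmflabs;Puppet freshness;0;puppet ran" \
    > /var/lib/nagios3/rw/nagios.cmd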
[21:38:25] want a merge?
[21:38:41] sure
[21:38:58] then I can enable my shiz on labs and prove the theory that it works :D
[21:38:58] done
[21:39:04] awesome
[21:39:40] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45238
[21:44:27] New patchset: DamianZaremba; "Wrong class fix" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45241
[21:45:18] * Damianz looks at chad
[21:45:28] he should be in here so I can yell without changing channel
[21:45:46] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/45241
[21:46:45] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 6.07, 4.56, 4.87
[21:46:45] file written ok... service added... let's see if it checks fine... should do, then we're golden
[21:48:16] http://nagios.wmflabs.org/cgi-bin/nagios3/extinfo.cgi?type=2&host=deployment-dbdump.pmtpa.wmflabs&service=IRC+Echo+Process < yay
[21:48:29] petan hashar fyi ^
[21:51:52] PROBLEM IRC Echo Process is now: CRITICAL on bots-labs.pmtpa.wmflabs 10.4.0.75 output: NRPE: Command check_ircecho not defined
[21:51:52] PROBLEM IRC Echo Process is now: WARNING on nagios-main.pmtpa.wmflabs 10.4.0.120 output: PROCS WARNING: 4 processes with args ircecho
[21:52:04] yeah, yeah, puppet hasn't run yet
[21:52:15] I'm impatient
[21:53:39] Hello late night folks! :)
[21:57:03] late night? it's like 22:00, or well I guess 23 for Germany
[21:57:19] Silke_WMDE: howdy
[21:57:55] :)
[21:58:12] * Silke_WMDE had some wine already
[21:58:23] PROBLEM Free ram is now: CRITICAL on sube.pmtpa.wmflabs 10.4.0.245 output: Connection refused by host
[21:59:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 6.25, 5.79, 5.29
[22:04:52] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 3.48, 4.64, 4.98
[22:06:52] RECOVERY IRC Echo Process is now: OK on bots-labs.pmtpa.wmflabs 10.4.0.75 output: PROCS OK: 3 processes with args ircecho
[22:07:43] Damianz: yes 23
[22:16:13] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 3.37, 4.27, 4.90
[22:23:24] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 10% free memory
[22:23:31] What's missing: Speech input for git.
[22:26:43] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 3.71, 3.97, 4.72
[22:30:12] PROBLEM Free ram is now: WARNING on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: Warning: 18% free memory
[22:34:43] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 6.15, 5.65, 5.31
[22:37:35] RECOVERY Free ram is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: OK: 32% free memory
[22:38:53] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 23% free memory
[22:54:44] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 4.31, 4.42, 4.82
[22:56:54] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 19% free memory
[23:03:42] https://fedoraproject.org/wiki/Features/ReplaceMySQLwithMariaDB interesting
[23:04:38] Ryan_Lane: Dammit, you got me all excited thinking you were enabling feeds, then I remembered Krenair named it echo :(
[23:04:45] hahaha
[23:04:54] well, krenair didn't name it echo
[23:05:00] Umm... Yeah...
[23:05:01] I think brandon harris did
[23:05:11] Uh, Ryan_Lane
[23:05:18] Krenair: you did?
[23:05:19] "Ran out of captcha images" on the labsconsole signup form
[23:05:19] Krenair: ?
[23:05:22] it's easier to play 'the last one that touched the code' :D
[23:05:22] oh?
[23:05:23] no
[23:05:30] I did not name Echo.
[23:05:35] :(
[23:05:41] hm
[23:05:47] did I wipe out the captcha images?
[23:05:50] I love that it gives you a backtrace for out of captcha images
[23:05:51] -.-
[23:06:04] For some reason the signup form URL is the one I get from chrome's autocomplete... It took me to that error page
[23:06:16] I accidentally deleted all images
[23:06:19] and restored from backup
[23:06:30] I did a backup right before update, of course
[23:06:45] ah
[23:06:47] I see
[23:07:15] fixed
[23:07:32] Krenair: thanks for letting me know :)
[23:07:39] yw
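("Ran out of captcha images" is FancyCaptcha's error, from the ConfirmEdit extension; the image pool can also be refilled with the extension's generator script rather than restored from backup. The font, word list, and output paths below are placeholders.)

    # run from the ConfirmEdit extension directory; the key must match $wgCaptchaSecret
    python captcha.py --font=/path/to/font.ttf --wordlist=/usr/share/dict/words \
        --key=SECRETKEY --output=/path/to/captcha/dir --count=1000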
[23:07:53] can we not just turn captchas off? they make for a bad ux, only block a portion of bots and generally suck
[23:09:28] Damianz: that scares me
[23:09:59] more than the likes of 'headshits' appearing and warding off new users?
[23:10:41] I can sorta see why, but they don't get shell so the worst they could do is spam and send junk to gerrit
[23:10:44] I don't think new users to labs are going to be very scared here
[23:10:49] actually that could be an interesting way to kill gerrit
[23:10:56] Damianz: that's not totally true
[23:11:00] it's also entries in ldap
[23:11:12] if we get 20,000 spam accounts, that's problematic
[23:11:23] that shouldn't make a difference... well, it might screw your caches/paging up
[23:11:27] yes
[23:11:33] and it'll make things slower
[23:11:39] and it'll add load on the ldap server
[23:11:44] be like old times :D
[23:11:48] heh
[23:11:49] hmm
[23:12:01] there's a welcome message to new users through echo now! :)
[23:12:06] I think they make for a bad ux and we should be able to do better... and they don't stop some bots... but are sorta useful
[23:12:10] Krenair: is there any way to customize the welcome message?
[23:12:25] so that we can give users a link to the getting started guide?
[23:12:49] Damianz: yes, I'd like to have better captchas, not eliminate them
[23:13:39] https://labsconsole.wikimedia.org/wiki/MediaWiki:Notification-new-user-content ?
[23:13:50] Awesome
[23:13:52] thanks
[23:14:09] The solution to stop spam accounts - make everyone fax their passport :D
[23:14:15] lolololol I bet some places would too
[23:15:10] https://labsconsole.wikimedia.org/wiki/MediaWiki:Notification-new-user-content
[23:15:14] Though on a serious note we should support other methods... like in prod, I believe, people who can't actually read them (say they're blind) can email the helpdesk and have someone do it for them etc... we're kinda lacking there
[23:15:27] Damianz: you don't know how close to that we were
[23:15:51] Damianz: I had to fight to not have the identification policy apply to labs
[23:15:56] probably not, but I like to complain until things are perfect, then find flaws with them
[23:16:10] <^demon> Ryan_Lane: Oh, so I moved gerrit-dev to use project storage. Let's never ever use gluster for production gerrit.
[23:16:13] you mean the same policy you pretty much have to go through to get r00t on prod?
[23:16:14] <^demon> It's way too fucking slow.
[23:16:15] If you had an identification policy would that rule out under-18s?
[23:16:21] I think I saw some of the early emails about that
[23:16:25] ^demon: yes, it's incredibly slow
[23:17:00] Krenair: well, mostly it would mean we'd need to identify every single labs user (which includes all developers) before they could get an account
[23:17:01] ^demon: Well git loves small files, gluster hates small files... match made in heaven
[23:17:03] that's terrible
[23:17:19] Ryan_Lane: Yay to not loving the opensource community
[23:17:30] yeah. that would never fly
[23:17:40] I still think facebook making you sign a CLA is obnoxious
[23:17:43] Yes I know it is and I agree
[23:18:00] But I'm wondering whether it would've prevented me from using labs
[23:18:11] maybe
[23:18:19] well, probably
[23:18:45] I think that policy requires 18+
[23:19:13] Not like the old days when roots were everyone and their mom (as interested in helping out and not a total noob)
[23:21:26] lolollol I love the note on the bottom of the create account form
[23:21:33] that's like smooth, beautiful ux right there
[23:21:34] -.-
[23:21:59] You mean the one that's brokenly rendered around the side of it? :P
[23:22:22] Damianz: :)
[23:22:58] Change on 12mediawiki a page Wikimedia Labs/Interface usability improvement project was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=633215 edit summary: [-149] /* Notifications */
[23:29:37] Damianz: patches welcome ;)
[23:30:03] btw, that's mediawiki's rendering
[23:30:16] the entire form is ugly as hell
[23:30:50] mediawiki is ugly as hell
[23:31:05] it makes for a nice documentation system but argh, it's so over-complex and a pita for a site
[23:31:20] I hate to think how much hacking was done to make webplatform as ok looking as it is
[23:31:29] and even then the editor is kinda ugly
[23:31:37] petan: could you add rschen7754 to the bots project?
[23:32:00] (or any other bots admin)
[23:42:06] MaxSem: managed to find out why the ssh auth socket does not get created?
[23:42:26] saper, I went another way
[23:42:32] thanks for your help anyway
[23:43:55] Damianz: heh
[23:44:06] Damianz: well, it's mostly the skin that makes it look nice
[23:44:54] magic
[23:46:10] hmm, I wonder if anyone has tried using something like Sentry to track mediawiki exceptions/db errors across the prod cluster... could be an interesting project
[23:46:54] https://gerrit.wikimedia.org/r/45267 ;)
[23:47:07] just saw
[23:47:19] makes more sense for us, needs some ui changes though
[23:47:41] yes
[23:47:45] not many
[23:47:58] I think OSM is configurable for that
[23:48:07] sed out sysadmin, delete netadmin, change some acls
[23:48:55] actual access-wise we should be able to pull rights from keystone... it's how the pages are laid out really... though I think 1 config page per instance is best
[23:49:39] well, we just need to add an api for OSM in mediawiki
[23:49:44] and make a javascript interface ;)
[23:50:05] ewww javascript
[23:50:15] though api == good, we should have an api
[23:50:17] what, should we use dart? :D
[23:50:33] I want actions via ajax calls
[23:50:39] <^demon> It's not Javascript unless you write it in Java ;-)
[23:50:44] :D
[23:51:23] you know on a normal site we could use bootstrap and have all the events stuff just magically appear
[23:51:31] then I remember this is mediawiki and everything is hell
[23:52:00] I totally mean backbone, not bootstrap also
[23:54:04] hmm, Salt scheduling looks so useful, wondering how long before the yaml file is unreadable
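(A minimal sketch of salt's minion-side schedule in that yaml file, since it comes up again below; the job name and interval are invented, and the returner line assumes the scheduler's returner support.)

    # /etc/salt/minion - run a highstate every hour and ship the result to a returner
    schedule:
      highstate:
        function: state.highstate
        seconds: 3600
        returner: mysql   # any configured returner; 'mysql' is just an example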
[23:55:29] backbone?
[23:55:39] and yeah, salt scheduling looks really useful
[23:56:18] that + a returner could be really interesting
[23:56:20] 'Today's data center migration only required 32 minutes of Wikipedia in read-only mode. Kudos to @Wikimedia ops!' so pretty much a normal enwiki outage
[23:56:31] except it was just read-only
[23:56:32] backbone.js is a javascript framework
[23:56:34] ah
[23:56:59] well, by outage I was thinking master capping out or slaves lagging rather than memcache taking a dump
[23:57:07] none of that happened ;)
[23:57:18] and yeah it could... I wish you could validate the data passed to the returner, hmm, it would be really interesting for auditing
[23:57:20] reads worked. writes were blocked in the interface
[23:57:34] why can't you validate the data?
[23:58:00] I dunno, but I'm assuming you can (as you have the key) fake data back to the returner
[23:58:02] ah, right, because it's the client writing
[23:58:26] well, if you really care, then use a runner
[23:58:45] then use a returner with the runner
[23:58:50] that's silly, though
[23:59:02] For metric collection I think I might use a returner, since I don't care... for auditing licenses etc I kinda need to validate it, since it updates the asset management db
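(A minimal sketch of the custom-returner idea discussed above; the module name and log path are invented. The fixed part is salt's returner interface: a module in _returners/ exposing returner(ret), which receives the job result dict. As noted in the chat, the minion writes this data itself, so it can't be trusted for auditing without extra validation.)

    # _returners/asset_audit.py - hypothetical custom salt returner
    import json

    def returner(ret):
        """Record a job result sent back by a minion.

        ret carries keys like 'id' (minion id), 'jid' (job id),
        'fun' (the function that ran) and 'return' (its output).
        """
        record = {
            'minion': ret['id'],
            'function': ret['fun'],
            'result': ret['return'],
        }
        # append one JSON line per job; a real auditing returner would
        # validate this before touching the asset management db
        with open('/var/log/salt/audit-%s.json' % ret['jid'], 'a') as fh:
            fh.write(json.dumps(record) + '\n')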