[00:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[00:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[00:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[00:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[00:27:06] Thehelpfulone: you broke it yourself: https://labsconsole.wikimedia.org/w/index.php?title=Nova_Resource:I-00000235&diff=3360&oldid=3349
[00:27:19] * jeremyb reverts
[00:28:03] oh, you already reverted
[00:28:06] but it's still there
[00:35:03] I didn't break that?
[00:35:10] oh yeah
[00:35:15] mutante was working through it with me
[00:35:19] so we removed apache2
[00:35:24] but then it was still running, so we had to kill it
[00:35:24] but you didn't really
[00:35:32] I did
[00:35:37] yeah I just unticked it, didn't actually delete it
[00:36:24] PROBLEM HTTP is now: CRITICAL on mailman-01 i-00000235 output: Connection refused
[00:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[00:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[00:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[00:48:41] is the web server back up then jeremyb?
[00:52:33] Thehelpfulone: try now
[00:53:00] yep, but http://mailman.wmflabs.org/ is giving me a 500 - can that redirect to /mailman/listinfo ?
[00:53:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[00:55:31] Thehelpfulone: err, ok
[00:55:42] Thehelpfulone: anyway, next time puppet runs it will break
[00:56:00] oh? the same thing happens with https://lists.wikimedia.org --> https://lists.wikimedia.org/mailman/listinfo
[00:56:19] Thehelpfulone: i know
[00:56:24] RECOVERY HTTP is now: OK on mailman-01 i-00000235 output: HTTP OK: HTTP/1.1 301 Moved Permanently - 163 bytes in 0.003 second response time
[00:59:14] i'm going to need some gerrit merges for puppet for the test branch. i think. but what happens once a merge is done? how does it get to the puppetmaster?
[00:59:30] i've no idea who might be around to answer or do them
[00:59:51] i'm not even sure where the puppetmaster is any more. virt4?
[01:00:02] you're asking me :P
[01:00:51] yes!
[01:02:02] parav oid's new per project stuff can't come soon enough!
[01:06:08] heh
[01:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[01:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[01:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[01:12:14] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[01:12:48] Does nagios make those announcements every 30 minutes?
[01:13:26] 06 00:08:44 < labs-nagios-wm> PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[01:13:29] 06 00:38:44 < labs-nagios-wm> PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[01:13:33] 06 01:08:44 < labs-nagios-wm> PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[01:13:38] judge for yourself ;)
[01:13:53] So... why not fix it?
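The "every 30 minutes?" question can be answered directly from the three quoted salt alerts. A minimal Python sketch, assuming bot lines of the form `[HH:MM:SS] PROBLEM host: <name> is DOWN ...` that all fall within one day; the helper name `repeat_intervals` is hypothetical:

```python
import re
from datetime import datetime

# Matches bot lines such as "[00:08:44] PROBLEM host: salt is DOWN ..."
ALERT = re.compile(r"^\[(\d\d:\d\d:\d\d)\] PROBLEM host: (\S+) is DOWN")


def repeat_intervals(lines, host):
    """Return the gaps, in minutes, between successive DOWN alerts for `host`.

    Assumes every line falls on the same day (no midnight rollover).
    """
    times = []
    for line in lines:
        m = ALERT.match(line)
        if m and m.group(2) == host:
            times.append(datetime.strptime(m.group(1), "%H:%M:%S"))
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


log = [
    "[00:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)",
    "[00:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)",
    "[01:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)",
]
print(repeat_intervals(log, "salt"))  # [30.0, 30.0]
```

This only measures the cadence the log shows. The re-notification interval itself is presumably a nagios setting (nagios has a `notification_interval` directive), which the log does not show.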
[01:14:12] looking at the update log for mailman jeremyb - .14 https://support.eapps.com/index.php?/Knowledgebase/Article/View/374/24/release-notes---mailman The buttons for Cancel and Subscribe have been reordered on the subscription confirmation page so that the default action when pressing “Enter” will be the Subscribe button (if the browser defaults to picking the first button) -- looks like we...
[01:14:13] ...might need to consider upgrading mailman before working on the list info and subscriber pages?
[01:14:46] sorry, that link should be https://launchpad.net/mailman/2.1/2.1.14 for the full log
[01:17:06] i don't follow
[01:17:12] i've never had a problem with hitting enter
[01:19:55] nah I was just copying some things that were relevant to what I'm planning to work on - I mean updating these pages and working on them when everything is possibly going to change from a 2.1.13 -> 2.1.14 upgrade doesn't make sense, do you know why the upgrade hasn't been done? (it's been a year and a half since 2.1.14 was released)
[01:20:06] ah so it seems they upgrade mailman recently (in Jan) to 2.1.13
[01:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[01:24:51] upgraded*
[01:25:14] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.005 second response time
[01:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[01:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[01:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[01:53:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[01:56:50] Thehelpfulone: so, actually it seems puppet doesn't break it immediately. but that's a bug in the manifests that needs fixing ;)
[01:57:19] (it just breaks the config but doesn't restart/reload lighttpd. i'll fix it to do the restart/reload)
[01:58:55] ok thanks :)
[02:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[02:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[02:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[02:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[02:29:14] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[02:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[02:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[02:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[02:50:44] RECOVERY Disk Space is now: OK on maps-test2 i-00000253 output: DISK OK
[02:51:14] RECOVERY Free ram is now: OK on maps-test2 i-00000253 output: OK: 93% free memory
[02:51:54] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 116 processes
[02:52:44] RECOVERY Total Processes is now: OK on maps-test2 i-00000253 output: PROCS OK: 85 processes
[02:52:49] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK
[02:53:14] RECOVERY dpkg-check is now: OK on maps-test2 i-00000253 output: All packages OK
[02:53:44] RECOVERY Current Load is now: OK on incubator-bot2 i-00000252 output: OK - load average: 0.55, 0.50, 0.47
[02:53:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[02:54:24] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in
[02:54:24] RECOVERY Current Load is now: OK on maps-test2 i-00000253 output: OK - load average: 0.06, 0.10, 0.08
[02:55:04] RECOVERY Disk Space is now: OK on incubator-bot2 i-00000252 output: DISK OK
[02:55:04] RECOVERY Current Users is now: OK on maps-test2 i-00000253 output: USERS OK - 0 users currently logged in
[02:55:44] RECOVERY Free ram is now: OK on incubator-bot2 i-00000252 output: OK: 84% free memory
[03:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[03:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[03:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[03:12:15] PROBLEM Puppet freshness is now: CRITICAL on deployment-web i-00000217 output: Puppet has not run in last 20 hours
[03:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[03:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[03:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[03:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[03:44:24] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 16% free memory
[03:48:54] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory
[03:53:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[03:54:34] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 12% free memory
[03:59:24] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:00:44] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory
[04:03:54] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[04:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[04:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[04:08:54] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 96% free memory
[04:09:24] PROBLEM HTTP is now: CRITICAL on mailman-01 i-00000235 output: Connection refused
[04:09:24] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:14:34] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:15:44] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:19:34] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:20:27] Hey, jeremyb... got a nifty test in php to check if something is passed-by-reference?
[04:21:19] i don't know php
[04:21:24] but i can dig
[04:22:22]
[04:22:44] There isn't an easy way to tell AFAIK
[04:23:15] You shouldn't really need to know either, as if you're using references there is one way you use the code, if not there is a totally different way... it's not really interchangeable
[04:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[04:25:44] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory
[04:27:08] Damianz: I'm writing unit tests, then writing code which meets the unit tests. In this case I'm building a Registry pattern to index objects, because there could be quite a few thousand of them generated, so I really need to know exactly how the object is being stored to ensure the registry is correctly setting/getting the objects in the index.
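Amgine's concern is whether a registry hands back the very object it stored or a copy of it. PHP's own rules are covered by the PHP manual page on passing by reference. As a language-neutral illustration only, here is a minimal Python sketch in which the `Registry` and `Widget` classes are hypothetical. The unit test checks object identity (`is`) rather than equality, which is what distinguishes "same object" from "equal copy". In Python, containers always hold references, so no special syntax is involved.

```python
import copy


class Registry:
    """Hypothetical object registry that indexes objects by a string key."""

    def __init__(self):
        self._index = {}

    def set(self, key, obj):
        # Stores a reference to obj; no copy is made.
        self._index[key] = obj

    def get(self, key):
        return self._index[key]


class Widget:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Widget) and self.name == other.name


reg = Registry()
w = Widget("a")
reg.set("a", w)

# Identity check: the registry returns the very object that was stored.
assert reg.get("a") is w

# A copy compares equal but is a different object, so `is` tells them apart.
clone = copy.copy(w)
assert clone == w and clone is not w

# Mutating the original is visible through the registry (same object),
# but not through the earlier copy.
w.name = "b"
print(reg.get("a").name, clone.name)  # b a
```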
[04:27:53] Amgine: the docs say the function definition determines if you've passed by reference and passing by reference on the calling side is deprecated
[04:28:11] see the note under the first code block: http://php.net/manual/en/language.references.pass.php
[04:28:19] Thanks jeremyb!
[04:29:44] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: Critical: 4% free memory
[04:31:31] * Damianz thinks you're doing it wrong if you added 'few thousand' and 'php' together :D
[04:34:44] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[04:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[04:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[04:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[04:53:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[05:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[05:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[05:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[05:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[05:25:24] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds
[05:31:34] PROBLEM host: deployment-web is DOWN address: i-00000217 CRITICAL - Host Unreachable (i-00000217)
[05:36:44] RECOVERY host: deployment-web is UP address: i-00000217 PING OK - Packet loss = 0%, RTA = 5.07 ms
[05:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[05:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[05:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[05:39:24] RECOVERY HTTP is now: OK on mailman-01 i-00000235 output: HTTP OK: HTTP/1.1 301 Moved Permanently - 190 bytes in 0.004 second response time
[05:43:14] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.007 second response time
[05:53:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[06:08:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[06:08:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[06:08:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[06:23:44] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[06:32:24] PROBLEM HTTP is now: CRITICAL on mailman-01 i-00000235 output: Connection refused
[06:38:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[06:38:44] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[06:38:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[06:54:24] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[07:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[07:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[07:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[07:25:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[07:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[07:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[07:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[07:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[07:59:23] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds
[08:04:13] PROBLEM host: deployment-web is DOWN address: i-00000217 CRITICAL - Host Unreachable (i-00000217)
[08:09:23] RECOVERY host: deployment-web is UP address: i-00000217 PING OK - Packet loss = 0%, RTA = 6.51 ms
[08:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[08:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[08:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[08:12:53] PROBLEM Current Load is now: WARNING on deployment-web i-00000217 output: WARNING - load average: 1.05, 5.92, 6.58
[08:17:13] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.007 second response time
[08:22:53] RECOVERY Current Load is now: OK on deployment-web i-00000217 output: OK - load average: 1.01, 1.67, 3.91
[08:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[08:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[08:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[08:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[08:44:16] hmmm
[08:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[08:58:27] New patchset: Jeremyb; "notify (+ do a reload) on lighttpd when its config changes" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6727
[08:58:41] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6727
[08:59:13] That's so gonna break mailman :D
[09:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[09:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[09:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[09:09:49] Damianz: elaborate?
[09:20:47] guys someone ack these on nag
[09:20:53] I am tired of checks hehe
[09:21:16] It's just running on a patched config until someone merges my change so if the webserver restarts it will break (so when that gets merged it will reload when puppet fixes my path and break stuff) :D
[09:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[09:34:54] petan: i don't even know how to ack
[09:35:20] Damianz: i know
[09:35:30] Damianz: where are your commits?
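Gerrit change 6727 makes puppet reload lighttpd when its config changes. Earlier, jeremyb described the bug it fixes: puppet "just breaks the config but doesn't restart/reload lighttpd". The idea behind a file resource that notifies a service can be sketched in Python: the service is refreshed only when the managed file's content actually changes. This is a hypothetical illustration of the pattern, not Puppet's implementation. `apply_file`, the `reload` callback, and the sample `lighttpd.conf` contents are invented for the example.

```python
import hashlib
import os
import tempfile


def apply_file(path, desired, reload):
    """Write `desired` to `path` only if it differs, then call `reload()`.

    Returns True if the file changed (and a reload was triggered).
    No change means no refresh, mirroring a file resource that notifies
    a service.
    """
    data = desired.encode()
    want = hashlib.sha256(data).hexdigest()
    have = None
    if os.path.exists(path):
        with open(path, "rb") as f:
            have = hashlib.sha256(f.read()).hexdigest()
    if have == want:
        return False
    with open(path, "wb") as f:
        f.write(data)
    reload()  # in the real setup this would be a lighttpd reload
    return True


reloads = []
with tempfile.TemporaryDirectory() as d:
    conf = os.path.join(d, "lighttpd.conf")
    first = apply_file(conf, "server.port = 80\n", lambda: reloads.append(1))
    second = apply_file(conf, "server.port = 80\n", lambda: reloads.append(1))
    third = apply_file(conf, "server.port = 8080\n", lambda: reloads.append(1))

print(first, second, third, len(reloads))  # True False True 2
```

The second call changes nothing, so nothing is reloaded. This is the property that keeps a notify-based reload from firing on every puppet run.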
[09:36:36] I could ack them but I'd have to find my nagios login (please move to ldap :D)
[09:36:47] https://gerrit.wikimedia.org/r/#/c/6584/
[09:38:34] i don't think i even have a login
[09:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[09:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[09:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[09:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[10:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[10:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[10:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[10:15:49] jeremyb: if u need login to nagios let me know
[10:16:01] I forgot pw XD
[10:16:08] whoa, that's a lot of flood
[10:16:25] yeh
[10:17:23] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds
[10:22:23] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 4.629 second response time
[10:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[10:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[10:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[10:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[10:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[11:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[11:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[11:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[11:12:13] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[11:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[11:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[11:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[11:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[11:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[12:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[12:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[12:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[12:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[12:29:13] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[12:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[12:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[12:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[12:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[13:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[13:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[13:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[13:12:13] PROBLEM Puppet freshness is now: CRITICAL on deployment-web i-00000217 output: Puppet has not run in last 20 hours
[13:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[13:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[13:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[13:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[13:56:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[14:09:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[14:09:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[14:09:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[14:26:53] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[14:39:23] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[14:39:23] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[14:39:23] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[14:59:16] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[15:09:26] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[15:09:26] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[15:09:26] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[15:29:26] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[15:39:26] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[15:39:26] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[15:39:26] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[15:59:26] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[16:09:26] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[16:09:26] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[16:09:26] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[16:31:50] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[16:39:50] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[16:41:40] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[16:41:40] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[16:48:10] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 16% free memory
[17:01:50] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[17:08:10] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 4% free memory
[17:10:50] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[17:11:40] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[17:11:40] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[17:13:10] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 6% free memory
[17:31:50] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[17:41:40] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[17:41:40] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[17:41:40] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[18:01:50] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[18:03:20] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:05:11] PROBLEM Disk Space is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:07:18] PROBLEM SSH is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - Socket timeout after 10 seconds
[18:07:26] PROBLEM Current Users is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:07:26] PROBLEM dpkg-check is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:07:56] PROBLEM Free ram is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:08:06] PROBLEM Total Processes is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:09:56] RECOVERY Disk Space is now: OK on bots-cb i-0000009e output: DISK OK
[18:10:07] * Damianz shoots nagios
[18:11:46] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[18:11:46] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[18:11:46] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[18:12:06] RECOVERY SSH is now: OK on bots-cb i-0000009e output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[18:12:16] RECOVERY Current Users is now: OK on bots-cb i-0000009e output: USERS OK - 0 users currently logged in
[18:12:16] RECOVERY dpkg-check is now: OK on bots-cb i-0000009e output: All packages OK
[18:12:56] RECOVERY Free ram is now: OK on bots-cb i-0000009e output: OK: 60% free memory
[18:12:56] RECOVERY Total Processes is now: OK on bots-cb i-0000009e output: PROCS OK: 106 processes
[18:23:16] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 0.32, 7.23, 18.67
[18:32:16] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[18:41:46] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[18:41:46] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[18:41:46] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[18:45:42] Damianz: just disable checks on these
[18:45:45] I forgot pw
[18:46:10] I think those are just instances no one uses anymore, Ryan should delete them
[18:48:16] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.54, 0.46, 4.04
[18:51:56] We should have like an if turned off -> disable service monitoring then just have 1 alert
[19:03:16] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[19:11:46] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[19:11:46] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[19:13:56] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[19:17:01] Damianz: it doesn't check services if machine is off
[19:17:11] Damianz: it only reports that machine is down
[19:29:36] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 78 MB (5% inode=53%):
[19:34:16] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[19:41:46] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[19:41:46] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[19:44:36] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[20:03:36] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory
[20:04:10] hello ,,,!
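Damianz suggests collapsing per-service alerts into a single alert when an instance is off. petan replies that nagios already skips service checks for a machine it considers down. A minimal Python sketch of that suppression rule, in which the `Alert` shape and the `alerts_to_send` name are hypothetical: service problems on a host with a DOWN alert are dropped, and only the host alert goes out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    host: str
    service: Optional[str]  # None means a host-level alert
    state: str  # e.g. "DOWN", "CRITICAL", "WARNING"


def alerts_to_send(alerts):
    """Drop service alerts for hosts that already have a host DOWN alert."""
    down_hosts = {a.host for a in alerts if a.service is None and a.state == "DOWN"}
    return [a for a in alerts if a.service is None or a.host not in down_hosts]


batch = [
    Alert("salt", None, "DOWN"),
    Alert("salt", "SSH", "CRITICAL"),
    Alert("salt", "Disk Space", "CRITICAL"),
    Alert("bots-cb", "SSH", "CRITICAL"),
]
for a in alerts_to_send(batch):
    print(a.host, a.service or "host", a.state)
# salt host DOWN
# bots-cb SSH CRITICAL
```

The bots-cb burst at 18:03 to 18:08 shows the limit of this rule. No host DOWN alert appears for bots-cb in the log, so each of its NRPE timeouts was reported separately.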
[20:04:36] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[20:11:46] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[20:11:46] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[20:13:36] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 21% free memory
[20:14:36] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[20:32:27] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds
[20:34:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[20:37:17] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.004 second response time
[20:41:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[20:41:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[20:44:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[21:04:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[21:11:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[21:11:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[21:13:07] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[21:14:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[21:34:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[21:41:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[21:41:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[21:44:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[22:04:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[22:11:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[22:11:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[22:14:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[22:30:07] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[22:34:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[22:41:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[22:41:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[22:44:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[23:04:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[23:11:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[23:11:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[23:13:07] PROBLEM Puppet freshness is now: CRITICAL on deployment-web i-00000217 output: Puppet has not run in last 20 hours
[23:14:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[23:34:37] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
[23:41:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1)
[23:41:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202)
[23:44:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)