[00:00:00] LeslieCarr: Can haz ipv6? :D [00:00:13] that's a openstack issue :( [00:00:24] ipv6 is awesome, public ip's for everyone! [00:00:43] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 67 MB (5% inode=40%): [00:00:50] Ryan_Lane: are you around ? [00:01:01] ish [00:01:08] ipv6 will screw everyone who thinks nat = firewall though, yay for home noobs [00:01:09] gluster is having issues again [00:01:16] well, kind of [00:01:23] oh fun :( [00:01:26] it's working, but project storage isn't mounting with newer versions of the client installed [00:01:37] i forget the host/command to add ip's to the pool [00:02:01] nova something… :) [00:03:24] nova-manage floating something iirc, *shrugs* euca-* makes me never want to login to the controller. [00:04:33] As a side note, running kvm in kvm in kvm with bridged networking causes your bridges to go crazy on the host [00:04:43] PROBLEM Disk Space is now: CRITICAL on deployment-transcoding i-00000105 output: DISK CRITICAL - free space: / 37 MB (2% inode=53%): [00:05:13] I assume that's because project storage has un-mounted [00:08:59] hopefully i didn't just break labs ... [00:09:21] <^demon> Hide, just in case? [00:10:09] LeslieCarr: break it in which way? 
[00:10:42] i don't know ;) i did the nova-manage commands on virt0 (hopefully that was the correct one) to allocate a new range of floating ip's and increase the quota of my project [00:10:47] Sacrifice a chicken and blame Ryan :) [00:10:56] I don't see why anything would be project [00:10:57] err [00:10:58] broken [00:11:14] I upgraded gluster and the new version is having issues with clients mounting [00:11:20] I'll probably end up rolling back [00:11:30] unless I can fix it [00:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [00:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [00:21:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [00:46:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [00:49:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [00:51:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [01:10:21] * jeremyb waves a paravoid [01:10:28] hi! [01:10:45] hola [01:11:07] taking spanish lessons yet? ;) [01:11:32] * jeremyb is still wondering some about the new family of puppetmasters. or whatever the plan is [01:12:19] although i guess you've been to at least dc9 or dc8 (right?) 
which were both spanish [01:14:43] PROBLEM HTTP is now: CRITICAL on deployment-apache03 i-00000248 output: CRITICAL - Socket timeout after 10 seconds [01:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [01:19:33] PROBLEM HTTP is now: WARNING on deployment-apache03 i-00000248 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.009 second response time [01:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [01:21:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [01:30:29] Is deployment.wikimedia.beta.wmflabs.org down, or did it get renamed? [01:33:56] mutante: do you know ^^ ? [01:34:08] hashar had a more relevant looking !log but he's gone [01:39:03] jeremyb: Probably because Ryan_Lane broke storage [01:39:06] err I mean andrewbogott [01:39:19] likely due to me [01:39:34] oh, ok [01:39:43] I'll check it in a second [01:40:23] i was just going based on [[Nova Resource:Deployment-prep/SAL]] [01:40:33] but on second look those weren't as recent as i was thinking [01:41:21] hm [01:41:22] no [01:41:29] I don't see why there's a problem [01:41:40] I take that back [01:41:44] gluster upgraded on the squid [01:42:29] seems it should work [01:43:58] !log deployment-prep restarting squid on deployment-squid [01:43:59] Logged the message, Master [01:44:22] this isn't working terribly well [01:45:29] jeremyb: for mutante it's like 3.45am :p [01:45:45] Ryan_Lane: It's time for you to eat dinner; I shouldn't have said anything until tomorrow :( [01:46:00] heh [01:46:08] well, this is related to the other problems I'm having [01:46:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [01:47:11] Anyway, this is a great opportunity for me to build some timeouts into my code. 
[01:48:05] heh [01:49:01] !log deployment-prep rebooting deployment-squid [01:49:02] Logged the message, Master [01:49:03] <3 sigalrm [01:49:09] <\3 zabbix right now [01:49:31] Seriously -- at the moment the module I'm testing is determined to hang all of Nova until the deployment comes back up. Pretty rude. [01:49:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [01:50:33] :D [01:50:44] what module are you testing? [01:50:57] Uhhh, I just made crazy things happen o.0 [01:51:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [01:51:53] Damianz: what do you mean? [01:52:03] PROBLEM Free ram is now: CRITICAL on deployment-squid i-000000dc output: Connection refused or timed out [01:52:24] I just managed to make php executing 1.3 million sqlqueries on a page load [01:52:38] un-related to gluster but still wtf [01:52:43] PROBLEM Disk Space is now: CRITICAL on deployment-squid i-000000dc output: Connection refused or timed out [01:52:43] PROBLEM Total Processes is now: CRITICAL on deployment-squid i-000000dc output: Connection refused or timed out [01:53:03] PROBLEM Current Load is now: CRITICAL on deployment-squid i-000000dc output: Connection refused or timed out [01:53:03] PROBLEM SSH is now: CRITICAL on deployment-squid i-000000dc output: No route to host [01:53:03] PROBLEM dpkg-check is now: CRITICAL on deployment-squid i-000000dc output: Connection refused or timed out [01:54:25] Jesus, the default behavior is 25 retries at 30 seconds per try. [01:54:38] Ryan_Lane: Instance status pages. [01:54:54] Well, using that code to debug the plugin framework code. [01:54:55] ah [01:55:02] heh [01:55:08] deployment is back up [01:55:09] kindof [01:55:10] And vice versa, apparently. [01:55:35] well, it was anyway [01:56:07] oh. 
backends must be screwed up [01:56:27] Reedy: i can never remember where he or hashar or binasher are [01:56:40] but i guess that makes sense for CEST: 08 01:45:29 < Reedy> jeremyb: for mutante it's like 3.45am :p [01:56:42] he was in australia, but he's back in germany [01:56:53] hashar is usually CEST also, but is currently on PDT [01:57:01] binasher is usually always on PDT [01:57:03] RECOVERY Free ram is now: OK on deployment-squid i-000000dc output: OK: 91% free memory [01:57:33] RECOVERY Total Processes is now: OK on deployment-squid i-000000dc output: PROCS OK: 101 processes [01:57:38] RECOVERY Disk Space is now: OK on deployment-squid i-000000dc output: DISK OK [01:58:03] RECOVERY Current Load is now: OK on deployment-squid i-000000dc output: OK - load average: 1.30, 0.74, 0.29 [01:58:03] RECOVERY SSH is now: OK on deployment-squid i-000000dc output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [01:58:03] RECOVERY dpkg-check is now: OK on deployment-squid i-000000dc output: All packages OK [01:58:45] it's working [02:02:23] PROBLEM host: deployment-web is DOWN address: i-00000217 CRITICAL - Host Unreachable (i-00000217) [02:02:37] Ryan_Lane: Yep, looks up. Cool. 
[02:05:43] RECOVERY host: deployment-web is UP address: i-00000217 PING OK - Packet loss = 0%, RTA = 0.61 ms [02:11:53] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 6.985 second response time [02:15:43] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK [02:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [02:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [02:21:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [02:46:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [02:49:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [02:51:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [02:52:23] RECOVERY Puppet freshness is now: OK on hugglewiki i-000000aa output: puppet ran at Tue May 8 02:52:13 UTC 2012 [03:16:13] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [03:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [03:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [03:21:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [03:36:13] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [03:36:43] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 16% free memory [03:45:33] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 15% free memory [03:46:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [03:49:53] PROBLEM host: analytics is DOWN address: 
i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [03:51:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [03:56:13] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory [03:56:43] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory [04:01:43] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory [04:01:53] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 15% free memory [04:05:33] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 3% free memory [04:06:13] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: Critical: 4% free memory [04:06:13] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:10:33] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [04:11:13] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory [04:16:43] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [04:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [04:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [04:21:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [04:26:43] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [04:33:13] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours [04:46:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [04:49:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [04:51:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [05:14:43] 
PROBLEM Disk Space is now: CRITICAL on deployment-transcoding i-00000105 output: DISK CRITICAL - free space: / 38 MB (2% inode=53%): [05:16:13] PROBLEM Puppet freshness is now: CRITICAL on deployment-web i-00000217 output: Puppet has not run in last 20 hours [05:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [05:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [05:22:06] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [05:46:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [05:49:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [05:51:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [06:16:53] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [06:19:53] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [06:21:53] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [06:31:45] 05/08/2012 - 06:31:45 - Creating a project directory for demo [06:31:46] 05/08/2012 - 06:31:45 - Creating a home directory for laner at /export/home/demo/laner [06:32:39] 05/08/2012 - 06:32:39 - Updating keys for laner [06:43:49] PROBLEM Current Load is now: CRITICAL on demo-web1 i-00000255 output: Connection refused by host [06:44:24] PROBLEM Current Users is now: CRITICAL on demo-web1 i-00000255 output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:45:05] PROBLEM Disk Space is now: CRITICAL on demo-web1 i-00000255 output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:45:44] PROBLEM Free ram is now: CRITICAL on demo-web1 i-00000255 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[06:46:54] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [06:46:54] PROBLEM Total Processes is now: CRITICAL on demo-web1 i-00000255 output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:47:34] PROBLEM dpkg-check is now: CRITICAL on demo-web1 i-00000255 output: CHECK_NRPE: Error - Could not complete SSL handshake. [06:49:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [06:50:54] PROBLEM Current Load is now: WARNING on mobile-enwp i-000000ce output: WARNING - load average: 6.03, 9.73, 6.50 [06:51:54] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [07:00:53] RECOVERY Current Load is now: OK on mobile-enwp i-000000ce output: OK - load average: 0.96, 2.33, 4.09 [07:01:53] RECOVERY Total Processes is now: OK on demo-web1 i-00000255 output: PROCS OK: 108 processes [07:02:33] RECOVERY dpkg-check is now: OK on demo-web1 i-00000255 output: All packages OK [07:02:43] RECOVERY Disk Space is now: OK on demo-web1 i-00000255 output: DISK OK [07:03:43] RECOVERY Current Load is now: OK on demo-web1 i-00000255 output: OK - load average: 0.14, 0.37, 0.46 [07:04:23] RECOVERY Current Users is now: OK on demo-web1 i-00000255 output: USERS OK - 1 users currently logged in [07:05:43] RECOVERY Free ram is now: OK on demo-web1 i-00000255 output: OK: 94% free memory [07:06:23] PROBLEM HTTP is now: CRITICAL on demo-web1 i-00000255 output: CRITICAL - Socket timeout after 10 seconds [07:10:13] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache05 i-0000024a output: Puppet has not run in last 20 hours [07:15:33] PROBLEM host: deployment-web is DOWN address: i-00000217 CRITICAL - Host Unreachable (i-00000217) [07:18:23] RECOVERY host: deployment-web is UP address: i-00000217 PING OK - Packet loss = 0%, RTA = 5.12 ms [07:18:43] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [07:20:15] 
PROBLEM Puppet freshness is now: CRITICAL on deployment-apache01 i-00000246 output: Puppet has not run in last 20 hours [07:21:45] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [07:21:45] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 3.231 second response time [07:21:55] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [07:23:45] PROBLEM Current Load is now: CRITICAL on demo-mysql1 i-00000256 output: Connection refused by host [07:24:25] PROBLEM Current Users is now: CRITICAL on demo-mysql1 i-00000256 output: CHECK_NRPE: Error - Could not complete SSL handshake. [07:25:05] PROBLEM Disk Space is now: CRITICAL on demo-mysql1 i-00000256 output: CHECK_NRPE: Error - Could not complete SSL handshake. [07:25:45] PROBLEM Free ram is now: CRITICAL on demo-mysql1 i-00000256 output: CHECK_NRPE: Error - Could not complete SSL handshake. [07:26:55] PROBLEM Total Processes is now: CRITICAL on demo-mysql1 i-00000256 output: CHECK_NRPE: Error - Could not complete SSL handshake. [07:27:35] PROBLEM dpkg-check is now: CRITICAL on demo-mysql1 i-00000256 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[07:49:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [07:54:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [07:54:54] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [08:06:34] PROBLEM Free ram is now: CRITICAL on bots-2 i-0000009c output: Critical: 3% free memory [08:19:54] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [08:24:54] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [08:24:54] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [08:37:10] PROBLEM Total Processes is now: CRITICAL on dumps-2 i-00000257 output: Connection refused by host [08:37:37] PROBLEM dpkg-check is now: CRITICAL on dumps-2 i-00000257 output: Connection refused by host [08:40:47] PROBLEM Current Users is now: CRITICAL on dumps-2 i-00000257 output: CHECK_NRPE: Error - Could not complete SSL handshake. [08:40:47] PROBLEM Free ram is now: CRITICAL on dumps-2 i-00000257 output: CHECK_NRPE: Error - Could not complete SSL handshake. [08:40:47] PROBLEM Current Load is now: CRITICAL on dumps-2 i-00000257 output: CHECK_NRPE: Error - Could not complete SSL handshake. [08:41:08] PROBLEM Disk Space is now: CRITICAL on dumps-2 i-00000257 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[08:44:56] PROBLEM Current Load is now: CRITICAL on dumps-5 i-00000258 output: Connection refused by host [08:45:01] PROBLEM Current Users is now: CRITICAL on dumps-5 i-00000258 output: Connection refused by host [08:45:41] RECOVERY Current Users is now: OK on dumps-2 i-00000257 output: USERS OK - 1 users currently logged in [08:45:41] RECOVERY Free ram is now: OK on dumps-2 i-00000257 output: OK: 93% free memory [08:45:51] RECOVERY Current Load is now: OK on dumps-2 i-00000257 output: OK - load average: 0.24, 0.98, 1.24 [08:45:56] PROBLEM Disk Space is now: CRITICAL on dumps-5 i-00000258 output: Connection refused by host [08:46:01] RECOVERY Disk Space is now: OK on dumps-2 i-00000257 output: DISK OK [08:46:21] PROBLEM Free ram is now: CRITICAL on dumps-5 i-00000258 output: Connection refused by host [08:47:01] RECOVERY Total Processes is now: OK on dumps-2 i-00000257 output: PROCS OK: 102 processes [08:47:41] RECOVERY dpkg-check is now: OK on dumps-2 i-00000257 output: All packages OK [08:47:51] PROBLEM Total Processes is now: CRITICAL on dumps-5 i-00000258 output: Connection refused by host [08:48:11] PROBLEM dpkg-check is now: CRITICAL on dumps-5 i-00000258 output: Connection refused by host [08:50:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [08:52:51] RECOVERY Total Processes is now: OK on dumps-5 i-00000258 output: PROCS OK: 83 processes [08:53:11] RECOVERY dpkg-check is now: OK on dumps-5 i-00000258 output: All packages OK [08:54:51] RECOVERY Current Load is now: OK on dumps-5 i-00000258 output: OK - load average: 0.11, 0.96, 1.09 [08:55:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [08:55:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [08:55:01] RECOVERY Current Users is now: OK on dumps-5 i-00000258 output: USERS OK - 1 users currently logged in [08:55:51] RECOVERY Disk Space is now: OK on dumps-5 
i-00000258 output: DISK OK [08:56:21] RECOVERY Free ram is now: OK on dumps-5 i-00000258 output: OK: 92% free memory [09:20:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [09:25:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [09:25:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [09:43:21] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 15.38, 11.43, 5.06 [09:48:21] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.49, 4.44, 3.76 [09:50:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [09:52:01] PROBLEM Disk Space is now: CRITICAL on bots-2 i-0000009c output: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:01] PROBLEM SSH is now: CRITICAL on bots-2 i-0000009c output: CRITICAL - Socket timeout after 10 seconds [09:53:01] PROBLEM Current Load is now: CRITICAL on bots-2 i-0000009c output: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:01] PROBLEM Current Users is now: CRITICAL on bots-2 i-0000009c output: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:01] PROBLEM Total Processes is now: CRITICAL on bots-2 i-0000009c output: CHECK_NRPE: Socket timeout after 10 seconds. [09:54:51] PROBLEM dpkg-check is now: CRITICAL on bots-2 i-0000009c output: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:55:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [09:55:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [10:20:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [10:25:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [10:25:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [10:27:21] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-3 i-00000225 output: CHECK_NRPE: Socket timeout after 10 seconds. [10:32:11] RECOVERY dpkg-check is now: OK on wikidata-dev-3 i-00000225 output: All packages OK [10:50:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [10:55:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [10:55:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [11:05:01] PROBLEM HTTP is now: CRITICAL on deployment-web5 i-00000213 output: No route to host [11:09:51] PROBLEM host: deployment-web5 is DOWN address: i-00000213 CRITICAL - Host Unreachable (i-00000213) [11:11:01] RECOVERY host: deployment-web5 is UP address: i-00000213 PING OK - Packet loss = 0%, RTA = 0.62 ms [11:17:45] hashar: hi [11:18:01] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 5.061 second response time [11:20:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [11:25:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [11:25:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [11:50:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [11:55:01] PROBLEM host: analytics is 
DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [11:55:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [12:20:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [12:25:01] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [12:25:01] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [12:32:44] PROBLEM Free ram is now: CRITICAL on wikidata-dev-2 i-00000259 output: Connection refused by host [12:34:24] PROBLEM Disk Space is now: CRITICAL on wikidata-dev-2 i-00000259 output: Connection refused by host [12:34:24] PROBLEM Total Processes is now: CRITICAL on wikidata-dev-2 i-00000259 output: Connection refused by host [12:34:29] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-2 i-00000259 output: Connection refused by host [12:34:54] PROBLEM Current Users is now: CRITICAL on wikidata-dev-2 i-00000259 output: Connection refused by host [12:51:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [12:55:04] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [12:56:34] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [12:57:44] RECOVERY Free ram is now: OK on wikidata-dev-2 i-00000259 output: OK: 86% free memory [12:59:24] RECOVERY Disk Space is now: OK on wikidata-dev-2 i-00000259 output: DISK OK [12:59:24] RECOVERY Total Processes is now: OK on wikidata-dev-2 i-00000259 output: PROCS OK: 81 processes [12:59:29] RECOVERY dpkg-check is now: OK on wikidata-dev-2 i-00000259 output: All packages OK [12:59:54] RECOVERY Current Users is now: OK on wikidata-dev-2 i-00000259 output: USERS OK - 0 users currently logged in [13:01:04] RECOVERY Current Load is now: OK on wikidata-dev-2 i-00000259 output: OK - load average: 0.01, 0.15, 0.09 [13:17:07] PROBLEM Puppet 
freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [13:21:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [13:26:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [13:26:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [13:51:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [13:52:27] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-2 i-00000259 output: DPKG CRITICAL dpkg reports broken packages [13:56:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [13:56:47] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [14:21:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [14:26:47] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [14:26:47] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [14:34:07] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours [14:51:47] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [14:59:36] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [14:59:56] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [15:17:06] PROBLEM Puppet freshness is now: CRITICAL on deployment-web i-00000217 output: Puppet has not run in last 20 hours [15:21:48] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [15:29:58] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [15:29:58] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host 
Unreachable (i-000001c1) [15:44:32] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds [15:49:22] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.004 second response time [15:51:52] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [16:00:02] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [16:00:02] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [16:21:52] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [16:30:02] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [16:30:02] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [16:51:52] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [17:00:02] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [17:00:02] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [17:10:12] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache05 i-0000024a output: Puppet has not run in last 20 hours [17:20:14] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache01 i-00000246 output: Puppet has not run in last 20 hours [17:21:54] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [17:30:06] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [17:30:16] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [17:47:19] !log deployment-prep rebooting all apaches [17:49:12] why? 
[17:49:39] because they were hung [17:49:52] I think they were hung on fuse, thanks to the gluster upgrade [17:49:56] PROBLEM HTTP is now: CRITICAL on deployment-apache03 i-00000248 output: Connection refused [17:50:07] PROBLEM HTTP is now: CRITICAL on deployment-apache05 i-0000024a output: Connection refused [17:50:12] yay [17:50:22] yep. I fucked that one up good :) [17:50:57] <^demon> Ryan_Lane: Gerrit's timing out. [17:51:06] oh [17:51:07] ? [17:51:36] <^demon> It's just sitting there saying "Sending request..." and doing nothing else [17:51:41] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: Connection refused [17:51:46] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: Connection refused [17:51:46] hm [17:51:53] mine just says connecting [17:51:55] <^demon> ping timed out too [17:52:37] <^demon> Hmm, can ssh in [17:52:44] well, let's try restarting gerrit [17:53:01] PROBLEM HTTP is now: CRITICAL on deployment-web5 i-00000213 output: Connection refused [17:53:08] stop is taking quite a while [17:53:33] well, it's still not working [17:53:34] wtf [17:53:41] PROBLEM HTTP is now: CRITICAL on deployment-apache04 i-00000249 output: Connection refused [17:53:45] <^demon> I haven't touched it today, no clue. 
[17:53:51] RECOVERY Puppet freshness is now: OK on deployment-web i-00000217 output: puppet ran at Tue May 8 17:53:47 UTC 2012 [17:53:59] hm [17:54:01] PROBLEM HTTP is now: CRITICAL on deployment-apache01 i-00000246 output: Connection refused [17:54:01] network issue [17:54:21] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [17:54:31] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: Connection refused [17:56:41] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.013 second response time [17:58:29] deployment is much happier now :) [18:00:41] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [18:01:31] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [18:04:51] RECOVERY Puppet freshness is now: OK on deployment-apache01 i-00000246 output: puppet ran at Tue May 8 18:04:36 UTC 2012 [18:06:21] RECOVERY Puppet freshness is now: OK on deployment-apache05 i-0000024a output: puppet ran at Tue May 8 18:06:10 UTC 2012 [18:08:41] PROBLEM HTTP is now: WARNING on deployment-apache04 i-00000249 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.010 second response time [18:09:01] PROBLEM HTTP is now: WARNING on deployment-apache01 i-00000246 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.201 second response time [18:09:51] PROBLEM HTTP is now: WARNING on deployment-apache05 i-0000024a output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.015 second response time [18:15:41] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%): [18:25:01] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [18:31:41] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [18:31:51] PROBLEM host: salt 
is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [18:38:18] <^demon> Ryan_Lane: So yesterday at their hackathon they talked all about plugin interfaces, right? [18:38:19] 05/08/2012 - 18:38:18 - Creating a home directory for faidon at /export/home/deployment-prep/faidon [18:38:23] <^demon> Today they're working on a mascot. [18:38:33] o.O [18:38:34] really? [18:38:40] <^demon> Someone is, at least :p [18:38:43] a mascot? that's what they spend their hackathon time on? [18:39:20] 05/08/2012 - 18:39:20 - Updating keys for faidon [18:40:12] <^demon> "first draft of Diffy, the Review Cuckoo: http://oi48.tinypic.com/33wtnjk.jpg" [18:40:55] <^demon> I promise I'm not making this up. [18:42:43] <^demon> We should name our unicorn. [18:42:45] <^demon> He needs a name. [18:44:19] <^demon> Ulysses [18:44:20] <^demon> ? [18:45:41] <^demon> Uberto. [18:46:02] <^demon> http://www.justmommies.com/baby-names/boys/u so many choices [18:49:40] New patchset: Hashar; "/home/wikipedia is only created on production" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6947 [18:49:54] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6947 [18:50:03] Ryan_Lane: done. Thanks for the hint. https://gerrit.wikimedia.org/r/6947 [18:50:06] yw [18:50:35] ^demon: I never thought that someone in this organization would find a valid work-related reason to be on justmommies.com [18:50:44] But apparently that just happened [18:51:00] <^demon> :) [18:51:26] RoanKattouw: are you going to be a father? [18:51:41] <^demon> I want to name the unicorn. [18:52:45] hashar: Not that I'm aware of :) no Chad is just trying to name the labs unicorn [18:53:01] I am puzzeld [18:53:04] puzzled [18:53:10] <^demon> RoanKattouw: Not that you're aware of? There's doubt? 
[18:53:14] * hashar reads backlog [18:53:16] * Damianz thinks they smoked too many drugs [18:53:21] hashar: Chad is trying to find a name for the unicorn mascot [18:53:45] PROBLEM Current Load is now: CRITICAL on deployment-imagescaler01 i-0000025a output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:54:00] Uberto sounds good [18:54:02] to me [18:54:09] ^demon: Well he said "going to be", so I can't just say no, can I? :) I can't guarantee that I'll never be a father [18:54:18] <^demon> Ah. [18:54:25] PROBLEM Current Users is now: CRITICAL on deployment-imagescaler01 i-0000025a output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:54:36] RoanKattouw: We can arrange that [18:54:38] haha [18:54:41] * Damianz grabs the chainsaw [18:55:04] * ^demon goes back to work and grabs a leaflet on the way back to his desk [18:55:05] PROBLEM Disk Space is now: CRITICAL on deployment-imagescaler01 i-0000025a output: Connection refused by host [18:55:45] PROBLEM Free ram is now: CRITICAL on deployment-imagescaler01 i-0000025a output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:56:12] <^demon> Ok, Uberto? Any other suggestions? Objections? [18:56:15] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [18:56:35] Dave [18:56:36] Reedy: want a citrus fresca ? [18:56:45] <^demon> Dave the Unicorn? [18:56:55] PROBLEM Total Processes is now: CRITICAL on deployment-imagescaler01 i-0000025a output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:56:58] Everyone nameless is called dave [18:57:08] hashar: are you being a waiter? :D [18:57:10] Dave sounds to common [18:57:17] Reedy: just happen to be in kitchen right now [18:57:29] so I can bring you one while moving back to desk :-D [18:57:35] PROBLEM dpkg-check is now: CRITICAL on deployment-imagescaler01 i-0000025a output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[18:57:56] Please [19:01:44] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [19:01:54] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [19:26:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [19:31:44] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [19:31:54] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [19:37:18] 05/08/2012 - 19:37:18 - Updating keys for laner [19:38:44] RECOVERY Current Load is now: OK on deployment-imagescaler01 i-0000025a output: OK - load average: 0.56, 1.00, 0.87 [19:39:24] RECOVERY Current Users is now: OK on deployment-imagescaler01 i-0000025a output: USERS OK - 0 users currently logged in [19:40:44] RECOVERY Free ram is now: OK on deployment-imagescaler01 i-0000025a output: OK: 85% free memory [19:41:54] RECOVERY Total Processes is now: OK on deployment-imagescaler01 i-0000025a output: PROCS OK: 87 processes [19:42:34] RECOVERY dpkg-check is now: OK on deployment-imagescaler01 i-0000025a output: All packages OK [19:42:44] RECOVERY Disk Space is now: OK on deployment-imagescaler01 i-0000025a output: DISK OK [19:56:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [20:01:44] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [20:01:54] PROBLEM host: salt is DOWN address: i-000001c1 CRITICAL - Host Unreachable (i-000001c1) [20:26:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [20:29:10] !log deployment-prep update MediaWiki to latest master [20:29:20] !log fooo [20:29:25] Stupid irc bot [20:29:30] it is not reliable [20:31:44] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [20:31:54] PROBLEM host: salt is DOWN address: i-000001c1 
CRITICAL - Host Unreachable (i-000001c1) [20:43:15] ^demon: do you have any idea how the labs IRC bots work ? [20:43:23] <^demon> Nope [20:43:33] thanks [20:43:35] :-D [20:44:44] hashar, you need to provide the project name [20:44:54] !log deployment-prep logging [20:44:55] :D [20:44:56] still [20:45:11] Platonides: labs-morebot is dead [20:45:26] seems it needs to be restarted on some instance part of the Bots project [20:45:30] found doc at https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots [20:45:39] I have no access there though :/ [20:46:19] I will just pretend I forgot logging :-]]]] [20:47:20] I don't have access to bots, either [20:47:47] * Platonides pokes petan [20:49:19] Platonides: I will ask Ryan_Lane whenever he is back from lunch :D [20:49:35] bah [20:49:38] I need to move that bot [20:49:42] I'm going to do that right now [20:49:46] \o/ [20:49:56] deployment-prep can wait. the bot has been pissing me off :) [20:56:44] PROBLEM host: ganglia-test is DOWN address: i-00000202 CRITICAL - Host Unreachable (i-00000202) [21:01:37] andrewbogott: it was utf8 support in nova that you pushed in a while back, right? [21:01:45] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [21:02:24] Ryan_Lane: Nova mostly supported utf8 already, but I fixed some bugs in the command-line. [21:02:35] it was broken via the api [21:02:48] Yeah, I think that had something to do with auth. Lemme check. [21:02:54] * Ryan_Lane nods [21:04:25] PROBLEM Current Users is now: CRITICAL on ganglia-test3 i-0000025b output: Connection refused by host [21:05:15] PROBLEM Disk Space is now: CRITICAL on ganglia-test3 i-0000025b output: Connection refused by host [21:05:45] PROBLEM Free ram is now: CRITICAL on ganglia-test3 i-0000025b output: Connection refused by host [21:06:55] PROBLEM Total Processes is now: CRITICAL on ganglia-test3 i-0000025b output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[21:07:35] PROBLEM dpkg-check is now: CRITICAL on ganglia-test3 i-0000025b output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:08:32] Ryan_Lane: Looks like I fixed nova-manage and also a utf-8 bug in keystone. [21:08:39] Which is weird since we don't use keystone. [21:08:39] cool [21:08:42] heh [21:08:45] PROBLEM Current Load is now: CRITICAL on ganglia-test3 i-0000025b output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:08:46] well, we will :) [21:09:10] There was a fair bit of other utf8 work that preceded mine. So it's /probably/ fixed in essex, but I'm no longer positive that I fixed the exact bug you were hitting. [21:10:12] * Ryan_Lane nods [21:10:35] I just wanted to make sure I was being accurate when I said we pushed in fixes for utf8 support. [21:15:23] Ryan_Lane: paravoid: any of you moved apache to /data ? [21:15:34] no [21:15:37] .: 45: Can't open /etc/apache2/envvars [21:15:39] :-D [21:15:51] it's possible that petan did [21:16:01] is /data/project not working? [21:16:13] which instance is this? [21:16:23] deployment-web5 [21:16:28] did not [21:16:34] indeed no /data/ mounted there [21:16:42] damn it [21:16:47] is that the shared directory? [21:16:48] new gluster is still install there [21:17:16] so for some reason that probably killed all the apaches :-] [21:17:34] deployment-web5 is fixed [21:17:58] do you just mount -a or something like that? [21:18:01] no [21:18:05] it's an automount [21:18:08] I downgraded glusterfs [21:19:17] as I understand it we had squids / apaches etc moved to use /data/ [21:19:20] probably petr did [21:19:30] which also mean that glusterFS and or /data/ are spof [21:20:06] I have no idea what he did :-( [21:20:11] well, yeah. 
it is [21:20:17] I'm fixing it everywhere [21:20:22] I thought I did yesterday [21:20:30] stupid upgrade broke things [21:22:19] 05/08/2012 - 21:22:19 - Updating keys for laner [21:22:55] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.011 second response time [21:24:37] New patchset: Sara; "Define appropriate directories to install ganglia-webfrontend 3.3.5 in labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6993 [21:24:51] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6993 [21:26:45] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.609 second response time [21:28:45] RECOVERY Current Load is now: OK on ganglia-test3 i-0000025b output: OK - load average: 0.40, 0.14, 0.15 [21:29:27] RECOVERY Current Users is now: OK on ganglia-test3 i-0000025b output: USERS OK - 1 users currently logged in [21:30:15] RECOVERY Disk Space is now: OK on ganglia-test3 i-0000025b output: DISK OK [21:30:45] RECOVERY Free ram is now: OK on ganglia-test3 i-0000025b output: OK: 92% free memory [21:31:45] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [21:31:56] RECOVERY Total Processes is now: OK on ganglia-test3 i-0000025b output: PROCS OK: 86 processes [21:32:35] RECOVERY dpkg-check is now: OK on ganglia-test3 i-0000025b output: All packages OK [21:37:01] !log deployment-prep log bot broken :-D [21:40:00] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [21:40:29] hashar: heh [21:40:33] I'm fixing things [21:40:34] New patchset: Faidon; "Add imagescaler::labs role class" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6996 [21:40:42] I am sure you do :-D [21:40:48] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6996 [21:41:55] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6996 [21:41:58] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6996 [21:47:56] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 825 MB (4% inode=81%): [21:49:25] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.012 second response time [21:55:50] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK [22:00:04] hashar: so, /home/wikipedia should exist in deployment-prep now [22:00:12] I haven't permanently fixed the issue yet, though [22:00:44] I have noticed it exists on deployment-dbdump which is receiving syslog [22:00:51] * Ryan_Lane nods [22:01:09] I have manually created /home/wikipedia/syslog pending some merge in puppet [22:01:19] which change is it again? [22:01:32] https://gerrit.wikimedia.org/r/#/c/6546/ [22:02:08] and the fix we talked about earlier before lunch is https://gerrit.wikimedia.org/r/#/c/6947/ [22:02:31] 6947 creates /home/wikipedia only on production [22:02:37] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [22:02:49] since on labs it is going to be handled by you^W^Wmanually [22:03:58] PROBLEM Current Load is now: CRITICAL on deployment-apache02 i-00000247 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:03:58] PROBLEM SSH is now: CRITICAL on deployment-apache02 i-00000247 output: CRITICAL - Socket timeout after 10 seconds [22:04:01] PROBLEM dpkg-check is now: CRITICAL on deployment-apache02 i-00000247 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:04:12] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%): [22:05:26] hashar: the first needs to be rebased [22:05:30] oh [22:05:33] wait. no it doesn't [22:05:43] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6546 [22:05:55] PROBLEM Current Users is now: CRITICAL on deployment-apache02 i-00000247 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:05:56] PROBLEM Disk Space is now: CRITICAL on deployment-apache02 i-00000247 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:05:57] yes it does [22:05:58] :) [22:06:05] I guess OUTDATED means the change is based on a change which have been updated later on [22:06:20] Gerrit 2.3 probably attempt a merge in the background [22:06:41] ohh [22:06:53] there is another thing I totally forgot during our meeting this morning [22:07:02] we might want to have the production branch to be merged back in test :-] [22:07:11] cause there is like a 3 months difference between the two [22:07:40] PROBLEM Free ram is now: CRITICAL on deployment-apache02 i-00000247 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:09:11] !log bots rebooting bots-2 [22:09:16] New patchset: Hashar; "/home/wikipedia is only created on production" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6947 [22:09:22] RECOVERY SSH is now: OK on deployment-apache02 i-00000247 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [22:09:22] RECOVERY Current Load is now: OK on deployment-apache02 i-00000247 output: OK - load average: 4.91, 5.72, 3.37 [22:09:22] RECOVERY dpkg-check is now: OK on deployment-apache02 i-00000247 output: All packages OK [22:09:31] New patchset: Hashar; "syslog-server requires /home/wikipedia/syslog" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6546 [22:09:36] PROBLEM Free ram is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds. [22:09:45] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6947 [22:09:45] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6947 [22:09:45] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6947 [22:09:46] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6546 [22:10:02] thanks [22:10:06] yw [22:10:20] !log deployment-prep rebooting deployment-apache02 [22:11:11] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 5.38, 6.49, 3.90 [22:11:37] PROBLEM Current Load is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds. [22:11:37] PROBLEM dpkg-check is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds. [22:13:05] baaah [22:14:59] PROBLEM Total Processes is now: CRITICAL on deployment-apache02 i-00000247 output: CHECK_NRPE: Socket timeout after 10 seconds. 
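Note on the `!log` lines: as pointed out earlier in the channel ("you need to provide the project name"), a log entry takes the form `!log <project> <message>`, which is why `!log fooo` could never have been recorded. A minimal sketch of that shape in Python follows. The pattern and the name `parse_log_command` are hypothetical illustrations; how the actual labs bot parses its input is not shown anywhere in this log.

```python
import re

# Hypothetical pattern for "!log <project> <message>".
# The real bot's parsing rules are an assumption here, not taken from its source.
LOG_CMD = re.compile(r"^!log\s+(?P<project>\S+)\s+(?P<message>.+)$")


def parse_log_command(text):
    """Return (project, message), or None if the text is not a complete !log command."""
    match = LOG_CMD.match(text.strip())
    if match is None:
        return None
    return match.group("project"), match.group("message").strip()


if __name__ == "__main__":
    # Sample messages copied from the channel.
    for line in ("!log bots rebooting bots-2", "!log fooo"):
        print(repr(line), "->", parse_log_command(line))
```

The sketch only checks the shape of the command. It cannot tell whether the first word is a real project name, and it says nothing about whether the bot itself is running.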
[22:15:28] RECOVERY SSH is now: OK on bots-2 i-0000009c output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [22:15:28] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 3.27, 2.66, 1.18 [22:15:28] RECOVERY Current Users is now: OK on bots-2 i-0000009c output: USERS OK - 1 users currently logged in [22:15:38] PROBLEM HTTP is now: CRITICAL on deployment-apache01 i-00000246 output: Connection refused [22:15:38] RECOVERY Total Processes is now: OK on bots-2 i-0000009c output: PROCS OK: 84 processes [22:17:48] PROBLEM dpkg-check is now: CRITICAL on ganglia-test3 i-0000025b output: DPKG CRITICAL dpkg reports broken packages [22:17:48] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:17:48] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:17:48] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:18:15] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 94% free memory [22:18:15] RECOVERY Disk Space is now: OK on bots-2 i-0000009c output: DISK OK [22:18:15] RECOVERY dpkg-check is now: OK on bots-2 i-0000009c output: All packages OK [22:18:50] !log deployment-prep hereby declaring the deployment-apacheXX to be f*** up and deleting them [22:19:01] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce) [22:20:26] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [22:20:48] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6993 [22:20:50] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6993 [22:21:30] * hashar waits [22:22:20] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [22:22:20] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [22:22:20] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 95% free memory [22:26:51] hashar: wait wait [22:26:55] fucked up in which way? [22:27:48] did you try rebooting them before deleting them? [22:31:09] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.31, 1.14, 3.14 [22:31:34] hashar: image scaler is kind of ready, it's lacking mediawiki and apache configs, I presume you handle those? [22:32:59] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [22:32:59] paravoid: arent apaches config in a puppet class ? 
[22:33:17] some of them are [22:33:20] Ryan_Lane: I have reboot deployment2 [22:33:29] hashar: go for it [22:33:32] everything that's /usr/local/apache/conf is considered deployment-material though [22:33:37] and rsynced from fenari in production [22:33:46] Ryan_Lane: anyway the issue is that I am not sure what Petr did, apparently it was not just about making /etc/apache2 a symlink to some /data/ [22:33:49] PROBLEM Current Load is now: CRITICAL on deployment-apache06 i-0000025c output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:33:49] PROBLEM Current Load is now: CRITICAL on deployment-apache08 i-0000025f output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:33:49] PROBLEM Current Load is now: CRITICAL on deployment-apache09 i-0000025e output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:33:53] he changes some other things too :/ [22:34:29] PROBLEM Current Users is now: CRITICAL on deployment-apache09 i-0000025e output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:34:29] PROBLEM Current Users is now: CRITICAL on deployment-apache06 i-0000025c output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:34:29] PROBLEM Current Users is now: CRITICAL on deployment-apache08 i-0000025f output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:34:36] Ryan_Lane: so just to be sure, I claimed the "new" apache instances unstable and prefer to have them deleted. I have created 4 new ones [22:34:48] ah [22:35:04] PROBLEM Disk Space is now: CRITICAL on deployment-apache09 i-0000025e output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:35:04] PROBLEM Disk Space is now: CRITICAL on deployment-apache08 i-0000025f output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:35:04] PROBLEM Disk Space is now: CRITICAL on deployment-apache06 i-0000025c output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[22:35:59] PROBLEM Free ram is now: CRITICAL on deployment-apache09 i-0000025e output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:35:59] PROBLEM Free ram is now: CRITICAL on deployment-apache06 i-0000025c output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:35:59] PROBLEM Free ram is now: CRITICAL on deployment-apache08 i-0000025f output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:36:09] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.81, 0.70, 2.37 [22:36:29] PROBLEM HTTP is now: CRITICAL on deployment-apache06 i-0000025c output: CRITICAL - Socket timeout after 10 seconds [22:36:29] PROBLEM HTTP is now: CRITICAL on deployment-apache08 i-0000025f output: CRITICAL - Socket timeout after 10 seconds [22:36:29] PROBLEM HTTP is now: CRITICAL on deployment-apache09 i-0000025e output: CRITICAL - Socket timeout after 10 seconds [22:37:34] PROBLEM Total Processes is now: CRITICAL on deployment-apache06 i-0000025c output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:37:39] PROBLEM Total Processes is now: CRITICAL on deployment-apache09 i-0000025e output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:37:44] PROBLEM Total Processes is now: CRITICAL on deployment-apache08 i-0000025f output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:38:09] paravoid: what do you mean by """everything that's /usr/local/apache/conf is considered deployment-material though""" ?? [22:38:15] is there a git repo already or a deb package ? [22:38:23] PROBLEM dpkg-check is now: CRITICAL on deployment-apache06 i-0000025c output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:38:24] PROBLEM dpkg-check is now: CRITICAL on deployment-apache09 i-0000025e output: CHECK_NRPE: Error - Could not complete SSL handshake. [22:38:24] PROBLEM dpkg-check is now: CRITICAL on deployment-apache08 i-0000025f output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[22:38:29] or did you mean that is only on fenari ? [22:50:00] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce) [23:03:04] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [23:14:25] ssmollett: it'd be really cool if you could link to and maybe pull in some summary graphs for each project from https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bastion [23:14:42] i.e. on the project page, you get the SAL, the instance list, some ganglia graphs, etc. [23:14:45] :D [23:16:13] also, would you mind if I add a link to ganglia from the main labsconsole page/ [23:16:15] ? [23:17:14] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [23:20:15] maplebed: maybe just be bold and do it ? [23:20:19] +1 [23:20:22] +1 [23:20:23] too late. [23:20:34] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce) [23:20:35] you can't +1 yourself hashar [23:20:38] permissions don't allow that :P [23:20:40] I already did. [23:20:42] ohh [23:20:57] * hashar fixes permissions [23:21:19] but the templates scare me too much to try and add graphs to the project pages. [23:21:19] maplebed: that is usually the best way to have anything done in our community :-D just [edit] [23:21:23] what's with all these silly templates maplebed? [23:21:29] but you are most probably aware of that [23:21:58] you'd have to ask petrb about all the templates. [23:23:24] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 831 MB (4% inode=81%): [23:23:44] PROBLEM Current Load is now: CRITICAL on deployment-apache10 i-00000260 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:24:24] PROBLEM Current Users is now: CRITICAL on deployment-apache10 i-00000260 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[23:25:04] PROBLEM Disk Space is now: CRITICAL on deployment-apache10 i-00000260 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:25:44] PROBLEM Free ram is now: CRITICAL on deployment-apache10 i-00000260 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:26:24] PROBLEM HTTP is now: CRITICAL on deployment-apache10 i-00000260 output: CRITICAL - Socket timeout after 10 seconds [23:27:18] !log deployment-prep deleted apache07 which did not start [23:27:34] PROBLEM Total Processes is now: CRITICAL on deployment-apache10 i-00000260 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:28:14] PROBLEM dpkg-check is now: CRITICAL on deployment-apache10 i-00000260 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:33:04] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [23:40:27] Ryan_Lane: were you the one that originally installed gerrit at WMF ? [23:40:34] yep [23:40:54] was it after meeting some OpenStack people? [23:41:49] !log deployment-prep despooling deployment-web{,3,4,5} [23:44:02] no. I think we installed it before they did [23:50:34] PROBLEM host: mobile-enwp is DOWN address: i-000000ce CRITICAL - Host Unreachable (i-000000ce)
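Note on the monitoring lines: the PROBLEM and RECOVERY messages in this log come in two regular shapes, one for service checks ("<check> is now: <state> on <host> <instance> output: ...") and one for host checks ("host: <host> is <state> address: <instance> ..."). A rough Python sketch for pulling the fields out of such lines follows. The regular expressions and the name `parse_alert` are assumptions inferred only from the lines visible here, not from the bot's or Nagios's own format definition.

```python
import re

# Optional "[HH:MM:SS] " prefix as used in this log.
STAMP = r"^(?:\[\d{2}:\d{2}:\d{2}\] )?"

# Both patterns are inferred from the lines in this log (an assumption).
SERVICE = re.compile(
    STAMP
    + r"(?P<kind>PROBLEM|RECOVERY) (?P<check>.+?) is now: (?P<state>\w+) "
    r"on (?P<host>\S+) (?P<instance>\S+) output: (?P<output>.*)$"
)
HOST = re.compile(
    STAMP
    + r"(?P<kind>PROBLEM|RECOVERY) host: (?P<host>\S+) is (?P<state>\w+) "
    r"address: (?P<instance>\S+) (?P<output>.*)$"
)


def parse_alert(text):
    """Return a dict of alert fields, or None if the text is not an alert line."""
    text = text.strip()
    for pattern in (HOST, SERVICE):
        match = pattern.match(text)
        if match:
            fields = match.groupdict()
            # Host alerts carry no check name; label them "host".
            fields.setdefault("check", "host")
            return fields
    return None


if __name__ == "__main__":
    sample = (
        "[23:33:04] PROBLEM host: analytics is DOWN address: i-000000e2 "
        "CRITICAL - Host Unreachable (i-000000e2)"
    )
    print(parse_alert(sample))
```

Grouping the parsed results by host and keeping the latest state per check would give a quick picture of which instances never recovered during the session, though that summary is left out of the sketch.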