[04:30:13] [[Wiki server assignments]] ! 10https://wiki.toolserver.org/w/index.php?diff=8141&oldid=8124&rcid=22236 * 185.15.59.202 * (+0) (updated page) [05:35:00] 2013/09/11 05:31 WARN ortelius Load avg. WARNING - load average: 12.17, 15.80, 12.23 [05:36:00] 2013/09/11 05:35 OK ortelius Load avg. OK - load average: 9.26, 14.35, 11.95 [05:42:00] 2013/09/11 05:41 WARN wolfsbane Load avg. WARNING - load average: 27.37, 24.45, 16.12 [05:53:01] 2013/09/11 05:52 CRIT wolfsbane Load avg. CRITICAL - load average: 52.40, 27.89, 20.07 [05:56:01] 2013/09/11 05:55 WARN wolfsbane Load avg. WARNING - load average: 12.42, 22.82, 19.73 [05:57:02] 2013/09/11 05:56 CRIT wolfsbane Load avg. CRITICAL - load average: 31.46, 25.70, 20.91 [06:11:03] 2013/09/11 06:10 WARN wolfsbane Load avg. WARNING - load average: 8.29, 16.21, 19.13 [06:14:03] 2013/09/11 06:11 WARN ortelius Load avg. WARNING - load average: 23.58, 21.84, 16.35 [06:15:03] 2013/09/11 06:14 CRIT ortelius Load avg. CRITICAL - load average: 46.52, 29.07, 19.29 [06:17:03] 2013/09/11 06:16 WARN ortelius Load avg. WARNING - load average: 9.58, 21.19, 17.55 [06:17:03] 2013/09/11 06:16 CRIT wolfsbane Load avg. CRITICAL - load average: 46.67, 23.83, 20.64 [06:25:05] 2013/09/11 06:24 OK ortelius Load avg. OK - load average: 5.14, 12.05, 14.53 [06:32:05] 2013/09/11 06:25 WARN ortelius Load avg. WARNING - load average: 19.96, 17.24, 16.13 [06:37:05] 2013/09/11 06:36 OK ortelius Load avg. OK - load average: 9.75, 13.70, 14.86 [06:45:05] 2013/09/11 06:40 WARN ortelius Load avg. WARNING - load average: 17.27, 21.53, 18.93 [06:52:05] 2013/09/11 06:51 OK ortelius Load avg. 
OK - load average: 5.21, 10.74, 14.86 [07:13:09] 2013/09/11 07:08 CRIT z-dat-s3-a MySQL Can't connect to MySQL server on 'z-dat-s3-a' (110) [07:13:09] 2013/09/11 07:08 CRIT z-dat-s3-a MySQL slave Can't connect to MySQL server on 'z-dat-s3-a' (110) [07:14:09] 2013/09/11 07:12 OK z-dat-s3-a MySQL Uptime: 594466 Threads: 19 Questions: 668427527 Slow queries: 30684 Opens: 4154043 Flush tables: 1 Open tables: 16385 Queries per second avg: 1124.416 [07:14:09] 2013/09/11 07:12 OK z-dat-s3-a MySQL slave Uptime: 594466 Threads: 20 Questions: 668427523 Slow queries: 30684 Opens: 4154042 Flush tables: 1 Open tables: 16384 Queries per second avg: 1124.416 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 298 [07:23:10] 2013/09/11 07:20 WARN ortelius Load avg. WARNING - load average: 14.23, 22.00, 19.10 [07:29:11] 2013/09/11 07:28 OK ortelius Load avg. OK - load average: 6.92, 11.12, 14.88 [07:36:11] 2013/09/11 07:35 WARN wolfsbane Load avg. WARNING - load average: 2.24, 9.71, 19.08 [07:42:12] 2013/09/11 07:41 OK wolfsbane Load avg. OK - load average: 5.20, 6.36, 14.43 [08:10:04] Federico Leva (Nemo) * Re: [Toolserver-l] Investigation on migrating Jira to Bugzilla - any volunteers around? [08:20:05] Federico Leva (Nemo) * [Toolserver-l] DB replication monopoly or not [08:40:16] 2013/09/11 08:38 WARN z-dat-s4-a MySQL slave SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3317 [08:43:16] 2013/09/11 08:41 OK z-dat-s4-a MySQL slave Uptime: 599800 Threads: 1 Questions: 33739630 Slow queries: 58 Opens: 263 Flush tables: 1 Open tables: 251 Queries per second avg: 56.251 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1549 [11:02:27] 2013/09/11 11:01 CRIT z-dat-s4-a /sql CHECK_NRPE: Socket timeout after 30 seconds. 
[11:04:27] 2013/09/11 11:03 WARN z-dat-s4-a /sql DISK WARNING - free space: /sql 61568 MB (10% inode=99%): [11:06:27] 2013/09/11 11:01 CRIT z-dat-s3-a MySQL Can't connect to MySQL server on 'z-dat-s3-a' (110) [11:06:28] 2013/09/11 11:01 CRIT z-dat-s3-a MySQL slave Can't connect to MySQL server on 'z-dat-s3-a' (110) [11:07:27] 2013/09/11 11:06 OK z-dat-s3-a MySQL Uptime: 608458 Threads: 13 Questions: 684893192 Slow queries: 31553 Opens: 4178837 Flush tables: 1 Open tables: 16384 Queries per second avg: 1125.621 [11:07:28] 2013/09/11 11:06 OK z-dat-s3-a MySQL slave Uptime: 608458 Threads: 11 Questions: 684893195 Slow queries: 31553 Opens: 4178837 Flush tables: 1 Open tables: 16384 Queries per second avg: 1125.621 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 171 [19:02:55] hello [19:03:02] maintenance tonight [19:03:26] we will update the solaris hosts and reboot afterwards [19:03:36] going to prepare [19:16:15] starting updates [19:20:47] updates are running [19:36:42] download&install will take a while [20:01:56] ortelius is done [20:02:02] would reboot ortelius now [20:03:26] fits me [20:04:58] 2013/09/11 20:04 CRIT ortelius / Connection refused or timed out [20:04:58] 2013/09/11 20:04 CRIT ortelius /tmp Connection refused or timed out [20:04:58] 2013/09/11 20:03 CRIT ortelius DiskSuite Timeout while attempting connection [20:04:58] 2013/09/11 20:04 CRIT ortelius Environment IPMI Connection refused or timed out [20:04:58] 2013/09/11 20:03 CRIT ortelius Load avg. 
Timeout while attempting connection [20:04:58] 2013/09/11 20:03 CRIT ortelius NTP CRITICAL - Socket timeout after 10 seconds [20:04:59] 2013/09/11 20:04 CRIT ortelius PING CRITICAL - Host Unreachable (ortelius) [20:04:59] 2013/09/11 20:04 CRIT ortelius SMTP No route to host [20:05:00] 2013/09/11 20:03 CRIT ortelius SSH CRITICAL - Socket timeout after 10 seconds [20:05:00] 2013/09/11 20:03 CRIT ortelius Sun Grid Engine execd Timeout while attempting connection [20:05:01] 2013/09/11 20:04 CRIT ortelius toolserver.org HTTP No route to host [20:05:13] ok, willow could be next when ortelius is done [20:05:26] i'd say please safe your files now [20:06:57] 2013/09/11 20:06 OK ortelius / DISK OK - free space: / 11278 MB (37% inode=91%): [20:06:57] 2013/09/11 20:06 OK ortelius /tmp DISK OK - free space: /tmp 27026 MB (99% inode=99%): [20:06:57] 2013/09/11 20:06 OK ortelius DiskSuite OK - No disk failures detected [20:06:57] 2013/09/11 20:06 OK ortelius Environment IPMI ok: temperature ok fan ok voltage ok chassis ok [20:06:57] 2013/09/11 20:06 OK ortelius Load avg. OK - load average: 2.00, 0.85, 0.32 [20:06:57] 2013/09/11 20:06 OK ortelius PING PING OK - Packet loss = 0%, RTA = 0.14 ms [20:06:57] 2013/09/11 20:06 OK ortelius SMTP SMTP OK - 0.752 sec. response time [20:06:58] 2013/09/11 20:06 OK ortelius toolserver.org HTTP HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.018 second response time [20:07:44] ok ortelius is done [20:07:54] so ill go reboot willow in 15 min [20:07:57] 2013/09/11 20:06 OK ortelius NTP NTP OK: Offset 0.060622 secs [20:07:57] 2013/09/11 20:06 OK ortelius SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [20:07:57] 2013/09/11 20:06 WARN ortelius Sun Grid Engine execd NRPE: Unable to read output [20:10:29] nosy: huh, are you rebooting willow too? [20:10:37] yes [20:10:53] scared? ;) [20:11:05] nosy: mmt, pls, need to save my stuff. 
[20:11:18] 14 min left i think
[20:11:27] which might not be enough
[20:11:30] No, just forget that it was a SUN host
[20:11:42] Danny_B: Aahaahahahah, panic
[20:11:45] hurry!!!!
[20:12:52] amette: what about your hosts? are the updates through?
[20:13:22] nosy: still running - getting faster though
[20:13:44] nosy: You might want to update the MOTD next time ;-)
[20:13:54] "Next general maintenance window: None."
[20:13:56] damn...
[20:16:43] nosy: You should do a dump of all the running processes just before reboot. Funny to see all the zombie crap in there
[20:17:07] nope ;)
[20:17:28] I've a PHP problem :|
[20:17:31] but willow had really hard times with loads beyond 15...and about 1000 processes...
[20:17:46] Superzerocool: I know, i'll fix this for sure
[20:18:01] Superzerocool: you had the apc problem?
[20:18:17] yes..
[20:18:33] Superzerocool: ok, i'll do it after the updates
[20:18:50] http://toolserver.org/~superzerocool/wlm-2/?pais=china throws a random 500-error :|
[20:19:31] Superzerocool: funny thing is
[20:19:35] http://ortelius.toolserver.org/~superzerocool/wlm-2/?pais=china
[20:19:44] is working (the host was already rebooted)
[20:19:50] and wolfsbane does not
[20:20:03] probably it will be better once amette has rebooted it
[20:20:06] let's see
[20:20:26] maybe just for a short time... it's php 5.4, right? APC is not stable with that version.
[20:20:31] we might quite well solve this today
[20:20:40] mmm...
[20:21:02] amette: are there any newer php versions that would be suitable?
[20:21:08] I don't want to use apc...
[20:21:17] if the machines are running php 5.4 the only chance is to migrate to Zend OpCache
[20:21:25] which is now also integrated in 5.5
[20:21:41] amette: what about using php 5.5?
[20:21:43] I'd say apc is quite dead
[20:22:06] php 5.5 would be the most future proof, yes... but...
[20:22:27] ... a lot of stuff will probably break as php 5.5 is very strict.
[20:22:34] amette: i had no good feeling making such a big change in the php version
[20:22:56] yeah, 5.5 should be the next step, but it most certainly will break stuff
[20:22:58] so i thought upgrading from 5.3 to 5.4 was probably a good idea first
[20:23:10] in F3 (Fat Free Framework) the cache is disabled to try to debug something; I disabled apc in .htaccess and nothing :|
[20:23:25] yes, that's not too tricky - the php language didn't change too much there - only problem I know is apc
[20:23:26] so, php.ini is looking for apc.so :|
[20:24:29] beep beep, beep beep, beep beep, beeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeep
[20:24:59] php 5.3 is still maintained with security upgrades for about a year afaik
[20:25:11] what? false alarm?
[20:25:32] lol
[20:25:46] aah, clematis could be rebooted
[20:25:57] 2013/09/11 20:25 CRIT willow Environment IPMI Connection refused or timed out
[20:25:58] 2013/09/11 20:25 CRIT willow NTP CRITICAL - Socket timeout after 10 seconds
[20:25:58] 2013/09/11 20:25 CRIT willow SMTP CRITICAL - Socket timeout after 10 seconds
[20:25:58] 2013/09/11 20:25 CRIT willow SSH No route to host
[20:25:58] 2013/09/11 20:25 CRIT willow Sun Grid Engine execd Connection refused or timed out
[20:26:09] amette: go ahead
[20:26:13] nosy: ok
[20:26:57] 2013/09/11 20:25 CRIT willow / Connection refused or timed out
[20:26:57] 2013/09/11 20:26 CRIT willow /tmp Connection refused or timed out
[20:26:57] 2013/09/11 20:25 CRIT willow Load avg. Connection refused or timed out
[20:26:57] 2013/09/11 20:26 CRIT willow PING CRITICAL - Host Unreachable (willow)
[20:27:57] 2013/09/11 20:27 OK willow /tmp DISK OK - free space: / 40147 MB (38% inode=99%):
[20:27:57] 2013/09/11 20:27 OK willow Environment IPMI ok: temperature ok fan ok voltage ok chassis ok
[20:27:57] 2013/09/11 20:27 OK willow Load avg. OK - load average: 1.74, 0.55, 0.20
[20:27:57] 2013/09/11 20:27 OK willow NTP NTP OK: Offset 0.046049 secs
[20:27:57] 2013/09/11 20:27 OK willow PING PING OK - Packet loss = 0%, RTA = 0.12 ms
[20:27:57] 2013/09/11 20:27 OK willow SMTP SMTP OK - 0.109 sec. response time
[20:27:58] 2013/09/11 20:27 OK willow SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[20:27:58] 2013/09/11 20:27 WARN willow Sun Grid Engine execd NRPE: Unable to read output
[20:28:57] 2013/09/11 20:27 OK willow / DISK OK - free space: / 40146 MB (38% inode=99%):
[20:29:54] willow is back
[20:30:13] clematis, too
[20:30:27] nosy: I'll reboot hawthorn
[20:30:35] amette: does svcs -xv show anything?
[20:30:45] ok
[20:31:09] yes, execd repeatedly died
[20:31:28] please do a svcadm clear for this
[20:33:24] ok, done
[20:33:56] cool
[20:34:01] Fatal error: PHP Startup: zip: Unable to initialize module Module compiled with module API=20090626 PHP compiled with module API=20100525 These options need to match
[20:34:47] weird, ssh not restarted, but screen terminated, how come?
[20:34:49] Superzerocool: yes i am afraid i forgot half of the php update - i'll build the modules now
[20:35:25] root@willow:~# uptime
[20:35:25] 20:35pm up 0:08, 4 users, load average: 0.38, 0.39, 0.26
[20:35:25] I'm sorry :$
[20:35:29] don't know
[20:35:36] Superzerocool: no problem, was me
[20:36:20] hawthorn is back
[20:36:35] nosy: clearing execd on hawthorn too, I presume?
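Editor's note on the fatal error quoted at [20:34:01]: the two "module API" numbers identify which PHP release series the zip extension and the PHP binary were each built for, which fits the explanation that only half of the 5.3 to 5.4 update had been done. The snippet below is a minimal sketch of how such a message could be decoded. The function name `explain_api_mismatch` and the lookup table `PHP_MODULE_API` are hypothetical helpers introduced here, and the number-to-version mapping is an assumption from general PHP knowledge, not something stated in the log.

```python
import re

# Assumption: PHP extension "module API" numbers and the release series
# they are commonly associated with (not taken from the log itself).
PHP_MODULE_API = {
    "20090626": "PHP 5.3",
    "20100525": "PHP 5.4",
    "20121212": "PHP 5.5",
}

# Matches both halves of the startup error:
#   "Module compiled with module API=<n>" and "PHP compiled with module API=<n>"
API_PATTERN = re.compile(r"(Module|PHP) compiled with module API=(\d+)")


def explain_api_mismatch(message):
    """Return (module_series, php_series) for a PHP startup error message.

    Returns None when the message does not contain both API numbers.
    Unknown API numbers are reported as "unknown".
    """
    found = {who: api for who, api in API_PATTERN.findall(message)}
    if "Module" not in found or "PHP" not in found:
        return None
    return (
        PHP_MODULE_API.get(found["Module"], "unknown"),
        PHP_MODULE_API.get(found["PHP"], "unknown"),
    )


# The message as it appears in the channel at 20:34:01.
message = (
    "Fatal error: PHP Startup: zip: Unable to initialize module "
    "Module compiled with module API=20090626 "
    "PHP compiled with module API=20100525 "
    "These options need to match"
)

print(explain_api_mismatch(message))
```

Under the assumed mapping, the result reads as "extension built for 5.3, interpreter is 5.4", so rebuilding the extension modules against the new PHP (as announced at 20:34:49) is the expected fix.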
[20:36:42] yes [20:36:53] it has to do with the boot range i guess [20:36:56] ok, done [20:37:09] and wolfsbane is ready for rebooting [20:37:16] i think execd tries to start before /sge is mounted and fails to find the commands then [20:37:17] ok [20:37:30] ok, wolfsbane rebooting [20:39:58] 2013/09/11 20:38 CRIT wolfsbane / Timeout while attempting connection [20:39:58] 2013/09/11 20:39 CRIT wolfsbane /tmp Connection refused or timed out [20:39:58] 2013/09/11 20:38 CRIT wolfsbane Environment IPMI Connection refused or timed out [20:39:58] 2013/09/11 20:39 CRIT wolfsbane Load avg. Connection refused or timed out [20:39:58] 2013/09/11 20:39 CRIT wolfsbane PING CRITICAL - Host Unreachable (wolfsbane) [20:39:58] 2013/09/11 20:39 CRIT wolfsbane SMTP CRITICAL - Socket timeout after 10 seconds [20:39:59] 2013/09/11 20:38 CRIT wolfsbane Sun Grid Engine execd Timeout while attempting connection [20:40:58] 2013/09/11 20:39 CRIT wolfsbane NTP CRITICAL - Socket timeout after 10 seconds [20:40:58] 2013/09/11 20:39 CRIT wolfsbane SSH CRITICAL - Socket timeout after 10 seconds [20:40:58] 2013/09/11 20:39 CRIT wolfsbane toolserver.org HTTP CRITICAL - Socket timeout after 10 seconds [20:41:58] 2013/09/11 20:41 OK wolfsbane /tmp DISK OK - free space: / 13337 MB (44% inode=93%): [20:41:58] 2013/09/11 20:41 OK wolfsbane Environment IPMI ok: temperature ok fan ok voltage ok chassis ok [20:41:58] 2013/09/11 20:41 OK wolfsbane Load avg. OK - load average: 2.02, 0.75, 0.28 [20:41:58] 2013/09/11 20:41 OK wolfsbane PING PING OK - Packet loss = 0%, RTA = 0.25 ms [20:41:58] 2013/09/11 20:41 OK wolfsbane SMTP SMTP OK - 0.717 sec. 
response time [20:41:58] 2013/09/11 20:41 OK wolfsbane SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [20:42:58] 2013/09/11 20:41 OK wolfsbane / DISK OK - free space: / 13336 MB (44% inode=93%): [20:42:58] 2013/09/11 20:41 WARN wolfsbane Sun Grid Engine execd NRPE: Unable to read output [20:42:58] 2013/09/11 20:41 OK wolfsbane toolserver.org HTTP HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.009 second response time [20:43:32] wolfsbane back, execd cleared, all looking good [21:01:00] 1970/01/01 00:00 OK clematis PING OK - Packet loss = 0%, RTA = 0.16 ms [21:01:00] 2013/09/11 21:00 CRIT hawthorne check_ping: Invalid hostname/address - hawthorne [21:01:00] 1970/01/01 00:00 OK clematis toolserver.org HTTP [21:01:00] 2013/09/11 21:00 CRIT hawthorne toolserver.org HTTP Name or service not known [21:03:42] amette: its hawthorn [21:03:50] not hawthorne [21:04:23] jupp, just noticed, too [21:04:43] in which groups do they belong? I put them into solaris servers only for now [21:04:51] esam-solaris [21:04:57] probably also esam-sge? [21:05:00] 2013/09/11 21:04 OK hawthorn PING OK - Packet loss = 0%, RTA = 0.23 ms [21:05:00] 2013/09/11 21:04 CRIT hawthorn toolserver.org HTTP Connection refused [21:08:01] 2013/09/11 21:07 ?? clematis / CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:08:02] 2013/09/11 21:07 ?? clematis /tmp CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:08:02] 2013/09/11 21:07 ?? clematis Environment IPMI CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:08:02] 2013/09/11 21:07 ?? clematis Load avg. CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:08:02] 1970/01/01 00:00 OK clematis NTP [21:08:02] 1970/01/01 00:00 OK clematis PING [21:08:02] 2013/09/11 21:07 OK clematis SMTP SMTP OK - 0.003 sec. 
response time [21:08:02] 2013/09/11 21:07 OK clematis SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [21:08:03] 2013/09/11 21:07 ?? clematis Sun Grid Engine execd CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:08:03] 2013/09/11 21:01 CRIT clematis toolserver.org HTTP Connection refused [21:08:04] 2013/09/11 21:07 ?? hawthorn / CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:08:04] 1970/01/01 00:00 OK hawthorn /tmp [21:11:00] 2013/09/11 21:04 CRIT hawthorn toolserver.org HTTP Connection refused [21:14:00] 2013/09/11 21:07 ?? clematis / CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:00] 2013/09/11 21:07 ?? clematis /tmp CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:00] 2013/09/11 21:07 ?? clematis Environment IPMI CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:00] 2013/09/11 21:07 ?? clematis Load avg. CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:00] 2013/09/11 21:07 ?? clematis Sun Grid Engine execd CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:00] 2013/09/11 21:07 ?? hawthorn / CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:00] 2013/09/11 21:07 ?? hawthorn Load avg. CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:14:01] 2013/09/11 21:07 ?? hawthorn Sun Grid Engine execd CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:15:00] 2013/09/11 21:08 ?? hawthorn /tmp CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [21:15:00] 2013/09/11 21:08 ?? hawthorn Environment IPMI CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [21:22:09] ill reboot damiana now [21:25:00] 2013/09/11 21:24 CRIT clematis / NRPE: Command 'check_root' not defined [21:25:00] 2013/09/11 21:24 OK clematis /tmp DISK OK - free space: /tmp 4870 MB (100% inode=99%): [21:25:00] 2013/09/11 21:24 OK clematis Environment IPMI ok: temperature ok fan ok voltage ok chassis ok [21:25:00] 2013/09/11 21:24 OK clematis Load avg. OK - load average: 0.04, 0.02, 0.02 [21:25:00] 2013/09/11 21:24 CRIT clematis Sun Grid Engine execd NRPE: Command 'check_sge_execd' not defined [21:25:00] 2013/09/11 21:24 CRIT damiana DiskSuite Connection refused or timed out [21:25:01] 2013/09/11 21:24 CRIT damiana NTP CRITICAL - Socket timeout after 10 seconds [21:25:01] 2013/09/11 21:24 CRIT damiana PING CRITICAL - Host Unreachable (damiana) [21:25:02] 2013/09/11 21:24 CRIT damiana SSH No route to host [21:25:02] 2013/09/11 21:24 CRIT damiana ts-array5 Connection refused or timed out [21:26:00] 2013/09/11 21:25 CRIT damiana / Connection refused or timed out [21:26:00] 2013/09/11 21:24 CRIT damiana /tmp Connection refused or timed out [21:26:00] 2013/09/11 21:24 CRIT damiana Environment IPMI Connection refused or timed out [21:26:00] 2013/09/11 21:25 CRIT damiana Free Memory Connection refused or timed out [21:26:00] 2013/09/11 21:24 CRIT damiana Load avg. 
Connection refused or timed out [21:26:00] 2013/09/11 21:25 CRIT damiana SMTP No route to host [21:26:01] 2013/09/11 21:25 CRIT ha-dns-auth Authoritative DNS CRITICAL - Plugin timed out while executing system call [21:26:01] 2013/09/11 21:25 CRIT ha-dns-auth PING CRITICAL - Host Unreachable (ha-dns-auth) [21:26:02] 2013/09/11 21:24 CRIT ha-dns-recursor.esi DNS recursor CRITICAL - Plugin timed out while executing system call [21:26:02] 2013/09/11 21:25 CRIT ha-dns-recursor.esi PING CRITICAL - Host Unreachable (ha-dns-recursor.esi) [21:26:03] 2013/09/11 21:25 CRIT ha-ldap.esi LDAP Could not bind to the LDAP server [21:26:03] 2013/09/11 21:25 CRIT ha-ldap.esi PING CRITICAL - Host Unreachable (ha-ldap.esi) [21:27:01] 2013/09/11 21:26 OK damiana /tmp DISK OK - free space: /tmp 14490 MB (99% inode=99%): [21:27:01] 2013/09/11 21:26 OK damiana Load avg. OK - load average: 1.64, 0.68, 0.26 [21:27:01] 2013/09/11 21:26 OK damiana PING PING OK - Packet loss = 0%, RTA = 0.16 ms [21:27:01] 2013/09/11 21:26 OK damiana ts-array5 2/2 paths are active [21:27:01] 2013/09/11 21:26 OK ha-nfs.esi PING PING OK - Packet loss = 0%, RTA = 0.28 ms [21:27:01] 2013/09/11 21:26 OK hawthorn /tmp DISK OK - free space: /tmp 4774 MB (99% inode=99%): [21:28:00] 2013/09/11 21:24 CRIT adenia SMTP CRITICAL - Socket timeout after 10 seconds [21:28:00] 2013/09/11 21:27 OK clematis / DISK OK - free space: / 19245 MB (64% inode=93%): [21:28:00] 2013/09/11 21:27 ?? clematis Sun Grid Engine execd option -w not recognized [21:28:00] 2013/09/11 21:27 OK damiana / DISK OK - free space: / 22072 MB (30% inode=95%): [21:28:00] 2013/09/11 21:26 OK damiana Environment IPMI ok: temperature ok fan ok voltage ok chassis ok [21:28:00] 2013/09/11 21:27 OK damiana Free Memory OK - 88.1% (7382120 kB) free. 
[21:28:01] 2013/09/11 21:27 OK damiana SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [21:28:01] 2013/09/11 21:27 OK hawthorn / DISK OK - free space: / 18785 MB (62% inode=93%): [21:28:02] 2013/09/11 21:24 CRIT nightshade SMTP CRITICAL - Socket timeout after 10 seconds [21:30:00] 2013/09/11 21:29 CRIT ha-dns-auth CRITICAL - Host Unreachable (ha-dns-auth) [21:30:00] 2013/09/11 21:29 CRIT ha-ldap.esi CRITICAL - Host Unreachable (ha-ldap.esi) [21:30:00] 2013/09/11 21:29 CRIT ha-proxy.esi CRITICAL - Host Unreachable (ha-proxy.esi) [21:30:00] 2013/09/11 21:24 CRIT damiana DiskSuite CRITICAL - submirror d51 of mirror d50 is "Needs" and submirror d52 of mirror d50 is "Needs" and submirror d31 of mirror d30 is "Needs" and submirror d32 of mirror d30 is "Needs" and submirror d11 of mirror d10 is "Needs" and submirror d12 of mirror d10 is "Needs" and submirror d21 of mirror d20 is "Needs" and submirror d22 of mirror d20 is "Needs" [21:30:00] 2013/09/11 21:24 CRIT damiana NTP NTP CRITICAL: No response from NTP server [21:31:00] 2013/09/11 21:30 CRIT ha-dns-recursor.esi CRITICAL - Host Unreachable (ha-dns-recursor.esi) [21:31:00] 2013/09/11 21:24 CRIT cassia SMTP CRITICAL - Socket timeout after 10 seconds [21:31:00] 2013/09/11 21:24 CRIT hawthorn SMTP CRITICAL - Socket timeout after 10 seconds [21:31:00] 2013/09/11 21:23 CRIT hemlock /home CHECK_NRPE: Socket timeout after 30 seconds. [21:31:00] 2013/09/11 21:24 CRIT hyacinth SMTP CRITICAL - Socket timeout after 10 seconds [21:31:01] 2013/09/11 21:24 ?? 
nightshade Sun Grid Engine execd Execution timeout exceeded [21:31:01] 2013/09/11 21:24 CRIT nightshade aliasd CRITICAL - Socket timeout after 10 seconds [21:31:01] 2013/09/11 21:24 CRIT ortelius SMTP CRITICAL - Socket timeout after 10 seconds [21:31:02] 2013/09/11 21:24 CRIT ortelius toolserver.org HTTP CRITICAL - Socket timeout after 10 seconds [21:31:02] 2013/09/11 21:24 CRIT ptolemy SMTP CRITICAL - Socket timeout after 10 seconds [21:31:03] 2013/09/11 21:23 CRIT rosemary SMTP CRITICAL - Socket timeout after 10 seconds [21:31:03] 2013/09/11 21:23 CRIT thyme SMTP CRITICAL - Socket timeout after 10 seconds [21:32:01] 2013/09/11 21:25 CRIT ha-nfs.esi NFS Connection refused [21:32:01] 2013/09/11 21:25 CRIT willow SMTP CRITICAL - Socket timeout after 10 seconds [21:32:01] 2013/09/11 21:24 CRIT yarrow SMTP CRITICAL - Socket timeout after 10 seconds [21:33:01] 2013/09/11 21:26 CRIT clematis SMTP CRITICAL - Socket timeout after 10 seconds [21:33:01] 2013/09/11 21:32 OK damiana DiskSuite OK - No disk failures detected [21:33:01] 2013/09/11 21:25 CRIT damiana SMTP CRITICAL - Socket timeout after 10 seconds [21:33:01] 2013/09/11 21:32 OK hawthorn SMTP SMTP OK - 5.612 sec. response time [21:33:01] 2013/09/11 21:28 CRIT nightshade Load avg. CRITICAL - load average: 42.93, 30.38, 15.48 [21:34:02] 2013/09/11 21:33 WARN ortelius toolserver.org HTTP HTTP WARNING: HTTP/1.1 404 Not found - 161 bytes in 0.009 second response time [21:34:02] 2013/09/11 21:27 CRIT wolfsbane SMTP CRITICAL - Socket timeout after 10 seconds [21:34:02] 2013/09/11 21:32 WARN wolfsbane toolserver.org HTTP HTTP WARNING: HTTP/1.1 404 Not found - 161 bytes in 0.005 second response time [21:35:01] 2013/09/11 21:34 OK ha-dns-auth PING OK - Packet loss = 0%, RTA = 0.42 ms [21:35:01] 2013/09/11 21:34 OK ha-dns-recursor.esi PING OK - Packet loss = 0%, RTA = 0.20 ms [21:35:01] 2013/09/11 21:34 OK ha-proxy.esi PING OK - Packet loss = 0%, RTA = 0.16 ms [21:35:01] 2013/09/11 21:34 OK adenia SMTP SMTP OK - 5.924 sec. 
response time [21:35:01] 2013/09/11 21:34 OK cassia SMTP SMTP OK - 0.007 sec. response time [21:35:01] 2013/09/11 21:34 OK clematis SMTP SMTP OK - 5.018 sec. response time [21:35:02] 2013/09/11 21:34 OK damiana NTP NTP OK: Offset -0.013473 secs [21:35:02] 2013/09/11 21:34 OK damiana SMTP SMTP OK - 5.016 sec. response time [21:35:03] 2013/09/11 21:34 OK ha-dns-auth Authoritative DNS DNS OK: 0.263 seconds response time. 1.www.toolserver.org returns 91.198.174.203 [21:35:03] 2013/09/11 21:34 OK ha-dns-auth PING PING OK - Packet loss = 0%, RTA = 0.42 ms [21:35:04] 2013/09/11 21:34 OK ha-dns-recursor.esi DNS recursor DNS OK: 0.259 seconds response time. www.google.com returns 74.125.128.103,74.125.128.104,74.125.128.105,74.125.128.106,74.125.128.147,74.125.128.99 [21:35:04] 2013/09/11 21:34 OK ha-dns-recursor.esi PING PING OK - Packet loss = 0%, RTA = 0.20 ms [21:36:02] 2013/09/11 21:35 OK ha-sql.esi PING PING OK - Packet loss = 0%, RTA = 0.16 ms [21:36:02] 2013/09/11 21:35 OK hyacinth SMTP SMTP OK - 0.003 sec. response time [21:36:02] 2013/09/11 21:34 OK rosemary SMTP SMTP OK - 0.020 sec. response time [21:36:02] 2013/09/11 21:34 OK thyme SMTP SMTP OK - 0.010 sec. response time [21:36:02] 2013/09/11 21:35 OK willow SMTP SMTP OK - 0.002 sec. response time [21:37:02] 2013/09/11 21:36 OK nightshade Load avg. OK - load average: 1.06, 14.08, 12.19 [21:41:01] known problem with the webservers? 
[21:41:08] have to fight with ldap [21:41:11] please wait [21:41:40] ok, as long as it is known, then I trust it will be fixed [21:42:02] 2013/09/11 21:41 OK ha-nfs.esi NFS TCP OK - 0.005 second response time on port 2049 [21:59:03] 2013/09/11 21:58 CRIT damiana DiskSuite Timeout while attempting connection [21:59:03] 2013/09/11 21:58 CRIT damiana NTP CRITICAL - Socket timeout after 10 seconds [21:59:03] 2013/09/11 21:58 CRIT damiana ts-array5 Connection refused or timed out [21:59:03] 2013/09/11 21:58 CRIT ha-nfs.esi PING CRITICAL - Host Unreachable (ha-nfs.esi) [21:59:03] 2013/09/11 21:58 CRIT ortelius toolserver.org HTTP CRITICAL - Socket timeout after 10 seconds [22:00:03] 2013/09/11 21:59 CRIT damiana / Connection refused or timed out [22:00:03] 2013/09/11 21:58 CRIT damiana /tmp Connection refused or timed out [22:00:03] 2013/09/11 21:58 CRIT damiana Environment IPMI Connection refused or timed out [22:00:03] 2013/09/11 21:59 CRIT damiana Free Memory Connection refused or timed out [22:00:03] 2013/09/11 21:58 CRIT damiana Load avg. Connection refused or timed out [22:00:03] 2013/09/11 21:59 CRIT damiana PING CRITICAL - Host Unreachable (damiana) [22:00:03] 2013/09/11 21:59 CRIT damiana SMTP No route to host [22:00:05] 2013/09/11 21:59 CRIT damiana SSH No route to host [22:00:05] 2013/09/11 21:59 CRIT ha-dns-auth Authoritative DNS CRITICAL - Plugin timed out while executing system call [22:00:05] 2013/09/11 21:59 CRIT ha-dns-auth PING CRITICAL - Host Unreachable (ha-dns-auth) [22:00:06] 2013/09/11 21:58 CRIT ha-dns-recursor.esi DNS recursor CRITICAL - Plugin timed out while executing system call [22:00:06] 2013/09/11 21:59 CRIT ha-dns-recursor.esi PING CRITICAL - Host Unreachable (ha-dns-recursor.esi) [22:01:04] 2013/09/11 22:00 OK damiana /tmp DISK OK - free space: /tmp 14656 MB (99% inode=99%): [22:01:04] 2013/09/11 22:00 OK damiana Load avg. 
OK - load average: 1.72, 0.66, 0.25 [22:01:04] 2013/09/11 22:00 OK damiana PING PING OK - Packet loss = 0%, RTA = 0.22 ms [22:01:04] 2013/09/11 22:00 OK ha-nfs.esi PING PING OK - Packet loss = 0%, RTA = 0.50 ms [22:02:04] 2013/09/11 22:01 CRIT ha-dns-auth CRITICAL - Host Unreachable (ha-dns-auth) [22:02:04] 2013/09/11 21:58 CRIT adenia SMTP CRITICAL - Socket timeout after 10 seconds [22:02:04] 2013/09/11 22:01 OK damiana / DISK OK - free space: / 23062 MB (32% inode=95%): [22:02:04] 2013/09/11 22:00 OK damiana Environment IPMI ok: temperature ok fan ok voltage ok chassis ok [22:02:04] 2013/09/11 22:01 OK damiana Free Memory OK - 86.4% (7236020 kB) free. [22:02:04] 2013/09/11 22:01 OK damiana SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [22:02:05] 2013/09/11 22:01 OK damiana ts-array5 2/2 paths are active [22:02:05] 2013/09/11 21:58 CRIT nightshade SMTP CRITICAL - Socket timeout after 10 seconds [22:03:04] 2013/09/11 22:02 CRIT ha-dns-recursor.esi CRITICAL - Host Unreachable (ha-dns-recursor.esi) [22:03:04] 2013/09/11 22:02 CRIT ha-proxy.esi CRITICAL - Host Unreachable (ha-proxy.esi) [22:05:04] 2013/09/11 22:04 CRIT ha-sql.esi CRITICAL - Host Unreachable (ha-sql.esi) [22:05:04] 2013/09/11 21:58 CRIT cassia SMTP CRITICAL - Socket timeout after 10 seconds [22:05:04] 2013/09/11 21:58 CRIT damiana DiskSuite CRITICAL - submirror d51 of mirror d50 is "Needs" and submirror d52 of mirror d50 is "Needs" and submirror d31 of mirror d30 is "Needs" and submirror d32 of mirror d30 is "Needs" and submirror d11 of mirror d10 is "Needs" and submirror d12 of mirror d10 is "Needs" and submirror d21 of mirror d20 is "Needs" and submirror d22 of mirror d20 is "Needs" [22:05:04] 2013/09/11 21:58 CRIT damiana NTP CRITICAL - Socket timeout after 10 seconds [22:05:04] 2013/09/11 21:58 CRIT hawthorn SMTP CRITICAL - Socket timeout after 10 seconds [22:05:04] 2013/09/11 21:58 CRIT hemlock SMTP CRITICAL - Socket timeout after 10 seconds [22:05:05] 2013/09/11 21:58 ?? 
nightshade Sun Grid Engine execd Execution timeout exceeded [22:05:05] 2013/09/11 21:58 CRIT ptolemy SMTP CRITICAL - Socket timeout after 10 seconds [22:05:06] 2013/09/11 21:58 CRIT wolfsbane SMTP CRITICAL - Socket timeout after 10 seconds [22:05:06] 2013/09/11 21:58 CRIT z-dat-s1-b SMTP CRITICAL - Socket timeout after 10 seconds [22:05:07] 2013/09/11 21:58 CRIT z-dat-s5-b SMTP CRITICAL - Socket timeout after 10 seconds [22:06:04] 2013/09/11 21:59 CRIT clematis SMTP CRITICAL - Socket timeout after 10 seconds [22:06:04] 2013/09/11 22:05 OK hawthorn SMTP SMTP OK - 8.320 sec. response time [22:06:04] 2013/09/11 21:59 CRIT hyacinth SMTP CRITICAL - Socket timeout after 10 seconds [22:06:04] 2013/09/11 21:58 CRIT rosemary SMTP CRITICAL - Socket timeout after 10 seconds [22:06:04] 2013/09/11 21:58 CRIT thyme SMTP CRITICAL - Socket timeout after 10 seconds [22:06:04] 2013/09/11 21:59 CRIT willow SMTP CRITICAL - Socket timeout after 10 seconds [22:06:05] 2013/09/11 21:58 CRIT yarrow SMTP CRITICAL - Socket timeout after 10 seconds [22:06:05] 2013/09/11 21:58 CRIT z-dat-s2-b SMTP CRITICAL - Socket timeout after 10 seconds [22:07:01] NFS server ha-nfs.esi not responding still trying [22:08:04] 2013/09/11 22:07 OK damiana DiskSuite OK - No disk failures detected [22:08:04] 2013/09/11 22:07 WARN ortelius toolserver.org HTTP HTTP WARNING: HTTP/1.1 404 Not found - 161 bytes in 0.014 second response time [22:08:04] 2013/09/11 22:07 OK willow SMTP SMTP OK - 5.031 sec. response time [22:08:04] 2013/09/11 22:07 OK wolfsbane SMTP SMTP OK - 5.008 sec. 
response time [22:08:04] 2013/09/11 22:06 WARN wolfsbane toolserver.org HTTP HTTP WARNING: HTTP/1.1 404 Not found - 161 bytes in 0.011 second response time [22:09:04] 2013/09/11 22:08 OK ha-dns-auth PING OK - Packet loss = 0%, RTA = 564.72 ms [22:09:04] 2013/09/11 22:08 OK ha-dns-recursor.esi PING OK - Packet loss = 0%, RTA = 0.34 ms [22:09:04] 2013/09/11 22:08 OK ha-proxy.esi PING OK - Packet loss = 16%, RTA = 158.17 ms [22:09:04] 2013/09/11 22:08 OK ha-sql.esi PING OK - Packet loss = 0%, RTA = 0.29 ms [22:09:04] 2013/09/11 22:08 OK adenia SMTP SMTP OK - 5.909 sec. response time [22:09:04] 2013/09/11 22:08 OK cassia SMTP SMTP OK - 0.002 sec. response time [22:09:05] 2013/09/11 22:08 OK clematis SMTP SMTP OK - 0.002 sec. response time [22:09:05] 2013/09/11 22:08 OK damiana NTP NTP OK: Offset -0.056573 secs [22:09:06] 2013/09/11 22:08 OK ha-dns-auth Authoritative DNS DNS OK: 0.236 seconds response time. 1.www.toolserver.org returns 91.198.174.203 [22:09:06] 2013/09/11 22:08 OK ha-dns-auth PING PING OK - Packet loss = 0%, RTA = 0.33 ms [22:09:07] 2013/09/11 22:08 OK ha-dns-recursor.esi DNS recursor DNS OK: 0.193 seconds response time. www.google.com returns 74.125.128.103,74.125.128.104,74.125.128.105,74.125.128.106,74.125.128.147,74.125.128.99 [22:09:07] 2013/09/11 22:08 OK ha-dns-recursor.esi PING PING OK - Packet loss = 0%, RTA = 0.29 ms [22:10:05] 2013/09/11 22:09 OK ha-proxy.esi PING PING OK - Packet loss = 0%, RTA = 0.19 ms [22:10:05] 2013/09/11 22:09 OK ha-sql.esi PING PING OK - Packet loss = 0%, RTA = 0.16 ms [22:10:05] 2013/09/11 22:09 OK hyacinth SMTP SMTP OK - 0.003 sec. response time [22:10:05] 2013/09/11 22:08 OK rosemary SMTP SMTP OK - 0.008 sec. response time [22:10:05] 2013/09/11 22:08 OK thyme SMTP SMTP OK - 3.394 sec. 
response time [22:14:04] 2013/09/11 22:13 OK ortelius toolserver.org HTTP HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.004 second response time [22:14:04] 2013/09/11 22:12 CRIT wolfsbane toolserver.org HTTP CRITICAL - Socket timeout after 10 seconds [22:15:05] 2013/09/11 22:13 OK hemlock /home DISK OK - free space: /home 14247 MB (28% inode=81%): [22:15:05] 2013/09/11 22:13 OK wolfsbane toolserver.org HTTP HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.012 second response time [23:12:07] 2013/09/11 23:11 OK ha-ldap.esi PING OK - Packet loss = 0%, RTA = 0.19 ms [23:13:07] 2013/09/11 23:12 OK ha-ldap.esi PING PING OK - Packet loss = 0%, RTA = 0.24 ms [23:13:39] ts is back i think