[00:03:01] Load avg. on willow is WARNING: WARNING - load average: 18.64, 15.33, 11.23 [00:05:01] Load avg. on willow is OK: OK - load average: 13.80, 14.67, 11.50 [00:13:02] Load avg. on willow is WARNING: WARNING - load average: 15.93, 16.05, 13.45 [00:21:01] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.239258/1.95, alarm hl:np_load_avg=2.000977/2.0, alarm hl:mem_free=262.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.239258/2.3, alarm hl:np_load_long=1.802734/2.5, alarm hl:cpu=99.000000/98, alarm hl:mem_free=262.000000M/150M, alarm hl:available=1/0 [00:22:02] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:23:21] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [00:24:55] (commented) [UTRS-98] Account creation is impossible <https://jira.toolserver.org/browse/UTRS-98> (DeltaQuad) [00:25:02] Sun Grid Engine execd on willow is WARNING: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.878906/2.3, alarm hl:np_load_long=1.798828/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=377.000000M/150M, alarm hl:available=1/0 [00:29:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:29:03] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[00:33:00] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:40:28] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 437482 MB (8% inode=42%): [00:44:56] (assigned) [UTRS-102] Add link to [[Special:BlockList]] in account links <https://jira.toolserver.org/browse/UTRS-102> (DeltaQuad) [00:44:59] (work started) [UTRS-102] Add link to [[Special:BlockList]] in account links <https://jira.toolserver.org/browse/UTRS-102> (DeltaQuad) [00:49:57] (work stopped) [UTRS-102] Add link to [[Special:BlockList]] in account links <https://jira.toolserver.org/browse/UTRS-102> (DeltaQuad) [00:54:58] (resolved) [UTRS-102] Add link to [[Special:BlockList]] in account links <https://jira.toolserver.org/browse/UTRS-102> (DeltaQuad) [00:56:09] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [00:59:09] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [01:22:01] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.790039/1.95, alarm hl:np_load_avg=1.844726/2.0, alarm hl:mem_free=336.000000M/350M, alarm hl:available=1/0 [01:23:09] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [01:24:20] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [01:26:09] Sun Grid Engine execd on willow is WARNING: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.808594/2.3, alarm hl:np_load_long=1.830566/2.5, alarm hl:cpu=98.000000/98, alarm hl:mem_free=382.000000M/150M, alarm hl:available=1/0 [01:27:03] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [01:29:10] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:29:10] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[01:40:29] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 437455 MB (8% inode=42%): [01:49:10] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.568848/1.95, alarm hl:np_load_avg=1.639160/2.0, alarm hl:mem_free=166.000000M/350M, alarm hl:available=1/0 [01:56:11] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [01:56:11] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [01:59:09] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.643066/1.95, alarm hl:np_load_avg=1.647461/2.0, alarm hl:mem_free=125.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.643066/2.3, alarm hl:np_load_long=1.672851/2.5, alarm hl:cpu=98.600000/98, alarm hl:mem_free=125.000000M/150M, alarm hl:available=1/0 [01:59:09] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [02:03:01] Load avg. on willow is WARNING: WARNING - load average: 14.28, 15.72, 14.55 [02:07:09] Load avg. on willow is OK: OK - load average: 12.65, 14.79, 14.49 [02:09:10] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [02:24:30] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [02:24:30] /tmp on willow is WARNING: DISK WARNING - free space: / 21991 MB (20% inode=99%): [02:24:39] / on willow is WARNING: DISK WARNING - free space: / 21990 MB (20% inode=99%): [02:29:09] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:29:10] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [02:33:10] Load avg. on willow is WARNING: WARNING - load average: 14.11, 15.68, 14.95 [02:35:10] Load avg. 
on willow is OK: OK - load average: 12.77, 14.73, 14.69 [02:40:30] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435468 MB (8% inode=42%): [02:42:19] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.797363/1.95, alarm hl:np_load_avg=1.822754/2.0, alarm hl:mem_free=263.000000M/350M, alarm hl:available=1/0 [02:45:10] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.070312/1.00, alarm hl:np_load_long=0.916992/1.50, alarm hl:mem_free=18014.000000M/350M, alarm hl:available=1/0 [02:46:09] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [02:48:09] Load avg. on willow is WARNING: WARNING - load average: 14.55, 15.23, 14.90 [02:48:19] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [02:51:19] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.719726/1.95, alarm hl:np_load_avg=1.795899/2.0, alarm hl:mem_free=217.000000M/350M, alarm hl:available=1/0 [02:53:19] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [02:57:08] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [02:59:19] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [03:04:10] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.558594/1.10, alarm hl:np_load_long=0.896485/1.55, alarm hl:mem_free=18844.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.558594/1.00, alarm hl:np_load_long=0.896485/1.50, alarm hl:mem_free=18844.000000M/350M, alarm hl:available=1/0 [03:06:19] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [03:11:58] (commented) [UTRS-102] Add link to [[Special:BlockList]] 
in account links <https://jira.toolserver.org/browse/UTRS-102> (Chris Howie) [03:13:10] Load avg. on willow is WARNING: WARNING - load average: 21.17, 19.04, 16.85 [03:20:30] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.219727/1.95, alarm hl:np_load_avg=2.261230/2.0, alarm hl:mem_free=136.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.219727/2.3, alarm hl:np_load_long=2.167969/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=136.000000M/150M, alarm hl:available=1/0 [03:24:29] /tmp on willow is WARNING: DISK WARNING - free space: / 21848 MB (20% inode=99%): [03:24:30] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [03:25:39] / on willow is WARNING: DISK WARNING - free space: / 21846 MB (20% inode=99%): [03:29:19] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [03:29:19] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:37:09] Load avg. 
on willow is OK: OK - load average: 10.53, 12.77, 14.85 [03:40:39] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435539 MB (8% inode=42%): [03:42:20] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [03:57:19] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [03:58:20] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.523926/1.95, alarm hl:np_load_avg=1.447266/2.0, alarm hl:mem_free=287.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.523926/2.3, alarm hl:np_load_long=1.537598/2.5, alarm hl:cpu=98.800000/98, alarm hl:mem_free=287.000000M/150M, alarm hl:available=1/0 [03:59:19] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [04:00:20] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [04:02:21] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.014649/1.00, alarm hl:np_load_long=0.634765/1.50, alarm hl:mem_free=17820.000000M/350M, alarm hl:available=1/0 [04:03:09] Load avg. on willow is WARNING: WARNING - load average: 14.64, 15.05, 13.70 [04:03:20] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:04:09] Load avg. on willow is OK: OK - load average: 13.51, 14.64, 13.64 [04:08:19] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.165039/1.10, alarm hl:np_load_long=0.738281/1.55, alarm hl:mem_free=16836.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.165039/1.00, alarm hl:np_load_long=0.738281/1.50, alarm hl:mem_free=16836.000000M/350M, alarm hl:available=1/0 [04:12:10] Load avg. 
on willow is WARNING: WARNING - load average: 17.67, 16.50, 14.78 [04:22:20] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.637695/1.95, alarm hl:np_load_avg=1.826660/2.0, alarm hl:mem_free=335.000000M/350M, alarm hl:available=1/0 [04:24:38] /tmp on willow is WARNING: DISK WARNING - free space: / 21699 MB (20% inode=99%): [04:24:50] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [04:25:49] / on willow is WARNING: DISK WARNING - free space: / 21696 MB (20% inode=99%): [04:29:20] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [04:29:20] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:29:20] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [04:30:11] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.359863/1.10, alarm hl:np_load_long=0.348633/1.55, alarm hl:mem_free=293.000000M/300M, alarm hl:available=1/0 [04:30:14] [[Wiki server assignments]] ! 
https://wiki.toolserver.org/w/index.php?diff=7025&oldid=7011&rcid=9258 * 91.198.174.202 * (+448) (updated page) [04:31:09] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane disabled [04:32:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.059082/1.95, alarm hl:np_load_avg=1.975098/2.0, alarm hl:mem_free=244.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.059082/2.3, alarm hl:np_load_long=1.880371/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=244.000000M/150M, alarm hl:available=1/0 [04:33:18] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:37:20] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.447266/1.10, alarm hl:np_load_long=1.207031/1.55, alarm hl:mem_free=15067.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.447266/1.00, alarm hl:np_load_long=1.207031/1.50, alarm hl:mem_free=15067.000000M/350M, alarm hl:available=1/0 [04:41:01] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435472 MB (8% inode=42%): [04:54:20] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:57:20] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [04:58:09] Load avg. on willow is WARNING: WARNING - load average: 16.17, 15.85, 15.73 [04:59:20] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [05:02:09] Load avg. on willow is CRITICAL: CRITICAL - load average: 55.31, 24.83, 18.85 [05:09:11] Load avg. on willow is WARNING: WARNING - load average: 16.07, 20.84, 19.86 [05:12:09] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 36.88, 25.39, 21.59 [05:15:48] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:16:29] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=3.769531/1.95, alarm hl:np_load_avg=2.959473/2.0, alarm hl:mem_free=62.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=3.769531/2.3, alarm hl:np_load_long=2.680176/2.5, alarm hl:cpu=89.900000/98, alarm hl:mem_free=62.000000M/150M, alarm hl:available=1/0 [05:17:20] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:17:29] / on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:17:49] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:18:01] /tmp on willow is WARNING: DISK WARNING - free space: / 21572 MB (20% inode=99%): [05:18:01] / on willow is WARNING: DISK WARNING - free space: / 21572 MB (20% inode=99%): [05:19:10] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:19:28] / on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:25:01] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [05:30:20] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[05:30:20] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:37:19] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.618652/1.95, alarm hl:np_load_avg=2.170410/2.0, alarm hl:mem_free=316.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.618652/2.3, alarm hl:np_load_long=2.958984/2.5, alarm hl:cpu=95.200000/98, alarm hl:mem_free=316.000000M/150M, alarm hl:available=1/0 [05:38:48] /tmp on willow is WARNING: DISK WARNING - free space: / 21536 MB (20% inode=99%): [05:38:59] / on willow is WARNING: DISK WARNING - free space: / 21536 MB (20% inode=99%): [05:41:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435429 MB (8% inode=42%): [05:44:08] Load avg. on willow is WARNING: WARNING - load average: 12.50, 13.86, 19.55 [05:44:19] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [05:53:21] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.953125/1.10, alarm hl:np_load_long=0.828125/1.55, alarm hl:mem_free=17859.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.953125/1.00, alarm hl:np_load_long=0.828125/1.50, alarm hl:mem_free=17859.000000M/350M, alarm hl:available=1/0 [05:54:21] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [05:54:48] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[05:55:22] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:22] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:22] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:22] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:56:09] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [05:56:10] SMTP on hyacinth is OK: SMTP OK - 0.002 sec. response time [05:56:10] SMTP on z-dat-s7-a is OK: SMTP OK - 0.007 sec. response time [05:56:10] SMTP on z-dat-s3-a is OK: SMTP OK - 0.003 sec. response time [05:58:19] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [05:59:10] Load avg. on willow is OK: OK - load average: 9.68, 10.96, 14.84 [05:59:20] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [06:03:10] Load avg. on willow is WARNING: WARNING - load average: 12.74, 14.35, 15.56 [06:03:20] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.621582/1.95, alarm hl:np_load_avg=1.806152/2.0, alarm hl:mem_free=239.000000M/350M, alarm hl:available=1/0 [06:10:48] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:11:49] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [06:12:20] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [06:12:20] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [06:12:28] MySQL on z-dat-s7-a is OK: Uptime: 4654152 Threads: 15 Questions: 1087339018 Slow queries: 135597 Opens: 8726540 Flush tables: 1 Open tables: 7376 Queries per second avg: 233.627 [06:12:29] MySQL slave on z-dat-s7-a is OK: Uptime: 4654152 Threads: 15 Questions: 1087339019 Slow queries: 135597 Opens: 8726540 Flush tables: 1 Open tables: 7376 Queries per second avg: 233.627 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 175 [06:19:29] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [06:22:49] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:23:09] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:23:09] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:23:39] / on z-dat-s6-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [06:24:00] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:25:10] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [06:30:19] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [06:30:19] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:34:20] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [06:34:49] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:39:00] /tmp on willow is WARNING: DISK WARNING - free space: / 21391 MB (20% inode=99%): [06:39:00] / on willow is WARNING: DISK WARNING - free space: / 21391 MB (20% inode=99%): [06:39:28] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.077637/1.95, alarm hl:np_load_avg=1.857910/2.0, alarm hl:mem_free=509.000000M/350M, alarm hl:available=1/0 [06:41:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435379 MB (8% inode=42%): [06:42:29] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [06:42:39] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:42:48] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:42:48] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:43:09] SMF on z-dat-s3-a is OK: OK - all services online [06:43:19] SMF on z-dat-s4-a is OK: OK - all services online [06:43:19] SMF on z-dat-s7-a is OK: OK - all services online [06:44:19] Load avg. on willow is OK: OK - load average: 12.59, 14.03, 14.94 [06:47:19] Load avg. on willow is WARNING: WARNING - load average: 18.30, 16.21, 15.64 [06:55:18] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:55:19] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:55:19] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:55:19] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:55:30] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:01] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:10] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:10] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:10] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:56:10] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:10] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:10] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:11] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:11] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:11] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:12] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:19] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:19] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:30] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:30] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:39] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:48] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:48] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:00] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:01] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:02] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:02] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:11] SMTP on z-dat-s4-a is OK: SMTP OK - 4.202 sec. response time [06:57:11] Load avg. 
on z-dat-s3-a is OK: OK - load average: 0.55, 1.53, 1.91 [06:57:11] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 178236 MB (18% inode=99%): [06:57:11] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:57:11] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:57:11] Load avg. on z-dat-s4-a is OK: OK - load average: 0.59, 1.57, 1.93 [06:57:12] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:57:12] SMTP on z-dat-s7-a is OK: SMTP OK - 0.179 sec. response time [06:57:20] SMTP on z-dat-s3-a is OK: SMTP OK - 0.002 sec. response time [06:57:30] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [06:57:30] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [06:57:31] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 178233 MB (18% inode=99%): [06:57:31] / on z-dat-s7-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [06:57:31] / on z-dat-s4-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [06:57:31] SMF on z-dat-s6-a is OK: OK - all services online [06:57:31] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 102857 MB (25% inode=99%): [06:57:40] MySQL on z-dat-s3-a is OK: Uptime: 4224442 Threads: 13 Questions: 4824453484 Slow queries: 236158 Opens: 35892955 Flush tables: 1 Open tables: 16384 Queries per second avg: 1142.33 [06:57:40] MySQL slave on z-dat-s3-a is OK: Uptime: 4224442 Threads: 12 Questions: 4824453486 Slow queries: 236158 Opens: 35892955 Flush tables: 1 Open tables: 16384 Queries per second avg: 1142.33 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 176 [06:57:40] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 1937 MB (99% inode=99%): [06:57:40] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 1938 MB (99% inode=99%): [06:57:40] Load avg. 
on z-dat-s7-a is OK: OK - load average: 1.68, 1.67, 1.94 [06:57:40] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 1936 MB (99% inode=99%): [06:57:40] / on z-dat-s3-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [06:57:41] Load avg. on z-dat-s6-a is OK: OK - load average: 1.74, 1.68, 1.94 [06:57:41] / on z-dat-s6-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [06:57:42] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2024 MB (99% inode=99%): [06:57:42] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 99023 MB (24% inode=99%): [06:57:59] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:58:29] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [06:59:40] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [07:05:50] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.958008/1.95, alarm hl:np_load_avg=2.167480/2.0, alarm hl:mem_free=133.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.958008/2.3, alarm hl:np_load_long=2.187500/2.5, alarm hl:cpu=97.300000/98, alarm hl:mem_free=133.000000M/150M, alarm hl:available=1/0 [07:18:20] Load avg. on willow is CRITICAL: CRITICAL - load average: 32.33, 21.17, 18.99 [07:19:20] Load avg. on willow is WARNING: WARNING - load average: 23.68, 20.91, 19.05 [07:26:09] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [07:30:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:30:40] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [07:33:30] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 29.31, 23.11, 20.30 [07:39:00] /tmp on willow is WARNING: DISK WARNING - free space: / 21252 MB (20% inode=98%): [07:39:00] / on willow is WARNING: DISK WARNING - free space: / 21252 MB (20% inode=98%): [07:39:19] Load avg. on willow is WARNING: WARNING - load average: 15.90, 19.47, 19.68 [07:40:30] Load avg. on willow is CRITICAL: CRITICAL - load average: 28.32, 21.57, 20.37 [07:42:00] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435214 MB (8% inode=42%): [07:42:18] any toolserver user available? [07:48:28] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.324219/1.10, alarm hl:np_load_long=0.828125/1.55, alarm hl:mem_free=19030.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.324219/1.00, alarm hl:np_load_long=0.828125/1.50, alarm hl:mem_free=19030.000000M/350M, alarm hl:available=1/0 [07:58:39] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [07:58:39] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [07:59:20] Load avg. on willow is WARNING: WARNING - load average: 15.02, 17.45, 19.79 [07:59:50] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [08:00:19] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.80, 20.75, 20.75 [08:05:50] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.760254/1.95, alarm hl:np_load_avg=2.646973/2.0, alarm hl:mem_free=124.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.760254/2.3, alarm hl:np_load_long=2.625488/2.5, alarm hl:cpu=99.800000/98, alarm hl:mem_free=124.000000M/150M, alarm hl:available=1/0 [08:09:29] Load avg. 
on willow is WARNING: WARNING - load average: 13.80, 17.94, 19.76 [08:26:09] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [08:31:40] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:31:40] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [08:36:29] Load avg. on willow is OK: OK - load average: 13.14, 13.53, 14.92 [08:39:00] / on willow is WARNING: DISK WARNING - free space: / 21114 MB (20% inode=98%): [08:39:00] /tmp on willow is WARNING: DISK WARNING - free space: / 21114 MB (20% inode=98%): [08:39:39] / on wolfsbane is WARNING: DISK WARNING - free space: / 6262 MB (20% inode=93%): [08:41:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 435057 MB (8% inode=42%): [08:54:00] /tmp on willow is OK: DISK OK - free space: / 23084 MB (22% inode=99%): [08:54:00] / on willow is OK: DISK OK - free space: / 23084 MB (22% inode=99%): [08:58:40] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [09:00:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:02:29] Load avg. on willow is WARNING: WARNING - load average: 16.29, 15.13, 14.36 [09:03:29] Load avg. 
on willow is OK: OK - load average: 13.11, 14.38, 14.14 [09:06:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.727051/1.95, alarm hl:np_load_avg=1.743164/2.0, alarm hl:mem_free=130.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.727051/2.3, alarm hl:np_load_long=1.747559/2.5, alarm hl:cpu=85.300000/98, alarm hl:mem_free=130.000000M/150M, alarm hl:available=1/0 [09:23:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [09:26:20] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [09:31:48] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:31:49] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [09:33:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.354980/1.95, alarm hl:np_load_avg=1.500000/2.0, alarm hl:mem_free=337.000000M/350M, alarm hl:available=1/0 [09:39:40] / on wolfsbane is WARNING: DISK WARNING - free space: / 5987 MB (19% inode=93%): [09:42:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 434616 MB (8% inode=42%): [09:49:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [09:59:40] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [10:00:49] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:01:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.734375/1.95, alarm hl:np_load_avg=1.523926/2.0, alarm hl:mem_free=245.000000M/350M, alarm hl:available=1/0 [10:12:29] Load avg. on willow is WARNING: WARNING - load average: 15.23, 13.84, 12.42 [10:16:31] Load avg. 
on willow is OK: OK - load average: 13.49, 14.27, 12.96 [10:19:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [10:26:39] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [10:27:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.449707/1.95, alarm hl:np_load_avg=1.481445/2.0, alarm hl:mem_free=171.000000M/350M, alarm hl:available=1/0 [10:31:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:31:49] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [10:32:30] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.411621/1.10, alarm hl:np_load_long=0.282715/1.55, alarm hl:mem_free=230.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.411621/1.00, alarm hl:np_load_long=0.282715/1.50, alarm hl:mem_free=230.000000M/350M, alarm hl:available=1/0 [10:33:29] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [10:39:49] / on wolfsbane is WARNING: DISK WARNING - free space: / 5652 MB (18% inode=93%): [10:43:00] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 434372 MB (8% inode=42%): [10:47:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [10:50:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.546875/1.95, alarm hl:np_load_avg=1.624024/2.0, alarm hl:mem_free=186.000000M/350M, alarm hl:available=1/0 [10:55:31] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[10:56:09] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [10:59:39] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [11:00:39] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.038086/1.00, alarm hl:np_load_long=0.707031/1.50, alarm hl:mem_free=17361.000000M/350M, alarm hl:available=1/0 [11:00:50] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:02:40] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [11:07:45] [[Special:Log/newusers]] create * Brandon Sky * (New user account) [11:08:39] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.061524/1.00, alarm hl:np_load_long=0.868164/1.50, alarm hl:mem_free=18260.000000M/350M, alarm hl:available=1/0 [11:12:29] Load avg. on willow is WARNING: WARNING - load average: 15.25, 15.00, 14.16 [11:13:29] Load avg. on willow is OK: OK - load average: 14.22, 14.73, 14.11 [11:15:49] / on wolfsbane is OK: DISK OK - free space: / 10242 MB (34% inode=93%): [11:17:16] [[Template:Unblock]] !N https://wiki.toolserver.org/w/index.php?oldid=7026&rcid=9260 * Brandon Sky * (+1363) (Created page with "
'''This [[meta:Help:Blocked users|blocked user]] is asking ...") [11:18:40] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:19:39] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [11:19:49] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:20:18] [[Template:Unblock rejected]] !N https://wiki.toolserver.org/w/index.php?oldid=7027&rcid=9261 * Brandon Sky * (+772) (Created page with "
'''This [[meta:Help:Blocked users|blocked user]]'s unb...") [11:21:29] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:30] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:30] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:30] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:30] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:30] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:21:39] [[Template:Unblock granted]] !N https://wiki.toolserver.org/w/index.php?oldid=7028&rcid=9262 * Brandon Sky * (+649) (Created page with "
'''This [[meta:Help:Blocked users|blocked user]]'s u...") [11:21:39] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:39] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:39] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:39] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:21:39] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:58] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:21:59] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:21:59] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:10] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:10] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:10] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:10] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:10] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:10] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:19] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [11:22:19] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:19] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:19] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:19] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:20] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:20] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:20] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:22:20] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:21] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:29] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:29] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:39] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [11:22:39] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [11:23:19] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 177848 MB (18% inode=99%): [11:23:19] Load avg. on z-dat-s4-a is OK: OK - load average: 0.38, 0.92, 1.27 [11:23:19] Load avg. on z-dat-s3-a is OK: OK - load average: 0.38, 0.91, 1.26 [11:23:29] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [11:23:39] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [11:23:40] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 177848 MB (18% inode=99%): [11:23:41] / on z-dat-s7-a is OK: DISK OK - free space: / 8483 MB (28% inode=85%): [11:23:41] / on z-dat-s4-a is OK: DISK OK - free space: / 8483 MB (28% inode=85%): [11:23:41] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 102565 MB (25% inode=99%): [11:23:49] MySQL slave on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [11:23:49] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2212 MB (99% inode=99%): [11:23:49] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2212 MB (99% inode=99%): [11:23:49] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2212 MB (99% inode=99%): [11:23:50] / on z-dat-s3-a is OK: DISK OK - free space: / 8483 MB (28% inode=85%): [11:23:50] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 98758 MB (24% inode=99%): [11:23:50] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2212 MB (99% inode=99%): [11:23:50] Load avg. 
on z-dat-s7-a is OK: OK - load average: 0.36, 0.83, 1.21 [11:23:50] / on z-dat-s6-a is OK: DISK OK - free space: / 8483 MB (28% inode=85%): [11:23:51] Load avg. on z-dat-s6-a is OK: OK - load average: 0.36, 0.83, 1.21 [11:24:09] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [11:24:09] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [11:24:39] MySQL on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [11:24:49] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [11:25:08] SMF on z-dat-s6-a is OK: OK - all services online [11:25:08] SMF on hyacinth is OK: OK - all services online [11:25:08] MySQL slave on z-dat-s3-a is OK: Uptime: 4240489 Threads: 22 Questions: 4839728654 Slow queries: 236676 Opens: 36079409 Flush tables: 1 Open tables: 16384 Queries per second avg: 1141.313 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 388 [11:25:08] MySQL on z-dat-s3-a is OK: Uptime: 4240489 Threads: 22 Questions: 4839728656 Slow queries: 236676 Opens: 36079409 Flush tables: 1 Open tables: 16384 Queries per second avg: 1141.313 [11:25:09] MySQL slave on z-dat-s4-a is OK: Uptime: 4149377 Threads: 6 Questions: 223763289 Slow queries: 49396 Opens: 84825 Flush tables: 1 Open tables: 612 Queries per second avg: 53.926 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 364 [11:25:09] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [11:25:09] MySQL slave on z-dat-s7-a is OK: Uptime: 4672917 Threads: 9 Questions: 1092232226 Slow queries: 136209 Opens: 8793522 Flush tables: 1 Open tables: 7265 Queries per second avg: 233.736 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 321 [11:25:18] SMTP on z-dat-s6-a is OK: SMTP OK - 0.003 sec. response time [11:25:19] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:25:19] SMTP on z-dat-s4-a is OK: SMTP OK - 0.003 sec. 
response time [11:25:19] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:25:19] SMTP on hyacinth is OK: SMTP OK - 0.004 sec. response time [11:25:20] MySQL on z-dat-s7-a is OK: Uptime: 4672925 Threads: 8 Questions: 1092236495 Slow queries: 136209 Opens: 8793522 Flush tables: 1 Open tables: 7265 Queries per second avg: 233.737 [11:25:20] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 202.000000 [11:25:20] MySQL on z-dat-s4-a is OK: Uptime: 4149392 Threads: 6 Questions: 223766750 Slow queries: 49396 Opens: 84825 Flush tables: 1 Open tables: 612 Queries per second avg: 53.927 [11:25:29] MySQL on z-dat-s6-a is OK: Uptime: 1181089 Threads: 12 Questions: 279105872 Slow queries: 69695 Opens: 2867207 Flush tables: 2 Open tables: 2882 Queries per second avg: 236.312 [11:25:30] MySQL slave on z-dat-s6-a is OK: Uptime: 1181089 Threads: 11 Questions: 279105873 Slow queries: 69695 Opens: 2867207 Flush tables: 2 Open tables: 2882 Queries per second avg: 236.312 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 325 [11:25:30] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:25:30] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:25:30] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:25:31] SMF on z-dat-s3-a is OK: OK - all services online [11:25:31] SMF on z-dat-s4-a is OK: OK - all services online [11:25:31] SMTP on z-dat-s3-a is OK: SMTP OK - 0.016 sec. 
response time [11:25:31] SMF on z-dat-s7-a is OK: OK - all services online [11:27:39] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [11:28:00] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:28:48] /tmp on wolfsbane is CRITICAL: DISK CRITICAL - free space: /tmp 63 MB (6% inode=99%): [11:29:10] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 1.883 second response time [11:29:10] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 2.056 second response time [11:31:00] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.411 second response time [11:31:50] /tmp on wolfsbane is UNKNOWN: CHECK_NRPE: Error receiving data from daemon. [11:31:51] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:31:51] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [11:32:42] SMF on wolfsbane is CRITICAL: ERROR - offline: svc:/application/sge/execd:toolserver [11:33:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.441 second response time [11:33:29] [[User talk:Brandon Sky]] !N https://wiki.toolserver.org/w/index.php?oldid=7029&rcid=9263 * Brandon Sky * (+16) (Created page with "Messages do down") [11:33:40] SMF on wolfsbane is OK: OK - all services online [11:33:50] /tmp on wolfsbane is CRITICAL: DISK CRITICAL - free space: /tmp 84 MB (7% inode=99%): [11:35:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.289062/1.95, alarm hl:np_load_avg=1.480957/2.0, alarm hl:mem_free=314.000000M/350M, alarm hl:available=1/0 [11:42:30] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:42:30] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:42:39] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:42:39] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:42:39] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:43:08] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [11:43:18] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:43:29] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:43:29] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:43:30] SMTP on z-dat-s7-a is OK: SMTP OK - 0.105 sec. response time [11:43:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 434201 MB (8% inode=42%): [11:53:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:56:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.624512/1.95, alarm hl:np_load_avg=1.585449/2.0, alarm hl:mem_free=347.000000M/350M, alarm hl:available=1/0 [11:59:40] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [12:00:30] Load avg. on willow is CRITICAL: CRITICAL - load average: 36.95, 19.84, 15.14 [12:01:29] Load avg. on willow is WARNING: WARNING - load average: 22.51, 18.73, 15.03 [12:01:50] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [12:09:29] Load avg. on willow is OK: OK - load average: 11.54, 14.57, 14.48 [12:27:50] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [12:32:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:32:49] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[12:32:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:36:40] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.339355/1.10, alarm hl:np_load_long=0.342285/1.55, alarm hl:mem_free=159.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.339355/1.00, alarm hl:np_load_long=0.342285/1.50, alarm hl:mem_free=159.000000M/350M, alarm hl:available=1/0 [12:37:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.457520/1.95, alarm hl:np_load_avg=1.566406/2.0, alarm hl:mem_free=223.000000M/350M, alarm hl:available=1/0 [12:38:00] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:38:41] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [12:45:00] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 433988 MB (8% inode=42%): [12:53:41] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.342285/1.10, alarm hl:np_load_long=0.335938/1.55, alarm hl:mem_free=153.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.342285/1.00, alarm hl:np_load_long=0.335938/1.50, alarm hl:mem_free=153.000000M/350M, alarm hl:available=1/0 [12:59:41] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [13:01:50] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:27:59] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:32:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.162598/1.95, alarm hl:np_load_avg=1.785156/2.0, alarm hl:mem_free=514.000000M/350M, alarm hl:available=1/0 [13:32:59] Sun Grid Engine execd on willow is OK: 
medium-sol@willow OK: longrun-sol@willow OK [13:33:50] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:33:50] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [13:36:39] Free Memory on damiana is CRITICAL: CRITICAL - 4.9% (206796 kB) free! [13:39:40] Free Memory on damiana is OK: OK - 24.2% (1012284 kB) free. [13:44:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 433777 MB (8% inode=42%): [13:59:50] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:01:50] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:21:49] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.347656/1.10, alarm hl:np_load_long=0.365723/1.55, alarm hl:mem_free=291.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.347656/1.00, alarm hl:np_load_long=0.365723/1.50, alarm hl:mem_free=291.000000M/350M, alarm hl:available=1/0 [14:25:50] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [14:28:00] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [14:28:50] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.424316/1.10, alarm hl:np_load_long=0.368652/1.55, alarm hl:mem_free=222.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.424316/1.00, alarm hl:np_load_long=0.368652/1.50, alarm hl:mem_free=222.000000M/350M, alarm hl:available=1/0 [14:34:00] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:34:00] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[14:44:59] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 433555 MB (8% inode=42%): [14:59:50] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [15:02:00] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:12:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.288574/1.95, alarm hl:np_load_avg=1.304688/2.0, alarm hl:mem_free=148.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.288574/2.3, alarm hl:np_load_long=1.318848/2.5, alarm hl:cpu=77.800000/98, alarm hl:mem_free=148.000000M/150M, alarm hl:available=1/0 [15:14:00] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:23:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.062988/1.95, alarm hl:np_load_avg=1.158203/2.0, alarm hl:mem_free=189.000000M/350M, alarm hl:available=1/0 [15:27:59] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [15:35:10] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:35:11] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[15:35:23] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:39:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.276367/1.95, alarm hl:np_load_avg=1.317383/2.0, alarm hl:mem_free=298.000000M/350M, alarm hl:available=1/0 [15:45:22] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 433338 MB (8% inode=42%): [15:50:21] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:55:30] Hi all [15:59:52] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [16:02:02] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:12:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.719726/1.95, alarm hl:np_load_avg=1.656738/2.0, alarm hl:mem_free=187.000000M/350M, alarm hl:available=1/0 [16:14:22] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:28:30] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [16:33:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.474121/1.95, alarm hl:np_load_avg=1.554688/2.0, alarm hl:mem_free=323.000000M/350M, alarm hl:available=1/0 [16:34:22] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:35:11] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:35:11] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [16:45:22] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 432658 MB (8% inode=42%): [17:00:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [17:02:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:12:51] Load avg. 
on willow is WARNING: WARNING - load average: 14.73, 15.38, 13.82 [17:14:01] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.578125/1.10, alarm hl:np_load_long=0.875977/1.55, alarm hl:mem_free=18612.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.578125/1.00, alarm hl:np_load_long=0.875977/1.50, alarm hl:mem_free=18612.000000M/350M, alarm hl:available=1/0 [17:14:51] Load avg. on willow is OK: OK - load average: 10.88, 13.96, 13.50 [17:15:01] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [17:20:31] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1848.000000 [17:28:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [17:32:31] s4 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1781.000000 [17:35:12] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:35:12] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [17:42:31] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:31] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:31] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:31] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:42:31] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:32] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:42:41] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:41] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:41] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:42:41] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[17:43:01] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 177046 MB (18% inode=99%): [17:43:01] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2148 MB (99% inode=99%): [17:43:01] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2148 MB (99% inode=99%): [17:43:01] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 98349 MB (24% inode=99%): [17:43:11] / on z-dat-s3-a is OK: DISK OK - free space: / 8455 MB (28% inode=85%): [17:43:11] Load avg. on z-dat-s6-a is OK: OK - load average: 0.96, 1.28, 1.91 [17:43:11] / on z-dat-s6-a is OK: DISK OK - free space: / 8455 MB (28% inode=85%): [17:43:11] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2165 MB (99% inode=99%): [17:43:21] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [17:43:21] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [17:46:21] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 432650 MB (8% inode=42%): [17:49:51] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:53:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.283691/1.95, alarm hl:np_load_avg=1.512207/2.0, alarm hl:mem_free=330.000000M/350M, alarm hl:available=1/0 [17:55:22] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:55:31] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:55:40] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:55:40] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:55:51] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:55:51] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:55:51] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:55:52] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[17:55:52] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:56:10] Load avg. on z-dat-s7-a is OK: OK - load average: 0.48, 0.90, 1.45 [17:56:21] Load avg. on z-dat-s4-a is OK: OK - load average: 1.08, 1.01, 1.48 [17:56:31] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [17:56:31] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [17:56:41] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [17:56:41] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [17:58:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.555664/1.95, alarm hl:np_load_avg=1.506836/2.0, alarm hl:mem_free=321.000000M/350M, alarm hl:available=1/0 [17:59:31] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [18:00:00] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [18:02:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [18:06:21] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [18:11:01] Environment IPMI on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:11:41] Environment IPMI on adenia is OK: ok: temperature ok fan ok voltage ok chassis ok [18:12:51] Load avg. on willow is WARNING: WARNING - load average: 15.45, 15.13, 14.01 [18:14:51] Load avg. on willow is OK: OK - load average: 12.86, 14.49, 13.92 [18:18:00] Environment IPMI on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:19:51] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:28:40] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [18:36:11] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:36:12] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[18:45:01] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.025391/1.00, alarm hl:np_load_long=0.828125/1.50, alarm hl:mem_free=18441.000000M/350M, alarm hl:available=1/0 [18:46:01] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [18:46:20] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 431254 MB (8% inode=42%): [18:47:51] Load avg. on willow is WARNING: WARNING - load average: 15.80, 14.11, 13.23 [18:48:51] Load avg. on willow is OK: OK - load average: 13.59, 13.80, 13.18 [19:00:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [19:02:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:03:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.763672/1.95, alarm hl:np_load_avg=1.776855/2.0, alarm hl:mem_free=209.000000M/350M, alarm hl:available=1/0 [19:07:21] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [19:28:41] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [19:33:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.553711/1.95, alarm hl:np_load_avg=1.633301/2.0, alarm hl:mem_free=286.000000M/350M, alarm hl:available=1/0 [19:35:21] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [19:36:22] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:36:22] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[19:38:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.579101/1.95, alarm hl:np_load_avg=1.546875/2.0, alarm hl:mem_free=232.000000M/350M, alarm hl:available=1/0 [19:43:01] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.559570/1.10, alarm hl:np_load_long=0.847656/1.55, alarm hl:mem_free=17692.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.559570/1.00, alarm hl:np_load_long=0.847656/1.50, alarm hl:mem_free=17692.000000M/350M, alarm hl:available=1/0 [19:46:22] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 431110 MB (8% inode=42%): [19:47:02] Load avg. on willow is WARNING: WARNING - load average: 24.77, 18.96, 15.02 [19:47:02] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [19:59:02] Load avg. on willow is OK: OK - load average: 11.74, 13.83, 14.95 [20:00:11] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [20:02:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:24:51] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:28:42] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [20:34:41] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [20:37:12] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:37:21] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[20:44:11] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.175781/1.10, alarm hl:np_load_long=0.773438/1.55, alarm hl:mem_free=18113.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.175781/1.00, alarm hl:np_load_long=0.773438/1.50, alarm hl:mem_free=18113.000000M/350M, alarm hl:available=1/0
[20:47:02] Load avg. on willow is WARNING: WARNING - load average: 15.10, 13.73, 12.86
[20:47:21] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 430917 MB (8% inode=42%):
[20:49:01] Load avg. on willow is OK: OK - load average: 14.70, 14.22, 13.16
[20:49:31] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.863770/1.95, alarm hl:np_load_avg=1.786133/2.0, alarm hl:mem_free=156.000000M/350M, alarm hl:available=1/0
[20:50:11] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[20:59:32] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[21:00:12] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[21:02:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[21:02:31] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.104004/1.95, alarm hl:np_load_avg=2.037109/2.0, alarm hl:mem_free=133.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.104004/2.3, alarm hl:np_load_long=1.846680/2.5, alarm hl:cpu=92.500000/98, alarm hl:mem_free=133.000000M/150M, alarm hl:available=1/0
[21:03:02] Load avg. on willow is WARNING: WARNING - load average: 14.91, 15.86, 14.69
[21:04:31] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[21:10:01] Load avg. on willow is OK: OK - load average: 12.84, 14.93, 14.76
[21:13:11] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.168945/1.10, alarm hl:np_load_long=0.890625/1.55, alarm hl:mem_free=18109.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.168945/1.00, alarm hl:np_load_long=0.890625/1.50, alarm hl:mem_free=18109.000000M/350M, alarm hl:available=1/0
[21:17:12] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[21:26:32] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.605469/1.95, alarm hl:np_load_avg=1.812012/2.0, alarm hl:mem_free=292.000000M/350M, alarm hl:available=1/0
[21:28:41] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[21:32:32] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[21:37:22] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[21:37:23] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates).
[21:41:31] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.642090/1.95, alarm hl:np_load_avg=1.625976/2.0, alarm hl:mem_free=194.000000M/350M, alarm hl:available=1/0
[21:43:32] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[21:47:22] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 430052 MB (8% inode=42%):
[22:00:21] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[22:02:22] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[22:05:03] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[22:08:22] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.713867/1.10, alarm hl:np_load_long=0.865235/1.55, alarm hl:mem_free=19056.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.713867/1.00, alarm hl:np_load_long=0.865235/1.50, alarm hl:mem_free=19056.000000M/350M, alarm hl:available=1/0
[22:11:33] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.581543/1.95, alarm hl:np_load_avg=1.520020/2.0, alarm hl:mem_free=169.000000M/350M, alarm hl:available=1/0
[22:12:22] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[22:18:34] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[22:28:51] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[22:34:31] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[22:37:46] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[22:37:46] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates).
[22:38:46] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.178223/1.95, alarm hl:np_load_avg=1.303711/2.0, alarm hl:mem_free=307.000000M/350M, alarm hl:available=1/0
[22:43:45] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[22:44:23] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.099609/1.00, alarm hl:np_load_long=0.728515/1.50, alarm hl:mem_free=18895.000000M/350M, alarm hl:available=1/0
[22:45:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[22:47:45] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 429971 MB (8% inode=42%):
[22:49:43] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.521973/1.95, alarm hl:np_load_avg=1.491699/2.0, alarm hl:mem_free=182.000000M/350M, alarm hl:available=1/0
[23:00:46] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[23:02:46] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[23:29:03] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[23:32:13] /sql on z-dat-s4-a is WARNING: DISK WARNING - free space: /sql 38674 MB (9% inode=99%):
[23:37:46] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[23:37:46] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates).
[23:38:13] /sql on z-dat-s4-a is CRITICAL: DISK CRITICAL - free space: /sql 24364 MB (5% inode=99%):
[23:43:13] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 90406 MB (22% inode=99%):
[23:47:46] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 429904 MB (8% inode=42%):
[23:54:13] /sql on z-dat-s4-a is WARNING: DISK WARNING - free space: /sql 35353 MB (8% inode=99%):