[00:02:46] Load avg. on willow is WARNING: WARNING - load average: 17.53, 15.76, 13.18 [00:18:35] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [00:19:26] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:19:36] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [00:19:36] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.463867/1.95, alarm hl:np_load_avg=1.886230/2.0, alarm hl:mem_free=151.000000M/350M, alarm hl:available=1/0 [00:24:45] Load avg. on willow is OK: OK - load average: 12.03, 14.02, 14.82 [00:30:36] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 34861.000000 [00:31:35] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [00:31:56] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 442627 MB (8% inode=43%): [00:35:37] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:38:35] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.393066/1.95, alarm hl:np_load_avg=1.433105/2.0, alarm hl:mem_free=187.000000M/350M, alarm hl:available=1/0 [00:48:26] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [00:50:45] Load avg. on willow is WARNING: WARNING - load average: 20.04, 16.32, 14.55 [00:52:45] Load avg. on willow is OK: OK - load average: 12.84, 14.77, 14.18 [00:53:27] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 30655 [01:02:45] Load avg. 
on willow is WARNING: WARNING - load average: 14.27, 15.89, 14.92 [01:18:35] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [01:18:45] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [01:20:26] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:20:35] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [01:21:35] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.142090/1.95, alarm hl:np_load_avg=1.927246/2.0, alarm hl:mem_free=220.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.142090/2.3, alarm hl:np_load_long=1.897461/2.5, alarm hl:cpu=99.800000/98, alarm hl:mem_free=220.000000M/150M, alarm hl:available=1/0 [01:31:34] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [01:31:35] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20558.000000 [01:32:06] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 442591 MB (8% inode=43%): [01:32:35] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [01:32:45] Load avg. on willow is WARNING: WARNING - load average: 13.75, 15.18, 14.95 [01:33:47] Load avg. 
on willow is OK: OK - load average: 12.29, 14.52, 14.73 [01:49:26] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [01:54:25] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 15026 [02:00:36] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.930176/1.95, alarm hl:np_load_avg=1.938965/2.0, alarm hl:mem_free=141.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.930176/2.3, alarm hl:np_load_long=1.908691/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=141.000000M/150M, alarm hl:available=1/0 [02:04:01] Is Nosy around? [02:18:45] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [02:20:36] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:20:36] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [02:20:45] Load avg. on willow is WARNING: WARNING - load average: 20.70, 17.14, 16.82 [02:31:35] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [02:31:35] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 5113.000000 [02:32:06] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 441534 MB (8% inode=42%): [02:36:26] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3480 [02:36:46] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3366.000000 [02:39:45] Load avg. 
on willow is OK: OK - load average: 10.76, 13.23, 14.83 [02:43:26] MySQL slave on rosemary is OK: Uptime: 386762 Threads: 4 Questions: 263223494 Slow queries: 312 Opens: 2308 Flush tables: 1 Open tables: 316 Queries per second avg: 680.582 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1609 [02:43:47] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1516.000000 [02:44:25] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.089844/1.00, alarm hl:np_load_long=0.727539/1.50, alarm hl:mem_free=18516.000000M/350M, alarm hl:available=1/0 [02:46:26] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [02:48:36] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [02:50:26] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [02:52:06] Don't think so. [02:52:11] She's usually on as nosy. [02:52:47] * Dispenser wonders what whym is doing to constitute 26 CPU with emacs [02:52:56] CPU hours* [02:53:35] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.458008/1.95, alarm hl:np_load_avg=1.612305/2.0, alarm hl:mem_free=289.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.458008/2.3, alarm hl:np_load_long=1.749024/2.5, alarm hl:cpu=98.500000/98, alarm hl:mem_free=289.000000M/150M, alarm hl:available=1/0 [02:55:47] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:55:56] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:55:56] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[02:55:56] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:56] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:56] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:56] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:56:15] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [02:56:25] / on z-dat-s3-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [02:56:25] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2546 MB (99% inode=99%): [02:56:25] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 179493 MB (18% inode=99%): [02:56:46] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [02:56:46] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [02:56:46] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [03:02:48] Load avg. on willow is WARNING: WARNING - load average: 16.20, 15.34, 14.42 [03:13:26] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.334961/1.10, alarm hl:np_load_long=0.808594/1.55, alarm hl:mem_free=18457.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.334961/1.00, alarm hl:np_load_long=0.808594/1.50, alarm hl:mem_free=18457.000000M/350M, alarm hl:available=1/0 [03:15:25] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [03:18:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [03:20:35] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:20:36] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [03:28:17] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:47] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[03:29:05] SMTP on z-dat-s6-a is OK: SMTP OK - 0.112 sec. response time [03:29:14] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [03:31:47] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [03:32:07] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440649 MB (8% inode=42%): [03:33:17] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:33:46] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:34:05] SMTP on hyacinth is OK: SMTP OK - 0.005 sec. response time [03:34:36] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [03:37:35] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.790039/1.95, alarm hl:np_load_avg=1.845703/2.0, alarm hl:mem_free=214.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.790039/2.3, alarm hl:np_load_long=2.002930/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=214.000000M/150M, alarm hl:available=1/0 [03:42:15] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.284668/1.10, alarm hl:np_load_long=0.268555/1.55, alarm hl:mem_free=221.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.284668/1.00, alarm hl:np_load_long=0.268555/1.50, alarm hl:mem_free=221.000000M/350M, alarm hl:available=1/0 [03:43:36] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.222656/1.10, alarm hl:np_load_long=0.870117/1.55, alarm hl:mem_free=18183.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.222656/1.00, alarm hl:np_load_long=0.870117/1.50, alarm hl:mem_free=18183.000000M/350M, alarm hl:available=1/0 [03:44:36] Sun Grid Engine execd on ortelius is OK: 
short-sol@ortelius OK: medium-sol@ortelius OK [03:50:16] /tmp on wolfsbane is WARNING: DISK WARNING - free space: /tmp 190 MB (16% inode=99%): [03:50:26] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [03:51:15] /tmp on wolfsbane is OK: DISK OK - free space: /tmp 269 MB (21% inode=99%): [03:54:16] /tmp on wolfsbane is WARNING: DISK WARNING - free space: /tmp 167 MB (14% inode=99%): [03:54:25] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [04:03:26] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in error state: QERROR as result of job 1909016s failure: medium-sol@wolfsbane in error state: QERROR as result of job 1909016s failure [04:03:35] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.155274/1.10, alarm hl:np_load_long=0.872070/1.55, alarm hl:mem_free=18090.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.155274/1.00, alarm hl:np_load_long=0.872070/1.50, alarm hl:mem_free=18090.000000M/350M, alarm hl:available=1/0 [04:03:46] Load avg. 
on willow is WARNING: WARNING - load average: 15.77, 17.09, 16.66 [04:15:35] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:18:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [04:19:36] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.130859/1.10, alarm hl:np_load_long=1.039062/1.55, alarm hl:mem_free=16595.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.130859/1.00, alarm hl:np_load_long=1.039062/1.50, alarm hl:mem_free=16595.000000M/350M, alarm hl:available=1/0 [04:21:36] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:21:36] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [04:28:45] Load avg. on willow is OK: OK - load average: 13.29, 13.77, 14.88 [04:32:45] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [04:32:46] Load avg. 
on willow is WARNING: WARNING - load average: 14.58, 16.48, 15.90 [04:33:05] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440601 MB (8% inode=42%): [04:37:46] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.858887/1.95, alarm hl:np_load_avg=1.939453/2.0, alarm hl:mem_free=126.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.858887/2.3, alarm hl:np_load_long=1.946777/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=126.000000M/150M, alarm hl:available=1/0 [04:42:36] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.370117/1.10, alarm hl:np_load_long=1.318359/1.55, alarm hl:mem_free=15030.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.370117/1.00, alarm hl:np_load_long=1.318359/1.50, alarm hl:mem_free=15030.000000M/350M, alarm hl:available=1/0 [04:50:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [04:55:36] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:58:36] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.350586/1.10, alarm hl:np_load_long=1.443359/1.55, alarm hl:mem_free=15806.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.350586/1.00, alarm hl:np_load_long=1.443359/1.50, alarm hl:mem_free=15806.000000M/350M, alarm hl:available=1/0 [05:00:35] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [05:00:46] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.47, 19.70, 16.86 [05:01:47] Load avg. 
on willow is WARNING: WARNING - load average: 22.04, 19.14, 16.84 [05:03:26] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in error state: QERROR as result of job 1909016s failure: medium-sol@wolfsbane in error state: QERROR as result of job 1909016s failure [05:18:05] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:19:05] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:19:45] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.750977/1.95, alarm hl:np_load_avg=2.446777/2.0, alarm hl:mem_free=35.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.750977/2.3, alarm hl:np_load_long=2.232910/2.5, alarm hl:cpu=99.400000/98, alarm hl:mem_free=35.000000M/150M, alarm hl:available=1/0 [05:19:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [05:21:35] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:21:35] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [05:32:46] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [05:33:15] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440550 MB (8% inode=42%): [05:39:45] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:50:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [05:50:46] Load avg. on willow is WARNING: WARNING - load average: 16.31, 15.36, 16.59 [06:01:04] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:02:05] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:02:17] Load avg. 
on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:02:55] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:03:06] Load avg. on willow is WARNING: WARNING - load average: 18.17, 16.20, 16.02 [06:03:06] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:03:46] Load avg. on willow is CRITICAL: CRITICAL - load average: 54.44, 26.21, 19.54 [06:04:25] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in error state: QERROR as result of job 1909016s failure: medium-sol@wolfsbane in error state: QERROR as result of job 1909016s failure [06:04:35] Environment IPMI on willow is OK: ok: temperature ok fan ok voltage ok chassis ok [06:05:05] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:05:47] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [06:19:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [06:22:35] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:22:35] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.013672/1.95, alarm hl:np_load_avg=2.424805/2.0, alarm hl:mem_free=168.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.013672/2.3, alarm hl:np_load_long=2.563477/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=168.000000M/150M, alarm hl:available=1/0 [06:22:35] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [06:28:55] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:29:35] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:29:35] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:29:54] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:30:35] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [06:30:35] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [06:30:35] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [06:30:45] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [06:30:45] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [06:30:45] MySQL on z-dat-s3-a is OK: Uptime: 4136426 Threads: 27 Questions: 4714647315 Slow queries: 230761 Opens: 34860071 Flush tables: 1 Open tables: 16384 Queries per second avg: 1139.787 [06:30:45] MySQL on z-dat-s6-a is OK: Uptime: 1077005 Threads: 11 Questions: 261551383 Slow queries: 62719 Opens: 2687329 Flush tables: 2 Open tables: 2874 Queries per second avg: 242.850 [06:30:45] MySQL slave on z-dat-s6-a is OK: Uptime: 1077005 Threads: 11 Questions: 261551383 Slow queries: 62719 Opens: 2687329 Flush tables: 2 Open tables: 2874 Queries per second avg: 242.850 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 259 [06:30:45] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [06:30:46] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:30:55] MySQL slave on z-dat-s7-a is OK: Uptime: 4568857 Threads: 11 Questions: 1056913149 Slow queries: 132633 Opens: 8395914 Flush tables: 1 Open tables: 7413 Queries per second avg: 231.329 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 238 [06:30:55] MySQL slave on z-dat-s3-a is OK: Uptime: 4136436 Threads: 23 Questions: 4714654850 Slow queries: 230761 Opens: 34860088 Flush tables: 1 Open tables: 16384 Queries per second avg: 1139.786 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 246 [06:31:05] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket 
timeout after 30 seconds. [06:31:25] SMTP on z-dat-s7-a is OK: SMTP OK - 0.050 sec. response time [06:31:25] SMTP on z-dat-s3-a is OK: SMTP OK - 0.056 sec. response time [06:31:45] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.968262/1.95, alarm hl:np_load_avg=2.424316/2.0, alarm hl:mem_free=31.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.968262/2.3, alarm hl:np_load_long=2.528809/2.5, alarm hl:cpu=93.800000/98, alarm hl:mem_free=31.000000M/150M, alarm hl:available=1/0 [06:31:45] Load avg. on willow is WARNING: WARNING - load average: 15.49, 18.45, 19.82 [06:33:15] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440453 MB (8% inode=42%): [06:34:16] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [06:40:55] Load avg. on willow is CRITICAL: CRITICAL - load average: 42.53, 25.55, 21.40 [06:50:35] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [06:55:36] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:55:55] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:55:56] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:55:56] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:05] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:05] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:05] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:15] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:27] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [06:56:28] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:56] Load avg. 
on willow is WARNING: WARNING - load average: 12.02, 16.52, 19.52 [06:56:56] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:56] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:57] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:57] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:56:57] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:06] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [06:57:06] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:57:06] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:57:06] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:15] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [06:57:15] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:15] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:57:15] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:57:16] SMF on z-dat-s3-a is OK: OK - all services online [06:57:27] MySQL on z-dat-s3-a is OK: Uptime: 4138025 Threads: 26 Questions: 4715262784 Slow queries: 231012 Opens: 34860802 Flush tables: 1 Open tables: 16384 Queries per second avg: 1139.495 [06:57:27] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [06:57:27] MySQL slave on z-dat-s7-a is OK: Uptime: 4570448 Threads: 14 Questions: 1057016730 Slow queries: 132759 Opens: 8401576 Flush tables: 1 Open tables: 7414 Queries per second avg: 231.272 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 396 [06:57:28] SMTP on z-dat-s3-a is OK: SMTP OK - 0.003 sec. 
response time [06:57:28] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2058 MB (99% inode=99%): [06:57:28] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2058 MB (99% inode=99%): [06:57:28] Load avg. on z-dat-s7-a is OK: OK - load average: 0.46, 0.96, 1.52 [06:57:29] Load avg. on z-dat-s4-a is OK: OK - load average: 0.46, 0.96, 1.52 [06:57:29] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 98848 MB (24% inode=99%): [06:57:30] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2058 MB (99% inode=99%): [06:57:30] / on z-dat-s3-a is OK: DISK OK - free space: / 8487 MB (28% inode=85%): [06:57:31] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [06:57:36] MySQL on z-dat-s7-a is OK: Uptime: 4570460 Threads: 10 Questions: 1057017891 Slow queries: 132763 Opens: 8401651 Flush tables: 1 Open tables: 7414 Queries per second avg: 231.271 [06:57:36] SMF on z-dat-s7-a is OK: OK - all services online [06:57:36] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 178091 MB (18% inode=99%): [06:57:46] MySQL slave on z-dat-s3-a is OK: Uptime: 4138047 Threads: 22 Questions: 4715273501 Slow queries: 231017 Opens: 34860854 Flush tables: 1 Open tables: 16384 Queries per second avg: 1139.492 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 445 [06:57:46] / on z-dat-s6-a is OK: DISK OK - free space: / 8487 MB (28% inode=85%): [06:57:46] Load avg. on z-dat-s3-a is OK: OK - load average: 1.10, 1.07, 1.55 [06:57:46] Load avg. 
on z-dat-s6-a is OK: OK - load average: 1.17, 1.09, 1.55 [06:57:55] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:57:55] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:57:55] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:57:55] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [06:58:04] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [07:01:05] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:01:26] Load avg. on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:02:06] Load avg. on willow is WARNING: WARNING - load average: 21.64, 18.39, 19.30 [07:02:37] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:02:56] Cluster on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:02:56] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:02:56] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:03:05] SMTP on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:05] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:14] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:03:15] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:03:28] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:03:28] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:28] SSH on willow is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [07:03:36] Cluster on willow is OK: CLUSTER OK ! 
[07:03:37] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:56] SMF on z-dat-s6-a is OK: OK - all services online [07:04:05] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [07:04:05] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.346680/1.95, alarm hl:np_load_avg=2.292480/2.0, alarm hl:mem_free=31.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.346680/2.3, alarm hl:np_load_long=2.400879/2.5, alarm hl:cpu=98.700000/98, alarm hl:mem_free=31.000000M/150M, alarm hl:available=1/0 [07:04:15] SMTP on z-dat-s4-a is OK: SMTP OK - 0.122 sec. response time [07:04:27] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in error state: QERROR as result of job 1909016s failure: medium-sol@wolfsbane in error state: QERROR as result of job 1909016s failure [07:06:05] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:06:15] SMF on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:06:56] SMTP on willow is OK: SMTP OK - 0.022 sec. response time [07:06:56] Environment IPMI on willow is OK: ok: temperature ok fan ok voltage ok chassis ok [07:17:05] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:17:55] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [07:19:04] SMTP on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:19:35] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:19:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [07:19:55] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:21:55] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 23.53, 30.58, 29.23 [07:23:36] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:23:36] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [07:27:55] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:29:05] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:29:35] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [07:29:54] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [07:31:44] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.831055/1.95, alarm hl:np_load_avg=2.152832/2.0, alarm hl:mem_free=52.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.831055/2.3, alarm hl:np_load_long=2.798828/2.5, alarm hl:cpu=94.300000/98, alarm hl:mem_free=52.000000M/150M, alarm hl:available=1/0 [07:33:26] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440382 MB (8% inode=42%): [07:35:55] Load avg. on willow is WARNING: WARNING - load average: 12.10, 14.20, 19.78 [07:45:36] FMA on willow is CRITICAL: ERROR - unexpected output from snmpwalk [07:46:04] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:46:27] Load avg. on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:46:36] FMA on willow is OK: OK [07:46:55] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:47:14] Load avg. 
on willow is WARNING: WARNING - load average: 18.14, 16.61, 17.98 [07:47:55] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=4.361328/1.95, alarm hl:np_load_avg=2.687500/2.0, alarm hl:mem_free=31.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=4.361328/2.3, alarm hl:np_load_long=2.457520/2.5, alarm hl:cpu=98.400000/98, alarm hl:mem_free=31.000000M/150M, alarm hl:available=1/0 [07:48:27] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [07:50:36] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [07:52:05] SMTP on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:52:46] Environment IPMI on willow is OK: ok: temperature ok fan ok voltage ok chassis ok [07:52:55] SMTP on willow is OK: SMTP OK - 0.038 sec. response time [07:57:36] FMA on willow is CRITICAL: ERROR - unexpected output from snmpwalk [08:00:05] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:01:05] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:01:14] SMF on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:02:04] SMTP on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:02:55] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:09:55] Load avg. on willow is CRITICAL: CRITICAL - load average: 12.13, 21.43, 23.75 [08:14:36] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:14:56] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:14:56] Cluster on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[08:16:28] / on willow is OK: DISK OK - free space: / 24620 MB (23% inode=99%): [08:16:28] Cluster on willow is OK: CLUSTER OK ! [08:16:28] SSH on willow is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:16:56] (commented) [MNT-1183] Hyacinth crashed <https://jira.toolserver.org/browse/MNT-1183> (Marlen Caemmerer) [08:19:56] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [08:24:36] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.015625/1.00, alarm hl:np_load_long=0.878906/1.50, alarm hl:mem_free=18547.000000M/350M, alarm hl:available=1/0 [08:24:36] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:24:36] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [08:25:36] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [08:33:27] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440322 MB (8% inode=42%): [08:34:15] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:35:05] Load avg. on willow is CRITICAL: CRITICAL - load average: 10.01, 21.80, 25.93 [08:45:55] Load avg. on willow is WARNING: WARNING - load average: 13.71, 14.38, 19.86 [08:50:36] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [08:54:56] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.462891/1.95, alarm hl:np_load_avg=1.613770/2.0, alarm hl:mem_free=59.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.462891/2.3, alarm hl:np_load_long=2.099121/2.5, alarm hl:cpu=93.800000/98, alarm hl:mem_free=59.000000M/150M, alarm hl:available=1/0 [08:56:05] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:56:55] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:00:56] Load avg. on willow is OK: OK - load average: 10.12, 11.40, 14.84 [09:02:15] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:02:15] SMF on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:03:35] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:04:05] Load avg. on willow is WARNING: WARNING - load average: 13.12, 13.20, 15.02 [09:04:05] Cluster on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:04:05] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:04:05] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.644531/1.95, alarm hl:np_load_avg=1.651367/2.0, alarm hl:mem_free=70.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.644531/2.3, alarm hl:np_load_long=1.877930/2.5, alarm hl:cpu=94.600000/98, alarm hl:mem_free=70.000000M/150M, alarm hl:available=1/0 [09:04:25] SSH on willow is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:04:35] Cluster on willow is OK: CLUSTER OK ! [09:04:45] Environment IPMI on willow is OK: ok: temperature ok fan ok voltage ok chassis ok [09:12:46] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:05] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:14:15] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:14:27] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:36] FMA on willow is CRITICAL: ERROR - unexpected output from snmpwalk [09:15:05] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [09:15:06] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [09:15:07] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:15:27] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:15:36] FMA on willow is OK: OK [09:15:36] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:15:56] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:16:05] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:16:36] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [09:16:56] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:16:56] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:16:56] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:16:56] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:16:56] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:16:56] MySQL on z-dat-s6-a is OK: Uptime: 1086980 Threads: 25 Questions: 263694101 Slow queries: 63549 Opens: 2720443 Flush tables: 2 Open tables: 2836 Queries per second avg: 242.593 [09:16:56] MySQL slave on z-dat-s6-a is OK: Uptime: 1086980 Threads: 25 Questions: 263694102 Slow queries: 63549 Opens: 2720443 Flush tables: 2 Open tables: 2836 Queries per second avg: 242.593 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 323 [09:16:57] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:16:57] SMF on z-dat-s7-a is OK: OK - all services online [09:16:58] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 246.000000 [09:16:58] MySQL on z-dat-s3-a is OK: Uptime: 4146404 Threads: 23 Questions: 4723295505 Slow queries: 231891 Opens: 34915060 Flush tables: 1 Open tables: 16384 Queries per second avg: 1139.130 [09:17:15] MySQL slave on z-dat-s3-a is OK: Uptime: 4146415 Threads: 11 Questions: 4723304145 Slow queries: 231896 Opens: 34915164 Flush tables: 1 Open tables: 16383 Queries per second avg: 1139.129 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 465 [09:17:15] SMTP on z-dat-s4-a is OK: SMTP OK - 0.004 sec. response time [09:17:26] / on z-dat-s7-a is OK: DISK OK - free space: / 8486 MB (28% inode=85%): [09:17:26] / on z-dat-s4-a is OK: DISK OK - free space: / 8486 MB (28% inode=85%): [09:17:26] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 178005 MB (18% inode=99%): [09:17:26] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 103679 MB (25% inode=99%): [09:17:26] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2143 MB (99% inode=99%): [09:17:26] SMTP on z-dat-s3-a is OK: SMTP OK - 0.008 sec. response time [09:17:35] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:19:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [09:23:05] Environment IPMI on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:24:06] Cluster on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:24:46] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:24:46] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [09:34:26] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440226 MB (8% inode=42%): [09:44:05] SMTP on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:49:55] SMTP on willow is OK: SMTP OK - 0.044 sec. response time [09:50:36] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [09:52:46] SMF on willow is OK: OK - all services online [09:55:15] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:55:26] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:55:46] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:56:05] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:05] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:06] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:06] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:06] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:56:25] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:56:25] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:56:26] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:56:46] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:56:56] Load avg. on willow is CRITICAL: CRITICAL - load average: 13.84, 27.81, 27.82 [09:56:56] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:56:56] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:56:56] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:05] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:06] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:06] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:57:06] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [09:57:06] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:06] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:06] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:06] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:07] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:07] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:08] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:08] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:09] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:15] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:57:15] SMF on z-dat-s6-a is OK: OK - all services online [09:57:15] SMF on z-dat-s3-a is OK: OK - all services online [09:57:15] SMF on z-dat-s4-a is OK: OK - all services online [09:57:15] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:57:16] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 249.000000 [09:57:16] SMTP on z-dat-s6-a is OK: SMTP OK - 0.002 sec. 
response time [09:57:37] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 103650 MB (25% inode=99%): [09:57:37] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:57:37] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2060 MB (99% inode=99%): [09:57:37] SMTP on z-dat-s3-a is OK: SMTP OK - 0.127 sec. response time [09:57:37] Load avg. on z-dat-s4-a is OK: OK - load average: 1.46, 1.43, 1.48 [09:57:37] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2059 MB (99% inode=99%): [09:57:37] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2059 MB (99% inode=99%): [09:57:38] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 98778 MB (24% inode=99%): [09:57:38] Load avg. on z-dat-s7-a is OK: OK - load average: 1.49, 1.43, 1.49 [09:57:39] / on z-dat-s3-a is OK: DISK OK - free space: / 8486 MB (28% inode=85%): [09:57:39] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2060 MB (99% inode=99%): [09:57:40] SMF on z-dat-s7-a is OK: OK - all services online [09:57:56] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:57:56] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:57:56] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:57:56] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:08:47] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.281738/1.95, alarm hl:np_load_avg=1.792480/2.0, alarm hl:mem_free=276.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.281738/2.3, alarm hl:np_load_long=2.517090/2.5, alarm hl:cpu=76.200000/98, alarm hl:mem_free=276.000000M/150M, alarm hl:available=1/0 [10:08:55] Load avg. 
on willow is WARNING: WARNING - load average: 9.32, 13.78, 19.77 [10:09:56] (commented) [MNT-1227] Re-Import of enwiki <https://jira.toolserver.org/browse/MNT-1227> (Marlen Caemmerer) [10:16:47] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [10:19:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [10:20:55] Load avg. on willow is OK: OK - load average: 9.50, 10.60, 14.79 [10:24:47] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:24:47] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:24:47] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [10:29:15] whym_away: had to kill your emacs on willow - sorry [10:29:15] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [10:30:41] nosy: no problem, was it occupying the CPU? [10:30:50] no - ram [10:31:03] ah, cpu a little yes but that was not the main problem [10:31:16] it took 500 mb ram of 8 gb [10:34:26] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 440166 MB (8% inode=42%): [10:43:04] nosy: hmm that was bad of me, i'll see if i can fix it [10:43:34] whym_away: yes a bit - you fix emacs?
;) [10:44:07] no, possibly some of the elisp programs that I run on it [10:50:25] /tmp on wolfsbane is CRITICAL: DISK CRITICAL - free space: /tmp 81 MB (7% inode=99%): [10:50:47] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [10:51:25] /tmp on wolfsbane is OK: DISK OK - free space: /tmp 343 MB (26% inode=99%): [10:55:48] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:58:05] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver [10:58:14] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:15] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:37] /tmp on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:58:48] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.243164/1.10, alarm hl:np_load_long=0.693360/1.55, alarm hl:mem_free=18744.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.243164/1.00, alarm hl:np_load_long=0.693360/1.50, alarm hl:mem_free=18744.000000M/350M, alarm hl:available=1/0 [10:58:48] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:58:48] Load avg. on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:59:05] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:59:47] Load avg. on wolfsbane is OK: OK - load average: 1.45, 1.71, 1.72 [11:00:05] SMF on wolfsbane is OK: OK - all services online [11:00:36] / on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[11:00:37] Environment IPMI on wolfsbane is CRITICAL: NRPE: Unable to read output [11:00:37] Cluster on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:00:47] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:01:26] / on wolfsbane is OK: DISK OK - free space: / 10635 MB (35% inode=93%): [11:01:37] Environment IPMI on wolfsbane is OK: ok: temperature ok fan ok voltage ok chassis ok [11:01:37] Cluster on wolfsbane is OK: CLUSTER OK ! [11:01:48] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:03:46] Load avg. on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:05:35] Cluster on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:05:35] / on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:05:35] Environment IPMI on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[11:05:35] SSH on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:07:25] SSH on wolfsbane is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [11:10:46] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [11:13:04] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.009 second response time [11:13:04] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.004 second response time [11:13:47] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.318848/1.95, alarm hl:np_load_avg=1.337891/2.0, alarm hl:mem_free=315.000000M/350M, alarm hl:available=1/0 [11:13:47] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.216797/1.10, alarm hl:np_load_long=1.140625/1.55, alarm hl:mem_free=19608.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.216797/1.00, alarm hl:np_load_long=1.140625/1.50, alarm hl:mem_free=19608.000000M/350M, alarm hl:available=1/0 [11:14:46] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:20:05] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [11:25:04] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver [11:25:48] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:25:48] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[11:30:46] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.520508/1.95, alarm hl:np_load_avg=1.470215/2.0, alarm hl:mem_free=172.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.520508/2.3, alarm hl:np_load_long=1.276855/2.5, alarm hl:cpu=94.600000/98, alarm hl:mem_free=172.000000M/150M, alarm hl:available=1/0 [11:31:46] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state [11:34:25] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 436534 MB (8% inode=42%): [11:34:48] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:50:58] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [11:56:46] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:59:01] Hi all. how can i edir mt email on toolserver? [11:59:09] edit* [11:59:50] https://wiki.toolserver.org/view/Email_forwarding [12:01:30] Thanks you nosy ;) [12:01:57] Darafsh: you just sent me an email asking about interwiki [12:02:07] i am afraid i cannot answer the question [12:02:28] but the fact is that there were many processes that were days old [12:02:53] i dont understood! [12:02:58] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.769531/1.95, alarm hl:np_load_avg=1.527344/2.0, alarm hl:mem_free=281.000000M/350M, alarm hl:available=1/0 [12:03:43] do you need the interwiki process all the time and do you need multiple instances? 
[12:04:31] yes [12:04:51] oh i see [12:05:33] merlissimo just told me it was very hard to memory optimize the bots especially when it comes to python [12:05:56] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:06:08] but he changed variable types from integer to byte where byte is enough and this had significant impact [12:07:11] nosy: i am using java not python. python is a script language without forced types [12:07:32] merlissimo: thanks for clearing this up [12:08:56] (updated) [DBQ-183] SQL Query Issue <https://jira.toolserver.org/browse/DBQ-183> (Franz Herbach) [12:08:57] (updated) [DBQ-183] SQL Query Issue <https://jira.toolserver.org/browse/DBQ-183> (Franz Herbach) [12:10:50] (updated) [DBQ-183] SQL Query Issue <https://jira.toolserver.org/browse/DBQ-183> (Franz Herbach) [12:20:05] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [12:21:59] SMF on wolfsbane is OK: OK - all services online [12:22:46] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [12:25:56] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:25:56] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates).
[12:32:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.584961/1.95, alarm hl:np_load_avg=1.583008/2.0, alarm hl:mem_free=171.000000M/350M, alarm hl:available=1/0 [12:34:25] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 439955 MB (8% inode=42%): [12:41:56] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:51:56] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [12:56:47] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:01:56] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.343750/1.95, alarm hl:np_load_avg=1.771973/2.0, alarm hl:mem_free=222.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.343750/2.3, alarm hl:np_load_long=1.513184/2.5, alarm hl:cpu=95.600000/98, alarm hl:mem_free=222.000000M/150M, alarm hl:available=1/0 [13:04:56] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [13:21:05] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:25:56] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:25:56] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[13:28:39] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=3.543945/1.10, alarm hl:np_load_long=1.346680/1.55, alarm hl:mem_free=17913.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=3.543945/1.00, alarm hl:np_load_long=1.346680/1.50, alarm hl:mem_free=17913.000000M/350M, alarm hl:available=1/0 [13:30:55] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [13:35:25] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 439821 MB (8% inode=42%): [13:43:56] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.348633/1.10, alarm hl:np_load_long=1.116211/1.55, alarm hl:mem_free=17286.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.348633/1.00, alarm hl:np_load_long=1.116211/1.50, alarm hl:mem_free=17286.000000M/350M, alarm hl:available=1/0 [13:52:55] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [13:55:07] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:07] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:07] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:25] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:35] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:55:35] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[13:56:05] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [13:56:05] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [13:56:05] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [13:56:06] SMF on z-dat-s3-a is OK: OK - all services online [13:56:15] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [13:56:15] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [13:56:56] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:04:25] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:04:36] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:04:36] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:04:56] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.370605/1.95, alarm hl:np_load_avg=1.520508/2.0, alarm hl:mem_free=206.000000M/350M, alarm hl:available=1/0 [14:05:06] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:05:06] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:05:06] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:05:06] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:05:25] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:05:26] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[14:05:57] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [14:06:06] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:06:07] SMF on hyacinth is OK: OK - all services online [14:06:07] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 219.000000 [14:06:15] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [14:06:15] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [14:06:15] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [14:06:16] SMTP on z-dat-s6-a is OK: SMTP OK - 0.003 sec. response time [14:06:24] MySQL on z-dat-s7-a is OK: Uptime: 4596187 Threads: 21 Questions: 1068964419 Slow queries: 133652 Opens: 8557151 Flush tables: 1 Open tables: 7317 Queries per second avg: 232.576 [14:06:24] MySQL slave on z-dat-s7-a is OK: Uptime: 4596187 Threads: 22 Questions: 1068964423 Slow queries: 133652 Opens: 8557151 Flush tables: 1 Open tables: 7317 Queries per second avg: 232.576 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 225 [14:06:25] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [14:06:35] MySQL slave on z-dat-s3-a is OK: Uptime: 4163775 Threads: 19 Questions: 4736886434 Slow queries: 232934 Opens: 35004445 Flush tables: 1 Open tables: 16384 Queries per second avg: 1137.642 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 206 [14:06:55] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [14:21:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [14:26:55] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:26:56] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[14:35:25] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 439694 MB (8% inode=42%): [14:42:25] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:35] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:42:45] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:42:55] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:43:15] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [14:43:15] SMTP on z-dat-s6-a is OK: SMTP OK - 0.002 sec. response time [14:43:25] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [14:43:45] SMTP on z-dat-s3-a is OK: SMTP OK - 0.003 sec. response time [14:52:55] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:56:57] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:03:05] Load avg. on willow is WARNING: WARNING - load average: 15.52, 16.22, 13.48 [15:07:15] Load avg. on willow is OK: OK - load average: 12.26, 14.45, 13.40 [15:12:05] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.696289/1.95, alarm hl:np_load_avg=1.863770/2.0, alarm hl:mem_free=239.000000M/350M, alarm hl:available=1/0 [15:12:45] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:13:19] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:13:19] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:13:26] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:13:46] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:13:55] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [15:14:16] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 221.000000 [15:14:16] SMTP on hyacinth is OK: SMTP OK - 0.002 sec. 
response time [15:14:25] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [15:14:35] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [15:14:36] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [15:14:36] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [15:14:36] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [15:14:36] SMTP on z-dat-s7-a is OK: SMTP OK - 0.058 sec. response time [15:14:46] MySQL slave on z-dat-s3-a is OK: Uptime: 4167866 Threads: 18 Questions: 4743290547 Slow queries: 233561 Opens: 35049201 Flush tables: 1 Open tables: 16384 Queries per second avg: 1138.62 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 175 [15:14:46] MySQL on z-dat-s3-a is OK: Uptime: 4167866 Threads: 18 Questions: 4743290549 Slow queries: 233561 Opens: 35049201 Flush tables: 1 Open tables: 16384 Queries per second avg: 1138.62 [15:14:46] MySQL on z-dat-s6-a is OK: Uptime: 1108445 Threads: 9 Questions: 270073502 Slow queries: 65039 Opens: 2726788 Flush tables: 2 Open tables: 2813 Queries per second avg: 243.650 [15:14:46] MySQL slave on z-dat-s6-a is OK: Uptime: 1108445 Threads: 9 Questions: 270073506 Slow queries: 65039 Opens: 2726788 Flush tables: 2 Open tables: 2813 Queries per second avg: 243.650 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 145 [15:15:05] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [15:15:06] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [15:21:25] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [15:27:05] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:27:19] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[15:30:16] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:34:09] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.185059/1.95, alarm hl:np_load_avg=1.334961/2.0, alarm hl:mem_free=322.000000M/350M, alarm hl:available=1/0 [15:35:39] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 438901 MB (8% inode=42%): [15:37:08] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:42:28] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:42:48] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:42:48] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:42:57] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:43:09] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:09] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:18] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:18] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:43:19] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:19] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:19] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:19] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:19] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:43:19] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:19] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:43:28] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:28] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[15:43:39] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:39] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:39] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:39] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:39] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:39] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:39] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:43:55] (created) [UTRS-102] Add link to [[Special:BlockList]] in account links; UTRS; Improvement <https://jira.toolserver.org/browse/UTRS-102> (T. Canens) [15:43:59] MySQL slave on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [15:44:09] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [15:44:09] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [15:44:09] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [15:44:09] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [15:44:19] MySQL slave on z-dat-s6-a is OK: Uptime: 1110219 Threads: 14 Questions: 270320518 Slow queries: 65123 Opens: 2726795 Flush tables: 2 Open tables: 2814 Queries per second avg: 243.483 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 247 [15:44:19] MySQL on z-dat-s6-a is OK: Uptime: 1110219 Threads: 14 Questions: 270320520 Slow queries: 65123 Opens: 2726795 Flush tables: 2 Open tables: 2814 Queries per second avg: 243.483 [15:44:28] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [15:44:28] SMF on z-dat-s6-a is OK: OK - all services online [15:44:37] SMF on z-dat-s7-a is OK: OK - all services online [15:44:37] SMF on z-dat-s4-a is OK: OK - all services online [15:44:38] SMF on hyacinth is OK: OK - all services online [15:44:38] MySQL slave on z-dat-s4-a is OK: Uptime: 4078546 Threads: 8 Questions: 220533643 Slow 
queries: 48752 Opens: 84455 Flush tables: 1 Open tables: 608 Queries per second avg: 54.71 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 231 [15:44:38] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 231.000000 [15:44:38] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [15:44:38] / on z-dat-s4-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [15:44:39] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 178288 MB (18% inode=99%): [15:44:39] SMTP on z-dat-s7-a is OK: SMTP OK - 0.002 sec. response time [15:44:48] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2159 MB (99% inode=99%): [15:44:49] SMTP on z-dat-s3-a is OK: SMTP OK - 0.004 sec. response time [15:44:49] Load avg. on z-dat-s4-a is OK: OK - load average: 0.84, 1.23, 1.74 [15:44:49] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2159 MB (99% inode=99%): [15:44:49] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 99514 MB (24% inode=99%): [15:44:49] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2160 MB (99% inode=99%): [15:44:49] MySQL on z-dat-s3-a is OK: Uptime: 4169675 Threads: 17 Questions: 4747083183 Slow queries: 233683 Opens: 35082642 Flush tables: 1 Open tables: 16384 Queries per second avg: 1138.477 [15:44:50] MySQL slave on z-dat-s3-a is OK: Uptime: 4169675 Threads: 18 Questions: 4747083184 Slow queries: 233683 Opens: 35082642 Flush tables: 1 Open tables: 16384 Queries per second avg: 1138.477 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 204 [15:44:50] / on z-dat-s3-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [15:44:59] Load avg. on z-dat-s6-a is OK: OK - load average: 1.18, 1.29, 1.76 [15:44:59] / on z-dat-s6-a is OK: DISK OK - free space: / 8484 MB (28% inode=85%): [15:45:08] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [15:45:09] Load avg. 
on z-dat-s3-a is OK: OK - load average: 1.38, 1.33, 1.77 [15:45:09] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 178283 MB (18% inode=99%): [15:45:09] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [15:45:09] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [15:45:09] SMF on z-dat-s3-a is OK: OK - all services online [15:45:19] SMTP on z-dat-s4-a is OK: SMTP OK - 0.004 sec. response time [15:53:09] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [15:57:08] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:21:27] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [16:27:08] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:27:18] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [16:30:51] tsnag, I know all that stuff is probably important, but I never have the first clue as to what you're complaining about [16:36:20] [[Special:Log/newusers]] create 10 * Anilkoc21 * (New user account) [16:36:38] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 438518 MB (8% inode=42%): [16:44:08] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.905762/1.95, alarm hl:np_load_avg=1.426758/2.0, alarm hl:mem_free=185.000000M/350M, alarm hl:available=1/0 [16:45:18] Load avg. on willow is WARNING: WARNING - load average: 17.09, 12.71, 10.39 [16:50:08] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:52:09] Load avg. on willow is OK: OK - load average: 11.64, 14.46, 12.56 [16:53:08] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [16:56:49] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[16:57:09] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:57:27] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [16:59:08] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.969238/1.95, alarm hl:np_load_avg=1.196289/2.0, alarm hl:mem_free=316.000000M/350M, alarm hl:available=1/0 [17:04:12] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.132812/1.10, alarm hl:np_load_long=0.706055/1.55, alarm hl:mem_free=16418.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.132812/1.00, alarm hl:np_load_long=0.706055/1.50, alarm hl:mem_free=16418.000000M/350M, alarm hl:available=1/0 [17:05:09] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [17:21:49] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [17:28:08] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:28:18] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [17:36:50] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 438424 MB (8% inode=42%): [17:44:49] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[17:54:09] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [17:57:08] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:59:38] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [18:03:08] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.159668/1.95, alarm hl:np_load_avg=1.198242/2.0, alarm hl:mem_free=279.000000M/350M, alarm hl:available=1/0 [18:06:08] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [18:21:49] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [18:28:08] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:28:18] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [18:29:07] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.812012/1.95, alarm hl:np_load_avg=0.875488/2.0, alarm hl:mem_free=257.000000M/350M, alarm hl:available=1/0 [18:30:08] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [18:33:21] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.960449/1.95, alarm hl:np_load_avg=1.021484/2.0, alarm hl:mem_free=201.000000M/350M, alarm hl:available=1/0 [18:37:49] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 438261 MB (8% inode=42%): [18:53:20] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.068359/1.00, alarm hl:np_load_long=0.721680/1.50, alarm hl:mem_free=17465.000000M/350M, alarm hl:available=1/0 [18:54:21] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [18:54:22] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [18:57:19] SMF on willow is CRITICAL: ERROR - 
maintenance: svc:/network/puppetmasterd:default [19:10:56] (commented) [UTRS-98] Account creation is impossible <https://jira.toolserver.org/browse/UTRS-98> (TParis) [19:22:08] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [19:28:19] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:28:20] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [19:38:00] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 438112 MB (8% inode=42%): [19:54:30] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [19:56:58] (commented) [UTRS-98] Account creation is impossible <https://jira.toolserver.org/browse/UTRS-98> (Martijn Hoekstra) [19:57:30] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:13:30] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.137695/1.95, alarm hl:np_load_avg=1.064453/2.0, alarm hl:mem_free=192.000000M/350M, alarm hl:available=1/0 [20:16:29] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [20:22:09] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [20:28:29] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[20:28:29] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:38:01] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 437975 MB (8% inode=42%): [20:43:40] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.411133/1.10, alarm hl:np_load_long=0.864258/1.55, alarm hl:mem_free=18030.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.411133/1.00, alarm hl:np_load_long=0.864258/1.50, alarm hl:mem_free=18030.000000M/350M, alarm hl:available=1/0 [20:45:40] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [20:47:40] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.950684/1.95, alarm hl:np_load_avg=1.005859/2.0, alarm hl:mem_free=320.000000M/350M, alarm hl:available=1/0 [20:49:40] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [20:52:40] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.854492/1.95, alarm hl:np_load_avg=0.926269/2.0, alarm hl:mem_free=223.000000M/350M, alarm hl:available=1/0 [20:54:40] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [20:57:40] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [21:22:19] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [21:28:40] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:28:40] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [21:34:40] Load avg. on willow is WARNING: WARNING - load average: 15.31, 16.48, 12.33 [21:36:41] Load avg. 
on willow is OK: OK - load average: 8.65, 13.70, 11.79 [21:39:01] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 437871 MB (8% inode=42%): [21:50:00] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1911051s failure: longrun-sol@willow in error state: QERROR as result of job 1911051s failure [21:55:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [21:58:02] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [22:22:29] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [22:28:59] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). [22:29:00] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:33:00] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.884766/1.10, alarm hl:np_load_long=0.767578/1.55, alarm hl:mem_free=19036.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.884766/1.00, alarm hl:np_load_long=0.767578/1.50, alarm hl:mem_free=19036.000000M/350M, alarm hl:available=1/0 [22:37:59] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [22:37:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.961426/1.95, alarm hl:np_load_avg=1.004883/2.0, alarm hl:mem_free=186.000000M/350M, alarm hl:available=1/0 [22:39:20] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 437788 MB (8% inode=42%): [22:41:00] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [22:44:00] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.059082/1.95, alarm hl:np_load_avg=1.048828/2.0, alarm hl:mem_free=223.000000M/350M, alarm hl:available=1/0 
[22:55:59] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [22:58:59] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [23:09:03] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.305664/1.10, alarm hl:np_load_long=0.851562/1.55, alarm hl:mem_free=19221.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.305664/1.00, alarm hl:np_load_long=0.851562/1.50, alarm hl:mem_free=19221.000000M/350M, alarm hl:available=1/0 [23:12:01] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [23:17:01] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.216797/1.95, alarm hl:np_load_avg=1.356934/2.0, alarm hl:mem_free=135.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.216797/2.3, alarm hl:np_load_long=1.259766/2.5, alarm hl:cpu=68.900000/98, alarm hl:mem_free=135.000000M/150M, alarm hl:available=1/0 [23:19:01] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [23:23:08] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [23:27:46] anyone home [23:29:02] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:29:02] APT on yarrow is CRITICAL: APT CRITICAL: 6 packages available for upgrade (6 critical updates). 
[23:38:02] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.059082/1.95, alarm hl:np_load_avg=1.059570/2.0, alarm hl:mem_free=190.000000M/350M, alarm hl:available=1/0 [23:40:00] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [23:40:20] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 437741 MB (8% inode=42%): [23:41:55] (commented) [UTRS-98] Account creation is impossible <https://jira.toolserver.org/browse/UTRS-98> (TParis) [23:56:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [23:59:03] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default