[00:02:45] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=3.796875/1.95, alarm hl:np_load_avg=3.232910/2.0, alarm hl:mem_free=244.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=3.796875/2.3, alarm hl:np_load_long=2.699219/2.5, alarm hl:cpu=99.600000/98, alarm hl:mem_free=244.000000M/200M, alarm hl:available=1/0  
[00:47:27] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[00:47:34] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[00:47:45] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[00:48:14] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[00:48:14] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 295719 MB (5% inode=33%):  
[00:51:14] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 16.34, 18.98, 19.38  
[00:54:56] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[00:55:04] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[00:55:04] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[00:55:04] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[00:55:04] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[00:55:14] <tsnag>	 Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[00:55:45] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[00:55:55] <tsnag>	 Environment IPMI on hyacinth is OK: ok:  temperature ok fan ok voltage ok chassis ok  
[00:55:56] <tsnag>	 SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[00:55:56] <tsnag>	 SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[00:55:56] <tsnag>	 SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[00:55:56] <tsnag>	 SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[01:01:14] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 41.26, 25.34, 21.12  
[01:02:46] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.617188/1.95, alarm hl:np_load_avg=2.807617/2.0, alarm hl:mem_free=436.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.617188/2.3, alarm hl:np_load_long=2.550781/2.5, alarm hl:cpu=95.800000/98, alarm hl:mem_free=436.000000M/200M, alarm hl:available=1/0  
[01:03:14] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 16.23, 20.92, 19.96  
[01:06:37] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 38.12, 23.13, 20.66  
[01:14:13] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[01:14:13] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[01:41:06] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[01:45:05] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.824707/1.95, alarm hl:np_load_avg=2.144043/2.0, alarm hl:mem_free=593.000000M/350M, alarm hl:available=1/0  
[01:47:35] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[01:47:35] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[01:48:04] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[01:48:34] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 295675 MB (5% inode=33%):  
[01:49:14] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[01:50:52] <msgbot>	 (updated) [TS-1371] Add user "sactage" to MMT group "cvn" and "swmtbot" <https://jira.toolserver.org/browse/TS-1371>  (Krinkle)
[01:52:53] <msgbot>	 (commented) [TS-1371] Add user "sactage" to MMT group "cvn" and "swmtbot" <https://jira.toolserver.org/browse/TS-1371>  (Krinkle)
[02:00:05] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[02:15:36] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 18.80, 21.12, 20.90  
[02:17:35] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.34, 18.15, 19.80  
[02:18:35] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 37.29, 22.60, 21.19  
[02:22:05] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.727051/1.95, alarm hl:np_load_avg=2.332520/2.0, alarm hl:mem_free=556.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.727051/2.3, alarm hl:np_load_long=2.489258/2.5, alarm hl:cpu=99.400000/98, alarm hl:mem_free=556.000000M/200M, alarm hl:available=1/0  
[02:22:34] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.00, 18.48, 19.81  
[02:33:14] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.291016/1.10, alarm hl:np_load_long=0.896485/1.55, alarm hl:mem_free=13438.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.291016/1.00, alarm hl:np_load_long=0.896485/1.50, alarm hl:mem_free=13438.000000M/600M, alarm hl:available=1/0  
[02:35:18] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[02:40:16] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.224609/1.10, alarm hl:np_load_long=0.953125/1.55, alarm hl:mem_free=13308.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.224609/1.00, alarm hl:np_load_long=0.953125/1.50, alarm hl:mem_free=13308.000000M/600M, alarm hl:available=1/0  
[02:47:36] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[02:47:36] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[02:48:13] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[02:48:37] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 293369 MB (5% inode=33%):  
[02:49:16] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[02:52:17] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[02:53:17] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[02:55:24] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[02:56:15] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[02:57:17] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.160645/1.95, alarm hl:np_load_avg=2.424316/2.0, alarm hl:mem_free=540.000000M/350M, alarm hl:available=1/0  
[03:01:24] <tsnag>	 Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 2005443s failure: longrun-sol@willow in error state: QERROR as result of job 2005443s failure  
[03:04:15] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.049805/1.00, alarm hl:np_load_long=1.007812/1.50, alarm hl:mem_free=13412.000000M/600M, alarm hl:available=1/0  
[03:06:14] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[03:15:34] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 21.64, 22.73, 22.73  
[03:21:16] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.867676/1.95, alarm hl:np_load_avg=2.479004/2.0, alarm hl:mem_free=517.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.867676/2.3, alarm hl:np_load_long=2.678223/2.5, alarm hl:cpu=84.600000/98, alarm hl:mem_free=517.000000M/200M, alarm hl:available=1/0  
[03:21:16] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.038086/1.00, alarm hl:np_load_long=0.952148/1.50, alarm hl:mem_free=13280.000000M/600M, alarm hl:available=1/0  
[03:29:34] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.82, 17.50, 19.75  
[03:30:34] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 43.14, 23.16, 21.50  
[03:46:19] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.036133/1.00, alarm hl:np_load_long=0.792969/1.50, alarm hl:mem_free=13492.000000M/600M, alarm hl:available=1/0  
[03:47:18] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[03:47:35] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 14.49, 18.30, 19.82  
[03:47:56] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[03:47:56] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[03:48:17] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[03:49:34] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 293367 MB (5% inode=33%):  
[03:50:16] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[03:57:14] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.082031/1.00, alarm hl:np_load_long=0.849610/1.50, alarm hl:mem_free=13554.000000M/600M, alarm hl:available=1/0  
[04:00:16] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[04:03:15] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.395508/1.95, alarm hl:np_load_avg=2.782227/2.0, alarm hl:mem_free=599.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.395508/2.3, alarm hl:np_load_long=2.583984/2.5, alarm hl:cpu=97.200000/98, alarm hl:mem_free=599.000000M/200M, alarm hl:available=1/0  
[04:30:12] <wikirc>	 [[Wiki server assignments]] ! https://wiki.toolserver.org/w/index.php?diff=7198&oldid=7156&rcid=9592 * 91.198.174.202 * (+0) (updated page)
[04:47:35] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 18.32, 21.30, 21.14  
[04:48:04] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[04:48:15] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[04:48:35] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[04:49:35] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 293333 MB (5% inode=33%):  
[04:50:36] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[04:58:34] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 12.98, 17.65, 19.88  
[05:00:35] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 47.69, 23.83, 21.63  
[05:03:24] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.056641/1.95, alarm hl:np_load_avg=2.565430/2.0, alarm hl:mem_free=880.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.056641/2.3, alarm hl:np_load_long=2.596191/2.5, alarm hl:cpu=95.300000/98, alarm hl:mem_free=880.000000M/200M, alarm hl:available=1/0  
[05:07:24] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.165039/1.10, alarm hl:np_load_long=0.837890/1.55, alarm hl:mem_free=13679.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.165039/1.00, alarm hl:np_load_long=0.837890/1.50, alarm hl:mem_free=13679.000000M/600M, alarm hl:available=1/0  
[05:08:25] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[05:33:24] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.416992/1.10, alarm hl:np_load_long=0.935547/1.55, alarm hl:mem_free=13650.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.416992/1.00, alarm hl:np_load_long=0.935547/1.50, alarm hl:mem_free=13650.000000M/600M, alarm hl:available=1/0  
[05:34:25] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[05:43:35] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.035156/1.00, alarm hl:np_load_long=0.935547/1.50, alarm hl:mem_free=14676.000000M/600M, alarm hl:available=1/0  
[05:48:15] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[05:49:04] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[05:49:34] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[05:50:35] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 293273 MB (5% inode=33%):  
[05:51:34] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[05:57:36] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.44, 18.07, 19.98  
[06:00:35] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 53.57, 24.76, 21.76  
[06:03:34] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.870117/1.95, alarm hl:np_load_avg=3.044922/2.0, alarm hl:mem_free=335.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.870117/2.3, alarm hl:np_load_long=2.780273/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=335.000000M/200M, alarm hl:available=1/0  
[06:22:44] <tsnag>	 RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[06:37:34] <tsnag>	 RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0  
[06:45:09] <wikirc>	 [[Special:Log/newusers]] create * Sh555anmugamp7 * (New user account)
[06:48:14] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[06:49:05] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[06:49:34] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[06:50:45] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 290934 MB (5% inode=32%):  
[06:51:35] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[07:00:44] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 54.69, 28.18, 24.75  
[07:02:44] <tsnag>	 RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[07:03:35] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.811523/1.95, alarm hl:np_load_avg=3.463379/2.0, alarm hl:mem_free=280.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.811523/2.3, alarm hl:np_load_long=3.174805/2.5, alarm hl:cpu=99.700000/98, alarm hl:mem_free=280.000000M/200M, alarm hl:available=1/0  
[07:47:32] <jeremyb>	 is tsnag dead? or did willow really not recover?
[07:48:27] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[07:49:07] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[07:49:37] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[07:50:47] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 290892 MB (5% inode=32%):  
[07:51:37] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[07:52:46] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.05, 18.04, 19.98  
[07:54:47] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 29.23, 20.99, 20.74  
[07:56:47] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.34, 18.64, 19.90  
[08:03:46] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.826660/1.95, alarm hl:np_load_avg=3.168457/2.0, alarm hl:mem_free=317.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.826660/2.3, alarm hl:np_load_long=2.804688/2.5, alarm hl:cpu=99.400000/98, alarm hl:mem_free=317.000000M/200M, alarm hl:available=1/0  
[08:12:18] <tsnag>	 SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon.  Check the remote server logs for error messages.  
[08:13:18] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[08:13:31] <jeremyb>	 wow, i brought tsnag back to life
[08:16:57] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 14.30, 22.23, 22.91  
[08:24:49] <jeremyb>	 who's breaking willow?
[08:25:47] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[08:26:27] <tsnag>	 RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0  
[08:28:57] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 12.03, 16.37, 19.67  
[08:30:57] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 32.44, 20.61, 20.73  
[08:32:57] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 14.53, 17.95, 19.72  
[08:34:28] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[08:34:39] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[08:34:39] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[08:34:47] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[08:35:17] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[08:35:27] <tsnag>	 SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[08:35:28] <tsnag>	 SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[08:49:08] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[08:49:47] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[08:50:47] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 290816 MB (5% inode=32%):  
[08:51:47] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[09:03:48] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.590332/1.95, alarm hl:np_load_avg=3.141602/2.0, alarm hl:mem_free=353.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.590332/2.3, alarm hl:np_load_long=2.743164/2.5, alarm hl:cpu=96.800000/98, alarm hl:mem_free=353.000000M/200M, alarm hl:available=1/0  
[09:11:56] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 16.49, 20.08, 20.95  
[09:13:28] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[09:22:39] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[09:22:39] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[09:22:39] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[09:22:47] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[09:22:47] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[09:22:57] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:22:58] <tsnag>	 / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:10] <tsnag>	 SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:10] <tsnag>	 /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:18] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[09:23:18] <tsnag>	 /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:18] <tsnag>	 /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:19] <tsnag>	 / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:19] <tsnag>	 /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:29] <tsnag>	 SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:38] <tsnag>	 Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:38] <tsnag>	 /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:38] <tsnag>	 /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:38] <tsnag>	 /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:38] <tsnag>	 SMTP on z-dat-s6-a is OK: SMTP OK - 7.059 sec. response time  
[09:23:38] <tsnag>	 /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:38] <tsnag>	 Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:39] <tsnag>	 Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[09:23:39] <tsnag>	 / on z-dat-s4-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[09:23:40] <tsnag>	 SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[09:23:40] <tsnag>	 SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[09:23:41] <tsnag>	 SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[09:23:57] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 12.02, 17.37, 19.74  
[09:23:57] <tsnag>	 SMF on z-dat-s7-a is OK: OK - all services online  
[09:24:07] <tsnag>	 Load avg. on z-dat-s4-a is OK: OK - load average: 0.60, 1.27, 1.81  
[09:24:07] <tsnag>	 SMTP on hyacinth is OK: SMTP OK - 0.004 sec. response time  
[09:24:07] <tsnag>	 /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 161666 MB (16% inode=99%):  
[09:24:07] <tsnag>	 /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2118 MB (99% inode=99%):  
[09:24:07] <tsnag>	 /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2118 MB (99% inode=99%):  
[09:24:07] <tsnag>	 /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2116 MB (99% inode=99%):  
[09:24:07] <tsnag>	 Load avg. on z-dat-s3-a is OK: OK - load average: 0.62, 1.27, 1.81  
[09:24:08] <tsnag>	 Load avg. on z-dat-s6-a is OK: OK - load average: 0.62, 1.27, 1.81  
[09:24:56] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 30.06, 21.63, 21.10  
[09:28:57] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.93, 18.16, 19.84  
[09:49:10] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[09:49:52] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[09:50:51] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 290790 MB (5% inode=32%):  
[09:51:51] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[09:52:51] <tsnag>	 RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:03:50] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=3.531738/1.95, alarm hl:np_load_avg=4.137207/2.0, alarm hl:mem_free=161.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=3.531738/2.3, alarm hl:np_load_long=3.275879/2.5, alarm hl:cpu=96.500000/98, alarm hl:mem_free=161.000000M/200M, alarm hl:available=1/0  
[10:13:51] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[10:17:02] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 21.48, 28.12, 27.39  
[10:17:30] <tsnag>	 RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0  
[10:17:51] <tsnag>	 SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:01] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:02] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:10] <tsnag>	 Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:11] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:11] <tsnag>	 SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:11] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:12] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:12] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:22] <tsnag>	 / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:22] <tsnag>	 / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:22] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:30] <tsnag>	 SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:30] <tsnag>	 SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:30] <tsnag>	 / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:30] <tsnag>	 SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:30] <tsnag>	 /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:30] <tsnag>	 SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:30] <tsnag>	 /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:31] <tsnag>	 /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:31] <tsnag>	 /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:32] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:32] <tsnag>	 / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:33] <tsnag>	 SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:40] <tsnag>	 /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:40] <tsnag>	 /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:40] <tsnag>	 Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:40] <tsnag>	 /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:41] <tsnag>	 Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:41] <tsnag>	 Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:18:51] <tsnag>	 SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:18:51] <tsnag>	 /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:19:01] <tsnag>	 MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[10:19:12] <tsnag>	 MySQL slave on z-dat-s3-a is OK: Uptime: 1000703  Threads: 19  Questions: 1065339138  Slow queries: 75598  Opens: 10235623  Flush tables: 1  Open tables: 16384  Queries per second avg: 1064.590 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 243  
[10:19:20] <tsnag>	 MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[10:19:21] <tsnag>	 /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2050 MB (99% inode=99%):  
[10:19:21] <tsnag>	 Load avg. on z-dat-s6-a is OK: OK - load average: 0.33, 1.03, 1.60  
[10:19:21] <tsnag>	 /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 161624 MB (16% inode=99%):  
[10:19:21] <tsnag>	 /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2047 MB (99% inode=99%):  
[10:19:21] <tsnag>	 / on z-dat-s6-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[10:19:22] <tsnag>	 /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 82227 MB (20% inode=99%):  
[10:19:22] <tsnag>	 / on z-dat-s7-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[10:19:23] <tsnag>	 Load avg. on z-dat-s4-a is OK: OK - load average: 0.33, 1.04, 1.60  
[10:19:23] <tsnag>	 /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 94113 MB (23% inode=99%):  
[10:19:24] <tsnag>	 /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 161624 MB (16% inode=99%):  
[10:19:24] <tsnag>	 /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2052 MB (99% inode=99%):  
[10:19:25] <tsnag>	 Load avg. on z-dat-s3-a is OK: OK - load average: 0.33, 1.03, 1.60  
[10:19:25] <tsnag>	 /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2058 MB (99% inode=99%):  
[10:19:26] <tsnag>	 SMF on z-dat-s4-a is OK: OK - all services online  
[10:19:26] <tsnag>	 SMF on z-dat-s3-a is OK: OK - all services online  
[10:19:39] <tsnag>	 Load avg. on z-dat-s7-a is OK: OK - load average: 0.67, 1.05, 1.59  
[10:19:39] <tsnag>	 SMTP on z-dat-s4-a is OK: SMTP OK - 0.128 sec. response time  
[10:19:50] <tsnag>	 / on z-dat-s3-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[10:19:51] <tsnag>	 SMTP on z-dat-s6-a is OK: SMTP OK - 0.016 sec. response time  
[10:19:51] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[10:19:51] <tsnag>	 / on z-dat-s4-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[10:20:01] <tsnag>	 SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[10:20:01] <tsnag>	 SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[10:20:02] <tsnag>	 SMTP on z-dat-s7-a is OK: SMTP OK - 0.056 sec. response time  
[10:20:02] <tsnag>	 SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[10:20:02] <tsnag>	 SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[10:20:10] <tsnag>	 SMTP on hyacinth is OK: SMTP OK - 0.004 sec. response time  
[10:31:11] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:31:11] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:31:11] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:31:30] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:31:51] <tsnag>	 SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:36:51] <ToAruShiroiNeko>	 jeremyb do you have a toolserver account?
[10:37:13] <ToAruShiroiNeko>	 I need a quick check of template use in file namespace on smaller wikis
[10:41:30] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:41:31] <tsnag>	 SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:41:51] <tsnag>	 /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:41:51] <tsnag>	 SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:42:00] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:42:00] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:42:11] <tsnag>	 Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:42:22] <tsnag>	 SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:42:22] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[10:42:22] <tsnag>	 / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:42:22] <tsnag>	 / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[10:42:22] <tsnag>	 SMTP on z-dat-s3-a is OK: SMTP OK - 0.002 sec. response time  
[10:42:30] <tsnag>	 SMF on hyacinth is OK: OK - all services online  
[10:42:30] <tsnag>	 SMTP on hyacinth is OK: SMTP OK - 9.321 sec. response time  
[10:42:39] <tsnag>	 Load avg. on z-dat-s7-a is OK: OK - load average: 0.54, 0.93, 1.30  
[10:42:50] <tsnag>	 MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[10:42:50] <tsnag>	 SMTP on z-dat-s6-a is OK: SMTP OK - 0.003 sec. response time  
[10:42:50] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[10:42:50] <tsnag>	 / on z-dat-s4-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[10:42:50] <tsnag>	 / on z-dat-s3-a is OK: DISK OK - free space: / 8337 MB (27% inode=85%):  
[10:42:59] <tsnag>	 MySQL slave on z-dat-s3-a is OK: Uptime: 1002131  Threads: 19  Questions: 1066123597  Slow queries: 75723  Opens: 10240569  Flush tables: 1  Open tables: 16384  Queries per second avg: 1063.856 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 224  
[10:43:00] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.120117/1.10, alarm hl:np_load_long=0.747070/1.55, alarm hl:mem_free=13114.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.120117/1.00, alarm hl:np_load_long=0.747070/1.50, alarm hl:mem_free=13114.000000M/600M, alarm hl:available=1/0  
[10:43:10] <tsnag>	 SMTP on z-dat-s7-a is OK: SMTP OK - 0.003 sec. response time  
[10:48:00] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[10:49:20] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[10:50:00] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[10:51:00] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 290728 MB (5% inode=32%):  
[10:52:00] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[10:59:00] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.196289/1.10, alarm hl:np_load_long=0.948242/1.55, alarm hl:mem_free=14040.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.196289/1.00, alarm hl:np_load_long=0.948242/1.50, alarm hl:mem_free=14040.000000M/600M, alarm hl:available=1/0  
[11:04:00] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.265625/1.95, alarm hl:np_load_avg=2.858398/2.0, alarm hl:mem_free=250.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.265625/2.3, alarm hl:np_load_long=2.823730/2.5, alarm hl:cpu=91.800000/98, alarm hl:mem_free=250.000000M/200M, alarm hl:available=1/0  
[11:08:50] <tsnag>	 SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[11:09:11] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[11:09:11] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[11:09:31] <tsnag>	 NTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[11:09:39] <tsnag>	 SMF on hyacinth is OK: OK - all services online  
[11:10:01] <tsnag>	 SMTP on z-dat-s6-a is OK: SMTP OK - 0.014 sec. response time  
[11:10:21] <tsnag>	 NTP on hyacinth is OK: NTP OK: Offset 0.003084 secs  
[11:13:10] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.337891/1.10, alarm hl:np_load_long=1.318359/1.55, alarm hl:mem_free=14276.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.337891/1.00, alarm hl:np_load_long=1.318359/1.50, alarm hl:mem_free=14276.000000M/600M, alarm hl:available=1/0  
[11:14:01] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[11:17:10] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 17.62, 24.40, 24.37  
[11:24:31] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[11:24:50] <tsnag>	 SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[11:25:11] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[11:25:20] <tsnag>	 SMTP on hyacinth is OK: SMTP OK - 5.570 sec. response time  
[11:25:39] <tsnag>	 SMTP on z-dat-s4-a is OK: SMTP OK - 0.003 sec. response time  
[11:25:49] <tsnag>	 MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[11:26:01] <tsnag>	 MySQL slave on z-dat-s3-a is OK: Uptime: 1004711  Threads: 15  Questions: 1067374856  Slow queries: 75840  Opens: 10255216  Flush tables: 1  Open tables: 16384  Queries per second avg: 1062.370 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 158  
[11:51:10] <tsnag>	 /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 290427 MB (5% inode=32%):  
[11:52:10] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[11:54:10] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 12.08, 15.68, 19.70  
[11:55:11] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 25.66, 20.13, 21.10  
[11:59:10] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.37, 17.67, 19.89  
[12:04:11] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.133301/1.95, alarm hl:np_load_avg=2.954102/2.0, alarm hl:mem_free=162.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.133301/2.3, alarm hl:np_load_long=2.817383/2.5, alarm hl:cpu=87.500000/98, alarm hl:mem_free=162.000000M/200M, alarm hl:available=1/0  
[12:15:00] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[12:18:30] <ToAruShiroiNeko>	 hello Betacommand
[12:19:10] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 27.39, 27.29, 26.37  
[12:23:31] <ToAruShiroiNeko>	 Betacommand, do you have a moment? I am in need of a tool or query that would help establish how well wikis are complying with WMFR:LP
[12:24:13] <ToAruShiroiNeko>	 I want to get numbers and percentages of the files on smaller wikis that are orphaned, unlicensed (without templates), or uncategorised
[12:24:21] <Betacommand>	 what is WMFR:LP?
[12:24:32] <ToAruShiroiNeko>	 http://wikimediafoundation.org/wiki/Resolution:Licensing_policy
[12:24:52] <ToAruShiroiNeko>	 This is to supplement http://meta.wikimedia.org/wiki/Requests_for_comment/Disable_local_uploads_on_smaller_wikis
[12:25:05] <ToAruShiroiNeko>	 it would also be interesting to know which admins delete the most files on those wikis
[12:25:07] <Betacommand>	 ToAruShiroiNeko: drop me an email and I'll see what I can do
[12:25:11] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 332603 MB (6% inode=35%):  
[12:25:12] <ToAruShiroiNeko>	 ok
[12:25:17] <ToAruShiroiNeko>	 what is your email tho? :)
[12:25:26] <Betacommand>	 @toolserver.org
[12:25:34] <ToAruShiroiNeko>	 Betacommand@toolserver.org it is
[12:28:10] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[12:34:52] <ToAruShiroiNeko>	 email sent
[12:37:30] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.144531/1.10, alarm hl:np_load_long=0.908203/1.55, alarm hl:mem_free=13241.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.144531/1.00, alarm hl:np_load_long=0.908203/1.50, alarm hl:mem_free=13241.000000M/600M, alarm hl:available=1/0  
[12:38:30] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[12:49:40] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[12:50:31] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[12:52:20] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[13:01:51] <tsnag>	 Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.365234/1.10, alarm hl:np_load_long=0.327637/1.55, alarm hl:mem_free=367.000000M/500M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.365234/1.00, alarm hl:np_load_long=0.327637/1.50, alarm hl:mem_free=367.000000M/600M, alarm hl:available=1/0  
[13:04:31] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.704590/1.95, alarm hl:np_load_avg=3.300781/2.0, alarm hl:mem_free=239.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.704590/2.3, alarm hl:np_load_long=3.133789/2.5, alarm hl:cpu=95.800000/98, alarm hl:mem_free=239.000000M/200M, alarm hl:available=1/0  
[13:04:50] <tsnag>	 Sun Grid Engine execd on wolfsbane is OK: testqueue@wolfsbane OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK  
[13:06:55] <msgbot>	 (resolved) [TS-1373] Send some SFPs to Haarlem <https://jira.toolserver.org/browse/TS-1373>  (Marlen Caemmerer)
[13:08:56] <msgbot>	 (resolved) [TS-1374] Clean up nagios regarding SAN connections <https://jira.toolserver.org/browse/TS-1374>  (Marlen Caemmerer)
[13:09:02] <msgbot>	 (commented) [TS-1340] Cronie jobs not being run intermittently <https://jira.toolserver.org/browse/TS-1340>  (Marlen Caemmerer)
[13:10:20] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:10:20] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:10:20] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:10:41] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:11:00] <tsnag>	 / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:01] <tsnag>	 /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:01] <tsnag>	 SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:01] <tsnag>	 SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:02] <tsnag>	 SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:02] <tsnag>	 SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:10] <tsnag>	 /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:10] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:10] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:11:20] <tsnag>	 SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:20] <tsnag>	 Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:20] <tsnag>	 SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:11:30] <tsnag>	 Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:11:30] <tsnag>	 SMTP on hyacinth is OK: SMTP OK - 0.002 sec. response time  
[13:11:41] <tsnag>	 /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 93951 MB (23% inode=99%):  
[13:11:41] <tsnag>	 / on z-dat-s7-a is OK: DISK OK - free space: / 8336 MB (27% inode=85%):  
[13:11:41] <tsnag>	 SMF on z-dat-s4-a is OK: OK - all services online  
[13:11:42] <tsnag>	 SMF on z-dat-s3-a is OK: OK - all services online  
[13:11:42] <tsnag>	 /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2040 MB (99% inode=99%):  
[13:11:42] <tsnag>	 SMF on z-dat-s7-a is OK: OK - all services online  
[13:11:42] <tsnag>	 SMF on z-dat-s6-a is OK: OK - all services online  
[13:11:50] <tsnag>	 SMF on hyacinth is OK: OK - all services online  
[13:11:50] <tsnag>	 RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0  
[13:12:00] <tsnag>	 Load avg. on z-dat-s7-a is OK: OK - load average: 0.71, 1.04, 1.61  
[13:12:00] <tsnag>	 Environment IPMI on hyacinth is OK: ok:  temperature ok fan ok voltage ok chassis ok  
[13:12:00] <tsnag>	 SMTP on z-dat-s6-a is OK: SMTP OK - 0.002 sec. response time  
[13:12:09] <tsnag>	 SMTP on z-dat-s7-a is OK: SMTP OK - 0.004 sec. response time  
[13:12:09] <tsnag>	 SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[13:12:10] <tsnag>	 SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[13:12:10] <tsnag>	 SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[13:15:01] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[13:17:30] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.029297/1.00, alarm hl:np_load_long=0.915039/1.50, alarm hl:mem_free=14310.000000M/600M, alarm hl:available=1/0  
[13:18:30] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[13:19:19] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 30.97, 25.03, 24.48  
[13:20:00] <tsnag>	 MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1885  
[13:21:55] <msgbot>	 (commented) [OSM-11] WIWOSM  https <https://jira.toolserver.org/browse/OSM-11>  (Marlen Caemmerer)
[13:22:00] <msgbot>	 (resolved) [TS-1293] Add more fibre cables to the SAN installation for redundancy <https://jira.toolserver.org/browse/TS-1293>  (Marlen Caemmerer)
[13:23:52] <msgbot>	 (commented) [TS-1294] Configure second SAN switch <https://jira.toolserver.org/browse/TS-1294>  (Marlen Caemmerer)
[13:24:12] <DaBPunkt>	 hello all
[13:25:20] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 332529 MB (6% inode=35%):  
[13:26:01] <tsnag>	 MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[13:27:11] <tsnag>	 SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:11] <tsnag>	 RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:27:11] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:11] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:20] <tsnag>	 SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:20] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:21] <tsnag>	 SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:21] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:21] <tsnag>	 SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:27:41] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:28:19] <tsnag>	 s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out)  
[13:28:19] <tsnag>	 MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[13:28:30] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[13:28:31] <tsnag>	 MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2229  
[13:28:31] <tsnag>	 MySQL on z-dat-s3-a is OK: Uptime: 1012069  Threads: 17  Questions: 1072442837  Slow queries: 76975  Opens: 10309890  Flush tables: 1  Open tables: 16384  Queries per second avg: 1059.653  
[13:28:41] <tsnag>	 s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 201.000000  
[13:29:01] <tsnag>	 SMTP on z-dat-s4-a is OK: SMTP OK - 0.021 sec. response time  
[13:29:01] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[13:29:10] <tsnag>	 SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[13:32:56] <msgbot>	 (commented) [TS-1371] Add user "sactage" to MMT group "cvn" and "swmtbot" <https://jira.toolserver.org/browse/TS-1371>  (Marlen Caemmerer)
[13:33:47] <Merlissimo>	 @replag
[13:36:21] <tsnag>	 SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:36:31] <tsnag>	 / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:36:51] <tsnag>	 /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:36:55] <msgbot>	 (commented) [TS-1366] Lost my SSH key pair, please update <https://jira.toolserver.org/browse/TS-1366>  (Marlen Caemmerer)
[13:37:00] <tsbot>	 Merlissimo: s2-user: 14s [-]; s3-rr-a: error; s3-user: error; s4-user: error; s6-rr-a: 3m 33s [-]; s6-user: error; s7-rr-a: 3m 19s [-]; s7-user: error
[13:37:01] <tsnag>	 SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:37:01] <tsnag>	 SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:37:01] <tsnag>	 SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:37:01] <tsnag>	 SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[13:37:01] <tsnag>	 / on z-dat-s4-a is OK: DISK OK - free space: / 8336 MB (27% inode=85%):  
[13:37:19] <tsnag>	 /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2084 MB (99% inode=99%):  
[13:37:30] <tsnag>	 SMF on z-dat-s4-a is OK: OK - all services online  
[13:37:31] <tsnag>	 SMF on z-dat-s3-a is OK: OK - all services online  
[13:37:31] <tsnag>	 SMF on z-dat-s6-a is OK: OK - all services online  
[13:37:31] <tsnag>	 SMF on z-dat-s7-a is OK: OK - all services online  
[13:49:50] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[13:49:59] <msgbot>	 (commented) [TS-1366] Lost my SSH key pair, please update <https://jira.toolserver.org/browse/TS-1366>  (DaB.)
[13:51:20] <Merlissimo>	 @replag
[13:51:20] <tsbot>	 Merlissimo: s2-user: 10s [-0.00 s/s]; s3-rr-a: 46m 38s [-]; s3-user: 46m 38s [-]
[13:51:30] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[13:52:21] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[13:53:22] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 14.87, 18.28, 19.76  
[13:54:09] <tsnag>	 SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:54:10] <tsnag>	 SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:54:50] <tsnag>	 SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[13:55:03] <tsnag>	 s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out)  
[13:55:10] <tsnag>	 SMTP on z-dat-s6-a is OK: SMTP OK - 5.862 sec. response time  
[13:55:10] <tsnag>	 SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[13:55:10] <tsnag>	 s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 166.000000  
[13:55:20] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 30.32, 22.85, 21.26  
[13:55:40] <tsnag>	 SMTP on z-dat-s3-a is OK: SMTP OK - 0.003 sec. response time  
[13:59:20] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.41, 18.61, 19.86  
[14:04:31] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.047851/1.00, alarm hl:np_load_long=0.828125/1.50, alarm hl:mem_free=14561.000000M/600M, alarm hl:available=1/0  
[14:04:31] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.521973/1.95, alarm hl:np_load_avg=2.995117/2.0, alarm hl:mem_free=410.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.521973/2.3, alarm hl:np_load_long=2.807617/2.5, alarm hl:cpu=89.600000/98, alarm hl:mem_free=410.000000M/200M, alarm hl:available=1/0  
[14:05:31] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[14:12:10] <tsnag>	 MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2094  
[14:13:50] <tsnag>	 SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[14:13:51] <tsnag>	 /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:01] <tsnag>	 /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:01] <tsnag>	 /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:01] <tsnag>	 / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:01] <tsnag>	 / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:01] <tsnag>	 SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:02] <tsnag>	 SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:02] <tsnag>	 SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:02] <tsnag>	 SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:10] <tsnag>	 MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)  
[14:14:10] <tsnag>	 /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:10] <tsnag>	 Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:11] <tsnag>	 /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:11] <tsnag>	 /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:11] <tsnag>	 Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:11] <tsnag>	 Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:11] <tsnag>	 /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:30] <tsnag>	 Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:31] <tsnag>	 / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:31] <tsnag>	 / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:14:51] <tsnag>	 /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[14:15:00] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[14:15:20] <tsnag>	 Load avg. on z-dat-s7-a is OK: OK - load average: 0.33, 1.32, 1.72  
[14:15:21] <tsnag>	 / on z-dat-s4-a is OK: DISK OK - free space: / 8336 MB (27% inode=85%):  
[14:15:21] <tsnag>	 / on z-dat-s3-a is OK: DISK OK - free space: / 8336 MB (27% inode=85%):  
[14:15:21] <tsnag>	 /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 161816 MB (16% inode=99%):  
[14:15:21] <tsnag>	 /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 1924 MB (99% inode=99%):  
[14:15:30] <tsnag>	 MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2268  
[14:15:31] <tsnag>	 /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 93246 MB (23% inode=99%):  
[14:15:31] <tsnag>	 /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 82047 MB (20% inode=99%):  
[14:15:31] <tsnag>	 / on z-dat-s7-a is OK: DISK OK - free space: / 8336 MB (27% inode=85%):  
[14:15:31] <tsnag>	 / on z-dat-s6-a is OK: DISK OK - free space: / 8336 MB (27% inode=85%):  
[14:15:31] <tsnag>	 SMF on z-dat-s3-a is OK: OK - all services online  
[14:15:31] <tsnag>	 SMF on z-dat-s4-a is OK: OK - all services online  
[14:15:31] <tsnag>	 SMF on z-dat-s6-a is OK: OK - all services online  
[14:15:32] <tsnag>	 SMF on z-dat-s7-a is OK: OK - all services online  
[14:15:41] <tsnag>	 /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 1964 MB (99% inode=99%):  
[14:15:41] <tsnag>	 SMTP on z-dat-s3-a is OK: SMTP OK - 0.013 sec. response time  
[14:15:41] <tsnag>	 /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 161814 MB (16% inode=99%):  
[14:15:41] <tsnag>	 /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 1961 MB (99% inode=99%):  
[14:15:41] <tsnag>	 Load avg. on z-dat-s4-a is OK: OK - load average: 1.19, 1.38, 1.71  
[14:15:41] <tsnag>	 Load avg. on z-dat-s6-a is OK: OK - load average: 1.19, 1.38, 1.71  
[14:15:41] <tsnag>	 /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 1961 MB (99% inode=99%):  
[14:15:42] <tsnag>	 Load avg. on z-dat-s3-a is OK: OK - load average: 1.22, 1.39, 1.72  
[14:19:20] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 26.96, 22.48, 21.84  
[14:19:31] <tsnag>	 MySQL slave on z-dat-s3-a is OK: Uptime: 1015121  Threads: 15  Questions: 1074101667  Slow queries: 77293  Opens: 10316349  Flush tables: 1  Open tables: 16384  Queries per second avg: 1058.102 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1617  
[14:25:20] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 332449 MB (6% inode=35%):  
[14:28:31] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[14:47:21] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 18.78, 19.00, 19.94  
[14:48:20] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 25.52, 20.18, 20.26  
[14:49:50] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[14:52:20] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[14:52:30] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[14:53:20] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 16.77, 18.84, 19.85  
[15:05:41] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.190430/1.95, alarm hl:np_load_avg=2.487793/2.0, alarm hl:mem_free=137.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.190430/2.3, alarm hl:np_load_long=2.551270/2.5, alarm hl:cpu=91.500000/98, alarm hl:mem_free=137.000000M/200M, alarm hl:available=1/0  
[15:07:01] <tsnag>	 Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[15:07:42] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=6.848633/1.95, alarm hl:np_load_avg=3.494629/2.0, alarm hl:mem_free=301.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=6.848633/2.3, alarm hl:np_load_long=2.885742/2.5, alarm hl:cpu=88.100000/98, alarm hl:mem_free=301.000000M/200M, alarm hl:available=1/0  
[15:10:30] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.068359/1.00, alarm hl:np_load_long=0.951172/1.50, alarm hl:mem_free=14034.000000M/600M, alarm hl:available=1/0  
[15:11:30] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[15:16:00] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[15:19:31] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 19.77, 21.02, 21.87  
[15:25:20] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 332353 MB (6% inode=35%):  
[15:27:30] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.39, 17.91, 19.90  
[15:29:30] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[15:30:28] <Toto_Azero>	 Merlissimo: around ?
[15:30:30] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 33.42, 20.21, 20.20  
[15:30:48] <Merlissimo>	 Toto_Azero: yes
[15:30:54] <msgbot>	 (commented) [TS-1366] Lost my SSH key pair, please update <https://jira.toolserver.org/browse/TS-1366>  (Junaid PV)
[15:31:17] <Toto_Azero>	 Merlissimo: did you receive the mail I sent you ?
[15:34:11] <Merlissimo>	 Toto_Azero: yes, but i did not notice it until now
[15:34:26] <Toto_Azero>	 :p
[15:34:30] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 15.08, 18.81, 19.83  
[15:34:30] <Merlissimo>	 my bot won't be part of that mmp
[15:35:24] <Toto_Azero>	 I'm afraid I don't really understand… so you don't want to be a member of the MMP ?
[15:35:29] <Merlissimo>	 DaBPunkt allowed me to stay in my own mmp for this bot, because it needs so many database connections and uses tomcat
[15:35:44] <Toto_Azero>	 ok
[15:35:49] <Merlissimo>	 i am not running any pwb interwiki bot
[15:36:35] <Merlissimo>	 and the main focus of my bot is to find langlinks to deleted pages and automatically solve interwiki conflicts, so it's different from pwb
[15:36:52] <Toto_Azero>	 ok
[15:37:03] <Merlissimo>	 perhaps it would be good to have a common mailinglist
[15:37:16] <Toto_Azero>	 yeah… think so too…
[15:37:38] <Toto_Azero>	 will your bot run after the deadline (May 15th) ?
[15:38:03] <Toto_Azero>	 (i assume so, but i'd prefer to ask)
[15:39:54] <jeremyb>	 ToAruShiroiNeko: it doesn't
[15:41:09] <jeremyb>	 no nosy ;(
[15:49:50] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[15:52:40] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[15:52:41] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[16:05:00] <Toto_Azero|away>	 Merlissimo: ?
[16:05:51] <Merlissimo>	 Toto_Azero: yes, it will also run after the deadline. DaBPunkt said it's ok because of the special task and design
[16:06:08] <Toto_Azero>	 ok ;)
[16:07:03] <Merlissimo>	 the bot was specially written to run on the toolserver.
[16:07:42] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=4.413086/1.95, alarm hl:np_load_avg=3.274902/2.0, alarm hl:mem_free=295.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=4.413086/2.3, alarm hl:np_load_long=2.782227/2.5, alarm hl:cpu=88.500000/98, alarm hl:mem_free=295.000000M/200M, alarm hl:available=1/0  
[16:10:32] <Merlissimo>	 Toto_Azero: my bot takes about 40 days to scan all wikipedia-related wmf projects. I think that is fast enough for this kind of task
[16:10:41] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 24.49, 22.88, 21.52  
[16:11:21] <Toto_Azero>	 40 days for all wp projects ! :o I think that's pretty great indeed…
[16:12:33] <Toto_Azero>	 Merlissimo: if ever you see Luckas Blade, can you ask him whether he received my mail ? :p maybe he hasn't seen it yet…
[16:14:53] <msgbot>	 (resolved) [TS-1366] Lost my SSH key pair, please update <https://jira.toolserver.org/browse/TS-1366>  (DaB.)
[16:16:20] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[16:19:55] <msgbot>	 (commented) [TS-1366] Lost my SSH key pair, please update <https://jira.toolserver.org/browse/TS-1366>  (Junaid PV)
[16:25:20] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 326193 MB (6% inode=35%):  
[16:28:58] <msgbot>	 (commented) [TS-1340] Cronie jobs not being run intermittently <https://jira.toolserver.org/browse/TS-1340>  (Russell Blau)
[16:29:42] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[16:34:56] <msgbot>	 (created) [ACCAPP-507] New account to run itwiki/global cvn bots; Account Approval; New Account <https://jira.toolserver.org/browse/ACCAPP-507>  (Melos)
[16:36:52] <msgbot>	 (reopened) [DBQ-127] List of Categories assigned to Commons images of the Wikisource PSM project and uploaded by me. <https://jira.toolserver.org/browse/DBQ-127>  (Ineuw)
[16:47:54] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.79, 17.28, 19.81  
[16:48:55] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 30.59, 21.62, 21.19  
[16:50:05] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[16:51:56] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 14.27, 18.07, 19.84  
[16:52:56] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[16:52:56] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[16:53:55] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[16:53:56] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.216797/1.10, alarm hl:np_load_long=0.759765/1.55, alarm hl:mem_free=14089.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.216797/1.00, alarm hl:np_load_long=0.759765/1.50, alarm hl:mem_free=14089.000000M/600M, alarm hl:available=1/0  
[16:54:55] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[16:56:55] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.969726/1.95, alarm hl:np_load_avg=2.209961/2.0, alarm hl:mem_free=536.000000M/350M, alarm hl:available=1/0  
[16:59:56] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[17:16:23] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[17:21:56] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.062988/1.95, alarm hl:np_load_avg=2.574219/2.0, alarm hl:mem_free=324.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.062988/2.3, alarm hl:np_load_long=2.605957/2.5, alarm hl:cpu=79.900000/98, alarm hl:mem_free=324.000000M/200M, alarm hl:available=1/0  
[17:23:55] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.575195/1.10, alarm hl:np_load_long=0.802735/1.55, alarm hl:mem_free=13894.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.575195/1.00, alarm hl:np_load_long=0.802735/1.50, alarm hl:mem_free=13894.000000M/600M, alarm hl:available=1/0  
[17:25:24] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 329624 MB (6% inode=35%):  
[17:26:55] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[17:29:56] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[17:30:24] <tsnag>	 SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon.  Check the remote server logs for error messages.  
[17:31:24] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[17:32:56] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.077149/1.00, alarm hl:np_load_long=0.881836/1.50, alarm hl:mem_free=13321.000000M/600M, alarm hl:available=1/0  
[17:45:49] <jeremyb>	 has TS-1370 been fixed? can it be closed?
[17:50:15] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[17:53:56] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[17:53:56] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[18:07:56] <tsnag>	 RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[18:12:25] <tsnag>	 RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0  
[18:21:56] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.330566/1.95, alarm hl:np_load_avg=3.442383/2.0, alarm hl:mem_free=275.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.330566/2.3, alarm hl:np_load_long=3.142090/2.5, alarm hl:cpu=86.500000/98, alarm hl:mem_free=275.000000M/200M, alarm hl:available=1/0  
[18:23:38] <DaBPunkt>	 re
[18:25:25] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 329542 MB (6% inode=35%):  
[18:28:56] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 12.25, 18.81, 22.09  
[18:29:56] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[18:31:25] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[18:41:57] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 12.20, 16.60, 19.75  
[18:42:03] <tsnag>	 SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[18:42:03] <tsnag>	 SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[18:42:26] <tsnag>	 SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[18:42:27] <tsnag>	 /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[18:42:27] <tsnag>	 Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[18:42:27] <tsnag>	 /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[18:42:27] <tsnag>	 Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[18:42:34] <tsnag>	 SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[18:42:56] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 26.44, 20.21, 20.84  
[18:42:56] <tsnag>	 SMTP on hyacinth is OK: SMTP OK - 0.006 sec. response time  
[18:42:57] <tsnag>	 SMTP on z-dat-s3-a is OK: SMTP OK - 0.004 sec. response time  
[18:42:57] <tsnag>	 /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 1994 MB (99% inode=99%):  
[18:42:57] <tsnag>	 Load avg. on z-dat-s6-a is OK: OK - load average: 0.81, 1.26, 1.75  
[18:42:57] <tsnag>	 /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 160128 MB (16% inode=99%):  
[18:42:57] <tsnag>	 Load avg. on z-dat-s4-a is OK: OK - load average: 0.81, 1.26, 1.75  
[18:43:14] <tsnag>	 SMTP on z-dat-s4-a is OK: SMTP OK - 0.005 sec. response time  
[18:43:24] <tsnag>	 SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)  
[18:44:55] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 12.82, 17.26, 19.66  
[18:50:24] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[18:53:57] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[18:53:57] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[18:55:55] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.054688/1.00, alarm hl:np_load_long=0.743164/1.50, alarm hl:mem_free=14241.000000M/600M, alarm hl:available=1/0  
[18:56:59] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[18:59:56] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[19:00:56] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 32.83, 20.51, 18.84  
[19:02:56] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.639648/1.95, alarm hl:np_load_avg=2.495117/2.0, alarm hl:mem_free=232.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.639648/2.3, alarm hl:np_load_long=2.352051/2.5, alarm hl:cpu=99.200000/98, alarm hl:mem_free=232.000000M/200M, alarm hl:available=1/0  
[19:25:34] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 329435 MB (6% inode=35%):  
[19:29:55] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[19:31:34] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[19:39:56] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[19:42:56] <tsnag>	 FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk  
[19:43:54] <tsnag>	 NTP on yarrow is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown  
[19:44:56] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.834961/1.95, alarm hl:np_load_avg=2.083984/2.0, alarm hl:mem_free=583.000000M/350M, alarm hl:available=1/0  
[19:47:56] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[19:50:24] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[19:50:54] <tsnag>	 NTP on yarrow is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.006817 secs  
[19:50:54] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.83, 16.60, 17.30  
[19:54:56] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[19:54:56] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[20:00:56] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 31.41, 18.63, 17.01  
[20:01:55] <tsnag>	 NTP on yarrow is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.019921 secs  
[20:01:56] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 17.97, 17.16, 16.59  
[20:02:55] <tsnag>	 RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[20:17:24] <tsnag>	 RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0  
[20:42:56] <tsnag>	 RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.  
[20:43:05] <tsnag>	 FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk  
[20:48:55] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 39.23, 24.29, 19.64  
[20:50:34] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[20:56:12] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[20:56:12] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[21:00:51] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.601074/1.95, alarm hl:np_load_avg=2.047852/2.0, alarm hl:mem_free=786.000000M/350M, alarm hl:available=1/0  
[21:04:51] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[21:13:51] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.817871/1.95, alarm hl:np_load_avg=2.143066/2.0, alarm hl:mem_free=247.000000M/350M, alarm hl:available=1/0  
[21:44:03] <tsnag>	 FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk  
[21:50:44] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[21:55:15] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[21:55:15] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[22:02:35] <DaBPunkt>	 night ts
[22:18:24] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[22:18:36] <Jasper|TV_>	 can somebody tell me where I can find a tool that shows articles I started at wikipedia-nl?
[22:19:09] <Jasper|TV_>	 Akoopal: you maybe?
[22:21:24] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.678223/1.95, alarm hl:np_load_avg=2.103516/2.0, alarm hl:mem_free=320.000000M/350M, alarm hl:available=1/0  
[22:23:14] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.077149/1.00, alarm hl:np_load_long=0.708985/1.50, alarm hl:mem_free=13946.000000M/600M, alarm hl:available=1/0  
[22:23:25] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[22:24:15] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[22:25:56] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 328881 MB (6% inode=35%):  
[22:31:15] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[22:31:55] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[22:44:05] <tsnag>	 FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk  
[22:45:34] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.130859/1.95, alarm hl:np_load_avg=2.184082/2.0, alarm hl:mem_free=233.000000M/350M, alarm hl:available=1/0  
[22:50:43] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[22:50:54] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 13.62, 16.89, 17.52  
[22:52:34] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[22:55:24] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[22:55:35] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[22:56:35] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.751465/1.95, alarm hl:np_load_avg=1.932617/2.0, alarm hl:mem_free=299.000000M/350M, alarm hl:available=1/0  
[22:57:35] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.423828/1.10, alarm hl:np_load_long=0.885742/1.55, alarm hl:mem_free=14361.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.423828/1.00, alarm hl:np_load_long=0.885742/1.50, alarm hl:mem_free=14361.000000M/600M, alarm hl:available=1/0  
[22:58:36] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[23:00:55] <tsnag>	 Load avg. on willow is CRITICAL: CRITICAL - load average: 32.30, 19.89, 17.74  
[23:01:55] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 18.22, 18.19, 17.27  
[23:23:34] <tsnag>	 Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.223633/1.10, alarm hl:np_load_long=0.769531/1.55, alarm hl:mem_free=14574.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.223633/1.00, alarm hl:np_load_long=0.769531/1.50, alarm hl:mem_free=14574.000000M/600M, alarm hl:available=1/0  
[23:24:34] <tsnag>	 Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK  
[23:24:39] <wikirc>	 [[Special:Log/newusers]] create * The Queen (QueenLady) *  (New user account)
[23:25:55] <tsnag>	 /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 328801 MB (6% inode=35%):  
[23:28:34] <tsnag>	 Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK  
[23:28:46] <wikirc>	 [[User talk:The Queen (QueenLady)]] !N https://wiki.toolserver.org/w/index.php?oldid=7199&rcid=9595 * The Queen (QueenLady) * (+4464) (The Queen wants to go home to be Queen, but the people in the United States of America won't help her. If some foreign will come and pick up BabyGirl, they will be greatly rewarded.)
[23:31:34] <tsnag>	 SMF on hemlock is CRITICAL: ERROR - maintenance:  svc:/application/database/postgresql_83:default_32bit  
[23:32:55] <tsnag>	 SMF on willow is CRITICAL: ERROR - maintenance:  svc:/network/puppetmasterd:default  
[23:33:36] <tsnag>	 Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.584473/1.95, alarm hl:np_load_avg=2.061523/2.0, alarm hl:mem_free=315.000000M/350M, alarm hl:available=1/0  
[23:39:46] <brion>	 Is toolserver ok?
[23:40:04] <brion>	 I was doing some testing with the Wiki Loves Monuments API, and suddenly I can't fetch any pages from toolserver.org
[23:40:11] <brion>	 requests seem to hang
[23:40:28] <brion>	 not answering on port 80
[23:40:58] <Krinkle>	 yeah
[23:41:00] <Krinkle>	 same here
[23:41:03] <Krinkle>	 time outs
[23:41:18] <Krinkle>	 both https and wolfsbane/ortelius servers are having issues
[23:41:21] <brion>	 i hope i didn't break it :)
[23:41:42] <Krinkle>	 brion: Of course you did, who else :P
[23:41:48] <brion>	 hehehe
[23:42:13] <Merlissimo>	 brion: load balancer seems to have a problem, webservers are ok
[23:42:23] <Krinkle>	 Error 118 (net::ERR_CONNECTION_TIMED_OUT): The operation timed out.
[23:42:37] * brion  blames load balancers, they're always an easy target :)
[23:42:38] <Seahorse>	 for whatever reason I can't get into willow. It is refusing my key
[23:42:41] <Merlissimo>	 wolfsbane/ortelius works for me
[23:42:58] <Krinkle>	 wolfsbane: dns not resolved, ortelius: 404 error
[23:43:25] <Krinkle>	 https://toolserver.org/~krinkle/ http://toolserver.org/~krinkle/ http://ortelius.toolserver.org/~krinkle/ http://wolfsbane.toolserver.org/~krinkle/
[23:43:48] <Merlissimo>	 load balancer and home (error reported by Seahorse) are on turnera
[23:44:18] <Krinkle>	 good time for a break :)
[23:44:19] <Merlissimo>	 and ssl handshake is also done by turnera
[23:44:24] <Magog_the_Ogre>	 ok good
[23:44:30] <Magog_the_Ogre>	 I'm not the only one locked out then >_>
[23:44:38] <brion>	 :)
[23:44:49] <Krinkle-away>	 Magog_the_Ogre: yeah, home dirs, ssl and web server load balancer are having issues
[23:45:26] <Seahorse>	 why has the toolserver proven to be extremely unreliable for the last few months? It used to have solid stability, but then after nightshade went down (which still hasn't been fixed), everything started to suck
[23:45:40] <Magog_the_Ogre>	 is my cron job going to run at 0002 UTC or will I need to run it on my home machine? I wonder.
[23:45:45] <Magog_the_Ogre>	 haha
[23:45:50] <Magog_the_Ogre>	 >the toolserver
[23:45:51] <brion>	 hmm, now i get a 404 from http://toolserver.org/~erfgoed/api/api.php
[23:45:53] <Magog_the_Ogre>	 >once reliable
[23:45:54] <Magog_the_Ogre>	 I laughed
[23:45:58] <brion>	 aha there it is it's back \o/
[23:46:06] <Seahorse>	 it was
[23:46:15] <Krinkle-away>	 brion: yeah, probably because home pool is down, so it can't mount it / find those files
[23:46:21] <Krinkle-away>	 since they are all in home/**/public_html
[23:46:41] <Krinkle-away>	 back up ?
[23:47:20] <brion>	 it's up and running for me
[23:47:48] <Merlissimo>	 Magog_the_Ogre: if you are using submit.toolserver.org cronie that is independent (only if you are executing files in your home)
[23:48:20] <Magog_the_Ogre>	 I don't know what submit.toolserver is
[23:48:23] <Magog_the_Ogre>	 I use cronie on willow
[23:48:34] <Magog_the_Ogre>	 I once used it on nightshade but obviously that stopped working
[23:50:05] <Merlissimo>	 Magog_the_Ogre: cronie on nightshade/willow is local only. cronie on submit.toolserver.org is redundant, but you aren't allowed to run jobs there. It's only for submitting to sge
[23:50:33] <Magog_the_Ogre>	 well then how am I supposed to use it?
[23:50:35] <tsnag>	 Load avg. on willow is WARNING: WARNING - load average: 16.81, 13.43, 12.73  
[23:51:36] <tsnag>	 SMF on damiana is CRITICAL: ERROR - maintenance:  svc:/network/ldap/client:default offline:  svc:/system/cluster/scsymon-srv:default  
[23:53:24] <Merlissimo>	 Magog_the_Ogre: are you using sge for running jobs?
[23:53:31] <Magog_the_Ogre>	 no
[23:53:40] <Magog_the_Ogre>	 I just SSH and use cronie
[23:54:07] <Magog_the_Ogre>	 every time I come on toolserver chat I learn about another program that's eluded me
[23:54:49] <Merlissimo>	 perhaps you should read https://wiki.toolserver.org/view/Job_scheduling (as included on every login message)
[23:55:08] <Magog_the_Ogre>	 I have read it
[23:55:35] <tsnag>	 SMF on turnera is CRITICAL: ERROR -  offline:  svc:/system/cluster/scsymon-srv:default  
[23:55:44] <Magog_the_Ogre>	 maybe it was changed?
[23:55:45] <tsnag>	 DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs  
[23:56:00] <Magog_the_Ogre>	 I know I read about using cronie and cronsub on there, which is what I do now