[00:04:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [00:05:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 242 bytes in 0.518 second response time [00:06:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [00:13:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [00:38:00] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1063350.000000 [00:38:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:38:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 96605 MB (1% inode=51%): [00:38:50] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [00:39:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 82081 MB (13% inode=99%): [00:41:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [00:43:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [00:45:50] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 31.57, 30.84, 30.91 [00:49:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [01:04:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [01:05:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 243 bytes in 0.515 second response time [01:06:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [01:13:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [01:29:41] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 73281 MB (7% inode=99%): [01:38:00] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1061161.000000 [01:38:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:38:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 99199 MB (1% inode=51%): [01:38:51] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [01:39:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81106 MB (13% inode=99%): [01:41:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:43:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [01:45:50] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 35.16, 33.33, 33.59 [01:49:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [02:04:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [02:05:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.513 second response time [02:06:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [02:13:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [02:38:00] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1058401.000000 [02:38:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:38:51] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 98836 MB (1% inode=51%): [02:38:51] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [02:39:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81864 MB (13% inode=99%): [02:41:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:43:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [02:45:50] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 34.70, 34.26, 34.50 [02:49:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [03:04:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [03:05:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.513 second response time [03:06:51] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [03:13:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [03:38:01] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1056228.000000 [03:38:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:38:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 97460 MB (1% inode=50%): [03:39:01] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [03:39:18] is the inability to access jira on the toolserver a known problem? [03:39:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81806 MB (13% inode=99%): [03:41:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:41:54] any humans here? [03:43:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [03:45:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 36.57, 36.92, 36.40 [03:49:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [03:51:50] Hi. [03:51:52] JIRA is broken, yes. [03:56:43] thanks susan [03:56:50] No problem. [03:56:52] been broken a few days now>? any ETA? [03:57:33] This weekend. [03:57:43] great, that's what i needed, cheers [04:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [04:05:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.515 second response time [04:14:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [04:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [04:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1054668.000000 [04:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [04:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81701 MB (13% inode=99%): [04:39:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 97632 MB (1% inode=50%): [04:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [04:45:55] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 38.50, 36.93, 37.31 [04:50:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [05:06:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.512 second response time [05:14:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [05:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [05:36:55] helo all [05:37:27] aude, do you know what the status of the osm replication is? [05:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1052933.000000 [05:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[05:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [05:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81637 MB (13% inode=99%): [05:39:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 97381 MB (1% inode=50%): [05:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [05:44:04] @replag [05:44:04] dschwen: s1-rr-a-wd: 1m 22s [+0.00 s/s]; s1-user: 13s [-0.01 s/s]; s1-user-wd: 1m 22s [+0.00 s/s]; s2-rr: 33m 31s [+0.02 s/s]; s2-user: 33m 31s [+0.02 s/s]; s2-user-c: error; s2-user-wd: 4h 51m 34s [+0.68 s/s]; s3-user: 2m 39s [+0.01 s/s] [05:44:05] dschwen: s4-user-wd: 1m 22s [+0.00 s/s]; s5-user-c: 1w 5d 4h 22m 15s [-0.72 s/s]; s5-user-wd: 1m 24s [+0.00 s/s]; s6-user-wd: 1m 22s [+0.00 s/s]; s7-user-wd: 1m 23s [+0.00 s/s] [05:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [05:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 32.27, 32.26, 33.29 [05:50:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [06:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [06:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.511 second response time [06:14:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [06:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [06:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1049232.000000 [06:38:58] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [06:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81542 MB (13% inode=99%): [06:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 97155 MB (1% inode=50%): [06:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [06:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [06:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 34.96, 33.62, 33.29 [06:50:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [07:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [07:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.517 second response time [07:14:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [07:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [07:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1033060.000000 [07:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [07:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81410 MB (13% inode=99%): [07:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 96792 MB (1% inode=50%): [07:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [07:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [07:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 33.43, 34.03, 34.26 [07:50:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [08:06:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.513 second response time [08:14:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [08:36:59] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [08:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1016680.000000 [08:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [08:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81206 MB (13% inode=99%): [08:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 96348 MB (1% inode=50%): [08:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [08:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 35.49, 34.81, 33.91 [08:50:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:50:49] Load avg. on ortelius is WARNING: WARNING - load average: 23.26, 19.09, 11.05 [08:54:49] Load avg. on wolfsbane is WARNING: WARNING - load average: 15.92, 14.98, 8.86 [09:01:49] Load avg. on wolfsbane is OK: OK - load average: 13.41, 14.77, 11.12 [09:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [09:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.522 second response time [09:14:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [09:29:48] Load avg. on ortelius is CRITICAL: CRITICAL - load average: 24.53, 23.07, 20.10 [09:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1004275.000000 [09:38:58] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [09:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80955 MB (13% inode=99%): [09:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 95646 MB (1% inode=50%): [09:40:20] Hi! I am confused: Where some user databases on s2 deleted? 
[09:40:45] Were deleted [09:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [09:41:27] I am looking for the db u_rdf_bots_2013 and u_rdf_bots [09:42:44] um [09:42:45] kinda [09:42:50] did you read the mailing list post? [09:43:20] http://lists.wikimedia.org/pipermail/toolserver-l/2013-February/005684.html and http://lists.wikimedia.org/pipermail/toolserver-l/2013-February/005686.html [09:43:24] rdf-toolserver: ^ [09:44:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [09:45:39] no, thanks for the info... (I have to many mailing list and have overseen this message). If if understand the post correctly the user databases are currently copied and will be available again after copying?! [09:45:46] de_wiki_p is also missing on s2 [09:46:49] Load avg. on ortelius is WARNING: WARNING - load average: 18.09, 18.68, 19.97 [09:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.32, 32.01, 32.87 [09:46:51] um [09:46:55] no i think they were just moved [09:46:58] not sure [09:47:04] ah, everything is on s5 now [09:47:17] there are probably a few more mails that explain it [09:48:05] Oh, OK. I must read the toolserver mails more thoroughly [09:48:12] nice weekend, bye [09:49:34] bye! [09:50:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [09:53:10] /sql on cassia is WARNING: DISK WARNING - free space: /sql 112208 MB (9% inode=99%): [09:55:11] @replag [09:55:21] multichill: s1-rr-a-wd: 1m 43s [+0.00 s/s]; s1-user-wd: 1m 43s [+0.00 s/s]; s2-rr: error; s2-user: error; s2-user-c: error; s2-user-wd: error; s3-user: 2m 19s [-0.00 s/s]; s4-user-wd: 1m 50s [+0.00 s/s] [09:55:22] multichill: s5-user-c: 1w 4d 14h 48m 56s [-3.24 s/s]; s5-user-wd: 1m 49s [+0.00 s/s]; s6-user-wd: 1m 50s [+0.00 s/s]; s7-user: 1m 44s [+0.00 s/s]; s7-user-wd: 1m 49s [+0.00 s/s] [10:02:48] Hi multichill. [10:02:49] Load avg. 
on ortelius is OK: OK - load average: 10.80, 11.71, 14.87 [10:05:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [10:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.521 second response time [10:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [10:14:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [10:17:02] hello Susan [10:24:27] oursql.ProgrammingError: (1146, "Table 'simplewiki_p.recentchanges' doesn't exist", None) [10:24:28] wat. [10:27:21] simple? that should have been vaporized ages ago... [10:27:31] simple.wikipedia.org... [10:30:49] Load avg. on ortelius is WARNING: WARNING - load average: 23.54, 20.93, 16.80 [10:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1002078.000000 [10:38:58] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [10:39:51] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80715 MB (13% inode=99%): [10:39:51] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 94612 MB (1% inode=50%): [10:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:41:49] Load avg. 
on ortelius is OK: OK - load average: 11.95, 13.57, 14.88 [10:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [10:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 32.32, 32.67, 33.05 [10:51:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [11:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [11:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.514 second response time [11:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [11:14:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [11:27:49] Load avg. on ortelius is WARNING: WARNING - load average: 21.61, 19.32, 15.71 [11:36:48] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2066 [11:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 999849.000000 [11:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [11:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80458 MB (13% inode=99%): [11:39:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 93875 MB (1% inode=50%): [11:41:18] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [11:41:19] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2064 [11:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:46:49] Load avg. on ortelius is OK: OK - load average: 11.96, 13.54, 14.86 [11:46:50] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 33.98, 33.57, 33.43 [11:51:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [11:56:49] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2099 [12:05:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [12:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.520 second response time [12:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [12:14:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [12:19:18] MySQL slave on z-dat-s7-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3615 [12:21:49] MySQL slave on z-dat-s3-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3626 [12:26:49] MySQL slave on z-dat-s4-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2116 [12:28:55] MySQL slave on z-dat-s6-a is OK: Uptime: 86765 Threads: 24 Questions: 23110171 Slow queries: 6018 Opens: 129262 Flush tables: 1 Open tables: 2813 Queries per second avg: 266.353 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1727 [12:36:50] Load avg. on ortelius is WARNING: WARNING - load average: 17.47, 16.02, 13.59 [12:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 995183.000000 [12:38:58] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [12:39:51] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80154 MB (13% inode=99%): [12:39:51] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 93266 MB (1% inode=50%): [12:40:49] MySQL slave on z-dat-s4-a is OK: Uptime: 2208505 Threads: 1 Questions: 268219494 Slow queries: 79 Opens: 391 Flush tables: 7 Open tables: 119 Queries per second avg: 121.448 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1632 [12:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [12:42:19] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3486 [12:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [12:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 35.85, 35.24, 34.93 [12:48:49] Load avg. 
on ortelius is OK: OK - load average: 10.53, 14.22, 14.46 [12:50:49] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3551 [12:51:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [12:51:19] MySQL slave on z-dat-s7-a is OK: Uptime: 88119 Threads: 4 Questions: 44573249 Slow queries: 4180 Opens: 318628 Flush tables: 1 Open tables: 4101 Queries per second avg: 505.830 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1699 [13:01:49] MySQL slave on z-dat-s3-a is OK: Uptime: 88745 Threads: 16 Questions: 98229590 Slow queries: 7143 Opens: 1232775 Flush tables: 1 Open tables: 16384 Queries per second avg: 1106.874 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1730 [13:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [13:06:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.513 second response time [13:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [13:14:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [13:29:49] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 73468 MB (7% inode=99%): [13:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 981394.000000 [13:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[13:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [13:39:56] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79882 MB (13% inode=99%): [13:39:56] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 92754 MB (1% inode=50%): [13:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [13:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 38.61, 35.81, 35.07 [13:51:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [13:56:49] Load avg. on ortelius is WARNING: WARNING - load average: 18.48, 16.80, 13.82 [14:02:49] Load avg. on ortelius is OK: OK - load average: 10.15, 14.42, 13.90 [14:04:05] hello all [14:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [14:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.683 second response time [14:06:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [14:14:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [14:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 967415.000000 [14:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[14:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [14:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79593 MB (13% inode=99%): [14:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 91959 MB (1% inode=50%): [14:39:57] [[Category:Tools]] ! https://wiki.toolserver.org/w/index.php?diff=7805&oldid=7647&rcid=21537 * DGideas * (+54) (add zh teanslate) [14:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:44:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [14:45:41] [[Category:Tools]] ! https://wiki.toolserver.org/w/index.php?diff=7806&oldid=7805&rcid=21538 * Liangent * (-23) () [14:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 36.07, 36.14, 36.38 [14:50:59] [[User:DGideas]] ! 
https://wiki.toolserver.org/w/index.php?diff=7807&oldid=7222&rcid=21539 * DGideas * (-30) (edit redirect) [14:51:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [15:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [15:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.519 second response time [15:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [15:15:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [15:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 950532.000000 [15:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [15:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79305 MB (13% inode=99%): [15:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 91398 MB (1% inode=50%): [15:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [15:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [15:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 35.15, 36.32, 36.56 [15:51:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [16:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [16:06:10] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 244 bytes in 0.757 second response time [16:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [16:15:10] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [16:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 937654.000000 [16:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [16:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79023 MB (12% inode=99%): [16:39:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 90500 MB (1% inode=50%): [16:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [16:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [16:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 39.31, 39.83, 39.41 [16:51:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [17:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [17:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.713 second response time [17:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [17:15:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [17:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 928421.000000 [17:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [17:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78757 MB (12% inode=99%): [17:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 89573 MB (1% inode=50%): [17:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [17:44:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [17:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 40.33, 40.61, 40.63 [17:51:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [18:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [18:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.513 second response time [18:06:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [18:15:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [18:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 906035.000000 [18:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [18:39:44] Hi all! [18:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78466 MB (12% inode=99%): [18:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 88970 MB (1% inode=50%): [18:40:10] Does anyone know if/when we will get GeoData Extension database tables on the toolserver? [18:40:34] (I didn't see them in the current DB schema on the ts. Or did I just miss them?) [18:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 37.49, 37.82, 38.31 [18:51:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [19:05:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [19:06:09] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.517 second response time [19:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [19:15:09] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [19:30:51] oh damn, today's the deadline for SGE, isn't it. better get to it [19:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 882846.000000 [19:38:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [19:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78118 MB (12% inode=99%): [19:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 88333 MB (1% inode=50%): [19:40:48] there'll be some crying tomorrow [19:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [19:44:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [19:46:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 36.86, 37.09, 37.48 [19:51:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [19:56:04] DarkoNeko: hm. Maybe I should do some SGE setup then -_-' [19:56:16] i'm on it currently [19:56:28] * DarkoNeko reads and rereads https://wiki.toolserver.org/view/SGE_for_beginners [19:56:46] or I could invest my time into moving the bot to labs [19:56:49] choices, choices [19:57:00] eh, whyever would you do that [19:57:07] * DarkoNeko goes back to his crons [19:57:10] because that will have to happen at some time anyway [19:58:17] using SGE is easier than learning labs :P [19:58:35] ohh, first job's a success. yay [19:58:37] labs is slow to login to, but it's basic linux [19:59:15] still, i'm kinda annoyed i have to go through this because some other people can't cron correctly to save their lives <_< [20:01:20] I'm mainly annoyed by the fact my home directory gets spammed by SGE output junk [20:01:45] uho [20:02:15] thanks, better that I found that out immediately [20:02:16] redirect it? [20:02:25] mine goes into a random folder called ~/logs [20:02:27] * DarkoNeko adds #$ -o /dev/null [20:02:30] :P [20:02:44] DarkoNeko: oh, good one [20:03:06] for a few of the scripts, i'd rather be able to have it sent through the mail like cron did [20:03:43] wonder if there's a way for that, checking the doc [20:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [20:05:15] you can get an e-mail on errors etc, but other than that, I'm not sure [20:05:23] qcronsub -help is not very informative [20:05:33] https://wiki.toolserver.org/view/Job_scheduling makes no mention of sending log content [20:05:54] -M [20:06:08] but I think those are the messages from -m [20:06:08] mail ... list ? 
:o [20:06:19] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.516 second response time [20:06:27] hmm [20:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [20:07:50] let me check what happens if I queue an error :p [20:08:30] :) [20:08:50] meh, not very informative [20:09:03] I get 'Exit Status = 1' [20:09:05] but that's all [20:09:24] it does give some interesting info on run length [20:11:42] Labs is pretty unstable. [20:13:02] dereckson has been annoying me a lot about labs at some point. propagandaaa [20:13:42] I'm not sure which is more unstable: SGE or Labs. [20:13:47] It's a toss-up, I guess. [20:13:55] I don't really trust either. [20:14:38] hm [20:15:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [20:15:26] there's a "core" file in my root directory, i have no idea what its purpose is [20:15:37] It's a core dump. [20:15:45] wonder if it's a remnant from when the account was moved from nightshade ? [20:15:52] You can view it by running "pstack ~/core". [20:15:54] Susan, okay, but more precisely ? ^^; [20:16:00] DarkoNeko, you can't get the result of your program run by SGE mailed to you [20:16:05] aw :( [20:16:16] (yes, that would be a nice addition) [20:16:37] uwa, i had a php module crash on me once, apparently :D [20:16:58] that happened to me, too [20:17:19] Crashes aren't very rare. [20:17:26] Platonides: I'm thinking of writing a script that just collects all the .e files in the home folder & mails those [20:17:43] For yourself or all users? 
[20:17:57] valhallasw, even better would be a post-run script on SGE [20:18:01] for the nlwikibots project, mainly, but it should be relatively generic [20:18:06] Platonides: I have no clue how to do that [20:18:08] it's weird, i mean it's something that runs twice a day and doesn't use much RAM/CPU [20:18:16] It seems pretty awful to make users jump through so many hoops. [20:18:24] oh, just another wrapper script [20:18:38] I understand that the Toolserver is a free service, but it just seems like an awful user experience. [20:18:59] $SCRIPT >/dev/null 2>&1 | mail -whatever -subject $USER@toolserver.org [20:19:05] i'd argue it's becoming awful because of a few of its users ^^; [20:19:21] I think most developers like stability. [20:19:24] And it's not stable. [20:19:35] crontab v. cronie v. SGE; things are unstable or break or need additional hack scripts. [20:20:00] Not that I think Labs will be any better, but it's easy to see why people want a change. [20:20:12] the few big, big problems i've been confronted with have all been caused by a user "forgetting" to not run crons on scripts that don't end timely [20:20:13] valhallasw, but I don't think it would skip the mail if it has no output [20:20:28] or that it does send it telling you that it exited with non-0 status [20:20:28] Platonides: I'm not sure if mail mails if there is no content [20:20:30] let me check [20:20:35] which ended clogging up the server and having all the "good" users crons behave erratically [20:20:49] Platonides: cron mails if there is something in stderr, not if the exit status is not 0 [20:21:49] DarkoNeko: Solaris' cron is apparently so broken I had to resort to using cronie. [20:21:59] Because jobs just stopped working reliably. [20:22:14] It's strange that (non-native) cronie works better. [20:22:20] it's because of these few users [20:22:35] If you say so. 
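The wrapper idea from [20:18:24] can be sketched in Python. Note that the one-liner at [20:18:59] sends both stdout and stderr to /dev/null before the pipe, so mail would receive no input at all. The sketch below assumes the cron-like behaviour described at [20:20:49] (report when something lands on stderr) and additionally reports a non-zero exit status, as asked for at [20:20:28]. The name build_report is hypothetical, and actually handing the report to a mail command is deliberately left out.

```python
import subprocess
import sys


def build_report(cmd):
    """Run cmd and return a report string, or None if there is nothing to mail.

    A report is produced when the command wrote to stderr or exited non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0 and not result.stderr.strip():
        return None  # quiet success: stay silent, like cron does
    return "command: {}\nexit status: {}\n\n{}".format(
        " ".join(cmd), result.returncode, result.stderr
    )


if __name__ == "__main__":
    # Hypothetical usage: python wrapper.py <command> [args...]
    if len(sys.argv) > 1:
        report = build_report(sys.argv[1:])
        if report is not None:
            # A real wrapper would pass this text to a mailer instead.
            print(report)
```

Because the function returns None for a quiet, successful run, the caller can skip the mail entirely in that case, which is the open question raised at [20:20:13].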
[20:22:38] hmm [20:23:06] Replicated databases are nice, but if I had the time, I'd rewrite all my tools to not use them and just move to a VPS. [20:23:38] if someone has a cron-ed script that doesn't end timely, there's more and more until it fills most or all of the slots [20:23:39] The lack of headaches is worth the few dollars a month. [20:23:47] that's what happened in the cases i was confronted with, anyway [20:24:31] oh COME ON [20:24:46] DarkoNeko: yes, exactly. Just having the part of qcronsub that checks for a running process should be more than enough [20:24:55] * DarkoNeko just did "top", there's once more a ton of "python /home/javadyou/pywikipedia/radeh7.py ", one of the users i had to report for the latest big failure [20:25:27] aaand it's using all the CPU again. [20:25:34] * DarkoNeko shakes fist [20:25:38] wow [20:25:41] that code is also horrible : [20:25:41] What is redah.py? [20:25:42] :p [20:25:58] some complicated search-and-replace [20:25:58] i have no idea [20:26:00] radeh.py * [20:26:21] i see radeh7.py for one, and radeh.py for another user (reeza) [20:26:30] ...it's the same users that caused the previous crash. [20:26:44] * DarkoNeko lights up a torch and goes burn their cron [20:26:48] try: [20:26:49] import MySQLdb [20:26:49] except: [20:26:49] wikipedia.output(u'\03{lightred}you should use this code only on toolserver\03{default}') [20:26:55] Lawl. [20:27:21] So rather than ban bad users, everyone has to switch to SGE? [20:27:25] That seems sensible... [20:27:40] well, the problem is, no one knows when the same similar problem will happen [20:27:49] You could just look every day. [20:28:14] the first thing I do now when my crons stop running, is to run "top", rather than look at my scripts :) [20:28:51] But we don't need more roots. [20:28:52] it's sad that people tend to accuse me instead when the welcome script stops working :| [20:29:04] This whole system is pretty broken. 
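The pile-up described at [20:23:38], and the "check for a running process" remark at [20:24:46], amount to a single-instance guard. A minimal Python sketch follows, assuming an advisory lock file is acceptable; this is not how qcronsub does it, which the log does not describe. The name try_lock and the lock file path are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock on path, or None.

    None means another run already holds the lock and this run should skip.
    """
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    # Hypothetical lock location; one lock file per job.
    lock_path = os.path.join(tempfile.gettempdir(), "example-bot.lock")
    lock = try_lock(lock_path)
    if lock is None:
        print("previous run still active, skipping this one")
    else:
        print("lock acquired, the job body would run here")
        lock.close()  # closing the file releases the lock
```

With a guard like this, a script that does not finish before its next scheduled start exits at once instead of stacking up another copy.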
[20:29:05] I'd make a horrible root :) [20:29:16] killall -9 radeh7.py [20:29:37] and voila ! [20:29:52] You missed radeh.py. [20:29:57] i guess you can't do as a root what I do as a sysop [20:30:03] * DarkoNeko coughs [20:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 867861.000000 [20:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [20:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77860 MB (12% inode=99%): [20:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 87731 MB (1% inode=50%): [20:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [20:44:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [20:44:48] SGE-trololololol: reading the job name from the script file, then complaining "newtask: exec of test.py failed: No such file or directory" [20:46:50] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 38.59, 38.38, 38.75 [20:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [20:57:05] Maarten Dammers * [Toolserver-l] Save the date: Amsterdam Hackathon 2013 (May 24-26) [21:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [21:06:23] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.862 second response time [21:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [21:15:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [21:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 864602.000000 [21:39:13] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[21:39:13] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [21:39:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 87185 MB (1% inode=50%): [21:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77465 MB (12% inode=99%): [21:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [21:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [21:44:41] /home/project/n/l/w/nlwikibots/bots/tvpupdater/runbot: line 12: /home/project/n/l/w/nlwikibots/bots/tvpupdater/bin/python: No such file or directory [21:44:41] nlwikibots@willow:~$ /home/project/n/l/w/nlwikibots/bots/tvpupdater/bin/python [21:44:44] . /home/project/n/l/w/nlwikibots/bots/tvpupdater/runbot: line 12: /home/project/n/l/w/nlwikibots/bots/tvpupdater/bin/python: No such file or directory [21:44:47] ... [21:45:14] why the heck can't SGE find the file if it's right there? [21:45:47] where was it compiled? [21:46:17] that... is a good question [21:46:31] I think it's a virtualenv with a sunos binary [21:46:39] so that will fail on linux hosts [21:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 36.91, 37.09, 37.20 [21:47:19] the program headers request program interpreter: /usr/lib/ld.so.1 [21:47:32] yeah, it fails on yarrow [21:47:37] which doesn't exist in Linux [21:47:50] but is no problem in willow [21:47:59] I'll set arch=solaris [21:48:26] why don't you use /usr/bin/python ? 
[21:48:31] virtualenv [21:48:38] I think having a symlink might work, though [21:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [21:52:07] Platonides: nope, doesn't work with a symlink [21:53:09] /sql on cassia is WARNING: DISK WARNING - free space: /sql 109356 MB (9% inode=99%): [21:55:28] I don't know how virtualenv works [21:56:00] Platonides: I don't know the internals, but the basic idea is to have a project environment to install packages in [22:00:13] Please notice that it is "arch=sol" and not "arch=solaris" [22:01:20] I checked before adding it :-) But thanks nonetheless [22:04:21] you could use a shell script which switches the binary based on the uname [22:04:27] but that would require compiling it twice [22:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [22:06:01] Platonides: setting it to only work on solaris hosts is good enough [22:06:14] re-generating the virtualenv when solaris dies... is something I can fix then [22:06:19] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.513 second response time [22:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [22:15:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [22:33:19] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:33:49] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 73004 MB (7% inode=99%): [22:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 854333.000000 [22:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
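The suggestion at [22:04:21], switching the binary based on uname, can be sketched in Python rather than shell. This assumes one virtualenv has been built per operating system (the "compile it twice" cost noted at [22:04:27]). The directory names env-sol and env-linux and the function pick_python are hypothetical and do not describe the nlwikibots setup.

```python
import os
import platform

# Hypothetical layout: one virtualenv per operating system under a common base.
# platform.system() reports "SunOS" on Solaris and "Linux" on Linux hosts.
ENV_DIRS = {"SunOS": "env-sol", "Linux": "env-linux"}


def pick_python(base, system=None):
    """Return the path of the interpreter built for the given system."""
    system = system or platform.system()
    try:
        env_dir = ENV_DIRS[system]
    except KeyError:
        raise RuntimeError("no virtualenv built for " + system)
    return os.path.join(base, env_dir, "bin", "python")


if __name__ == "__main__":
    # Example base directory, purely illustrative.
    print(pick_python("/tmp/example-bot", "SunOS"))
```

Restricting the job to Solaris hosts with arch=sol, as settled on at [22:06:01], avoids the second build at the cost of having to regenerate the virtualenv once the Solaris hosts go away.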
[22:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [22:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77201 MB (12% inode=99%): [22:39:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 86860 MB (1% inode=50%): [22:41:18] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:41:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [22:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [22:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 37.21, 37.32, 37.43 [22:50:05] Wolfgang ten Weges * Re: [Toolserver-l] Cron on submit [22:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:01:48] so, it's bot killing time! 
:p [23:05:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [23:06:19] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 245 bytes in 0.591 second response time [23:06:29] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2132 [23:06:39] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2130.000000 [23:06:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [23:09:28] MySQL slave on rosemary is CRITICAL: (Service Check Timed Out) [23:09:29] s4 replag on rosemary is CRITICAL: (Service Check Timed Out) [23:10:09] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2307 [23:10:09] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2159.000000 [23:11:39] s1 replag on rosemary is CRITICAL: (Service Check Timed Out) [23:11:39] s4 replag on rosemary is CRITICAL: (Service Check Timed Out) [23:12:09] Environment IPMI on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:12:59] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2452.000000 [23:13:59] Environment IPMI on rosemary is OK: ok: temperature ok fan ok voltage ok chassis ok [23:15:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [23:16:08] what happened to willow? [23:17:59] wikidata replag on rosemary is CRITICAL: (Service Check Timed Out) [23:18:03] sorry, it looks like I killed some processes by accident. 
[23:18:28] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2020.000000 [23:18:30] yeah, my screens died [23:18:38] me too [23:18:48] which sort of sucks... ;-) [23:18:49] SMTP on willow is CRITICAL: Connection refused [23:19:10] looks like it killed some system-processes too [23:19:14] DaBPunkt: any further accidents planned or can i restart? ;-) [23:19:53] assuming there is no way to get back to dead screen? [23:19:58] let me check how big the impact on the system-processes was. In worst case I have to reboot [23:20:26] [[Rules]] https://wiki.toolserver.org/w/index.php?diff=7808&oldid=6603&rcid=21540 * Dab * (+13) (/* Terms of Use */ bots-via-SGE-rule active since today) [23:21:49] SMTP on willow is OK: SMTP OK - 0.073 sec. response time [23:23:39] Danny_B|backup: you can restart, but I will announce a reboot for tomorrow evening just to be sure [23:25:09] wikidata replag on rosemary is CRITICAL: (Service Check Timed Out) [23:25:55] o_o [23:25:59] so another restart [23:26:10] nightshade is doing well? [23:26:19] i could start my screens there instead [23:26:21] should, yes [23:27:26] hmm, so no chance to get back the data of dead screens [23:27:43] that hurts [23:28:09] Environment IPMI on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:32:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:33:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.269 second response time [23:34:03] Danny_B|backup: no, sorry. It was a hard kill-command so every killed process was gone immediately [23:35:17] happens [23:35:35] we're not going to desysop you because of that, don't worry [23:37:05] DaB. 
* [Toolserver-announce] Reboot of willow Monday [23:37:19] SSH on rosemary is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 841652.000000 [23:38:09] SSH on rosemary is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [23:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [23:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76897 MB (12% inode=99%): [23:40:19] /mnt user-store on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:41:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [23:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [23:45:19] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 4208 [23:46:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 18.75, 18.85, 20.79 [23:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:53:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4668.000000 [23:53:19] SMF on web.amaranth is OK: OK - all services online [23:53:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:49] Load avg. 
on willow is WARNING: WARNING - load average: 18.38, 18.59, 19.95 [23:55:19] jira.toolserver.org on web.amaranth is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 170 bytes in 0.549 second response time [23:56:19] jira.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 302 Moved Temporarily - 241 bytes in 0.577 second response time [23:56:29] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4442.000000