[00:01:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [00:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [00:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [00:09:19] SMF on web.amaranth is OK: OK - all services online [00:11:19] wikidata replag on rosemary is CRITICAL: (Service Check Timed Out) [00:12:19] jira.toolserver.org on web.amaranth is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 170 bytes in 0.604 second response time [00:22:30] hmm, nightshade slllooowwww now [00:36:46] ok, looks like I fixed jira (= it is broken like before amaranth went away) [00:37:26] yay! [00:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 833333.000000 [00:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[00:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [00:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76669 MB (12% inode=99%): [00:40:58] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 86412 MB (1% inode=50%): [00:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [00:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [00:44:14] awesome [00:45:39] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 7670 [00:48:32] the connection to fisheye is still broken because of an invalid ssl-path and all users besides me and danielK are still missing [00:49:18] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [00:51:18] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [00:53:29] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 8111.000000 [00:53:49] Load avg. 
on willow is WARNING: WARNING - load average: 18.87, 18.92, 18.98 [00:56:49] s4 replag on rosemary is CRITICAL: (Service Check Timed Out) [01:05:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [01:06:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [01:11:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7233.000000 [01:25:19] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 8.580 second response time [01:25:19] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 9.923 second response time [01:26:09] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.004 second response time [01:27:56] any OSM people at the keyboard? Anything up with ptolemy? DB request seem awfully slow right now [01:32:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.026 second response time [01:38:08] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 817692.000000 [01:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[01:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [01:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75374 MB (12% inode=99%): [01:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [01:44:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 86314 MB (1% inode=50%): [01:46:19] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 11039 [01:49:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [01:50:18] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [01:53:48] Load avg. 
on willow is WARNING: WARNING - load average: 18.87, 18.82, 18.91 [01:54:09] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11455.000000 [01:58:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.256 second response time [01:58:28] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10263.000000 [02:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [02:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [02:12:19] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 9987.000000 [02:26:49] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 72568 MB (7% inode=99%): [02:28:19] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:28:49] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 72564 MB (7% inode=99%): [02:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 806100.000000 [02:39:10] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[02:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [02:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76468 MB (12% inode=99%): [02:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:44:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [02:45:09] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 86251 MB (1% inode=50%): [02:46:29] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 14044 [02:49:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [02:50:19] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 9.619 second response time [02:51:18] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [02:53:49] Load avg. 
on willow is WARNING: WARNING - load average: 19.22, 19.21, 19.16 [02:54:49] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14454.000000 [02:55:51] night, ts [02:58:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11413.000000 [02:59:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.023 second response time [03:05:09] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [03:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [03:13:09] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11960.000000 [03:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 789818.000000 [03:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[03:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [03:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76272 MB (12% inode=99%): [03:41:21] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:44:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [03:45:49] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 88575 MB (1% inode=50%): [03:46:49] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 17044 [03:49:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [03:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [03:53:49] Load avg. 
on willow is WARNING: WARNING - load average: 18.50, 18.79, 18.99 [03:55:19] s1 replag on rosemary is CRITICAL: (Service Check Timed Out) [03:58:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12652.000000 [04:05:10] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [04:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [04:13:09] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 13387.000000 [04:25:49] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 72214 MB (7% inode=99%): [04:32:19] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:32:20] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:35:09] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.010 second response time [04:35:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.228 second response time [04:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 780596.000000 [04:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[04:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [04:39:49] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76165 MB (12% inode=99%): [04:41:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:45:47] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [04:46:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:46:30] /mnt user-store on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:46:58] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 72231 MB (7% inode=99%): [04:47:19] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 20105 [04:49:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [04:49:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [04:53:59] Load avg. 
on willow is WARNING: WARNING - load average: 18.55, 18.82, 19.01 [04:56:09] s1 replag on rosemary is CRITICAL: (Service Check Timed Out) [04:58:58] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14128.000000 [05:05:19] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [05:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [05:13:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15500.000000 [05:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 777136.000000 [05:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [05:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76106 MB (12% inode=99%): [05:41:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [05:44:18] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [05:46:29] /mnt user-store on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:47:39] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 23121 [05:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [05:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:53:59] Load avg. 
on willow is WARNING: WARNING - load average: 19.48, 19.25, 19.18 [05:56:59] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 23570.000000 [05:59:19] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15137.000000 [06:05:19] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [06:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [06:13:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 16926.000000 [06:15:59] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 71512 MB (7% inode=99%): [06:18:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:19:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 71480 MB (7% inode=99%): [06:25:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:35:20] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:35:21] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:37:10] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.014 second response time [06:37:10] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.099 second response time [06:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 771871.000000 [06:39:08] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [06:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76038 MB (12% inode=99%): [06:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [06:44:18] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [06:47:19] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 87646 MB (1% inode=50%): [06:47:58] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 26235 [06:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [06:51:18] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [06:53:59] Load avg. 
on willow is WARNING: WARNING - load average: 19.35, 19.49, 19.42 [06:56:59] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 26599.000000 [06:57:59] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 71405 MB (7% inode=99%): [07:00:08] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15604.000000 [07:05:19] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [07:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [07:08:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:08:59] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 71165 MB (7% inode=99%): [07:14:29] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 17567.000000 [07:17:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 755099.000000 [07:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[07:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [07:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75933 MB (12% inode=99%): [07:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [07:44:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [07:48:29] /mnt user-store on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:49:09] MySQL slave on rosemary is CRITICAL: (Service Check Timed Out) [07:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [07:51:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [07:53:59] Load avg. on willow is WARNING: WARNING - load average: 19.32, 19.29, 19.35 [07:57:29] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 29970.000000 [08:00:59] s4 replag on rosemary is CRITICAL: (Service Check Timed Out) [08:05:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [08:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [08:08:59] Load avg. on willow is CRITICAL: CRITICAL - load average: 20.48, 20.41, 20.01 [08:09:59] Load avg. on willow is WARNING: WARNING - load average: 19.87, 20.25, 19.98 [08:10:59] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 20.23, 20.29, 20.01 [08:13:59] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 71066 MB (7% inode=99%): [08:14:29] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18802.000000 [08:26:35] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:27:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 71044 MB (7% inode=99%): [08:34:00] Load avg. on willow is CRITICAL: CRITICAL - load average: 19.70, 19.97, 20.04 [08:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 751992.000000 [08:38:58] Load avg. on willow is WARNING: WARNING - load average: 19.86, 19.92, 20.00 [08:39:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [08:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75792 MB (12% inode=99%): [08:40:59] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 21.75, 20.39, 20.15 [08:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:44:18] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [08:48:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 87266 MB (1% inode=50%): [08:49:29] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 32357 [08:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [08:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:57:49] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 32639.000000 [08:59:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:00:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70968 MB (7% inode=99%): [09:01:19] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12265.000000 [09:06:19] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [09:06:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [09:14:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18207.000000 [09:22:18] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:22:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:24:08] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.056 second response time [09:24:09] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.853 second 
response time [09:25:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:27:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:28:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70927 MB (7% inode=99%): [09:30:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:34:23] [[Special:Log/newusers]] create 10 * Maxinesolution * (New user account) [09:37:05] [[Category:Tools by gurkan]] ! 10https://wiki.toolserver.org/w/index.php?diff=7809&oldid=4451&rcid=21542 * Maxinesolution * (+10) (Maxine Solution Pvt Ltd - provide Software patna, bihar, jharkhand, delhi, Software Company patna, bihar, jharkhand, delhi, Website Company in Patna, Bihar and Jharkhand, software development company jharkhand, seo patna, web design in Patna, Website) [09:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 748163.000000 [09:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [09:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:40:00] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75621 MB (12% inode=99%): [09:40:59] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 21.11, 20.54, 20.41 [09:41:28] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [09:44:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [09:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [09:50:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 87047 MB (1% inode=50%): [09:51:09] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 35180 [09:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [09:53:09] /sql on cassia is WARNING: DISK WARNING - free space: /sql 106088 MB (8% inode=99%): [09:58:49] s1 replag on rosemary is CRITICAL: (Service Check Timed Out) [10:01:18] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10834.000000 [10:06:19] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [10:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [10:14:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18785.000000 [10:20:18] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:20:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:22:44] Oh, toolserver.org is back it seems. 
[10:22:59] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70854 MB (7% inode=99%): [10:24:10] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.013 second response time [10:24:10] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.124 second response time [10:34:18] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 7.153 second response time [10:34:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 743514.000000 [10:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [10:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75461 MB (12% inode=99%): [10:40:59] Load avg. on willow is CRITICAL: CRITICAL - load average: 20.68, 20.48, 20.49 [10:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:43:20] Hi! Today I often have problems with phpmyadmin. I frequently get the message: "500 Internal Server Error nginx/1.0.4" [10:43:31] is it a caching problem? 
[10:43:44] I use the URL https://phpmyadmin.toolserver.org [10:44:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [10:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [10:51:09] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 86820 MB (1% inode=50%): [10:51:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [10:51:59] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 37947 [10:59:39] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 38387.000000 [11:00:59] Load avg. on nightshade is WARNING: WARNING - load average: 10.53, 17.30, 13.27 [11:01:59] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12194.000000 [11:02:59] Load avg. on nightshade is OK: OK - load average: 4.96, 13.00, 12.18 [11:03:29] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:03:59] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70788 MB (7% inode=99%): [11:06:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [11:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [11:15:18] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20514.000000 [11:17:53] @replag [11:17:54] russblau: s1-rr-a-wd: 56s [-0.00 s/s]; s1-user: 10h 57m 6s [+0.37 s/s]; s1-user-c: 3h 24m 54s [+0.05 s/s]; s1-user-wd: 5h 39m 15s [+0.22 s/s]; s2-user-c: error; s2-user-wd: 1d 10h 25m 23s [+1.00 s/s]; s3-user: 13s [-0.00 s/s]; s4-user-wd: 59s [-0.00 s/s] [11:17:55] russblau: s5-user-c: 1w 1d 14h 3m 25s [-2.87 s/s]; s5-user-wd: 59s [-0.00 s/s]; s6-user-wd: 56s [-0.00 s/s]; s7-user-wd: 58s [-0.00 s/s] [11:21:31] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:33:59] Load avg. on willow is WARNING: WARNING - load average: 19.59, 19.83, 19.98 [11:34:59] Load avg. on willow is CRITICAL: CRITICAL - load average: 20.32, 19.99, 20.03 [11:37:05] rdf-toolserver: It should be working. If not, please file a bug at jira.toolserver.org. [11:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 741490.000000 [11:39:09] @Susan: Did you make some changes? Seems to be faster now? Do you know what kind of error it is? "500 Internal Server Error nginx/1.0.4"" ? 
[11:39:09] rdf-toolserver: Error: No closing quotation [11:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [11:39:10] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:39:32] I didn't make any changes. [11:39:45] @tsbot: What do you mean? No quotation in sql-queries? [11:39:59] Load avg. on willow is WARNING: WARNING - load average: 19.76, 19.90, 19.99 [11:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75265 MB (12% inode=99%): [11:40:10] tsbot is a robot. [11:40:21] He was activated by your use of "@" at the beginning of a message. [11:40:24] @replag [11:40:24] Susan: s1-rr-a-wd: 1m 42s [+0.03 s/s]; s1-user: 11h 16m 34s [+0.86 s/s]; s1-user-c: 3h 28m 42s [+0.17 s/s]; s1-user-wd: 5h 40m 35s [+0.06 s/s]; s2-user-c: error; s2-user-wd: 1d 10h 47m 54s [+1.00 s/s]; s3-user: 30s [+0.01 s/s]; s4-user-wd: 1m 42s [+0.03 s/s] [11:40:25] Susan: s5-user-c: 1w 1d 13h 57m 50s [-0.25 s/s]; s5-user-wd: 1m 42s [+0.03 s/s]; s6-user-wd: 1m 42s [+0.03 s/s]; s7-user: 12s [-0.00 s/s]; s7-user-wd: 1m 42s [+0.03 s/s] [11:41:21] ah ok... Probably I will fill a bug report. 
The error happens frequently when I use phpmyadmin [11:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [11:44:18] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [11:51:18] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [11:51:29] /mnt user-store on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:51:59] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 41129 [12:00:09] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 41378.000000 [12:02:19] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12885.000000 [12:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [12:07:18] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [12:08:46] is it me or is TS loading very slowly? [12:08:55] What do you mean by "TS"? [12:09:00] Via HTTP? SSH? SFTP? [12:09:11] sorry, http & ssh [12:09:27] Yeah, seems slow to me. [12:09:38] load on willow seems high. [12:09:59] Not that it's a Web server, but still. [12:10:28] yes, it does seem high. javadyou seems to be running a lot of python stuff longterm [12:10:52] Yeah. :-/ [12:11:05] The roots don't seem interested in killing those. [12:15:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20529.000000 [12:18:17] Susan: he seems to be running with about 45% of the CPU right now [12:19:20] * Izawayz wonders what this guy's username is [12:19:58] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 21.24, 21.30, 21.04 [12:20:20] ugh...this is killing my page load time [12:20:53] Perhaps you could e-mail ts-admins@toolserver.org? [12:20:54] Dunno. [12:21:10] The Toolserver has been annoying me lately more than usual. [12:22:19] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:22:34] same [12:23:18] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:09] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.009 second response time [12:25:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.010 second response time [12:35:09] Is the toolserver down for anybody else? O_O [12:35:16] http://isup.me/toolserver.org [12:36:05] it's up and down right now [12:36:39] duh: Thanks. Any reason why? I thought only Willow was being restarted [12:36:49] high load [12:36:50] <Susan> load on willow seems high. <- still radeh.py ? [12:37:16] they're basically hogging the first page of "top" [12:37:21] CPU wise [12:37:25] yeah i think so [12:37:43] mentioned it on the ML yesterday, we'll see whether it's checked out [12:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 740650.000000 [12:38:13] i think they were already responsible for the previous big failure...
[12:38:58] @replag [12:38:59] Vacation9: s1-rr-a-wd: 1m 21s [-0.01 s/s]; s1-user: 12h 6m 24s [+0.85 s/s]; s1-user-c: 3h 53m 55s [+0.43 s/s]; s1-user-wd: 5h 57m 27s [+0.29 s/s]; s2-user-c: error; s2-user-wd: 1d 11h 46m 29s [+1.00 s/s]; s3-user: 35s [+0.00 s/s]; s4-user-wd: 1m 21s [-0.01 s/s] [12:39:00] Vacation9: s5-user-c: 1w 1d 13h 44m 56s [-0.22 s/s]; s5-user-wd: 1m 18s [-0.01 s/s]; s6-user-wd: 1m 21s [-0.01 s/s]; s7-user-wd: 1m 21s [-0.01 s/s] [12:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [12:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:39:16] hmm those don't look very good [12:39:37] @replag [12:39:37] DarkoNeko: s1-rr-a-wd: 1m 31s [+0.26 s/s]; s1-user: error; s1-user-c: error; s1-user-wd: error; s2-user-c: error; s2-user-wd: 1d 11h 47m 7s [+0.99 s/s]; s3-user: 31s [-0.10 s/s]; s4-user-wd: 1m 28s [+0.18 s/s] [12:39:38] DarkoNeko: s5-user-c: 1w 1d 13h 45m 24s [+0.73 s/s]; s5-user-wd: 1m 28s [+0.26 s/s]; s6-user-wd: 1m 28s [+0.18 s/s]; s7-user-wd: 1m 31s [+0.26 s/s] [12:39:51] uh oh. 
[12:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 75098 MB (12% inode=99%): [12:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [12:42:01] oh well it'll get sorted [12:42:03] eventually [12:44:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [12:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [12:51:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [12:51:59] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 44101 [12:52:09] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 86359 MB (1% inode=50%): [13:00:05] Platonides * Re: [Toolserver-l] New Rule: SGE-constraint for bots [13:00:39] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 44565.000000 [13:02:39] s4 replag on rosemary is CRITICAL: (Service Check Timed Out) [13:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [13:07:19] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [13:13:20] It's down again. [13:15:14] Is it because some people don't use SGE and hog all the resources? [13:15:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22152.000000 [13:16:40] seems like the user-store disk access is unstable? [13:17:09] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 4.454 second response time [13:17:19] * whym_away admittedly has no clue [13:19:44] The user-store is "almost" full: 99% used, although the 1% left is still 85G. [13:19:59] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 20.02, 20.16, 20.32 [13:26:09] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.120 second response time [13:30:05] DaB. * Re: [Toolserver-l] New Rule: SGE-constraint for bots [13:30:13] Hello all [13:30:15] @replag [13:30:15] DaBPunkt: s1-rr-a-wd: 1m 22s [-0.00 s/s]; s1-user: error; s1-user-c: error; s1-user-wd: error; s2-user-c: error; s2-user-wd: 1d 12h 37m 45s [+1.00 s/s]; s3-user: 40s [+0.00 s/s]; s4-user-wd: 1m 24s [-0.00 s/s] [13:30:16] DaBPunkt: s5-user-c: 1w 1d 13h 1m 18s [-0.87 s/s]; s5-user-wd: 1m 24s [-0.00 s/s]; s6-user-wd: 1m 22s [-0.00 s/s]; s7-user-wd: 1m 25s [-0.00 s/s] [13:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 736616.000000 [13:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [13:39:59] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 74895 MB (12% inode=99%): [13:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:44:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [13:49:30] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [13:50:48] Toolserver.org is down again. [13:51:18] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:51:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [13:51:47] Hi DaBPunkt. Could this be due to the overload on willow? [13:52:29] /mnt user-store on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[13:52:31] no [13:52:38] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 47580 [13:52:54] Dakdada: what exactly is not working? [13:53:40] The webpages on toolserver.org are sometimes unreachable [13:54:13] Or really slow to load. [13:55:16] Dakdada: I need an example. [13:57:37] For me, loading http://www.toolserver.org in my browser is either really slow, or it gives me an error like "Bad Gateway" [13:58:15] ah ok [14:01:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 48095.000000 [14:03:30] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 16816.000000 [14:03:30] Dakdada: does it work now for you? [14:04:15] Yes. [14:04:52] ok, looks like one of the load balancers had a hiccup [14:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [14:07:07] DaBPunkt: I am missing a userdatabase on s2 for a tool, I already informed the owner, who is unfortunately less active, but as I understand the tool correctly it will be needed on both s2 and s5, so should likely be copied and then be left on s5 as well [14:07:21] play it via the toolowner (erwin), or can you copy it already? [14:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [14:09:27] akoopal: I can copy the database over 1 time if you like. But erwin has to update it himself [14:09:47] u_erwin85 it is [14:10:29] You can't access this database [14:10:31] if I look at the errors he needs to clean the s5 wikis from s2, and the s2 wikis from s5, the database is a category cache [14:10:59] oh it is for a tool of HIM.
mm, then better not [14:11:49] NTP on turnera is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.000498 secs [14:12:48] Thanks for your time DaBPunkt [14:12:51] akoopal: please speak with erwin. If he is too busy to do the copy himself and he gives his ok I can copy the database – but not without his ok because it could break something [14:12:57] Dakdada: no problem [14:14:18] DaBPunkt: ok, understood, I don't know the internals either, I will hope he pays attention to his talkpage :) [14:15:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70391 MB (7% inode=99%): [14:16:10] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 23864.000000 [14:20:09] Load avg. on willow is CRITICAL: CRITICAL - load average: 20.46, 20.70, 20.93 [14:21:49] NTP on turnera is OK: NTP OK: Offset 0.005297 secs [14:25:09] Load avg. on willow is WARNING: WARNING - load average: 12.42, 17.82, 19.80 [14:29:39] /sql on rosemary is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:30:29] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70333 MB (7% inode=99%): [14:31:09] Load avg. on willow is OK: OK - load average: 1.82, 7.66, 14.53 [14:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 734030.000000 [14:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:39:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [14:39:27] I restart mysql on rosemary.
It swaps [14:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 74658 MB (12% inode=99%): [14:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:44:39] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [14:46:09] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146) [14:46:19] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:46:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:47:02] need to reboot rosemary [14:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [14:52:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [14:55:39] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146) [15:03:49] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146) [15:03:50] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146) [15:05:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70517 MB (7% inode=99%): [15:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [15:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [15:16:49] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146) [15:24:49] is sulinfo broken? http://www.toolserver.org/~quentinv57/tools/sulinfo.php?username=LA2 [15:25:06] or the entire toolserver? [15:28:25] LA2: toolserver is acting weird right now [15:28:30] Arg it looks like there is still something wrong. 
[15:29:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70517 MB (7% inode=99%): [15:29:26] ssh connection is also slow (via willow) [15:29:59] yeah [15:30:08] also running qstat on nightshade throws an error [15:30:26] legoktm@nightshade:~$ qstat [15:30:26] critical error: Please set the environment variable SGE_ROOT. [15:30:33] HTTP giving 324 [15:30:42] qstat on willow is fine [15:30:52] i'm getting 503's on HTTP [15:31:04] 504 for me :D [15:31:10] er, 500 [15:31:14] it looks like there is a problem on rosemary or the ha-cluster [15:31:19] I'm not sure yet [15:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 731536.000000 [15:39:10] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [15:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 74533 MB (12% inode=99%): [15:41:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [15:45:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [15:46:19] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:19] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:29] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146) [15:49:28] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [15:52:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [15:55:25] I have to remove 
/mnt/user-store from the hosts [15:55:29] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [15:56:29] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146) [15:58:29] MySQL on rosemary is OK: Uptime: 989 Threads: 8 Questions: 7249 Slow queries: 2 Opens: 98 Flush tables: 1 Open tables: 88 Queries per second avg: 7.329 [16:02:09] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 70659 MB (7% inode=99%): [16:02:29] Sun Grid Engine execd on ortelius is CRITICAL: Connection refused by host [16:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 55030.000000 [16:04:19] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21888.000000 [16:04:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [16:05:39] ok, the webserver should be better now [16:05:43] +s [16:06:21] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.006 second response time [16:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [16:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [16:11:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [16:17:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 30260.000000 [16:17:42] Thanks DaBPunkt! [16:24:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:25:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [16:27:05] DaB.
* [Toolserver-announce] Postmortem: Partial Toolserver-outage [16:34:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:35:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:37:39] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [16:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 715769.000000 [16:38:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:39:10] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [16:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 74368 MB (12% inode=99%): [16:40:28] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [16:42:24] DaBPunkt: there's not a mount command in some bash history file? [16:42:44] just brainstorming [16:42:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:45:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:46:16] jeremyb_: The one in it is not working [16:46:30] (I hate this SAN-system…) [16:48:34] how can a sane person assume that "c2t600C0FF000111F0D4F82604F01000000d0s15" is a good name?? [16:48:59] (yes, I know that there is a global identifier in it) [16:49:28] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [16:54:00] DaBPunkt: but then maybe it's the wrong bash history? maybe it's her personal user or "root"?
(and maybe you only checked one of those) [16:54:09] her personal [16:54:26] the on for root is very short [16:54:31] on → one [16:54:40] anyway, does seem like a bad situation [16:55:52] the problem is that several other important partitions are on this SAN-system so I try not to tinker with it too much [16:56:28] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 52195 [16:56:42] @replag [16:56:43] DaBPunkt: s1-rr-a-wd: 1m 20s [-]; s1-user: 14h 28m 38s [-]; s1-user-c: 1h 43m 19s [-]; s1-user-wd: 9h 3m 44s [-]; s2-rr: 10m 2s [-]; s2-user: 10m 2s [-]; s2-user-c: error; s2-user-wd: 22h 37m 16s [-] [16:56:43] DaBPunkt: s3-user: 17s [-]; s4-user-wd: 1m 20s [-]; s5-user-c: 1w 1d 5h 19m 30s [-]; s5-user-wd: 1m 20s [-]; s6-user-wd: 1m 20s [-]; s7-user-wd: 1m 20s [-] [17:01:41] DaBPunkt: sure [17:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 51867.000000 [17:04:19] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3962.000000 [17:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [17:07:05] Tim Landscheidt * Re: [Toolserver-l] Postmortem: Partial Toolserver-outage [17:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [17:08:19] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3571.000000 [17:11:13] i wonder if tim does IRC ever? [17:12:42] which tim? 
[17:16:19] s4 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1709.000000 [17:17:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 33859.000000 [17:17:47] @replag [17:17:47] DaBPunkt: s1-rr-a-wd: 1m 31s [+0.01 s/s]; s1-user: 13h 55m 12s [-1.59 s/s]; s1-user-c: 22m 24s [-3.84 s/s]; s1-user-wd: 9h 24m 49s [+1.00 s/s]; s2-rr: 27m 25s [+0.82 s/s]; s2-user: 27m 25s [+0.82 s/s]; s2-user-c: error; s2-user-wd: 22h 51m 28s [+0.67 s/s] [17:17:48] DaBPunkt: s4-user-wd: 1m 33s [+0.01 s/s]; s5-user-c: 1w 1d 4h 3m 2s [-3.63 s/s]; s5-user-wd: 1m 31s [+0.01 s/s]; s6-user-wd: 1m 31s [+0.01 s/s]; s7-user-wd: 1m 33s [+0.01 s/s] [17:24:59] @replag [17:24:59] DaBPunkt: s1-rr-a-wd: 59s [-0.07 s/s]; s1-user: 13h 42m 3s [-1.83 s/s]; s1-user-wd: 8h 44m 43s [-5.58 s/s]; s2-rr: 31m 24s [+0.55 s/s]; s2-user: 31m 24s [+0.55 s/s]; s2-user-c: error; s2-user-wd: 22h 52m 26s [+0.13 s/s]; s4-user-wd: 59s [-0.08 s/s] [17:25:00] DaBPunkt: s5-user-c: 1w 1d 3h 35m 29s [-3.83 s/s]; s5-user-wd: 59s [-0.07 s/s]; s6-user-wd: 58s [-0.08 s/s]; s7-user-wd: 58s [-0.08 s/s] [17:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 700636.000000 [17:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:39:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [17:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 74189 MB (12% inode=99%): [17:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [17:54:46] hi, anyone running .NET bot (with mono)? 
[17:56:29] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 43556 [18:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 42333.000000 [18:04:45] @replag [18:04:46] DaBPunkt: s1-rr-a-wd: 1m 5s [+0.00 s/s]; s1-user: 11h 43m 0s [-2.99 s/s]; s1-user-wd: 4h 34m 46s [-6.28 s/s]; s2-rr: 43m 31s [+0.30 s/s]; s2-user: 43m 31s [+0.30 s/s]; s2-user-c: error; s2-user-wd: 22h 46m 10s [-0.16 s/s]; s3-user: 1m 6s [+0.01 s/s] [18:04:47] DaBPunkt: s4-user-wd: 1m 6s [+0.00 s/s]; s5-user-c: 1w 1d 1h 8m 47s [-3.69 s/s]; s5-user-wd: 1m 6s [+0.00 s/s]; s6-user-wd: 1m 5s [+0.00 s/s]; s7-user-wd: 1m 4s [+0.00 s/s] [18:05:23] ←away for food [18:06:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [18:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [18:10:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:12:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:13:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:17:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10414.000000 [18:17:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:19:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:20:58] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:21:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:26:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[18:35:30] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3411.000000 [18:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 693947.000000 [18:39:13] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:39:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [18:39:29] wikidata replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1787.000000 [18:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 73998 MB (12% inode=99%): [18:44:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [18:45:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [18:49:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:49:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:51:49] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:54:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:54:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[18:55:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [18:55:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:56:30] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 32489 [18:56:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:58:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:59:58] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 31113.000000 [19:06:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [19:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [19:18:06] DaB. * Re: [Toolserver-announce] Reboot of willow Monday [19:22:06] DaB. * Re: [Toolserver-l] Postmortem: Partial Toolserver-outage [19:29:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [19:30:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [19:34:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:35:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:36:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [19:36:39] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [19:37:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[19:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 692594.000000 [19:39:10] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:39:18] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [19:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 73830 MB (12% inode=99%): [19:44:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [19:56:29] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 22927 [20:02:06] Tim Landscheidt * Re: [Toolserver-l] Postmortem: Partial Toolserver-outage [20:03:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [20:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22749.000000 [20:06:59] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [20:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [20:07:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [20:07:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [20:25:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: 
Socket timeout after 30 seconds. [20:26:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [20:26:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:27:38] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [20:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 691112.000000 [20:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:39:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [20:40:10] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 73652 MB (12% inode=99%): [20:41:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:42:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:42:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:43:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [20:44:39] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [20:47:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[20:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [20:56:29] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 18510 [21:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 17988.000000 [21:07:14] I notice I forgot to renew my account this time ヾ [21:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [21:07:56] DaBPunkt: arth thou there? [21:14:45] https://toolserver.org/~azatoth/ RIP. [21:14:48] wtf [21:14:49] AzaToth: File a ticket in JIRA? [21:14:53] Susan: trying [21:14:57] :-) [21:15:01] but I'm getting logged out after like 30 secs [21:15:11] Heh. [21:15:16] File faster. ;-) [21:16:34] it's probably because it's running on Solaris! [21:16:40] and it's blody JIRA [21:17:14] you blody need a super computer to even start executing JIRA [21:18:09] bloody * [21:18:17] :-) [21:18:20] You can always try e-mailing ts-admins@toolserver.org. [21:18:24] had to be brief here as well [21:18:31] * FastLizard4 starts chanting BugZilla [21:18:59] FastLizard4: I actually like redmine better, mostly because it's ruby [21:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [21:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 681608.000000 [21:38:21] AzaToth: ooh, free injections! [21:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[21:39:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [21:39:24] but anything rather than bugzilla. Well, more or less. [21:40:10] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 73458 MB (12% inode=99%): [21:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [21:53:19] /sql on cassia is WARNING: DISK WARNING - free space: /sql 105841 MB (8% inode=99%): [21:56:30] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 16227 [21:59:43] hum.. i have an sge work in t state for about 5 hours... his he being tranfered to siberia? [22:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15672.000000 [22:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [22:31:59] what's with usr store? [22:33:32] Danny_B, did you see the mailing list? 
[22:34:01] nope, just came to comp [22:34:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [22:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [22:38:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 669681.000000 [22:38:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [22:39:09] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:39:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [22:39:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:40:10] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 73238 MB (12% inode=99%): [22:40:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [22:43:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [22:49:29] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [22:52:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:53:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [22:56:32] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 9922 [22:58:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:02:00] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[23:02:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:03:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:04:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 8761.000000 [23:07:29] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [23:09:07] AzaToth: done [23:09:31] thanks [23:10:01] Alchimista: which job-number? [23:11:09] / on damiana is WARNING: DISK WARNING - free space: / 15012 MB (20% inode=95%): [23:11:12] DaBPunkt: I'm getting logged out from JIRA after like a minute [23:11:32] DaBPunkt: 1329445 [23:11:40] i've set it to delection now [23:11:48] but i was unable to qdel him [23:12:48] I see the problem [23:14:08] i tried to qdel him on submit, but it gave an nfs error [23:15:54] see the problem for that too [23:16:53] hmm, could have been firefox sync thingi [23:17:09] AzaToth: no, its jira [23:17:10] disabled it, and now I'm not getting logged out any more [23:17:14] uh [23:18:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [23:19:19] MySQL slave on z-dat-s6-a is CRITICAL: (Return code of 139 is out of bounds) [23:23:29] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:25:49] Alchimista: it should both be fixed now [23:26:16] ok, thanks DaBPunkt [23:32:50] [[Special:Log/newusers]] create 10 * Theautojunkie * (New user account) [23:36:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [23:36:58] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[23:37:39] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:37:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:38:09] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 652286.000000 [23:38:15] @replag [23:38:15] DaBPunkt: s1-rr-a-wd: 1m 27s [+0.00 s/s]; s1-user: 1h 14m 50s [-1.88 s/s]; s1-user-wd: 1m 27s [-0.82 s/s]; s2-user-c: error; s2-user-wd: 6h 17m 56s [-2.96 s/s]; s4-user-wd: 1m 27s [+0.00 s/s]; s5-user-c: 1w 13h 11m 17s [-2.15 s/s]; s5-user-wd: 1m 29s [+0.00 s/s] [23:38:16] DaBPunkt: s6-user: 25m 31s [-]; s6-user-wd: 1m 27s [+0.00 s/s]; s7-user-wd: 1m 27s [+0.00 s/s] [23:39:10] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:39:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [23:39:49] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [23:40:09] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 73035 MB (11% inode=99%): [23:40:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:43:39] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [23:48:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[23:49:28] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [23:50:19] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3573.000000 [23:50:29] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3557 [23:52:10] hmmm, the absence of /mnt/user-store prevents some of my services from caching their results, causing an increased amount of db requests [23:52:47] is it worth setting up a workaround (i.e. create new cache dir in my home dir?) [23:53:59] well, probably hard to say without clairvoyant powers ;-) [23:57:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:28] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output