[00:01:48] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 82577.000000 [00:02:18] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 440854.000000 [00:02:18] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 146476.000000 [00:09:28] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [00:16:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [00:54:18] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 600083.000000 [00:54:18] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [00:54:18] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 776920.000000 [00:54:19] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 85721.000000 [00:54:19] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [00:54:19] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [00:54:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [00:54:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [00:54:20] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1315673.000000 [00:54:20] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [00:54:30] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1592077.000000 [00:54:30] NTP on 
hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [00:54:30] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:30] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:38] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:38] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:38] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [00:54:48] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:54:48] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:48] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:54:48] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:54:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [00:54:48] SSH on mayapple is CRITICAL: Server answer: [00:54:48] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:54:49] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:54:50] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:54:50] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [00:54:51] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [00:55:21] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [00:55:21] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [00:55:21] Load avg. 
on thyme is UNKNOWN: NRPE: Unable to read output [00:55:21] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [00:55:21] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293355 MB (5% inode=64%): [00:55:21] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [00:55:21] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [00:55:26] RAID on thyme is UNKNOWN: NRPE: Unable to read output [00:55:26] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 185056.000000 [00:55:26] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [00:55:26] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:55:26] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [00:55:26] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2481647.000000 [00:55:49] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:55:49] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:49] / on thyme is UNKNOWN: NRPE: Unable to read output [00:55:49] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [00:55:49] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1430261.000000 [00:55:49] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:55:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:56:07] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [00:56:07] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [00:56:07] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[00:56:07] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:56:07] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [00:56:07] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:56:29] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:05:03] downside of piping my toolserver-l email through toolserver, I don't get updates of down mail [01:09:28] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [01:16:27] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [01:21:47] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54278 MB (8% inode=99%): [01:22:18] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [01:54:18] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 599236.000000 [01:54:18] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 49332.000000 [01:54:18] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 776532.000000 [01:54:18] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [01:54:19] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). 
[01:54:19] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [01:54:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [01:54:19] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1319272.000000 [01:54:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [01:54:20] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [01:54:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1590218.000000 [01:54:28] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [01:54:28] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:28] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:38] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:38] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:38] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [01:54:48] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:54:48] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:54:48] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:54:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [01:54:48] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:54:48] Load avg. 
on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:54:48] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:54:49] SSH on mayapple is CRITICAL: Server answer: [01:54:50] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:55:18] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [01:55:18] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [01:55:18] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [01:55:27] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:55:28] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:28] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:28] / on thyme is UNKNOWN: NRPE: Unable to read output [01:55:28] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [01:55:29] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1429119.000000 [01:55:29] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:38] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [01:55:38] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [01:55:48] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:55:48] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:55:48] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [01:55:48] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [01:55:48] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [01:55:58] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[01:56:08] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:56:18] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293355 MB (5% inode=64%): [01:56:18] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [01:56:18] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 188716.000000 [01:56:18] RAID on thyme is UNKNOWN: NRPE: Unable to read output [01:56:18] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [01:56:18] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:56:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [01:56:19] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:56:19] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [01:56:20] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2474601.000000 [02:01:47] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 52150.000000 [02:02:18] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 444787.000000 [02:02:19] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 150290.000000 [02:09:27] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). 
[02:16:27] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [02:21:47] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54278 MB (8% inode=99%): [02:22:18] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [02:54:18] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 599369.000000 [02:54:18] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 24983.000000 [02:54:18] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 775931.000000 [02:54:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1588236.000000 [02:54:28] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [02:54:28] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:54:28] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:54:37] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:54:37] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:54:38] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [02:54:47] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:54:47] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:54:47] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:54:47] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:54:47] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:54:48] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:54:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [02:54:49] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[02:54:49] SSH on mayapple is CRITICAL: Server answer: [02:55:18] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [02:55:18] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [02:55:19] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [02:55:19] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [02:55:19] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [02:55:19] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [02:55:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [02:55:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [02:55:20] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1322933.000000 [02:55:20] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [02:55:28] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[02:55:28] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:28] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:28] / on thyme is UNKNOWN: NRPE: Unable to read output [02:55:28] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [02:55:28] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1428507.000000 [02:55:29] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:38] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:55:38] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [02:55:38] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [02:55:48] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:55:48] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:55:48] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [02:55:48] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [02:55:48] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [02:55:58] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:56:07] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:56:18] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293355 MB (5% inode=64%): [02:56:18] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:56:18] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). 
[02:56:18] RAID on thyme is UNKNOWN: NRPE: Unable to read output [02:56:18] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 192316.000000 [02:56:18] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [02:56:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [02:56:19] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:56:20] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [02:56:20] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2466672.000000 [03:01:47] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 29185.000000 [03:02:18] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 445565.000000 [03:02:18] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 152316.000000 [03:09:28] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). 
[03:16:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [03:26:38] SMF on ptolemy is CRITICAL: ERROR - maintenance: svc:/network/ts/apache22:default [03:26:48] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54278 MB (8% inode=99%): [03:27:18] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [03:44:18] s4 replag on z-dat-s5-b is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3417.000000 [03:50:18] s4 replag on z-dat-s5-b is OK: QUERY OK: SELECT ts_rc_age() returned 1636.000000 [03:54:18] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 598746.000000 [03:54:18] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 775247.000000 [03:54:27] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1587498.000000 [03:54:28] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:29] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [03:54:29] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:37] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:38] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:38] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [03:54:48] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:54:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [03:54:48] SSH on mayapple is CRITICAL: Server answer: [03:54:48] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:54:48] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:54:48] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[03:54:49] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:54:49] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:54:50] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:55:18] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [03:55:18] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [03:55:18] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [03:55:18] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [03:55:18] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [03:55:18] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [03:55:19] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [03:55:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [03:55:20] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1326533.000000 [03:55:20] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [03:55:28] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[03:55:28] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:28] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:28] / on thyme is UNKNOWN: NRPE: Unable to read output [03:55:28] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [03:55:28] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1426795.000000 [03:55:29] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:37] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [03:55:47] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [03:55:47] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [03:55:48] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:55:48] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:55:48] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [03:55:48] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [03:55:57] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:56:07] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:56:18] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293355 MB (5% inode=64%): [03:56:19] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:56:19] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). 
[03:56:19] RAID on thyme is UNKNOWN: NRPE: Unable to read output [03:56:19] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [03:56:19] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 195916.000000 [03:56:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [03:56:20] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:56:21] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2462211.000000 [03:56:21] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [04:01:48] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3746.000000 [04:02:18] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 154734.000000 [04:02:18] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 445269.000000 [04:02:47] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3399.000000 [04:07:28] Free Memory on damiana is WARNING: WARNING - 5.8% (488352 kB) free! [04:08:48] s4 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1224.000000 [04:09:28] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [04:11:28] Free Memory on damiana is CRITICAL: CRITICAL - 4.9% (406696 kB) free! 
[04:16:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [04:26:38] SMF on ptolemy is CRITICAL: ERROR - maintenance: svc:/network/ts/apache22:default [04:26:48] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54278 MB (8% inode=99%): [04:27:18] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [04:45:48] /sql on z-dat-s1-b is WARNING: DISK WARNING - free space: /sql 81973 MB (8% inode=99%): [04:54:18] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 597417.000000 [04:54:18] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 774348.000000 [04:54:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1586771.000000 [04:54:28] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [04:54:28] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:28] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:38] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:38] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:38] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [04:54:48] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:48] SSH on mayapple is CRITICAL: Server answer: [04:54:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [04:54:48] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:54:48] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:54:48] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:54:49] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[04:54:49] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:54:50] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:18] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [04:55:18] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [04:55:18] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [04:55:18] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [04:55:18] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [04:55:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [04:55:19] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [04:55:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [04:55:20] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1330132.000000 [04:55:20] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [04:55:27] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[04:55:28] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:28] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:28] / on thyme is UNKNOWN: NRPE: Unable to read output [04:55:28] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [04:55:28] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1425408.000000 [04:55:28] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:37] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [04:55:47] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [04:55:47] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:48] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:48] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [04:55:48] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [04:55:57] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:56:08] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:56:18] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293355 MB (5% inode=64%): [04:56:38] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [04:57:18] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:57:18] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). 
[04:57:18] RAID on thyme is UNKNOWN: NRPE: Unable to read output [04:57:18] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [04:57:18] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 199576.000000 [04:57:18] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [04:57:19] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:57:19] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2460992.000000 [04:57:20] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [04:58:18] any idea why toolserver is suddenly refusing my ssh key? [04:58:25] ah [04:58:26] I see topic [04:58:28] nevermind [05:09:27] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [05:11:27] Free Memory on damiana is CRITICAL: CRITICAL - 1.6% (137948 kB) free! [05:12:18] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 444696.000000 [05:12:18] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 157197.000000 [05:16:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:26:38] SMF on ptolemy is CRITICAL: ERROR - maintenance: svc:/network/ts/apache22:default [05:26:47] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54278 MB (8% inode=99%): [05:27:18] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [05:49:48] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [05:54:18] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 596238.000000 [05:54:18] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 773504.000000 [05:54:28] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1586499.000000 [05:54:28] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 
10.358761 secs [05:54:28] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:28] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:38] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:38] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:38] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [05:54:48] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:54:48] SSH on mayapple is CRITICAL: Server answer: [05:54:48] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:54:48] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:54:48] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:54:48] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:54:48] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:54:49] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:54:57] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [05:55:17] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [05:55:18] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [05:55:19] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [05:55:19] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). 
[05:55:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [05:55:19] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [05:55:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [05:55:20] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [05:55:20] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1333733.000000 [05:55:28] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:55:28] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [05:55:28] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:28] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:28] / on thyme is UNKNOWN: NRPE: Unable to read output [05:55:28] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [05:55:29] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1423965.000000 [05:55:29] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:38] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:55:38] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [05:55:48] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [05:55:48] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [05:55:48] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [05:55:48] / on mayapple is CRITICAL: 
CHECK_NRPE: Socket timeout after 30 seconds. [05:55:48] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:55:58] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:56:07] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:56:18] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293355 MB (5% inode=64%): [05:56:37] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [05:57:18] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [05:57:18] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [05:57:18] RAID on thyme is UNKNOWN: NRPE: Unable to read output [05:57:18] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [05:57:19] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 203177.000000 [05:57:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [05:57:19] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:57:19] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2458366.000000 [05:57:28] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [06:09:05] Anyone have a machine handy that can mirror Magnus' reference generator during the Toolserver's downtime? [06:09:28] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [06:11:28] Free Memory on damiana is CRITICAL: CRITICAL - 1.6% (136932 kB) free! [06:12:18] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 443856.000000 [06:12:18] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 159318.000000 [06:14:17] [[Category:Edit counters]] ! 
https://wiki.toolserver.org/w/index.php?diff=8006&oldid=7883&rcid=21908 * 217.76.79.57 * (+52) () [06:16:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [06:22:46] CMBJ1: Does it need access to the DB replicas? [06:23:15] Duh, of course not if it can be hosted elsewhere. :-) [06:24:20] Magnus has already gotten flickr2commons running on the Tool Labs, I'm sure he could bring his reference generator over too. [06:26:37] SMF on ptolemy is CRITICAL: ERROR - maintenance: svc:/network/ts/apache22:default [06:26:47] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54278 MB (8% inode=99%): [06:27:18] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [06:30:37] Sun Grid Engine execd on ortelius is CRITICAL: (Service Check Timed Out) [06:30:49] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:59:29] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [06:59:30] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:01:09] Shubinator * [Toolserver-l] Can't login to willow [07:01:09] Casey Brown * Re: [Toolserver-l] Can't login to willow [07:01:09] Hersfold * Re: [Toolserver-l] Can't login to willow [07:01:09] John * Re: [Toolserver-l] Can't login to willow [07:01:09] DaB. * Re: [Toolserver-announce] [Toolserver-l] Can't login to willow [07:01:09] Alex Brollo * Re: [Toolserver-l] Can't login to willow [07:03:48] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [07:03:49] Sun Grid Engine execd on yarrow is UNKNOWN: Execution timeout exceeded [07:03:58] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:03:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:03:59] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[07:03:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:04:58] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:04:59] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:09:08] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:09:48] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:09:58] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:09:59] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:09:59] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:10:08] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:10:09] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:10:09] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:10:09] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:10:09] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:10:58] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:41:39] s4 replag on z-dat-s5-b is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2177.000000 [07:58:29] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 600708.000000 [07:58:30] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [07:58:30] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293349 MB (5% inode=64%): [07:58:30] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [07:58:30] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). 
[07:58:30] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [07:58:30] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [07:58:31] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [07:58:32] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54277 MB (8% inode=99%): [07:58:32] / on thyme is UNKNOWN: NRPE: Unable to read output [07:58:32] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [07:58:33] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 778628.000000 [07:58:33] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [07:58:34] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [07:58:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:49] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:49] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2461571.000000 [07:58:49] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:49] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [07:58:50] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:50] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:58:50] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[07:58:51] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:59] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [07:58:59] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1427712.000000 [07:58:59] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:59] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:58:59] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:09] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:59:10] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[07:59:10] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:59:11] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [07:59:12] SSH on mayapple is CRITICAL: Server answer: [07:59:12] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [07:59:29] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [07:59:29] RAID on thyme is UNKNOWN: NRPE: Unable to read output [07:59:29] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [07:59:29] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [07:59:29] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:59:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1341196.000000 [07:59:39] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [08:03:59] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:03:59] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:04:09] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:04:09] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:04:58] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:05:08] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[08:05:39] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3618.000000 [08:08:38] Sun Grid Engine execd on willow is UNKNOWN: Cannot execute /sge/GE/bin/sol-amd64/qstat [08:09:09] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:49] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:09:58] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:08] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:08] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:08] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:08] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:08] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:09] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:09] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:10:59] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:15:29] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:15:38] SSH on willow is CRITICAL: Server answer: [08:15:39] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:15:39] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:15:39] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:21:39] SMTP on willow is CRITICAL: Connection refused [09:16:25] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[09:16:30] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:16:30] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:16:30] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:16:32] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:21:39] SMTP on willow is CRITICAL: Connection refused [09:58:29] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 607908.000000 [09:58:29] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [09:58:29] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293349 MB (5% inode=64%): [09:58:39] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [09:58:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [09:58:39] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). 
[09:58:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1599360.000000 [09:58:39] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [09:58:39] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 217658.000000 [09:58:48] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2468772.000000 [09:58:48] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:48] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:49] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [09:58:49] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:49] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:58:49] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:58:50] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [09:58:58] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1434912.000000 [09:58:58] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:59] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:58:59] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:08] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:59:09] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:59:09] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:59:10] SSH on mayapple is CRITICAL: Server answer: [09:59:28] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [09:59:30] RAID on thyme is UNKNOWN: NRPE: Unable to read output [09:59:30] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [09:59:30] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [09:59:30] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [09:59:30] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [09:59:30] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). 
[09:59:31] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [09:59:31] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:59:32] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [09:59:32] / on thyme is UNKNOWN: NRPE: Unable to read output [09:59:33] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 785888.000000 [09:59:33] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [09:59:34] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [09:59:34] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [09:59:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1348397.000000 [09:59:39] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [10:00:09] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [10:00:09] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [10:03:58] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:04:08] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:04:08] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:04:59] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:05:09] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:05:39] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10818.000000 [10:08:39] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[10:09:08] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:09:48] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:09:59] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:09] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:09] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:09] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:09] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:09] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:10] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:10] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:10:58] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:15:29] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:15:39] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:15:39] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:15:39] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:16:38] SSH on willow is CRITICAL: Server answer: [10:21:38] SMTP on willow is CRITICAL: Connection refused [10:29:08] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [10:58:29] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 611508.000000 [10:58:29] Load avg. 
on thyme is UNKNOWN: NRPE: Unable to read output [10:58:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [10:58:39] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [10:58:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1602960.000000 [10:58:39] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 221258.000000 [10:58:39] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [10:58:49] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2472372.000000 [10:58:49] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:49] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:50] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:50] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [10:58:51] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:58:51] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:58:51] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:59] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [10:58:59] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1438512.000000 [10:58:59] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:58:59] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:09] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:09] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:09] Load avg. 
on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:09] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:09] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:09] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:10] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:10] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:11] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:59:12] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:59:12] SSH on mayapple is CRITICAL: Server answer: [10:59:29] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [10:59:29] RAID on thyme is UNKNOWN: NRPE: Unable to read output [10:59:29] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [10:59:29] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [10:59:29] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293349 MB (5% inode=64%): [10:59:30] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [10:59:30] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:59:30] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [10:59:31] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [10:59:31] / on thyme is UNKNOWN: NRPE: Unable to read output [10:59:32] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 
critical updates). [10:59:32] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 789488.000000 [11:00:08] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [11:00:08] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [11:04:09] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:04:09] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:04:09] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:04:58] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:05:08] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:05:38] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14418.000000 [11:09:08] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:09:39] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:09:49] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:10:09] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:09] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:09] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:09] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:09] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:09] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:10] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:10:10] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:11:09] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:15:39] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:15:39] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:15:39] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:15:39] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:16:49] SSH on willow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:21:39] SMTP on willow is CRITICAL: Connection refused [11:29:09] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [11:58:29] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 615110.000000 [11:58:29] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [11:58:39] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). 
[11:58:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [11:58:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1606560.000000 [11:58:39] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 224858.000000 [11:58:39] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [11:58:49] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2475971.000000 [11:58:49] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:49] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:49] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:50] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [11:58:50] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:58:51] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:58:51] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [11:58:58] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1442112.000000 [11:58:58] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:58:58] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:08] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:08] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:08] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:09] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:59:09] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:09] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:09] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:10] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:10] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:59:10] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:59:11] SSH on mayapple is CRITICAL: Server answer: [11:59:28] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [11:59:29] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [11:59:29] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [11:59:30] RAID on thyme is UNKNOWN: NRPE: Unable to read output [11:59:38] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1355597.000000 [11:59:39] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [12:00:09] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [12:00:09] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [12:00:29] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293349 MB (5% inode=64%): [12:00:29] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [12:00:29] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:00:29] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). 
[12:00:29] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [12:00:29] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [12:00:30] / on thyme is UNKNOWN: NRPE: Unable to read output [12:00:31] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [12:00:31] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 793148.000000 [12:00:31] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [12:00:32] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [12:00:32] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [12:04:09] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:04:09] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:04:09] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:04:59] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:05:09] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:05:39] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18018.000000 [12:09:09] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:09:38] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[12:09:48] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:10:09] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:09] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:09] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:09] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:09] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:09] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:10] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:10:10] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:11:09] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:15:35] argh, 504. [12:15:39] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:15:39] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:15:39] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:15:39] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:17:39] SSH on willow is CRITICAL: Server answer: [12:21:39] SMTP on willow is CRITICAL: Connection refused [12:29:08] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [12:58:29] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 618708.000000 [12:58:29] Load avg. 
on thyme is UNKNOWN: NRPE: Unable to read output [12:58:38] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [12:58:38] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1610159.000000 [12:58:39] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 228458.000000 [12:58:39] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [12:58:48] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2479571.000000 [12:58:49] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:49] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:49] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [12:58:50] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:50] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:58:50] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:58:50] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [12:58:59] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1445712.000000 [12:58:59] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:59] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:09] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:09] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:09] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:59:09] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:09] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:09] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:10] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:10] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:11] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:59:11] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:59:12] SSH on mayapple is CRITICAL: Server answer: [12:59:29] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [12:59:29] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [12:59:29] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [12:59:29] RAID on thyme is UNKNOWN: NRPE: Unable to read output [12:59:30] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [12:59:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1359196.000000 [12:59:39] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [13:00:08] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [13:00:08] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [13:00:29] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293349 MB (5% inode=64%): [13:00:30] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [13:00:30] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). 
[13:00:30] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:00:30] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [13:00:30] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [13:00:30] / on thyme is UNKNOWN: NRPE: Unable to read output [13:00:30] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 796748.000000 [13:00:31] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [13:00:32] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [13:00:32] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [13:00:32] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [13:00:38] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [13:04:08] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:04:09] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:04:09] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:04:59] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:05:09] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[13:05:38] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21618.000000 [13:09:08] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:09:49] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:10:09] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:09] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:09] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:09] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:09] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:10] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:10] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:11] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:10:38] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:11:09] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:15:39] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:15:39] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:15:39] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:15:39] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[13:17:39] SSH on willow is CRITICAL: Server answer: [13:20:48] SSH on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:21:39] SMTP on willow is CRITICAL: Connection refused [13:29:08] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [13:34:56] hello all [13:44:27] hi DaBPunkt [13:44:52] * valhallasw hands DaBPunkt a brownie [13:45:07] thanks. [13:45:42] but you should use it to lure nosy because the TS needs her more than me at the moment ;) [13:50:39] DaBPunkt: If you have a minute, can you summarize what the problem is? [13:51:32] Could this outage have been avoided by better hardware or more manpower? [13:51:51] I.e., is this a technical or a management problem? [13:56:03] at the very moment it is a management problem because Nosy didn't document what she did and so I can not fix it. Before it was a technical problem because, AFAIU Nosy, something with the solaris-updates went wrong. [13:57:47] the problem is that the solaris nfs-daemon prints only one very short (and unusable) message, so I can not find the reason it doesn't accept connections anymore [13:58:03] Ok, so there was no missing "something" that led to the problem in the first place? [13:58:29] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 622308.000000 [13:58:29] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [13:58:39] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). 
[13:58:39] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1613759.000000 [13:58:39] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [13:58:39] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 232058.000000 [13:58:48] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2483171.000000 [13:58:49] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [13:58:49] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:58:51] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:58:51] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [13:58:58] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1449312.000000 [13:58:58] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:58] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:08] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:08] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:08] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:08] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[13:59:08] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:09] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:09] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:10] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:10] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:59:10] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:59:11] SSH on mayapple is CRITICAL: Server answer: [13:59:28] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [13:59:29] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [13:59:30] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [13:59:30] RAID on thyme is UNKNOWN: NRPE: Unable to read output [13:59:30] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [13:59:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1362797.000000 [13:59:39] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [14:00:09] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [14:00:09] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [14:00:27] krd: nosy reported problems with an ethernet-card. But we are not sure if it is related or if the card is defective at all. But it is not a problem of too few servers this time (in theory, if we had had a testing server, nosy could have tested there first, but that's a little bit extreme in my eyes) [14:00:29] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293349 MB (5% inode=64%): [14:00:29] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [14:00:29] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). 
[14:00:29] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:00:29] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [14:00:29] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [14:00:30] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [14:00:30] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [14:00:46] I will stop this annoying nagios… [14:01:21] DaBPunkt: I agree that this is bad luck then. [14:02:52] I will now reboot the remaining node. Maybe I can switch to a pre-update-state [14:03:24] As you said "missing docu", I tried to use the situation to migrate something to the WMF Labs stuff, but, as you might expect, gave up after a few hours of unsuccessful search for some Howto. [14:04:10] What I found out at least, is that the database replication at labs is expected to work in the near future, Feb 2013. [14:04:34] Maybe I'm too stupid. [14:04:57] Anyway getting a bit angry with the whole situation, to be honest. [14:05:07] The last date I heard was end of April – but AFAIK it still doesn't work [14:05:27] I passed the state of "angry" [14:06:18] I'm somewhere in between disappointed and resigned [14:07:31] We're doing a sprint this week to finish DB replication, FYI. [14:07:45] krd, try https://wikitech.wikimedia.org/wiki/User:Magnus_Manske/Migrating_from_toolserver [14:07:51] We're almost there, actually. [14:09:48] valhallasw: I almost made it to the shell access, but didn't know if I'm in the right place to put my scripts, and how to do database queries, which is the only thing I need at all. [14:11:30] Coren: Which page shall someone watch to notice the DB availability? [14:12:41] krd: http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/TODO is probably going to be the first updated. [14:12:52] thank you. 
[14:14:45] DaBPunkt: What happens to toolserver emails that are normally forwarded by ~/.forward? Are they getting queued at the TS? [14:16:08] most likely they will stack in the mail-server [14:16:46] Ok. [14:39:22] SMTP on willow is CRITICAL: Connection refused [14:39:22] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [14:39:22] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [14:39:31] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293343 MB (5% inode=64%): [14:39:32] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [14:39:32] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [14:39:33] SSH on willow is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:39:33] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [14:39:33] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:39:33] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:42] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:42] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:39:42] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:39:52] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:39:52] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:39:52] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:40:02] Load avg. on willow is WARNING: WARNING - load average: 28.03, 13.09, 5.04 [14:41:01] Load avg. on willow is OK: OK - load average: 11.93, 11.20, 4.88 [14:41:22] SMTP on willow is OK: SMTP OK - 0.058 sec. 
response time [14:42:10] Got a message from nosy: she will be online in ~2h [14:43:22] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:44:02] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:44:02] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:44:21] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:44:31] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:44:41] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1848.000000 [14:45:12] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:45:42] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1697.000000 [14:46:11] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [14:46:21] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 23348.000000 [14:46:22] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 179618.000000 [14:46:22] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 472781.000000 [14:49:12] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:51] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:52] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:50:12] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:50:12] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:50:22] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:50:32] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[14:51:01] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:51:02] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:51:11] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:56:09] Is the nagios missing some service dependencies regarding NRPE? [15:38:32] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 627739.000000 [15:38:32] APT on yucca is WARNING: APT WARNING: 39 packages available for upgrade (0 critical updates). [15:38:33] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:38:33] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [15:38:33] s4 replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25121.000000 [15:38:33] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 805847.000000 [15:38:42] NTP on ptolemy is CRITICAL: NTP CRITICAL: Offset 11.505836 secs [15:38:42] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc: [15:38:42] RAID on thyme is UNKNOWN: NRPE: Unable to read output [15:38:52] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 2488601.000000 [15:38:52] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1619387.000000 [15:38:52] SMTP on sage is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:38:52] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[15:38:52] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:02] / on thyme is UNKNOWN: NRPE: Unable to read output [15:39:02] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on thyme (146) [15:39:02] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1454508.000000 [15:39:02] NTP on amaranth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:02] NTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:02] APT on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:12] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:12] Environment IPMI on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:12] /tmp on thyme is UNKNOWN: NRPE: Unable to read output [15:39:12] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [15:39:12] Load avg. on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:13] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:13] Load avg. on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:22] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [15:39:22] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [15:39:22] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [15:39:32] FMA on thyme is CRITICAL: ERROR - unexpected output from snmpwalk [15:39:32] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [15:39:32] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [15:39:32] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). 
[15:39:32] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:41] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:41] SRaid on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:41] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:42] MySQL slave on z-dat-s1-b is CRITICAL: (Return code of 139 is out of bounds) [15:39:43] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 238121.000000 [15:39:43] FMA on amaranth is CRITICAL: ERROR - unexpected output from snmpwalk [15:39:43] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [15:39:43] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1368799.000000 [15:39:43] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:51] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:52] SSH on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:39:52] NTP on hyacinth is CRITICAL: NTP CRITICAL: Offset 10.358761 secs [15:39:52] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:39:52] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[15:39:52] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:01] SMTP on z-dat-s1-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:40:02] SMF on ptolemy is CRITICAL: ERROR - maintenance: svc:/network/ts/apache22:default [15:40:02] /mnt on thyme is UNKNOWN: NRPE: Unable to read output [15:40:12] SSH on mayapple is CRITICAL: Server answer: [15:40:12] NTP on rosemary is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [15:40:12] /sql on ptolemy is CRITICAL: DISK CRITICAL - free space: /sql 54258 MB (8% inode=99%): [15:40:12] Environment IPMI on thyme is UNKNOWN: NRPE: Unable to read output [15:40:12] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [15:40:22] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293341 MB (5% inode=64%): [15:40:32] SSH on willow is CRITICAL: Server answer: [15:40:32] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:43:21] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:44:02] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:44:22] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:44:32] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:44:42] / on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:44:42] /tmp on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:44:51] Environment IPMI on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:45:01] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[15:45:11] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:46:12] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [15:46:22] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 26948.000000 [15:46:22] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 476382.000000 [15:46:22] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 182114.000000 [15:49:12] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:49:52] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:49:52] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:50:11] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:50:12] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:50:21] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:50:31] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:51:02] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:51:02] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:51:12] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:28:30] DaBPunkt: Any idea about how long will these problems last? [17:40:20] jem-: waiting on nosy [17:46:54] Thanks, Betacommand, I was just wondering if it was worth a reconfiguration for another server which could take a while [17:47:58] I saw DaBPunkt said 3 h ago that 2 h [17:48:16] ... Let's wait for the moment [17:54:11] jem-, I think the mailing list post said evening CEST [17:55:35] I'm still reviewing mails :) [17:56:20] Is that more recent that DaBPunkt's message here? 
[17:56:23] than* [18:14:28] ts is back with a nfs workaround [18:15:55] Thanks a lot from all the users, nosy :) [18:16:56] what about the database lag? [18:19:23] hrhrhr [18:19:26] ill look [18:19:27] yarrow still asks for a password, nightshade seems to be fine. [18:20:24] scfc_de: yarrow is back [18:21:17] nosy: Yes. [18:23:39] jesus...wikipedia.de here with error 502 bad gateway too... [18:23:43] wasnt me... [18:39:05] nosy: Lookin' good, thanks :) [18:39:31] nosy: don't forget to document the workaround ;-) [18:39:51] Yep, but SGE still seems to be down. "qstat": "error: commlib error: got select error (Connection refused)", "error: unable to send message to qmaster using port 536 on host "damiana": got send error". [18:41:06] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 293163 MB (5% inode=64%): [18:41:06] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [18:41:06] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [18:41:06] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:16] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:26] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:41:26] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:41:36] Sun Grid Engine execd on yarrow is OK: Host and Queues Ok [18:41:36] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.009 second response time [18:42:16] Sun Grid Engine execd on nightshade is OK: Host and Queues Ok [18:43:46] Load avg. on nightshade is WARNING: WARNING - load average: 1.01, 1.13, 19.89 [18:45:42] nosy: DB replication is still down, isn't it? [18:46:08] krd: i guess s2, s5 and one s1 instance are with high replag [18:46:18] as wikidata [18:46:35] krd: more replags? [18:46:43] i look for dewiki, which should be at s5. 
[18:46:50] right [18:46:57] ok [18:53:33] SGE is up. [18:55:02] Are medium-lx@nightshade, medium-lx@yarrow and longrun-lx@nightshade still disabled on purpose, or can they be reenabled? [19:02:15] scfc_de: re-enable them [19:05:17] DaBPunkt: Done. [19:05:24] ok, tnx [20:23:43] NFS server ha-nfs.esi not responding still trying [22:03:24] Now SGE is down again. [22:29:19] And down we go again [22:36:18] jem-: let me see [22:43:31] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 199203.000000 [22:43:31] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 263739 MB (4% inode=64%): [22:43:38] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [22:43:38] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [22:43:38] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:43:38] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:43:46] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:43:57] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:43:57] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:45:37] Load avg. on nightshade is OK: OK - load average: 1.18, 11.33, 14.53 [22:46:17] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:46:17] NTP on adenia is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [22:46:27] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:46:27] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:46:37] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:47:07] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[22:47:07] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:48:37] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 489921.000000 [22:48:38] MySQL on z-dat-s1-b is CRITICAL: Cant connect to MySQL server on z-dat-s1-b (146) [22:48:46] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 17087.000000 [22:49:17] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [22:51:57] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:52:16] /tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:52:17] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:52:17] /var on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:52:17] aliasd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:52:27] /var/tmp on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:52:37] Environment IPMI on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:52:47] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/12:DOWN: 1 int NOK : CRITICAL [22:53:07] Sensors on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:53:07] /home on hemlock is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:53:07] / on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [22:53:07] Load avg. on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:02:21] /home on hemlock is OK: DISK OK - free space: /home 13418 MB (26% inode=81%): [23:02:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [23:02:29] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.010 second response time [23:02:49] looks like I fixed it [23:02:54] Toolserver is broken again. 
[23:03:01] is it sw or hw issue? [23:03:20] Nevermind [23:03:35] Danny_B|webgate: stupid solaris [23:03:59] move to linux? [23:04:04] anyway, I have to leave for bed. I have to rise in 8h [23:04:06] wasn't that actually the plan [23:04:18] after river left? [23:04:41] Danny_B|webgate: yes [23:05:02] i guess reinstall to debian seems easier and less painful than continuous dealing with nfs issues [23:05:28] DaBPunkt, is there a way to patch a PHP debugger into toolserver and utilize its PHP engine? [23:05:45] Danny_B|webgate: You can screw up Debian as well :-). [23:06:03] scfc_de: the deal is there is nobody familiar with solaris atm [23:06:04] DaBPunkt, or is there a way to externally connect to the databases. [23:06:07] Danny_B|webgate: the problem is that we have to convert some data (like LDAP or DNS). [23:06:19] Cyberpower678: yes, with an SSH-tunnel [23:06:22] while roots know linux much better [23:06:52] Yea I thought so. Will that allow access into the databases as if it were local? [23:06:59] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:07:18] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:07:19] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:07:27] DaBPunkt: it still seems better than dealing with issues every single day like recently... ts has been unstable for at least two months, which is definitely worse than moving to different platform [23:07:28] APT on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:07:47] Cyberpower678: "ssh -L 1234:daphne:3306 yarrow.toolserver.org" will open a local mysqlport to daphne at port 1234 [23:07:59] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:07:59] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:08:09] Danny_B|webgate: I'm not opposed to moving, on the contrary. But setting it up isn't a matter of minutes.
[23:08:20] sure [23:08:48] but still less time than it has been already spent (and obviously still will be) with fixing current issues [23:09:01] (besides it was the plan anyway) [23:09:48] nacht ts [23:09:59] what i would suggest is, just simply turn off the ts for couple days, reinstall it properly and then make it accessible for ppl again [23:10:51] people will be more comfortable with few days of outage and then stability than with instability every hour [23:11:14] What's the tunnel listening address? [23:12:28] aliasd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:12:46] Danny_B|webgate: Yeah, but moving off Solaris and other odd choices (like the ZWS webserver) has been on the table for years now, and instead of for example pursuing the Apache experiment setup further, it was called off. Perhaps the new root will bring some new momentum, but until then I'm skeptical. [23:12:59] SRaid on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:03] Cyberpower678: You mean on your local host? Port 1234. [23:13:09] / on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:09] /tmp on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:09] Sensors on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:19] /var on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:29] /var/tmp on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:39] Environment IPMI on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:13:39] Load avg. on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:14:40] scfc_de, I'm trying to patch my debugger into toolserver to access the databases with. Can someone help me through this? [23:14:49] I'm doing this for the first time.
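Note on the tunnel question above, as a minimal sketch rather than an official Toolserver recipe: by default "ssh -L" binds the forwarded port on the local loopback interface only, so the tunnel quoted in the log is reachable at 127.0.0.1:1234 on the machine where ssh was started. The helper name tunnel_cmd is hypothetical; the host names and ports are the ones quoted in the log. It only prints the command, so nothing connects until you run the printed line yourself.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to MySQL (port 3306) on a database host, going through a login host.
# -N asks ssh not to run a remote command (tunnel only).
tunnel_cmd() {
  # $1 = local port, $2 = database host, $3 = login host
  printf 'ssh -N -L %s:%s:3306 %s\n' "$1" "$2" "$3"
}

# Values taken from the log: local port 1234, daphne, yarrow.toolserver.org
tunnel_cmd 1234 daphne yarrow.toolserver.org
```

With the tunnel up, a client would connect to host 127.0.0.1 on port 1234. With the MySQL command-line client, "-h localhost" normally selects the Unix socket instead of TCP, so the numeric address is the safer choice.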
[23:15:14] scfc_de: i wouldn't say it's a question of (new) root [23:15:34] but rather about the decision to make the radical cut [23:16:01] which would require having ts off for couple days [23:16:04] Cyberpower678: I haven't used the PHP debugger even on my local machine, so: Not me :-). But is this debugging feature enabled on Toolserver? I'm not sure. [23:16:31] maybe we could start the discussion on the list? [23:16:48] / on yarrow is OK: DISK OK - free space: / 1582 MB (88% inode=94%): [23:16:49] /tmp on yarrow is OK: DISK OK - free space: /tmp 4085 MB (96% inode=99%): [23:16:49] /var on yarrow is OK: DISK OK - free space: /var 11690 MB (87% inode=96%): [23:16:49] SRaid on yarrow is OK: OK md0 status=[UU]. [23:16:50] Sun Grid Engine execd on yarrow is UNKNOWN: Cannot execute /sge/GE/bin/linux-x64/qhost [23:16:50] Sensors on yarrow is OK: sensor ok [23:16:50] Sun Grid Engine execd on nightshade is UNKNOWN: Cannot execute /sge/GE/bin/linux-x64/qhost [23:16:50] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [23:16:58] /var/tmp on yarrow is OK: DISK OK - free space: /var/tmp 827 MB (97% inode=99%): [23:16:59] aliasd on yarrow is OK: TCP OK - 0.005 second response time on port 984 [200 wp@dabpunkt.eu] [23:16:59] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [23:17:08] Screw it. [23:17:08] Environment IPMI on yarrow is OK: ok: temperature ok fan ok voltage ok chassis ok [23:17:09] Sun Grid Engine execd on ortelius is UNKNOWN: Cannot execute /sge/GE/bin/sol-amd64/qstat [23:17:09] Load avg. on yarrow is WARNING: WARNING - load average: 17.51, 19.52, 14.37 [23:17:28] I'm just going to do a trial and error debug directly on toolserver. [23:17:29] Sun Grid Engine execd on wolfsbane is UNKNOWN: Cannot execute /sge/GE/bin/sol-amd64/qstat [23:19:08] Load avg. on yarrow is OK: OK - load average: 4.09, 13.72, 12.86 [23:20:16] Toolserver's broken again. -.-
[23:20:23] Danny_B|webgate: If you look at JIRA, there are trivial issues with ready fixes that have been open for over a year with no admin action. If we don't have energy for these low-hanging fruits, where do we get someone who ports all the services that reside on Solaris today (https://wiki.toolserver.org/view/Admin:HA_cluster just for the HA stuff) to Debian in a week, documents it all properly and hands us a working system in seven day [23:20:23] I don't see that happen. [23:20:39] s4 replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3629.000000 [23:21:29] APT on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:21:39] s4 replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3551.000000 [23:27:09] MySQL slave on daphne is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2436 [23:28:40] s4 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 1614.000000 [23:29:09] MySQL slave on daphne is OK: Uptime: 2086953 Threads: 17 Questions: 249759712 Slow queries: 437294 Opens: 81765 Flush tables: 1 Open tables: 1925 Queries per second avg: 119.676 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1345 [23:29:58] My user account has disappeared or something? SSH is rejecting my account, and I'm getting tons of cron e-mails with weird errors about my home directory not existing [23:30:27] also lots of "! could not obtain latest contract for PID (some number): No such process (some date)" [23:30:59] Krinkle: We are still down. [23:31:03] (Somewhat.) [23:32:23] scfc_de: ok [23:32:55] scfc_de: Is it possible to perhaps have stuff just be killed entirely instead of limping? [23:33:01] e.g. disable cron temporarily [23:33:24] I mean, there isn't much that wouldn't go wrong with the file system absent, right?
[23:33:29] MySQL slave on z-dat-s2-b is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3101 [23:35:17] Krinkle: If you get into submit.toolserver.org, you could backup "crontab -l" and then erase it with "crontab -r". [23:35:28] MySQL slave on z-dat-s2-b is OK: Uptime: 18199 Threads: 10 Questions: 15225437 Slow queries: 152 Opens: 253790 Flush tables: 1 Open tables: 256 Queries per second avg: 836.608 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1465 [23:38:09] scfc_de: submit.ts.o is querying me for a password when I try to access. [23:38:14] I presume it has lost NFS as well? [23:40:23] Krinkle: Everything's either down, or everything's online. I should have said "*When* you get into ..." :-). [23:40:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [23:40:29] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [23:40:49] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:40:49] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [23:40:59] APT on yarrow is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). [23:41:02] scfc_de: When I get into it, it will likely no longer be needed :) I'll wait it out and meanwhile set up a filter in gmail to ignore these [23:41:15] I think the problem with the password is that the LDAP server is down at those times, the NFS problem kicks in later. [23:41:22] I assume an announcement or toolserver-l reply will be made when I can un-ignore anything toolserver related. [23:41:39] If you get into it, it doesn't mean it will stay this way for long :-). [23:42:39] SMF on web.amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:42:39] Sensors on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:42:59] /tmp on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[23:42:59] aliasd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:43:09] APT on sage is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [23:43:09] APT on z-dat-s2-b is WARNING: APT WARNING: 34 packages available for upgrade (0 critical updates). [23:49:18] Load avg. on thyme is UNKNOWN: NRPE: Unable to read output [23:49:18] APT on z-dat-s1-b is WARNING: APT WARNING: 35 packages available for upgrade (0 critical updates). [23:49:18] SMF on amaranth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:49:18] MySQL on z-dat-s1-b is CRITICAL: Cant connect to MySQL server on z-dat-s1-b (146) [23:49:18] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:49:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20729.000000 [23:49:28] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:49:38] / on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:49:38] Sun Grid Engine execd on mayapple is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:49:58] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [23:53:08] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:53:08] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:53:28] Sun Grid Engine execd on wolfsbane is UNKNOWN: Cannot execute /sge/GE/bin/sol-amd64/qstat [23:53:38] ethernet 0/1/12 [csw1-esams:1/24] on asw-oe10-esams.mgmt is UNKNOWN: ERROR: Description table : No response from remote host asw-oe10-esams.mgmt. [23:53:38] Sun Grid Engine execd on ortelius is UNKNOWN: Cannot execute /sge/GE/bin/sol-amd64/qstat [23:53:38] APT on nightshade is WARNING: APT WARNING: 67 packages available for upgrade (0 critical updates). 
[23:54:18] Sun Grid Engine execd on willow is UNKNOWN: Cannot execute /sge/GE/bin/sol-amd64/qstat [23:54:18] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [23:54:28] Sun Grid Engine execd on nightshade is UNKNOWN: Cannot execute /sge/GE/bin/linux-x64/qhost [23:54:48] Sun Grid Engine execd on yarrow is UNKNOWN: Cannot execute /sge/GE/bin/linux-x64/qhost [23:57:58] Sun Grid Engine execd on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:08] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:08] APT on nightshade is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:18] Sun Grid Engine execd on yarrow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:48] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:48] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [23:58:58] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:58:58] Sun Grid Engine execd on yarrow is UNKNOWN: Execution timeout exceeded [23:59:18] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [23:59:18] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [23:59:27] Sun Grid Engine execd on nightshade is UNKNOWN: Error with qhost: error: commlib error: got select error (Connection refused) [23:59:28] aliasd on nightshade is CRITICAL: Connection refused [23:59:48] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.005 second response time