[00:03:41] QQ [00:05:55] QQ?? [00:06:29] Could not connect to database host "sql-s2-rr.toolserver.org". [00:06:29] Exception: SQLSTATE[HY000] [2013] Lost connection to MySQL server at 'reading initial communication packet', system error: 0 [00:07:11] MySQL on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [00:08:29] DerHexer: Same thing here... [00:08:38] And none of the roots are on :/ [00:08:56] http://status.toolserver.org/ lies to us :P [00:11:51] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7734.000000 [00:12:01] great [00:14:19] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [00:15:51] DerHexer: http://nagios.toolserver.org/cgi-bin/status.cgi?host=daphne :P [00:34:08] err [00:34:54] i cant connect either [00:35:39] it's back [00:36:15] but sql-s2-rr.toolserver.org not [00:39:11] / on damiana is WARNING: NRPE: Unable to read output [00:39:38] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:40:59] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [00:41:51] /tmp on damiana is WARNING: NRPE: Unable to read output [00:41:59] Load avg. on damiana is WARNING: NRPE: Unable to read output [00:45:39] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:47:59] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [00:48:00] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [00:48:00] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). 
[00:48:00] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [00:48:59] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [00:49:11] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [00:49:11] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100933 MB (16% inode=99%): [00:50:51] Free Memory on turnera is CRITICAL: CRITICAL - 1.7% (144296 kB) free! [00:51:11] MySQL slave on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [00:52:11] SMTP on damiana is CRITICAL: Connection refused [00:52:39] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:54:12] MySQL on z-dat-s2-b is OK: Uptime: 3194 Threads: 5 Questions: 63 Slow queries: 0 Opens: 30 Flush tables: 1 Open tables: 22 Queries per second avg: 0.19 [00:54:12] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [00:54:51] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 177266 MB (3% inode=67%): [00:55:00] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [00:55:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [00:55:00] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:55:10] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[00:55:11] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [00:56:11] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 124171.000000 [00:58:18] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [01:11:51] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 9349.000000 [01:39:39] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:40:10] / on damiana is WARNING: NRPE: Unable to read output [01:40:59] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [01:41:51] /tmp on damiana is WARNING: NRPE: Unable to read output [01:41:59] Load avg. on damiana is WARNING: NRPE: Unable to read output [01:44:22] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [01:45:39] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:59] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [01:48:00] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [01:48:00] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [01:48:00] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [01:48:59] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [01:50:11] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [01:50:11] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 98872 MB (16% inode=99%): [01:51:51] Free Memory on turnera is CRITICAL: CRITICAL - 2.4% (204116 kB) free! 
[01:52:11] SMTP on damiana is CRITICAL: Connection refused [01:52:12] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 12101 [01:52:39] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:55:00] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:55:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [01:55:00] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:55:10] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [01:55:50] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 167131 MB (3% inode=68%): [01:56:11] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [01:56:11] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 127771.000000 [01:56:11] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [01:58:19] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [02:12:50] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10277.000000 [02:30:32] The mailserver on yarrow seems to be stuck again :-(. [02:35:02] Which server hosts LDAP? [02:36:52] Apparently ha-ldap.esi (http://nagios.toolserver.org/cgi-bin/status.cgi?host=all). 
[02:39:39] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:40:10] / on damiana is WARNING: NRPE: Unable to read output [02:40:59] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [02:42:11] MySQL slave on z-dat-s2-b is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3026 [02:42:50] /tmp on damiana is WARNING: NRPE: Unable to read output [02:42:59] Load avg. on damiana is WARNING: NRPE: Unable to read output [02:45:33] https://jira.toolserver.org/browse/TS-1613 [02:45:39] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:48:59] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [02:49:00] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [02:49:00] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [02:49:00] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [02:49:00] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [02:51:10] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [02:51:11] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100529 MB (16% inode=99%): [02:51:50] Free Memory on turnera is CRITICAL: CRITICAL - 1.9% (155984 kB) free! 
[02:52:39] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:53:11] SMTP on damiana is CRITICAL: Connection refused [02:54:59] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:54:59] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [02:54:59] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:55:10] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [02:55:51] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 164507 MB (3% inode=69%): [02:57:11] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [02:57:12] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 131431.000000 [02:57:12] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [02:58:10] MySQL slave on z-dat-s2-b is OK: Uptime: 10637 Threads: 11 Questions: 2177511 Slow queries: 218 Opens: 33833 Flush tables: 1 Open tables: 256 Queries per second avg: 204.711 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1673 [02:59:24] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [03:03:58] Ummm... am I the only one having errors with Toolserver? [03:04:00] Specifically of the 404 variety? [03:04:18] Hi Matthew_. [03:04:36] Matthew_: What specifically is giving you 404 errors? [03:04:53] Susan: http://toolserver.org/~matthewrbowker http://toolserver.org/~acc [03:05:03] Can't say about 404s, but I can't ssh into yarrow and nightshade ATM. [03:05:14] ugh [03:05:15] again [03:05:22] Keeps happening at this hour. 
[03:05:25] I think it's load-related. [03:05:44] Susan: And it's interesting, because I set a custom .htaccess and that's not working either. [03:05:48] I have an open session into willow. [03:05:58] I ran "uptime" and it's hanging. [03:06:01] heh [03:06:21] Hmm, it worked. [03:06:25] It's not awful. [03:06:35] It's just really slow. [03:06:57] When I try to ssh into willow, I get asked for a password, so I assume the /home server must be down. [03:06:58] Better now. [03:07:17] ls, uptime, ruptime -a are all quick now. [03:08:36] Still can't log in. Shouldn't the load have been taken care of with the mandatory SGE? What did Nosy write about MySQL heads or something? [03:08:44] /home seems fine. I can't log in, though. [03:08:58] But my open session from earlier seems fine. [03:09:08] And I'm still 404ing... [03:09:29] someone probably just has a horrible script that started 10 minutes ago [03:09:35] that's when tsbot/tsnag died [03:10:03] http://p.defau.lt/?e_zsJIW_rAbfR3Cvlvx9Uw [03:10:15] yarrow and nightshade are reporting as down. [03:10:22] run top -u to figure out who to blame? [03:10:30] er just top [03:10:49] Those loads are minimal and shouldn't cause any problems. [03:11:19] The loads look fine, yeah. It's the "down"s I'm worried about. ;-) [03:11:41] http://p.defau.lt/?asmBijtXnvzQacz1e8JXOQ [03:12:15] hm [03:12:26] The user ids instead of names look like some lookup (LDAP?) problem. [03:12:42] My assumption is that either the LDAP server has problems or the network connection is faulty. [03:12:54] How do you look up a user by ID? [03:13:22] Local users in /etc/passwd, LDAP ... dunno. [03:13:28] Are you on willow? [03:13:31] Yeah. [03:13:42] The user IDs are Unix, not LDAP, aren't they? [03:14:22] root & Co. should be local (in /etc/passwd), the rest from LDAP. One moment, I'll look up the ldapsearch command. [03:14:32] My SSH key is getting rejected too [03:14:59] "id" doesn't work? [03:15:12] Try :-). 
[03:15:31] mzmcbride@willow:~$ id mzmcbride [03:15:32] id: mzmcbride: No such user [03:15:35] :-) [03:15:48] Difficult to know whether that's just broken now or it's never worked, though. [03:15:54] Yeah, looks like the central user repo is down. [03:16:55] Please try "ldapsearch -h ldap -b ou=people,o=unix,o=toolserver '(uid=mzmcbride)' tsDefaultLicense". [03:17:07] (On willow.) [03:17:12] Running now. [03:18:17] Seems to hang. It's not outputting anything. [03:18:21] Sweet, nagios.toolserver.org seems to be located in the cluster as well. [03:18:46] Shoot... [03:20:22] Speaking of nagios... [03:20:39] scfc_de: http://p.defau.lt/?P47PCC3_1d3mnoLyVqFUqQ [03:20:54] Seems like you're right. LDAP must be broken or inaccessible or something. [03:22:22] ts-admins@toolserver.org -- well, I hope this isn't routed over the Toolserver :-). [03:22:26] "whoami" is also hanging. I didn't realize LDAP was so integrated. [03:23:00] Recovered, heh. [03:23:30] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:23:30] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [03:23:31] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [03:23:37] ^ Yep [03:23:58] Threatening to mail the admins apparently works :-). [03:24:13] (We should still tell them.) [03:24:15] Something seems to be doing this every day around this time. [03:24:26] Yeah, can you write an e-mail, scfc_de? :-) [03:24:33] s/can/will/ [03:24:52] Or a JIRA ticket. [03:25:08] Susan: A jira ticket is its own set of problems... [03:25:10] I feel more like toolserver-l. 
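The symptom discussed above (numeric user IDs showing up instead of names, and "id mzmcbride" reporting no such user) is what a uid-to-name lookup does when no directory answers for that account. A minimal Python sketch of that fallback follows. The function name display_name and the injectable lookup parameter are hypothetical, chosen for illustration, and this is not necessarily how top or id are implemented. It also only covers the "no such user" case, not a lookup that hangs the way whoami did here.

```python
import pwd
from typing import Any, Callable


def display_name(uid: int, lookup: Callable[[int], Any] = pwd.getpwuid) -> str:
    """Return the login name for uid, or the uid as text if the lookup fails."""
    try:
        # pwd.getpwuid consults whatever sources the system is configured
        # to use for the passwd database (local files, LDAP, ...).
        return lookup(uid).pw_name
    except (KeyError, OverflowError):
        # No source knows this uid, so show the bare number instead.
        return str(uid)


if __name__ == "__main__":
    import os

    print(display_name(os.getuid()))
```

Passing the lookup in as a parameter keeps the fallback testable without a real directory outage.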
[03:25:13] "id mzmcbride" works now, BTW. [03:25:23] Coren: You're around? [03:25:26] (like, logging in three times...) [03:25:41] Free Memory on turnera is OK: OK - 86.6% (7255016 kB) free. [03:25:44] Kinda. ¿Qué pasa? [03:25:45] Yeah, JIRA is having its own set of issues. [03:25:53] Which may be related. [03:26:41] journal.toolserver.org on web.amaranth is WARNING: (null) [03:28:05] Uh, oh... [03:28:11] Coren: nagios. We just noticed that setting up the nagios server inside the Toolserver cluster is a bad idea if the Toolserver cluster is down. Could you brainstorm with WMF what possibilities there are to integrate this somehow with the existing infrastructure there? [03:28:37] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:37] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [03:28:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:28:37] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [03:28:56] scfc_de: I can raise the issue, certainly. [03:28:58] Susan: But nothing's supposed to run on the LDAP server(s). They should be out of users' hands. [03:29:25] nightshade and yarrow are still reporting as down. [03:29:31] From ruptime -a on willow. [03:29:48] Hmm, it's just lying. I logged in to nightshade just fine. [03:29:52] I wonder what that's about. [03:30:07] Coren: Of course we need to wait for DaBPunkt, but if there are technical (or political) impediments on WMF's part, we needn't think about it at all. [03:30:34] The Wikimedia Foundation is using an outside service for monitoring. [03:30:39] At least for the public part. 
[03:30:54] And they just switched from nagios to icinga or something. [03:32:28] Susan: I'm looking more at icinga than status.wikimedia.org. [03:33:37] aliasd on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:33:57] NTP on turnera is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.004477 secs [03:33:57] aliasd on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:35:07] wikidata replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2070.000000 [03:35:17] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1802.000000 [03:35:27] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2067.000000 [03:35:27] wikidata replag on z-dat-s6-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2127.000000 [03:35:37] wikidata replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1862.000000 [03:35:37] wikidata replag on z-dat-s7-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2060.000000 [03:35:57] I know status.wikimedia.org is an external (donated) service. [03:36:03] I'm not actually sure where WMF's nagios runs. [03:36:05] Or icinga. [03:36:16] s4 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1011.000000 [03:36:23] Susan: status is an external service that neatly cleans up and presents the stuff that comes from icinga; the latter is what we use internally. [03:36:37] wikidata replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1578.000000 [03:36:52] https://icinga.wikimedia.org/icinga/ [03:36:58] Predictably enough. :-)
[03:37:07] wikidata replag on thyme is OK: QUERY OK: SELECT ts_rc_age() returned 1712.000000 [03:37:27] wikidata replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1781.000000 [03:37:28] wikidata replag on z-dat-s6-a is OK: QUERY OK: SELECT ts_rc_age() returned 1788.000000 [03:37:37] wikidata replag on z-dat-s7-a is OK: QUERY OK: SELECT ts_rc_age() returned 1746.000000 [03:40:27] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:40:27] / on damiana is WARNING: NRPE: Unable to read output [03:40:35] I thought they'd decided to stop using software names as subdomain names, but clearly not. [03:40:57] NTP on turnera is OK: NTP OK: Offset -0.008226 secs [03:40:59] mediawiki.wikimedia.org [03:41:57] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [03:43:08] Load avg. on damiana is WARNING: NRPE: Unable to read output [03:43:37] /tmp on damiana is WARNING: NRPE: Unable to read output [03:54:04] Tim Landscheidt * [Toolserver-l] Possible LDAP outtime this morning, major disruption [04:13:04] Tim Landscheidt * Re: [Toolserver-l] Possible LDAP outtime this morning, major disruption [04:27:47] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 166014 MB (3% inode=69%): [04:27:47] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 136865.000000 [04:27:47] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [04:27:57] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[04:27:57] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:27:57] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [04:28:10] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [04:28:10] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [04:28:10] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 9971.000000 [04:28:10] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [04:28:10] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [04:28:17] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [04:28:27] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [04:28:28] SMTP on damiana is CRITICAL: Connection refused [04:28:28] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:28:28] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100451 MB (16% inode=99%): [04:28:28] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [04:28:28] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[04:28:37] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:28:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:28:47] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [04:28:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [04:33:40] aliasd on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:33:57] aliasd on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:40:27] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:40:28] / on damiana is WARNING: NRPE: Unable to read output [04:41:57] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [04:43:07] Load avg. on damiana is WARNING: NRPE: Unable to read output [04:43:37] /tmp on damiana is WARNING: NRPE: Unable to read output [05:27:47] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 169448 MB (3% inode=69%): [05:27:48] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 140465.000000 [05:27:48] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [05:27:57] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[05:27:58] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [05:27:58] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [05:28:08] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [05:28:08] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [05:28:08] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 9593.000000 [05:28:08] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:28:08] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [05:28:17] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [05:28:31] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [05:28:31] SMTP on damiana is CRITICAL: Connection refused [05:28:31] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:28:31] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100329 MB (16% inode=99%): [05:28:31] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [05:28:31] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[05:28:37] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:47] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [05:33:37] aliasd on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:33:57] aliasd on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:40:27] / on damiana is WARNING: NRPE: Unable to read output [05:41:28] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:41:57] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [05:43:07] Load avg. on damiana is WARNING: NRPE: Unable to read output [05:43:37] /tmp on damiana is WARNING: NRPE: Unable to read output [05:58:36] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [06:27:48] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 170990 MB (3% inode=69%): [06:27:48] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [06:27:57] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [06:27:57] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [06:27:57] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [06:28:07] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). 
[06:28:07] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [06:28:07] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10023.000000 [06:28:07] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [06:28:07] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [06:28:17] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [06:28:27] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [06:28:27] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:28:27] SMTP on damiana is CRITICAL: Connection refused [06:28:27] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100218 MB (16% inode=99%): [06:28:27] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [06:28:28] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[06:28:37] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:28:38] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:28:47] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [06:29:06] wikidata replag on z-dat-s2-b is CRITICAL: (Service Check Timed Out) [06:33:37] aliasd on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:33:57] aliasd on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:27] / on damiana is WARNING: NRPE: Unable to read output [06:42:12] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [06:42:30] NTP on damiana is CRITICAL: NTP CRITICAL: No response from NTP server [06:43:07] Load avg. on damiana is WARNING: NRPE: Unable to read output [06:43:38] /tmp on damiana is WARNING: NRPE: Unable to read output [06:58:37] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [07:27:48] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 171887 MB (3% inode=69%): [07:27:48] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [07:27:57] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[07:27:58] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [07:27:58] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [07:28:07] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [07:28:08] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [07:28:08] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 8959.000000 [07:28:08] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [07:28:08] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [07:28:20] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [07:28:27] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [07:28:28] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:28:28] SMTP on damiana is CRITICAL: Connection refused [07:28:28] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100096 MB (16% inode=99%): [07:28:28] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [07:28:28] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[07:28:36] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:28:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:28:48] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [07:29:17] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 147754.000000 [07:33:37] aliasd on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:34:01] aliasd on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:46] / on damiana is WARNING: NRPE: Unable to read output [07:41:57] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [07:42:17] NTP on damiana is CRITICAL: NTP CRITICAL: No response from NTP server [07:43:07] Load avg. on damiana is WARNING: NRPE: Unable to read output [07:43:49] /tmp on damiana is WARNING: NRPE: Unable to read output [07:58:47] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [08:27:56] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [08:27:57] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:27:58] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [08:28:07] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). 
[08:28:07] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [08:28:08] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4704.000000 [08:28:16] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [08:28:32] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [08:28:32] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:28:32] SMTP on damiana is CRITICAL: Connection refused [08:28:32] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 99919 MB (16% inode=99%): [08:28:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [08:28:32] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [08:28:37] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:28:38] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:28:47] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc:S3:oob:S4:2530:S9:ts-array5:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.descri [08:28:48] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 172607 MB (3% inode=69%): [08:28:48] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[08:29:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:29:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [08:29:28] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 151354.000000 [08:33:37] aliasd on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:33:57] aliasd on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:40:27] / on damiana is WARNING: NRPE: Unable to read output [08:41:57] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [08:42:33] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:43:07] Load avg. on damiana is WARNING: NRPE: Unable to read output [08:43:37] /tmp on damiana is WARNING: NRPE: Unable to read output [09:25:47] aliasd on nightshade is OK: TCP OK - 0.002 second response time on port 984 [500 Not found.] [09:27:57] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [09:27:57] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [09:27:57] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [09:28:07] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [09:28:07] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [09:28:07] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4440.000000 [09:28:22] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [09:28:27] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [09:28:28] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:28:28] SMTP on damiana is CRITICAL: Connection refused [09:28:28] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 99687 MB (16% inode=99%): [09:28:28] aliasd on yarrow is OK: TCP OK - 0.013 second response time on port 984 [500 Not found.] [09:28:28] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [09:28:28] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [09:28:37] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [09:28:37] NTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:28:37] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:28:47] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc:S3:oob:S4:2530:S9:ts-array5:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.descri [09:28:47] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 173045 MB (3% inode=69%): [09:28:47] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [09:29:07] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [09:29:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [09:29:16] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 154954.000000 [09:30:27] NTP on yucca is OK: NTP OK: Offset -0.025201 secs [09:30:28] SMTP on yucca is OK: SMTP OK - 0.037 sec. 
response time [09:40:27] / on damiana is WARNING: NRPE: Unable to read output [09:41:57] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [09:42:07] wikidata replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3524.000000 [09:42:27] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:07] Load avg. on damiana is WARNING: NRPE: Unable to read output [09:43:37] /tmp on damiana is WARNING: NRPE: Unable to read output [09:51:13] MySQL on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [09:57:13] wikidata replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 1679.000000 [10:18:54] aliasd on nightshade is CRITICAL: Connection refused [10:28:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:28:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [10:28:13] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [10:28:13] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [10:28:34] SMTP on damiana is CRITICAL: Connection refused [10:28:34] MySQL slave on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [10:28:34] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 99465 MB (16% inode=99%): [10:28:34] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [10:28:34] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[10:28:53] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 172472 MB (3% inode=69%): [10:28:54] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [10:28:54] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [10:28:54] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [10:29:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [10:29:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [10:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:30:13] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [10:32:14] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [10:40:33] / on damiana is WARNING: NRPE: Unable to read output [10:42:54] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [10:43:13] Load avg. 
on damiana is WARNING: NRPE: Unable to read output [10:43:23] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:43:43] /tmp on damiana is WARNING: NRPE: Unable to read output [10:51:13] MySQL on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [10:58:33] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [11:18:53] aliasd on nightshade is CRITICAL: Connection refused [11:28:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [11:28:14] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [11:28:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [11:28:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [11:28:33] SMTP on damiana is CRITICAL: Connection refused [11:28:33] MySQL slave on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [11:28:33] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 99159 MB (16% inode=99%): [11:28:33] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[11:28:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [11:28:53] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [11:28:53] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 165865 MB (3% inode=69%): [11:28:53] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [11:28:53] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [11:29:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [11:29:14] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:30:14] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [11:32:13] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [11:38:32] i cant access s2 [11:40:33] / on damiana is WARNING: NRPE: Unable to read output [11:42:54] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [11:43:14] Load avg. 
on damiana is WARNING: NRPE: Unable to read output [11:43:23] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:43:43] /tmp on damiana is WARNING: NRPE: Unable to read output [11:51:13] MySQL on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [11:51:53] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 68458 MB (7% inode=99%): [11:58:33] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [12:18:54] aliasd on nightshade is CRITICAL: Connection refused [12:28:33] SMTP on damiana is CRITICAL: Connection refused [12:28:34] MySQL slave on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [12:28:34] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 98888 MB (16% inode=99%): [12:28:34] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [12:28:34] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [12:28:54] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [12:28:54] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 162221 MB (3% inode=69%): [12:28:54] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [12:28:54] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[12:29:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [12:29:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [12:29:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [12:29:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [12:29:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [12:29:14] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [12:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:30:13] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [12:32:14] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [12:40:33] / on damiana is WARNING: NRPE: Unable to read output [12:42:53] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [12:43:13] Load avg. on damiana is WARNING: NRPE: Unable to read output [12:43:23] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:43:42] /tmp on damiana is WARNING: NRPE: Unable to read output [12:51:14] MySQL on z-dat-s2-b is CRITICAL: Cant connect to MySQL server on z-dat-s2-b (146) [12:58:33] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [12:59:13] MySQL on z-dat-s2-b is OK: Uptime: 42 Threads: 5 Questions: 669 Slow queries: 1 Opens: 4001 Flush tables: 1 Open tables: 193 Queries per second avg: 15.928 [13:04:13] Load avg. on damiana is CRITICAL: Connection refused by host [13:05:13] Load avg. 
on damiana is WARNING: NRPE: Unable to read output [13:11:33] SMTP on damiana is OK: SMTP OK - 0.250 sec. response time [13:15:14] NTP on damiana is OK: NTP OK: Offset -0.00047 secs [13:18:53] [[Special:Log/newusers]] create 10 * S.M.Samee * (New user account) [13:18:53] aliasd on nightshade is CRITICAL: Connection refused [13:28:34] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 98615 MB (16% inode=99%): [13:28:34] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [13:28:34] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [13:28:43] /tmp on damiana is OK: DISK OK - free space: /tmp 8253 MB (99% inode=99%): [13:28:53] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 37619 [13:28:54] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 156515 MB (2% inode=69%): [13:28:54] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [13:28:54] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [13:28:54] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [13:29:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:29:14] Load avg. 
on damiana is OK: OK - load average: 0.66, 0.63, 0.67 [13:29:14] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [13:29:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [13:29:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [13:29:15] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [13:29:15] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [13:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:29:33] / on damiana is OK: DISK OK - free space: / 42734 MB (59% inode=95%): [13:31:03] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 169460.000000 [13:32:13] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [13:35:53] aliasd on nightshade is OK: TCP OK - 0.016 second response time on port 984 [500 Not found.] 
[13:42:53] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [13:58:33] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [14:28:53] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 37950 [14:28:53] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 155434 MB (2% inode=69%): [14:28:53] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [14:28:53] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [14:28:53] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [14:29:22] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:29:34] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 98333 MB (16% inode=99%): [14:29:34] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[14:29:34] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [14:30:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:30:14] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [14:30:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [14:30:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [14:30:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [14:30:15] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [14:31:23] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 173080.000000 [14:32:13] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [14:42:53] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [14:58:22] @replag [14:58:28] liangent: s1-rr-a-wd: 53s [-0.02 s/s]; s1-user-wd: 53s [-0.03 s/s]; s2-rr: 8h 45m 35s [-12.90 s/s]; s2-user: 8h 45m 35s [-12.90 s/s]; s2-user-c: error; s2-user-wd: 2d 31m 49s [+1.00 s/s]; s3-user-wd: 53s [-0.01 s/s]; s4-user-wd: 56s [-0.01 s/s] [14:58:29] liangent: s5-user: error; s5-user-c: error; s6-user-wd: 57s [+0.03 s/s]; s7-user-wd: 56s [+0.01 s/s] [14:58:33] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [15:28:54] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 27638 [15:28:54] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null 
:OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [15:28:54] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 154003 MB (2% inode=69%): [15:28:54] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [15:28:54] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [15:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:29:33] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 97997 MB (16% inode=99%): [15:29:33] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [15:29:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [15:30:14] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [15:30:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [15:30:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [15:30:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). 
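A side note on reading the replag figures above: the `ts_rc_age()` values in the alerts appear to be seconds, and the `@replag` reply shows the same kind of lag as days/hours/minutes/seconds with zero units left out (for example `2d 31m 49s`). The sketch below is only an illustration of that conversion; `format_lag` is a hypothetical helper name, not the bot's actual code.

```python
def format_lag(seconds: float) -> str:
    """Render a lag given in seconds as e.g. '8h 45m 35s', skipping zero units."""
    total = int(seconds)
    days, rest = divmod(total, 86400)   # 86400 seconds per day
    hours, rest = divmod(rest, 3600)    # 3600 seconds per hour
    minutes, secs = divmod(rest, 60)
    parts = [
        f"{value}{unit}"
        for value, unit in ((days, "d"), (hours, "h"), (minutes, "m"), (secs, "s"))
        if value
    ]
    # A lag of zero would otherwise produce an empty string.
    return " ".join(parts) or "0s"


# Value taken from the 14:31:23 alert for z-dat-s2-b.
print(format_lag(173080.0))
```

Read this way, the 14:31:23 value of 173080 is a little over two days, which is in line with the `s2-user-wd` figure the bot reported about half an hour later.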
[15:30:15] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [15:30:15] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [15:31:22] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 176680.000000 [15:32:13] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [15:42:53] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [15:58:33] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [15:58:54] aliasd on nightshade is CRITICAL: Connection refused [16:26:56] Hi, I should change my username from pasqual to coet on toolserver, and create a new login password. Are both things possible? [16:27:31] coet|cawiki: you have to file a request in jira [16:27:39] you should be able to change your own password though [16:28:53] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 25945 [16:28:53] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [16:28:53] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 153715 MB (2% inode=69%): [16:28:53] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [16:28:55] how can i change it, legoktm? [16:29:05] passwd i think [16:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[16:29:27] this is for the LDAP password [16:29:34] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 97695 MB (16% inode=99%): [16:29:34] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [16:29:34] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [16:29:53] not the login password [16:29:54] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [16:30:37] oh [16:30:42] login password is just your ssh key [16:30:49] just make a new one, and stick it in your .ssh folder [16:31:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [16:31:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [16:31:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [16:31:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [16:31:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates).
[16:31:14] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [16:31:23] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 180280.000000 [16:31:52] i'll try it [16:32:00] thx [16:32:07] np [16:32:13] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [16:32:29] (and i'll file a request in jira) [16:32:36] thx too [16:42:53] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [16:58:32] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [16:58:54] aliasd on nightshade is CRITICAL: Connection refused [17:19:13] Free Memory on damiana is CRITICAL: CRITICAL - 2.9% (240620 kB) free! [17:28:53] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 7124 [17:28:54] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [17:28:54] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [17:28:54] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 153566 MB (2% inode=69%): [17:29:22] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:29:33] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 97511 MB (16% inode=99%): [17:29:34] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[17:29:34] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [17:29:53] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [17:31:14] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [17:31:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [17:31:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [17:31:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [17:31:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [17:31:23] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 183880.000000 [17:32:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [17:32:13] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [17:33:15] legoktm: I have created a new public key with puttygen; do I need to upload the new file to my .ssh folder? [17:33:55] there are only two files: authorized_keys and known_hosts [17:34:42] can I edit the authorized_keys and append my new key there? [17:35:53] MySQL slave on z-dat-s2-b is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3297 [17:37:54] MySQL slave on z-dat-s2-b is OK: Uptime: 16753 Threads: 7 Questions: 16737833 Slow queries: 1042 Opens: 316423 Flush tables: 1 Open tables: 256 Queries per second avg: 999.94 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1621 [17:38:14] Free Memory on damiana is WARNING: WARNING - 6.5% (544652 kB) free! [17:39:19] Ok, legoktm, I did what I asked you about and it works!!! Thanks!!!
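For anyone following the exchange above, appending a new key to `authorized_keys` can be sketched roughly as below. This is a minimal illustration under assumptions, not a Toolserver procedure: `append_authorized_key` is a hypothetical helper, and the key is assumed to already be in the one-line OpenSSH format (in PuTTYgen, the text shown in the box for pasting into an OpenSSH `authorized_keys` file, rather than the multi-line file written by "Save public key").

```python
import os
from pathlib import Path


def append_authorized_key(public_key_line: str, ssh_dir: Path) -> bool:
    """Append a one-line OpenSSH public key unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    key = public_key_line.strip()
    if not key or "\n" in key:
        raise ValueError("expected a single-line OpenSSH public key")

    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    current = auth_file.read_text() if auth_file.exists() else ""

    if key in (line.strip() for line in current.splitlines()):
        return False

    with auth_file.open("a") as handle:
        # Keep existing keys intact if the file lacks a trailing newline.
        if current and not current.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")

    # sshd commonly refuses key files that other users can write to.
    os.chmod(auth_file, 0o600)
    return True
```

Appending rather than overwriting keeps the old key working until the new one has been confirmed, which matches what was done in the conversation.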
[17:42:53] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [17:43:14] Free Memory on damiana is CRITICAL: CRITICAL - 4.9% (411440 kB) free! [17:58:34] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [17:58:53] aliasd on nightshade is CRITICAL: Connection refused [18:28:53] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 153438 MB (2% inode=69%): [18:28:54] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [18:28:54] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [18:29:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:29:33] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 97203 MB (15% inode=99%): [18:29:33] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [18:29:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [18:29:53] APT on nightshade is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[18:31:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [18:31:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:31:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [18:31:14] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [18:31:14] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [18:31:23] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 187482.000000 [18:32:14] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:32:14] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [18:42:54] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [18:43:13] Free Memory on damiana is CRITICAL: CRITICAL - 1.6% (130076 kB) free! [18:58:43] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [18:58:53] aliasd on nightshade is CRITICAL: Connection refused [19:31:26] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 96884 MB (15% inode=99%): [19:31:30] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:31:30] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [19:31:30] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[19:31:30] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [19:31:40] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 191098.000000 [19:31:50] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [19:31:51] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [19:32:00] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [19:32:01] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [19:32:01] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [19:33:00] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [19:33:10] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [19:36:40] /sql on z-dat-s5-b is UNKNOWN: NRPE: Unable to read output [19:36:40] /tmp on z-dat-s5-b is UNKNOWN: NRPE: Unable to read output [19:36:50] APT on z-dat-s5-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [19:37:00] Load avg. 
on z-dat-s5-b is CRITICAL: NRPE: Command check_load not defined [19:37:10] MySQL slave on z-dat-s5-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 290511 [19:37:40] / on z-dat-s5-b is CRITICAL: NRPE: Command check_root not defined [19:37:40] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 290485.000000 [19:38:00] wikidata replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2002.000000 [19:43:50] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [19:43:51] Free Memory on damiana is CRITICAL: CRITICAL - 1.5% (127424 kB) free! [19:50:13] DanielK_WMDE: Do you still have root privileges? Could you "service aliasd restart; sleep 30; service postfix restart" on yarrow, please? [19:53:57] scfc_de: done. now tell me what that is supposed to have fixed :) [19:54:55] * DanielK_WMDE can't find the admin log any more [19:55:18] It was in Atlassian or something, wasn't it? [19:56:31] we have tickets for some things in jira, but i thought we had a more light weight logging thing. [19:56:36] but maybe that went away. [19:56:46] I thought that was a JIRA counterpart wiki thing where the log was. [19:56:50] Confluence, maybe? [19:57:04] we killed confluence years ago [19:57:42] Time flies. [19:58:30] aliasd on yarrow is CRITICAL: Connection refused [19:58:55] aliasd was stuck as LDAP went away, so hopefully you gave it a kickstart :-). Let's see. [19:59:14] (aliasd is the thingy that determines from LDAP and ~/.forwards where mail should be delivered to.) [19:59:41] aliasd on nightshade is CRITICAL: Connection refused [20:00:29] DanielK_WMDE: Could you do "postfix flush" as well to get the queue delivered? 
[20:01:31] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [20:03:54] done [20:03:55] hrm [20:03:58] 2013 Feb 28 20:03:35 yarrow fatal: tcp:localhost:984(0,lock|fold_fix): table lookup problem [20:04:00] whatever that means [20:04:36] Hmmm. I received some older mail, but the queue is stuck again at 45 messages and the aliasd process is gone. [20:05:11] Could you "service aliasd start" and look if a process aliasd is created? [20:05:27] Ah, there it is (18689). [20:05:30] aliasd on yarrow is OK: TCP OK - 0.003 second response time on port 984 [500 Not found.] [20:05:36] that error message indicates a postfix authentication failure, if google isn't fooling me [20:05:37] And the mailq is empty. [20:06:10] Well, whatever it is, it's working now. [20:06:13] i'll not attempt to fix this, since i'm likely to break things for good :P [20:06:40] aliasd on nightshade is OK: TCP OK - 0.018 second response time on port 984 [500 Not found.] [20:06:46] I spoke with DaBPunkt the other day to get rid of the very unstable aliasd thing altogether, but that has time till he comes back. [20:08:40] I'll note your restart at https://jira.toolserver.org/browse/TS-1613. [20:30:30] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [20:30:40] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 153457 MB (2% inode=69%): [20:30:50] APT on nightshade is CRITICAL: APT CRITICAL: 8 packages available for upgrade (4 critical updates). warnings detected. run with -v for information. [20:31:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 96635 MB (15% inode=99%): [20:31:30] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[20:31:31] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [20:31:31] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [20:31:31] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [20:31:40] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 194698.000000 [20:31:50] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [20:31:50] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [20:32:00] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [20:32:00] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [20:32:00] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [20:33:00] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [20:33:11] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [20:36:40] /tmp on z-dat-s5-b is UNKNOWN: NRPE: Unable to read output [20:36:50] APT on z-dat-s5-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [20:37:00] Load avg. 
on z-dat-s5-b is CRITICAL: NRPE: Command check_load not defined [20:37:10] MySQL slave on z-dat-s5-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 286311 [20:37:40] / on z-dat-s5-b is CRITICAL: NRPE: Command check_root not defined [20:37:40] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 286271.000000 [20:38:00] wikidata replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3128.000000 [20:43:41] SSH on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:43:50] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [20:43:50] Free Memory on damiana is CRITICAL: CRITICAL - 1.5% (127196 kB) free! [20:56:40] /tmp on z-dat-s5-b is CRITICAL: Connection refused by host [20:57:40] /sql on z-dat-s5-b is OK: DISK OK - free space: /sql 132303 MB (20% inode=99%): [20:57:40] / on z-dat-s5-b is OK: DISK OK - free space: / 2673 MB (69% inode=89%): [20:57:40] /tmp on z-dat-s5-b is OK: DISK OK - free space: / 2673 MB (69% inode=89%): [20:58:00] Load avg. on z-dat-s5-b is OK: OK - load average: 1.58, 1.90, 2.03 [21:01:40] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [21:11:40] Sun Grid Engine execd on yarrow is UNKNOWN: Execution timeout exceeded [21:12:00] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [21:15:20] LDAP on ha-ldap.esi is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:40] aliasd on yarrow is CRITICAL: Connection refused [21:15:41] aliasd on nightshade is CRITICAL: Connection refused [21:19:43] Environment IPMI on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[21:19:55] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3844.000000 [21:19:55] LDAP on ha-ldap.esi is OK: LDAP OK - 0.864 seconds response time [21:20:53] Sun Grid Engine execd on yarrow is OK: Host and Queues Ok [21:21:54] Free Memory on damiana is OK: OK - 92.0% (7707420 kB) free. [21:27:00] aliasd on yarrow is OK: TCP OK - 0.003 second response time on port 984 [500 Not found.] [21:29:14] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [21:30:57] APT on nightshade is CRITICAL: APT CRITICAL: 8 packages available for upgrade (4 critical updates). warnings detected. run with -v for information. [21:31:04] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 153634 MB (2% inode=69%): [21:31:13] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [21:32:00] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [21:32:00] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [21:32:00] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 96365 MB (15% inode=99%): [21:32:05] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). 
[21:32:06] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [21:32:06] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [21:32:06] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 198328.000000 [21:32:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [21:32:56] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [21:32:57] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [21:32:57] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [21:33:57] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [21:33:57] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [21:36:57] APT on z-dat-s5-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). 
[21:37:57] MySQL slave on z-dat-s5-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 281593 [21:38:04] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 281609.000000 [21:44:23] SSH on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:54:57] s1 replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2169.000000 [21:54:57] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2169.000000 [21:54:57] wikidata replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2180.000000 [21:54:58] s4 replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2170.000000 [21:54:58] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2173.000000 [21:54:58] s5 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2173.000000 [21:54:58] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2182.000000 [21:54:59] wikidata replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2184.000000 [21:55:04] wikidata replag on z-dat-s7-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2194.000000 [21:55:04] wikidata replag on z-dat-s6-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2196.000000 [22:01:56] s5 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 925.000000 [22:05:57] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2297 [22:06:57] MySQL slave on thyme is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2290 [22:08:57] wikidata replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1649.000000 [22:11:57] wikidata replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1723.000000 [22:13:04] wikidata replag on z-dat-s7-a is OK: QUERY OK: SELECT ts_rc_age() returned 1676.000000 [22:13:04] wikidata replag on 
z-dat-s6-a is OK: QUERY OK: SELECT ts_rc_age() returned 1794.000000 [22:13:58] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1739.000000 [22:13:58] MySQL slave on rosemary is OK: Uptime: 266192 Threads: 12 Questions: 229190115 Slow queries: 151823 Opens: 17010 Flush tables: 1 Open tables: 3010 Queries per second avg: 860.995 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1736 [22:15:57] s1 replag on thyme is OK: QUERY OK: SELECT ts_rc_age() returned 1788.000000 [22:15:57] wikidata replag on thyme is OK: QUERY OK: SELECT ts_rc_age() returned 1741.000000 [22:15:57] MySQL slave on thyme is OK: Uptime: 253026 Threads: 16 Questions: 182037893 Slow queries: 24488 Opens: 2066 Flush tables: 1 Open tables: 402 Queries per second avg: 719.443 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1784 [22:16:15] SSH on z-dat-s5-b is OK: SSH OK - OpenSSH_5.5p1 Debian-6+squeeze3 (protocol 2.0) [22:18:56] s4 replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3604.000000 [22:18:57] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3613.000000 [22:19:57] s4 replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3457.000000 [22:19:57] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 5900.000000 [22:24:57] MySQL slave on z-dat-s6-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3867 [22:24:57] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3109 [22:24:57] MySQL slave on daphne is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1893 [22:24:57] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2862 [22:25:58] s4 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 1503.000000 [22:25:58] MySQL slave on daphne is OK: Uptime: 5272242 Threads: 7 Questions: 
1207111344 Slow queries: 102912 Opens: 244331 Flush tables: 1 Open tables: 1481 Queries per second avg: 228.955 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1495 [22:26:15] wikidata replag on z-dat-s6-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2097.000000 [22:29:15] Environment IPMI on damiana is WARNING: NRPE: Unable to read output [22:30:57] MySQL slave on z-dat-s3-a is OK: Uptime: 1678091 Threads: 21 Questions: 1492525856 Slow queries: 62224 Opens: 14105966 Flush tables: 1 Open tables: 16384 Queries per second avg: 889.418 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1614 [22:30:58] APT on nightshade is CRITICAL: APT CRITICAL: 8 packages available for upgrade (4 critical updates). warnings detected. run with -v for information. [22:31:05] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 153411 MB (2% inode=69%): [22:31:15] APT on mayapple is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [22:31:24] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [22:31:57] MySQL slave on z-dat-s7-a is OK: Uptime: 1678150 Threads: 6 Questions: 638281402 Slow queries: 34194 Opens: 2339826 Flush tables: 1 Open tables: 3882 Queries per second avg: 380.348 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1650 [22:32:14] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [22:32:23] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[22:32:57] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [22:32:58] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [22:32:58] APT on yucca is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [22:32:58] APT on z-dat-s2-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). [22:32:58] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [22:32:58] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 96042 MB (15% inode=99%): [22:33:04] APT on yarrow is CRITICAL: APT CRITICAL: 4 packages available for upgrade (4 critical updates). [22:33:04] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [22:33:04] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 195241.000000 [22:33:56] wikidata replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3590.000000 [22:34:58] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [22:34:59] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [22:36:57] APT on z-dat-s5-b is CRITICAL: APT CRITICAL: 3 packages available for upgrade (3 critical updates). 
[22:37:56] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3274 [22:38:14] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 276153.000000 [22:38:56] MySQL slave on z-dat-s5-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 276105 [22:44:56] wikidata replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 1729.000000 [22:45:09] Since about 12:00Z, SGE jobs fail with "cgroup change of group failed", "/usr/bin/cgdelete: cannot remove group 'users/dbreps/1459667-undefined': No such file or directory". Anybody seeing the same? [22:47:22] Merlissimo: Any idea? [22:48:20] scfc_de: nope. cgroups is used in the prolog of sge on linux servers. it was added by DaB. [22:48:42] i added only the resource control for solaris and i am not familiar with cgroups [22:48:57] MySQL slave on z-dat-s6-a is OK: Uptime: 1679170 Threads: 11 Questions: 615848416 Slow queries: 89552 Opens: 2839034 Flush tables: 1 Open tables: 3199 Queries per second avg: 366.757 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1446 [22:49:06] the problem is that this error causes the queues to change into error state (so that no more jobs are accepted) [22:49:40] the job itself is simply rescheduled on another host [22:50:15] wikidata replag on z-dat-s6-a is OK: QUERY OK: SELECT ts_rc_age() returned 1788.000000 [22:51:20] Merlissimo: Hmmm. dbreps is a bot, and it hasn't done updates for all of these jobs. They also don't show up in qstat. [22:52:10] for finished jobs you have to use qacct [22:53:48] dbreps ran for 33 minutes today in total [22:54:06] scfc_de: qacct -o dbreps -d 1 -j [22:57:37] Merlissimo: Yep, still http://en.wikipedia.org/w/index.php?title=Special:Contributions/BernsteinBot&offset=&limit=500&target=BernsteinBot shows only two (non-test) reports since 12:00Z (they would have at least updated the "last change:" bit), and I see 16 "cgroup change ..."
error messages. [22:58:02] (dbreps are run -l arch=lx.) [23:00:21] Merlissimo: But it was you who changed /sge/scripts/prolog-lx.sh at 9:05Z today?! :-) (Then we would have a temporal coincidence.) [23:04:49] scfc_de: that was me, because the last errors were caused by a script with job name "stats" and i thought the name could have caused the error [23:05:08] i only added the "J" for job name [23:05:36] I have no idea about cgroups, so I can't tell whether this should have any effect. [23:05:36] but that was not the problem. i think it is caused by ldap problems [23:06:34] i don't know cgroups either, so it was only a quick try [23:07:57] Could we if/else your fix for the moment, so that J is only added for "stats"? [23:09:47] SGE_O_LOGNAME seems to be the username, BTW, so the job name should not cause trouble. [23:12:43] Actually, no. [23:13:16] it only created a unique string, so adding J or not is not important [23:13:30] SGE_O_LOGNAME is the username [23:13:41] Yeah, now I see: The message complains about cgdelete, so you need to patch epilog-lx.sh as well. [23:14:36] done [23:14:41] And there's starter-lx.sh and linux_sum_mem.sh. [23:15:24] (Don't know about linux_sum_mem.sh, though. Looks different.) [23:15:36] linux_sum_mem is only a sensor try [23:16:27] Sensor try? [23:17:16] it is not used [23:17:38] Okay. Next dbreps job is scheduled for 23:30, let's see how it works. [23:18:21] dab wrote a load sensor checking unavailable mem [23:18:58] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7212.000000 [23:19:00] Ah. For Nagios I suppose? [23:19:44] but the cgroup errors in prolog/epilog cannot cause problems for the job script [23:21:47] Merlissimo: sorry to distract, but is the wikidata db still down? [23:21:52] From a logical view point, I think you're right :-). But I'd like to have this out of the way before debugging the rest.