[00:00:00] wikidata replag on z-dat-s5-b is OK: QUERY OK: SELECT ts_rc_age() returned 1783.000000 [00:00:31] MySQL slave on z-dat-s5-b is OK: Uptime: 171190 Threads: 5 Questions: 754177891 Slow queries: 1904 Opens: 105188 Flush tables: 1 Open tables: 256 Queries per second avg: 4405.502 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1683 [00:01:22] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [00:01:30] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 164351.000000 [00:01:40] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 35450 [00:01:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:16:38] [[Main Page/lang]] ! https://wiki.toolserver.org/w/index.php?diff=7851&oldid=7844&rcid=21657 * 176.55.165.224 * (-5) (parmaklararasindan) [00:18:14] [[Main Page/lang]] M https://wiki.toolserver.org/w/index.php?diff=7852&oldid=7851&rcid=21658 * Betacommand * (+5) (Reverted edits by [[Special:Contributions/176.55.165.224|176.55.165.224]] ([[User talk:176.55.165.224|talk]]) to last revision by [[User:Chihonglee|Chihonglee]]) [00:26:07] night, ts [00:27:50] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [00:30:10] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54794 MB (5% inode=99%): [00:35:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [00:44:39] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [00:46:21] / on ortelius is WARNING: DISK WARNING - free space: / 4038 MB (13% inode=92%): [00:47:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [00:47:30] MySQL slave on z-dat-s5-b is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2118 [00:48:01] wikidata replag on
z-dat-s5-b is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2138.000000 [00:51:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 169886 MB (3% inode=67%): [00:53:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [00:53:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 81000 MB (13% inode=99%): [00:55:10] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 140801.000000 [01:01:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [01:01:30] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 167260.000000 [01:01:40] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 38364 [01:01:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:19:00] Load avg. on ortelius is CRITICAL: CRITICAL - load average: 11.92, 26.34, 16.37 [01:20:00] Load avg. on ortelius is WARNING: WARNING - load average: 7.54, 22.55, 15.66 [01:24:00] Load avg. on ortelius is OK: OK - load average: 4.97, 13.25, 13.34 [01:27:50] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [01:33:20] Free Memory on damiana is WARNING: WARNING - 6.2% (520940 kB) free! [01:35:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [01:40:20] Free Memory on damiana is CRITICAL: CRITICAL - 4.9% (409408 kB) free! 
[01:44:40] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [01:46:20] / on ortelius is WARNING: DISK WARNING - free space: / 3704 MB (12% inode=92%): [01:47:11] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:47:30] MySQL slave on z-dat-s5-b is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3045 [01:48:00] wikidata replag on z-dat-s5-b is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3070.000000 [01:51:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 169839 MB (3% inode=67%): [01:53:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [01:53:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78879 MB (12% inode=99%): [01:55:10] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 142809.000000 [02:01:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [02:01:30] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 170298.000000 [02:01:40] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 40576 [02:01:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[02:04:30] MySQL slave on z-dat-s5-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3644 [02:05:00] wikidata replag on z-dat-s5-b is CRITICAL: (Service Check Timed Out) [02:25:10] wikidata replag on z-dat-s5-b is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3576.000000 [02:25:30] MySQL slave on z-dat-s5-b is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3551 [02:27:50] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [02:35:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [02:40:20] Free Memory on damiana is CRITICAL: CRITICAL - 1.9% (160404 kB) free! [02:44:30] MySQL slave on z-dat-s5-b is OK: Uptime: 181030 Threads: 6 Questions: 836683446 Slow queries: 2077 Opens: 106338 Flush tables: 1 Open tables: 255 Queries per second avg: 4621.794 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1720 [02:44:40] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [02:45:10] wikidata replag on z-dat-s5-b is OK: QUERY OK: SELECT ts_rc_age() returned 1654.000000 [02:46:20] / on ortelius is WARNING: DISK WARNING - free space: / 3385 MB (11% inode=92%): [02:47:11] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:51:02] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 167458 MB (3% inode=67%): [02:53:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [02:53:40] /sql on ptolemy is WARNING: DISK WARNING - 
free space: /sql 80598 MB (13% inode=99%): [02:55:10] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 143174.000000 [03:01:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [03:01:30] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 173197.000000 [03:01:40] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 41985 [03:01:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:11:20] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 3254 MB (10% inode=92%): [03:12:19] / on ortelius is WARNING: DISK WARNING - free space: / 5502 MB (18% inode=92%): [03:27:50] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [03:35:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [03:40:20] Free Memory on damiana is CRITICAL: CRITICAL - 1.6% (134940 kB) free! 
[03:44:39] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [03:47:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:51:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 167383 MB (3% inode=67%): [03:53:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [03:53:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80421 MB (13% inode=99%): [03:55:10] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 143689.000000 [04:01:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [04:01:30] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 176270.000000 [04:01:40] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 43455 [04:01:55] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:12:20] / on ortelius is WARNING: DISK WARNING - free space: / 5225 MB (17% inode=92%): [04:27:50] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [04:35:30] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [04:40:20] Free Memory on damiana is CRITICAL: CRITICAL - 1.9% (159360 kB) free! 
[04:44:40] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [04:47:10] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:51:02] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 167227 MB (3% inode=67%): [04:53:00] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [04:53:40] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80221 MB (13% inode=99%): [04:55:10] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 145008.000000 [05:01:40] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:01:54] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 45640 [05:01:54] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:02:35] wikidata replag on daphne is CRITICAL: (Service Check Timed Out) [05:08:58] TS downish? [05:10:31] What are you noticing? [05:11:17] my ircbots are timing out [05:11:32] [12:07:30 AM] legobot (legoktm@wikimedia/bot/legobot) left IRC. (Ping timeout: 272 seconds) [05:11:32] [12:08:49 AM] legobot2 (legoktm@wikimedia/bot/legobot) left IRC. (Ping timeout: 252 seconds) [05:11:55] and one stopped responding so it should timeout shortly [05:13:25] [12:11:44 AM] i-am-legobot (legoktm@wikimedia/bot/legobot) left IRC. (Ping timeout: 255 seconds) [05:15:15] willow is asking for a password when i ssh in [05:15:26] oh well [05:25:27] [[unc:UN:N]] ;-) [05:26:01] ? 
[05:26:11] o.O [05:26:14] :PPP [05:36:34] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:36:34] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:38:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:38:14] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [05:39:09] hmmm [05:39:15] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:15] PING on ptolemy is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [05:39:15] SMTP on rosemary is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:26] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:26] SMTP on cassia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:27] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:28] SMTP on hemlock is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:29] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:35] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:36] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:39] SMTP on ptolemy is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:40] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:41] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:39:42] SMTP on thyme is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:41:46] SMTP on adenia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:41:46] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:41:47] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[05:41:47] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:42:07] SMTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:06] DiskSuite on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:45:06] SMTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:06] Environment IPMI on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:45:06] ts-array5 on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:45:06] /tmp on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:45:06] SMTP on daphne is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:15] NTP on damiana is CRITICAL: NTP CRITICAL: No response from NTP server [05:45:16] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2672.000000 [05:45:29] / on ortelius is WARNING: DISK WARNING - free space: / 5045 MB (16% inode=92%): [05:45:29] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2652.000000 [05:45:30] SSH on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:45:30] PING on yucca is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [05:45:58] DiskSuite on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:45:59] Load avg. on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:45:59] / on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[05:46:28] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2704.000000 [05:46:28] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2708.000000 [05:46:29] s5 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2718.000000 [05:46:30] wikidata replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2741.000000 [05:46:39] wikidata replag on z-dat-s6-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2745.000000 [05:46:40] wikidata replag on z-dat-s7-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2744.000000 [05:46:42] wikidata replag on z-dat-s5-b is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2719.000000 [05:46:43] s1 replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2719.000000 [05:46:44] wikidata replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2758.000000 [05:46:56] SMTP on ortelius is OK: SMTP OK - 7.868 sec. response time [05:46:57] DiskSuite on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[06:08:56] PING on ptolemy is OK: PING OK - Packet loss = 0%, RTA = 0.28 ms [06:09:02] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4099.000000 [06:09:03] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [06:09:03] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4064.000000 [06:09:13] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4070.000000 [06:09:13] MySQL slave on cassia is CRITICAL: (Return code of 139 is out of bounds) [06:09:13] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4082.000000 [06:09:13] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4108.000000 [06:09:14] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4085.000000 [06:09:14] MySQL slave on z-dat-s7-a is CRITICAL: (Return code of 139 is out of bounds) [06:09:14] MySQL slave on z-dat-s6-a is CRITICAL: (Return code of 139 is out of bounds) [06:09:14] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4105.000000 [06:09:15] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4104.000000 [06:09:22] MySQL slave on thyme is CRITICAL: (Return code of 139 is out of bounds) [06:09:22] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [06:09:22] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4077.000000 [06:09:23] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4076.000000 [06:09:23] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4115.000000 [06:09:23] MySQL slave on z-dat-s3-a is CRITICAL: (Return code of 139 is out of bounds) [06:09:23] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [06:09:32] / on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [06:10:31] /tmp on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:10:41] / on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:11:22] PING on yucca is OK: PING OK - Packet loss = 0%, RTA = 0.28 ms [06:11:41] SMTP on daphne is OK: SMTP OK - 5.017 sec. response time [06:11:41] Free Memory on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:13:41] /tmp on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:15:21] PostgreSQL on ptolemy is CRITICAL: (Return code of 137 is out of bounds) [06:16:11] LDAP on ha-ldap.esi is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:31] Load avg. on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:17:31] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [06:17:32] Environment IPMI on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:17:41] PING on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [06:17:41] Load avg. on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:18:50] SMTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:18:51] SMTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:19:32] PING on ortelius is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [06:21:21] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:22:24] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 167193 MB (3% inode=67%): [06:22:58] Environment IPMI on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:23:57] SMTP on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:24:19] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [06:24:48] Sun Grid Engine execd on yarrow is UNKNOWN: Execution timeout exceeded [06:25:18] SMTP on hemlock is OK: SMTP OK - 5.008 sec. response time [06:25:28] MySQL slave on daphne is CRITICAL: (Return code of 139 is out of bounds) [06:25:28] s4 replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 5045.000000 [06:25:28] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80158 MB (13% inode=99%): [06:26:28] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 150152.000000 [06:26:48] DiskSuite on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:28:38] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:32:18] SMTP on hemlock is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:32:47] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:33:09] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [06:33:48] / on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:34:38] ts-array5 on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:34:57] why am I not able to access toolserver today [06:35:48] /tmp on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[06:38:51] PING on yucca is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [06:39:51] SMTP on rosemary is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:00] SMTP on cassia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:00] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:01] SMTP on ptolemy is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:01] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:01] SMTP on thyme is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:01] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:01] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:02] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:02] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:03] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:40:41] not able to access toolserver today [06:42:58] SMTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:43:08] SMTP on adenia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:43:17] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:43:20] What are you trying to access? [06:43:44] Hey Susan. Looks like the IRC bots that are running from the toolserver are also down [06:43:45] :( [06:44:05] Like StewardBot and the ones elsewhere. [06:45:58] SSH on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:46:07] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:46:23] Mine is working. 
[06:46:28] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [06:46:28] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:46:28] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 185416.000000 [06:46:34] Hmm [06:46:58] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:47:08] / on ortelius is WARNING: DISK WARNING - free space: / 4843 MB (16% inode=92%): [06:47:27] Environment IPMI on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:48:18] Load avg. on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:48:27] ts-array5 on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:49:57] LDAP on ha-ldap.esi is OK: LDAP OK - 1.335 seconds response time [06:51:07] @replag [06:54:27] Load avg. on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:55:00] can't use t-paris edit count tool today [06:56:17] Load avg. on damiana is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:56:36] SSH on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:56:58] SMTP on damiana is OK: SMTP OK - 9.476 sec. response time [06:57:08] / on damiana is OK: DISK OK - free space: / 43454 MB (60% inode=95%): [06:57:08] /tmp on damiana is OK: DISK OK - free space: /tmp 6914 MB (99% inode=99%): [06:58:27] SSH on fsw1-n1-oe16-esams.mgmt is OK: SSH OK - OpenSSH_5.2 (protocol 2.0) [07:01:07] SMTP on hemlock is OK: SMTP OK - 8.137 sec. 
response time [07:03:58] SSH on damiana is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [07:04:27] ts-array5 on damiana is OK: 2/2 paths are active [07:05:37] SSH on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:06:27] PING on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [07:08:07] SMTP on hemlock is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:09:17] MySQL slave on z-dat-s7-a is CRITICAL: (Return code of 139 is out of bounds) [07:09:17] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7705.000000 [07:09:47] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7710.000000 [07:09:47] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [07:09:47] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7745.000000 [07:09:57] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7715.000000 [07:09:58] MySQL slave on thyme is CRITICAL: (Return code of 139 is out of bounds) [07:09:58] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7755.000000 [07:09:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7718.000000 [07:09:58] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [07:10:17] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7743.000000 [07:10:17] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7766.000000 [07:10:18] MySQL slave on z-dat-s3-a is CRITICAL: (Return code of 139 is out of bounds) [07:10:18] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7740.000000 [07:10:18] MySQL slave on cassia is CRITICAL: (Return code of 139 is out of bounds) [07:10:18] MySQL slave on z-dat-s6-a is CRITICAL: (Return code of 139 is out of bounds) [07:10:18] wikidata replag 
on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7764.000000 [07:10:18] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7733.000000 [07:10:19] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [07:10:58] SSH on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:11:27] ts-array5 on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:12:27] Free Memory on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:15:58] PostgreSQL on ptolemy is CRITICAL: (Return code of 137 is out of bounds) [07:19:12] It seems like willow is down [07:19:17] but nightshade bots are still up [07:19:19] I think. [07:22:58] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 168127 MB (3% inode=67%): [07:24:57] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [07:25:27] SMTP on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:25:58] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 80051 MB (13% inode=99%): [07:26:57] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 153785.000000 [07:27:26] DiskSuite on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:32:26] SMTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:32:58] SMTP on wolfsbane is OK: SMTP OK - 0.008 sec. response time [07:33:08] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [07:33:27] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [07:33:27] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[07:33:57] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [07:35:17] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [07:36:07] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:39:57] SMTP on rosemary is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:39:58] SMTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:07] SMTP on ptolemy is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:07] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:07] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:08] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:08] SMTP on cassia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:08] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:08] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:09] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:17] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:40:17] SMTP on thyme is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:42:27] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [07:43:07] SMTP on adenia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:44:28] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:37] SMTP on mayapple is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:47:08] NTP on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:47:27] Environment IPMI on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:47:57] SMTP on damiana is OK: SMTP OK - 0.987 sec. response time [07:49:07] SMTP on hemlock is OK: SMTP OK - 5.026 sec. response time [07:50:27] Load avg. 
on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:53:27] Load avg. on damiana is OK: OK - load average: 1.85, 7.95, 8.32 [07:53:37] SSH on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:56:07] SMTP on hemlock is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:07] SMTP on hemlock is OK: SMTP OK - 5.012 sec. response time [08:03:58] / on ortelius is WARNING: DISK WARNING - free space: / 4516 MB (15% inode=92%): [08:03:58] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:04:07] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:04:57] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:06:27] PING on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [08:09:57] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11317.000000 [08:09:57] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11314.000000 [08:09:57] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11349.000000 [08:09:57] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [08:09:57] MySQL slave on z-dat-s4-a is CRITICAL: (Return code of 139 is out of bounds) [08:09:58] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11359.000000 [08:09:58] MySQL slave on thyme is CRITICAL: (Return code of 139 is out of bounds) [08:09:59] MySQL slave on z-dat-s7-a is CRITICAL: (Return code of 139 is out of bounds) [08:09:59] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11351.000000 [08:10:08] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11323.000000 [08:10:57] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11389.000000 [08:10:58] wikidata replag on z-dat-s5-b is CRITICAL: QUERY 
CRITICAL: SELECT ts_rc_age() returned 11380.000000 [08:10:58] MySQL slave on cassia is CRITICAL: (Return code of 139 is out of bounds) [08:10:58] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11415.000000 [08:10:58] s5 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11392.000000 [08:10:58] MySQL slave on z-dat-s3-a is CRITICAL: (Return code of 139 is out of bounds) [08:10:58] MySQL slave on z-dat-s6-a is CRITICAL: (Return code of 139 is out of bounds) [08:10:59] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [08:11:07] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11414.000000 [08:11:27] ts-array5 on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:11:57] SSH on damiana is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:12:27] Free Memory on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:14:37] MySQL slave on daphne is CRITICAL: (Return code of 139 is out of bounds) [08:14:37] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 190700.000000 [08:14:37] s4 replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11594.000000 [08:15:58] PostgreSQL on ptolemy is CRITICAL: (Return code of 137 is out of bounds) [08:16:27] Sun Grid Engine execd on yarrow is UNKNOWN: Execution timeout exceeded [08:17:27] SMTP on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:18:27] SMTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:23:57] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 168069 MB (3% inode=67%): [08:23:58] SSH on damiana is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:24:57] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null 
:OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [08:25:58] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79944 MB (13% inode=99%): [08:27:07] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 157386.000000 [08:27:27] DiskSuite on damiana is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:29:16] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:29:27] SSH on fsw1-n1-oe16-esams.mgmt is OK: SSH OK - OpenSSH_5.2 (protocol 2.0) [08:30:27] PING on asw-oe10-esams.mgmt is CRITICAL: CRITICAL - Plugin timed out after 10 seconds [08:33:17] PING on asw-oe10-esams.mgmt is OK: PING OK - Packet loss = 0%, RTA = 0.42 ms [08:33:27] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:34:08] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [08:34:27] SMTP on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:36:27] ts-array5 on damiana is OK: 2/2 paths are active [08:36:37] SSH on fsw1-n1-oe16-esams.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:37:56] hello can some one help us since our whois tools via the toolserver seem to be broke [08:38:18] broken* [08:38:29] broke [08:38:38] SSH on fsw1-n1-oe16-esams.mgmt is OK: SSH OK - OpenSSH_5.2 (protocol 2.0) [08:38:48] SMTP on rosemary is OK: SMTP OK - 0.017 sec. response time [08:38:48] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.017 second response time [08:39:06] SMTP on cassia is OK: SMTP OK - 0.015 sec. response time [08:39:06] SMTP on adenia is OK: SMTP OK - 0.005 sec. response time [08:39:06] SMTP on z-dat-s6-a is OK: SMTP OK - 0.013 sec. response time [08:39:06] SMTP on hyacinth is OK: SMTP OK - 0.018 sec. 
response time [08:39:06] SMTP on z-dat-s4-a is OK: SMTP OK - 0.018 sec. response time [08:39:06] SMTP on z-dat-s7-a is OK: SMTP OK - 0.111 sec. response time [08:39:06] SMTP on z-dat-s3-a is OK: SMTP OK - 0.023 sec. response time [08:39:07] SMTP on thyme is OK: SMTP OK - 0.007 sec. response time [08:39:07] SMTP on ptolemy is OK: SMTP OK - 0.055 sec. response time [08:39:08] SMTP on nightshade is OK: SMTP OK - 0.019 sec. response time [08:39:08] SMTP on mayapple is OK: SMTP OK - 0.007 sec. response time [08:39:09] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [08:39:13] PING on fsw1-n1-oe16-esams.mgmt is OK: PING OK - Packet loss = 0%, RTA = 0.47 ms [08:39:45] thank you it works again [08:40:08] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:40:08] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:40:08] MySQL slave on cassia is OK: Uptime: 1773435 Threads: 8 Questions: 1617490770 Slow queries: 33615 Opens: 83298 Flush tables: 2 Open tables: 14103 Queries per second avg: 912.66 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [08:40:48] SSH on damiana is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:40:58] Free Memory on damiana is OK: OK - 88.4% (7404360 kB) free. 
[08:41:09] Environment IPMI on damiana is OK: ok: temperature ok fan ok voltage ok chassis ok [08:42:02] DiskSuite on damiana is OK: OK - No disk failures detected [08:42:02] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:43:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:44:01] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [08:44:08] s5 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1703.000000 [08:45:09] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:45:18] Sun Grid Engine execd on nightshade is UNKNOWN: Execution timeout exceeded [08:46:09] SSH on yarrow is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:46:10] SSH on nightshade is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:49:49] NTP on damiana is OK: NTP OK: Offset 0.001935 secs [08:52:02] SSH on nightshade is OK: SSH OK - OpenSSH_5.5p1 Debian-6+squeeze3 (protocol 2.0) [08:52:02] Sun Grid Engine execd on nightshade is OK: Host and Queues Ok [08:55:01] SSH on yarrow is OK: SSH OK - OpenSSH_5.5p1 Debian-6+squeeze3 (protocol 2.0) [08:55:01] Sun Grid Engine execd on yarrow is OK: Host and Queues Ok [08:58:28] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [09:01:09] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3448 [09:04:01] / on ortelius is WARNING: DISK WARNING - free space: / 4223 MB (14% inode=92%): [09:07:09] MySQL slave on z-dat-s7-a is OK: Uptime: 2839469 Threads: 9 Questions: 1349591851 Slow queries: 61076 Opens: 4405319 Flush tables: 1 Open tables: 6916 Queries per second avg: 475.297 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1522 [09:10:01] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12105.000000 [09:10:09] MySQL slave on 
z-dat-s4-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 6557 [09:10:09] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14970.000000 [09:10:09] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 6206 [09:10:09] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14962.000000 [09:10:38] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 6137.000000 [09:10:49] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 8000 [09:10:49] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15006.000000 [09:10:49] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7999.000000 [09:11:01] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.) [09:11:10] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14991.000000 [09:11:10] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [09:11:10] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15000.000000 [09:11:10] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15026.000000 [09:11:10] MySQL slave on z-dat-s3-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 9529 [09:11:10] MySQL slave on z-dat-s6-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 8693 [09:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15044.000000 [09:14:38] MySQL slave on daphne is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 11619 [09:14:38] s4 replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11624.000000 [09:14:38] wikidata replag on daphne is 
CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 194301.000000 [09:21:09] MySQL slave on thyme is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3535 [09:21:38] s1 replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3451.000000 [09:24:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 167437 MB (3% inode=67%): [09:25:01] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [09:26:00] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79591 MB (13% inode=99%): [09:26:09] MySQL slave on z-dat-s4-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3416 [09:27:09] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 160996.000000 [09:33:38] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:34:15] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [09:34:15] MySQL slave on z-dat-s4-a is OK: Uptime: 4962108 Threads: 2 Questions: 482687793 Slow queries: 244 Opens: 669 Flush tables: 11 Open tables: 224 Queries per second avg: 97.274 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1183 [09:34:49] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3593 [09:34:49] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3582.000000 [09:38:09] MySQL slave on thyme is OK: Uptime: 1417164 Threads: 9 Questions: 987637800 Slow queries: 100129 Opens: 7910 Flush tables: 1 Open tables: 526 Queries per second avg: 696.911 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1768 [09:38:38] s1 replag on thyme is OK: QUERY OK: SELECT ts_rc_age() returned 1724.000000 [09:40:00] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [09:40:18] SMTP on z-dat-s5-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:40:19] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:42:09] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [09:43:09] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [09:44:00] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [09:45:18] SMTP on yucca is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:52:49] MySQL slave on rosemary is OK: Uptime: 1431329 Threads: 14 Questions: 1235181877 Slow queries: 773602 Opens: 90023 Flush tables: 1 Open tables: 3584 Queries per second avg: 862.961 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1772 [09:52:49] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1772.000000 [09:56:19] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3577 [09:58:49] Virtual disks on 
far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [09:59:20] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3534 [10:00:19] SMTP on yucca is OK: SMTP OK - 0.008 sec. response time [10:00:20] SMTP on z-dat-s5-b is OK: SMTP OK - 0.005 sec. response time [10:04:09] / on ortelius is WARNING: DISK WARNING - free space: / 3693 MB (12% inode=92%): [10:10:20] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18574.000000 [10:10:20] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18568.000000 [10:10:49] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18605.000000 [10:10:59] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 8588.000000 [10:11:20] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [10:11:21] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18596.000000 [10:11:21] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18630.000000 [10:11:21] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18606.000000 [10:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18644.000000 [10:12:20] MySQL slave on z-dat-s3-a is OK: Uptime: 979977 Threads: 19 Questions: 1341598126 Slow queries: 58836 Opens: 12137806 Flush tables: 1 Open tables: 16384 Queries per second avg: 1369.9 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1629 [10:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 197900.000000 [10:14:39] s4 replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7836.000000 [10:14:39] MySQL slave on daphne is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes 
Seconds Behind Master: 7837 [10:24:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 165448 MB (3% inode=67%): [10:25:59] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [10:26:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 79299 MB (13% inode=99%): [10:27:20] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 164600.000000 [10:27:21] MySQL slave on z-dat-s6-a is OK: Uptime: 2844275 Threads: 6 Questions: 1319484316 Slow queries: 140547 Opens: 4650861 Flush tables: 1 Open tables: 4241 Queries per second avg: 463.908 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1798 [10:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:34:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [10:40:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [10:40:29] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:42:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:43:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [10:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [10:52:09] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 3287 MB (10% inode=92%): [10:54:38] MySQL slave on daphne is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3567 [10:54:38] s4 replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3574.000000 [10:58:00] Load avg. on ortelius is WARNING: WARNING - load average: 17.76, 18.91, 14.44 [11:02:59] Load avg. 
on ortelius is OK: OK - load average: 13.18, 14.74, 13.82 [11:10:48] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22205.000000 [11:10:59] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4534.000000 [11:11:20] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22235.000000 [11:11:20] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22231.000000 [11:11:20] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22196.000000 [11:11:20] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [11:11:20] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22205.000000 [11:11:21] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22228.000000 [11:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22244.000000 [11:13:38] MySQL slave on daphne is OK: Uptime: 6441502 Threads: 12 Questions: 1624016671 Slow queries: 273100 Opens: 294263 Flush tables: 1 Open tables: 1661 Queries per second avg: 252.117 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1740 [11:13:38] s4 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 1740.000000 [11:14:02] [[Special:Log/newusers]] create 10 * Erwin Mulialim * (New user account) [11:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 201501.000000 [11:25:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 165015 MB (3% inode=67%): [11:26:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [11:26:20] /sql on ptolemy 
is WARNING: DISK WARNING - free space: /sql 79033 MB (12% inode=99%): [11:27:20] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 168200.000000 [11:28:28] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [11:28:59] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3597.000000 [11:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:34:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [11:38:31] @replag [11:38:31] Danny_B: s1-rr-a-wd: 6h 37m 54s [+1.00 s/s]; s1-user-c: 52m 27s [-0.37 s/s]; s1-user-wd: 6h 37m 50s [+1.00 s/s]; s2-rr: 6h 37m 15s [+1.00 s/s]; s2-user: 6h 37m 15s [+0.99 s/s]; s2-user-c: error; s2-user-wd: 1d 22h 54m 40s [+0.99 s/s]; s3-user: 1m 44s [+0.99 s/s] [11:38:32] Danny_B: s3-user-wd: 6h 37m 50s [+0.99 s/s]; s4-user-wd: 2d 8h 22m 24s [+0.99 s/s]; s5-user: 6h 37m 27s [+0.99 s/s]; s5-user-c: 2w 3d 22h 4m 13s [+0.99 s/s]; s6-user-wd: 6h 37m 47s [+0.99 s/s]; s7-user-wd: 6h 37m 47s [+1.00 s/s] [11:39:14] why is the s2 delayed? were there any issues? 
[11:40:29] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:41:21] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:42:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [11:44:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [11:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [11:47:29] I've a job waiting since hour, qstat -j give me [11:47:33] scheduling info: queue instance "longrun-lx@yarrow.toolserver.org" dropped because it is temporarily not available [11:47:33] queue instance "longrun-lx@nightshade.toolserver.org" dropped because it is temporarily not available [11:52:09] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 2781 MB (9% inode=92%): [11:55:12] phe, I'm looking and don't see why they may be temporarily not available [11:55:59] s4 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1764.000000 [11:56:27] hmm.. queues longrun-lx@yarrow. 
and longrun-lx@nightshade seem to be in Error state [11:58:36] longrun-lx@nightshade.toolserv BI 0/6/64 3.40 linux-x64 E [11:58:37] queue longrun-lx marked QERROR as result of job 1560048's failure at host nightshade.toolserver.org [11:58:37] queue longrun-lx marked QERROR as result of job 1560050's failure at host nightshade.toolserver.org [11:58:37] --------------------------------------------------------------------------------- [11:58:37] longrun-lx@yarrow.toolserver.o BI 0/2/64 0.14 linux-x64 E [11:58:37] queue longrun-lx marked QERROR as result of job 1560048's failure at host yarrow.toolserver.org [11:58:37] queue longrun-lx marked QERROR as result of job 1560050's failure at host yarrow.toolserver.org [11:59:42] both of them are /home/alchimista/ircBots/albeth.py [11:59:53] error reason 1: 03/14/2013 08:43:01 [0:26189]: can't set additional group id (uid=0, euid=0): Cannot allocate memory [12:00:18] the server seems to have enough free memory now [12:02:04] ok, I cleaned it [12:02:16] phe, see if it works now [12:03:28] Platonides, it started, ty [12:03:52] yw [12:04:17] for the record, the command was: sudo /sge/GE/bin/linux-x64/qmod -c '*' [12:06:40] btw I received a mail from nosy, then from Frederic Leva saying I was using to much cpu time on nightshade, but my script use 1 thread per proc and the main thread send SIGSTOP/SIGCONT to child to allow them to run only in idle time, is this an allowed way to use free cpu time ? [12:06:47] *too much [12:07:22] how are you detecting if it's run in idle time? [12:07:41] stopping / continuing the process automatically seems odd [12:07:54] why not just run it with nice? 
[12:08:26] Platonides, I'm using the first line of /proc/stat [12:09:20] Platonides, because it's a cpu hog that'll run for weeks, and I prefer to run it as nice level AND only in idle time [12:10:49] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25806.000000 [12:11:36] it's probably ok [12:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25844.000000 [12:11:57] what are you running ? [12:12:20] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25894.000000 [12:12:20] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25890.000000 [12:12:20] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [12:12:20] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25856.000000 [12:12:20] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25866.000000 [12:12:20] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25887.000000 [12:12:47] Platonides, tesseract, doing hocr of wikisource book http://en.wikipedia.org/wiki/Hocr [12:13:21] Platonides, tesseract uses little memory (20/30 MB per process) and little IO but needs tons of cpu time [12:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 205102.000000 [12:16:10] I would have expected tesseract to be more memory intensive [12:21:33] tesseract is more clever than imagemagick on memory use, mostly because tesseract can run on small devices [12:24:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 164039 MB (3% inode=67%): [12:25:03] phe: How does it work? Is it fed one multi-page TIFF and then spits out data? Do I understand it correctly that presently you don't use SGE?
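(Editor's note: the idle-time approach phe describes above, sampling the first line of /proc/stat and sending SIGSTOP/SIGCONT to the worker processes, could look roughly like the sketch below. phe's actual script is not shown in this log, so every function name, the 0.5 idle threshold and the one-second sampling interval are assumptions made for illustration only.)

```python
import os
import signal
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Counters are: user nice system idle iowait irq softirq steal [guest ...].
    # Guest time is already included in user/nice, so only the first 8 are summed.
    ticks = [int(value) for value in fields[1:9]]
    return ticks[3], sum(ticks)


def idle_fraction(before, after):
    """Fraction of CPU time spent idle between two samples of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return (idle1 - idle0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def throttle_once(child_pids, threshold=0.5, interval=1.0):
    """Sample CPU usage once, then resume or suspend the children accordingly.

    The threshold and interval are hypothetical values, not taken from the log.
    """
    before = read_cpu_line()
    time.sleep(interval)
    after = read_cpu_line()
    if idle_fraction(before, after) >= threshold:
        sig = signal.SIGCONT  # machine looks idle: let the workers run
    else:
        sig = signal.SIGSTOP  # machine looks busy: suspend the workers
    for pid in child_pids:
        os.kill(pid, sig)
    return sig
```

(One caveat with this naive version: once the workers are running, their own CPU use lowers the measured idle fraction, so a real controller would have to discount it or the workers would keep getting stopped and restarted.)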
[12:26:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78790 MB (12% inode=99%): [12:26:47] scfc_de, it run through SGE, support only djvu file format, extract one image, run tesseract on one image and so on, this with one subprocess per processor [12:27:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [12:27:54] Subprocess = one image? [12:28:20] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 171860.000000 [12:28:29] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [12:28:44] yeps, tesseract support multipage file only with tiff file [12:29:10] (Disclaimer: I have absolutely no idea of tesseract :-).) [12:29:22] Do you run tesseract as just a normal process or as SGE? [12:29:33] I just wonder because mayapple is idling away ... :-) [12:29:50] mayapple has a little nimber of cpus :) [12:29:54] *number [12:31:03] scfc_de, subprocess run as normal process [12:31:31] yarrow isn't very busy as well. I just fear that if you have one job that runs for weeks it gets scheduled on nightshade alone and so the existing CPU isn't used very well. [12:31:37] phe: Is the source available somewhere? [12:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:34:30] scfc_de, sort of https://svn.toolserver.org/svnroot/phe/trunk/ task scheduling is in /common/task_scheduler.py ocr is in /ocr , highlevel stuff is in /wshocr [12:34:50] I'll take a look, thanks. [12:34:55] it's a mess... 
[12:35:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [12:35:36] :-) [12:40:29] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:42:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [12:42:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [12:44:08] phe: Wow, that's complicated. So you have some CGI that passes commands from and to your daemon via some local files? [12:44:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [12:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [12:45:39] Oh, sockets even. [12:45:48] scfc_de, no local file, there is a local file to save the queued request if the daemon die but hocr.php use a socket to talk to the daemon so the service provided by wshocr.py can run on another box than the web server [12:47:13] the daemon get the request, queue them and run only one request at a time [12:48:48] scfc_de, using socket ensure also only one daemon is running, else it get a "socket in use" at startup [12:51:55] phe: I believe you should be able to use "qsub" from PHP directly (and "qstat" for status). I don't know if that's necessary to reduce the load, but if it were my script, it certainly would ease the headache of having a PHP script that talks to a daemon which then starts processes, watched the load and suspends them, etc. :-). But I don't know if I were to "fix" this on Toolserver. [12:52:05] Do you require the replicated databases? 
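(Editor's note: phe's point that the listening socket also guarantees a single running daemon can be sketched as below. This is a minimal illustration, not the code in wshocr.py; the function name, the loopback address and the port handling are all assumptions.)

```python
import errno
import socket


def claim_port(port, host="127.0.0.1"):
    """Try to become the only daemon by binding the service port.

    Returns a listening socket on success, or None if another process
    already holds the port (the "socket in use" case from the chat).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR is deliberately not set, so a second bind fails.
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    sock.listen(5)
    return sock


if __name__ == "__main__":
    first = claim_port(0)  # port 0 lets the OS pick a free port for the demo
    port = first.getsockname()[1]
    second = claim_port(port)
    print("second instance refused" if second is None else "second instance started")
    first.close()
```

(The same socket then serves the requests coming from the web front end, which is why the daemon can live on a different machine than the web server, as phe says.)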
[12:52:09] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 2304 MB (7% inode=92%): [12:52:36] scfc_de, no database access with this tool [12:56:11] scfc_de, one of the queue is a fast interactive service to get the hocr cached on the TS for a given page, using qsub/qstat will be slow, actually it is used that way ; user double click on a word (javascript follow) which request the hocr for this page, then from the hocr I can get the bounding box of this word and highlight in the image [12:56:32] Sometime within the next year :-), you'll probably migrate to Labs. That might be a good opportunity to redesign the whole thing and reduce the complexity :-). [12:57:07] Well, you can just use normal exec for that. [12:57:45] Heyas [12:57:54] scfc_de: Both bugs resolved, btw [12:58:00] (Or the admins could define an extra SGE queue for your fasttrack jobs.) [12:58:31] Coren: Thanks! [12:59:45] phe: Where on Wikisource can you see it in action? [13:02:03] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53385 MB (5% inode=99%): [13:04:11] scfc_de not yet in production, I'm waiting for enough hocr to be ready, you can test it by adding to your common.js on fr.ws [13:04:19] var server = mw.config.get('wgServer'); [13:04:19] jQuery.getScript(server + '/w/index.php?title=User:Phe/hocr.js&action=raw&ctype=text/javascript&dontcountme=s'); [13:04:42] then try it on [13:04:43] http://fr.wikisource.zaniah.virgus/wiki/Page:Mazi%C3%A8res_-_Parall%C3%A8le_entre_la_fi%C3%A8vre_typho%C3%AFde_de_l%E2%80%99homme_et_la_thyphose_des_animaux.djvu/21 [13:05:01] double click on work either in edit mode or view mode [13:05:51] -on work +on a word [13:10:18] phe: Wow! That looks really cool. I already noticed two bugs, though :-): If you zoom in or move the image, the highlighted region doesn't zoom/move as well, and wrapped words don't get highlighted (e. g., "l’intérieur"). But: Very impressive!
[13:10:31] the hocr link is a fake things, request is done on the first double click [13:10:49] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 23795.000000 [13:11:12] Coren: What was the SGE command you recently me about ("ssh $SOMEWHERECOZY")? "qexec"? [13:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21498.000000 [13:11:40] scfc_de, and the way I instrument html to implement double click on a word break completely all scfc_de: qrsh is probably the one you want, or qlogin if you need a pty [13:12:19] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 29456.000000 [13:12:20] wikidata replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 16823.000000 [13:12:20] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [13:12:20] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 19102.000000 [13:12:20] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 26773.000000 [13:12:20] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 19065.000000 [13:12:33] scfc_de: I.e. "qrsh something" vs. just "qlogin" [13:14:26] phe: I stand by my "Wow!" :-). [13:14:37] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 200108.000000 [13:16:05] scfc_de, it's intended to be used mostly with that sort of page, (not yet hocr ready), locating where a word is can be very annoying in such case : http://fr.wikisource.zaniah.virgus/wiki/Page:Michaud_-_Biographie_universelle_ancienne_et_moderne_-_1843_-_Tome_2.djvu/9 [13:17:16] phe: So as per Coren you could look at replacing the execs by "qrsh"s to spread the load more evenly. [13:18:09] phe, .zaniah.virgus ? 
[13:18:34] scfc_de: That /does/ have a possible startup delay of the order of a few seconds, scfc_de; it might not be good enough for interactive use unless it's heavy processing. [13:18:45] oups, local wiki : http://fr.wikisource.org/wiki/Page:Michaud_-_Biographie_universelle_ancienne_et_moderne_-_1843_-_Tome_2.djvu/9 [13:18:49] hehe [13:19:19] qrsh, k, but how I can know many request can send w/o flooding SGE ? [13:19:49] some book are 2500 pages, is it sensible to send 2500 request in a row to SGE ? [13:20:01] phe: the gridengine will just queue them up until it has the resources for them. [13:21:01] Although, admitedly, 2500 is a bit /large/ [13:21:52] phe, your example is failing with {"text": "unable to locate file /mnt/user-store/phe/cache/hocr/commons/commons/9c/3a/af/c171f99bf5e03a67394e14b353/page_0009.html for page Michaud - Biographie universelle ancienne et moderne - 1843 - Tome 2.djvu/9", "error": 1} [13:22:00] beside that, I'm unsure if it's simple to get the return status of each job, but I'll look at qrsh and all [13:22:10] actually, you need to view the loaded data to view it [13:22:15] the error is not shown in the interface [13:22:21] ... wait. qrsh waits until the job can start, runs it, and then exit. That's blocking. You probably want to start the jobs asynchronously instead. [13:22:46] phe: (But then again, I'm speaking without actually knowing what you're actually trying to do) :-) [13:22:51] Platonides, yeah, the hocr is not ready for this book, that's normal, it was an example where locating a word can be useful [13:23:31] Coren, there is surely a way to do that with an SGE only way [13:24:19] phe: I'd probably need to know more about what you're trying to do first before I can help with actual information rather than random guesswork. 
:-) [13:25:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 163752 MB (3% inode=67%): [13:25:26] Coren: Presently, phe schedules the jobs themselves, so using qrsh instead of exec isn't more blocking than now :-). [13:26:14] phe: You could just replace the check of /proc/stat with a simple test not to "submit" (= qrsh) more than, say, 25 jobs at any time. [13:26:50] Coren: https://svn.toolserver.org/svnroot/phe/trunk/ [13:27:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [13:27:19] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78550 MB (12% inode=99%): [13:28:19] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 171921.000000 [13:28:28] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [13:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:36:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [13:40:29] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:41:05] DaB. 
* [Toolserver-announce] Re-Setup of thyme (sql-s1-rr) [13:41:24] Hello all [13:42:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [13:42:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:44:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [13:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [13:46:18] [[Special:Log/newusers]] create 10 * ProkoschLP * (New user account) [13:53:08] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 1715 MB (5% inode=92%): [14:08:19] wikidata replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3368.000000 [14:10:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 16899.000000 [14:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 15681.000000 [14:13:19] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [14:13:19] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12412.000000 [14:13:19] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 25591.000000 [14:13:20] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 13836.000000 [14:13:20] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 33116.000000 [14:14:20] wikidata replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1492.000000 [14:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 191784.000000 [14:25:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 163019 MB (3% inode=67%): [14:27:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 78255 MB (12% inode=99%): [14:28:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null 
:OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [14:28:20] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 173430.000000 [14:28:38] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [14:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:37:19] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [14:40:28] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:43:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:43:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [14:44:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [14:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [14:53:08] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 1204 MB (4% inode=92%): [14:54:15] SGE tasks can freely access /mnt/user-store/dumps) [14:54:19] *? [15:03:20] fale: It's probably enabled on all hosts, but if you want to be sure, specify the resource "fs-user-store=1" (cf. https://wiki.toolserver.org/view/Job_scheduling#Optional_resources). [15:04:25] scfc_de: thanks :) do you know on average the resources needed to pass an xml dump? [15:06:18] i've been able to parse enwiki with 50MB and preloading [15:06:21] fale: What does "pass" mean? :-) You can use "/usr/bin/time" to determine RSS and time, or, if the job has already been executed via SGE, you can see its consumption with "qacct".
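As a rough Python counterpart to the "/usr/bin/time" suggestion above, the sketch below runs a command and reports its wall-clock time and the peak resident set size of child processes. The helper name `measure` is hypothetical and not part of any Toolserver tooling, and the unit of `ru_maxrss` depends on the platform (kilobytes on Linux), so treat the number as indicative only.

```python
import resource
import subprocess
import sys
import time


def measure(argv):
    """Run argv and return (exit_code, wall_seconds, peak_child_rss).

    peak_child_rss is the largest RSS among all child processes this
    interpreter has waited for so far, in platform-dependent units.
    """
    start = time.monotonic()
    proc = subprocess.run(argv)
    wall = time.monotonic() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, wall, peak


if __name__ == "__main__":
    # Stand-in command; replace with the real parser invocation.
    code, wall, peak = measure([sys.executable, "-c", "pass"])
    print(f"exit={code} wall={wall:.2f}s peak_rss={peak}")
```

A figure measured this way could then be used to pick values for the "virtual_free" and "h_rt" requests, with some headroom added.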
[15:06:36] i didn't bother to check how much it actually used [15:07:04] legoktm: preloading? you mean using a pipe? [15:07:27] Errr no [15:07:35] I had the program running in 2 threads [15:07:42] scfc_de: I'll try with /usr/bin/time since I've never used sge :D [15:07:45] One was parsing the dump, the other was the actual script [15:08:01] The dump parser just went as fast as it could without worrying where the script was [15:08:11] Since it was in another thread [15:08:36] legoktm: I see... but I'm only doing a parse, so I think I would not gain any advantage [15:08:45] ah ok [15:09:55] fale: Be sure to use the right output format, though, to capture RSS and time. One moment. [15:11:00] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 14025.000000 [15:11:38] wikidata replag on z-dat-s6-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 6700.000000 [15:14:38] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [15:14:38] wikidata replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4213.000000 [15:14:38] wikidata replag on z-dat-s5-b is CRITICAL: (Service Check Timed Out) [15:14:38] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 24498.000000 [15:14:38] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 9334.000000 [15:14:39] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 185658.000000 [15:14:39] I've given a qcronsub command on willow and no error has been returned, but nothing appears in qstat... why? [15:17:08] fale, it already finished? [15:17:11] fale: maybe the task was already executed? [15:17:46] Platonides: I don't think that the full itwiki dump can be parsed in a few seconds [15:18:05] DaBPunkt: this is the first time I've ever used SGE, so I don't think that's possible [15:18:08] fale: what was the job name? [15:18:15] fale, what command did you run?
[15:18:29] DaBPunkt, Platonides: qcronsub -l fs-user-store=1 -l h_rt=12:00:00 -l virtual_free=100M -l arch=* "$HOME/pywikipedia/replace.py -log -xml:/mnt/user-store/dumps/itwiki-latest-pages-articles.xml -namespace:0 -fix:si -pt:60 -savenew:/home/fale/si.log -always > /dev/null 2>&1" [15:19:19] A name is missing [15:20:08] add it with the "-N" parameter [15:20:23] DaBPunkt: ok, thanks :) [15:20:43] also you see no error message if you use " > /dev/null 2>&1" [15:21:35] DaBPunkt: that should only delete the script output, or does it delete the SGE errors too? [15:22:01] everything [15:22:26] DaBPunkt: removed and executed, and this time it does appear in qstat :) [15:22:33] thank you a lot :):) [15:22:38] if you'd like to null only the script output you have to put it in a wrapper script [15:23:46] DaBPunkt: with SGE, if I do not mute the script, will I receive a huge mail with the whole output? [15:24:07] no [15:24:23] the output will be written to a file [15:24:52] see https://wiki.toolserver.org/view/SGE_for_beginners#Notification_and_logging [15:25:09] DaBPunkt: thanks :) [15:25:20] wikidata replag on thyme is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3503.000000 [15:25:39] DaBPunkt, earlier this morning queues longrun-lx@yarrow. and longrun-lx@nightshade were in Error state [15:25:44] it was shown as caused by: [15:25:45] error reason 1: 03/14/2013 08:43:01 [0:26189]: can't set additional group id (uid=0, euid=0): Cannot allocate memory [15:25:49] don't know why [15:25:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 162191 MB (3% inode=67%): [15:26:01] I reenabled them, and they have been working fine since [15:26:29] that's a strange error [15:26:48] thanks for fixing [15:27:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77971 MB (12% inode=99%): [15:27:28] fale: what is the name of the job if I may ask?
[15:27:33] I took advantage of the operator bit for the first time [15:27:45] I didn't know if I should have noted it somewhere [15:27:47] DaBPunkt: it's regioni [15:28:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [15:28:19] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 173189.000000 [15:29:37] fale: mm, that's strange. I do not see it running [15:29:54] DaBPunkt: me neither, but it was running a few minutes ago... [15:30:25] DaBPunkt: it was Job #1561666 [15:31:14] fale: /home/fale/regioni.e1561666 [15:32:26] DaBPunkt: /var/sge/spool/execd/mayapple/job_scripts/1561666: 134: Syntax error: word unexpected (expecting ")") [15:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [15:33:53] oh oh. snow again… [15:35:52] fale: from where did you submit it? [15:36:04] DaBPunkt: login.toolserver.org [15:37:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [15:37:53] fale: right from the shell? [15:37:58] yep [15:38:15] and it works without SGE? [15:38:35] DaBPunkt: yes [15:39:17] (I've executed it last time on willow itself) [15:39:56] try to add "-l arch=sol" for testing [15:40:01] ok [15:40:28] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:42:16] fale: failed too. But not with a SGE-error.
[15:42:26] AFAIS the file you are trying to parse is not valid [15:42:50] looks like an MS-DOS file [15:43:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [15:43:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [15:43:38] wikidata replag on z-dat-s6-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3592.000000 [15:44:00] DaBPunkt: the xml file is the one the TS provides: /mnt/user-store/dumps/itwiki-latest-pages-articles.xml [15:44:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [15:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [15:44:21] fale: I spoke of pywikipedia/replace.py [15:45:03] DaBPunkt: uhm, that is the official pywikipedia/replace.py file... maybe I have to run it as python ~/pywikipedia/replace.py? [15:45:04] fale, pywikipedia/replace.py doesn't have a shebang [15:45:21] so it can't be run as "$HOME/pywikipedia/replace.py" ... [15:45:36] Platonides: I see [15:46:00] try with /usr/bin/python "$HOME/pywikipedia/replace.py" ... [15:46:11] or even better, use a shell wrapper [15:47:50] Platonides: do I have to add something special to the command, using the shell wrapper? [15:52:03] fale: special? [15:52:15] DaBPunkt: like /usr/bin/python before or something like that [15:52:20] wikidata replag on thyme is OK: QUERY OK: SELECT ts_rc_age() returned 1776.000000 [15:53:00] Load avg. on ortelius is CRITICAL: CRITICAL - load average: 9.64, 27.81, 21.49 [15:53:08] fale: yes, before the python-script (but not before the wrapper-file) [15:53:09] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 591 MB (1% inode=92%): [15:53:18] WTF?? [15:54:41] why was it medium run and now it became longrun, simply by using the wrapper and not changing the h_rt?
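The missing-shebang diagnosis above can be checked before submitting a job. The sketch below is a hypothetical helper (`has_shebang` is not part of pywikipedia or SGE) that reports whether a file starts with "#!"; a script without one is likely to be read as shell code when executed directly, which would fit the "Syntax error: word unexpected" message.

```python
import os
import tempfile


def has_shebang(path):
    """Return True if the file at path begins with the two bytes '#!'."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"#!"


if __name__ == "__main__":
    # Demonstration with two throwaway files instead of a real script.
    with tempfile.TemporaryDirectory() as tmp:
        bare = os.path.join(tmp, "bare.py")
        with open(bare, "w") as handle:
            handle.write("print('no interpreter line')\n")
        marked = os.path.join(tmp, "marked.py")
        with open(marked, "w") as handle:
            handle.write("#!/usr/bin/python\nprint('has interpreter line')\n")
        print(has_shebang(bare), has_shebang(marked))
```

If the check fails, the two options from the discussion apply: prefix the script with /usr/bin/python in the submitted command, or call it from a shell wrapper.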
[15:55:39] fale: that depends on the server load [15:55:47] fale: the medium-queues were overloaded [15:56:02] legoktm: I see, I thought it was only based on time [15:56:10] well time and server load [15:56:25] if your job runs for 2 weeks, it'll go into long-run no matter what [15:56:43] but if it's 2 minutes, it might go into short/medium/long depending on the load [15:57:02] fale: if you can remove the "-l arch=sol"-parameter there are more possible servers [15:57:09] I see :) [15:57:33] does it change something being short/medium/long? [15:57:40] pardon? [15:57:59] Load avg. on ortelius is WARNING: WARNING - load average: 10.77, 18.98, 19.32 [15:58:23] DaBPunkt: do short/medium scripts have higher priority over long scripts? Or something like that [15:58:38] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [15:59:37] fale: I guess so. But for details you have to ask Merlissimo [15:59:51] DaBPunkt: thanks :) [16:00:32] fale: jobs running at short/medium queues have higher priorities, yes [16:01:09] Merlissimo: thanks :) [16:01:26] fale: sge will choose the right queue for your job [16:01:52] there are different limits for time/memory [16:03:59] Load avg. on ortelius is OK: OK - load average: 4.11, 9.21, 14.68 [16:04:12] Merlissimo: :) thanks [16:05:56] if a person has more than one script to run, would it be better to put them in the queue together or one every X hours?
[16:06:51] fale: put them together [16:06:56] DaBPunkt: thanks :) [16:08:08] / on ortelius is WARNING: DISK WARNING - free space: / 3497 MB (11% inode=92%): [16:08:37] wikidata replag on z-dat-s6-a is OK: QUERY OK: SELECT ts_rc_age() returned 1795.000000 [16:10:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 10396.000000 [16:13:28] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 40324.000000 [16:14:20] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [16:14:20] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22535.000000 [16:14:20] wikidata replag on z-dat-s7-a is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 5278.000000 [16:14:35] is the .bashrc file disabled? [16:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 184917.000000 [16:15:12] fale: it should not be [16:15:41] why do you ask? [16:16:28] DaBPunkt: I put 2 commands in it and after logging out and back in I can't use them [16:20:38] (solved by adding a .bash_profile that includes the .bashrc) [16:23:32] the .profile, .bashrc, .bash_profile files are a special chapter per se… to make it even more fun they work differently on Linux and Solaris :( [16:26:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 161659 MB (3% inode=67%): [16:28:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [16:28:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77699 MB (12% inode=99%): [16:28:20] wikidata replag on z-dat-s2-b is CRITICAL: 
QUERY CRITICAL: SELECT ts_rc_age() returned 172550.000000 [16:29:40] DaBPunkt, they should work similarly [16:29:57] the difference probably lies in the contents of /etc/profile and /etc/bash.bashrc [16:33:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:37:18] it would be nice to be able to run more than one regex set (aka fix) in the same run of replace.py, as suggested at http://sourceforge.net/tracker/?func=detail&aid=3607815&group_id=93107&atid=603138 [16:37:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [16:40:20] wikidata replag on z-dat-s7-a is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3585.000000 [16:40:28] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:43:19] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [16:43:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [16:44:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [16:44:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [16:58:38] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [17:08:09] / on ortelius is CRITICAL: DISK CRITICAL - free space: / 3294 MB (10% inode=92%): [17:09:08] / on ortelius is WARNING: DISK WARNING - free space: / 3299 MB (11% inode=92%): [17:10:09] / on ortelius is OK: DISK OK - free space: / 9863 MB (32% inode=92%): [17:10:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7858.000000 [17:13:48] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 43942.000000 [17:14:39] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 187879.000000 [17:15:20] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [17:15:20] s4 replag on cassia 
is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21765.000000 [17:17:19] wikidata replag on z-dat-s7-a is OK: QUERY OK: SELECT ts_rc_age() returned 1790.000000 [17:25:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 161003 MB (3% inode=67%): [17:28:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [17:29:20] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77420 MB (12% inode=99%): [17:29:20] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 169436.000000 [17:34:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:37:20] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [17:40:28] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:43:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [17:43:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [17:45:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [17:45:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [17:45:33] Merlissimo: Are you there? [18:00:00] APT on nightshade is CRITICAL: APT CRITICAL: 2 packages available for upgrade (2 critical updates). 
[18:10:59] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 6298.000000 [18:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 189674.000000 [18:14:50] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 47602.000000 [18:16:19] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [18:16:19] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 19470.000000 [18:24:39] Jan_Luca: pong [18:26:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 160523 MB (3% inode=67%): [18:28:09] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [18:28:28] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [18:29:19] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 163490.000000 [18:30:26] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 77154 MB (12% inode=99%): [18:34:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[18:37:19] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [18:41:29] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:43:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [18:43:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [18:45:20] /tmp on ortelius is WARNING: DISK WARNING - free space: /tmp 2850 MB (19% inode=99%): [18:45:20] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [18:45:20] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [18:56:19] /tmp on ortelius is OK: DISK OK - free space: /tmp 3155 MB (21% inode=99%): [18:59:59] APT on nightshade is CRITICAL: APT CRITICAL: 2 packages available for upgrade (2 critical updates). [19:11:00] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4696.000000 [19:14:38] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 191946.000000 [19:14:49] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 51202.000000 [19:16:19] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [19:16:20] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 19574.000000 [19:25:59] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 160649 MB (3% inode=67%): [19:28:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (3 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.CommunicationLost.desc:S3:oob:S4:2530:S9:ts-array5:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.descri [19:28:29] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, 
far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [19:30:20] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 156824.000000 [19:31:19] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76840 MB (12% inode=99%): [19:34:49] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [19:38:19] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [19:39:19] hi DaBPunkt [19:41:28] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:20] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [19:44:20] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [19:45:19] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [19:45:19] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [19:57:00] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3594.000000 [20:00:00] APT on nightshade is CRITICAL: APT CRITICAL: 2 packages available for upgrade (2 critical updates). 
[20:14:39] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 194475.000000 [20:14:49] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 54802.000000 [20:16:19] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [20:16:20] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 16344.000000 [20:26:00] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 158914 MB (2% inode=67%): [20:28:29] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [20:29:08] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [20:30:19] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 143786.000000 [20:31:19] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 76454 MB (12% inode=99%): [20:34:50] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:38:22] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [20:41:38] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:44:22] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [20:44:23] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [20:57:37] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3380.000000 [21:00:06] APT on nightshade is CRITICAL: APT CRITICAL: 2 packages available for upgrade (2 critical updates). 
[21:01:36] wikidata replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3610.000000 [21:15:27] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 197057.000000 [21:15:46] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 58464.000000 [21:16:26] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [21:16:36] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 13012.000000 [21:16:57] ping Merlissimo [21:26:36] wikidata replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3594.000000 [21:26:56] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 157069 MB (2% inode=67%): [21:28:36] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [21:29:56] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [21:31:06] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 139738.000000 [21:31:36] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 74813 MB (12% inode=99%): [21:35:46] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[21:38:36] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [21:41:56] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:44:26] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [21:45:06] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [21:49:36] wikidata replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1787.000000 [21:51:16] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [21:51:46] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [22:00:06] APT on nightshade is CRITICAL: APT CRITICAL: 2 packages available for upgrade (2 critical updates). [22:15:28] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 193895.000000 [22:15:46] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 62064.000000 [22:16:26] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [22:16:36] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 5298.000000 [22:18:32] I will restart mysql on daphne – something is wrong there. 
[22:26:46] MySQL slave on daphne is CRITICAL: Cant connect to MySQL server on daphne (146) [22:26:56] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 154346 MB (2% inode=67%): [22:27:38] and back [22:27:46] MySQL slave on daphne is OK: Uptime: 55 Threads: 5 Questions: 8508 Slow queries: 3 Opens: 73 Flush tables: 1 Open tables: 64 Queries per second avg: 154.690 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 408 [22:28:36] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [22:29:36] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3472.000000 [22:29:56] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A: [22:31:06] wikidata replag on z-dat-s2-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 133974.000000 [22:31:36] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 69420 MB (11% inode=99%): [22:35:46] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[22:38:36] MySQL slave on z-dat-s2-b is CRITICAL: (Return code of 139 is out of bounds) [22:38:37] s4 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1714.000000 [22:41:56] SMTP on z-dat-s2-b is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:44:26] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [22:45:06] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [22:47:15] / on ortelius is WARNING: DISK WARNING - free space: / 6203 MB (20% inode=92%): [22:51:16] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [22:51:46] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [22:57:15] @replag [22:57:15] DaBPunkt: s1-rr-a-wd: 25s [+0.00 s/s]; s1-user-wd: 9m 15s [-0.33 s/s]; s2-rr: 17h 55m 59s [+1.00 s/s]; s2-user: 17h 55m 59s [+1.00 s/s]; s2-user-c: error; s2-user-wd: 1d 13h 4m 6s [-1.86 s/s]; s3-user-wd: 29s [-0.00 s/s]; s4-user-wd: 2d 5h 27m 13s [+0.18 s/s] [22:57:16] DaBPunkt: s5-user: 17h 56m 11s [+1.00 s/s]; s5-user-c: 2w 3d 6h 21m 56s [-3.07 s/s]; s6-user: 1m 7s [-]; s6-user-wd: 23m 41s [+0.08 s/s]; s7-user-wd: 24s [-0.00 s/s] [23:00:06] APT on nightshade is CRITICAL: APT CRITICAL: 2 packages available for upgrade (2 critical updates). [23:02:06] APT on nightshade is OK: APT OK: 0 packages available for upgrade (0 critical updates). [23:13:46] SMTP on z-dat-s2-b is OK: SMTP OK - 0.105 sec. response time [23:15:26] wikidata replag on daphne is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 189232.000000 [23:15:46] wikidata replag on z-dat-s5-b is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 65664.000000 [23:16:27] MySQL slave on z-dat-s5-b is CRITICAL: (Return code of 139 is out of bounds) [23:26:56] /mnt user-store on rosemary is CRITICAL: DISK CRITICAL - free space: /mnt 145013 MB (2% inode=66%): [23:28:36] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: OK 3, WARN 0, CRIT 1: far1-n1-fast3 FTOL, far1-n1-bulk CRIT, far1-n1-fast2 FTOL, far1-n1-fast FTOL [23:29:55] CAM on hemlock is CRITICAL: CRITICAL - Storage ts-array5 (2 errors, 1 warning): null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.B:, null :OSGi.com.sun.storage.cam.agent(device.2530):event.ProblemEvent.REC_EXPIRED_BATTERY.description:S17:Tray.85.Battery.A:, null :OSGi.com.sun.storage.cam.agent(com.sun.netstorage.fm.storade.agent.Messages):monitor.Communicatio [23:31:26] wikidata replag on z-dat-s2-b is CRITICAL: (Service Check Timed Out) [23:31:36] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 69320 MB (11% inode=99%): [23:35:46] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[23:38:36] MySQL slave on z-dat-s2-b is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 66623 [23:44:26] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [23:45:06] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [23:46:19] @replag [23:46:19] DaBPunkt: s1-rr-a-wd: 30s [+0.00 s/s]; s1-user-wd: 26s [-0.18 s/s]; s2-rr: 18h 37m 52s [+0.85 s/s]; s2-user: 18h 37m 52s [+0.85 s/s]; s2-user-c: error; s2-user-wd: 1d 13h 45m 22s [+0.84 s/s]; s3-user: 1m 17s [+0.00 s/s]; s3-user-wd: 26s [-0.00 s/s] [23:46:20] DaBPunkt: s4-user-wd: 2d 2h 57m 1s [-3.06 s/s]; s5-user: 18h 45m 15s [+1.00 s/s]; s5-user-c: 2w 3d 6h 13s [-0.44 s/s]; s6-user-wd: 27s [-0.47 s/s]; s7-user-wd: 30s [+0.00 s/s] [23:47:16] / on ortelius is WARNING: DISK WARNING - free space: / 5588 MB (18% inode=92%): [23:51:16] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [23:51:46] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [23:54:18] DaBPunkt: how long does it normally take to transfer 300G across the ocean? [23:54:36] jeremyb_: depends on the bandwidth [23:54:45] that's a given! [23:55:22] (i meant the transfer you just started or will soon start) [23:55:38] jeremyb_: depends on the bandwidth I use ;) [23:56:02] DaBPunkt: 56k? [23:56:06] 15mb/s is a possible speed afair [23:57:34] I normally time such transfers so that I can sleep during them [23:58:56] hrm. my math must be wrong [23:59:05] is it less than a day at 15 mbps? [23:59:27] yes
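The estimate at the end of this exchange depends on how the quoted rate is read. A minimal sketch of the arithmetic, assuming decimal units (300G taken as 300 x 10^9 bytes) and comparing two readings of the speed, 15 megabytes per second and 15 megabits per second; the function and variable names are illustrative and do not come from the log:

```python
def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the time in hours to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


# Assumption: "300G" means 300 decimal gigabytes.
SIZE_BYTES = 300e9

# Reading 1 (assumption): 15 megabytes per second.
hours_megabytes = transfer_hours(SIZE_BYTES, 15e6)

# Reading 2 (assumption): 15 megabits per second, i.e. 15e6 / 8 bytes per second.
hours_megabits = transfer_hours(SIZE_BYTES, 15e6 / 8)

print(f"at 15 MB/s:   {hours_megabytes:.1f} h")
print(f"at 15 Mbit/s: {hours_megabits:.1f} h")
```

Under these assumptions the megabytes-per-second reading comes to roughly 5.6 hours, which fits a transfer timed to run overnight, while the megabits-per-second reading comes to roughly 44 hours, which is more than a day. The sketch ignores protocol overhead and any variation in throughput.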