[00:02:16] night ts
[00:02:44] multichill: for the next hours
[03:03:55] [[Special:Log/newusers]] create * Lsmll * (New user account)
[03:48:17] 2013/05/19 03:43 CRIT damiana Free Memory CRITICAL - 1.7% (146320 kB) free!
[03:58:18] 2013/05/19 03:57 OK damiana Free Memory OK - 18.0% (1505936 kB) free.
[04:13:20] 2013/05/19 04:08 CRIT damiana Free Memory CRITICAL - 1.5% (126492 kB) free!
[04:26:21] 2013/05/19 04:25 OK damiana Free Memory OK - 17.0% (1421752 kB) free.
[04:39:22] 2013/05/19 04:33 CRIT damiana Free Memory CRITICAL - 1.5% (127252 kB) free!
[04:49:20] [[Special:Log/newusers]] create * Elvina746 * (New user account)
[04:58:22] 2013/05/19 04:57 OK damiana Free Memory OK - 15.7% (1317324 kB) free.
[05:09:23] 2013/05/19 05:03 CRIT damiana Free Memory CRITICAL - 1.5% (127328 kB) free!
[05:32:25] 2013/05/19 05:31 OK damiana Free Memory OK - 11.6% (975484 kB) free.
[05:41:26] 2013/05/19 05:36 CRIT damiana Free Memory CRITICAL - 1.5% (127812 kB) free!
[06:39:46] [[Special:Log/newusers]] create * Flossie525 * (New user account)
[07:39:34] 2013/05/19 07:35 CRIT ha-ldap.esi LDAP CRITICAL - Socket timeout after 10 seconds
[07:44:35] 2013/05/19 07:43 OK ha-ldap.esi LDAP LDAP OK - 2.765 seconds response time
[07:47:35] 2013/05/19 07:47 CRIT ha-ldap.esi PING CRITICAL - Host Unreachable (ha-ldap.esi)
[07:47:35] 2013/05/19 07:47 CRIT ha-nfs.esi PING CRITICAL - Host Unreachable (ha-nfs.esi)
[07:47:35] 2013/05/19 07:46 CRIT nightshade APT CHECK_NRPE: Socket timeout after 30 seconds.
[07:47:35] 2013/05/19 07:46 CRIT ortelius Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[07:48:35] 2013/05/19 07:47 CRIT damiana / Connection refused or timed out
[07:48:35] 2013/05/19 07:47 CRIT damiana /tmp Connection refused or timed out
[07:48:35] 2013/05/19 07:47 CRIT damiana DiskSuite Connection refused or timed out
[07:48:35] 2013/05/19 07:47 CRIT damiana Environment IPMI Connection refused or timed out
[07:48:35] 2013/05/19 07:47 CRIT damiana Load avg. Connection refused or timed out
[07:48:36] 2013/05/19 07:48 CRIT damiana NTP CRITICAL - Socket timeout after 10 seconds
[07:48:36] 2013/05/19 07:47 CRIT damiana PING CRITICAL - Host Unreachable (damiana)
[07:48:37] 2013/05/19 07:48 CRIT damiana SMTP CRITICAL - Socket timeout after 10 seconds
[07:48:37] 2013/05/19 07:48 CRIT damiana SSH CRITICAL - Socket timeout after 10 seconds
[07:48:38] 2013/05/19 07:47 CRIT damiana ts-array5 Connection refused or timed out
[07:48:38] 2013/05/19 07:47 CRIT ha-dns-auth Authoritative DNS CRITICAL - Plugin timed out while executing system call
[07:48:39] 2013/05/19 07:47 CRIT ha-dns-auth PING CRITICAL - Host Unreachable (ha-dns-auth)
[07:48:39] 2013/05/19 07:47 CRIT ha-dns-recursor.esi DNS recursor CRITICAL - Plugin timed out while executing system call
[07:48:40] 2013/05/19 07:47 CRIT ha-dns-recursor.esi PING CRITICAL - Host Unreachable (ha-dns-recursor.esi)
[07:49:35] 2013/05/19 07:49 CRIT ha-ldap.esi CRITICAL - Host Unreachable (ha-ldap.esi)
[07:50:35] 2013/05/19 07:49 CRIT damiana CRITICAL - Host Unreachable (damiana)
[07:50:36] 2013/05/19 07:49 CRIT ha-nfs.esi CRITICAL - Host Unreachable (ha-nfs.esi)
[07:50:36] 2013/05/19 07:49 CRIT ha-proxy.esi CRITICAL - Host Unreachable (ha-proxy.esi)
[07:51:36] 2013/05/19 07:50 CRIT ha-dns-auth CRITICAL - Host Unreachable (ha-dns-auth)
[07:51:36] 2013/05/19 07:50 CRIT ha-dns-recursor.esi CRITICAL - Host Unreachable (ha-dns-recursor.esi)
[07:51:36] 2013/05/19 07:50 CRIT ha-sql.esi CRITICAL - Host Unreachable (ha-sql.esi)
[07:51:36] 2013/05/19 07:51 CRIT ha-www CRITICAL - Host Unreachable (ha-www)
[07:52:36] 2013/05/19 07:45 CRIT daphne SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:36] 2013/05/19 07:45 CRIT hyacinth SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:36] 2013/05/19 07:46 CRIT ortelius SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:36] 2013/05/19 07:45 CRIT ptolemy SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:36] 2013/05/19 07:46 CRIT thyme SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:36] 2013/05/19 07:46 CRIT yarrow SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:36] 2013/05/19 07:47 CRIT yarrow Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[07:52:37] 2013/05/19 07:46 CRIT z-dat-s4-a SMTP CRITICAL - Socket timeout after 10 seconds
[07:52:37] 2013/05/19 07:46 CRIT z-dat-s6-a SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:36] 2013/05/19 07:46 CRIT adenia SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:36] 2013/05/19 07:46 CRIT cassia SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:36] 2013/05/19 07:46 CRIT hemlock /home CHECK_NRPE: Socket timeout after 30 seconds.
[07:53:36] 2013/05/19 07:46 CRIT nightshade Environment IPMI CHECK_NRPE: Socket timeout after 30 seconds.
[07:53:36] 2013/05/19 07:46 CRIT nightshade SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:37] 2013/05/19 07:47 CRIT nightshade Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[07:53:37] 2013/05/19 07:47 CRIT ortelius toolserver.org HTTP CRITICAL - Socket timeout after 10 seconds
[07:53:38] 2013/05/19 07:46 CRIT rosemary SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:38] 2013/05/19 07:46 CRIT willow SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:39] 2013/05/19 07:46 CRIT z-dat-s3-a SMTP CRITICAL - Socket timeout after 10 seconds
[07:53:39] 2013/05/19 07:46 CRIT z-dat-s7-a SMTP CRITICAL - Socket timeout after 10 seconds
[07:54:36] 2013/05/19 07:47 CRIT nightshade / CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:36] 2013/05/19 07:47 CRIT nightshade /tmp CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:36] 2013/05/19 07:48 CRIT nightshade Load avg. CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:36] 2013/05/19 07:47 CRIT nightshade Sensors CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:36] 2013/05/19 07:47 CRIT yarrow / CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:37] 2013/05/19 07:47 CRIT yarrow /tmp CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:37] 2013/05/19 07:47 CRIT yarrow /var CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:38] 2013/05/19 07:47 CRIT yarrow /var/tmp CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:38] 2013/05/19 07:47 CRIT yarrow Environment IPMI CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:39] 2013/05/19 07:47 CRIT yarrow Load avg. CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:39] 2013/05/19 07:47 CRIT yarrow SRaid CHECK_NRPE: Socket timeout after 30 seconds.
[07:54:40] 2013/05/19 07:47 CRIT yarrow Sensors CHECK_NRPE: Socket timeout after 30 seconds.
[07:55:36] 2013/05/19 07:54 OK daphne SMTP SMTP OK - 2.119 sec. response time
[07:55:36] 2013/05/19 07:48 CRIT nightshade /var CHECK_NRPE: Socket timeout after 30 seconds.
[07:55:36] 2013/05/19 07:48 CRIT nightshade /var/tmp CHECK_NRPE: Socket timeout after 30 seconds.
[07:55:36] 2013/05/19 07:55 OK willow SMTP SMTP OK - 5.006 sec. response time
[07:56:36] 2013/05/19 07:49 CRIT hemlock SMTP CRITICAL - Socket timeout after 10 seconds
[07:56:36] 2013/05/19 07:56 OK ortelius SMTP SMTP OK - 0.002 sec. response time
[07:57:36] 2013/05/19 07:56 OK ha-dns-recursor.esi PING OK - Packet loss = 0%, RTA = 1.73 ms
[07:57:36] 2013/05/19 07:56 OK ha-proxy.esi PING OK - Packet loss = 0%, RTA = 0.57 ms
[07:57:36] 2013/05/19 07:56 OK ha-sql.esi PING OK - Packet loss = 0%, RTA = 0.54 ms
[07:57:36] 2013/05/19 07:56 OK ha-dns-auth Authoritative DNS DNS OK: 1.079 second response time. 1.www.toolserver.org returns 91.198.174.203
[07:57:36] 2013/05/19 07:56 OK ha-dns-auth PING PING OK - Packet loss = 0%, RTA = 0.43 ms
[07:57:37] 2013/05/19 07:56 OK ha-dns-recursor.esi DNS recursor DNS OK: 0.014 seconds response time. www.google.com returns 74.125.136.103,74.125.136.104,74.125.136.105,74.125.136.106,74.125.136.147,74.125.136.99
[07:57:37] 2013/05/19 07:56 OK ha-sql.esi PING PING OK - Packet loss = 0%, RTA = 0.17 ms
[07:57:38] 2013/05/19 07:56 OK hemlock SMTP SMTP OK - 7.827 sec. response time
[07:58:36] 2013/05/19 07:57 CRIT ha-dns-auth Authoritative DNS CRITICAL - Plugin timed out while executing system call
[07:58:36] 2013/05/19 07:57 CRIT ha-dns-auth PING CRITICAL - Host Unreachable (ha-dns-auth)
[07:58:36] 2013/05/19 07:47 CRIT ha-ldap.esi LDAP CRITICAL - Socket timeout after 10 seconds
[07:58:36] 2013/05/19 07:57 CRIT ha-sql.esi PING PING CRITICAL - Packet loss = 80%, RTA = 502.98 ms
[07:58:36] 2013/05/19 07:58 ?? nightshade SMTP check_smtp: Invalid hostname/address - nightshade
[07:58:37] 2013/05/19 07:57 ?? turnera SMTP check_smtp: Invalid hostname/address - turnera
[07:59:36] 2013/05/19 07:58 CRIT ha-dns-recursor.esi DNS recursor CRITICAL - Plugin timed out while executing system call
[08:00:36] 2013/05/19 07:59 CRIT damiana CRITICAL - Host Unreachable (damiana)
[08:00:36] 2013/05/19 07:59 CRIT ha-ldap.esi CRITICAL - Host Unreachable (ha-ldap.esi)
[08:00:36] 2013/05/19 08:00 CRIT ha-nfs.esi CRITICAL - Host Unreachable (ha-nfs.esi)
[08:00:36] 2013/05/19 08:00 CRIT ha-proxy.esi CRITICAL - Host Unreachable (ha-proxy.esi)
[08:01:36] 2013/05/19 08:00 CRIT ha-dns-auth CRITICAL - Host Unreachable (ha-dns-auth)
[08:01:36] 2013/05/19 08:01 CRIT ha-dns-recursor.esi CRITICAL - Host Unreachable (ha-dns-recursor.esi)
[08:01:36] 2013/05/19 07:47 CRIT damiana DiskSuite Connection refused or timed out
[08:01:36] 2013/05/19 07:47 CRIT damiana Environment IPMI Connection refused or timed out
[08:01:37] 2013/05/19 07:47 CRIT damiana Load avg. Connection refused or timed out
[08:01:37] 2013/05/19 08:01 ?? damiana SMTP check_smtp: Invalid hostname/address - damiana
[08:01:37] 2013/05/19 08:01 ?? damiana SSH Usage:
[08:01:38] 2013/05/19 07:47 CRIT ha-proxy.esi PING CRITICAL - Host Unreachable (ha-proxy.esi)
[08:02:36] 2013/05/19 08:01 CRIT ha-sql.esi CRITICAL - Host Unreachable (ha-sql.esi)
[08:02:37] 2013/05/19 07:55 CRIT daphne SMTP CRITICAL - Socket timeout after 10 seconds
[08:02:37] 2013/05/19 07:47 CRIT ha-nfs.esi NFS No route to host
[08:02:37] 2013/05/19 08:00 ?? willow SMTP check_smtp: Invalid hostname/address - willow
[08:02:37] 2013/05/19 08:02 ?? z-dat-s6-a MySQL check_mysql: Invalid hostname/address - z-dat-s6-a
[08:02:37] 2013/05/19 08:01 ?? z-dat-s6-a MySQL slave check_mysql: Invalid hostname/address - z-dat-s6-a
[08:02:37] 2013/05/19 08:02 ?? z-dat-s6-a SMTP check_smtp: Invalid hostname/address - z-dat-s6-a
[08:02:38] 2013/05/19 08:02 ?? z-dat-s6-a wikidata replag check_mysql_query: Invalid hostname/address - z-dat-s6-a
[08:03:36] 2013/05/19 07:47 CRIT ha-dns-recursor.esi PING CRITICAL - Host Unreachable (ha-dns-recursor.esi)
[08:03:37] 2013/05/19 08:02 CRIT nightshade SMTP CRITICAL - Socket timeout after 10 seconds
[08:03:37] 2013/05/19 08:03 ?? yucca SMTP check_smtp: Invalid hostname/address - yucca
[08:03:37] 2013/05/19 08:02 ?? z-dat-s3-a MySQL slave check_mysql: Invalid hostname/address - z-dat-s3-a
[08:04:37] 2013/05/19 08:04 OK damiana PING OK - Packet loss = 0%, RTA = 0.36 ms
[08:04:37] 2013/05/19 08:04 OK ha-nfs.esi PING OK - Packet loss = 0%, RTA = 0.43 ms
[08:04:37] 2013/05/19 08:04 OK damiana PING PING OK - Packet loss = 0%, RTA = 0.41 ms
[08:04:37] 2013/05/19 08:04 OK ha-nfs.esi PING PING OK - Packet loss = 16%, RTA = 0.17 ms
[08:04:37] 2013/05/19 07:57 ?? turnera SSH Usage:
[08:04:37] 2013/05/19 08:03 ?? z-dat-s3-a MySQL check_mysql: Invalid hostname/address - z-dat-s3-a
[08:04:37] 2013/05/19 08:03 ?? z-dat-s3-a SMTP check_smtp: Invalid hostname/address - z-dat-s3-a
[08:05:37] 2013/05/19 08:04 OK damiana Environment IPMI ok: temperature ok fan ok voltage ok chassis ok
[08:05:37] 2013/05/19 08:04 OK damiana Free Memory OK - 90.5% (7580204 kB) free.
[08:05:37] 2013/05/19 08:04 OK damiana Load avg. OK - load average: 0.90, 0.54, 0.21
[08:05:37] 2013/05/19 08:01 ?? damiana SMTP check_smtp: Invalid hostname/address - damiana
[08:05:37] 2013/05/19 08:04 OK nightshade / DISK OK - free space: / 1593 MB (89% inode=94%):
[08:05:49] 2013/05/19 08:05 CRIT ortelius SMTP CRITICAL - Socket timeout after 10 seconds
[08:05:49] 2013/05/19 08:05 ?? wolfsbane SMTP check_smtp: Invalid hostname/address - wolfsbane
[08:06:49] 2013/05/19 08:05 OK daphne SMTP SMTP OK - 0.001 sec. response time
[08:06:49] 2013/05/19 08:06 ?? thyme s1 replag check_mysql_query: Invalid hostname/address - thyme
[08:06:49] 2013/05/19 08:05 CRIT turnera SMTP CRITICAL - Socket timeout after 10 seconds
[08:06:49] 2013/05/19 08:05 OK turnera SSH SSH OK - OpenSSH_5.5p1 Debian-6+squeeze3 (protocol 2.0)
[08:06:49] 2013/05/19 08:06 ?? z-dat-s5-b MySQL slave check_mysql: Invalid hostname/address - z-dat-s5-b
[08:06:50] 2013/05/19 08:06 ?? z-dat-s5-b SMTP check_smtp: Invalid hostname/address - z-dat-s5-b
[08:06:50] 2013/05/19 08:06 ?? z-dat-s5-b s4 replag check_mysql_query: Invalid hostname/address - z-dat-s5-b
[08:06:51] 2013/05/19 08:06 ?? z-dat-s5-b wikidata replag check_mysql_query: Invalid hostname/address - z-dat-s5-b
[08:07:49] 2013/05/19 08:07 ?? ha-sql.esi MySQL check_mysql: Invalid hostname/address - ha-sql.esi
[08:07:49] 2013/05/19 08:07 ?? thyme SMTP check_smtp: Invalid hostname/address - thyme
[08:07:49] 2013/05/19 08:01 ?? willow SSH Usage:
[08:07:49] 2013/05/19 08:06 ?? z-dat-s5-b MySQL check_mysql: Invalid hostname/address - z-dat-s5-b
[08:08:51] 2013/05/19 08:07 OK ha-dns-auth PING OK - Packet loss = 0%, RTA = 0.18 ms
[08:08:51] 2013/05/19 08:07 OK ha-dns-recursor.esi PING OK - Packet loss = 0%, RTA = 0.27 ms
[08:08:51] 2013/05/19 08:07 OK ha-ldap.esi PING OK - Packet loss = 0%, RTA = 0.30 ms
[08:08:51] 2013/05/19 08:07 OK ha-proxy.esi PING OK - Packet loss = 0%, RTA = 0.21 ms
[08:08:51] 2013/05/19 08:07 OK ha-sql.esi PING OK - Packet loss = 0%, RTA = 0.58 ms
[08:09:57] 2013/05/19 08:08 CRIT z-dat-s3-a MySQL Access denied for user 'tsnagios7643'@'turnera-bge0.esi.toolserver.org' (using password: NO)
[08:09:57] 2013/05/19 08:08 CRIT z-dat-s5-b MySQL Access denied for user 'tsnagios7643'@'turnera-bge0.esi.toolserver.org' (using password: NO)
[08:09:57] 2013/05/19 08:08 CRIT z-dat-s6-a MySQL slave Access denied for user 'tsnagios7643'@'turnera-bge0.esi.toolserver.org' (using password: NO)
[08:10:57] 2013/05/19 08:10 OK damiana DiskSuite OK - No disk failures detected
[08:13:57] 2013/05/19 08:13 WARN willow Sun Grid Engine execd NRPE: Unable to read output
[08:14:57] 2013/05/19 08:14 CRIT nightshade APT CHECK_NRPE: Socket timeout after 30 seconds.
[08:14:57] 2013/05/19 08:13 CRIT nightshade Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[08:14:57] 2013/05/19 08:13 CRIT ortelius Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[08:14:57] 2013/05/19 08:14 CRIT ortelius toolserver.org HTTP CRITICAL - Socket timeout after 10 seconds
[08:14:57] 2013/05/19 08:14 CRIT willow Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[08:14:58] 2013/05/19 08:13 CRIT wolfsbane Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[08:14:58] 2013/05/19 08:13 CRIT yarrow APT CHECK_NRPE: Socket timeout after 30 seconds.
[08:14:59] 2013/05/19 08:13 CRIT yarrow Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[08:18:58] 2013/05/19 08:18 WARN nightshade APT APT WARNING: 67 packages available for upgrade (0 critical updates).
[08:18:58] 2013/05/19 08:18 ?? nightshade Sun Grid Engine execd Error with qhost: error: commlib error: got select error (Connection refused)
[08:18:58] 2013/05/19 08:18 WARN ortelius Sun Grid Engine execd NRPE: Unable to read output
[08:19:59] 2013/05/19 08:18 OK hemlock /home DISK OK - free space: /home 12873 MB (25% inode=80%):
[08:19:59] 2013/05/19 08:19 OK ortelius toolserver.org HTTP HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.009 second response time
[08:19:59] 2013/05/19 08:19 WARN willow Sun Grid Engine execd NRPE: Unable to read output
[08:19:59] 2013/05/19 08:18 WARN yarrow APT APT WARNING: 67 packages available for upgrade (0 critical updates).
[08:19:59] 2013/05/19 08:18 ?? yarrow Sun Grid Engine execd Error with qhost: error: commlib error: got select error (Connection refused)
[08:26:00] 2013/05/19 08:25 WARN nightshade Load avg. WARNING - load average: 1.31, 7.13, 19.11
[08:30:01] 2013/05/19 08:29 OK nightshade Load avg. OK - load average: 1.00, 3.74, 14.97
[08:45:02] 2013/05/19 08:44 OK nightshade Sun Grid Engine execd Host and Queues Ok
[08:45:02] 2013/05/19 08:44 OK yarrow Sun Grid Engine execd Host and Queues Ok
[08:47:02] 2013/05/19 08:46 ?? wolfsbane Sun Grid Engine execd CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[08:48:02] 2013/05/19 08:47 CRIT wolfsbane / Connection refused or timed out
[08:48:02] 2013/05/19 08:47 CRIT wolfsbane /tmp Timeout while attempting connection
[08:48:02] 2013/05/19 08:47 CRIT wolfsbane Environment IPMI Timeout while attempting connection
[08:48:02] 2013/05/19 08:47 CRIT wolfsbane Load avg. Connection refused or timed out
[08:48:02] 2013/05/19 08:47 CRIT wolfsbane NTP CRITICAL - Socket timeout after 10 seconds
[08:48:03] 2013/05/19 08:47 CRIT wolfsbane SSH CRITICAL - Socket timeout after 10 seconds
[08:48:03] 2013/05/19 08:47 CRIT wolfsbane Sun Grid Engine execd Connection refused or timed out
[08:49:02] 2013/05/19 08:48 CRIT wolfsbane PING CRITICAL - Host Unreachable (wolfsbane)
[08:50:02] 2013/05/19 08:49 OK wolfsbane / DISK OK - free space: / 9887 MB (33% inode=93%):
[08:50:02] 2013/05/19 08:49 OK wolfsbane Load avg. OK - load average: 2.23, 0.73, 0.26
[08:50:02] 2013/05/19 08:49 ?? wolfsbane Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[08:51:08] 2013/05/19 08:50 OK wolfsbane /tmp DISK OK - free space: / 11558 MB (38% inode=93%):
[08:51:08] 2013/05/19 08:50 OK wolfsbane Environment IPMI ok: temperature ok fan ok voltage ok chassis ok
[08:51:08] 2013/05/19 08:50 OK wolfsbane NTP NTP OK: Offset 0.043123 secs
[08:51:08] 2013/05/19 08:50 OK wolfsbane PING PING OK - Packet loss = 0%, RTA = 0.12 ms
[08:51:08] 2013/05/19 08:50 OK wolfsbane SMTP SMTP OK - 1.267 sec. response time
[08:51:08] 2013/05/19 08:50 OK wolfsbane SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[08:51:08] 2013/05/19 08:50 WARN wolfsbane Sun Grid Engine execd NRPE: Unable to read output
[08:51:08] 2013/05/19 08:50 OK wolfsbane toolserver.org HTTP HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.019 second response time
[08:57:07] [[Report tool]] ! https://wiki.toolserver.org/w/index.php?diff=8012&oldid=4667&rcid=21929 * 125.209.70.3 * (-1567) (/* For developers */ )
[09:31:52] Merlissimo: Hello Merlissimo! Hope you are doing well?! I got SGE issues again... can you help??
[09:46:51] Merlissimo: (ok - I read the last mails now... ;)
[10:40:32] [[Special:Log/newusers]] create * Kaj * (New user account)
[11:28:12] 2013/05/19 11:24 CRIT damiana Free Memory CRITICAL - 2.4% (204688 kB) free!
[12:53:18] 2013/05/19 12:52 OK damiana Free Memory OK - 58.0% (4864016 kB) free.
[14:22:22] 2013/05/19 14:15 CRIT damiana SMTP CRITICAL - Socket timeout after 10 seconds
[14:22:22] 2013/05/19 14:15 CRIT damiana SSH CRITICAL - Socket timeout after 10 seconds
[14:22:22] 2013/05/19 14:15 CRIT ha-proxy.esi HTTP proxy CRITICAL - Socket timeout after 10 seconds
[14:22:22] 2013/05/19 14:15 CRIT ha-www HTTP svn CRITICAL - Socket timeout after 10 seconds
[14:45:37] Hello all
[14:46:55] Morning, DaB
[14:53:12] Are you managing to recover things well? Is there something I can do to help?
[15:06:57] no problems at the moment. The moving is working quite good at the moment (*knock on wood*)
[15:40:50] Merlissimo: you wrote qcronsub, right? Is it open-source, and if so, under which license? It would be nice to have it on Tool Labs.
[15:43:14] valhallasw: that's a very simple bash script with no special license. it only does a precheck. main part which garantees uniques is is done on jsv scripts
[15:49:26] Merlissimo: 'no special license' = 'all rights maintained' ;-)
[15:49:36] reserved*
[15:50:18] I think your standard TS license is MPL
[15:50:25] is there any maintaner mentioned in the header?
[15:50:30] no, not at all
[15:50:40] MPL is only for my tools
[15:50:40] that's why I asked
[15:51:42] valhallasw: Is there some functionality missing to jsub?
[15:51:50] i think this script is to simple to have any copyrights protections
[15:52:24] Coren: not specifically, but to ease transition it would be good to have qcronsub available
[15:53:31] valhallasw: I would very much rather encourage people to switch to jsub so as to avoid multiplying the ways in which jobs are started; it makes support much easier, and documentation considerable simpler to maintain accurate. A qcronsub -> jsub guide would be a better idea. :-)
[15:53:50] Also, to be fair, many of the things that are started from cron on TS shouldn't on TL.
[15:53:51] Coren: I would suggest to implement qcronsub with a deprecation warning
[15:54:00] valhallasw: the real uniquess function is in /sge/scripts/jsv-client
[15:54:09] Merlissimo: ok, thanks
[15:55:28] valhallasw: I honestly don't see the point; maintainers /have/ to change their crontab entries anyways since we're not using the same resources for jobs. But hey -- whatever floats your boat. If you want to move qcronsub over be my guest. :-)
[15:57:20] SGE is the devil.
[15:57:31] I hope to God nothing like it appears on Labs.
[15:58:05] Coren: yes, that is true. It might be more effective to get jsub to read the requested resources from a file
[15:59:06] Susan: Define "nothing like it"? :-) We're using gridengine, the open source fork. Not sure why you see it as evil, though, it has given rock-solid scheduling and job management.
[16:00:03] I like when I can set up a script to run at a specified time and it'll run.
[16:00:11] SGE seems to block that.
[16:00:14] With obscure and awful syntax.
[16:00:23] With obscure and awful header requirements.
[16:00:42] With obscure and awful resource management (where it'll execute on some foreign host where nothing works properly).
[16:00:56] So rather than "python /path/to/file.py", I end up with...
[16:01:50] Susan: The syntax is, indeed, a sun monstorsity who only exists for hysterical raisins; but I don't get your other points. There are no header requirements (that's an optional way of giving options), the resource management is - at least on labs - simply a matter of saying how much memory you need to reserve.
[16:01:54] qcronsub -b y -j y -l arch=lx -l h_rt=0:10:00 -l virtual_free=60M -l sql-s3-rr=1 -o ~/var/log ~/bin/userspaceinitems.py
[16:02:16] Coren: The Toolserver may have made my impression worse.
[16:02:18] Yeah, you need nothing anywhere that complicated on labs
[16:02:26] Because there was also Linux v. Solaris to deal with.
[16:02:33] and the databases...
[16:02:53] So in addition to a script possibly running on any host (which is a huge problem), it also had architecture issues to combat.
[16:03:06] "jsub -mem xxx /path/to/file.py" will work fine if file.py has a shebang.
[16:03:20] Oh, and file permissions.
[16:03:40] It's death by a thousand cuts.
[16:03:45] In labs, all the exec nodes are identical and have access to the same resources; that simplifies things greatly.
[16:03:58] Plus SGE used to break all the time.
[16:04:09] I'm tainted. :-(
[16:04:19] Well, newbies have sucessfuly gotten scheduled and continuous jobs running on TL, I expect you'll have absolutely no difficulties. :-)
[16:04:43] People are asking me about why database reports are failing to update and my answer is basically "the Toolserver is dying."
[16:04:53] And by "basically" I mean that's what I tell them verbatim.
[16:05:01] Is DB replication up and running on Labs yet?
[16:05:23] Yeah, despite Daniel's best effort it's bailing water out of the titanic. I'm sure that moving the core services to linux will help a great deal for stability.
[16:05:51] I can't remember when the DB replication goal was.
[16:05:54] End of this month?
[16:06:06] Susan: I'm testing it now, with dewiki while Asher is doing the initial dump/sanitize
[16:06:26] \o/
[16:06:31] A bit earlier, I expect to have it for the Amsterdam Hackaton. Probably in unpolished state, but functional.
[16:06:41] Coren: is there any ETA for db replication on ToolLabs?
[16:06:49] Merlissimo: ^^ :-)
[16:07:04] Heh.
[16:07:45] To be fair, however, we're not doing things /quite/ the same way so it'll require a bit of adaptation. On the flipside, it's going to be /much/ faster and robust.
[16:07:55] * Coren tries to keep adaptation to the minimum.
[16:08:10] so next week i could try to run some queries? how many ToolLabs wmf maintainers will be at hackaton?
[16:08:21] Coren: -cwd is also useful
[16:08:23] Merlissimo: We'll basically all be there.
[16:08:48] Betacommand: I prefer my scripts to be explicit about chdir themselves and stuff, but yeah, -cwd is often useful.
[16:09:25] mmh, then i must rethink if i have time for a trip or not
[16:09:57] Coren: for things that require imports and what not it just reduces the things that can break
[16:10:23] Coren: can you explain what differences you are referring to with the DBs?
[16:11:03] Susan: The biggest change is that, because of indexing, there will be a couple of alternate views to the same tables you might want to switch queries to. For instance, revision has a revision_userindex which is orders of magnitude faster if you have where clauses or orders by rv_user; at the cost of not containing revisions with suppresed users.
[16:12:07] Whereas revision has suppressed users as null, but if you where on that column you end up with full table scans (!) unless you also have another where clause to restrict a lot.
[16:12:09] Coren: please tell me that all of these views will be documented and understandable?
[16:12:28] Betacommand: Of course. There aren't very many of them, most tables are just straight through.
[16:12:53] logging will be the same, I assume.
[16:13:13] The TS had custom views like that as well.
[16:13:20] For performance reasons. For example, logging_ts_alternate or whatever.
[16:13:34] Susan: The "important" tables like this are abuse_filter, ipblocks, logging, and recentchanges
[16:13:46] * Susan nods.
[16:13:56] Will database names have a suffix?
[16:14:02] Like enwiki --> enwiki_p?
[16:14:15] * Coren nods. The same prefix too, to avoid confusing matters further.
[16:14:28] Cool.
[16:14:33] Coren: will we still have access to the archive table?
[16:14:49] Please say yes.
[16:15:00] Adminstats depends on that access.
[16:15:26] Cyberpower678: not logging?
[16:15:45] Betacommand, It counts deleted edits.
[16:15:52] There was no plan to allow access to archive. If you need collected stats out of it I can create a view on it though.
[16:16:12] Coren, please do.
[16:16:13] Coren: a lot of TS users use the archive table
[16:16:40] we like having as much access as possible
[16:16:45] Betacommand, adminstats really digs into archive, users, and logging.
[16:17:01] I'm not sure the archive table is used very often.
[16:17:11] Betacommand: Part of the rules come from legal, the primary of which is "same info should be available without +sysop", but specific use cases can be exempted. Counting deleted revisions seems a straightforward "Okay" to me. :-)
[16:17:17] Cyberpower678: Really digs into archive for what?
[16:17:49] Susan, it counts deleted revisions.
[16:17:49] Coren: what we currently have access to via the archive table is just basic history information
[16:17:56] For a crapload of users.
[16:18:03] Cyberpower678: So a basic integer. Anything else?
[16:18:25] Not really
[16:18:32] K.
[16:18:45] Coren: bot using wiki databases are going to bot or tools labs?
[16:18:48] Susan: Ive used it quite often for researching users and other issues where page_title is needed
[16:18:58] Betacommand: I know, but legal wants to be extra paranoid. Like I said, access to archive is a no-no, but we can extract information from it and give it.
[16:19:03] Betacommand: I don't understand what you mean.
[16:19:05] It needs to be setup like toolserver or adminstats will break.
[16:19:42] https://en.wikipedia.org/wiki/Wikipedia:ADMINSTATS
[16:19:44] Do you mean that?
[16:19:48] Merlissimo: Tools. Bots is the "experimental environment" where new versions of stuff is tried and can break occasionally. Tools has a strict change management cycle, and has "always up" as objective - at the cost of being conservative.
[16:19:49] Because I wrote that and it doesn't use the archive table at all.
[16:20:00] Susan: cross referencing edits to pages that have been completely deleted, grouping by page title and other stuff
[16:20:16] Betacommand: I doubt you're grouping by page title in the archive table.
[16:20:34] Anyway, you won't have access any longer.
[16:20:35] Susan, no. I mean https://en.wikipedia.org/wiki/Template:Adminstats
[16:21:00] Cyberpower678: Right. All you need is a simple integer. Coren says that's fine.
[16:21:25] Coren: having the same access as the researcher user group is what the TS has
[16:21:35] Coren, what server is this database located on.
[16:21:37] ?
[16:21:52] basiclly access to delete page history without access to page contents
[16:21:58] Cyberpower678: Make a use case of what info you need and what queries you run, and we can arrange something. We have (almost) complete freedom in creating special-use views even when we can't give access to the underlying tables.
[16:23:23] Betacommand: I'd have to clear it with Legal. I see it likely to be okayed, iff access to the tables need to be requested rather than given by default.
[16:24:52] Coren, SELECT count(*) AS count FROM archive WHERE `ar_user_text` = '';
[16:25:14] Cyberpower678: That won't even need an okay. I can tell you outright that view would be okay.
[16:25:30] Although, if you wanted to be non-evil, you'd use ar_user instead.
[16:26:03] (Well, unless you're looking for anon contribs obviously)
[16:26:31] Cyberpower678: I'm not going to promise it for Amsterdam though; we want to get the basic stuff working first.
[16:26:36] (I'll try though)
[16:26:49] This is written directly into the framework.
[16:27:03] Coren: http://en.wikipedia.org/wiki/Special:ListUsers/researcher
[16:27:08] Cyberpower678: Yeah, I don't mind ar_user_text
[16:27:10] I haven't yet had the chance to look at and update it as necessary.
[16:27:22] ar_user is just much faster if that's all you need. :-)
[16:27:25] that is basically what I am looking for
[16:27:54] can't archive be setup like on toolserver?
[16:28:06] It's been that way forever.
[16:28:06] Betacommand: Yeah, I know about +researcher. I'll ask legal for the okay to create a matching set of views to grant on request.
[16:28:27] thanks
[16:28:44] Coren, What's the host to connect to it?
[16:28:55] Cyberpower678: Legal doesn't even want archive to /be/ on the physical servers. Interestingly enough, the way we're doing it now, most of the stuff is actually stripped /before/ being replicated in the first place.
[16:28:58] Cyberpower678: its not available yet
[16:29:06] Cyberpower678: It's not connectable from projects yet. :-)
[16:29:17] Cyberpower678: Patience, grasshopper.
[16:29:54] I'm anxious to get it setup on labs. Do you at least what the host is going to be called so I can have it setup?
[16:30:21] Coren: are there going to be multiple copies of the DBs?
[16:31:12] Betacommand: There will never be any multiple copies. That's one of the biggest hornet nests. We'll set up federated tables to commons and wikidata on all shards though. It covers 99% of use cases fairly efficiently.
[16:32:10] (And the rest is simple enough to do with application logic)
[16:36:55] Hey, I notice we're on -toolserver. It would probably be polite to move that chatter to -labs.
[16:37:26] Coren, you noticed that just now. :p
[16:37:27] ?
[16:37:48] I was wondering why this chatter was taking place here.
[18:10:29] Good afternoon
[18:11:23] I would like to see where there are any other details or clarifications that I should make for https://jira.toolserver.org/browse/TS-1652
[18:22:00] re
[18:52:17] the sge-moving starts now
[19:02:37] 2013/05/19 18:56 ?? nightshade Sun Grid Engine execd Error with qhost: error: commlib error: got select error (Connection refused)
[19:02:37] 2013/05/19 18:56 ?? yarrow Sun Grid Engine execd Error with qhost: error: commlib error: got select error (Connection refused)
[20:39:43] 2013/05/19 20:38 CRIT yarrow Sun Grid Engine execd CRITICAL: execd not communicating
[20:41:43] 2013/05/19 20:40 OK yarrow Sun Grid Engine execd Host and Queues Ok
[20:44:43] 2013/05/19 20:43 OK nightshade Sun Grid Engine execd Host and Queues Ok
[20:45:43] 2013/05/19 20:45 ?? willow Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:46:43] 2013/05/19 20:46 WARN willow Sun Grid Engine execd NRPE: Unable to read output
[22:01:18] night ts