[00:03:26] @replag
[00:03:26] Reedy: s1-sec: 5m 20s [-0.00 s/s]; s1-sec-c: 6m 56s [+0.00 s/s]; s2-pri: 55s [+0.00 s/s]; s2/s5-pri-c: 6m 2s [+0.00 s/s]; s3-rr: 36s [-0.00 s/s]; s3-user: 36s [-0.00 s/s]; s4-rr: 6m 56s [+0.00 s/s]; s4-user: 6m 26s [+0.00 s/s]
[00:05:21] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[00:05:22] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[00:05:22] Evening all. Any toolserver admins, can you please check the toolserver over? A couple of the ACC project's tools aren't working, #wikimedia-tech reckon centralauth is fine. We can't get anything at the moment which is using centralauth, and according to reedy, nagios is bitching at the TS too.
[00:08:02] BarkingFish: Nagios has been throwing warnings and errors for about an hour but it doesn't any more. Is it okay now?
[00:08:56] at the moment, as of about 4 or 5 minutes ago, no. I'll try the utils we have which aren't working again, and let you know in a sec, Krinkle :)
[00:10:14] ok, one is working (SUL Util), the other, luxo's global contribs tool, is still throwing errors at me
[00:10:20] SSH on nightshade.mgmt is CRITICAL: Server answer:
[00:11:29] and the errors are?
[00:12:16] SUL Status: unable to connect to centralauth database
[00:12:49] and that (in yellow boxes) is attached to about 60 entries across wiktionary, wikibooks, wp and others, mostly foreign language versions
[00:13:22] it just says i can get more information here
[00:14:18] @replag all
[00:14:18] Reedy: s1-pri: 8s [-0.00 s/s]; s1-sec: 5m 30s [+0.02 s/s]; s1-sec-c: 10m 30s [+0.33 s/s]; s2-pri: 5s [-0.08 s/s]; s2/s5-pri-c: 9m 59s [+0.36 s/s]; s3-rr: 3m 19s [+0.25 s/s]; s3-user: 3m 19s [+0.25 s/s]; s4-rr: 10m 30s [+0.33 s/s]
[00:14:19] Reedy: s4-user: 4m 34s [-0.17 s/s]; s5-rr: 5s [-0.00 s/s]; s5-user: 5s [-0.00 s/s]; s6-rr: 1s [-0.01 s/s]; s6-user: 1s [-0.01 s/s]; s7-rr: 5s [-0.00 s/s]; s7-user: 5s [-0.00 s/s]
[00:15:25] btw Krinkle - nagios on the toolserver is also not bringing up your map, presents me with an internal server error :)
[00:17:30] apparently, one of our other users has been able to get the global contribs tool to work, so I'm gonna try again. Give me a few moments, if it's ok, I'll let you know :)
[00:18:01] yup, it seems to be up and working now.
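The @replag replies above pack one `cluster: lag [rate s/s]` entry per database cluster. As a minimal sketch (not the bot's own code), they could be turned into numbers like this. The function name `parse_replag` is hypothetical, and reading `s/s` as the rate at which the lag is changing, in seconds per second, is an assumption the log does not confirm. Only the `Nm Ns` and `Ns` duration forms that appear in the log are handled.

```python
import re

# One entry looks like "s1-sec: 5m 20s [-0.00 s/s]"; entries are separated by ";".
# The leading "Reedy:" addressee is skipped because it is not followed by a "[rate s/s]".
ENTRY = re.compile(
    r"(?P<name>[\w/-]+):\s+"
    r"(?:(?P<m>\d+)m\s*)?(?:(?P<s>\d+)s\s*)?"
    r"\[(?P<rate>[+-]\d+\.\d+) s/s\]"
)

def parse_replag(reply):
    """Return {cluster: (lag_seconds, rate)} for one bot reply (hypothetical helper)."""
    result = {}
    for match in ENTRY.finditer(reply):
        seconds = int(match["m"] or 0) * 60 + int(match["s"] or 0)
        result[match["name"]] = (seconds, float(match["rate"]))
    return result

# Usage: list clusters from most to least lagged, using entries copied from the log.
reply = "Reedy: s1-sec: 5m 30s [+0.02 s/s]; s1-sec-c: 10m 30s [+0.33 s/s]; s2-pri: 5s [-0.08 s/s]"
for name, (lag, rate) in sorted(parse_replag(reply).items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: {lag}s behind, changing by {rate:+.2f} s/s")
```

Sorting by lag makes it easy to see at a glance which replicas a tool should avoid while the numbers are still climbing.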
[00:18:09] nvm, thanks for the help anyway :)
[00:21:01] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 45025 MB (4% inode=99%):
[00:31:51] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.150391/1.00, alarm hl:np_load_long=0.692383/1.50, alarm hl:mem_free=14693.000000M/300M: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.150391/1.10, alarm hl:np_load_long=0.692383/1.75, alarm hl:mem_free=14693.000000M/300M
[00:32:51] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[00:33:20] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[01:05:21] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[01:05:21] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[01:08:30] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1899
[01:08:40] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1910.000000
[01:10:21] SSH on nightshade.mgmt is CRITICAL: Server answer:
[01:21:01] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53911 MB (5% inode=99%):
[01:33:21] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[01:58:40] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1794.000000
[01:59:31] MySQL slave on rosemary is OK: Uptime: 6162825 Threads: 32 Questions: 1757291842 Slow queries: 763358 Opens: 9526 Flush tables: 1 Open tables: 890 Queries per second avg: 285.143 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1737
[02:05:29] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[02:05:29] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[02:11:20] SSH on nightshade.mgmt is CRITICAL: Server answer:
[02:22:00] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54677 MB (5% inode=99%):
[02:41:28] Who is pathoschild?
[02:42:14] Wikimedia steward
[02:42:25] Do they use IRC?
[02:42:31] https://meta.wikimedia.org/wiki/User:Pathoschild
[02:42:38] He used to. Not very regularly any longer.
[02:42:49] Thanks ;)
[03:00:20] (created) [TS-1267] Install gcj on nightshade; Toolserver; Minor Task <https://jira.toolserver.org/browse/TS-1267> (Tim.Landscheidt)
[03:03:20] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[03:05:30] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[03:05:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[03:11:19] SSH on nightshade.mgmt is CRITICAL: Server answer:
[03:22:01] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54509 MB (5% inode=99%):
[04:03:21] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[04:05:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[04:05:31] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[04:11:20] SSH on nightshade.mgmt is CRITICAL: Server answer:
[04:22:03] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54383 MB (5% inode=99%):
[04:32:51] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:33:41] SMTP on hyacinth is OK: SMTP OK - 0.005 sec. response time
[05:03:21] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[05:05:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[05:05:30] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[05:11:21] SSH on nightshade.mgmt is CRITICAL: Server answer:
[05:22:12] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54178 MB (5% inode=99%):
[06:03:23] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[06:06:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[06:06:30] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[06:11:22] SSH on nightshade.mgmt is CRITICAL: Server answer:
[06:22:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54122 MB (5% inode=99%):
[07:06:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[07:07:30] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[07:12:22] SSH on nightshade.mgmt is CRITICAL: Server answer:
[07:23:12] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53999 MB (5% inode=99%):
[07:33:21] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[08:06:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[08:07:31] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[08:08:12] /aux0 on daphne is CRITICAL: DISK CRITICAL - free space: /aux0 38966 MB (4% inode=99%):
[08:12:21] SSH on nightshade.mgmt is CRITICAL: Server answer:
[08:23:12] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53877 MB (5% inode=99%):
[08:33:21] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[09:06:31] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[09:07:30] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[09:13:21] SSH on nightshade.mgmt is CRITICAL: Server answer:
[09:23:55] [[Special:Log/newusers]] create * UrSuS * (New user account)
[09:24:10] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54687 MB (5% inode=99%):
[09:45:21] s4 replag on daphne is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1921.000000
[09:46:20] Good morning
[09:50:21] s4 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 1603.000000
[10:03:20] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out)
[10:06:31] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[10:07:11] I will reboot nightshade in a few minutes
[10:07:32] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[10:09:14] hello nosy
[10:09:25] hello DaBPunkt
[10:09:32] long time no see
[10:09:35] :)
[10:09:56] will reset the san controllers one after another
[10:10:44] nosy: have fun
[10:10:50] you too
[10:10:57] you reboot nightshade?
[10:11:07] in 1 minute
[10:11:08] i rebooted it for updates
[10:11:19] once - remember?
[10:11:24] (commented) [DRTRIGON-112] subster_irc bot forgets wiki login when accessing other mediawiki project <https://jira.toolserver.org/browse/DRTRIGON-112> (drtrigon)
[10:11:26] do you get to see the bios then?
[10:11:30] nosy: yes, I remembered
[10:11:35] I hope so
[10:11:51] oki *press thumbs*
[10:11:58] hehe this was the reboot
[10:12:13] now we see, which irc-bots use sge ;)
[10:13:31] SSH on nightshade.mgmt is CRITICAL: Server answer:
[10:13:54] mm, doesn't help. no bios
[10:14:14] <|Danny_B|> did i miss any announcement about reboot?
[10:14:44] |Danny_B|: yes, posted here in the channel some minutes ago
[10:15:15] nosy and I are in the datacenter, so rebooting will happen from time to time at short notice
[10:15:28] (I mailed that a few days ago)
[10:15:55] <|Danny_B|> ah right
[10:16:05] <|Danny_B|> bad i didn't catch it
[10:16:23] <|Danny_B|> so i don't have some stuff saved
[10:18:50] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[10:21:51] DaBPunkt: reset the san by pulling out the controllers one after another :) - worked!
[10:22:12] and hyacinth san volumes are still alive
[10:22:42] |Danny_B|: some editors store the non-saved text in a file called DEADJOE or so.
[10:22:56] nosy: great.
[10:24:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54556 MB (5% inode=99%):
[10:32:31] Virtual disks on far1-n1-oe16-esams.mgmt is OK: OK 2, WARN 0, CRIT 0: far1-n1-bulk FTOL, far1-n1-fast FTOL
[10:36:33] <|Danny_B|> DaBPunkt: nosy the cron @reboot did not run
[10:36:55] <|Danny_B|> it is a persistent bug, going back at least three restarts
[10:38:19] Danny_B: yes but i do not know why
[10:38:29] must be a cron daemon problem
[10:38:42] <|Danny_B|> :-((((
[10:38:48] <|Danny_B|> cronie actually
[10:38:52] i have some other thing i want to do now
[10:39:06] DaBPunkt: I would like to move ptolemy and ortelius
[10:39:20] <|Danny_B|> any other restart planned or could i start the screens?
[10:39:39] DaBPunkt: ^^
[10:39:44] nosy: sure. I will help you
[10:39:51] great
[10:39:54] |Danny_B|: better use another server
[10:40:08] unfortunately apmon is not here atm
[10:40:41] anyway: with what machine do we want to start? ortelius?
[10:40:52] your choice
[10:41:05] ortelius then
[10:41:14] <|Danny_B|> it's not allowed to run shell stuff elsewhere than on nightshade as far as i remember
[10:41:26] what about willow?
[10:42:05] |Danny_B|: you can use willow too
[10:43:25] nosy: Do shutdown on ortelius now
[10:43:36] DaBPunkt: me?
[10:43:40] I do
[10:44:26] shyt
[10:44:46] forgot about maintenance, lost information about my screens
[10:45:56] <|Danny_B|> Hydriz: welcome to the club...
[10:46:03] heh
[10:46:28] nosy: do you have a changelog of what happened to the cluster during this maintenance?
[10:48:23] <|Danny_B|> if we are switching back to apache, can we also switch back to original cron?
[10:54:11] s5 replag on daphne is CRITICAL: (Service Check Timed Out)
[10:54:39] s2 replag on daphne is CRITICAL: (Service Check Timed Out)
[10:54:40] s4 replag on daphne is CRITICAL: (Service Check Timed Out)
[10:55:12] s2 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 251.000000
[10:55:12] s5 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 223.000000
[10:55:12] s4 replag on daphne is OK: QUERY OK: SELECT ts_rc_age() returned 254.000000
[11:07:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[11:08:30] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[11:12:45] I tried to delete a longrun@nightshade job from the queue but it seems impossible, even with qdel -f
[11:13:00] Any ideas?
[11:13:31] SSH on nightshade.mgmt is CRITICAL: Server answer:
[11:13:33] (Apart from relaunching it with another name)
[11:14:03] DaBPunkt ?
[11:15:22] jem-: they are automatically rescheduled on another host after 50 min
[11:15:31] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:15:32] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:15:32] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:15:32] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:15:36] Hmmmm, Ok
[11:15:39] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:16:14] Then I'll have to change the name for the moment
[11:16:18] (or killed if started with -r n)
[11:16:20] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[11:16:20] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[11:16:20] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[11:16:20] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[11:16:27] jem-: which job?
[11:16:30] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[11:16:45] you can manually restart with qmod -rj
[11:16:48] 1428098, my IRC bot
[11:17:22] Well, I have relaunched already
[11:18:29] Anyway, qmod -rj replies "The job 1428098 is already in deleted state. No rescheduling!"
[11:18:48] sure, because you used qdel before
[11:18:51] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[11:24:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54449 MB (5% inode=99%):
[11:34:01] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[12:07:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[12:08:42] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[12:09:01] ethernet 0/1/24 on asw-oe10-esams.mgmt is OK: GigabitEthernet0/1/24:DOWN:1 UP: OK
[12:12:01] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[12:13:31] SSH on nightshade.mgmt is CRITICAL: Server answer:
[12:18:50] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[12:25:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54309 MB (5% inode=99%):
[12:50:37] I will reboot nightshade again in a few minutes
[12:50:47] <|Danny_B|> good to know
[12:50:50] |Danny_B|: ok?
[12:50:53] ok
[12:51:00] <|Danny_B|> i'll turn off my stuff
[12:51:16] <|Danny_B|> could you pls check about that cron?
[13:07:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[13:08:41] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[13:12:11] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[13:14:11] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[13:14:21] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver svc:/network/ldap/client:default
[13:14:30] SSH on nightshade.mgmt is CRITICAL: Server answer:
[13:15:01] NTP on ortelius is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.006857 secs
[13:19:01] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[13:25:10] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54159 MB (5% inode=99%):
[13:28:01] NTP on ortelius is OK: NTP OK: Offset -0.018089 secs
[13:44:11] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[14:05:25] (created) [ACCAPP-439] Creating tools for automated template:book filling in articles; Account Approval; New Account <https://jira.toolserver.org/browse/ACCAPP-439> (Engraver)
[14:07:42] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[14:08:41] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[14:11:12] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[14:12:10] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[14:15:20] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[14:15:30] SSH on nightshade.mgmt is CRITICAL: Server answer:
[14:20:00] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[14:25:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54010 MB (5% inode=99%):
[14:45:11] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[15:07:42] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[15:08:43] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[15:11:34] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[15:12:34] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[15:15:34] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[15:15:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[15:20:34] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[15:24:38] ok guys, I will now reboot nightshade as promised
[15:25:34] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53836 MB (5% inode=99%):
[15:32:36] DaBPunkt: it takes time to reboot :D - when will it be available?
[15:33:23] few minutes. I have to work in the BIOS
[15:33:46] okay, thank you
[15:45:32] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[15:55:32] ok, nightshade is back now, should not get another reboot
[15:58:30] cool, I can renew my account now :D
[16:08:41] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[16:09:41] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[16:11:33] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[16:13:31] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[16:15:41] SSH on nightshade.mgmt is CRITICAL: Server answer:
[16:16:32] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[16:21:32] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[16:22:31] nosy: you have a minute?
[16:24:34] [[Scs-oe10]] https://wiki.toolserver.org/w/index.php?diff=6560&oldid=6559&rcid=8627 * DaB * (+32) (+fsw2)
[16:25:34] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53692 MB (5% inode=99%):
[16:45:33] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[16:51:01] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 272 bytes in 16.518 second response time
[16:55:01] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 274 bytes in 11.289 second response time
[17:08:42] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[17:09:41] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[17:11:33] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[17:13:32] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[17:15:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[17:17:32] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[17:21:31] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[17:26:33] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54380 MB (5% inode=99%):
[17:45:42] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[18:06:42] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.656738/1.75, alarm hl:np_load_avg=0.658691/2.00, alarm hl:mem_free=286.000000M/300M
[18:07:41] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[18:08:42] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[18:09:42] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[18:11:42] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.829102/1.75, alarm hl:np_load_avg=0.689453/2.00, alarm hl:mem_free=249.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.829102/1.50, alarm hl:np_load_long=0.613769/1.75, alarm hl:mem_free=249.000000M/250M
[18:12:31] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[18:13:31] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[18:16:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[18:17:31] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[18:21:31] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[18:27:31] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54341 MB (5% inode=99%):
[18:45:41] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[19:08:42] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[19:09:42] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[19:12:31] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[19:13:31] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[19:16:41] SSH on nightshade.mgmt is CRITICAL: Server answer:
[19:17:41] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[19:21:31] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[19:27:31] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54213 MB (5% inode=99%):
[19:37:54] zzz =_=
[19:46:42] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[20:00:43] [[Special:Log/newusers]] create * Punchtv * (New user account)
[20:08:31] /aux0 on daphne is CRITICAL: DISK CRITICAL - free space: /aux0 40955 MB (4% inode=99%):
[20:08:41] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[20:09:42] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[20:12:32] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[20:13:31] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[20:16:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[20:17:42] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[20:21:31] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[20:27:31] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 54058 MB (5% inode=99%):
[20:45:46] [[Special:Log/newusers]] create * Sxynblond2 * (New user account)
[20:46:42] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[21:08:52] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[21:10:42] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[21:12:31] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[21:13:31] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[21:17:42] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[21:17:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[21:21:31] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[21:27:31] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53915 MB (5% inode=99%):
[21:46:42] Sun Grid Engine execd on ortelius is CRITICAL: short@ortelius in unknown state: all.q@ortelius in unknown state
[22:08:52] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[22:11:40] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[22:12:42] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[22:13:30] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[22:17:42] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[22:17:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[22:21:31] Sun Grid Engine execd on nightshade is CRITICAL: all.q@nightshade in unknown state: longrun@nightshade in unknown state
[22:27:31] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53789 MB (5% inode=99%):
[22:42:12] hello guys. I just would like to let you know, that I am back home :-). See/read you tomorrow
[22:45:31] Sun Grid Engine execd on nightshade is OK: all.q@nightshade OK: longrun@nightshade OK
[22:45:42] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[22:45:42] SMF on ortelius is OK: OK - all services online
[22:49:20] (created) [MNT-1174] Restarted SGE-services on nightshade and ortelius; Maintenance; Emergency work <https://jira.toolserver.org/browse/MNT-1174> (DaB.)
[22:49:26] (updated) [MNT-1174] Restarted SGE-services on nightshade and ortelius <https://jira.toolserver.org/browse/MNT-1174> (DaB.)
[23:09:52] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[23:11:41] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[23:12:42] ethernet 0/1/18 on asw-oe16-esams.mgmt is CRITICAL: GigabitEthernet0/1/18:UP: 1 int NOK : CRITICAL
[23:13:31] ethernet 0/1/24 on asw-oe10-esams.mgmt is CRITICAL: GigabitEthernet0/1/24:UP: 1 int NOK : CRITICAL
[23:17:42] SSH on nightshade.mgmt is CRITICAL: Server answer:
[23:27:31] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 53681 MB (5% inode=99%):