[00:02:48] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.780762/1.75, alarm hl:np_load_avg=0.672852/2.00, alarm hl:mem_free=260.000000M/300M [00:03:56] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [00:35:28] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:35:57] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 6401 MB (0% inode=99%): [00:36:36] SSH on nightshade.mgmt is CRITICAL: Server answer: [00:39:37] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:52:20] (created) [UTRS-52] Find a way to add a notice that javascript is required, if it's disabled; UTRS; Minor New Feature <https://jira.toolserver.org/browse/UTRS-52> (Hersfold) [00:56:20] (created) [UTRS-53] Emails invalid; UTRS; Bug <https://jira.toolserver.org/browse/UTRS-53> (DeltaQuad) [00:57:46] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [01:05:47] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.)
[01:12:57] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.653809/1.75, alarm hl:np_load_avg=0.629394/2.00, alarm hl:mem_free=283.000000M/300M [01:13:56] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [01:35:37] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:36:08] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44616 MB (4% inode=99%): [01:36:37] SSH on nightshade.mgmt is CRITICAL: Server answer: [01:37:57] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.340820/1.75, alarm hl:np_load_avg=0.358887/2.00, alarm hl:mem_free=274.000000M/300M [01:38:56] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [01:39:37] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:41:56] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.417969/1.75, alarm hl:np_load_avg=0.386719/2.00, alarm hl:mem_free=232.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.417969/1.50, alarm hl:np_load_long=0.419922/1.75, alarm hl:mem_free=232.000000M/250M [01:43:21] 3(created) [OSM-6] mapnik tirex-backend occasionally failing; OSM; Bug <10https://jira.toolserver.org/browse/OSM-6> (Kai Krueger) [01:58:26] 3(commented) [OSM-3] Ptolemy Postgres crashed twice <10https://jira.toolserver.org/browse/OSM-3> (Kai Krueger) [02:06:48] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1928 [02:07:36] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1977.000000 [02:26:03] @replag [02:26:04] feedintm: s1-sec: 51m 31s [-]; s3-rr: 4m 9s [-]; s3-user: 4m 9s [-]; s5-rr: 15s [-]; s5-user: 15s [-] [02:34:49] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave 
SQL: Yes Seconds Behind Master: 3608 [02:35:38] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3661.000000 [02:35:38] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:37:06] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44453 MB (4% inode=99%): [02:37:37] SSH on nightshade.mgmt is CRITICAL: Server answer: [02:39:37] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:54:57] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [03:02:57] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.) [03:08:56] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.531250/1.75, alarm hl:np_load_avg=0.584961/2.00, alarm hl:mem_free=235.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.531250/1.50, alarm hl:np_load_long=0.570312/1.75, alarm hl:mem_free=235.000000M/250M [03:09:57] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [03:12:57] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.707031/1.75, alarm hl:np_load_avg=0.699219/2.00, alarm hl:mem_free=236.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.707031/1.50, alarm hl:np_load_long=0.625488/1.75, alarm hl:mem_free=236.000000M/250M [03:24:47] Sun Grid Engine execd on nightshade is WARNING: all.q@nightshade exceedes load threshold: alarm hl:np_load_short=0.728516/1.75, alarm hl:np_load_avg=0.821777/2.00, alarm hl:mem_free=257.000000M/300M [03:25:47] Sun Grid Engine execd on nightshade is OK: all.q@nightshade OK: longrun@nightshade OK [03:28:56] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [03:35:37] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: 
SELECT ts_rc_age() returned 4950.000000 [03:35:47] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:35:47] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 4956 [03:35:56] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.) [03:37:07] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44311 MB (4% inode=99%): [03:37:37] SSH on nightshade.mgmt is CRITICAL: Server answer: [03:39:47] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:51:07] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:51:57] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.013 second response time [04:02:07] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1935 [04:03:07] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.516113/1.75, alarm hl:np_load_avg=0.537598/2.00, alarm hl:mem_free=295.000000M/300M [04:03:07] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1952 [04:05:07] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [04:05:20] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:15:37] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3545.000000 [04:15:46] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3525 [04:16:07] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.526367/1.75, alarm hl:np_load_avg=0.480957/2.00, alarm hl:mem_free=203.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.526367/1.50, alarm hl:np_load_long=0.471191/1.75, 
alarm hl:mem_free=203.000000M/250M [04:16:12] @replag [04:16:13] JeffGq: s1-sec: 57m 18s [+0.05 s/s]; s3-rr: 45m 5s [+0.37 s/s]; s3-user: 45m 5s [+0.37 s/s]; s6-rr: 27m 3s [-]; s6-user: 27m 3s [-]; s7-rr: 44m 16s [-]; s7-user: 44m 16s [-] [04:23:07] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1891 [04:35:08] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 1.138 second response time [04:35:47] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:36:07] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.003 second response time [04:36:07] MySQL slave on z-dat-s3-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3614 [04:37:07] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44213 MB (4% inode=99%): [04:37:37] SSH on nightshade.mgmt is CRITICAL: Server answer: [04:38:36] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1786.000000 [04:38:46] MySQL slave on rosemary is OK: Uptime: 7727582 Threads: 42 Questions: 2328080983 Slow queries: 975968 Opens: 11595 Flush tables: 1 Open tables: 918 Queries per second avg: 301.269 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1770 [04:39:56] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:49:07] MySQL slave on z-dat-s7-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3634 [04:56:07] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3569 [05:01:08] MySQL slave on z-dat-s7-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3631 [05:03:07] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 1.164 second response time [05:10:09] MySQL slave on z-dat-s6-a is OK: 
Uptime: 51707 Threads: 18 Questions: 4098208 Slow queries: 2752 Opens: 50085 Flush tables: 1 Open tables: 594 Queries per second avg: 79.258 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1774 [05:14:07] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1856 [05:21:08] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.369141/1.75, alarm hl:np_load_avg=0.387207/2.00, alarm hl:mem_free=269.000000M/300M [05:25:07] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [05:29:08] MySQL slave on z-dat-s6-a is OK: Uptime: 52849 Threads: 19 Questions: 4143275 Slow queries: 2952 Opens: 50085 Flush tables: 1 Open tables: 594 Queries per second avg: 78.398 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1779 [05:35:47] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:36:07] MySQL slave on z-dat-s3-a is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 5222 [05:37:08] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 45012 MB (4% inode=99%): [05:37:37] SSH on nightshade.mgmt is CRITICAL: Server answer: [05:39:56] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:50:07] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3576 [06:02:48] Sun Grid Engine execd on nightshade is WARNING: longrun@nightshade exceedes load threshold: alarm hl:np_load_short=1.729492/1.50, alarm hl:np_load_long=1.049316/1.75, alarm hl:mem_free=396.000000M/250M [06:03:07] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [06:04:47] Sun Grid Engine execd on nightshade is OK: all.q@nightshade OK: longrun@nightshade OK [06:05:08] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3579 [06:16:07] MySQL 
slave on z-dat-s3-a is OK: Uptime: 55674 Threads: 19 Questions: 46562060 Slow queries: 3537 Opens: 188584 Flush tables: 1 Open tables: 16384 Queries per second avg: 836.334 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1706 [06:17:07] MySQL slave on z-dat-s7-a is OK: Uptime: 55730 Threads: 6 Questions: 5737507 Slow queries: 2852 Opens: 1554 Flush tables: 1 Open tables: 1427 Queries per second avg: 102.951 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1394 [06:35:48] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:37:38] SSH on nightshade.mgmt is CRITICAL: Server answer: [06:38:07] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44888 MB (4% inode=99%): [06:39:56] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:45:09] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.041016/1.00, alarm hl:np_load_long=0.544922/1.50, alarm hl:mem_free=21383.000000M/300M [06:46:07] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK [07:03:07] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [07:34:08] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.060547/1.00, alarm hl:np_load_long=0.792969/1.50, alarm hl:mem_free=22209.000000M/300M [07:35:07] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK [07:35:48] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:37:38] SSH on nightshade.mgmt is CRITICAL: Server answer: [07:38:07] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44762 MB (4% inode=99%): [07:39:56] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:03:07] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [08:19:47] /aux0 on daphne is CRITICAL: DISK CRITICAL - 
free space: /aux0 19951 MB (2% inode=99%): [08:35:58] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:37:38] SSH on nightshade.mgmt is CRITICAL: Server answer: [08:38:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44633 MB (4% inode=99%): [08:39:57] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:03:07] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [09:36:00] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:37:44] SSH on nightshade.mgmt is CRITICAL: Server answer: [09:38:25] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44498 MB (4% inode=99%): [09:40:25] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:03:25] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [10:03:26] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [10:03:26] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.418945/1.75, alarm hl:np_load_avg=0.506348/2.00, alarm hl:mem_free=201.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.418945/1.50, alarm hl:np_load_long=0.515625/1.75, alarm hl:mem_free=201.000000M/250M [10:09:25] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [10:12:25] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.448242/1.75, alarm hl:np_load_avg=0.413574/2.00, alarm hl:mem_free=225.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.448242/1.50, alarm hl:np_load_long=0.452637/1.75, alarm hl:mem_free=225.000000M/250M [10:12:25] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.) 
[10:36:05] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:37:45] SSH on nightshade.mgmt is CRITICAL: Server answer: [10:38:25] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44356 MB (4% inode=99%): [10:40:25] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:03:24] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [11:36:05] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:37:45] SSH on nightshade.mgmt is CRITICAL: Server answer: [11:38:25] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44217 MB (4% inode=99%): [11:40:25] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:03:25] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [12:12:25] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.545898/1.75, alarm hl:np_load_avg=0.527832/2.00, alarm hl:mem_free=239.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.545898/1.50, alarm hl:np_load_long=0.522461/1.75, alarm hl:mem_free=239.000000M/250M [12:13:25] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [12:36:05] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:37:45] SSH on nightshade.mgmt is CRITICAL: Server answer: [12:38:26] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 45002 MB (4% inode=99%): [12:40:36] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:03:27] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [13:15:20] 3(created) [MNT-1184] Restarted Zeus on wolfsbane; Maintenance; Emergency work <10https://jira.toolserver.org/browse/MNT-1184> (DaB.) 
[13:17:20] 3(updated) [MNT-1184] Restarted Zeus on wolfsbane <10https://jira.toolserver.org/browse/MNT-1184> (DaB.) [13:36:18] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:37:47] SSH on nightshade.mgmt is CRITICAL: Server answer: [13:38:37] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44837 MB (4% inode=99%): [13:40:37] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:44:38] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.308594/1.75, alarm hl:np_load_avg=0.386719/2.00, alarm hl:mem_free=264.000000M/300M [13:51:37] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [14:03:27] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [14:17:47] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.016601/1.00, alarm hl:np_load_long=0.791992/1.50, alarm hl:mem_free=21810.000000M/300M [14:18:47] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK [14:36:16] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:37:57] SSH on nightshade.mgmt is CRITICAL: Server answer: [14:38:47] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44673 MB (4% inode=99%): [14:40:47] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:03:27] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [15:12:57] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.406738/1.75, alarm hl:np_load_avg=0.381836/2.00, alarm hl:mem_free=206.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.406738/1.50, alarm hl:np_load_long=0.388672/1.75, alarm hl:mem_free=206.000000M/250M [15:18:57] Sun Grid Engine execd on willow is OK: all.q@willow OK: 
longrun@willow OK [15:36:18] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:38:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [15:38:47] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44511 MB (4% inode=99%): [15:41:17] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:47:57] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.292480/1.75, alarm hl:np_load_avg=0.291992/2.00, alarm hl:mem_free=285.000000M/300M [15:49:57] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [16:03:27] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [16:36:27] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:38:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [16:38:47] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44326 MB (4% inode=99%): [16:41:17] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:58:56] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [17:03:27] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [17:06:57] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.) 
[17:36:29] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:38:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [17:38:47] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44153 MB (4% inode=99%): [17:41:17] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:03:39] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [18:14:08] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.424316/1.75, alarm hl:np_load_avg=0.460449/2.00, alarm hl:mem_free=291.000000M/300M [18:15:07] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [18:36:27] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:38:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [18:38:47] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44863 MB (4% inode=99%): [18:41:28] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:04:38] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [19:13:07] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.651856/1.75, alarm hl:np_load_avg=0.599609/2.00, alarm hl:mem_free=244.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.651856/1.50, alarm hl:np_load_long=0.528809/1.75, alarm hl:mem_free=244.000000M/250M [19:13:22] 3(commented) [ACCAPP-435] Running the bot for cleanup tasks like spell-check, transliterate, etc. 
<https://jira.toolserver.org/browse/ACCAPP-435> [19:14:07] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [19:36:38] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:38:27] SSH on nightshade.mgmt is CRITICAL: Server answer: [19:38:46] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44676 MB (4% inode=99%): [19:41:27] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:04:37] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [20:13:08] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.576172/1.75, alarm hl:np_load_avg=0.497559/2.00, alarm hl:mem_free=227.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.576172/1.50, alarm hl:np_load_long=0.487793/1.75, alarm hl:mem_free=227.000000M/250M [20:14:07] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [20:20:27] /aux0 on daphne is CRITICAL: DISK CRITICAL - free space: /aux0 19905 MB (2% inode=99%): [20:20:47] any ts admin here? [20:21:02] DaBPunkt is here [20:22:24] DaBPunkt: do you have a sec? [20:23:05] sure [20:23:11] DaBPunkt: are you the DaB user on jira.toolserver.org? [20:23:35] shantanoo: yes [20:23:51] I tried to run a sql query and got ERROR 1045 (28000): Access denied for user 'matanya'@'damiana-bge0.esi.toolserver.org' (using password: YES) [20:23:59] what is it? [20:24:17] DaBPunkt: i have filled in the information requested on https://jira.toolserver.org/browse/ACCAPP-435 [20:24:34] can you create the account? [20:24:34] matanya: which database and which server did you try? [20:24:53] hewiki_p on nightshade [20:25:15] matanya: wait a moment, I will look [20:27:06] matanya: wait a moment, you miss some grants AFAIS [20:27:18] oh, ok [20:28:29] DaBPunkt: back. got disconnected. [20:29:41] matanya: try now [20:30:10] works. thank you [20:30:19] ok. 
np [20:31:10] DaBPunkt: should I wait for another 15-20 mins to verify that i can log in, or will it take more than that to get the account activated? [20:31:26] shantanoo: I look at it now [20:31:51] DaBPunkt: ok. i will wait... [20:32:00] I am an old serial model and can only do 2 things at the same time – 1 of them is breathing by default [20:33:16] DaBPunkt: :) [20:33:47] really nice looking chars :) [20:35:39] done. check your inbox [20:35:52] checking [20:36:40] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:38:29] SSH on nightshade.mgmt is CRITICAL: Server answer: [20:38:47] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44492 MB (4% inode=99%): [20:38:47] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1876 [20:38:55] @replag [20:38:55] DaBPunkt: s1-sec: 19m 9s [-0.04 s/s]; s1-sec-c: 15m 3s [-]; s3-rr: 31m 53s [-0.01 s/s]; s3-user: 31m 53s [-0.01 s/s]; s4-rr: 15m 3s [-]; s6-rr: 16m 8s [-0.01 s/s]; s6-user: 16m 8s [-0.01 s/s]; s7-rr: 14m 10s [-0.03 s/s] [20:38:56] DaBPunkt: s7-user: 14m 10s [-0.03 s/s] [20:41:27] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:41:42] DaBPunkt: trying 'ssh -i ~/.ssh/toolserver.org.id_rsa.pub shantanoo@toolserver.org'. but it's asking me for the password even though i did not set a password while creating the ssh-key. [20:42:12] try nightshade.toolserver.org or willow.toolserver.org [20:42:45] is login.toolserver.org no longer the preferred hostname? [20:43:22] nightshade worked. [20:43:23] thanks [20:43:55] a ton :) [20:44:40] valhalla1w: became deprecated 20 July 2009 [20:45:58] oh. right. 
[20:46:31] * valhalla1w never remembers the plant names without googling, and therefore still sometimes resorts to login ;-) [20:48:24] valhalla1w: "willow" is short enough to remember (there is also a quite good fantasy film with that name) [20:51:43] *restarting nagios* [20:53:03] SSH on nightshade.mgmt is CRITICAL: Server answer: [20:53:03] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44470 MB (4% inode=99%): [20:53:13] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:53:25] (created) [MNT-1185] Corrected nagios HTTP-check for wolfsbane and ortelius; Maintenance; Minor work <https://jira.toolserver.org/browse/MNT-1185> (DaB.) [20:54:23] MySQL slave on z-dat-s3-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2307 [20:59:20] (closed) [MNT-1185] Corrected nagios HTTP-check for wolfsbane and ortelius <https://jira.toolserver.org/browse/MNT-1185> (DaB.) [21:01:46] zzz [21:05:07] web is not running? [21:05:15] can't load my pages [21:05:33] neither http nor https [21:05:39] mm [21:07:04] looks like a problem with turnera again [21:07:46] :-( [21:08:10] 504 Gateway Time-out [21:08:23] on https [21:08:26] if that matters [21:08:33] DaBPunkt: nightshade and willow are again asking for password...:( [21:08:42] ouch [21:08:46] looks like a ldap or dns-problem again [21:10:32] oh. seems to be dns [21:12:06] need to sleep now...it's 0241 HRS...will look at it tomorrow. [21:12:22] NFS server ha-nfs.esi not responding still trying [21:12:29] DaBPunkt: thanks again for your help. 
[21:13:01] 500 Server Error now [21:14:32] I think it's nfs [21:14:52] there's a wait between SSH2_MSG_SERVICE_ACCEPT received and offering authentications [21:15:00] <|Danny_B|> now all my putty windows are frozen [21:15:05] (plus it's rejecting the key) [21:15:33] <|Danny_B|> Platonides: yes, it obviously is, i reported the message above [21:16:02] |Danny_B|, shanoo was saying it was dns :P [21:16:58] * Merlissimo is happy that his bot has a hard-coded ip as fallback which is used when an UnknownHostException happens while editing at wikipedia [21:17:26] <|Danny_B|> this kind of error appeared already a couple of months ago [21:17:48] <|Danny_B|> and it seems to be getting more frequent [21:17:57] looks like turnera is starting to pull the services over [21:18:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [21:18:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44370 MB (4% inode=99%): [21:18:22] |Danny_B|: there is a network-problem somewhere which threatens damiana and turnera [21:18:26] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:18:30] I will add a direct cable next visit [21:18:30] acc is giving 404 [21:18:51] http://toolserver.org/~acc/acc.php Page not found [21:19:23] another problem is that river left without writing down some passwords… [21:19:40] looks like putty is working again? 
[21:19:47] Cluster on turnera.esi is CRITICAL: damiana FAILED, damiana:nge1-turnera:nge1 faulted, damiana:nge0-turnera:nge0 faulted, vote damiana Offline, check nfs-hasp, nfs-home Online, [21:20:37] huh, i thought you have passes written somewhere [21:20:47] Cluster on turnera.esi is WARNING: damiana:nge0-turnera:nge0 waiting, check nfs-hasp, [21:21:39] Danny_B|backup: we have, but none of them are accepted for the management-console of damiana and turnera [21:22:17] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:22:47] Cluster on damiana.esi is WARNING: damiana:nge0-turnera:nge0 faulted, check nfs-hasp, [21:29:29] MySQL slave on z-dat-s3-a is CRITICAL: (Return code of 139 is out of bounds) [21:36:48] Cluster on turnera.esi is OK: CLUSTER OK ! check nfs-hasp, [21:36:48] Cluster on damiana.esi is OK: CLUSTER OK ! check nfs-hasp, [21:39:28] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.267090/1.75, alarm hl:np_load_avg=0.288574/2.00, alarm hl:mem_free=235.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.267090/1.50, alarm hl:np_load_long=0.266113/1.75, alarm hl:mem_free=235.000000M/250M [21:41:28] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [21:50:01] DaBPunkt less busy? [22:18:18] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44171 MB (4% inode=99%): [22:18:18] SSH on nightshade.mgmt is CRITICAL: Server answer: [22:18:28] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:22:17] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:29:13] toolserver: dab * r1130 /trunk/ts-specs/ (patches/perl512-kstat-01.diff perl512/TSperl512-Kstat.spec): -Added Solaris::Kstat. 
[22:29:29] MySQL slave on z-dat-s3-a is CRITICAL: (Return code of 139 is out of bounds) [22:58:57] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1860 [23:07:49] @replag [23:07:50] robla: s1-sec: 3m 59s [-]; s3-rr: 1h 52m 3s [-]; s3-user: 1h 52m 3s [-]; s6-rr: 24m 14s [-]; s6-user: 24m 14s [-]; s7-rr: 31m 57s [-]; s7-user: 31m 57s [-] [23:12:09] toolserver: 03dab * r1131 10/trunk/ts-specs/ (patches/check_mem-01.diff sysutils/TSnagios-check-mem.spec): -Add a nagios-check for (free) memory. [23:13:01] *restarting nagios* [23:14:38] SSH on nightshade.mgmt is CRITICAL: Server answer: [23:14:38] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 44018 MB (4% inode=99%): [23:16:16] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2179 [23:16:27] Free Memory on turnera.esi is CRITICAL: CRITICAL - 9.5% (395544 kB) free! [23:18:27] Free Memory on turnera.esi is WARNING: WARNING - 11.0% (458460 kB) free! [23:20:27] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1900 [23:23:56] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [23:29:14] PostgreSQL on ptolemy is CRITICAL: CRITICAL - no connection to osm_mapnik (FATAL: the database system is in recovery mode [23:29:35] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2209 [23:29:44] Free Memory on turnera.esi is WARNING: WARNING - 10.2% (428448 kB) free! [23:29:44] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2057 [23:32:13] PostgreSQL on ptolemy is OK: OK - database osm_mapnik (0 sec.) [23:32:40] What's going on ? My tables no longer exist ? 
[23:32:55] Table 'u_krinkle.querycache_info' doesn't exist [23:33:00] s4-user [23:33:19] Krinkle: there is no s4-user at the moment. see the ml [23:34:30] !wmml toolserver-l [23:34:37] http://bit.ly/toolserverLast http://bit.ly/toolserverMonth [23:37:02] hm.. guess that doesn't leave much choice then [23:37:15] DaBPunkt: Is there a way to at least have read-only access to s4-user ? [23:38:21] no. there is data left on the partition [23:38:31] it needs a complete re-setup [23:38:41] and that is going to happen tomorrow ? [23:39:04] yes [23:39:23] ok [23:40:01] would be nice if there was something I could call inside my tools so that this kind of information would be available by default (now the commons tools just fall on their face, unless I manually add a notice for now) [23:41:42] Krinkle: http://status.toolserver.org/ in a way like this? [23:41:44] Free Memory on turnera.esi is CRITICAL: CRITICAL - 10.0% (418292 kB) free! [23:41:46] Krinkle, the db status files [23:41:58] see for instance http://toolserver.org/~platonides/CrosswikiTitles/CrosswikiTitles.php [23:42:25] DaBPunkt: That could probably be used as input. I mean more like a human readable sentence [23:42:35] I guess the whole status-thing needs a renovation [23:42:53] Krinkle, you can use this class: /home/platonides/public_html/common/status.php [23:43:20] My commons tools, for instance, do a join to u_krinkle on s4-user. That tool simply won't work at all. [23:43:34] but the query does run for a few seconds apparently [23:43:53] MySQL slave on z-dat-s6-a is OK: Uptime: 118539 Threads: 15 Questions: 12601038 Slow queries: 8426 Opens: 58197 Flush tables: 1 Open tables: 1642 Queries per second avg: 106.302 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1799 [23:43:57] nvm. it's something else that keeps the page stalling [23:44:05] that query exits right away (naturally) [23:48:33] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[23:48:43] Free Memory on turnera.esi is WARNING: WARNING - 12.2% (508816 kB) free! [23:49:03] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.375000/1.75, alarm hl:np_load_avg=0.410645/2.00, alarm hl:mem_free=198.000000M/300M: longrun@willow exceedes load threshold: alarm hl:np_load_short=0.375000/1.50, alarm hl:np_load_long=0.389160/1.75, alarm hl:mem_free=198.000000M/250M [23:50:22] Cluster on turnera.esi is CRITICAL: check nfs-hasp, nfs-home Online, [23:50:23] Cluster on damiana.esi is CRITICAL: check nfs-hasp, nfs-home Online, [23:51:23] Cluster on turnera.esi is OK: CLUSTER OK ! check nfs-hasp, [23:52:22] Cluster on damiana.esi is OK: CLUSTER OK ! check nfs-hasp, [23:58:43] Free Memory on turnera.esi is CRITICAL: CRITICAL - 8.2% (341840 kB) free!