[00:03:44] PROBLEM dpkg-check is now: CRITICAL on deployment-thumbproxy i-0000026b output: Connection refused by host
[00:05:08] PROBLEM Current Load is now: CRITICAL on deployment-thumbproxy i-0000026b output: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:05:44] PROBLEM Current Users is now: CRITICAL on deployment-thumbproxy i-0000026b output: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:06:19] PROBLEM Disk Space is now: CRITICAL on deployment-thumbproxy i-0000026b output: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:06:54] PROBLEM Free ram is now: CRITICAL on deployment-thumbproxy i-0000026b output: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:08:44] RECOVERY dpkg-check is now: OK on deployment-thumbproxy i-0000026b output: All packages OK
[00:10:04] RECOVERY Current Load is now: OK on deployment-thumbproxy i-0000026b output: OK - load average: 0.11, 0.91, 0.97
[00:10:44] RECOVERY Current Users is now: OK on deployment-thumbproxy i-0000026b output: USERS OK - 1 users currently logged in
[00:11:14] RECOVERY Disk Space is now: OK on deployment-thumbproxy i-0000026b output: DISK OK
[00:11:54] RECOVERY Free ram is now: OK on deployment-thumbproxy i-0000026b output: OK: 84% free memory
[00:18:08] drdee: reportcard isn't working for some reason
[00:34:19] New patchset: Hashar; "stop apache when having nginx thumb proxy" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7253
[00:34:34] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7253
[00:41:24] New patchset: Hashar; "ability to change thumbnail server name" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7255
[00:41:39] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7255
[00:50:55] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory
[00:56:45] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK
[01:00:29] !deployement-prep moving upload.beta.wmflabs.org from the non working instances back to the main entry point
[01:04:45] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 612 MB (3% inode=80%):
[01:05:55] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 5% free memory
[01:13:33] hashar_: s/deploye/deploy/
[01:15:55] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 60% free memory
[01:20:15] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[01:23:38] RoanKattouw: thx :)D
[01:23:45] RoanKattouw: we really need to switch to french
[01:23:52] !deployment-prep moving upload.beta.wmflabs.org from the non working instances back to the main entry point
[01:23:52] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod
[01:24:02] !log deployment-prep moving upload.beta.wmflabs.org from the non working instances back to the main entry point
[01:24:05] Logged the message, Master
[01:28:15] !log deployment-prep Created upload2.beta.wmflabs.org to be the entry point for the "new" thumbnailing infrastructure
[01:28:16] Logged the message, Master
[01:34:44] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK
[02:27:02] PROBLEM Disk Space is now: CRITICAL on deployment-apache09 i-0000025e output: Connection refused or timed out
[02:28:26] !log deployment-prep deleted all remaining deployment-apache instances : : You don't have enough free space in /var/cache/apt/archives/. So we really want to use m1.large , not m1.tiny pretending to save disk space :-D
[02:28:28] Logged the message, Master
[02:28:31] good
[02:28:33] danke
[02:43:33] !deployment-prep setting up "apache20" instance by using only puppet. We will see what happens :-D
[02:43:33] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod
[02:44:14] RECOVERY Current Load is now: OK on exim-test i-00000265 output: OK - load average: 0.92, 0.50, 0.18
[02:44:45] RECOVERY Disk Space is now: OK on exim-test i-00000265 output: DISK OK
[02:44:45] RECOVERY Current Users is now: OK on exim-test i-00000265 output: USERS OK - 0 users currently logged in
[02:45:04] PROBLEM Current Users is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused by host
[02:45:44] PROBLEM Disk Space is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused by host
[02:45:54] RECOVERY Free ram is now: OK on exim-test i-00000265 output: OK: 88% free memory
[02:46:34] PROBLEM Free ram is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused by host
[02:46:54] RECOVERY Total Processes is now: OK on exim-test i-00000265 output: PROCS OK: 81 processes
[02:46:59] PROBLEM HTTP is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused
[02:48:04] RECOVERY dpkg-check is now: OK on exim-test i-00000265 output: All packages OK
[02:48:14] PROBLEM Total Processes is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused by host
[02:48:27] it is installing
[02:48:28] hopefully
[02:48:30] :-D
[02:49:24] PROBLEM Current Load is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused by host
[02:49:44] PROBLEM dpkg-check is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused by host
[03:06:58] RECOVERY HTTP is now: OK on deployment-apache20 i-0000026c output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.004 second response time
[03:09:28] RECOVERY Current Load is now: OK on deployment-apache20 i-0000026c output: OK - load average: 1.34, 2.31, 2.39
[03:09:48] RECOVERY dpkg-check is now: OK on deployment-apache20 i-0000026c output: All packages OK
[03:10:35] RECOVERY Current Users is now: OK on deployment-apache20 i-0000026c output: USERS OK - 2 users currently logged in
[03:10:45] RECOVERY Disk Space is now: OK on deployment-apache20 i-0000026c output: DISK OK
[03:11:35] RECOVERY Free ram is now: OK on deployment-apache20 i-0000026c output: OK: 93% free memory
[03:11:45] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 601 MB (3% inode=80%):
[03:13:15] RECOVERY Total Processes is now: OK on deployment-apache20 i-0000026c output: PROCS OK: 138 processes
[03:13:45] PROBLEM Current Load is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused by host
[03:14:25] PROBLEM Current Users is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused by host
[03:14:55] PROBLEM HTTP is now: WARNING on deployment-apache20 i-0000026c output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.005 second response time
[03:15:05] PROBLEM Disk Space is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused by host
[03:16:01] PROBLEM Free ram is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused by host
[03:16:34] PROBLEM HTTP is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused
[03:17:43] PROBLEM Total Processes is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused by host
[03:18:28] PROBLEM dpkg-check is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused by host
[03:21:21] RECOVERY HTTP is now: OK on deployment-apache21 i-0000026d output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.003 second response time
[03:26:10] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK
[03:30:08] petan: are you the correct person to ping for setting stuff on beta?
[03:34:00] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%):
[03:38:11] RECOVERY Current Load is now: OK on kripke i-00000268 output: OK - load average: 0.55, 0.34, 0.17
[03:38:11] RECOVERY Disk Space is now: OK on kripke i-00000268 output: DISK OK
[03:38:41] RECOVERY dpkg-check is now: OK on kripke i-00000268 output: All packages OK
[03:39:01] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory
[03:39:21] RECOVERY Current Users is now: OK on kripke i-00000268 output: USERS OK - 0 users currently logged in
[03:39:21] RECOVERY Free ram is now: OK on kripke i-00000268 output: OK: 96% free memory
[03:39:21] RECOVERY Total Processes is now: OK on kripke i-00000268 output: PROCS OK: 217 processes
[03:40:21] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 16% free memory
[03:41:03] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory
[03:47:42] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK
[03:47:42] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 15% free memory
[03:55:39] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 595 MB (3% inode=80%):
[03:56:11] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory
[03:59:17] PROBLEM Disk Space is now: CRITICAL on deployment-feed i-00000118 output: CHECK_NRPE: Socket timeout after 10 seconds.
[03:59:17] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: CHECK_NRPE: Socket timeout after 10 seconds.
[04:00:23] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 4% free memory
[04:04:06] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory
[04:05:36] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:06:06] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:12:36] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:17:36] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:37:44] !log deployment-prep [~2 hrs ago] 02:43:33 < hashar_> !deployment-prep setting up "apache20" instance by using only puppet. We will see what happens :-D
[04:37:45] Logged the message, Master
[04:39:47] the new instances are quite large
[04:44:06] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%):
[04:45:45] PROBLEM Free ram is now: CRITICAL on dumps-8 i-0000026e output: Connection refused by host
[04:46:55] PROBLEM Total Processes is now: CRITICAL on dumps-8 i-0000026e output: Connection refused by host
[04:47:35] PROBLEM dpkg-check is now: CRITICAL on dumps-8 i-0000026e output: Connection refused by host
[04:48:45] PROBLEM Current Load is now: CRITICAL on dumps-8 i-0000026e output: Connection refused by host
[04:52:05] PROBLEM Disk Space is now: CRITICAL on dumps-8 i-0000026e output: Connection refused by host
[04:52:25] PROBLEM Current Users is now: CRITICAL on dumps-8 i-0000026e output: Connection refused by host
[05:04:53] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 8% free memory
[06:02:02] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 1.34, 9.17, 6.85
[06:04:52] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[06:12:15] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.72, 1.77, 3.86
[06:31:18] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 6.50, 5.92, 3.21
[06:34:36] PROBLEM Current Users is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:34:36] PROBLEM Total Processes is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:36:14] PROBLEM Current Load is now: CRITICAL on nova-essex-test i-000001f9 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:36:15] PROBLEM Current Users is now: CRITICAL on nova-essex-test i-000001f9 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:36:15] PROBLEM Total Processes is now: CRITICAL on nova-essex-test i-000001f9 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:36:33] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK
[06:38:24] PROBLEM Current Load is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:38:24] PROBLEM dpkg-check is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:38:24] PROBLEM Disk Space is now: CRITICAL on nova-essex-test i-000001f9 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:38:29] PROBLEM Current Load is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:38:29] PROBLEM Disk Space is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:39:54] PROBLEM Free ram is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:41:25] PROBLEM Current Load is now: WARNING on nova-essex-test i-000001f9 output: WARNING - load average: 8.12, 9.61, 6.33
[06:41:26] RECOVERY Current Users is now: OK on nova-essex-test i-000001f9 output: USERS OK - 0 users currently logged in
[06:41:26] RECOVERY Total Processes is now: OK on nova-essex-test i-000001f9 output: PROCS OK: 126 processes
[06:41:31] PROBLEM Disk Space is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:41:52] PROBLEM Total Processes is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:42:12] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 12.04, 7.25, 4.64
[06:43:31] RECOVERY Disk Space is now: OK on nova-essex-test i-000001f9 output: DISK OK
[06:44:44] PROBLEM Current Load is now: WARNING on nova-production1 i-0000007b output: WARNING - load average: 7.22, 7.92, 5.84
[06:44:45] PROBLEM Disk Space is now: WARNING on bz-dev i-000001db output: DISK WARNING - free space: / 46 MB (3% inode=43%):
[06:45:14] PROBLEM Free ram is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:15] PROBLEM Current Users is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:15] PROBLEM Disk Space is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:30] PROBLEM Current Users is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:30] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:55] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK
[06:47:00] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:09] PROBLEM dpkg-check is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:09] PROBLEM SSH is now: CRITICAL on mobile-enwp i-000000ce output: CRITICAL - Socket timeout after 10 seconds
[06:48:22] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:22] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:23] PROBLEM Current Load is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:06] PROBLEM Disk Space is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:46] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:47] PROBLEM Total Processes is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:55] PROBLEM dpkg-check is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:56] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:05] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:05] PROBLEM Current Load is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:50] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 6.01, 6.69, 5.68
[06:53:12] PROBLEM Total Processes is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:32] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 703 MB (4% inode=80%):
[06:53:32] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 12.31, 10.56, 7.36
[06:54:33] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%):
[06:54:34] RECOVERY dpkg-check is now: OK on pediapress-ocg2 i-00000234 output: All packages OK
[06:54:34] RECOVERY Current Load is now: OK on maps-test2 i-00000253 output: OK - load average: 6.72, 6.84, 4.18
[06:54:34] RECOVERY Current Users is now: OK on maps-test2 i-00000253 output: USERS OK - 0 users currently logged in
[06:54:34] RECOVERY Total Processes is now: OK on maps-test2 i-00000253 output: PROCS OK: 101 processes
[06:54:38] RECOVERY dpkg-check is now: OK on maps-test2 i-00000253 output: All packages OK
[06:54:43] RECOVERY Free ram is now: OK on ve-nodejs i-00000245 output: OK: 80% free memory
[06:54:43] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 90% free memory
[06:54:43] RECOVERY Total Processes is now: OK on pediapress-ocg2 i-00000234 output: PROCS OK: 83 processes
[06:55:53] PROBLEM Current Load is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:53] PROBLEM Free ram is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:53] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:19] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg1 i-00000233 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:49] PROBLEM dpkg-check is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:49] PROBLEM Disk Space is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:50] PROBLEM Current Users is now: CRITICAL on en-wiki-db-precise i-0000023c output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:35] PROBLEM Current Load is now: CRITICAL on nova-essex-test i-000001f9 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:44] PROBLEM Current Load is now: WARNING on pediapress-ocg2 i-00000234 output: WARNING - load average: 5.12, 6.95, 5.85
[06:57:44] RECOVERY Current Users is now: OK on pediapress-ocg2 i-00000234 output: USERS OK - 0 users currently logged in
[06:57:44] RECOVERY Disk Space is now: OK on pediapress-ocg2 i-00000234 output: DISK OK
[06:58:45] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:34] RECOVERY Current Users is now: OK on pediapress-ocg1 i-00000233 output: USERS OK - 0 users currently logged in
[07:00:22] PROBLEM Current Load is now: WARNING on aggregator-test i-0000024d output: WARNING - load average: 7.64, 23.33, 16.21
[07:01:16] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 6.13, 4.55, 4.88
[07:01:16] PROBLEM Disk Space is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host
[07:01:16] PROBLEM Free ram is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host
[07:01:51] !deployment-prep del
[07:01:51] You are not autorized to perform this, sorry
[07:01:55] meh
[07:02:17] hey
[07:02:18] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused
[07:02:24] PROBLEM Disk Space is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused by host
[07:02:29] if someone again ping regarding beta tell them to use bz plix
[07:02:30] PROBLEM Free ram is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused by host
[07:02:48] PROBLEM Current Users is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused by host
[07:02:49] PROBLEM Current Load is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused by host
[07:03:03] where is hashar
[07:03:05] PROBLEM Current Users is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:05] PROBLEM Free ram is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:05] RECOVERY Total Processes is now: OK on mobile-enwp i-000000ce output: PROCS OK: 114 processes
[07:03:13] PROBLEM Current Load is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:03:34] PROBLEM Total Processes is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused by host
[07:03:34] PROBLEM HTTP is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused
[07:03:34] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 10.57, 10.25, 8.72
[07:03:59] RECOVERY Disk Space is now: OK on maps-test2 i-00000253 output: DISK OK
[07:04:24] PROBLEM dpkg-check is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused by host
[07:04:24] PROBLEM Total Processes is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host
[07:04:34] PROBLEM Total Processes is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:04:40] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:04:40] RECOVERY Current Users is now: OK on mobile-enwp i-000000ce output: USERS OK - 2 users currently logged in
[07:04:40] RECOVERY Disk Space is now: OK on mobile-enwp i-000000ce output: DISK OK
[07:04:40] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 28% free memory
[07:04:54] PROBLEM dpkg-check is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host
[07:05:24] PROBLEM Current Users is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host
[07:05:24] PROBLEM Current Load is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused by host
[07:06:14] PROBLEM Current Load is now: WARNING on pediapress-ocg1 i-00000233 output: WARNING - load average: 4.75, 6.08, 5.87
[07:06:14] RECOVERY Disk Space is now: OK on pediapress-ocg1 i-00000233 output: DISK OK
[07:06:14] RECOVERY Free ram is now: OK on pediapress-ocg1 i-00000233 output: OK: 84% free memory
[07:06:26] RECOVERY SSH is now: OK on mobile-enwp i-000000ce output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:06:26] RECOVERY dpkg-check is now: OK on mobile-enwp i-000000ce output: All packages OK
[07:06:26] RECOVERY Total Processes is now: OK on pediapress-ocg1 i-00000233 output: PROCS OK: 89 processes
[07:06:34] RECOVERY dpkg-check is now: OK on pediapress-ocg1 i-00000233 output: All packages OK
[07:07:05] RECOVERY Disk Space is now: OK on en-wiki-db-precise i-0000023c output: DISK OK
[07:07:05] RECOVERY dpkg-check is now: OK on en-wiki-db-precise i-0000023c output: All packages OK
[07:07:05] RECOVERY Current Users is now: OK on en-wiki-db-precise i-0000023c output: USERS OK - 0 users currently logged in
[07:07:50] RECOVERY Total Processes is now: OK on nova-precise1 i-00000236 output: PROCS OK: 125 processes
[07:08:09] RECOVERY Disk Space is now: OK on nova-precise1 i-00000236 output: DISK OK
[07:08:09] RECOVERY Current Users is now: OK on fr-wiki-db-precise i-0000023e output: USERS OK - 0 users currently logged in
[07:08:09] RECOVERY Free ram is now: OK on fr-wiki-db-precise i-0000023e output: OK: 78% free memory
[07:08:09] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 7.12, 3.04, 3.82
[07:08:55] RECOVERY Current Load is now: OK on ve-nodejs i-00000245 output: OK - load average: 0.50, 3.59, 4.91
[07:09:03] RECOVERY dpkg-check is now: OK on ve-nodejs i-00000245 output: All packages OK
[07:09:03] RECOVERY Disk Space is now: OK on ve-nodejs i-00000245 output: DISK OK
[07:09:25] RECOVERY Total Processes is now: OK on fr-wiki-db-precise i-0000023e output: PROCS OK: 83 processes
[07:09:30] RECOVERY dpkg-check is now: OK on fr-wiki-db-precise i-0000023e output: All packages OK
[07:11:08] RECOVERY Current Load is now: OK on pediapress-ocg1 i-00000233 output: OK - load average: 0.12, 2.53, 4.42
[07:11:20] RECOVERY Current Users is now: OK on ve-nodejs i-00000245 output: USERS OK - 0 users currently logged in
[07:11:20] RECOVERY Total Processes is now: OK on ve-nodejs i-00000245 output: PROCS OK: 81 processes
[07:14:28] RECOVERY Current Load is now: OK on nova-production1 i-0000007b output: OK - load average: 0.05, 1.45, 4.29
[07:19:01] PROBLEM Current Load is now: WARNING on mobile-enwp i-000000ce output: WARNING - load average: 4.56, 7.14, 18.19
[07:25:29] RECOVERY Current Load is now: OK on aggregator-test i-0000024d output: OK - load average: 0.25, 0.63, 4.04
[07:30:02] Hydriz: I don't see any possible reason to need 8 instances to upload dumps
[07:30:19] thats going to be 100 wikis per instance
[07:30:23] 100*5
[07:30:29] upload slower
[07:30:47] you're eating a ton of IO
[07:31:05] heh
[07:31:31] then, give me some desired figures, and I can adapt to it
[07:31:43] I mean, I don't know what is possible
[07:31:48] so I need feedback
[07:32:04] use a couple instances
[07:32:32] 4?
[07:33:26] ok
[07:33:47] you are uploading directly off the gluster share now, right?
[07:34:25] yep
[07:34:29] ok. cool
[07:34:33] no wait
[07:34:38] which gluster share?
[07:34:46] the nfs one, sorry
[07:34:51] /dumps-project or /publicdata-project
[07:34:58] the latter
[07:35:13] oh, just doing initial tests
[07:35:33] but its a little out of date the other time I checked
[07:35:34] (a few days ago)
[07:35:49] I'll check to see why
[07:35:59] * Hydriz checks again
[07:36:09] what is the interval for updating this directory?
[07:36:26] I could have broken it somehow recently
[07:36:30] it should be consistent
[07:37:15] yeah, what did you set the interval for updating this directory to be?
[07:37:35] if its once a day, then its really out of date
[07:37:51] I didn't
[07:37:53] ariel did
[07:38:01] looks like its once every two days then :)
[07:38:13] May 09 is there
[07:38:17] but not 10 and 11
[07:39:27] yep
[07:39:28] I broke it
[07:39:38] no, it constantly updates
[07:39:42] but I broke it monday
[07:39:59] so, I just fixed it
[07:40:03] heh
[07:40:04] it'll start updating again
[07:40:31] I am figuring out how to extract the dates of the dump in python
[07:40:52] the script for uploading is almost done, just lacking this and a date range check
[07:44:36] PROBLEM Current Load is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:44:51] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK
[07:45:01] PROBLEM Free ram is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:46:10] PROBLEM Total Processes is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[07:51:35] RECOVERY HTTP is now: OK on deployment-apache22 i-0000026f output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.028 second response time
[07:53:35] RECOVERY HTTP is now: OK on deployment-apache23 i-00000270 output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.005 second response time
[08:00:38] RECOVERY Total Processes is now: OK on deployment-apache21 i-0000026d output: PROCS OK: 149 processes
[08:00:58] RECOVERY Free ram is now: OK on deployment-apache21 i-0000026d output: OK: 96% free memory
[08:00:58] RECOVERY Disk Space is now: OK on deployment-apache21 i-0000026d output: DISK OK
[08:01:28] RECOVERY Current Load is now: OK on deployment-apache21 i-0000026d output: OK - load average: 0.19, 0.27, 0.11
[08:01:28] RECOVERY dpkg-check is now: OK on deployment-apache21 i-0000026d output: All packages OK
[08:01:28] RECOVERY Current Users is now: OK on deployment-apache21 i-0000026d output: USERS OK - 0 users currently logged in
[09:08:44] RECOVERY Current Load is now: OK on deployment-apache23 i-00000270 output: OK - load average: 0.35, 0.21, 0.15
[09:08:44] RECOVERY Current Users is now: OK on deployment-apache23 i-00000270 output: USERS OK - 0 users currently logged in
[09:09:50] !log deployment-prep petrb: fixed nrpe on boxes where it was failing, we need to insert motd to puppet
[09:09:53] Logged the message, Master
[09:10:04] RECOVERY Disk Space is now: OK on deployment-apache23 i-00000270 output: DISK OK
[09:10:04] RECOVERY dpkg-check is now: OK on deployment-apache22 i-0000026f output: All packages OK
[09:10:04] RECOVERY Free ram is now: OK on deployment-apache23 i-00000270 output: OK: 93% free memory
[09:10:44] RECOVERY Disk Space is now: OK on deployment-apache22 i-0000026f output: DISK OK
[09:10:44] RECOVERY Free ram is now: OK on deployment-apache22 i-0000026f output: OK: 94% free memory
[09:11:14] RECOVERY Current Load is now: OK on deployment-apache22 i-0000026f output: OK - load average: 0.33, 0.33, 0.18
[09:11:27] RECOVERY Current Users is now: OK on deployment-apache22 i-0000026f output: USERS OK - 1 users currently logged in
[09:11:34] RECOVERY Total Processes is now: OK on deployment-apache22 i-0000026f output: PROCS OK: 129 processes
[09:12:34] RECOVERY Total Processes is now: OK on deployment-apache23 i-00000270 output: PROCS OK: 127 processes
[09:12:54] RECOVERY dpkg-check is now: OK on deployment-apache23 i-00000270 output: All packages OK
[09:27:44] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%):
[09:33:46] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK
[09:37:39] PROBLEM HTTP is now: WARNING on deployment-apache21 i-0000026d output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.006 second response time
[09:47:47] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 681 MB (3% inode=80%):
[11:20:28] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[11:56:15] mutante: can you insert a file to puppet so that all servers on deployment have it in /etc/motd.tail
[11:56:25] I don't know how to do that
[11:57:34] do you know the right class already?
[11:57:44] yes there is class for apache servers
[11:57:46] or puppet group
[11:58:56] apaches::service
[11:59:42] ok, if you are in that file, and you scroll down a little to apaches::pybal-check
[12:00:06] you can see a line with file {
[12:00:32] that block puts 2 files in place
[12:00:54] one is a directory "ensure => directory" in there
[12:01:24] and the other one is a file and gets the content from a variable (content => ..)
[12:01:52] then scrolling down further to class::apaches:syslog ...
[12:02:23] you can see another one, which uses source => instead, and you can see that point to puppet:///files/....
[12:02:42] that refers to the pathes in the git repo
[12:05:54] so you put the file you want somewhere in ./files/ in git repo and type "git add ",then add a file { block like that in the class, by saying "ensure => present;" you tell puppet to make sure it exists
[12:12:44] ok now another problem one of boxes I use to connect doesn't resolve for some reason, idea how to fix it?
[12:12:58] I can't access git because of that
[12:13:01] :o
[12:13:25] I can't resolve any ip for some reason, is it possible to restart resolver?
[12:13:32] or whatever service
[12:14:01] I am connected on office pc to one of my servers and from there to labs
[12:15:11] will try #ubuntu
[12:15:39] petan|wk: does not look like a labs problem, worksforme. temp work around put something in /etc/hosts (just dont forget to remove once resolving works again)
[12:16:29] mutante: no it's not a labs problem I just thought you could know that
[12:16:41] oh, you meant local DNS cache? /etc/init.d/nscd restart ?
[12:17:07] nscd is the caching daemon
[12:17:44] no such a service on that box
[12:18:11] I tried to use another dns servers but all of them doesn't work :o
[12:18:14] but I can ping them
[12:18:43] like ping one of opendns servers work, but when I try to resolve and dns it fail
[12:18:53] any
[12:19:06] firewalling port 53?
[12:19:21] no it used to work in past I didn't change anything on firewall
[12:19:27] I think the service just died
[12:19:33] but I don't know which service it is
[12:19:34] which one
[12:19:39] the one which resolve dns
[12:19:44] where are you? you meant wmf office?
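The `file` resource mutante describes is declarative: the manifest states the desired end state (`ensure => present`, plus `content =>` or `source =>`), and Puppet only changes the node when it differs from that state. The Python below is not Puppet and does not show how Puppet is implemented. It is a hypothetical `ensure_file` helper that mimics the same idempotent "make sure this file exists with this content" behaviour for a single file, using a temporary directory and placeholder text instead of the real /etc/motd.tail.

```python
import os
import tempfile


def ensure_file(path, content):
    """Make sure `path` exists and holds exactly `content` (hypothetical helper).

    Returns True if the file had to be created or rewritten, and False if it
    was already in the desired state, so repeated runs change nothing.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already converged; leave the file alone
    except FileNotFoundError:
        pass  # missing file: fall through and create it
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return True


if __name__ == "__main__":
    # A temporary directory stands in for /etc so the sketch is safe to run.
    with tempfile.TemporaryDirectory() as root:
        motd_tail = os.path.join(root, "motd.tail")
        text = "Placeholder motd text for a deployment-prep instance.\n"
        print(ensure_file(motd_tail, text))  # first run creates the file
        print(ensure_file(motd_tail, text))  # second run finds nothing to do
```

Because the second call reports no change, applying the same desired state on every run is harmless. That convergence property is what lets a configuration tool re-apply one class to every instance repeatedly.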
[12:20:34] eh, no I am in office in my work, then I connect to one of my servers which is in entirely different place and from there I connect to bastion (I can't connect to bastion from office because of firewall here)
[12:21:16] I just wanted to install some packages I need to upload that patch to gerrit
[12:21:23] but aptitude can't resolve the repo
[12:21:32] :/
[12:22:07] if only I know how the linux resolve stuff
[12:22:25] I believe there is some service which needs to be restarted
[12:22:27] first it looks in /etc/nsswitch.conf
[12:22:40] to see where it is supposed to check (files, DNS,..)
[12:23:20] if there is (also) "files" in there, it looks up names in /etc/hosts
[12:23:26] ok, but when you type ping blabla does it contact some local service to resolve it? or use some api?
[12:23:43] otherwise it looks in /etc/resolv.conf
[12:24:02] which tells it which nameserver to ask
[12:24:07] "it" is a program I called or a service which resolve it?
[12:24:24] "it" is like the answer to "how does linux resolve stuff"
[12:24:27] I mean if I call some program, is it that program which directly contact dns server or some service
[12:24:56] I guess it call some api
[12:25:04] which does all the stuff you described
[12:25:18] or it could just query some local service which would do that stuff
[12:25:34] that would allow caching of dns records etc
[12:26:19] my question is if there is any service which if I restart could make it work
[12:26:32] because it just stopped working for no reason
[12:26:56] I didn't change anything on that box, it's running almost 8 months with no restart
[12:26:57] afaik what you meant with api here is glibc
[12:27:30] re: caching, that would be the nscd you did not have installed though
[12:27:52] you may ask the work admin first to confirm there is no issue with local DNS server (likely "bind")?
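The lookup chain mutante walks through (12:22 to 12:24, plus the glibc remark at 12:26) can be modelled roughly as follows: read the hosts: line of /etc/nsswitch.conf, answer from /etc/hosts if "files" is listed and has the name, otherwise hand the query to the nameservers listed in /etc/resolv.conf. This is a simplified, hypothetical Python sketch that works on the file contents as strings; it ignores nsswitch actions such as [NOTFOUND=return], skips other sources like mdns, and sends no DNS packets. In real programs glibc does this work (Python's socket.getaddrinfo() on Linux goes through the same machinery), so there is no separate resolver daemon to restart unless a cache such as nscd is installed.

```python
from __future__ import annotations


def hosts_sources(nsswitch_text: str) -> list[str]:
    """Sources on the 'hosts:' line of nsswitch.conf, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            # Drop action items such as [NOTFOUND=return]; this sketch ignores them.
            return [w for w in line[len("hosts:"):].split() if not w.startswith("[")]
    return []


def lookup_hosts_file(hosts_text: str, name: str) -> str | None:
    """Return the address for `name` from /etc/hosts content, or None."""
    wanted = name.lower()  # host names are matched case-insensitively
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        # Line format: <address> <canonical name> [aliases...]
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return fields[0]
    return None


def nameservers(resolv_text: str) -> list[str]:
    """Nameserver addresses listed in /etc/resolv.conf content, in order."""
    servers = []
    for raw in resolv_text.splitlines():
        fields = raw.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def plan_lookup(name: str, nsswitch_text: str, hosts_text: str,
                resolv_text: str) -> tuple[str, object]:
    """Walk the hosts sources in order and report which step would answer.

    Returns ("files", address) when /etc/hosts has the name,
    ("dns", [nameservers]) when the query would go to the resolv.conf servers,
    or ("unresolved", None) when no supported source applies.
    """
    for source in hosts_sources(nsswitch_text):
        if source == "files":
            address = lookup_hosts_file(hosts_text, name)
            if address is not None:
                return ("files", address)
        elif source == "dns":
            return ("dns", nameservers(resolv_text))
        # Other sources (mdns4_minimal, nis, ...) are skipped in this sketch.
    return ("unresolved", None)
```

With "hosts: files dns", a temporary /etc/hosts entry (mutante's workaround at 12:15) wins over DNS, and every other name falls through to the resolv.conf nameservers. That last step is where petan's box failed: the servers answered ping, but not DNS queries.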
[12:28:13] actually I don't use local dns, in resolv.conf I have another dns server
[12:28:15] opendns
[12:28:22] I can ping it
[12:28:30] but resolving doesn't work
[12:28:47] I thought their service is down, but changing dns servers didn't fix it
[12:29:50] that's weird I will try ubuntu guys
[12:30:57] ok
[12:39:35] mutante: found out what the problem was
[12:40:06] all possible dns servers I tried were down, maybe some hackers are trying to get internet down, hehe
[12:40:15] finally I found one which is up
[12:40:54] when in situations like that: 8.8.8.8 is google
[12:47:54] mutante: which file is apache class hashar used in
[12:48:17] I know the name of class only
[12:48:54] I think I found that
[14:01:09] PROBLEM dpkg-check is now: CRITICAL on deployment-apache20 i-0000026c output: DPKG CRITICAL dpkg reports broken packages
[14:23:02] PROBLEM HTTP is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused
[14:26:02] RECOVERY dpkg-check is now: OK on deployment-apache20 i-0000026c output: All packages OK
[14:26:42] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused
[14:28:02] PROBLEM HTTP is now: WARNING on deployment-apache20 i-0000026c output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.013 second response time
[14:31:42] PROBLEM HTTP is now: WARNING on deployment-apache23 i-00000270 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.018 second response time
[16:20:18] Soulparadox gets cookies
[16:33:22] hello
[16:37:28] !log deployment-prep updated MediaWiki up to 05e656a (aka master)
[16:37:31] Logged the message, Master
[16:39:56] !log deployment-prep cloning mediawiki/extensions.git which has all extensions as submodules
[16:39:57] Logged the message, Master
[16:49:55]