[00:32:03] New patchset: Hashar; "on labs: placeholders for squid configuration files" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7144
[00:32:19] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7144
[00:32:32] paravoid: I need some dummy squid configuration files to finish up the upload squid installation https://gerrit.wikimedia.org/r/7144
[00:34:50] New patchset: Hashar; "on labs: placeholders for squid configuration files" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7144
[00:35:04] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7144
[00:36:19] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7144
[00:36:21] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7144
[00:40:49] New patchset: Hashar; "we want the file to be present (exist create a symlink)" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7148
[00:41:03] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7148
[00:41:23] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7148
[00:41:25] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7148
[01:36:46] PROBLEM Current Load is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:37:26] PROBLEM Current Users is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:38:21] PROBLEM Disk Space is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:38:46] PROBLEM Free ram is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:41:16] PROBLEM Total Processes is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:41:16] PROBLEM SSH is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused
[01:41:26] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:46:16] RECOVERY Total Processes is now: OK on deployment-cache-upload i-00000263 output: PROCS OK: 104 processes
[01:46:21] RECOVERY SSH is now: OK on deployment-cache-upload i-00000263 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[01:46:26] RECOVERY dpkg-check is now: OK on deployment-cache-upload i-00000263 output: All packages OK
[01:46:46] RECOVERY Current Load is now: OK on deployment-cache-upload i-00000263 output: OK - load average: 0.29, 0.26, 0.11
[01:47:26] RECOVERY Current Users is now: OK on deployment-cache-upload i-00000263 output: USERS OK - 2 users currently logged in
[01:48:16] RECOVERY Disk Space is now: OK on deployment-cache-upload i-00000263 output: DISK OK
[01:48:46] RECOVERY Free ram is now: OK on deployment-cache-upload i-00000263 output: OK: 92% free memory
[01:54:43] New patchset: Ryan Lane; "Setup APT preferences in Puppet instead of in package wikimedia-base" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7150
[01:54:57] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7150
[01:55:00] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7150
[01:55:04] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7150
[02:57:37] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK
[02:58:47] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.39, 0.64, 0.34
[02:59:03] end of day
[02:59:04] take care
[02:59:27] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in
[03:00:47] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 89% free memory
[03:01:47] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK
[03:02:17] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 79 processes
[03:04:27] RECOVERY Current Users is now: OK on deployment-cache-bits i-00000264 output: USERS OK - 0 users currently logged in
[03:06:47] RECOVERY Current Load is now: OK on deployment-cache-bits i-00000264 output: OK - load average: 0.15, 0.86, 0.54
[03:06:47] RECOVERY Free ram is now: OK on deployment-cache-bits i-00000264 output: OK: 88% free memory
[03:06:57] RECOVERY Total Processes is now: OK on deployment-cache-bits i-00000264 output: PROCS OK: 79 processes
[03:07:07] RECOVERY Disk Space is now: OK on deployment-cache-bits i-00000264 output: DISK OK
[03:07:38] RECOVERY dpkg-check is now: OK on deployment-cache-bits i-00000264 output: All packages OK
[03:12:48] PROBLEM Disk Space is now: WARNING on bz-dev i-000001db output: DISK WARNING - free space: / 53 MB (4% inode=43%):
[03:17:48] PROBLEM Disk Space is now: CRITICAL on bz-dev i-000001db output: DISK CRITICAL - free space: / 38 MB (2% inode=43%):
[03:41:48] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory
[03:44:18] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 11% free memory
[03:47:28] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 17% free memory
[03:56:48] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:01:49] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:04:19] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:04:49] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory
[04:07:29] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:09:19] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:12:29] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:14:49] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 11% free memory
[04:19:49] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[04:19:49] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:29:49] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory
[05:00:02] boug
[05:19:13] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[05:58:09] PROBLEM Puppet freshness is now: CRITICAL on swift-be1 i-000001c7 output: Puppet has not run in last 20 hours
[06:12:28] Ryan_Lane: did u start wm bot by hand
[06:12:37] because it has auto restart
[06:12:40] on crash
[06:12:45] it didn't
[06:12:55] ok, so you did start it by hand?
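The commit message on change r/7148 above ("we want the file to be present (exist create a symlink)") boils down to the `ensure` attribute of a Puppet `file` resource. A minimal illustration only, not the actual change; the paths below are hypothetical placeholders:

```puppet
# ensure => present guarantees that *some* file exists at the path;
# ensure => link instead manages the path as a symlink to a target.
file { '/etc/squid/placeholder.conf':
    ensure => present,   # an empty placeholder file is enough
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
}

file { '/etc/squid/squid.conf':
    ensure => link,      # symlink rather than a regular file
    target => '/etc/squid/placeholder.conf',
}
```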
[06:12:57] yeah
[06:13:00] as the user
[06:13:03] using nohup
[06:13:09] wmib user
[06:13:11] I hope
[06:13:12] yep
[06:13:14] ok
[06:13:24] I added docs for it on the project page too
[06:13:27] hello petan :-]
[06:13:28] the restart script wait 10 minutes
[06:13:33] ah
[06:13:35] that's why
[06:13:39] I didn't wait that long
[06:13:41] let me check if restart doesn't try to restart it now
[06:13:59] Ryan_Lane: finally managed to get the squid-frontend to listen to port 80 :-]]
[06:14:08] ah. great
[06:14:15] Ryan_Lane: but somehow, I can't restart the backend one using /etc/init.d/squid stop
[06:14:22] it just fail :-D
[06:14:33] will have to polish that tomorrow
[06:14:33] Ryan_Lane: you started nohup restart.sh?
[06:14:38] or directly bot
[06:14:42] using mono
[06:14:43] restart.sh
[06:14:45] ok
[06:14:50] weird it crashed then
[06:17:05] petan: do you have any idea how on deployment-web3,4,5 the /usr/local/apache/common-local/ directory is populated ? :-D
[06:17:27] I can't find any deb package or puppet class doing it :-(
[06:17:41] there is script patch in my home
[06:17:53] oharhhhgh :-D
[06:17:54] :P
[06:18:02] and where are the source file ?
[06:18:16] source file is the script itself
[06:18:22] ./home/petrb/patch
[06:18:33] just execute it
[06:18:40] I mean /usr/local/apache/common-local/ contains various files. Where do you copy them from?
[06:18:51] it's mountpoint
[06:19:07] deployment-nfs:/export/apache I think
[06:19:15] type df
[06:19:17] or mount
[06:19:18] to see
[06:19:44] i see
[06:19:50] gotta migrate that to puppet :-]
[06:19:54] ok
[06:20:03] also I fixed the udp2log stuff at one point this week
[06:20:14] it now send its stuff to /home/wikipedia/log
[06:20:16] how
[06:20:23] is there any guide how to do that
[06:20:27] can't remember, ton of tweaking / hacking
[06:20:31] heh
[06:20:38] in the end, it is installed by puppet now
[06:20:45] and I have fixed some wrong conf in commonsettings
[06:21:06] on that subject. Reedy has migrated the CommonSettings / initi settings and everything to a new git repository :-]]
[06:21:13] operations/mediawiki-config.git IIRC
[06:21:17] yay
[06:21:27] we definitely want to use that one
[06:21:33] we need to find out how to merge it with deployment config
[06:21:37] and then do all the lab specific stuff in a dedicate file
[06:21:38] so that we can just pull
[06:21:43] something like LabSettings.php
[06:21:46] ok
[06:22:02] Reedy might be able to set that up tomorrow
[06:22:04] Idk
[06:22:10] cool
[06:24:08] you probably don't want to use that `patch` script anymore :-D
[06:24:58] heh
[06:25:28] it did everything I needed
[06:25:40] I just started it and I was done configuring server XD
[06:25:50] deployment-nfs-memc:/mnt/export/apache on /usr/local/apache ahhhh
[06:25:52] :-D
[06:26:32] what's up
[06:26:37] that's how it works now
[06:26:40] is it wrong
[06:27:08] apart from using NFS and not being copied from a central place .....
[06:27:10] nothing wrong
[06:27:11] :-D
[06:27:15] just need to create a puppet class
[06:27:19] hm, it has some benefits too XD
[06:27:29] you don't need to use so much disk space
[06:27:35] labs friendlier
[06:27:36] XD
[06:27:42] ultimately we want to have the labs instances to be identical to the production one :-]
[06:27:49] PROBLEM Disk Space is now: WARNING on bz-dev i-000001db output: DISK WARNING - free space: / 39 MB (3% inode=43%):
[06:27:51] ;-)
[06:28:01] hashar: do we want to have so many as production servers? :P
[06:28:05] like 300
[06:28:11] heh
[06:28:12] probably not that much
[06:28:23] :D
[06:28:24] but certainly more than what we have now
[06:28:30] why
[06:28:46] ultimately, all the work will happen on that cluster
[06:28:53] production will just be a copy of it :-]]]
[06:29:00] hm...
[06:29:03] ok
[06:29:06] so beta labs will become the main testing / integration / staging / preproduction area
[06:29:07] somehow
[06:29:25] right that's far beyond the original purpose
[06:29:29] but ok
[06:29:31] so we definitely want it to be fully documented in puppet and as close to production as possible
[06:29:48] (that is the long term vision I am giving you there) ;-D
[06:29:53] ok
[06:33:20] hashar: fix
[06:33:21] https://bugzilla.wikimedia.org/show_bug.cgi?id=36685
[06:34:13] yeah will have to poke Reedy about it
[06:37:09] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[06:38:31] New patchset: Hashar; "labs has Apache docroot mounted from NFS" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7157
[06:38:45] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7157
[06:38:52] I should have asked for merge rights on test branch
[06:43:37] well
[06:43:39] RECOVERY Disk Space is now: OK on deployment-apache09 i-0000025e output: DISK OK
[06:43:42] New patchset: Hashar; "mount apache docroot for taskservers and imagescaler" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[06:43:45] been there for 15 hours
[06:44:10] time to get some rest
[06:44:11] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7158
[06:45:55] PROBLEM Current Load is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:55] PROBLEM Free ram is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:46:22] PROBLEM Disk Space is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:07] petan: have a good day :-] I am off!
[06:47:18] PROBLEM Free ram is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
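The hand-configured NFS mount discussed above ("gotta migrate that to puppet") could be expressed with a Puppet `mount` resource. A sketch only, using the server and export path quoted in the log (`deployment-nfs-memc:/mnt/export/apache` on `/usr/local/apache`); the class name and mount options are assumptions, not the actual change:

```puppet
# Hypothetical puppetization of the mount that the `patch` script sets up by hand.
class deployment::apache_common {
    file { '/usr/local/apache':
        ensure => directory,
    }

    mount { '/usr/local/apache':
        ensure  => mounted,
        device  => 'deployment-nfs-memc:/mnt/export/apache',
        fstype  => 'nfs',
        options => 'rw,noatime',   # assumed; the log does not give the options
        require => File['/usr/local/apache'],
    }
}
```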
[06:48:09] PROBLEM dpkg-check is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:09] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:09] PROBLEM Total Processes is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:17] PROBLEM dpkg-check is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:17] PROBLEM SSH is now: CRITICAL on mobile-enwp i-000000ce output: CRITICAL - Socket timeout after 10 seconds
[06:49:04] PROBLEM Current Users is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:34] PROBLEM Current Users is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:34] PROBLEM Total Processes is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:37] PROBLEM Current Load is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:37] PROBLEM Disk Space is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Free ram is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Disk Space is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Total Processes is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:51] PROBLEM dpkg-check is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:51:57] PROBLEM Disk Space is now: WARNING on deployment-apache09 i-0000025e output: DISK WARNING - free space: / 73 MB (5% inode=59%):
[06:52:24] PROBLEM Free ram is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Current Load is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Current Load is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:43] PROBLEM dpkg-check is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:03] PROBLEM Disk Space is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Total Processes is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Free ram is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Disk Space is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Current Load is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Current Users is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:54:55] RECOVERY Disk Space is now: OK on nova-precise1 i-00000236 output: DISK OK
[06:55:58] RECOVERY Total Processes is now: OK on nova-precise1 i-00000236 output: PROCS OK: 136 processes
[06:55:58] RECOVERY dpkg-check is now: OK on nova-precise1 i-00000236 output: All packages OK
[06:56:05] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:11] RECOVERY Disk Space is now: OK on mobile-enwp i-000000ce output: DISK OK
[06:56:56] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 34% free memory
[06:56:56] RECOVERY Free ram is now: OK on ve-nodejs i-00000245 output: OK: 87% free memory
[06:58:26] RECOVERY Total Processes is now: OK on pediapress-ocg2 i-00000234 output: PROCS OK: 86 processes
[06:58:32] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 90% free memory
[06:59:26] RECOVERY Current Users is now: OK on ve-nodejs i-00000245 output: USERS OK - 0 users currently logged in
[06:59:26] RECOVERY Total Processes is now: OK on ve-nodejs i-00000245 output: PROCS OK: 80 processes
[06:59:56] PROBLEM Current Load is now: WARNING on maps-test2 i-00000253 output: WARNING - load average: 4.13, 6.46, 5.50
[06:59:56] RECOVERY Disk Space is now: OK on maps-test2 i-00000253 output: DISK OK
[06:59:56] RECOVERY Free ram is now: OK on maps-test2 i-00000253 output: OK: 93% free memory
[06:59:56] RECOVERY Current Users is now: OK on maps-test2 i-00000253 output: USERS OK - 0 users currently logged in
[07:00:56] PROBLEM Current Load is now: WARNING on mobile-enwp i-000000ce output: WARNING - load average: 2.60, 13.54, 16.73
[07:00:57] RECOVERY dpkg-check is now: OK on pediapress-ocg2 i-00000234 output: All packages OK
[07:02:06] RECOVERY Current Load is now: OK on ve-nodejs i-00000245 output: OK - load average: 0.57, 3.57, 3.50
[07:02:06] RECOVERY dpkg-check is now: OK on ve-nodejs i-00000245 output: All packages OK
[07:02:06] RECOVERY Disk Space is now: OK on pediapress-ocg2 i-00000234 output: DISK OK
[07:02:06] RECOVERY Current Load is now: OK on pediapress-ocg2 i-00000234 output: OK - load average: 5.17, 6.14, 4.45
[07:02:06] RECOVERY Current Users is now: OK on pediapress-ocg2 i-00000234 output: USERS OK - 0 users currently logged in
[07:02:56] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 3.49, 7.52, 6.90
[07:02:56] RECOVERY SSH is now: OK on mobile-enwp i-000000ce output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:02:56] RECOVERY Total Processes is now: OK on maps-test2 i-00000253 output: PROCS OK: 91 processes
[07:03:01] RECOVERY dpkg-check is now: OK on mobile-enwp i-000000ce output: All packages OK
[07:03:01] RECOVERY dpkg-check is now: OK on maps-test2 i-00000253 output: All packages OK
[07:03:01] RECOVERY Disk Space is now: OK on ve-nodejs i-00000245 output: DISK OK
[07:03:01] RECOVERY Total Processes is now: OK on fr-wiki-db-precise i-0000023e output: PROCS OK: 80 processes
[07:03:06] RECOVERY Free ram is now: OK on fr-wiki-db-precise i-0000023e output: OK: 78% free memory
[07:03:06] RECOVERY Disk Space is now: OK on fr-wiki-db-precise i-0000023e output: DISK OK
[07:03:07] RECOVERY Current Load is now: OK on fr-wiki-db-precise i-0000023e output: OK - load average: 0.92, 3.67, 3.28
[07:03:07] RECOVERY Current Users is now: OK on fr-wiki-db-precise i-0000023e output: USERS OK - 0 users currently logged in
[07:03:07] RECOVERY dpkg-check is now: OK on fr-wiki-db-precise i-0000023e output: All packages OK
[07:03:56] RECOVERY Current Users is now: OK on nova-precise1 i-00000236 output: USERS OK - 0 users currently logged in
[07:04:56] RECOVERY Current Load is now: OK on maps-test2 i-00000253 output: OK - load average: 0.09, 2.57, 4.09
[07:05:56] RECOVERY Free ram is now: OK on nova-precise1 i-00000236 output: OK: 71% free memory
[07:06:58] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[07:07:28] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on swift-be1 i-000001c7 output: Puppet has not run in last 20 hours
[07:07:43] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on wikidata-dev-2 i-00000259 output: Puppet has not run in last 20 hours
[07:12:56] RECOVERY Current Load is now: OK on nova-precise1 i-00000236 output: OK - load average: 0.19, 1.14, 3.69
[07:20:56] RECOVERY Current Load is now: OK on mobile-enwp i-000000ce output: OK - load average: 0.59, 0.81, 4.98
[08:02:56] PROBLEM Disk Space is now: CRITICAL on bz-dev i-000001db output: DISK CRITICAL - free space: / 39 MB (2% inode=43%):
[10:09:34] !log deployment-prep petrb: fixed teh missing NOT FOUND error page
[10:09:37] Logged the message, Master
[10:15:26] 05/10/2012 - 10:15:26 - Creating a home directory for ariel at /export/home/mail/ariel
[10:16:25] 05/10/2012 - 10:16:25 - Updating keys for ariel
[10:33:45] PROBLEM Current Load is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:34:25] PROBLEM Current Users is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:35:05] PROBLEM Disk Space is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:35:45] PROBLEM Free ram is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:36:55] PROBLEM Total Processes is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:37:35] PROBLEM dpkg-check is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:00] PROBLEM Current Users is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:00] PROBLEM Current Load is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:30] PROBLEM Current Users is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:30] PROBLEM Disk Space is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:15:05] PROBLEM Disk Space is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:15:05] PROBLEM Free ram is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:16:00] PROBLEM Free ram is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:16:20] PROBLEM Total Processes is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:17:00] PROBLEM dpkg-check is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:17:00] PROBLEM Total Processes is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:17:30] PROBLEM dpkg-check is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:18:29] PROBLEM Current Load is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:24:00] RECOVERY Current Load is now: OK on dumps-6 i-00000266 output: OK - load average: 1.27, 0.56, 0.52
[11:24:30] RECOVERY Current Users is now: OK on dumps-6 i-00000266 output: USERS OK - 1 users currently logged in
[11:25:00] RECOVERY Disk Space is now: OK on dumps-6 i-00000266 output: DISK OK
[11:25:00] RECOVERY Free ram is now: OK on dumps-7 i-00000267 output: OK: 92% free memory
[11:26:00] RECOVERY Free ram is now: OK on dumps-6 i-00000266 output: OK: 93% free memory
[11:26:20] RECOVERY Total Processes is now: OK on dumps-7 i-00000267 output: PROCS OK: 82 processes
[11:27:00] RECOVERY Total Processes is now: OK on dumps-6 i-00000266 output: PROCS OK: 82 processes
[11:27:05] RECOVERY dpkg-check is now: OK on dumps-7 i-00000267 output: All packages OK
[11:27:30] RECOVERY dpkg-check is now: OK on dumps-6 i-00000266 output: All packages OK
[11:28:10] RECOVERY Current Load is now: OK on dumps-7 i-00000267 output: OK - load average: 0.03, 0.28, 0.41
[11:29:00] RECOVERY Current Users is now: OK on dumps-7 i-00000267 output: USERS OK - 0 users currently logged in
[11:29:30] RECOVERY Disk Space is now: OK on dumps-7 i-00000267 output: DISK OK
[15:19:12] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[15:40:48] hello
[17:33:49] Is it possible to map a sub-sub-domain through labsconsole? I want to set up test.reportcard.wmflabs.org
[17:34:18] I don't want to use a hyphen :( that's ugly.
[17:35:27] yes
[17:36:12] petan: Bad resource name provided. Resource names start with a-z, and can only contain a-z, 0-9, -, and _ characters.
[17:37:23] dschoon: Are you using https://labsconsole.wikimedia.org/wiki/Special:NovaDomain ?
[17:37:26] Or something else?
[17:37:54] andrewbogott_: https://labsconsole.wikimedia.org/wiki/Special:NovaAddress
[17:38:00] is that the wrong page?
[17:38:13] dschoon: ...
[17:38:20] I am curtly told I am not a cloudadmin by https://labsconsole.wikimedia.org/wiki/Special:NovaDomain
[17:38:24] Ryan_Lane: ...
[17:38:41] reportcard is the test server, is it not?
[17:38:58] >:( that is immaterial.
[17:39:25] you can have a subdomain, but we'd need to set reportcard up as a domain
[17:39:34] it also tells me i cannot have Ryan_Lane.reportcard.wmflabs.org
[17:39:36] ok.
[17:39:43] so i should just use a hyphen then :(
[17:39:51] it's easiet
[17:39:56] s'fine
[17:39:59] easiest
[17:40:01] about how long is DNS propagation for labs domains?
[17:40:08] instant
[17:40:26] hmm.
[17:40:34] i'll flush local cache then.
[17:40:47] New patchset: Hashar; "labs has Apache docroot mounted from NFS" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7157
[17:40:50] remembering of course that negative cache can be a while
[17:40:50] PROBLEM Free ram is now: WARNING on mobile-enwp i-000000ce output: Warning: 6% free memory
[17:41:01] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7157
[17:41:18] is there a reason wikimedia.org isn't a domain option?
[17:41:58] because that would be amazingly insecure?
[17:44:48] just making sure.
[17:44:56] and now the fun questions.
[17:45:04] New patchset: Hashar; "mount apache docroot for taskservers and imagescaler" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[17:45:19] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7158
[17:45:42] Ryan_Lane, if I get "Permission denied (publickey)." when I attempt to ssh to a labs host, what can I do to fix this? reboot it?
[17:45:50] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 40% free memory
[17:45:51] the console doesn't show anything weird that i see
[17:45:56] which instance?
[17:46:17] kripke == i-000001fc
[17:46:39] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7157
[17:46:41] i'll refrain from rebooting it if you want to touch it
[17:46:42] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7157
[17:47:25] I'm betting it installed improperly
[17:47:27] and was rebooted
[17:47:41] based on this line: user-scripts already ran once-per-instance
[17:48:05] did it ever work?
[17:48:20] unsure. i did this weeks ago.
[17:48:22] lame.
[17:48:29] i just added all the addresses to it.
[17:48:30] delete/recreate
[17:48:33] yeah.
[17:48:35] that's no big deal
[17:48:39] i'll have to redo all that, won't i
[17:48:53] the addresses go with the floating ip
[17:48:54] not the instance
[17:48:56] ah!
[17:48:58] sweet.
[17:49:01] so i just disassociate
[17:49:10] that's really hot, actually
[17:50:16] New patchset: Hashar; "mount apache docroot for taskservers and imagescaler" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[17:50:22] just to double-check, because the confirmation page doesn't tell me what i'm about to actually do
[17:50:30] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7158
[17:50:45] Ryan_Lane, i *do* want to click "Disassociate IP", right?
[17:51:19] deleting the instance will do that for you
[17:51:24] ok
[17:51:36] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7158
[17:51:38] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[17:51:49] ty, will let you know how it goes.
[17:57:44] PROBLEM host: kirke is DOWN address: i-000001fd check_ping: Invalid hostname/address - i-000001fd
[17:57:44] PROBLEM host: kripke is DOWN address: i-000001fc check_ping: Invalid hostname/address - i-000001fc
[18:02:45] RECOVERY host: kripke is UP address: i-00000268 PING OK - Packet loss = 0%, RTA = 0.97 ms
[18:02:55] PROBLEM Current Load is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:02:55] PROBLEM Disk Space is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:03:25] PROBLEM dpkg-check is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:04:07] PROBLEM Current Users is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:05] PROBLEM Free ram is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:05] PROBLEM Total Processes is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:41:27] New patchset: Hashar; "/mnt/{thumbs,upload} for labs" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7209
[19:41:41] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7209
[19:41:52] paravoid: here for ya :_D https://gerrit.wikimedia.org/r/7209
[19:42:03] nfs mount for /mnt/upload6 and /mnt/thumbs
[19:42:44] ugh
[19:42:49] upload6, ew
[19:42:54] not your fault of course
[19:43:02] it is cause it uses ipv6 maybe :-D
[19:43:07] according to LeslieCarr anyway
[19:43:20] or just because that is the 6th generation of /mnt/upload
[19:43:22] ahah
[19:43:25] hehe
[19:43:33] probably the latter
[19:43:42] as for your changes themselves
[19:43:49] I wouldn't do them that way at all
[19:43:53] hehe
[19:44:12] the idea was to keep calls to nfs::upload unchanged
[19:44:18] because that way we'll end up having if ($realm ...) } { else { } in all of our manifests
[19:44:25] making them even more complicated than they are
[19:44:34] so nfs::upload::labs ?
[19:44:45] this is certainly not the puppet way, but it all has to do with how things are being done around here
[19:44:55] which I don't know yet :-)
[19:45:36] so, I propose that we either get another review or let it in now and ask after the fact
[19:46:00] (I couldn't find Ryan)
[19:46:16] one issue with creating a new class
[19:46:33] i stat I will have to change all the "include nfs::upload" calls
[19:46:58] and made them something like: if( $realm..) { include nfs::upload } else { include nfs::upload::labs }
[19:46:58] :-(
[19:47:14] OR we could make the NFS server name a puppet variable
[19:47:21] as well as the export paths
[19:47:49] then call the classes using parameters
[20:24:33] !log deployment-prep deployment-imagescaler01 apache does not log anymore :-(
[20:24:36] Logged the message, Master
[20:24:56] !log deployment-prep running 'apt-get install --reinstall apache2.2-common' to attempt to fix /var/log/apache2 rights (root:arm)
[20:24:57] Logged the message, Master
[20:25:39] our cluster is really lame
[20:25:59] lol
[20:26:04] like /var/log/apache2 belongs to root:arm and is -rwxr-x---
[20:26:25] that's on the "real" machines as well ?
[20:27:41] yuip
[20:27:44] just checked on srv289
[20:27:52] so I guess apache is able to write as root ? ;:-D
[20:28:02] OR
[20:28:11] someone set that to prevents apache from writing to /var/log/apache2
[20:28:16] to avoid filling hard disks
[20:28:45] apache2.conf:ErrorLog syslog
[20:28:47] \O/
[20:32:51] probably a way to enforce it won't write there even if the ErrorLog directive stopped being obeyed
[20:35:28] now I have to figure out where the log are sent too hehe
[20:36:40] !log deployment-prep fixing a few php notices and general logic problems in wmf-config
[20:36:41] Logged the message, Master
[20:52:09] hashar: do you know if the git repo on labs for usr/local/apache/common should be pushed anywhere?
[20:52:13] I don't see any remote set up
[20:52:24] I committed my changes
[20:53:06] Did you temporarily override wgDebugLogFile in CommonSettings.php? no problem, but I saw it as an uncommitted change
[20:53:24] left it for now (committed my other changes, temporarily undid wgDebugLogFile, then restored it - done now)
[20:54:12] That common should die in a fire
[20:54:18] and use mediawiki-config
[20:54:43] May be we can start using operations/mediawiki-config on labs (read-only, with only local changes in another include, e.g. database stuff)
[20:54:47] yeah
[20:54:56] That's the reason I did the migration this week
[20:54:57] so we can
[20:55:02] may not be compatible yet
[20:55:11] (ignoring the fact it's been requested for a long time)
[20:55:18] it is based on an old copy from prototype which was based on an older copy of the production ./common
[20:55:45] I just got rid of broken usability initiative variables that where throwing notices on labs
[20:56:02] those haven't been on the cluster for a while
[20:56:10] oh and wmgHTTPSExperiment
[20:56:22] had a few if () uses, but not defined anywher
[21:04:01] if i use the create-wiki tool and do —data enwiki, does this pick up things like templates, images, etc as well?
[21:11:40] paravoid: have you find Ryan around ? :D
[21:11:51] paravoid: for https://gerrit.wikimedia.org/r/#/c/7209 (mounting /mnt/upload )
[21:15:25] New patchset: Hashar; "/mnt/{thumbs,upload} for labs" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7209
[21:15:39] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7209
[21:23:34] New review: Faidon; "I'm not entirely happy with that, since doing the whole if/else thing produces cruft with our tree a..."
[operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7209 [21:23:36] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7209 [21:23:38] hashar: sorry, was at lunch [21:23:59] hashar: found Ryan, said that we have to discuss this internally in the team and formulate a policy [21:24:11] most probably :-D [21:24:15] but we shouldn't delay this work until that, so I just merged your change [21:24:22] ohh nice thx [21:24:36] I already had the change applied manually anyway [21:24:49] been hunting some syslog messages [21:24:51] since [21:25:22] I got log sent to a box and no idea where they are written to :-( [21:25:34] which has been pissing me off for the last 2 hours or so :-( [21:37:44] Ryan_Lane: if i use the create-wiki tool and do —data enwiki, does this pick up things like templates, images, etc as well as basic article text? [21:38:30] Heh. That's a proposed spec. That tool doesnt actually exist [21:38:36] lol [21:38:37] oh. [21:38:48] ha ha ha ha [21:38:54] ha ha ha ha [21:38:58] and one more ha [21:39:16] That tool == unicorns [21:39:47] ha ha [21:40:06] Ryan_Lane: any timeline for the arrival of the unicorns? [21:40:37] So, I started a new project, and it kind of depends on a tool like that [21:42:01] so like, weeks? months? years? [21:44:30] Months [21:44:37] Realistically [21:45:55] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%): [21:46:23] k thanks Ryan_Lane [21:46:43] Yw [21:46:53] !log deployment-prep Creating a syslog server instance. I have a VERY nasty conflict between misc::syslog-server and misc::mediawiki-logger which tries to install conflicting packages ( syslog-ng / rsyslog ) [21:46:54] Logged the message, Master [21:48:40] Ryan_Lane: how do you do an update of deployment-prep so that it relates to the latest version of production? 
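[Editor's note: a minimal sketch of the two approaches debated above for mounting /mnt/upload on labs. Everything except the nfs::upload class name and the $realm variable — the labs class name, server names, paths and mount options — is illustrative, not taken from the actual operations/puppet tree.]

```puppet
# Option 1: realm switch at every call site. This is what Hashar wanted
# to avoid, since it scatters if/else cruft through every manifest:
if $realm == 'production' {
    include nfs::upload
} else {
    include nfs::upload::labs    # hypothetical labs variant
}

# Option 2 (paravoid's suggestion): keep "include nfs::upload" unchanged
# everywhere, and branch once inside the class on a per-realm NFS server
# and export path (all values below are made up for illustration):
class nfs::upload {
    $server = $realm ? {
        'production' => 'upload-nfs.example.wmnet',
        default      => 'labs-nfs.example.wmflabs',
    }

    mount { '/mnt/upload6':
        ensure  => mounted,
        device  => "${server}:/export/upload",
        fstype  => 'nfs',
        options => 'ro,bg,tcp,timeo=14,intr',
    }
}
```

Option 2 is what the "make the NFS server name a puppet variable, as well as the export paths" remark at 19:47 points toward: the realm decision is made in one place instead of at every include site.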
For example, wikimani2013 wiki isn't there at the moment, probably some changes in InitiliaseSettings.php
[21:49:01] I don't know.
[21:49:06] The devs do it
[21:49:31] Thehelpfulone: pending
[21:49:36] I doubt we'll add in wikis like wiki mania ones
[21:49:46] Do we need to test those, really?
[21:49:48] Thehelpfulone: Reedy is in the progress of cleaning up the production files ( php files + htdocs etc...)
[21:49:59] Thehelpfulone: once that is done, we can probably try deploying them on labs
[21:50:07] hashar: ok
[21:50:19] wikimania2013, we probably do not care :-D
[21:50:19] not necessarily Ryan_Lane but if we want to keep it as close to production as possible, it's a good idea to no?
[21:51:02] Probably. As long as
[21:51:04] It's automates
[21:51:15] Sorry. On a phone
[21:51:39] np
[21:53:26] ok. on a laptop now
[21:53:27] easier
[21:53:44] PROBLEM Current Load is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:54:24] PROBLEM Current Users is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:55:04] PROBLEM Disk Space is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:55:44] PROBLEM Free ram is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:56:54] PROBLEM Total Processes is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:58:09] so syslog::server is just unusable :-(
[21:58:44] RECOVERY Current Load is now: OK on deployment-syslog i-00000269 output: OK - load average: 0.59, 0.60, 0.42
[21:59:24] RECOVERY Current Users is now: OK on deployment-syslog i-00000269 output: USERS OK - 1 users currently logged in
[21:59:34] hashar: also, http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:RecentChanges (diff | hist) . . N! Muchas Game gracias zavvi LA MEJOR PAGINA‎; 09:18 . . (+4,716)‎ . . ‎10.4.0.17 (Talk | block)‎ (Created page with "Image:GameGame_1063.jpg yo yo subji how's the Midlands ?? kkkk JA QUERO JA PRECISOOO!!!! thanks for the follow....hope you enjoy the feed. Partying in Moore square...")
[21:59:40] how is this being done through the local IP?
[21:59:56] I have no idea
[21:59:57] Reedy: Do you know where operations/mediawik-config is checked out on fenari? I'd like to make it match on labs, I'll try a checkout (keeping the current /common in tact of course)
[22:00:04] RECOVERY Disk Space is now: OK on deployment-syslog i-00000269 output: DISK OK
[22:00:11] it's at /common
[22:00:26] /common is the repo or is mediawiki-config inside
[22:00:44] RECOVERY Free ram is now: OK on deployment-syslog i-00000269 output: OK: 87% free memory
[22:01:54] RECOVERY Total Processes is now: OK on deployment-syslog i-00000269 output: PROCS OK: 86 processes
[22:03:19] Reedy:
[22:05:08] hashar: Is part of your plan this week to set up labs to use mediawiki-config repo? If not I could help with that
[22:05:18] /home/wikipedia/common
[22:05:48] Reedy: Ah, right, it isn't in /usr/local/apache/common, that's only the case on the apaches, on fenari it is in /h/w
[22:05:49] okay
[22:05:57] No
[22:06:05] both ?
[22:06:06] Those are essentially the same location
[22:06:09] Fenari has both
[22:06:20] /h/w/c is copied to become /u/l/a/c
[22:06:21] !deployment-prep Disabling syslog-server class entirely for now so we can keep the mediawiki-logguer one (aka udp2log )
[22:06:21] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod
[22:06:26] right, one is the working copy and scap puts it everywhere else
[22:06:28] yeah
[22:06:34] !deployment-prep End result: no syslog at all from apaches yeah!!!
[22:06:34] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod
[22:06:35] and it also copies to itself
[22:06:45] hashar: log :)
[22:07:02] it is essentially totally f*** up
[22:07:08] no way to set up a syslog server :(
[22:07:10] Reedy: so does test.wikipedia actually run off fenari ?
[22:07:16] no
[22:07:17] puppet is fun
[22:07:23] ok
[22:07:24] It runs off /h/w/c mounted on srv193
[22:07:59] ah, skay, essentially the same. fenari doesn't take the request hits, but the code does run off fenari
[22:08:06] Krinkle: as for the config, ask sam :-D
[22:08:08] so one wouldn't have to scap in order to test on testwiki
[22:08:18] nope
[22:08:23] cool
[22:08:27] soon as it's saved it's live
[22:08:48] yeah, but only for test.wikipedia / srv193, or does srv193 also server other wikis?
[22:10:05] No, srv193 serves test only
[22:10:23] okay
[22:33:03] is there any labs-specific reason why $wgUseInstantCommons wouldn't work on a labs instance?
[22:37:25] Hmm
[22:37:32] Isn't the cluster firewalled from labs?
[22:37:36] * RoanKattouw looks for a Ryan_Lane
[22:37:45] He's not here and not on IRC
[22:41:07] ahha.
[22:41:36] Sounds about right
[22:41:43] that would make sense.
[22:44:01] hmm i can curl http://commons.wikimedia.org/w/api.php tho
[22:45:03] hmm
[22:55:09] jeremyb: access on labs deployment-prep wmf-config works now, thanks again
[22:55:32] RoanKattouw: InstantCommons uses API
[22:55:41] that should work, unless outgoing connections is blocked entirely
[22:56:01] and btw, we should probably use commons labs as file repo, not production commons, otherwise we're testing something else
[22:56:03] as db repo
[22:57:05] on another wiki inside commons wgUseInstantCommons is fine though, but for deployment-prep it should use commons.beta
[22:57:09] inside labs*
[23:55:45] PROBLEM host: deployment-thumbrewrite is DOWN address: i-0000026a check_ping: Invalid hostname/address - i-0000026a
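[Editor's note: a rough sketch of the InstantCommons configuration discussed above. $wgUseInstantCommons is the real MediaWiki shortcut for pulling files from production commons.wikimedia.org over its API; pointing deployment-prep at a beta Commons instead, as suggested at 22:56, would mean configuring a foreign API repo explicitly. The repo name and beta URL below are assumptions, not values from the actual wmf-config.]

```php
<?php
// Shortcut: fetch missing files from production Commons via api.php.
$wgUseInstantCommons = true;

// Alternative for deployment-prep: use a beta Commons as the foreign
// file repo so that labs tests labs, not production (hypothetical URL):
$wgForeignFileRepos[] = array(
    'class'            => 'ForeignAPIRepo',
    'name'             => 'betacommons',
    'apibase'          => 'http://commons.wikimedia.beta.wmflabs.org/w/api.php',
    'fetchDescription' => true,
);
```

Note this only works if outgoing HTTP from the labs instance to the target API is allowed, which is exactly the firewalling question raised at 22:37.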
[22:06:34] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod [22:06:35] and it also copies to itself [22:06:45] hashar: log :) [22:07:02] it is essentially totally f*** up [22:07:08] no way to set up a syslog server :( [22:07:10] Reedy: so does test.wikipedia actually run off fenari ? [22:07:16] no [22:07:17] puppet is fun [22:07:23] ok [22:07:24] It runs off /h/w/c mounted on srv193 [22:07:59] ah, skay, essentially the same. fenari doesn't take the request hits, but the code does run off fenari [22:08:06] Krinkle: as for the config, ask sam :-D [22:08:08] so one wouldn't have to scap in order to test on testwiki [22:08:18] nope [22:08:23] cool [22:08:27] soon as it's saved it's live [22:08:48] yeah, but only for test.wikipedia / srv193, or does srv193 also server other wikis? [22:10:05] No, srv193 serves test only [22:10:23] okay [22:33:03] is there any labs-specific reason why $wgUseInstantCommons wouldn't work on a labs instance? [22:37:25] Hmm [22:37:32] Isn't the cluster firewalled from labs? [22:37:36] * RoanKattouw looks for a Ryan_Lane [22:37:45] He's not here and not on IRC [22:41:07] ahha. [22:41:36] Sounds about right [22:41:43] that would make sense. [22:44:01] hmm i can curl http://commons.wikimedia.org/w/api.php tho [22:45:03] hmm [22:55:09] jeremyb: access on labs deployment-prep wmf-config works now, thanks again [22:55:32] RoanKattouw: InstantCommons uses API [22:55:41] that should work, unless outgoing connections is blocked entirely [22:56:01] and btw, we should probably use commons labs as file repo, not production commons, otherwise we're testing something else [22:56:03] as db repo [22:57:05] on another wiki inside commons wgUseInstantCommons is fine though, but for deployment-prep it should use commons.beta [22:57:09] inside labs* [23:55:45] PROBLEM host: deployment-thumbrewrite is DOWN address: i-0000026a check_ping: Invalid hostname/address - i-0000026a