[00:32:03] New patchset: Hashar; "on labs: placeholders for squid configuration files" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7144
[00:32:19] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7144
[00:32:32] paravoid: I need some dummy squid configuration files to finish up the upload squid installation https://gerrit.wikimedia.org/r/7144
[00:34:50] New patchset: Hashar; "on labs: placeholders for squid configuration files" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7144
[00:35:04] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7144
[00:36:19] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7144
[00:36:21] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7144
[00:40:49] New patchset: Hashar; "we want the file to be present (exist create a symlink)" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7148
[00:41:03] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7148
[00:41:23] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7148
[00:41:25] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7148
[01:36:46] PROBLEM Current Load is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:37:26] PROBLEM Current Users is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:38:21] PROBLEM Disk Space is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:38:46] PROBLEM Free ram is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:41:16] PROBLEM Total Processes is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:41:16] PROBLEM SSH is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused
[01:41:26] PROBLEM dpkg-check is now: CRITICAL on deployment-cache-upload i-00000263 output: Connection refused by host
[01:46:16] RECOVERY Total Processes is now: OK on deployment-cache-upload i-00000263 output: PROCS OK: 104 processes
[01:46:21] RECOVERY SSH is now: OK on deployment-cache-upload i-00000263 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[01:46:26] RECOVERY dpkg-check is now: OK on deployment-cache-upload i-00000263 output: All packages OK
[01:46:46] RECOVERY Current Load is now: OK on deployment-cache-upload i-00000263 output: OK - load average: 0.29, 0.26, 0.11
[01:47:26] RECOVERY Current Users is now: OK on deployment-cache-upload i-00000263 output: USERS OK - 2 users currently logged in
[01:48:16] RECOVERY Disk Space is now: OK on deployment-cache-upload i-00000263 output: DISK OK
[01:48:46] RECOVERY Free ram is now: OK on deployment-cache-upload i-00000263 output: OK: 92% free memory
[01:54:43] New patchset: Ryan Lane; "Setup APT preferences in Puppet instead of in package wikimedia-base" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7150
[01:54:57] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7150
[01:55:00] New review: Ryan Lane; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7150
[01:55:04] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7150
[02:57:37] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK
[02:58:47] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.39, 0.64, 0.34
[02:59:03] end of day
[02:59:04] take care
[02:59:27] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in
[03:00:47] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 89% free memory
[03:01:47] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK
[03:02:17] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 79 processes
[03:04:27] RECOVERY Current Users is now: OK on deployment-cache-bits i-00000264 output: USERS OK - 0 users currently logged in
[03:06:47] RECOVERY Current Load is now: OK on deployment-cache-bits i-00000264 output: OK - load average: 0.15, 0.86, 0.54
[03:06:47] RECOVERY Free ram is now: OK on deployment-cache-bits i-00000264 output: OK: 88% free memory
[03:06:57] RECOVERY Total Processes is now: OK on deployment-cache-bits i-00000264 output: PROCS OK: 79 processes
[03:07:07] RECOVERY Disk Space is now: OK on deployment-cache-bits i-00000264 output: DISK OK
[03:07:38] RECOVERY dpkg-check is now: OK on deployment-cache-bits i-00000264 output: All packages OK
[03:12:48] PROBLEM Disk Space is now: WARNING on bz-dev i-000001db output: DISK WARNING - free space: / 53 MB (4% inode=43%):
[03:17:48] PROBLEM Disk Space is now: CRITICAL on bz-dev i-000001db output: DISK CRITICAL - free space: / 38 MB (2% inode=43%):
[03:41:48] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory
[03:44:18] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 11% free memory
[03:47:28] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 17% free memory
[03:56:48] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:01:49] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:04:19] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:04:49] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory
[04:07:29] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:09:19] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:12:29] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:14:49] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 11% free memory
[04:19:49] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[04:19:49] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:29:49] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory
[05:00:02] boug
[05:19:13] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[05:58:09] PROBLEM Puppet freshness is now: CRITICAL on swift-be1 i-000001c7 output: Puppet has not run in last 20 hours
[06:12:28] Ryan_Lane: did u start wm bot by hand
[06:12:37] because it has auto restart
[06:12:40] on crash
[06:12:45] it didn't
[06:12:55] ok, so you did start it by hand?
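The commit message on change r/7148 above ("we want the file to be present (exist create a symlink)") boils down to the `ensure` attribute of a Puppet `file` resource. A minimal illustration only, not the actual change; the paths below are hypothetical placeholders:

```puppet
# ensure => present guarantees that *some* file exists at the path;
# ensure => link instead manages the path as a symlink to a target.
file { '/etc/squid/placeholder.conf':
    ensure => present,   # an empty placeholder file is enough
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
}

file { '/etc/squid/squid.conf':
    ensure => link,      # symlink rather than a regular file
    target => '/etc/squid/placeholder.conf',
}
```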
[06:12:57] yeah
[06:13:00] as the user
[06:13:03] using nohup
[06:13:09] wmib user
[06:13:11] I hope
[06:13:12] yep
[06:13:14] ok
[06:13:24] I added docs for it on the project page too
[06:13:27] hello petan :-]
[06:13:28] the restart script wait 10 minutes
[06:13:33] ah
[06:13:35] that's why
[06:13:39] I didn't wait that long
[06:13:41] let me check if restart doesn't try to restart it now
[06:13:59] Ryan_Lane: finally managed to get the squid-frontend to listen to port 80 :-]]
[06:14:08] ah. great
[06:14:15] Ryan_Lane: but somehow, I can't restart the backend one using /etc/init.d/squid stop
[06:14:22] it just fail :-D
[06:14:33] will have to polish that tomorrow
[06:14:33] Ryan_Lane: you started nohup restart.sh?
[06:14:38] or directly bot
[06:14:42] using mono
[06:14:43] restart.sh
[06:14:45] ok
[06:14:50] weird it crashed then
[06:17:05] petan: do you have any idea how on deployment-web3,4,5 the /usr/local/apache/common-local/ directory is populated ? :-D
[06:17:27] I can't find any deb package or puppet class doing it :-(
[06:17:41] there is script patch in my home
[06:17:53] oharhhhgh :-D
[06:17:54] :P
[06:18:02] and where are the source file ?
[06:18:16] source file is the script itself
[06:18:22] ./home/petrb/patch
[06:18:33] just execute it
[06:18:40] I mean /usr/local/apache/common-local/ contains various files. Where do you copy them from?
[06:18:51] it's mountpoint
[06:19:07] deployment-nfs:/export/apache I think
[06:19:15] type df
[06:19:17] or mount
[06:19:18] to see
[06:19:44] i see
[06:19:50] gotta migrate that to puppet :-]
[06:19:54] ok
[06:20:03] also I fixed the udp2log stuff at one point this week
[06:20:14] it now send its stuff to /home/wikipedia/log
[06:20:16] how
[06:20:23] is there any guide how to do that
[06:20:27] can't remember, ton of tweaking / hacking
[06:20:31] heh
[06:20:38] in the end, it is installed by puppet now
[06:20:45] and I have fixed some wrong conf in commonsettings
[06:21:06] on that subject. Reedy has migrated the CommonSettings / initi settings and everything to a new git repository :-]]
[06:21:13] operations/mediawiki-config.git IIRC
[06:21:17] yay
[06:21:27] we definitely want to use that one
[06:21:33] we need to find out how to merge it with deployment config
[06:21:37] and then do all the lab specific stuff in a dedicate file
[06:21:38] so that we can just pull
[06:21:43] something like LabSettings.php
[06:21:46] ok
[06:22:02] Reedy might be able to set that up tomorrow
[06:22:04] Idk
[06:22:10] cool
[06:24:08] you probably don't want to use that `patch` script anymore :-D
[06:24:58] heh
[06:25:28] it did everything I needed
[06:25:40] I just started it and I was done configuring server XD
[06:25:50] deployment-nfs-memc:/mnt/export/apache on /usr/local/apache ahhhh
[06:25:52] :-D
[06:26:32] what's up
[06:26:37] that's how it works now
[06:26:40] is it wrong
[06:27:08] apart from using NFS and not being copied from a central place .....
[06:27:10] nothing wrong
[06:27:11] :-D
[06:27:15] just need to create a puppet class
[06:27:19] hm, it has some benefits too XD
[06:27:29] you don't need to use so much disk space
[06:27:35] labs friendlier
[06:27:36] XD
[06:27:42] ultimately we want to have the labs instances to be identical to the production one :-]
[06:27:49] PROBLEM Disk Space is now: WARNING on bz-dev i-000001db output: DISK WARNING - free space: / 39 MB (3% inode=43%):
[06:27:51] ;-)
[06:28:01] hashar: do we want to have so many as production servers? :P
[06:28:05] like 300
[06:28:11] heh
[06:28:12] probably not that much
[06:28:23] :D
[06:28:24] but certainly more than what we have now
[06:28:30] why
[06:28:46] ultimately, all the work will happen on that cluster
[06:28:53] production will just be a copy of it :-]]]
[06:29:00] hm...
[06:29:03] ok
[06:29:06] so beta labs will become the main testing / integration / staging / preproduction area
[06:29:07] somehow
[06:29:25] right that's far beyond the original purpose
[06:29:29] but ok
[06:29:31] so we definitely want it to be fully documented in puppet and as close to production as possible
[06:29:48] (that is the long term vision I am giving you there) ;-D
[06:29:53] ok
[06:33:20] hashar: fix
[06:33:21] https://bugzilla.wikimedia.org/show_bug.cgi?id=36685
[06:34:13] yeah will have to poke Reedy about it
[06:37:09] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[06:38:31] New patchset: Hashar; "labs has Apache docroot mounted from NFS" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7157
[06:38:45] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7157
[06:38:52] I should have asked for merge rights on test branch
[06:43:37] well
[06:43:39] RECOVERY Disk Space is now: OK on deployment-apache09 i-0000025e output: DISK OK
[06:43:42] New patchset: Hashar; "mount apache docroot for taskservers and imagescaler" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[06:43:45] been there for 15 hours
[06:44:10] time to get some rest
[06:44:11] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7158
[06:45:55] PROBLEM Current Load is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:45:55] PROBLEM Free ram is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:46:22] PROBLEM Disk Space is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:47:07] petan: have a good day :-] I am off!
[06:47:18] PROBLEM Free ram is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
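The hand-configured NFS mount discussed above ("gotta migrate that to puppet") could be expressed with a Puppet `mount` resource. A sketch only, using the server and export path quoted in the log (`deployment-nfs-memc:/mnt/export/apache` on `/usr/local/apache`); the class name and mount options are assumptions, not the actual change:

```puppet
# Hypothetical puppetization of the mount that the `patch` script sets up by hand.
class deployment::apache_common {
    file { '/usr/local/apache':
        ensure => directory,
    }

    mount { '/usr/local/apache':
        ensure  => mounted,
        device  => 'deployment-nfs-memc:/mnt/export/apache',
        fstype  => 'nfs',
        options => 'rw,noatime',   # assumed; the log does not give the options
        require => File['/usr/local/apache'],
    }
}
```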
[06:48:09] PROBLEM dpkg-check is now: CRITICAL on mobile-enwp i-000000ce output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:09] PROBLEM Current Load is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:09] PROBLEM Total Processes is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:17] PROBLEM dpkg-check is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:48:17] PROBLEM SSH is now: CRITICAL on mobile-enwp i-000000ce output: CRITICAL - Socket timeout after 10 seconds
[06:49:04] PROBLEM Current Users is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:34] PROBLEM Current Users is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:49:34] PROBLEM Total Processes is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:37] PROBLEM Current Load is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:37] PROBLEM Disk Space is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Free ram is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Disk Space is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:38] PROBLEM Total Processes is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:50:51] PROBLEM dpkg-check is now: CRITICAL on nova-precise1 i-00000236 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:51:57] PROBLEM Disk Space is now: WARNING on deployment-apache09 i-0000025e output: DISK WARNING - free space: / 73 MB (5% inode=59%):
[06:52:24] PROBLEM Free ram is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Disk Space is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Current Load is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:41] PROBLEM Current Load is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:52:43] PROBLEM dpkg-check is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:03] PROBLEM Disk Space is now: CRITICAL on ve-nodejs i-00000245 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Total Processes is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Free ram is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Disk Space is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Current Load is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Current Users is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM dpkg-check is now: CRITICAL on fr-wiki-db-precise i-0000023e output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Total Processes is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:37] PROBLEM Free ram is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:54:55] RECOVERY Disk Space is now: OK on nova-precise1 i-00000236 output: DISK OK
[06:55:58] RECOVERY Total Processes is now: OK on nova-precise1 i-00000236 output: PROCS OK: 136 processes
[06:55:58] RECOVERY dpkg-check is now: OK on nova-precise1 i-00000236 output: All packages OK
[06:56:05] PROBLEM dpkg-check is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:56:11] RECOVERY Disk Space is now: OK on mobile-enwp i-000000ce output: DISK OK
[06:56:56] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 34% free memory
[06:56:56] RECOVERY Free ram is now: OK on ve-nodejs i-00000245 output: OK: 87% free memory
[06:58:26] RECOVERY Total Processes is now: OK on pediapress-ocg2 i-00000234 output: PROCS OK: 86 processes
[06:58:32] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 90% free memory
[06:59:26] RECOVERY Current Users is now: OK on ve-nodejs i-00000245 output: USERS OK - 0 users currently logged in
[06:59:26] RECOVERY Total Processes is now: OK on ve-nodejs i-00000245 output: PROCS OK: 80 processes
[06:59:56] PROBLEM Current Load is now: WARNING on maps-test2 i-00000253 output: WARNING - load average: 4.13, 6.46, 5.50
[06:59:56] RECOVERY Disk Space is now: OK on maps-test2 i-00000253 output: DISK OK
[06:59:56] RECOVERY Free ram is now: OK on maps-test2 i-00000253 output: OK: 93% free memory
[06:59:56] RECOVERY Current Users is now: OK on maps-test2 i-00000253 output: USERS OK - 0 users currently logged in
[07:00:56] PROBLEM Current Load is now: WARNING on mobile-enwp i-000000ce output: WARNING - load average: 2.60, 13.54, 16.73
[07:00:57] RECOVERY dpkg-check is now: OK on pediapress-ocg2 i-00000234 output: All packages OK
[07:02:06] RECOVERY Current Load is now: OK on ve-nodejs i-00000245 output: OK - load average: 0.57, 3.57, 3.50
[07:02:06] RECOVERY dpkg-check is now: OK on ve-nodejs i-00000245 output: All packages OK
[07:02:06] RECOVERY Disk Space is now: OK on pediapress-ocg2 i-00000234 output: DISK OK
[07:02:06] RECOVERY Current Load is now: OK on pediapress-ocg2 i-00000234 output: OK - load average: 5.17, 6.14, 4.45
[07:02:06] RECOVERY Current Users is now: OK on pediapress-ocg2 i-00000234 output: USERS OK - 0 users currently logged in
[07:02:56] PROBLEM Current Load is now: WARNING on nova-precise1 i-00000236 output: WARNING - load average: 3.49, 7.52, 6.90
[07:02:56] RECOVERY SSH is now: OK on mobile-enwp i-000000ce output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:02:56] RECOVERY Total Processes is now: OK on maps-test2 i-00000253 output: PROCS OK: 91 processes
[07:03:01] RECOVERY dpkg-check is now: OK on mobile-enwp i-000000ce output: All packages OK
[07:03:01] RECOVERY dpkg-check is now: OK on maps-test2 i-00000253 output: All packages OK
[07:03:01] RECOVERY Disk Space is now: OK on ve-nodejs i-00000245 output: DISK OK
[07:03:01] RECOVERY Total Processes is now: OK on fr-wiki-db-precise i-0000023e output: PROCS OK: 80 processes
[07:03:06] RECOVERY Free ram is now: OK on fr-wiki-db-precise i-0000023e output: OK: 78% free memory
[07:03:06] RECOVERY Disk Space is now: OK on fr-wiki-db-precise i-0000023e output: DISK OK
[07:03:07] RECOVERY Current Load is now: OK on fr-wiki-db-precise i-0000023e output: OK - load average: 0.92, 3.67, 3.28
[07:03:07] RECOVERY Current Users is now: OK on fr-wiki-db-precise i-0000023e output: USERS OK - 0 users currently logged in
[07:03:07] RECOVERY dpkg-check is now: OK on fr-wiki-db-precise i-0000023e output: All packages OK
[07:03:56] RECOVERY Current Users is now: OK on nova-precise1 i-00000236 output: USERS OK - 0 users currently logged in
[07:04:56] RECOVERY Current Load is now: OK on maps-test2 i-00000253 output: OK - load average: 0.09, 2.57, 4.09
[07:05:56] RECOVERY Free ram is now: OK on nova-precise1 i-00000236 output: OK: 71% free memory
[07:06:58] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[07:07:28] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on swift-be1 i-000001c7 output: Puppet has not run in last 20 hours
[07:07:43] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on wikidata-dev-2 i-00000259 output: Puppet has not run in last 20 hours
[07:12:56] RECOVERY Current Load is now: OK on nova-precise1 i-00000236 output: OK - load average: 0.19, 1.14, 3.69
[07:20:56] RECOVERY Current Load is now: OK on mobile-enwp i-000000ce output: OK - load average: 0.59, 0.81, 4.98
[08:02:56] PROBLEM Disk Space is now: CRITICAL on bz-dev i-000001db output: DISK CRITICAL - free space: / 39 MB (2% inode=43%):
[10:09:34] !log deployment-prep petrb: fixed teh missing NOT FOUND error page
[10:09:37] Logged the message, Master
[10:15:26] 05/10/2012 - 10:15:26 - Creating a home directory for ariel at /export/home/mail/ariel
[10:16:25] 05/10/2012 - 10:16:25 - Updating keys for ariel
[10:33:45] PROBLEM Current Load is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:34:25] PROBLEM Current Users is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:35:05] PROBLEM Disk Space is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:35:45] PROBLEM Free ram is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:36:55] PROBLEM Total Processes is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:37:35] PROBLEM dpkg-check is now: CRITICAL on exim-test i-00000265 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:00] PROBLEM Current Users is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:00] PROBLEM Current Load is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:30] PROBLEM Current Users is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:14:30] PROBLEM Disk Space is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:15:05] PROBLEM Disk Space is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:15:05] PROBLEM Free ram is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:16:00] PROBLEM Free ram is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:16:20] PROBLEM Total Processes is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:17:00] PROBLEM dpkg-check is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:17:00] PROBLEM Total Processes is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:17:30] PROBLEM dpkg-check is now: CRITICAL on dumps-6 i-00000266 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:18:29] PROBLEM Current Load is now: CRITICAL on dumps-7 i-00000267 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[11:24:00] RECOVERY Current Load is now: OK on dumps-6 i-00000266 output: OK - load average: 1.27, 0.56, 0.52
[11:24:30] RECOVERY Current Users is now: OK on dumps-6 i-00000266 output: USERS OK - 1 users currently logged in
[11:25:00] RECOVERY Disk Space is now: OK on dumps-6 i-00000266 output: DISK OK
[11:25:00] RECOVERY Free ram is now: OK on dumps-7 i-00000267 output: OK: 92% free memory
[11:26:00] RECOVERY Free ram is now: OK on dumps-6 i-00000266 output: OK: 93% free memory
[11:26:20] RECOVERY Total Processes is now: OK on dumps-7 i-00000267 output: PROCS OK: 82 processes
[11:27:00] RECOVERY Total Processes is now: OK on dumps-6 i-00000266 output: PROCS OK: 82 processes
[11:27:05] RECOVERY dpkg-check is now: OK on dumps-7 i-00000267 output: All packages OK
[11:27:30] RECOVERY dpkg-check is now: OK on dumps-6 i-00000266 output: All packages OK
[11:28:10] RECOVERY Current Load is now: OK on dumps-7 i-00000267 output: OK - load average: 0.03, 0.28, 0.41
[11:29:00] RECOVERY Current Users is now: OK on dumps-7 i-00000267 output: USERS OK - 0 users currently logged in
[11:29:30] RECOVERY Disk Space is now: OK on dumps-7 i-00000267 output: DISK OK
[15:19:12] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[15:40:48] hello
[17:33:49] Is it possible to map a sub-sub-domain through labsconsole? I want to set up test.reportcard.wmflabs.org
[17:34:18] I don't want to use a hyphen :( that's ugly.
[17:35:27] yes
[17:36:12] petan: Bad resource name provided. Resource names start with a-z, and can only contain a-z, 0-9, -, and _ characters.
[17:37:23] dschoon: Are you using https://labsconsole.wikimedia.org/wiki/Special:NovaDomain ?
[17:37:26] Or something else?
[17:37:54] andrewbogott_: https://labsconsole.wikimedia.org/wiki/Special:NovaAddress
[17:38:00] is that the wrong page?
[17:38:13] dschoon: ...
[17:38:20] I am curtly told I am not a cloudadmin by https://labsconsole.wikimedia.org/wiki/Special:NovaDomain
[17:38:24] Ryan_Lane: ...
[17:38:41] reportcard is the test server, is it not?
[17:38:58] >:( that is immaterial.
[17:39:25] you can have a subdomain, but we'd need to set reportcard up as a domain
[17:39:34] it also tells me i cannot have Ryan_Lane.reportcard.wmflabs.org
[17:39:36] ok.
[17:39:43] so i should just use a hyphen then :(
[17:39:51] it's easiet
[17:39:56] s'fine
[17:39:59] easiest
[17:40:01] about how long is DNS propagation for labs domains?
[17:40:08] instant
[17:40:26] hmm.
[17:40:34] i'll flush local cache then.
[17:40:47] New patchset: Hashar; "labs has Apache docroot mounted from NFS" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7157
[17:40:50] remembering of course that negative cache can be a while
[17:40:50] PROBLEM Free ram is now: WARNING on mobile-enwp i-000000ce output: Warning: 6% free memory
[17:41:01] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7157
[17:41:18] is there a reason wikimedia.org isn't a domain option?
[17:41:58] because that would be amazingly insecure?
[17:44:48] just making sure.
[17:44:56] and now the fun questions.
[17:45:04] New patchset: Hashar; "mount apache docroot for taskservers and imagescaler" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[17:45:19] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7158
[17:45:42] Ryan_Lane, if I get "Permission denied (publickey)." when I attempt to ssh to a labs host, what can I do to fix this? reboot it?
[17:45:50] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 40% free memory
[17:45:51] the console doesn't show anything weird that i see
[17:45:56] which instance?
[17:46:17] kripke == i-000001fc
[17:46:39] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7157
[17:46:41] i'll refrain from rebooting it if you want to touch it
[17:46:42] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7157
[17:47:25] I'm betting it installed improperly
[17:47:27] and was rebooted
[17:47:41] based on this line: user-scripts already ran once-per-instance
[17:48:05] did it ever work?
[17:48:20] unsure. i did this weeks ago.
[17:48:22] lame.
[17:48:29] i just added all the addresses to it.
[17:48:30] delete/recreate
[17:48:33] yeah.
[17:48:35] that's no big deal
[17:48:39] i'll have to redo all that, won't i
[17:48:53] the addresses go with the floating ip
[17:48:54] not the instance
[17:48:56] ah!
[17:48:58] sweet.
[17:49:01] so i just disassociate
[17:49:10] that's really hot, actually
[17:50:16] New patchset: Hashar; "mount apache docroot for taskservers and imagescaler" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[17:50:22] just to double-check, because the confirmation page doesn't tell me what i'm about to actually do
[17:50:30] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7158
[17:50:45] Ryan_Lane, i *do* want to click "Disassociate IP", right?
[17:51:19] deleting the instance will do that for you
[17:51:24] ok
[17:51:36] New review: Faidon; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7158
[17:51:38] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7158
[17:51:49] ty, will let you know how it goes.
[17:57:44] PROBLEM host: kirke is DOWN address: i-000001fd check_ping: Invalid hostname/address - i-000001fd
[17:57:44] PROBLEM host: kripke is DOWN address: i-000001fc check_ping: Invalid hostname/address - i-000001fc
[18:02:45] RECOVERY host: kripke is UP address: i-00000268 PING OK - Packet loss = 0%, RTA = 0.97 ms
[18:02:55] PROBLEM Current Load is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:02:55] PROBLEM Disk Space is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:03:25] PROBLEM dpkg-check is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:04:07] PROBLEM Current Users is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:05] PROBLEM Free ram is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:09:05] PROBLEM Total Processes is now: CRITICAL on kripke i-00000268 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:41:27] New patchset: Hashar; "/mnt/{thumbs,upload} for labs" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7209
[19:41:41] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7209
[19:41:52] paravoid: here for ya :_D https://gerrit.wikimedia.org/r/7209
[19:42:03] nfs mount for /mnt/upload6 and /mnt/thumbs
[19:42:44] ugh
[19:42:49] upload6, ew
[19:42:54] not your fault of course
[19:43:02] it is cause it uses ipv6 maybe :-D
[19:43:07] according to LeslieCarr anyway
[19:43:20] or just because that is the 6th generation of /mnt/upload
[19:43:22] ahah
[19:43:25] hehe
[19:43:33] probably the latter
[19:43:42] as for your changes themselves
[19:43:49] I wouldn't do them that way at all
[19:43:53] hehe
[19:44:12] the idea was to keep calls to nfs::upload unchanged
[19:44:18] because that way we'll end up having if ($realm ...) } { else { } in all of our manifests
[19:44:25] making them even more complicated than they are
[19:44:34] so nfs::upload::labs ?
[19:44:45] this is certainly not the puppet way, but it all has to do with how things are being done around here
[19:44:55] which I don't know yet :-)
[19:45:36] so, I propose that we either get another review or let it in now and ask after the fact
[19:46:00] (I couldn't find Ryan)
[19:46:16] one issue with creating a new class
[19:46:33] i stat I will have to change all the "include nfs::upload" calls
[19:46:58] and made them something like: if( $realm..) { include nfs::upload } else { include nfs::upload::labs }
[19:46:58] :-(
[19:47:14] OR we could make the NFS server name a puppet variable
[19:47:21] as well as the export paths
[19:47:49] then call the classes using parameters
[20:24:33] !log deployment-prep deployment-imagescaler01 apache does not log anymore :-(
[20:24:36] Logged the message, Master
[20:24:56] !log deployment-prep running 'apt-get install --reinstall apache2.2-common' to attempt to fix /var/log/apache2 rights (root:arm)
[20:24:57] Logged the message, Master
[20:25:39] our cluster is really lame
[20:25:59] lol
[20:26:04] like /var/log/apache2 belongs to root:arm and is -rwxr-x---
[20:26:25] that's on the "real" machines as well ?
[20:27:41] yuip
[20:27:44] just checked on srv289
[20:27:52] so I guess apache is able to write as root ? ;:-D
[20:28:02] OR
[20:28:11] someone set that to prevents apache from writing to /var/log/apache2
[20:28:16] to avoid filling hard disks
[20:28:45] apache2.conf:ErrorLog syslog
[20:28:47] \O/
[20:32:51] probably a way to enforce it won't write there even if the ErrorLog directive stopped being obeyed
[20:35:28] now I have to figure out where the log are sent too hehe
[20:36:40] !log deployment-prep fixing a few php notices and general logic problems in wmf-config
[20:36:41] Logged the message, Master
[20:52:09] hashar: do you know if the git repo on labs for usr/local/apache/common should be pushed anywhere?
[20:52:13] I don't see any remote set up
[20:52:24] I committed my changes
[20:53:06] Did you temporarily override wgDebugLogFile in CommonSettings.php? no problem, but I saw it as an uncommitted change
[20:53:24] left it for now (committed my other changes, temporarily undid wgDebugLogFile, then restored it - done now)
[20:54:12] That common should die in a fire
[20:54:18] and use mediawiki-config
[20:54:43] May be we can start using operations/mediawiki-config on labs (read-only, with only local changes in another include, e.g. database stuff)
[20:54:47] yeah
[20:54:56] That's the reason I did the migration this week
[20:54:57] so we can
[20:55:02] may not be compatible yet
[20:55:11] (ignoring the fact it's been requested for a long time)
[20:55:18] it is based on an old copy from prototype which was based on an older copy of the production ./common
[20:55:45] I just got rid of broken usability initiative variables that where throwing notices on labs
[20:56:02] those haven't been on the cluster for a while
[20:56:10] oh and wmgHTTPSExperiment
[20:56:22] had a few if () uses, but not defined anywher
[21:04:01] if i use the create-wiki tool and do —data enwiki, does this pick up things like templates, images, etc as well?
[21:11:40] paravoid: have you find Ryan around ? :D
[21:11:51] paravoid: for https://gerrit.wikimedia.org/r/#/c/7209 (mounting /mnt/upload )
[21:15:25] New patchset: Hashar; "/mnt/{thumbs,upload} for labs" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7209
[21:15:39] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7209
[21:23:34] New review: Faidon; "I'm not entirely happy with that, since doing the whole if/else thing produces cruft with our tree a..."
[operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7209 [21:23:36] Change merged: Faidon; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7209 [21:23:38] hashar: sorry, was at lunch [21:23:59] hashar: found Ryan, said that we have to discuss this internally in the team and formulate a policy [21:24:11] most probably :-D [21:24:15] but we shouldn't delay this work until that, so I just merged your change [21:24:22] ohh nice thx [21:24:36] I already had the change applied manually anyway [21:24:49] been hunting some syslog messages [21:24:51] since [21:25:22] I got log sent to a box and no idea where they are written to :-( [21:25:34] which has been pissing me off for the last 2 hours or so :-( [21:37:44] Ryan_Lane: if i use the create-wiki tool and do —data enwiki, does this pick up things like templates, images, etc as well as basic article text? [21:38:30] Heh. That's a proposed spec. That tool doesnt actually exist [21:38:36] lol [21:38:37] oh. [21:38:48] ha ha ha ha [21:38:54] ha ha ha ha [21:38:58] and one more ha [21:39:16] That tool == unicorns [21:39:47] ha ha [21:40:06] Ryan_Lane: any timeline for the arrival of the unicorns? [21:40:37] So, I started a new project, and it kind of depends on a tool like that [21:42:01] so like, weeks? months? years? [21:44:30] Months [21:44:37] Realistically [21:45:55] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 78 MB (5% inode=40%): [21:46:23] k thanks Ryan_Lane [21:46:43] Yw [21:46:53] !log deployment-prep Creating a syslog server instance. I have a VERY nasty conflict between misc::syslog-server and misc::mediawiki-logger which tries to install conflicting packages ( syslog-ng / rsyslog ) [21:46:54] Logged the message, Master [21:48:40] Ryan_Lane: how do you do an update of deployment-prep so that it relates to the latest version of production? 
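[Editor's note: a minimal sketch of the two approaches debated above for mounting /mnt/upload on labs. Everything except the nfs::upload class name and the $realm variable — the labs class name, server names, paths and mount options — is illustrative, not taken from the actual operations/puppet tree.]

```puppet
# Option 1: realm switch at every call site. This is what Hashar wanted
# to avoid, since it scatters if/else cruft through every manifest:
if $realm == 'production' {
    include nfs::upload
} else {
    include nfs::upload::labs    # hypothetical labs variant
}

# Option 2 (paravoid's suggestion): keep "include nfs::upload" unchanged
# everywhere, and branch once inside the class on a per-realm NFS server
# and export path (all values below are made up for illustration):
class nfs::upload {
    $server = $realm ? {
        'production' => 'upload-nfs.example.wmnet',
        default      => 'labs-nfs.example.wmflabs',
    }

    mount { '/mnt/upload6':
        ensure  => mounted,
        device  => "${server}:/export/upload",
        fstype  => 'nfs',
        options => 'ro,bg,tcp,timeo=14,intr',
    }
}
```

Option 2 is what the "make the NFS server name a puppet variable, as well as the export paths" remark at 19:47 points toward: the realm decision is made in one place instead of at every include site.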
For example, wikimani2013 wiki isn't there at the moment, probably some changes in InitiliaseSettings.php
[21:49:01] I don't know.
[21:49:06] The devs do it
[21:49:31] Thehelpfulone: pending
[21:49:36] I doubt we'll add in wikis like wiki mania ones
[21:49:46] Do we need to test those, really?
[21:49:48] Thehelpfulone: Reedy is in the progress of cleaning up the production files ( php files + htdocs etc...)
[21:49:59] Thehelpfulone: once that is done, we can probably try deploying them on labs
[21:50:07] hashar: ok
[21:50:19] wikimania2013, we probably do not care :-D
[21:50:19] not necessarily Ryan_Lane but if we want to keep it as close to production as possible, it's a good idea to no?
[21:51:02] Probably. As long as
[21:51:04] It's automates
[21:51:15] Sorry. On a phone
[21:51:39] np
[21:53:26] ok. on a laptop now
[21:53:27] easier
[21:53:44] PROBLEM Current Load is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:54:24] PROBLEM Current Users is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:55:04] PROBLEM Disk Space is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:55:44] PROBLEM Free ram is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:56:54] PROBLEM Total Processes is now: CRITICAL on deployment-syslog i-00000269 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:58:09] so syslog::server is just unusable :-(
[21:58:44] RECOVERY Current Load is now: OK on deployment-syslog i-00000269 output: OK - load average: 0.59, 0.60, 0.42
[21:59:24] RECOVERY Current Users is now: OK on deployment-syslog i-00000269 output: USERS OK - 1 users currently logged in
[21:59:34] hashar: also, http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:RecentChanges (diff | hist) . . N! Muchas Game gracias zavvi LA MEJOR PAGINA‎; 09:18 . . (+4,716)‎ . . ‎10.4.0.17 (Talk | block)‎ (Created page with "Image:GameGame_1063.jpg yo yo subji how's the Midlands ?? kkkk JA QUERO JA PRECISOOO!!!! thanks for the follow....hope you enjoy the feed. Partying in Moore square...")
[21:59:40] how is this being done through the local IP?
[21:59:56] I have no idea
[21:59:57] Reedy: Do you know where operations/mediawik-config is checked out on fenari? I'd like to make it match on labs, I'll try a checkout (keeping the current /common in tact of course)
[22:00:04] RECOVERY Disk Space is now: OK on deployment-syslog i-00000269 output: DISK OK
[22:00:11] it's at /common
[22:00:26] /common is the repo or is mediawiki-config inside
[22:00:44] RECOVERY Free ram is now: OK on deployment-syslog i-00000269 output: OK: 87% free memory
[22:01:54] RECOVERY Total Processes is now: OK on deployment-syslog i-00000269 output: PROCS OK: 86 processes
[22:03:19] Reedy:
[22:05:08] hashar: Is part of your plan this week to set up labs to use mediawiki-config repo? If not I could help with that
[22:05:18] /home/wikipedia/common
[22:05:48] Reedy: Ah, right, it isn't in /usr/local/apache/common, that's only the case on the apaches, on fenari it is in /h/w
[22:05:49] okay
[22:05:57] No
[22:06:05] both ?
[22:06:06] Those are essentially the same location
[22:06:09] Fenari has both
[22:06:20] /h/w/c is copied to become /u/l/a/c
[22:06:21] !deployment-prep Disabling syslog-server class entirely for now so we can keep the mediawiki-logguer one (aka udp2log )
[22:06:21] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod
[22:06:26] right, one is the working copy and scap puts it everywhere else
[22:06:28] yeah
[22:06:34] !deployment-prep End result: no syslog at all from apaches yeah!!!
[22:06:34] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod
[22:06:35] and it also copies to itself
[22:06:45] hashar: log :)
[22:07:02] it is essentially totally f*** up
[22:07:08] no way to set up a syslog server :(
[22:07:10] Reedy: so does test.wikipedia actually run off fenari ?
[22:07:16] no
[22:07:17] puppet is fun
[22:07:23] ok
[22:07:24] It runs off /h/w/c mounted on srv193
[22:07:59] ah, skay, essentially the same. fenari doesn't take the request hits, but the code does run off fenari
[22:08:06] Krinkle: as for the config, ask sam :-D
[22:08:08] so one wouldn't have to scap in order to test on testwiki
[22:08:18] nope
[22:08:23] cool
[22:08:27] soon as it's saved it's live
[22:08:48] yeah, but only for test.wikipedia / srv193, or does srv193 also server other wikis?
[22:10:05] No, srv193 serves test only
[22:10:23] okay
[22:33:03] is there any labs-specific reason why $wgUseInstantCommons wouldn't work on a labs instance?
[22:37:25] Hmm
[22:37:32] Isn't the cluster firewalled from labs?
[22:37:36] * RoanKattouw looks for a Ryan_Lane
[22:37:45] He's not here and not on IRC
[22:41:07] ahha.
[22:41:36] Sounds about right
[22:41:43] that would make sense.
[22:44:01] hmm i can curl http://commons.wikimedia.org/w/api.php tho
[22:45:03] hmm
[22:55:09] jeremyb: access on labs deployment-prep wmf-config works now, thanks again
[22:55:32] RoanKattouw: InstantCommons uses API
[22:55:41] that should work, unless outgoing connections is blocked entirely
[22:56:01] and btw, we should probably use commons labs as file repo, not production commons, otherwise we're testing something else
[22:56:03] as db repo
[22:57:05] on another wiki inside commons wgUseInstantCommons is fine though, but for deployment-prep it should use commons.beta
[22:57:09] inside labs*
[23:55:45] PROBLEM host: deployment-thumbrewrite is DOWN address: i-0000026a check_ping: Invalid hostname/address - i-0000026a
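[Editor's note: a rough sketch of the InstantCommons configuration discussed above. $wgUseInstantCommons is the real MediaWiki shortcut for pulling files from production commons.wikimedia.org over its API; pointing deployment-prep at a beta Commons instead, as suggested at 22:56, would mean configuring a foreign API repo explicitly. The repo name and beta URL below are assumptions, not values from the actual wmf-config.]

```php
<?php
// Shortcut: fetch missing files from production Commons via api.php.
$wgUseInstantCommons = true;

// Alternative for deployment-prep: use a beta Commons as the foreign
// file repo so that labs tests labs, not production (hypothetical URL):
$wgForeignFileRepos[] = array(
    'class'            => 'ForeignAPIRepo',
    'name'             => 'betacommons',
    'apibase'          => 'http://commons.wikimedia.beta.wmflabs.org/w/api.php',
    'fetchDescription' => true,
);
```

Note this only works if outgoing HTTP from the labs instance to the target API is allowed, which is exactly the firewalling question raised at 22:37.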
[22:06:34] deployment-prep is a project to test mediawiki at beta.wmflabs.org before putting it to prod [22:06:35] and it also copies to itself [22:06:45] hashar: log :) [22:07:02] it is essentially totally f*** up [22:07:08] no way to set up a syslog server :( [22:07:10] Reedy: so does test.wikipedia actually run off fenari ? [22:07:16] no [22:07:17] puppet is fun [22:07:23] ok [22:07:24] It runs off /h/w/c mounted on srv193 [22:07:59] ah, skay, essentially the same. fenari doesn't take the request hits, but the code does run off fenari [22:08:06] Krinkle: as for the config, ask sam :-D [22:08:08] so one wouldn't have to scap in order to test on testwiki [22:08:18] nope [22:08:23] cool [22:08:27] soon as it's saved it's live [22:08:48] yeah, but only for test.wikipedia / srv193, or does srv193 also server other wikis? [22:10:05] No, srv193 serves test only [22:10:23] okay [22:33:03] is there any labs-specific reason why $wgUseInstantCommons wouldn't work on a labs instance? [22:37:25] Hmm [22:37:32] Isn't the cluster firewalled from labs? [22:37:36] * RoanKattouw looks for a Ryan_Lane [22:37:45] He's not here and not on IRC [22:41:07] ahha. [22:41:36] Sounds about right [22:41:43] that would make sense. [22:44:01] hmm i can curl http://commons.wikimedia.org/w/api.php tho [22:45:03] hmm [22:55:09] jeremyb: access on labs deployment-prep wmf-config works now, thanks again [22:55:32] RoanKattouw: InstantCommons uses API [22:55:41] that should work, unless outgoing connections is blocked entirely [22:56:01] and btw, we should probably use commons labs as file repo, not production commons, otherwise we're testing something else [22:56:03] as db repo [22:57:05] on another wiki inside commons wgUseInstantCommons is fine though, but for deployment-prep it should use commons.beta [22:57:09] inside labs* [23:55:45] PROBLEM host: deployment-thumbrewrite is DOWN address: i-0000026a check_ping: Invalid hostname/address - i-0000026a