[01:54:28] PROBLEM Total Processes is now: CRITICAL on eraseme-andrewb i-0000027f output: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:55:04] PROBLEM dpkg-check is now: CRITICAL on eraseme-andrewb i-0000027f output: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:56:14] PROBLEM Current Load is now: CRITICAL on eraseme-andrewb i-0000027f output: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:56:54] PROBLEM Current Users is now: CRITICAL on eraseme-andrewb i-0000027f output: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:57:34] PROBLEM Disk Space is now: CRITICAL on eraseme-andrewb i-0000027f output: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:58:14] PROBLEM Free ram is now: CRITICAL on eraseme-andrewb i-0000027f output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:09:27] RECOVERY Total Processes is now: OK on eraseme-andrewb i-0000027f output: PROCS OK: 97 processes
[02:10:04] RECOVERY dpkg-check is now: OK on eraseme-andrewb i-0000027f output: All packages OK
[02:11:14] RECOVERY Current Load is now: OK on eraseme-andrewb i-0000027f output: OK - load average: 0.23, 1.01, 1.21
[02:11:54] RECOVERY Current Users is now: OK on eraseme-andrewb i-0000027f output: USERS OK - 2 users currently logged in
[02:12:34] RECOVERY Disk Space is now: OK on eraseme-andrewb i-0000027f output: DISK OK
[02:13:14] RECOVERY Free ram is now: OK on eraseme-andrewb i-0000027f output: OK: 88% free memory
[02:14:46] New patchset: Andrew Bogott; "Added ffmpeg2theora to the imagescaler package list." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8564
[02:18:54] ah, andrewbogott_, i was just catching up on my mail and was about to start poking people asking for info about what the current state is, etc.
[02:19:05] but i guess you've fixed it already ;)
[02:19:15] if you remember your password and merge yourself!
[02:19:30] oh, merged already
[02:19:39] why gerrit-wm not speaking?
[02:25:18] hashar: re your recent mail, in which bz product/component?
[02:25:32] i assume labs :: beta. but maybe people don't know that
[02:27:33] hi
[02:28:03] jeremyb: gerrit-wm has been +q in #mediawiki
[02:28:03] hi!
[02:28:11] hashar: no, in *here*
[02:28:17] ohooooh
[02:28:19] well
[02:28:20] hmm
[02:28:22] broken? ;-D
[02:28:33] hashar: see the last thing it said. it never mentioned that it was merged
[02:28:49] I got New patchset: Andrew Bogott; "Added ffmpeg2theora to the imagescaler package list." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8564
[02:29:21] well I have honestly no idea :(
[02:29:41] right
[02:29:57] there was a recent change to ignore the l10n bot...
[02:29:59] anyway...
[02:30:52] hashar: saw a guy get on the subway today with a cart full of stuff from Babies"R"Us. immediately thought of you. ;-)
[02:31:04] ;-D
[02:31:17] I have went to bed at 8pm yesterday
[02:31:22] woot
[02:31:35] * jeremyb goes to bed at 22:32, right now ;)
[02:31:42] daughter started crying a bit past midnight and my wife took care of her till 3am
[02:31:43] see you tomorrow
[02:31:44] then I woke up
[02:31:46] and now
[02:31:52] I am starting my work day
[02:31:53] ;-D
[02:32:01] my life sucks
[02:32:24] what did you do for the last 4.5 hrs?
[02:32:49] slept till 3am ;-D
[02:32:58] hmm no
[02:33:08] I can't remember already haha
[02:33:21] anyway, the end result is that I am working at night!!
[02:33:33] haha
[02:33:45] PROBLEM Current Load is now: CRITICAL on testing-andrewb i-00000280 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:33:54] i just reran my ssh loop and it's still not installed on the imagescalaer. fyi
[02:34:25] * jeremyb runs away
[02:34:27] PROBLEM Current Users is now: CRITICAL on testing-andrewb i-00000280 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:35:04] PROBLEM Disk Space is now: CRITICAL on testing-andrewb i-00000280 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:35:05] it is ;D
[02:35:19] but that is the job runner instances which are going to handle that
[02:35:23] aka video trasncoding
[02:35:44] PROBLEM Free ram is now: CRITICAL on testing-andrewb i-00000280 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:36:54] PROBLEM Total Processes is now: CRITICAL on testing-andrewb i-00000280 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:38:18] PROBLEM dpkg-check is now: CRITICAL on testing-andrewb i-00000280 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:39:20] 05/23/2012 - 02:39:20 - Updating keys for laner at /export/home/deployment-prep/laner
[02:41:05] RECOVERY Disk Space is now: OK on tw-next i-0000027e output: DISK OK
[02:41:05] RECOVERY Free ram is now: OK on tw-next i-0000027e output: OK: 79% free memory
[02:42:56] RECOVERY Total Processes is now: OK on tw-next i-0000027e output: PROCS OK: 70 processes
[02:44:25] RECOVERY Current Load is now: OK on tw-next i-0000027e output: OK - load average: 0.03, 0.17, 0.12
[02:45:05] RECOVERY Current Users is now: OK on tw-next i-0000027e output: USERS OK - 0 users currently logged in
[02:45:35] RECOVERY Free ram is now: OK on testing-andrewb i-00000280 output: OK: 82% free memory
[02:46:04] hashar: We meet again :D
[02:46:55] RECOVERY Total Processes is now: OK on testing-andrewb i-00000280 output: PROCS OK: 93 processes
[02:47:55] RECOVERY dpkg-check is now: OK on testing-andrewb i-00000280 output: All packages OK
[02:48:19] 05/23/2012 - 02:48:19 - Updating keys for laner at /export/home/deployment-prep/laner
[02:49:21] 05/23/2012 - 02:49:21 - Updating keys for laner at /export/home/deployment-prep/laner
[02:50:20] 05/23/2012 - 02:50:19 - Updating keys for laner at /export/home/deployment-prep/laner
[02:53:20] 05/23/2012 - 02:53:19 - Updating keys for laner at /export/home/deployment-prep/laner
[03:13:10] Krinkle: somehow :D
[03:13:18] I am going to try to get some sleep
[03:13:19] ++
[03:16:21] me too
[03:19:29] PROBLEM Current Load is now: WARNING on deployment-nfs-memc i-000000d7 output: WARNING - load average: 6.89, 7.18, 5.47
[03:24:31] RECOVERY Current Load is now: OK on deployment-nfs-memc i-000000d7 output: OK - load average: 0.96, 4.67, 5.00
[03:29:09] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 20% free memory
[03:37:09] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory
[03:40:19] 05/23/2012 - 03:40:19 - Updating keys for laner at /export/home/deployment-prep/laner
[03:43:09] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[03:44:19] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory
[03:52:02] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 1% free memory
[03:58:22] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 15% free memory
[03:59:32] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:02:29] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 15% free memory
[04:04:29] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:08:29] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory
[04:13:35] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:23:35] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory
[04:27:29] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory
[04:28:29] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 3% free memory
[04:33:29] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 94% free memory
[04:37:29] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[05:54:33] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory
[05:59:28] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory
[06:07:28] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory
[06:18:27] join #wikimedia-external-links
[06:45:32] Hello, last time I checked I had an account in labs, but now I see none!
[06:46:09] It cannot recognize my e-mail either. What is the problem?
[06:46:56] https://www.mediawiki.org/wiki/Developer_access#Wikitanvir
[06:47:13] Here is my request. I got the login details via e-mail, and logged in.
[06:47:31] But now I am trying to log in again and it says this account no longer exists! o.O
[07:27:37] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory
[08:25:12] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours
[09:01:59] Barebone: hey
[09:02:19] Barebone: what is your account name
[09:03:11] meh
[09:03:14] hashar: hey
[09:03:18] what's up
[09:04:03] User:Tanvir Rahman
[09:04:19] what's the problem
[09:04:53] Aye, /me bombs the lab.
[09:05:10] I have some pretty scary explosive nicks.
[09:05:21] * C-4 is one of them.
[09:05:51] ok, I guess you don't have any problem with your account right
[09:06:18] Where? I cannot login in labs wiki.
[09:06:26] what error do you get
[09:06:47] you account exist and is active
[09:06:48] System says, there is not account named Tanvir Rahman.
[09:07:01] There are no emails named wikitanvir@gmail.com
[09:07:11] Then why I can't log in?
[09:07:20] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: CRITICAL - Socket timeout after 10 seconds
[09:07:33] Trying again now.
[09:07:39] you probably didn't fill in correct name
[09:07:45] because it says your account exist
[09:07:57] and when I try to login as that, it tell me password is wrong
[09:08:13] so that problem is in name, not in labs
[09:08:32] It says the same when I give a wrong password, actually.
[09:09:05] But later I tried to reveal my password then it says the account does not exist.
[09:11:07] Petan|wk, here you go: http://i.imgur.com/myQc7.png
[09:11:28] reveal?
[09:11:37] you mean reset I hope :D
[09:12:10] PROBLEM HTTP is now: WARNING on deployment-apache23 i-00000270 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.006 second response time
[09:12:25] Oh aye, that!
[09:16:03] petan|wk: hello :-)
[09:16:03] can you create a ticket
[09:16:12] C-4: ^
[09:16:14] hashar: heya
[09:19:28] are deployment-job-runner?? processing all jobs or just a selection?
[09:19:51] wondering if they also start the encoding jobs
[09:20:42] they do everything
[09:20:56] even TMH encoding
[09:21:08] but I did not verify they are actually succeeding
[09:21:16] also need to find a wait to resubmit all encoding jobs
[09:21:25] they dont have the required packages installed
[09:21:33] which are ?
[09:21:43] ffmpeg2 @ some version ?
[09:21:47] I opened a bug for that
[09:21:55] need to get ops to look at it
[09:22:17] https://bugzilla.wikimedia.org/show_bug.cgi?id=37043
[09:22:21] backport ffmpeg packages to Ubuntu lucid
[09:22:22] :-(
[09:22:30] hashar: there is a script that contains all necessary for it to work
[09:22:31] hashar: i created a puppet class that installs the required packages, mostly ffmpeg2theora and ffmpeg in versions newer than lucid
[09:22:44] ah, or puppet of couse
[09:22:46] hashar: i already backported the package its in a ppa
[09:25:14] petan|wk: I want everything in puppet ;-D
[09:25:15] hashar: https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/timedmediahandler.pp;h=a3e507c230ac71ae14176d05100d7e8618822471;hb=refs/heads/test
[09:25:31] but you might want to merge it into some other manifest
[09:25:36] petan|wk: so we can later easily install all the stuff on production by just cherry-picking / merging form `test` to `production`
[09:25:48] yeah I have noticed the ppa
[09:26:05] I am pretty sure we are not going to use a third party repo
[09:26:12] that manifest right now is for the apache
[09:26:45] for the transcoding you also need ffmpeg2theora
[09:27:15] hashar: whats the way to backport a package in that case?
[09:28:17] so far i only heare ppa might not be an option, but whats the alternative? and who should i talk to about it
[09:29:54] in the long run (once everything is running on precise) there is no need for custom packages
[09:30:46] j^: back sorry
[09:30:55] j^: ops do not want a third party repository for security reason I guess
[09:31:08] so we need to find out the debian package from a later Ubuntu version
[09:31:20] build it for a Lucid target (aka backporting)
[09:31:27] then put the package on apt.wikimedia.org
[09:31:42] possibly under a different name, like ffmpeg-backport or similar
[09:32:12] j^: will ask paravoid :)
[09:33:21] hashar_: i can help if there are questsion about the backports, we need to backport ffmpeg and ffmpeg2theora + libvpx, he can look at https://launchpad.net/~j/+archive/timedmediahandler
[09:34:42] oh libvpx
[09:34:45] another one ;-D
[09:34:45] New patchset: Hashar; "(bug 37046) fix apache monitoring on deployment-prep" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8575
[09:34:52] j^: can you possibly update https://bugzilla.wikimedia.org/show_bug.cgi?id=37043 ?
[09:35:18] we will probably want to craft our own package
[09:35:33] something like wikimedia-timedmediahandler or something
[09:36:42] hashar_: for the job-runner vms, do they have a special config section right now? transcoding needs some tweaking
[09:37:14] nothing yet
[09:37:20] I need to find out the tweaks
[09:37:23] and puppetize them
[09:37:44] do you have a list of such tweaks?
[09:38:10] hashar_: in /apache/common/wmf-config/CommonSettings.php look for the /etc/wikimedia-transcoding block
[09:38:17] $wgTranscodeBackgroundTimeLimit = 3600 * 4;
[09:38:17] $wgMaxShellMemory = 3000000;
[09:38:17] $wgMaxShellTime = 3600 * 4;
[09:38:17] $wgMaxShellFileSize = 100*102400; //1GB
[09:38:28] oh that part
[09:38:33] yeah still have to migrate that somewhere
[09:38:44] you also want to make sure that $TMP points to a large enough partition
[09:39:48] on deployment-transocding it uses deployment-nfs-memc:/mnt/export/data on /mnt/data type nfs (rw,addr=10.4.0.58)
[09:43:14] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours
[09:44:51] $TMP ?
[09:44:56] the shell env variable?
[09:45:07] that should be a global variable I guess
[09:45:15] something like $wgTMHTempDirectory
[09:45:52] and we probably want it to be local instead of over NFS
[09:48:15] hashar_: its not related to TMH just the temp files backend
[09:49:11] ahhh
[09:49:15] and yes, local is better of cause but vms need to have a large enough disk in that case
[09:49:56] deployment-jobrunner01
[09:50:02] has 40GB so shold be fine
[09:50:58] /dev/vdb 40G 177M 38G 1% /mnt
[09:50:59] yeahhh
[09:51:07] not in /tmp :)
[09:51:08] so we need to make sure TMP is somewhere in that place
[09:51:20] oh yeah
[09:51:21] grr
[09:51:24] ff**
[09:51:39] setting $TMP or symlink /tmp would work
[09:52:06] I want that in puppet
[09:52:12] sure
[09:52:14] so I guess I need to mount /dev/vdb to some other place
[09:52:43] what about /tmp ? ;-D
[09:53:48] its used for other things no?
[09:56:18] j^: how can we set the temp directory used by TMH ?
[10:00:15] hashar: let me check but filerepo does not support namespaced temp afaik
[10:00:19] right now it calls $tmpFile = TempFSFile::factory( 'transcode_' . $transcodeKey, $ext);
[10:00:33] hashar: .
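A unit check on the transcoding limits quoted at 09:38:17, as a small Python sketch. It assumes the memory and file-size settings are in kilobytes and the two time limits in seconds, which is how MediaWiki documents $wgMaxShellMemory, $wgMaxShellFileSize and $wgMaxShellTime; the dictionary and helper names are illustrative only. Under that assumption, 100*102400 KB comes to about 9.77 GiB, so the inline "//1GB" comment would understate the limit (1 GiB is 1048576 KB).

```python
# Hypothetical helper: express the quoted MediaWiki limits in readable units.
# Assumption: memory/file-size settings are in KB, time limits in seconds.
KIB_PER_GIB = 1024 * 1024

limits = {
    "wgTranscodeBackgroundTimeLimit": 3600 * 4,  # seconds
    "wgMaxShellMemory": 3000000,                 # KB
    "wgMaxShellTime": 3600 * 4,                  # seconds
    "wgMaxShellFileSize": 100 * 102400,          # KB, commented "//1GB" in the log
}


def kb_to_gib(kb: int) -> float:
    """Convert a size in kilobytes (KiB) to gibibytes."""
    return kb / KIB_PER_GIB


def seconds_to_hours(seconds: int) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


if __name__ == "__main__":
    print(f"shell memory:    {kb_to_gib(limits['wgMaxShellMemory']):.2f} GiB")
    print(f"shell file size: {kb_to_gib(limits['wgMaxShellFileSize']):.2f} GiB")
    print(f"shell time:      {seconds_to_hours(limits['wgMaxShellTime']):.1f} h")
```

If the KB assumption holds, a 1 GiB cap would be written as 1024*1024 (or roughly 10*102400).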
[10:01:12] j^: we can just alter the filerepo configuration
[10:02:15] factory in turn uses wfTempDir
[10:02:53] oh
[10:03:14] that checks for TMPDIR', 'TMP', 'TEMP' and if any is set tries if it can write
[10:03:39] thats why i suggested to set any of those shell variables in the script that runs the jobs
[10:03:41] we will need a global variable to easily change the temp dir
[10:03:47] env var are evil ;-)
[10:04:09] temp is system configuration, not sure what you mean
[10:04:57] the point of TempFSFile::factory is to use the systems temporary space for a file
[10:05:37] I am not sure how large are the system temp spaces on WMF servers
[10:05:39] we just have to make sure the vm has enough space in $TEMP and all is fine
[10:06:13] even current job vm has 6GB left, the config i mentioned above only allows 1GB files, so it would be enough
[10:09:16] for video transcoding the WMF servers will need temp space we should fix that. whats the current partition layout on WMF servers?
[10:10:06] I am looking for a job runner box
[10:11:25] WMF boxes are screwed ;-D
[10:11:55] abbaaaaa
[10:12:06] /tmp is the same partition as /
[10:12:31] on production
[10:19:39] might be good to change that, since encoding is not done as root, the 5% ext limit should also make sure its not causing any problems if a process tries to fill it up
[10:21:21] hence why I want the /tmp to be mounted somewhere else
[10:21:26] that will be easier / safer
[10:21:34] well at least on labs
[10:45:03] New patchset: Hashar; "easily umount labs /dev/vdb on /mnt" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8584
[10:47:24] New patchset: Hashar; "(bug 37048) labs jobrunner needs a large tmp directory" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8585
[10:47:51] not sure it is going to work ;-D
[10:51:29] paravoid: will you be there this afternoon for some labs review / chat ?
[10:51:35] s/there/available/ ?
[10:51:39] sure!
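The lookup described at 10:03:14 (take the first of TMPDIR, TMP, TEMP that is set and writable) can be sketched as below. This is a Python illustration of the behaviour as described in the log, not MediaWiki's wfTempDir itself; `resolve_temp_dir` is a hypothetical name, and the `fallback` value is an assumption because the log does not say what happens when none of the variables is usable.

```python
import os

# Order taken from the description of wfTempDir at 10:03:14.
TEMP_ENV_VARS = ("TMPDIR", "TMP", "TEMP")


def resolve_temp_dir(environ=None, fallback="/tmp"):
    """Return the first temp directory named in the environment that is writable.

    `environ` defaults to os.environ; `fallback` is a hypothetical default used
    when no variable points at an existing, writable directory.
    """
    env = os.environ if environ is None else environ
    for name in TEMP_ENV_VARS:
        path = env.get(name)
        # Skip unset variables and directories that are missing or read-only.
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return fallback
```

With logic like this, starting the job runner with, say, TMPDIR=/mnt/tmp (a hypothetical directory on the 40G /dev/vdb volume) would move transcode scratch files without any new $wg setting, which is the environment-variable side of the trade-off argued at 10:03:39 to 10:04:09.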
[10:52:03] ;-)
[10:53:34] New patchset: Hashar; "(bug 37048) labs jobrunner needs a large tmp directory" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8585
[10:54:53] lunch time
[11:16:26] mutante: can you create ldap account for us?
[11:16:37] we need to have mwdeploy user
[11:16:45] which we could sudo su to
[11:17:07] if possible it should have same group as apache and www-data
[11:17:18] :o
[11:17:38] meh
[11:29:58] hello, I am trying to set up a web service for the Visual Editor project
[11:30:33] I guess I'd need a public IP, or do some redirection/proxy gymnastics
[11:31:00] tried to add a public IP to the project in the web interface, but that failed
[11:32:27] gwicke: failed how?
[11:32:35] what you need is called a "floating IP"
[11:32:43] but I'm not really sure how to add one in labs yet :)
[11:32:52] let's try to troubleshoot this together though
[11:33:04] the informative message is 'Failed to allocate new public IP address. '
[11:33:06] ;)
[11:33:25] oh hah
[11:34:49] the project is called 'visualeditor'
[11:37:16] lemme add myself as an admin
[11:37:44] 05/23/2012 - 11:37:44 - Creating a home directory for faidon at /export/home/visualeditor/faidon
[11:38:08] I'm getting failed to allocate as well
[11:38:17] * paravoid goes for some log digging
[11:38:43] 05/23/2012 - 11:38:43 - Updating keys for faidon at /export/home/visualeditor/faidon
[11:42:04] are there any public IPs in the pool?
[11:43:02] Is your ip quota greater than 0?
[11:44:12] where would I check that?
[11:44:46] openstack?
[11:44:52] Dunno if you can in the web interface.
[11:45:38] k, I guess I can't get at that info from within bastion or an instance
[11:46:26] Nope because it sucks and we don't have openstack keys accessible for our users atm.
[11:57:32] paravoid: / gwicke: hi. it is indeed the project quote
[11:58:10] mutante: ok, thanks for checking!
[11:58:13] could this be changed?
[11:58:19] paravoid: fyi: on virt1: nova-manage project quota .. https://labsconsole.wikimedia.org/wiki/Nova-manage#List_floating_IPs
[11:58:58] gwicke: it can, but if you are also fine with proxying and dont have external users that would be preferred
[12:00:02] we need the service for external users
[12:00:27] ok, we just need some kind of reason. thats fine. i will raise quota
[12:00:48] !log visualeditor raising floating IP quota to 1
[12:00:50] Logged the message, Master
[12:00:56] now try again what you did earlier
[12:01:09] the editor will use it to retrieve an HTML DOM of a wiki page, and will ask it to serialize back to wikitext after modifications
[12:01:57] on gerrit, how do I upload a second patch set? (I ask in here because it seems to be a busier channel) :)
[12:02:08] alright. it's just cause of the general IPv4 shortage
[12:02:17] like RIPE would ask you as well
[12:02:47] gwicke: should work now to assign one as you tried earlier
[12:02:49] ok, np- worked now
[12:02:52] nice
[12:02:55] thanks!
[12:02:59] yw
[12:27:24] paravoid: back :)
[12:27:31] was looking at https://gerrit.wikimedia.org/r/#/c/8585/
[12:27:41] I am not sure how to organize the jobrunner classes
[12:27:42] hashar: we need to have user mwdeploy in ldap
[12:28:09] petan|wk: do we even have a mwdeploy user ?
[12:28:13] yes
[12:28:16] on dbdump only
[12:28:24] how was it created? Manually?
[12:28:29] anyway, it needs to be in same group as apache
[12:28:36] so that all permissions are correct
[12:28:53] both users, apache and mwdeploy need write access to /mnt/upload
[12:29:27] well isn't /mnt/upload6 0777 anyway?
[12:29:41] if everything inside is 0777 then it's fine
[12:29:43] but I doubt
[12:30:08] paravoid: I though about using jobrunner::directory::temp
[12:30:24] petan|wk: /mnt/upload6/* are indeed 0777, just checked
[12:30:36] ok, but files inside are not
[12:30:44] apache don't create files with mask 777
[12:30:45] and on production mwdeploy does not belong to any other group ;)
[12:31:00] ok, but I guess mwdeploy has write access to stuff
[12:31:58] I have no idea how that is setup
[12:32:06] I guess files belong to mwdeploy
[12:32:09] and are in group wikidev
[12:32:24] to be honest, I don't even know what mwdeploy user is for ;-D
[12:33:38] hashar / petan|wk: why wont it work to create mwdeploy via puppet?
[12:34:13] we probably can't adduser on labs
[12:34:26] we should have wikidev group too
[12:37:33] I got a bug opened for that IIRC
[12:37:34] New patchset: Hashar; "(bug 37048) labs jobrunner needs a large tmp directory" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8585
[12:39:47] paravoid: I have updated the job runner temporary directory change https://gerrit.wikimedia.org/r/#/c/8585/
[12:40:29] so gerrit-wm no more reports comments
[12:40:29] :-(
[12:57:48] hashar: creating users via puppet works f.e. in wikistats project, can also add to group, the things is just that i was NOT using /home as home
[12:58:07] oh
[12:58:22] need to find out if mwdeploy is described in puppet
[12:59:45] systemuser { wikistats: name => 'wikistats', home => '/usr/lib/wikistats', groups => [ 'project-wikistats' ] }
[13:00:27] note: adding to a "project-X" group was a test, before just any other group not being a project group
[13:00:54] systemuser { "mwdeploy": name => "mwdeploy" }
[13:01:16] yeah, well, you might want to add home => and groups =>
[13:01:27] which is only in production
[13:01:40] really need to merge test to production so we can then merge production to test ;-D
[13:03:01] hmm, i guess.
[13:03:17] do you need it exactly the same or do you need it different in labs anyways?
[13:06:52] I think I am missing some more info on how to get the public IP working for the service
[13:07:33] the netmask / gateway config is unclear to me
[13:07:41] mutante: well I am not sure how to handle the merge
[13:07:50] and those rules don't say what their actions actually are
[13:08:17] (in 'security group list')
[13:08:36] are those DROP rules?
[13:08:41] I have no idea ;)
[13:09:14] gwicke: copy from another existing group. in project "testlabs" ( the general one), there is a securiy group "web". it allows 80,443
[13:10:14] where do you see gateway settings ?
[13:10:45] to use the IP, I guess I'll have to bring up an interface for it
[13:11:04] or is that handed via nat?
[13:11:07] i did not have to do that manually
[13:11:26] you should auto. get it after clicking in labsconsole
[13:12:04] it was not configured as an interface in the instance
[13:12:14] re: security groups: in CIDR ranges if you want to say "ALL", use 0.0.0.0/0
[13:12:33] so those are all ACCEPT rules?
[13:12:59] yea, default drop and these add accepts
[13:13:22] ok, will add that to the documentation
[13:13:26] but you got a default security group after creating a new instance , so you can SSH in and it can have monitoring
[13:13:42] the groups are also "per-project"
[13:14:37] gwicke: i guess you would have the IP on the interface after reboting the instance via labsconsole
[13:15:31] ok, I wasn't able to ssh in using the public IP so far
[13:15:46] trying the reboot..
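The security-group semantics explained between 13:12:14 and 13:12:59 (everything is dropped by default, every listed rule is an ACCEPT, and 0.0.0.0/0 means any source) can be modelled in a few lines. This is a toy Python model, not OpenStack's implementation; `Rule`, `is_allowed` and `WEB` are hypothetical names, with `WEB` mirroring the testlabs "web" group that allows ports 80 and 443.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One ACCEPT rule: protocol, inclusive port range, and source CIDR."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Default drop: traffic passes only if at least one ACCEPT rule matches."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.from_port <= port <= rule.to_port
                and addr in ipaddress.ip_network(rule.cidr)):
            return True
    return False


# Hypothetical group modelled on testlabs "web": 80 and 443 from anywhere.
WEB = [
    Rule("tcp", 80, 80, "0.0.0.0/0"),
    Rule("tcp", 443, 443, "0.0.0.0/0"),
]
```

In this model a group containing only the `WEB` rules leaves port 22 closed from every source, so ssh to a floating IP needs its own rule in addition to the address being assigned.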
[13:16:23] assign IP to project, assign to instance, check if the IP is listed as "Instance floating IP address" on Special:NovaInstance
[13:16:59] yes, it is listed there
[13:17:31] ve-nodejs i-00000245 running m1.small 10.4.0.125 208.80.153.246
[13:17:32] default
[13:18:01] no additional interfaces after the reboot, and no ssh on the public IP
[13:20:13] it works now after opening the ports
[13:20:31] so no extra interfaces needed
[13:21:25] ah, cool
[13:21:42] are you are you really want ssh on the public IP?
[13:22:02] would recommend adding a proxy host line to ssh config instead
[13:22:18] then you as well just ssh into your instance with one command and dont even need it on public
[13:23:00] would just use the public IP for the webserver then
[13:23:16] bbiab, people at my door
[13:24:21] http://parsoid.wmflabs.org/
[13:24:45] well, I don't see much harm in opening ssh directly
[13:24:56] gwicke: is that using node.js for rendering ?
[13:25:00] yes
[13:25:15] all of sudden I am wondering why we also have a LUA extension
[13:25:18] ;D
[13:25:58] ;)
[13:26:03] gwicke: do you have an up to date Debian package for node.js ?
[13:26:15] I could use one for another project ;-]
[13:26:46] the current ubuntu / debian packages are new enough for parsoid
[13:27:15] ohh
[13:27:16] 0.6.12 on Ubuntu, 0.6.17 on Debian
[13:27:32] oh you are not using Lucid
[13:27:34] cool ;-D
[13:27:50] http://parsoid.wmflabs.org/Ahalya <--- blank page
[13:27:51] 0.6.12 is in the instance
[13:28:27] any errors return empty pages right now
[13:28:55] noticed some tokenizer failures when clicking around, need to fix those
[13:32:38] hashar: lemme check
[13:33:00] just saw that, I was having lunch and missed the notification
[13:36:42] are instances limited to a single core?
[13:40:05] gwicke: there are different types of instances, decided upon installation
[13:40:11] node.js is mono core IIRC
[13:40:51] the server I wrote uses the cluster module to fork off a worker per core, but the instance only sees a single core
[13:40:52] paravoid: well gerrit-wm no more notify us ;-(
[13:41:14] gwicke: does it actually get several cores ?
[13:41:41] yes, but each is a different node instance competing on a single socket
[13:41:46] gwicke: I think the S1.* instance types are monocore
[13:42:08] and managed (restarted on failure etc) by an upper node instance
[13:42:53] ok
[13:43:11] m1.xlarge is 8 CPU hehe
[13:43:19] and 16GB ram
[13:43:34] so you can probably parse [[Barrack Obama]] and all the {{cite}} stuff in there
[13:43:47] ;)
[13:44:02] fortunately Obama now works with 340M ram max..
[13:44:15] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[13:44:28] " managed (restarted on failure etc) by an upper node instance " <-- is that on another instance ?
[13:45:48] instance as in process in this case, not VM
[13:48:46] ooo
[13:49:08] just make sure you have enough CPU core for the controller process to still be able to do its job ;-]
[13:49:33] it only starts a limited number of workers, normally one per core
[13:50:16] so on 8 cores you would get 8 workers and thus potentially have no horse power left for the controlling process
[13:50:23] though you could just have it niced
[13:50:24] ;)
[13:50:44] hmm- why would the scheduler ignore the coordinator process?
[13:51:27] the OOM killer might be a problem, but I think that got better at avoiding to kill the parent process in the last years
[13:58:41] can't you give a process an OOM score nowadays ?
[13:59:33] gwicke: http://lwn.net/Articles/317814/
[13:59:46] there is probably a node module to do so already haha
[14:00:12] so you can get the worker with a high chance of dieing and the controller a very low one
[14:00:33] another possibility would be to use a user account with memory quota
[14:00:52] right now both the master and the workers share the same uid
[14:01:16] but setting the score should likely work, if it becomes a problem
[14:01:41] the master is very small though, so I would be surprised if it was picked by the OOM killer
[14:01:50] maybe 20M res or so..
[14:01:59] or
[14:02:05] --v8-option --max_executable_size (max size of executable memory (in Mbytes))
[14:03:40] hashar: thanks for all those options, noted them for future reference ;)
[14:07:27] paravoid: I am not sure what you meant on https://gerrit.wikimedia.org/r/#/c/8585/ ;)
[14:08:03] ohh
[14:08:05] back
[14:08:08] let's see
[14:08:10] gerrit-wm: hi
[14:08:14] I restarted it
[14:09:10] it got a bug somehow
[14:09:18] comments are no more shown ;-]
[14:15:11] gwicke: I should one day get the parsoid parser tests in jenkins
[14:15:41] hashar: I broke something in a recent commit, am just fixing it..
[14:16:02] getting an automatic warning would be nice ;)
[14:16:17] I sometimes am too lazy to run the tests manually before each commit
[14:16:27] ohh
[14:16:32] you should make it a pre commit hook
[14:16:51] look at the .git/hooks/pre-commit.sample file
[14:18:55] paravoid: can we review the other gerrit changes ? ;-)
[14:19:02] which ones?
[14:19:35] https://gerrit.wikimedia.org/r/#/c/8584
[14:19:42] to umount /dev/vdb form /mnt on labs
[14:19:51] that is mounted by Nova as a default
[14:20:06] since sometime we want to reuse /dev/vdb for another purpose, I added a new class
[14:20:32] why do you need the safeguard?
[14:20:42] just to be sure it is not going to be applied on production ?
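On the OOM-score question raised at 13:58:41: on Linux the per-process knob is /proc/<pid>/oom_score_adj, which accepts values from -1000 to 1000 (kernels from 2.6.36 onward; older kernels only expose the coarser oom_adj). The following is a minimal Python sketch that assumes such a kernel; the helper names are illustrative. A worker would raise its own score so that the OOM killer picks it ahead of the small master process.

```python
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def oom_score_adj_path(pid="self"):
    """Path of the oom_score_adj file for a process ("self" = current process)."""
    return f"/proc/{pid}/oom_score_adj"


def set_oom_score_adj(value: int, pid="self") -> None:
    """Bias the Linux OOM killer for a process; higher values make it a likelier victim.

    Raising a process's own score works without privileges; pushing it below
    the default (negative values) normally requires CAP_SYS_RESOURCE.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    with open(oom_score_adj_path(pid), "w") as f:
        f.write(str(value))
```

A forked worker could call `set_oom_score_adj(500)` (an arbitrary illustrative value) once at startup. A node.js worker could get the same effect by writing that number to the same /proc file.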
[14:20:45] could remove it though
[14:20:58] production does not have vdb anyway :)
[14:21:11] also, if someone applies a class named "labs" in production
[14:21:27] then tough luck :)
[14:23:06] hashar: thanks for the pointer to the commit hook, will check that
[14:23:41] the trouble is that it needs to detect a rising number of failures..
[14:26:58] paravoid: I am removing the safeguard
[14:27:01] doing some rebasing meanwhile
[14:30:05] New patchset: Hashar; "(bug 37048) labs jobrunner needs a large tmp directory" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8585
[14:30:20] New patchset: Hashar; "easily umount labs /dev/vdb on /mnt" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8584
[14:31:12] bad hashar
[14:31:17] removed the lines but didn't reindent :)
[14:31:33] grrrrr
[14:31:34] r
[14:31:37] I shoud %=
[14:31:39] always
[14:33:22] holy hell
[14:33:38] we have no notifications at all
[14:33:40] from gerrit-wm
[14:33:50] even merges
[14:33:54] that must be the l10nbot stuff
[14:34:51] https://bugzilla.wikimedia.org/show_bug.cgi?id=37047
[14:34:52] ;)
[14:37:10] !log deployment-prep running puppet on job runner to check change 8584 & 8585 worked
[14:37:11] Logged the message, Master
[14:38:18] 05/23/2012 - 14:38:18 - Updating keys for laner at /export/home/deployment-prep/laner
[14:40:18] 05/23/2012 - 14:40:18 - Updating keys for laner at /export/home/deployment-prep/laner
[14:45:21] 05/23/2012 - 14:45:20 - Updating keys for laner at /export/home/deployment-prep/laner
[14:46:56] umount: /mnt: device is busy
[14:46:56] great
[14:47:18] ahhhh
[14:47:26] I am SO dumb
[14:48:25] RECOVERY Current Load is now: OK on deployment-sql i-000000d0 output: OK - load average: 4.79, 4.76, 4.93
[14:49:09] paravoid: looks like /dev/vdb to /mnt is in /etc/fstab
[14:49:22] then there are various stuff mounted in /mnt (like /mnt/upload6
[14:49:26] so I can't amount /mnt ;-D
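One way to handle "a rising number of failures" (14:23:41) is a pre-commit hook that compares against a stored baseline instead of requiring zero failures. The following is a Python sketch. `TEST_COMMAND`, `BASELINE_FILE` and the "N failed" summary format are assumptions, because the log does not show how Parsoid's parser tests report their results.

```python
"""Hypothetical .git/hooks/pre-commit: block commits that add test failures."""
import re
import subprocess
import sys
from pathlib import Path

# Assumptions for illustration only, not Parsoid's real test setup.
TEST_COMMAND = ["node", "parserTests.js"]
BASELINE_FILE = Path(".git/parser-test-failures")
FAILURE_RE = re.compile(r"(\d+)\s+failed", re.IGNORECASE)


def count_failures(output: str) -> int:
    """Extract a failure count from a summary like '... 3 failed'; 0 if absent."""
    match = FAILURE_RE.search(output)
    return int(match.group(1)) if match else 0


def failures_increased(baseline: int, current: int) -> bool:
    """A commit is rejected only when it makes the failure count go up."""
    return current > baseline


def main() -> int:
    result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
    current = count_failures(result.stdout + result.stderr)
    # Without a stored baseline, accept the current count as the starting point.
    baseline = int(BASELINE_FILE.read_text()) if BASELINE_FILE.exists() else current
    if failures_increased(baseline, current):
        print(f"pre-commit: failures rose from {baseline} to {current}", file=sys.stderr)
        return 1
    # Ratchet: record the (equal or lower) count as the new baseline.
    BASELINE_FILE.write_text(str(current))
    return 0
```

To use it, save the script as .git/hooks/pre-commit, make it executable, and end it with `sys.exit(main())`. The hook then rejects a commit only when the failure count goes up, and it lowers the stored baseline whenever a commit fixes tests.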
[14:51:22] I am rebooting jobrunner01 to see [14:51:49] and checking how it was done on -squid instance [14:51:52] btw, having a disk mounted as /tmp sounds extremely wrong to me [14:52:08] tmpfs with 10GB of memory ? [14:52:09] :-D [14:52:31] doing transcoding using /tmp also sounds wrong. [14:53:32] jobrunner rebooted [14:54:16] so basically /dev/vdb /mnt override the puppet change [14:56:05] !log deployment-prep stopped job runner on jobrunner01, amounted /mnt/upload6 and /mnt/ [14:56:07] Logged the message, Master [14:56:15] and runnnig puppet [14:57:31] sorry for being late to respond [14:57:43] I'm trying to finish up the local puppet stuff the past three days [14:57:58] if you feel that some of these tasks should be my job instead of yours, feel free to say so [14:58:05] I will ;) [14:58:10] I'm still new around here and don't know where the line is drawn [14:58:14] I am just writing to this channel as a way to log hehe [14:58:20] 05/23/2012 - 14:58:20 - Updating keys for laner at /export/home/deployment-prep/laner [15:04:24] New patchset: Hashar; "jobrunner::labs did not specify /tmp mount FS type" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8606 [15:04:59] paravoid: mount need a fstype ;-D 8606 [15:05:32] doh [15:05:47] puppet parser validate just validate the syntax apparently [15:07:10] yes [15:09:02] you said you wanted to discuss something btw? 
[15:09:31] yeah I wanted to review some puppet changes I have submitted [15:09:38] and eventually got trapped with other stuff :-( [15:10:31] grr [15:10:32] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter type at /etc/puppet/manifests/jobrunner.pp:20 on node i-00000278.pmtpa.wmflabs [15:11:59] type -> fstype [15:12:07] New patchset: Hashar; "use valid mount parameter: type -> fstype" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8607 [15:12:09] I need to find a better puppet validation script [15:12:26] paravoid: 8607 fix a typo, sorry :-( [15:14:19] 05/23/2012 - 15:14:19 - Updating keys for laner at /export/home/deployment-prep/laner [15:16:14] doh² [15:17:02] yeah [15:17:08] :-( [15:19:19] 05/23/2012 - 15:19:19 - Updating keys for laner at /export/home/deployment-prep/laner [15:20:22] Guest31435? [15:20:25] ;) [15:20:45] some webchat user probably [15:21:19] 05/23/2012 - 15:21:19 - Updating keys for laner at /export/home/deployment-prep/laner [15:22:35] hashar: it has a cloak [15:22:42] and no it's not webchat [15:22:48] stupid puppet can amount an ext3 fs [15:22:49] grr [15:23:38] hashar: it's me :P [15:23:49] http://dpaste.org/0abVo/ [15:23:50] I was stuck in a nickserv enforcer [15:24:17] jeremyb: whilst you're here, how do I add a second patch to a commit in gerrit? [15:24:21] 05/23/2012 - 15:24:20 - Updating keys for laner at /export/home/deployment-prep/laner [15:24:47] https://gerrit.wikimedia.org/r/#/c/8586/ [15:25:43] I am busy sorry :( [15:25:58] Thehelpfulone: look at the Git/workflow doc or ask in #mediawiki :-- [15:25:58] https://labsconsole.wikimedia.org/wiki/Help:Git#Amending_a_change? [15:27:38] !log deployment-pre rebooting jobrunner01 to see how it goes [15:27:39] deployment-pre is not a valid project. [15:27:44] git fetch ssh://@gerrit.wikimedia.org:29418/operations/puppet && git checkout FETCH_HEAD is bug/complaint ? 
[15:27:48] !log deployment-prep rebooting jobrunner01 to see how it goes [15:27:50] Logged the message, Master [15:27:52] Thehelpfulone: you mean to merge into the original? or to submit as a separate gerrit change? [15:27:54] or 93ee0683db627cb89c2f22e8208e181c3155549a ? [15:27:59] merge into original [15:29:27] Thehelpfulone: did you make the change yet in git locally? [15:29:35] yes [15:30:43] and you're still on windows? [15:30:48] do you have a local bash? [15:30:49] yep :P [15:30:52] (cygwin?) [15:30:58] I have git bash [15:31:03] you should use a seat instead of sitting on windows border [15:31:05] it is safer [15:31:12] har har har [15:31:25] thanks for being so considerate hashar <3 [15:32:01] Thehelpfulone: pastebin: git status; git log -p --stat --pretty=fuller -n 4 [15:32:17] or http://lokwi.com/item/1252 [15:34:01] pfff [15:34:12] PROBLEM SSH is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused [15:34:13] http://pastebin.com/px1q5CPu [15:34:34] Press S to skip mounting or M for manual recovery [15:34:35] ahah [15:34:38] I broken the instance \o/ [15:35:02] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [15:35:03] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [15:35:16] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [15:35:16] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [15:35:16] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [15:35:16] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner01 i-00000278 output: Connection refused by host [15:35:28] $ git status [15:35:28] # On branch bug/complaint [15:35:29] nothing to commit (working directory clean) [15:40:44] New patchset: Hashar; "revert unmounting 
/dev/vdb to mount it on /tmp" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8610 [15:41:12] !log deployment-prep deleting jobrunner01, it is crashed beyond repair. Will create a new one named jobrunner03 [15:41:14] Logged the message, Master [15:41:49] paravoid: Ok I finally just reverted everything i have done about /dev/vdb /tmp and /mnt ;-D https://gerrit.wikimedia.org/r/8610 [15:41:58] will find out a better solution [15:47:15] Thehelpfulone: that's not actually the output from that command... [15:47:57] hashar: merged [15:48:12] hashar: in any case, I'd argue that TMH must use a configured path rather than a hardcoded /tmp [15:48:26] well it uses MediaWiki temp directory [15:48:28] which is … /tmp [15:48:36] and /tmp is jus t/ [15:48:38] jeremyb: http://pastebin.com/tpCeb7VZ [15:48:38] :-( [15:48:57] paravoid: I will open a bug to get TMH to have that feature [15:49:14] PROBLEM host: deployment-jobrunner01 is DOWN address: i-00000278 CRITICAL - Host Unreachable (i-00000278) [15:51:27] Thehelpfulone: git rebase -i HEAD^^ [15:53:44] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner03 i-00000281 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:54:07] ok [15:54:24] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner03 i-00000281 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:55:04] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner03 i-00000281 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:55:44] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner03 i-00000281 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:57:14] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner03 i-00000281 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:57:37] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner03 i-00000281 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[16:00:48] PROBLEM Current Users is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [16:00:48] PROBLEM Disk Space is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [16:00:48] PROBLEM Free ram is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [16:00:48] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [16:01:23] edit 93ee068 per complaints about feedback being in main space, moving to project namespace enter the commit message for your changes. Lines starting [16:01:26] squash 843a04b changing page name to WP style. [16:01:36] Thehelpfulone: you'll want to rebase with something like that ^^^ [16:02:54] Thehelpfulone: your new commit msg should have a first line that's less than 65 chars and a second line that's blank (0 chars) [16:02:57] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 12.61, 14.44, 6.71 [16:03:05] Thehelpfulone: extra details can go starting on line 3 [16:03:18] RECOVERY Free ram is now: OK on incubator-bot2 i-00000252 output: OK: 74% free memory [16:03:18] ok so now I git commit --amend ? [16:03:29] * Damianz yawns [16:03:37] did you git rebase -i HEAD^^ ? [16:03:41] yes [16:03:48] and then what? [16:03:49] I changed to edit and squash like you said [16:03:57] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 128 processes [16:03:59] !log bots Ran mysqladmin flush-hosts on bots-sql2 as it was blocking cbng's report interface. [16:04:01] Logged the message, Master [16:04:31] ok, and then? [16:05:20] 05/23/2012 - 16:05:19 - Updating keys for laner at /export/home/deployment-prep/laner [16:05:20] then I :wq to save it [16:05:28] and then? [16:05:36] that's where I am now :) [16:05:44] what happened? [16:05:49] it gave you back a shell? 
[16:05:52] yes [16:05:56] shouldn't have [16:06:12] err, nvm [16:06:16] yes it should have ;) [16:06:26] it told you to run --amend? then yes run it [16:07:01] ok [16:07:06] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in [16:07:15] i have to leave any min [16:07:36] RECOVERY Disk Space is now: OK on incubator-bot2 i-00000252 output: DISK OK [16:08:07] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.37, 5.48, 4.94 [16:08:15] ok so now how do I push it? [16:08:27] did you git rebase --continue ? [16:08:46] I did now ;) [16:09:08] git push gerrit HEAD:refs/for/master/bug/complaint [16:09:11] is what I did last time [16:09:16] to get the first commit [16:09:21] 05/23/2012 - 16:09:20 - Updating keys for laner at /export/home/deployment-prep/laner [16:09:28] RECOVERY Current Load is now: OK on deployment-jobrunner03 i-00000281 output: OK - load average: 0.51, 1.19, 1.09 [16:09:38] RECOVERY Current Users is now: OK on deployment-jobrunner03 i-00000281 output: USERS OK - 1 users currently logged in [16:09:43] 1 file changed, 1 insertion(+), 1 deletion(-) [16:09:44] Successfully rebased and updated refs/heads/bug/complaint. [16:09:59] do I run the same thing to push it? git push gerrit HEAD:refs/for/master/bug/complaint [16:10:09] !log deployment-prep rebooted jobrunner03 to check everything works fine there [16:10:11] Logged the message, Master [16:10:12] I am off [16:10:18] will be back tomorrow [16:10:20] ++ [16:10:21] try: git push gerrit HEAD:refs/for/master/2012/mobilefeedbacktarget [16:10:32] bug/* is just for stuff with bug #s [16:10:44] idk if that will actually change the topic though [16:10:48] ok done [16:11:05] RECOVERY Free ram is now: OK on deployment-jobrunner03 i-00000281 output: OK: 95% free memory [16:11:05] meh it's a new commit https://gerrit.wikimedia.org/r/#/c/8612/ [16:11:11] I'll abandon the other one? 
[16:11:38] RECOVERY Disk Space is now: OK on deployment-jobrunner03 i-00000281 output: DISK OK [16:12:03] RECOVERY Total Processes is now: OK on deployment-jobrunner03 i-00000281 output: PROCS OK: 127 processes [16:12:12] 23 16:01:36 < jeremyb> Thehelpfulone: you'll want to rebase with something like that ^^^ [16:12:16] 23 16:02:54 < jeremyb> Thehelpfulone: your new commit msg should have a first line that's less than 65 chars and a second line that's blank (0 chars) [16:12:19] 23 16:03:05 < jeremyb> Thehelpfulone: extra details can go starting on line 3 [16:12:38] i guess... [16:12:38] RECOVERY dpkg-check is now: OK on deployment-jobrunner03 i-00000281 output: All packages OK [16:12:51] but also fix the commit msg [16:12:52] oh I didn't see that, I think my second line had something on it [16:13:19] and your first line needs to be a summary not just half a sentence [16:13:41] bah, so what's the command to fix the commit message? [16:13:42] rebase? [16:13:55] you can just do commit --amend without a rebase [16:13:59] and then repush [16:15:35] you know why it made a new one instead of adding a new PS to the existing? you messed up the commit msg. it had 2 change-ids. each commit msg should just have 2 [16:15:56] oh it worked [16:16:06] err [16:16:09] just have 1* [16:16:16] heh I was going to say :P [16:16:53] so, push a new commit msg? 
[16:19:33] PROBLEM Current Load is now: WARNING on deployment-nfs-memc i-000000d7 output: WARNING - load average: 3.63, 6.98, 5.43 [16:22:47] !log deployment-prep hashar: delete all 3 webVideoTranscode jobs from enwiki database [16:22:48] Logged the message, Master [16:24:23] RECOVERY Current Load is now: OK on deployment-nfs-memc i-000000d7 output: OK - load average: 0.03, 2.58, 3.93 [16:29:53] RECOVERY Puppet freshness is now: OK on localpuppet1 i-0000020b output: puppet ran at Wed May 23 16:29:31 UTC 2012 [16:41:19] 05/23/2012 - 16:41:19 - Updating keys for laner at /export/home/deployment-prep/laner [17:23:54] Change on mediawiki a page OAuth/User stories was modified, changed by Tfinc link https://www.mediawiki.org/w/index.php?diff=540848 edit summary: [18:26:12] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours [18:28:36] !log deployment-prep hashar: running `mwscript rebuildLocalisationCache.php --wiki=aawiki` for {{bug|36806}} [18:28:38] Logged the message, Master [18:36:19] !log deployment-prep hashar: relocalisation cache done 367/367 languages rebuilt [18:36:20] Logged the message, Master [18:48:29] !log deployment-prep Adding puppet class 'imagescaler' on all deployment-apacheXX instances in an attempt to fix thumbnails [18:48:31] Logged the message, Master [18:56:12] PROBLEM dpkg-check is now: CRITICAL on deployment-apache23 i-00000270 output: CHECK_NRPE: Socket timeout after 10 seconds. [19:02:47] PROBLEM dpkg-check is now: CRITICAL on deployment-apache22 i-0000026f output: DPKG CRITICAL dpkg reports broken packages [19:05:19] 05/23/2012 - 19:05:19 - Updating keys for laner at /export/home/deployment-prep/laner [19:05:20] Fixed..
[19:06:19] RECOVERY dpkg-check is now: OK on deployment-apache23 i-00000270 output: All packages OK [19:08:26] * Damianz sharpens his other pointy stick [19:15:16] !log deployment-prep rebooting apache21 following installation of imagescaler puppet class [19:15:18] Logged the message, Master [19:15:38] New patchset: Hashar; "only apply admins::* while on labs" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8642 [19:15:55] Only apply admin kisses on labs? [19:18:01] RECOVERY dpkg-check is now: OK on deployment-apache22 i-0000026f output: All packages OK [19:19:20] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: CRITICAL - Socket timeout after 10 seconds [19:21:33] PROBLEM host: deployment-apache21 is DOWN address: i-0000026d CRITICAL - Host Unreachable (i-0000026d) [19:21:44] PROBLEM Disk Space is now: CRITICAL on nova-production1 i-0000007b output: DISK CRITICAL - free space: / 286 MB (2% inode=80%): [19:26:38] RECOVERY host: deployment-apache21 is UP address: i-0000026d PING OK - Packet loss = 0%, RTA = 4.20 ms [19:29:06] PROBLEM HTTP is now: WARNING on deployment-apache23 i-00000270 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.010 second response time [19:29:46] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:32:06] PROBLEM HTTP is now: CRITICAL on deployment-apache21 i-0000026d output: Connection refused [19:34:37] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 85 processes [19:41:52] PROBLEM dpkg-check is now: CRITICAL on deployment-apache22 i-0000026f output: DPKG CRITICAL dpkg reports broken packages [19:43:20] 05/23/2012 - 19:43:20 - Updating keys for laner at /export/home/deployment-prep/laner [19:45:20] 05/23/2012 - 19:45:20 - Updating keys for laner at /export/home/deployment-prep/laner [19:46:00] !log deployment-prep rebooting apache22 following installation of imagescaler puppet class [19:46:01] Logged the message, Master [19:48:26] PROBLEM dpkg-check is now: CRITICAL on deployment-apache20 i-0000026c output: DPKG CRITICAL dpkg reports broken packages [19:49:41] !log deployment-prep rebooting apache23 following installation of imagescaler puppet class [19:49:43] Logged the message, Master [19:52:16] PROBLEM host: deployment-apache22 is DOWN address: i-0000026f CRITICAL - Host Unreachable (i-0000026f) [19:53:19] 05/23/2012 - 19:53:19 - Updating keys for laner at /export/home/deployment-prep/laner [19:53:28] RECOVERY dpkg-check is now: OK on deployment-apache20 i-0000026c output: All packages OK [19:54:06] notice: Finished catalog run in 3843.63 seconds [19:54:07] wonderfull [19:54:22] !log deployment-prep rebooting apache20 following installation of imagescaler puppet class [19:54:24] Logged the message, Master [19:55:38] I am heading bed now [19:56:46] PROBLEM host: deployment-apache23 is DOWN address: i-00000270 CRITICAL - Host Unreachable (i-00000270) [19:57:06] RECOVERY host: deployment-apache22 is UP address: i-0000026f PING OK - Packet loss = 0%, RTA = 1.86 ms [19:57:06] PROBLEM HTTP is now: WARNING on deployment-apache21 i-0000026d output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.005 second response time [19:57:56] RECOVERY host: deployment-apache23 is UP address: i-00000270 PING OK - Packet loss = 0%, RTA = 
0.86 ms [19:59:56] PROBLEM host: deployment-apache20 is DOWN address: i-0000026c CRITICAL - Host Unreachable (i-0000026c) [20:02:16] PROBLEM HTTP is now: CRITICAL on deployment-apache22 i-0000026f output: Connection refused [20:02:16] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: Connection refused [20:08:16] RECOVERY host: deployment-apache20 is UP address: i-0000026c PING OK - Packet loss = 0%, RTA = 0.64 ms [20:10:34] how would I modify in deployment-prep a file managed by puppet? [20:11:34] Ryan_Lane? [20:11:42] PROBLEM HTTP is now: CRITICAL on deployment-apache20 i-0000026c output: Connection refused [20:11:49] push the change into the test branch [20:12:59] I edit the file in operations/puppet repo? [20:14:20] 05/23/2012 - 20:14:20 - Updating keys for laner at /export/home/deployment-prep/laner [20:15:20] 05/23/2012 - 20:15:19 - Updating keys for laner at /export/home/deployment-prep/laner [20:16:31] New patchset: Platonides; "Enable upload virtual host in deployment-prep since there's not a single upload host yet. 
See bug 37034" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8710 [20:17:11] looks good [20:17:22] PROBLEM HTTP is now: WARNING on deployment-apache22 i-0000026f output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.961 second response time [20:28:24] 05/23/2012 - 20:28:23 - Updating keys for laner at /export/home/deployment-prep/laner [20:31:22] PROBLEM HTTP is now: WARNING on deployment-apache20 i-0000026c output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.005 second response time [20:48:42] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541314 edit summary: /* Securely uploading media to Commons from 3rd party website */ [20:54:15] 05/23/2012 - 20:54:14 - Creating a home directory for csteipp at /export/home/bastion/csteipp [20:54:43] 05/23/2012 - 20:54:43 - Creating a project directory for ipv6 [20:54:44] 05/23/2012 - 20:54:43 - Creating a home directory for csteipp at /export/home/ipv6/csteipp [20:55:16] 05/23/2012 - 20:55:15 - Updating keys for csteipp at /export/home/bastion/csteipp [20:55:51] 05/23/2012 - 20:55:51 - Updating keys for csteipp at /export/home/ipv6/csteipp [20:58:05] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541321 edit summary: /* Securely uploading media to Commons from 3rd party website */ [21:04:52] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541329 edit summary: /* Mobile Apps */ [21:07:17] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541332 edit summary: /* Securely uploading media to Commons from 3rd party website */ [21:09:36] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541335 edit
summary: /* Securely uploading media to Commons from 3rd party website */ [21:10:28] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541336 edit summary: /* Securely uploading media to Commons from 3rd party website */ [21:12:40] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541337 edit summary: /* Web-based editing assistance tool */ [21:12:43] PROBLEM HTTP is now: WARNING on deployment-apache23 i-00000270 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 8.402 second response time [21:14:28] PROBLEM Current Load is now: CRITICAL on ipv6test1 i-00000282 output: Connection refused by host [21:15:07] PROBLEM Current Users is now: CRITICAL on ipv6test1 i-00000282 output: Connection refused by host [21:15:47] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282 output: Connection refused by host [21:16:21] PROBLEM Free ram is now: CRITICAL on ipv6test1 i-00000282 output: Connection refused by host [21:17:44] PROBLEM Total Processes is now: CRITICAL on ipv6test1 i-00000282 output: Connection refused by host [21:18:19] PROBLEM dpkg-check is now: CRITICAL on ipv6test1 i-00000282 output: Connection refused by host [21:18:22] Change on mediawiki a page OAuth/User stories was modified, changed by Philinje link https://www.mediawiki.org/w/index.php?diff=541341 edit summary: /* Web-based editing assistance tool */ [21:23:43] 05/23/2012 - 21:23:43 - Updating keys for csteipp at /export/home/ipv6/csteipp [21:24:15] 05/23/2012 - 21:24:14 - Updating keys for csteipp at /export/home/bastion/csteipp [21:36:30] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 12.07, 10.14, 6.62 [23:14:06] 05/23/2012 - 23:14:06 - Creating a home directory for dfoy at /export/home/mobile-sms/dfoy [23:14:12] 05/23/2012 - 23:14:12 - Creating a home directory for dfoy at
/export/home/mobile/dfoy [23:14:15] 05/23/2012 - 23:14:15 - Creating a home directory for dfoy at /export/home/bastion/dfoy [23:15:06] 05/23/2012 - 23:15:06 - Updating keys for dfoy at /export/home/mobile-sms/dfoy [23:15:14] 05/23/2012 - 23:15:13 - Updating keys for dfoy at /export/home/mobile/dfoy [23:15:16] 05/23/2012 - 23:15:15 - Updating keys for dfoy at /export/home/bastion/dfoy [23:17:26] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [23:21:36] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 0.88, 1.84, 4.80 [23:44:10] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours