[00:19:15] 05/22/2012 - 00:19:15 - Updating keys for bsitu at /export/home/bastion/bsitu [00:19:27] 05/22/2012 - 00:19:27 - Updating keys for bsitu at /export/home/editor-engagement/bsitu [02:02:43] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 14% free memory [02:39:31] RECOVERY Free ram is now: OK on deployment-squid i-000000dc output: OK: 20% free memory [02:43:19] 05/22/2012 - 02:43:19 - Updating keys for laner at /export/home/deployment-prep/laner [02:50:21] 05/22/2012 - 02:50:19 - Updating keys for laner at /export/home/deployment-prep/laner [02:52:03] PROBLEM Free ram is now: WARNING on deployment-squid i-000000dc output: Warning: 18% free memory [02:54:19] 05/22/2012 - 02:54:19 - Updating keys for laner at /export/home/deployment-prep/laner [02:54:23] RECOVERY Puppet freshness is now: OK on deployment-apache21 i-0000026d output: puppet ran at Tue May 22 02:54:06 UTC 2012 [02:55:20] 05/22/2012 - 02:55:20 - Updating keys for laner at /export/home/deployment-prep/laner [02:56:53] RECOVERY Puppet freshness is now: OK on labs-relay i-00000103 output: puppet ran at Tue May 22 02:56:31 UTC 2012 [03:05:00] New review: Hashar; "First line is some normal text" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/4144 [03:05:47] PROBLEM HTTP is now: CRITICAL on deployment-web3 i-00000219 output: CRITICAL - Socket timeout after 10 seconds [03:05:47] PROBLEM HTTP is now: CRITICAL on deployment-web i-00000217 output: CRITICAL - Socket timeout after 10 seconds [03:05:47] PROBLEM HTTP is now: CRITICAL on deployment-web5 i-00000213 output: CRITICAL - Socket timeout after 10 seconds [03:05:47] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds [03:05:51] New review: Hashar; "First line really" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/4144 [03:10:45] PROBLEM HTTP is now: WARNING on deployment-web3 i-00000219 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.017 second response time [03:10:45] PROBLEM HTTP is now: WARNING on deployment-web i-00000217 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.016 second response time [03:10:45] PROBLEM HTTP is now: WARNING on deployment-web5 i-00000213 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.016 second response time [03:10:45] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.022 second response time [03:37:13] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 15% free memory [03:37:19] !log testlabs Test [03:37:22] Logged the message, Master [03:40:09] Really need to get a wiki [03:40:13] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours [03:40:22] Wikipedia-Labs cloak. [03:40:51] What is happening with the wiki.. [03:47:43] !log deployment-prep hashar: (Bug 36870) deleting deployment-web{,3,4,5} [03:47:45] Logged the message, Master [03:52:05] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [03:58:41] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory [04:00:31] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [04:01:17] !log Status PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [04:01:17] Status is not a valid project. 
[04:01:36] !log status PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [04:01:37] status is not a valid project. [04:01:56] !log freenode PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [04:01:57] freenode is not a valid project. [04:02:11] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory [04:02:12] !log test PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [04:02:12] test is not a valid project. [04:02:39] !log testlabs PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory [04:02:40] Logged the message, Master [04:03:43] !log testlabs RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory [04:03:44] Logged the message, Master [04:04:41] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory [04:08:51] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory [04:11:04] !log bot bots-apache1 has two defunct processes eating CPU: pdflushsh (pid 6382) and 10 (pid 6278) [04:11:05] bot is not a valid project. [04:11:12] !log bots bots-apache1 has two defunct processes eating CPU: pdflushsh (pid 6382) and 10 (pid 6278) [04:11:14] Logged the message, Master [04:13:55] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory [04:14:23] New review: Hashar; "Foo" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/4144 [04:18:53] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [04:25:30] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory [04:28:50] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory [04:29:40] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 21% free memory [04:35:30] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:38:46] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory [05:11:49] !log deployment-prep creating a second job runner instance deployment-jobrunner02 . Will apply puppet classes later on. [05:11:51] Logged the message, Master [05:17:52] Is bastion down again? [05:18:04] Or, inaccessible - 'connection timed out' [05:24:24] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner02 i-00000279 output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:25:04] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner02 i-00000279 output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:26:14] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner02 i-00000279 output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:26:54] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner02 i-00000279 output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:27:34] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner02 i-00000279 output: CHECK_NRPE: Error - Could not complete SSL handshake. [05:28:14] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner02 i-00000279 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
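The failed !log attempts above (04:01-04:02) show how the admin-log bot parses these commands: the first word after !log has to be an existing Labs project name, and the rest of the line is recorded for that project; anything else ("Status", "status", "freenode", "test") is refused. A minimal sketch of the pattern, with PROJECT and MESSAGE as placeholders:

    !log PROJECT MESSAGE
    # accepted: "!log testlabs Test"  ->  "Logged the message, Master"
    # refused:  "!log status ..."     ->  "status is not a valid project."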
[06:04:41] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 9% free memory [06:09:41] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: Critical: 1% free memory [06:14:11] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 57% free memory [06:29:54] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory [06:50:28] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:55:18] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.28, 1.38, 1.04 [07:42:20] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [07:49:37] ahhh [07:49:39] back [07:49:46] !log deployment-prep installing jobrunner2 [07:49:48] Logged the message, Master [08:19:20] 05/22/2012 - 08:19:19 - Updating keys for laner at /export/home/deployment-prep/laner [08:21:19] 05/22/2012 - 08:21:19 - Updating keys for laner at /export/home/deployment-prep/laner [08:22:06] RECOVERY Current Load is now: OK on deployment-jobrunner02 i-00000279 output: OK - load average: 0.43, 0.94, 1.10 [08:22:25] \O/ [08:23:16] RECOVERY Current Users is now: OK on deployment-jobrunner02 i-00000279 output: USERS OK - 1 users currently logged in [08:23:16] RECOVERY Disk Space is now: OK on deployment-jobrunner02 i-00000279 output: DISK OK [08:23:22] !log deployment-prep started job loop on deployment-job-runner02 [08:23:25] Logged the message, Master [08:23:36] RECOVERY Free ram is now: OK on deployment-jobrunner02 i-00000279 output: OK: 84% free memory [08:24:26] RECOVERY Total Processes is now: OK on deployment-jobrunner02 i-00000279 output: PROCS OK: 113 processes [08:25:31] two job runners yeahhhh [08:25:36] RECOVERY dpkg-check is now: OK on deployment-jobrunner02 i-00000279 output: All packages OK [08:25:38] not sure what they are doing though [08:41:50] !log deployment-prep restarted upd2log on -feed (again) [08:41:52] Logged the message, Master [08:54:53] !log deployment-prep purged all logs from /home/wikipedia/logs/archive/ just to be safe [08:54:55] Logged the message, Master [09:12:45] New patchset: Hashar; "/home/wikipedia/log need to be writable by udp2log!" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8442 [09:13:00] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/8442 [09:16:26] New patchset: Hashar; "(bug 37014) udp2log needs write access to /h/w/log" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8442 [09:16:41] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/8442 [09:17:08] New review: Hashar; "Patchset 2 amend commit message to reference the bug number." [operations/puppet] (test); V: 0 C: 0; - https://gerrit.wikimedia.org/r/8442 [09:20:24] New review: ArielGlenn; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/8442 [09:20:27] Change merged: ArielGlenn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/8442 [09:25:44] !log deployment-prep Fixed udp2log not able to add new log files in /home/wikipedia/log , that dir need to be writable by udp2log user! 
See https://gerrit.wikimedia.org/r/8442 | https://bugzilla.wikimedia.org/37014 [09:25:46] Logged the message, Master [09:30:51] !log deployment-prep hashar: jobrunner logs are available in /home/wikipedia/logs/runJobs.log now [09:30:52] Logged the message, Master [09:39:57] !log deployment-prep hashar: rebooting jobrunner02 just to be sure it is properly loaded up [09:39:59] Logged the message, Master [09:44:37] hi [09:44:43] hello [09:44:43] :) [09:44:49] everything ok? [09:44:53] I hope labs are back [09:45:22] I will try to setup some replication of sql [09:45:37] because we really need backup if sql server crash [09:46:27] btw hashar there is a script to create wiki [09:46:31] you wanted to make one [09:46:44] but it needs to be tweaked to work with mwmultiversion [09:49:54] yeah addWiki [09:50:00] in extensions/WikimediaMaintenance IIRC [09:50:03] I have used that script [09:50:29] through multiversion ;-D [09:50:40] probably need some better tweaking though [09:51:33] petan|wk: if you are going to setup replication, please use puppet [09:51:39] hashar: when you have time could you give me description of what each machine is for [09:51:51] job runners, cache etc [09:51:57] sure [09:52:08] any idea where to document that? [09:52:16] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Deployment-prep [09:52:39] https://labsconsole.wikimedia.org/wiki/Deployment/Help [09:52:41] I guess [09:53:19] 05/22/2012 - 09:53:19 - Updating keys for laner at /export/home/deployment-prep/laner [09:58:12] ok [09:58:22] is it possible to use puppet for that? [10:00:48] hashar: does the version of script you used work? if so maybe update bin/addWiki [10:01:09] because what we have now doesn't [10:03:06] I have updated the help page : https://labsconsole.wikimedia.org/wiki/Deployment/Help [10:03:12] ok [10:03:17] probably need to use a table instead of a list [10:03:39] hashar: do we want to keep using that motd we have now on some instances or not / can we move it to puppet? [10:04:06] if you like it, move it to puppet ! [10:04:19] I like it because when you ssh there you know where you are :) and what the box is for, it could help people who joined the project and aren't sure how it works [10:04:38] and who to ask if they get in trouble of course [10:04:54] what I did is that whenever I log in labs, I get a yellow PS1 ;-) [10:05:05] huh [10:05:10] oh right [10:05:16] that's not a bad idea [10:05:21] https://github.com/hashar/alix/commit/920e8e7637597d170522002bcc8f81ad4143c600 [10:05:21] we might do that for root [10:05:25] no [10:05:30] don't alter the root prompt [10:05:33] please ;-D [10:05:34] so that when you sudo su you see that you are root [10:05:41] like red line [10:05:42] XD [10:05:49] and big label "don't break stuff" [10:05:52] you will know you are root cause the line is blank and ends with a # ;-D [10:05:55] that is enough [10:05:58] heh [10:06:15] I don't sudo su much [10:06:24] anyway, my bashrc is at https://github.com/hashar/alix/blob/master/bashrc [10:06:42] I once mistyped and removed my home on my computer [10:06:49] since then I don't do that much [10:07:05] I had a backup though [10:07:20] but it wasn't fun to see it disappear heh [10:07:25] do you know what deployment-webs1 is? [10:07:39] yes, it's former instance which was supposed to become https server [10:07:50] ohh [10:07:52] probably can be removed now [10:07:54] so I guess we can get ride of it ? 
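As an aside on the prompt discussion a few lines up (10:04-10:06): the linked bashrc is hashar's own, but the general idea can be sketched in a few lines of bash. This is only an illustration, not his actual configuration, and the domain test for labs hosts is an assumption.

    # colour the prompt yellow when the shell is running on a labs instance,
    # so you always know which box you are on
    case "$(hostname -f 2>/dev/null)" in
        *.wmflabs|*.wmflabs.org)
            PS1='\[\e[1;33m\]\u@\h:\w\$ \[\e[0m\]'   # bold yellow
            ;;
        *)
            PS1='\u@\h:\w\$ '
            ;;
    esac

The same trick with a red colour code (\e[1;31m) in root's shell startup file would give the "big red don't break stuff" prompt petan jokes about, though hashar's point stands that the bare trailing # already tells you that you are root.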
[10:07:55] sure [10:08:11] someone that build the HTTPS infrastructure in production will set it up on labs [10:08:15] I guess Roan / Ryan [10:08:17] k [10:08:19] Roan wanted to do that [10:08:24] but that is very low priority for now [10:08:27] I know [10:09:01] we definitely should setup some kind of backups [10:09:21] until config is in puppet etc [10:09:37] !log deployment-prep Remove deployment-webs instance which was meant to emulate the HTTPS access. Hacky and low priority for now, we will need to setup a nginx proxy one day to properly replicate the production infrastructure. [10:09:39] Logged the message, Master [10:09:58] apaches* and jobrunner* have been installed from puppet [10:10:13] deleting -webs NOW [10:10:58] ok [10:11:19] bbl [10:11:25] cya ;)) [10:12:21] 05/22/2012 - 10:12:21 - Updating keys for laner at /export/home/deployment-prep/laner [10:16:50] PROBLEM host: deployment-webs1 is DOWN address: i-0000012a check_ping: Invalid hostname/address - i-0000012a [10:23:23] New review: Dzahn; "as mentioned above: just for labs (which already has it manually)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6468 [10:26:36] New patchset: Hashar; "class to install the 'tree' utility" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6468 [10:26:51] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/6468 [10:27:39] New review: Dzahn; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6468 [10:27:42] Change merged: Dzahn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6468 [10:30:24] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 16% free memory [10:40:24] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 20% free memory [10:51:43] "A newer build of the Ubuntu lucid server image is available. [10:52:11] what does that mean? Must I upgrade? Should I upgrade? How would I do that? [11:04:46] * Barebone pokes Thehelpfulone hard. [11:05:06] hashar: do you know if it's possible to do a rename of the wiki account on labs? [11:05:14] or does that need to be changed in gerrit (?) too [11:06:02] oh actually mutante you might know, being a crat on the labsconsole wiki [11:06:24] can you rename Barebone from "Tanvir Rahman" to "Wikitanvir" on wiki? [11:06:29] Thehelpfulone: quote "If you haven't yet logged into gerrit I can rename your account. Once [11:06:32] you log into gerrit you're mostly stuck." [11:07:06] he still wants gerrit to be Tanvir Rahman, just the wiki username to be wikitanvir [11:07:17] or are they the same because of LDAP? [11:07:18] Nikerabbit: got sudo? you can "apt-get dist-upgrade" your instance, then reboot if it installed a newer kernel [11:07:36] Thehelpfulone: I have no idea [11:07:50] Nikerabbit: the part about the image means there is a new image to install from if you create a new instance, but you dont have to reinstall the existing one [11:08:27] Thehelpfulone: the only thing I know is that both my login and real name are set to "hashar" :-( [11:08:45] thats what I did, don't want to complicate things ;) [11:09:00] Thehelpfulone: wiki user = git user, yes, LDAP, i dont think he can [11:09:18] * Barebone is okay with Tanvir Rahman then. [11:09:23] "Preferred wiki username. This will also be the user's git username, so legal name would be reasonable " [11:09:24] Hola guys btw! [11:09:35] Barebone: have you put your SSH key into labsconsole wiki? [11:09:42] Thehelpfulone, not yet. 
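For the key Barebone is being asked about just above: what the labsconsole wiki expects is an OpenSSH-format public key. A minimal sketch of generating one and finding the line to paste in; the key type, size, file name and comment are all placeholders.

    # generate a key pair and print the public half in OpenSSH format
    ssh-keygen -t rsa -b 2048 -f ~/.ssh/labs_rsa -C "yournick"
    cat ~/.ssh/labs_rsa.pub
    # the single "ssh-rsa AAAA... yournick" line that cat prints is what goes
    # into the labsconsole key form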
[11:09:49] I've given you access to both bastion and bots but nothing is being updated [11:09:53] Barebone: your shell user name can be different [11:10:06] ok, you need to put it in open SSH format into https://labsconsole.wikimedia.org/wiki/Special:NovaKey [11:10:12] mutante: how can we change our real name in LDAP ? [11:10:15] Mutante, it is my nickname/first name --> tanvir [11:10:22] I am happy with that. :-D [11:10:28] Barebone: looks good, id keep it that way, ok [11:10:29] mutante: Gerrit show me as Hashar [11:11:35] hashar: not sure, afraid that is the "mostly stuck" part from the quote above [11:11:51] yeah I guess [11:11:55] we had the question before.. but hmm,,, [11:12:03] and looked at different LDAP fields [11:12:16] I suspect that Ryan uses the same field for wiki username and git realname [11:12:28] which does not make sense to me ;-D [11:12:34] he I will have to poke him [11:12:39] yes, that was what we saw last time we checked [11:12:42] afair [11:12:48] laner is something for him [11:12:49] yes,please do [11:12:52] could be his shell username [11:12:56] it is [11:13:03] oh yeah I remember [11:13:19] so we have a shell username and a wiki username/git realname [11:13:31] we probably need to get the later split in two ;-D [11:13:46] to get fields like: shellname / username / realname [11:14:00] Barebone: well, welcome to labs. planning to join a specific project? [11:14:14] going to need some nice LDAP migration script and conf update in labsconsole / gerrit [11:14:18] hashar: ack [11:15:10] btw, the rules for the shell user name are: "(1 to 17 numbers and lower case ASCII letters, as well as the . (full stop), - (hyphen-minus) and _ (low line) characters) " heh :p [11:16:29] would not recommend shell user "-_._-" though :) [11:21:45] Mutante, for my bot mostly. [11:22:55] Barebone: ah, alright. would be nice if you add some docs about it later here https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation [11:24:27] oh yeah, and i would like to merge that page with these some way: http://wikitech.wikimedia.org/view/Category:Bots [11:24:42] (and of course update them all) [11:25:09] any bot owner who wants to help, it's appreciated [11:28:56] Sure Mutante, will think about it. [11:29:16] peeps .. I seem to be unable to login to MySQL on phpmyadmin on bots-sql2 [11:29:31] shell mysql works fine (just got out of that), bots-sql3 works as well [11:29:52] eh, is Labs settled down from yesterday? [11:29:57] -> "#1129 Cannot log in to the MySQL server" [11:30:10] -> "Connection for controluser as defined in your configuration failed." [11:30:35] Maybe not completely settled, but phpmyadmin does not have troubles on bots-sql3 [11:30:52] hmm [11:31:04] I wonder if its a good time to start everything again... [11:31:23] start or restart ? [11:31:31] start [11:31:39] since I already killed everything, perhaps? [11:31:45] :-D [11:31:52] everything is dead? [11:32:07] all bots are stopped, uploading processes too [11:32:31] eh, I am referring to those on my projects [11:32:32] Beetstra: confirmed running mysqld process on bots-sql2 (has root password which i dont know so cant confirm login) [11:33:06] I just killed a server :( [11:33:24] mutante, I know that mysqld works, since I was working in a shell on bots-sql2 (mysql command) [11:33:25] i dont know if its a good time to restart "things" [11:33:28] and labsconsole refuses to report what happened to it now... 
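On Nikerabbit's earlier question about the "newer build of the Ubuntu lucid server image" notice (10:51-11:07): as mutante says, the new image only matters for newly created instances, and an existing one can simply be upgraded in place. A minimal sketch, assuming sudo on the instance:

    sudo apt-get update
    sudo apt-get dist-upgrade
    # reboot only if the upgrade installed a newer kernel
    sudo reboot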
[11:33:41] PROBLEM host: dumps-6 is DOWN address: i-00000266 CRITICAL - Host Unreachable (i-00000266) [11:33:42] But phpmyadmin apparently does not work (properly) [11:33:55] boom, just reported... [11:33:58] My bots are writing into the database as well .. [11:34:02] without problem [11:34:23] Beetstra: ok, well, i just use shell. report phpmyadmin issue as bugzilla bug? [11:34:29] Beetstra, what does your bots do? [11:34:30] ahablaba [11:34:32] 1pm [11:34:41] good feed [11:34:47] nood feed! [11:34:57] I am running LiWa3 in #wikipedia-en-spam and #cvn-sw-spam -> every link addition on wikipedia goes into a db .. [11:35:11] oh, so it checks for link spam? [11:36:05] * Hydriz 's interest was just killed [11:36:10] Yes [11:36:19] Well, it checks for ALL link additions [11:36:31] also the good stuff [11:36:38] zzz sudo reboot breaks everything, sigh [11:37:06] Beetstra: that info is also something that would be great on bots doc page, i once added something to your talk page, but that was somehow lost because at that time liquid threads was on and later disabled [11:37:08] is there some kind of firewall to/on the instances by default? [11:37:14] RECOVERY host: dumps-6 is UP address: i-00000266 PING OK - Packet loss = 0%, RTA = 0.61 ms [11:37:42] Nikerabbit: AFAIK yes [11:37:45] i think nobody had used user:talk on labsconsole much before. but lets do [11:37:55] It is documented, but not really good, mutante [11:38:25] Hydriz: I can access :ssh on my instance but not :8080 from bastion [11:38:39] hmm [11:39:01] not sure about that though :( [11:39:27] Nikerabbit: there is a security group "default" in your project, it probably does not have 8080 in it [11:39:32] \o/ just fixed an error [11:40:16] Nikerabbit: but for example there is group "web" in testlabs, which has 80,443 and 8080 (security groups) [11:40:53] hmm [11:41:04] why is :5666 in the defaults? [11:41:27] it's puppet [11:41:39] wikipedia says nagios [11:42:20] oh yea, of course. you are right [11:42:58] it is Nagios NRPE [11:43:23] to execute checks like disk space on the instances, which you cant just check from remote [11:43:28] remote plugin executor [11:45:01] wow it takes three clicks to remove a rule (with page load between each) [11:55:57] !log deployment-prep create two more job runner instances [11:55:59] Logged the message, Master [11:56:10] going to grab food while nova install them [11:56:22] hmm I can't seem to tunnel into bots-1 petan? [11:56:29] hi [11:56:36] hey [11:56:55] I'm using WinSCP to view files but I can't seem to get into bots-1 [11:57:08] right [11:57:10] can you ssh ther [11:57:14] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 6.30, 6.12, 5.38 [11:57:20] yes [11:57:32] you are using scp or sftp [11:57:32] ah there we go [11:57:46] !tunnel [11:57:50] ok so it's scp [11:57:51] !putty [11:57:51] official site: http://www.chiark.greenend.org.uk/~sgtatham/putty/ | how to tunnel - http://oldsite.precedence.co.uk/nc/putty.html [11:57:55] and it's telling me connection refused [11:58:13] you are using socks proxy? [11:58:17] or tunnel [11:58:25] SSH tunnel [11:58:28] through bastion [11:58:31] how did you initiate it [11:58:36] using winSCP [11:58:41] you need to bind to port 22 [11:58:46] yes I have done [11:58:46] did you? 
[11:58:49] ok [11:58:59] but bots-1:22 [11:59:10] what local port did you bind [11:59:11] I put the port as 22 [11:59:27] when you tunnel you need to specify local port remote address and port [11:59:43] I need to know all these 3 values you used [12:00:04] ok one sec, - it was working with bots-3 the other day [12:00:17] it should have been local port 1234 remote address bots-1 port 22 [12:00:29] then connect sftp to localhost:1234 [12:00:45] ok I can get into bots-3 fine petan|wk [12:00:50] but bots-1 doesn't work [12:01:03] that's clear then [12:01:14] you configured remote address of bots-3 [12:01:26] that's wrong [12:01:36] you can't use remote address of bots-3 and connect to bots-1 [12:01:37] I set this up with mutante, all I need to do in change the instance name [12:01:42] yes [12:01:49] open putty [12:02:00] I can get in via SSH fine [12:02:08] it's just winSCP is telling me "connection refused" [12:02:26] create new tunnel local port 4858 remote address bots-1:22 [12:02:29] ssh to bastio [12:02:41] then open winscp and connect to localhost:4858 [12:03:03] !socks-proxy [12:03:03] ssh @bastion.wmflabs.orgĀ -D ; # [12:03:09] the tunneling is done through winscp not putty [12:03:12] @search bastion [12:03:12] Results (found 4): sudo, bastion, ryanland, socks-proxy, [12:03:20] on winscp, under connection -> tunnel [12:03:23] Thehelpfulone: ok, I don't know winscp [12:03:26] host name: bastion.wmflabs.org [12:03:30] port number: 22 [12:03:34] user name: thehelpfulone [12:03:42] but for the actual "session" [12:03:44] Thehelpfulone: http://winscp.net/eng/docs/ui_login_tunnel [12:03:45] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner04 i-0000027b output: Connection refused by host [12:03:45] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner03 i-0000027a output: Connection refused by host [12:04:02] mutante: it works for bots-3, and mailman-01, just not bots-1 [12:04:07] Thehelpfulone:I can only tell you how to open tunnel using ssh [12:04:12] I don't know scp [12:04:25] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner04 i-0000027b output: Connection refused by host [12:04:25] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner03 i-0000027a output: Connection refused by host [12:05:05] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner03 i-0000027a output: Connection refused by host [12:05:05] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner04 i-0000027b output: Connection refused by host [12:05:14] Thehelpfulone: i guess you have saved sessions in winscp with the tunnel settings for these, but its not activated in te one for bots-1 ? 
if not it would be weird, tunnel should always go through bastion and not make a difference which instance from there [12:05:29] works on bots-2 too [12:05:38] well, it's the same ssh tunnel thing that you setup in putty [12:05:42] it says the it authenticates properly [12:05:45] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner03 i-0000027a output: Connection refused by host [12:05:46] just in winscp [12:05:56] but then it says network error connection refused [12:06:00] yes [12:06:15] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner04 i-0000027b output: Connection refused by host [12:06:16] hmm, so I can't get into bots-1 or bots-4 [12:06:55] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner04 i-0000027b output: Connection refused by host [12:06:55] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner03 i-0000027a output: Connection refused by host [12:07:35] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner03 i-0000027a output: Connection refused by host [12:08:15] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner04 i-0000027b output: Connection refused by host [12:08:15] Thehelpfulone: you dont have a homedir on bots-4 , thats why [12:09:02] Thehelpfulone: you do have a /hoe on bots-1, you do not on bots-4 [12:09:05] home [12:09:20] hmm [12:09:56] double check if they are all in the same project and if puppet ran [12:10:27] if yes, declare it a bug there are no home dirs on bots-4 [12:11:41] no home? [12:12:36] note: permissions on home dirs in bots-1 are all over the place (root vs. user, svn vs. wikidev) [12:12:59] looks like some are puppet created and some are not? [12:13:15] or just created before/after changes to the way they are created [12:14:07] (saying that because perms on .ssh in home may also be reason for not getting ssh login) [12:14:56] yours on bots-1 looks like it should work though, Thehelpfulone try one more login on bots-1 please [12:15:20] 05/22/2012 - 12:15:19 - Updating keys for lcarr at /export/home/deployment-prep/lcarr [12:16:21] 05/22/2012 - 12:16:20 - Updating keys for catrope at /export/home/deployment-prep/catrope [12:16:21] 05/22/2012 - 12:16:20 - Updating keys for laner at /export/home/deployment-prep/laner [12:16:56] ok sure [12:17:03] !log bots running puppet on bots-4 [12:17:04] Logged the message, Master [12:17:19] 05/22/2012 - 12:17:19 - Updating keys for catrope at /export/home/deployment-prep/catrope [12:17:20] 05/22/2012 - 12:17:19 - Updating keys for lcarr at /export/home/deployment-prep/lcarr [12:17:20] 05/22/2012 - 12:17:19 - Updating keys for laner at /export/home/deployment-prep/laner [12:17:23] session opened for user thehelpfulone [12:17:25] there you go [12:17:29] there we go [12:17:35] how did you fix that? [12:17:43] on bots-1 i did not fix anything [12:17:47] heh [12:17:48] i just watched the log [12:17:57] did you see the failures before? [12:18:27] no, "grep helpful" just shows 2 successes and thats all [12:18:34] looks like your tunnel setup [12:18:54] I didn't change anything though :S [12:19:09] ok, and running puppet on bots-4 does not create homedirs [12:19:20] i guess it must use other puppet classes then [12:19:30] or none [12:22:14] Thehelpfulone: the "connection refused" you got was most likely from bastion host then, not from the instance [12:22:38] ok [12:23:09] ok, gotta go again for the moment. 
packing bags [12:23:17] travel in a little while [12:24:06] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours [12:24:31] petan|wk: maybe you wanna check about bots-4? you, me and hydriz have /homes there, but nobody else .. not sure why yet [12:24:56] if you were involved in bots-4 at all that is [12:24:58] bbl [12:27:19] Thehelpfulone: / petan|wk : ah nevermind! duh, of course that is just the automount thing, it doesnt exist until somebody "cd"s into it. Thehelpfulone try again on bots-4 [12:27:32] lol [12:28:04] yep works :) [12:28:09] ls /home merely tells you who ever connected to this instance before: [12:28:11] :) [12:28:19] where is the home page for bots.wmflabs.org located? [12:28:28] ok cool, stupid me to forget it again.. out now:) [12:28:48] Thehelpfulone: https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots [12:31:18] Thehelpfulone: on bots-apache1 [12:31:37] yep found it [12:31:38] (you can see in Special:NovaAdress which instance owns which public IP) [12:31:42] out for real [12:31:51] so many instances ;) [12:34:31] oh finally project storage works for precise instances :) [12:52:10] !log deployment-prep deleting deployment-jobrunner{3,4} installation failed I got permission denied. Will recreate them using same hostname [12:52:12] Logged the message, Master [12:53:30] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: PROCS CRITICAL: 1139 processes [12:57:38] PROBLEM host: deployment-jobrunner04 is DOWN address: i-0000027b check_ping: Invalid hostname/address - i-0000027b [12:58:19] PROBLEM host: deployment-jobrunner03 is DOWN address: i-0000027a check_ping: Invalid hostname/address - i-0000027a [12:58:28] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 106 processes [13:00:04] !log incubator Fun just started on bot1 and bot2, starting interwiki bots. [13:00:06] Logged the message, Master [13:00:35] RECOVERY host: deployment-jobrunner04 is UP address: i-0000027d PING OK - Packet loss = 50%, RTA = 1072.19 ms [13:01:15] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner04 i-0000027d output: Connection refused by host [13:01:55] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner04 i-0000027d output: Connection refused by host [13:02:25] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 19% free memory [13:03:15] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner04 i-0000027d output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:03:25] RECOVERY host: deployment-jobrunner03 is UP address: i-0000027c PING OK - Packet loss = 0%, RTA = 1.04 ms [13:03:57] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner03 i-0000027c output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:03:57] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner04 i-0000027d output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:04:27] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner03 i-0000027c output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:04:27] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner04 i-0000027d output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:08:17] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner04 i-0000027d output: CHECK_NRPE: Error - Could not complete SSL handshake. 
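For reference, the WinSCP tunnel petan|wk walks Thehelpfulone through above (12:02-12:06) is the same thing a plain OpenSSH client does with a local port forward. A hedged equivalent, with "yourshellname" as a placeholder and 4858 simply the arbitrary local port suggested in the channel:

    # forward local port 4858 through bastion to the SSH port on bots-1
    ssh -L 4858:bots-1:22 yourshellname@bastion.wmflabs.org
    # then point an SFTP/SCP client at the forwarded port in a second session
    sftp -P 4858 yourshellname@localhost

Whichever client is used, the forward always terminates on bastion, which is why mutante reads the "connection refused" at 11:58 as coming from the bastion end rather than from bots-1, and why the later bots-4 confusion turned out to be nothing more than the automounter not creating /home/<user> until someone first changes into it.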
[13:08:17] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner03 i-0000027c output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:09:07] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner03 i-0000027c output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:10:05] PROBLEM Total Processes is now: CRITICAL on deployment-jobrunner03 i-0000027c output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:10:37] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner03 i-0000027c output: CHECK_NRPE: Error - Could not complete SSL handshake. [13:14:51] PROBLEM Current Load is now: WARNING on deployment-nfs-memc i-000000d7 output: WARNING - load average: 10.70, 8.98, 6.43 [13:24:39] !log deployment-prep deleting refreshLinks2 jobs from enwiki database [13:24:40] Logged the message, Master [13:27:25] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:24] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:44] PROBLEM Free ram is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:45] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:28:45] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:29:04] eh, god damn it?! [13:30:03] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:03] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:03] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:30:03] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:32:29] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:32:34] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:34:15] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 12.04, 13.97, 8.34 [13:34:15] PROBLEM Free ram is now: WARNING on aggregator-test2 i-0000024e output: Warning: 7% free memory [13:35:39] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 92% free memory [13:35:40] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [13:35:40] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK [13:35:40] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 80 processes [13:36:55] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:38:51] RECOVERY Current Load is now: OK on rds i-00000207 output: OK - load average: 6.04, 6.07, 4.23 [13:38:51] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [13:39:48] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:41:18] PROBLEM Current Users is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. 
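A note on the stream of CHECK_NRPE alerts above: as mutante explained earlier (11:42-11:43), port 5666 on every instance is the Nagios remote plugin executor, and the monitoring host runs its checks through it. A hedged sketch of what such a check looks like by hand; the plugin path and check name are common Ubuntu defaults, not confirmed from this log:

    /usr/lib/nagios/plugins/check_nrpe -H deployment-jobrunner03 -c check_disk

Read against that, "Connection refused by host" means nothing is listening on 5666 yet (the instance is still installing), "Could not complete SSL handshake" usually means NRPE is up but not yet configured to accept the monitoring host, and "Socket timeout after 10 seconds" usually means the instance is too overloaded to answer in time, which fits the GlusterFS discussion that follows.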
[13:42:08] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK [13:42:08] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours [13:43:36] again? [13:44:04] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 84 processes [13:44:09] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 92% free memory [13:44:10] PROBLEM Free ram is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:14] doesn't look like it though... [13:45:25] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 84 processes [13:45:29] its confined to only a few instances [13:45:30] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in [13:45:30] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:31] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:31] PROBLEM Disk Space is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:39] ... [13:45:40] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:45] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:45:45] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:46:53] oh god, gluster is rather screwish [13:47:18] ohrly? [13:47:22] :P [13:47:30] PROBLEM dpkg-check is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds. [13:49:04] PROBLEM Current Users is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:49:04] PROBLEM Disk Space is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:49:04] PROBLEM Total Processes is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:49:09] PROBLEM SSH is now: CRITICAL on ganglia-test2 i-00000250 output: CRITICAL - Socket timeout after 10 seconds [13:49:25] !log deployment-prep Deleting jobrunner03 and 04, not going to need them afterall [13:49:27] Logged the message, Master [13:50:47] yeah lab in trouble [13:50:48] ;-) [13:51:59] paravoid: now I am pretty sure that is the `dump` project [13:52:08] it just overload the glusterfs [13:53:00] RECOVERY SSH is now: OK on ganglia-test2 i-00000250 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:53:27] fyi: https://gerrit.wikimedia.org/r/ loads as an infinite loop (in .js) for user agent 'Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/15.0 Firefox/15.0a1' [13:55:18] paravoid: http://ganglia.wikimedia.org/latest/?r=hour&cs=05%2F22%2F2012+13%3A00+&ce=05%2F22%2F2012+14%3A00+&m=load_report&s=by+name&c=Virtualization+cluster+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [13:55:29] that is the virtualization cluster from 13:00 to 14:00 UTC [13:55:40] dead? 
[13:56:39] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.29, 1.79, 3.50 [13:56:39] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [13:56:54] RECOVERY dpkg-check is now: OK on ganglia-test2 i-00000250 output: All packages OK [13:57:11] paravoid: and here the dumps cluster doing some file transfer : http://ganglia.wmflabs.org/latest/graph.php?c=dumps&m=network_report&r=custom&s=by%20name&hc=4&mc=2&cs=05%2F22%2F2012%2011%3A00%20&ce=05%2F22%2F2012%2014%3A00%20&st=1337694997&g=network_report&z=medium&c=dumps [13:57:25] right... [13:57:28] I am pretty sure that is what is killing the cluster [13:57:34] it was the same monday morning [13:57:39] PROBLEM Current Load is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:57:41] stopping now... [13:57:57] so whatever that project is doing it seems to be heavy I/O [13:58:15] uploading the Wikimedia dumps [13:58:44] I guess that project could use a different disk than GlusterFS [13:58:52] I wished [13:58:57] PROBLEM dpkg-check is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. [13:59:26] PROBLEM Current Load is now: WARNING on reportcard2 i-000001ea output: WARNING - load average: 5.13, 5.82, 5.02 [14:01:06] oh god, can Wikimedia just sponsor me a server? [14:02:41] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:03:31] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 106.50, 50.99, 25.58 [14:03:31] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 3.45, 5.16, 5.07 [14:03:31] RECOVERY Current Users is now: OK on incubator-bot1 i-00000251 output: USERS OK - 0 users currently logged in [14:03:31] RECOVERY Disk Space is now: OK on incubator-bot1 i-00000251 output: DISK OK [14:03:31] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 126 processes [14:03:36] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 89% free memory [14:03:48] PROBLEM Current Load is now: WARNING on bots-apache1 i-000000b0 output: WARNING - load average: 10.78, 10.49, 7.71 [14:04:07] lol the cluster is taking turns [14:04:25] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK [14:04:33] RECOVERY Current Users is now: OK on aggregator-test2 i-0000024e output: USERS OK - 0 users currently logged in [14:04:33] RECOVERY Disk Space is now: OK on aggregator-test2 i-0000024e output: DISK OK [14:04:33] RECOVERY Total Processes is now: OK on aggregator-test2 i-0000024e output: PROCS OK: 203 processes [14:04:38] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:04:38] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [14:04:38] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [14:04:38] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [14:04:38] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. 
[14:05:43] I have updated https://bugzilla.wikimedia.org/show_bug.cgi?id=36993 [14:06:04] Hydriz: I have a few ideas about what we could do [14:06:15] it is not in my hands though [14:06:31] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 4.32, 6.35, 5.49 [14:07:09] * Hydriz facepalsm [14:07:10] PROBLEM Current Load is now: WARNING on hugglewiki i-000000aa output: WARNING - load average: 1.97, 6.16, 5.02 [14:07:20] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [14:07:21] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [14:07:26] hashar: do tell [14:07:34] I don't feel it's in my hands either [14:07:43] I have added my though on bug 36993 https://bugzilla.wikimedia.org/show_bug.cgi?id=36993 [14:07:50] PROBLEM SSH is now: CRITICAL on aggregator-test2 i-0000024e output: CRITICAL - Socket timeout after 10 seconds [14:07:58] Just some context I can give [14:08:06] I am downloading them from the host at your.org [14:08:10] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 0.95, 18.90, 18.61 [14:08:10] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 0.54, 2.30, 3.85 [14:08:18] then uploading them to the Internet Archive, I believe thats in the documentation [14:08:20] Hydriz: yeah from the eqiad datacenter [14:08:25] PROBLEM Current Load is now: WARNING on aggregator-test2 i-0000024e output: WARNING - load average: 12.10, 9.54, 9.52 [14:08:32] don't you do some processing meanwhile ? [14:08:40] no, not at all [14:08:54] so its something like an airport (transfer flight kind of thing) [14:09:18] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 3.98, 15.16, 12.56 [14:09:18] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in [14:09:18] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [14:09:18] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 91% free memory [14:09:18] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 91 processes [14:09:30] lets recap [14:09:31] how do you access the files from dataset1001 ? [14:09:38] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [14:09:38] rsync download [14:09:47] to which directory on the labs instance ? [14:10:01] dumps-project's /data/project/dumps [14:10:04] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 201 processes [14:10:09] which hits glusterFS [14:10:21] bastion isn't accepting my RSA public key -- am I a project member? [14:10:32] then what do you do to upload them to the Internet Archive ? [14:10:44] curl [14:10:51] via their S3 interface [14:10:56] hmm [14:10:58] RECOVERY Current Load is now: OK on incubator-bot2 i-00000252 output: OK - load average: 0.35, 2.73, 4.19 [14:11:00] http://archive.org/help/abouts3.txt [14:11:15] is there any point in writing the dump files in /data/project/dumps ? [14:11:29] what? [14:11:37] can't you just curl the files straight from dataset1001 directly to internet archive ? 
[14:11:41] oh no that is an rsync [14:11:42] ahh [14:11:49] thats what I am hoping to do [14:11:53] direct from dataset1001 [14:12:08] RECOVERY Current Load is now: OK on hugglewiki i-000000aa output: OK - load average: 0.18, 2.34, 3.66 [14:12:08] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 0.53, 4.35, 4.66 [14:12:08] maybe dataset1001 could NFS export the dump [14:12:15] and this is why I wanted Wikimedia to sponsor me a server and allow me to mount to dataset1001 [14:12:38] RECOVERY SSH is now: OK on aggregator-test2 i-0000024e output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:12:40] I don't think its feasible to mount dataset1001 from Labs [14:12:52] I don't think we want to do that ;-D [14:12:57] yep [14:13:35] so the only way is to download it and upload it on Labs [14:14:00] let me update the bug report ;-D [14:14:09] RECOVERY Current Load is now: OK on reportcard2 i-000001ea output: OK - load average: 0.11, 1.98, 3.69 [14:14:09] RECOVERY dpkg-check is now: OK on aggregator-test2 i-0000024e output: All packages OK [14:15:06] yes, as you can see from the update [14:15:11] its only the last 5 [14:15:21] which is totally pointless :( [14:15:58] Hydriz: why do you still have 8 instances? [14:16:20] what did you say you wanted to have? [14:16:30] we agreed 4 [14:16:37] ok [14:16:38] I still think that's too many to upload dumps [14:16:43] Hydriz: apparently the dumps might be shared in glusterFS already! Ariel/apergos commented about it https://bugzilla.wikimedia.org/show_bug.cgi?id=36993#c10 [14:16:49] I don't see why more than one is needed [14:16:53] yes, I have access to that share [14:17:03] Hydriz: please just upload directly from there [14:17:25] one by one first, I can answer why [14:17:26] the extra IO of downloading to gluster is killing labs [14:17:46] >1 because its is very slow to upload to the Archie [14:17:48] *Archive [14:18:04] uploading directly from publicdata-project is already somewhat done [14:18:10] because its only the last 5 dumps [14:18:25] hopefully we'll fix the IO situation at some point, but we need to stop killing labs until we've remedied it [14:18:31] Ryan_Lane: oh hi there [14:18:39] which, I have not worked on, to properly extract what dates the dump was generated [14:18:39] paravoid: howdy [14:18:42] didn't realize you were here [14:18:50] just got here [14:19:25] Hydriz: and if you continuously upload the last 5, then archive.org will have all of them [14:19:34] from the point you start uploading [14:19:39] I don't see the problem [14:19:41] I know [14:19:58] and that is what is going to be done very soon [14:20:06] since I was almost finished with the complete run [14:21:05] well, the IO is something we can't handle right now [14:21:27] I have stopped already BTW [14:21:30] ok [14:21:45] loading from the public datasets is fine [14:21:57] the share, that is [14:22:23] okok, since I am actually almost finished with the first complete run, its very possible to cut the instances [14:22:39] which I am not doing now to avoid escalating any kind of issues [14:23:18] RECOVERY Current Load is now: OK on bots-apache1 i-000000b0 output: OK - load average: 3.00, 3.31, 4.85 [14:28:19] RECOVERY Current Load is now: OK on aggregator-test2 i-0000024e output: OK - load average: 0.24, 0.80, 3.91 [14:28:45] okay, I'm still new around here [14:29:00] but uploads to archive.org sound more production-y than labs to me [14:29:03] aren't they? 
[14:29:07] no [14:29:13] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.77, 0.99, 3.92 [14:29:18] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 0.47, 1.03, 4.08 [14:29:18] because we already have 3 offsite mirrors of dumps [14:29:36] and archive.org only mirrors the media [14:29:39] not the dumps [14:29:48] RECOVERY Current Load is now: OK on deployment-nfs-memc i-000000d7 output: OK - load average: 0.16, 1.02, 3.65 [14:33:08] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.28, 0.48, 4.01 [14:37:16] 05/22/2012 - 14:37:16 - Updating keys for happy-melon at /export/home/bastion/happy-melon [14:38:43] Hydriz: so looks like the solution is to use curl directly from the already existing copy ;-D [14:38:56] Hydriz: that will make uploaded the dumps faster I guess [14:39:13] hmm [14:39:15] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.79, 0.98, 2.50 [14:39:29] I try to avoid using the main copy (if you referred to dataset1001) [14:39:31] Ryan_Lane: paravoid : should we close https://bugzilla.wikimedia.org/show_bug.cgi?id=36993 about the glusterFS being overloaded / labs dieing often? [14:39:48] Ryan_Lane: paravoid: since we found the cause (the dump copy) [14:39:59] sure [14:40:07] Deleted the 4 additional instances [14:40:31] well, I'd say we found the trigger and worked around the problem [14:40:32] Hydriz: Ariel said that dumps are copied everyday around 4am UTC [14:40:44] yeah. the real problem is gluster [14:40:53] once instance should not kill labs, whatever it does [14:41:16] I will see if I can work around it without using too much bandwidth [14:41:33] so do we reuse that bug to track the glusterFS trouble ? [14:41:51] it should likely be a different bug [14:42:17] PROBLEM Current Load is now: WARNING on deployment-sql i-000000d0 output: WARNING - load average: 6.37, 5.97, 5.30 [14:42:19] k marking 36993 as fixed so [14:42:34] Ryan_Lane: What causes gluster to overload just due to I/O? [14:42:42] gluster does [14:42:43] :) [14:43:19] using it for workloads it wasn't designed (yet) to support most probably [14:43:31] likely [14:43:59] dumps were on instance storage, right? [14:44:13] on project storage mostly [14:44:29] *the* dumps [14:45:00] oh? and then why was it killing other instances' I/O? [14:45:50] Hydriz: does any part of your process involve writing to anywhere other than /data/project? [14:45:59] no [14:46:13] I wonder if it holds it in memory [14:46:17] which causes it to swap [14:46:32] or if anything writes to a tmp directory [14:46:46] but why does it need to write to anywhere else other than /data/project? [14:47:45] it shouldn't need to [14:48:09] are you using curl? rsync? wget? [14:48:24] rsync for download, curl for uploading [14:48:41] wget was an older version of this project when I was doing it on the Toolserver [14:49:07] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 3.44, 3.90, 4.86 [14:49:08] hmm, if I write to /mnt, does it bring lesser chaos? [14:49:12] no [14:49:12] more [14:50:04] then, running just 1 or 2 processes of this uploading brings much lesser chaos? [14:50:10] yes [14:50:14] because it'll be less IO [14:50:20] hmm... [14:51:18] my plan was to finish up with this complete run first before I just have one host that scans the dumps.wikimedia.org/backup-index.html file automatically and uploads new dumps, which would then be rather slow by then. 
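To make the dump-mirroring workflow Hydriz describes above (14:10-14:17) concrete, here is a rough sketch of the two steps, not his actual script: mirror host, dump name, bucket and keys are all placeholders, and the upload call follows the S3-compatible pattern documented at http://archive.org/help/abouts3.txt.

    # 1) pull a dump from a public mirror into project storage (this is the step
    #    that was hammering GlusterFS)
    rsync -av rsync://MIRRORHOST/MODULE/aawiki/20120512/aawiki-20120512-pages-articles.xml.bz2 \
          /data/project/dumps/
    # 2) push it to the Internet Archive through their S3 interface
    curl --location \
         --header "authorization: LOW ACCESSKEY:SECRETKEY" \
         --upload-file /data/project/dumps/aawiki-20120512-pages-articles.xml.bz2 \
         http://s3.us.archive.org/BUCKETNAME/aawiki-20120512-pages-articles.xml.bz2

As Ryan_Lane and apergos point out, reading straight from the public datasets share already available in labs and uploading from there cuts out step 1, and with it the extra gluster I/O that was taking the cluster down.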
[14:51:32] looks like I didn't get to hold out till then [14:51:48] I have ~200 wikis left, out of say, 800+? [14:54:30] Ryan_Lane needs moar io [15:02:06] hmm, rsync downloading still shows more of output than input... [15:02:07] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 10.13, 7.91, 6.28 [15:11:50] ^demon|away: . [15:15:30] !log testswarm created tw-next instance using oineric something Ubuntu version. [15:15:32] Logged the message, Master [15:15:34] I am off [15:19:00] !log bots petrb: patching wm-bot [15:19:01] Logged the message, Master [15:22:37] yay [15:22:38] @whoami [15:22:39] You are admin identified by name .*@wikimedia/Petrb [15:23:24] PROBLEM Free ram is now: WARNING on aggregator-test2 i-0000024e output: Warning: 8% free memory [15:23:44] PROBLEM Current Load is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:24:25] PROBLEM Current Users is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:24:58] @whoami [15:24:59] You are trusted identified by name .*@wikimedia/.* [15:25:06] PROBLEM Disk Space is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:25:29] !log bastion added Tanvir to the object a few hours ago [15:25:30] Logged the message, Master [15:25:42] lol? [15:25:44] PROBLEM Free ram is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:25:53] !log bots gave tanvir bots access, will be running interwiki bots [15:25:54] Logged the message, Master [15:26:00] oh [15:26:04] I think we're supposed to log when we give people access TBloemink [15:26:11] Object :D [15:26:12] access to what? [15:26:15] petan just reminded me ;) [15:26:27] TBloemink: to a project [15:26:27] !Thehelpfulone is !log bastion added Tanvir to the object a few hours ago [15:26:28] Key was added [15:26:42] to the object? 9_9 [15:26:46] hehe [15:26:47] you wrote it [15:26:54] PROBLEM Total Processes is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:27:21] I would quip you on bugzilla but people would not get it [15:27:22] don't know what I was thinking :P [15:27:34] PROBLEM dpkg-check is now: CRITICAL on tw-next i-0000027e output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:29:18] !log bastion also added Sven_Manguard a few days ago [15:29:20] Logged the message, Master [15:29:41] !log bots added Sven_Manguard a few days ago to the *project* :P [15:29:42] Logged the message, Master [15:38:34] PROBLEM Free ram is now: CRITICAL on aggregator-test2 i-0000024e output: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:43:32] PROBLEM Free ram is now: WARNING on aggregator-test2 i-0000024e output: Warning: 7% free memory [15:49:37] !log bots wmib: restarting wm-bot [15:49:39] Logged the message, Master [16:11:39] PROBLEM HTTP is now: CRITICAL on deployment-apache23 i-00000270 output: CRITICAL - Socket timeout after 10 seconds [16:16:30] PROBLEM HTTP is now: WARNING on deployment-apache23 i-00000270 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.006 second response time [16:34:20] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 20% free memory [16:48:18] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 13% free memory [17:03:01] ^demon|away: when u here, can you create a repo for me [17:03:17] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 20% free memory [17:05:44] <^demon|away> Can you ask on the on-wiki request page please so there's a paper trail? [17:05:56] <^demon|away> I don't like doing this kind of thing from a random request on IRC. [17:06:43] ^demon|away: i already asked [17:06:55] I hoped it could be made today [17:07:00] just empty new repo [17:07:12] I suppose it's like matter of 10 seconds, or not [17:07:29] <^demon|away> Empty repo + new group + permissions config. [17:07:31] <^demon|away> > 10 seconds [17:07:41] ah [17:07:50] that should be automated somehow [17:07:58] creating a repo in github is quick [17:07:59] <^demon|away> You're telling me. [17:08:19] <^demon|away> Yeah well isn't github just so amazing? [17:08:23] <^demon|away> Too bad we're not all like github. [17:08:28] heh [17:08:38] * ^demon|away hates github [17:08:51] why [17:09:33] <^demon|away> I forever blame them for calling personal clones "forks." [17:09:46] aha [17:09:56] <^demon|away> Fork has an awful connotation, and I think it was a sin to use the terminology. [17:10:43] I don't really care, I just need a place to store my source at :D [17:10:57] PROBLEM HTTP is now: CRITICAL on deployment-apache22 i-0000026f output: CRITICAL - Socket timeout after 10 seconds [17:10:59] <^demon|away> Hard drives have been around for ages ;-) [17:11:07] not reliable enough [17:11:24] + hard to access from other spots [17:11:47] <^demon|away> Reliable enough for everyone until someone decided clouds are useful for things other than raining on you :p [17:12:16] :D [17:15:47] PROBLEM HTTP is now: WARNING on deployment-apache22 i-0000026f output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.019 second response time [17:34:42] hey Ryan_Lane - got a moment to talk about the setup in Labs for Roan's talk at the Berlin hackathon? [17:35:05] ah. right. we need to figure out a name for the project [17:35:15] <^demon> Names are hard :) [17:35:20] yep [17:35:24] the hardest part [17:35:37] <^demon> Especially names like git repos that are damn near impossible to change. [17:35:49] so are projects in labs [17:37:05] how about a tutorial project? [17:37:14] and we'll just reuse it for all tutorials [17:37:40] .... [17:37:46] sumanah: ? [17:38:04] Hi Ryan_Lane - thanks for pinging. 
[17:38:21] Ryan_Lane: yes, "tutorial" as the name of the Labs project sounds fine to me
[17:39:07] this implies that we'll be free to add any random Labs user to the access list for that Labs project and they won't be able to hurt anything really important :) which sounds great
[17:39:19] yes
[17:40:01] as long as we have no dump sync during the tutorials :-)
[17:40:02] * paravoid ducks
[17:40:29] :D
[17:40:32] paravoid: I do seriously want to ensure that all of this works, for the tutorial participants and for other infrastructure users
[17:40:41] well, then pray
[17:40:42] is this a substantial concern? if so, I ought to address it
[17:40:45] 05/22/2012 - 17:40:45 - Creating a project directory for tutorial
[17:40:46] 05/22/2012 - 17:40:45 - Creating a home directory for sumanah at /export/home/tutorial/sumanah
[17:40:46] 05/22/2012 - 17:40:45 - Creating a home directory for catrope at /export/home/tutorial/catrope
[17:40:54] or hope our hardware comes in on time
[17:41:00] Ryan_Lane: There is also meant to be a Plan B that Roan can switch to in case of Labs failure.
[17:41:26] He's working on both Plan A (use the Labs infrastructure so people can interactively & immediately try out the things he's teaching) and a Plan B.
[17:41:28] I don't foresee a failure
[17:41:46] 05/22/2012 - 17:41:46 - Updating keys for catrope at /export/home/tutorial/catrope
[17:41:46] 05/22/2012 - 17:41:46 - Updating keys for sumanah at /export/home/tutorial/sumanah
[17:42:54] Ryan_Lane: Roan is in transit at the moment and I know that he & you have talked far more about the logistics of what ought to happen than I have (with him or you). Danielle, deebki on IRC, has been collaborating with Roan on the plans for this tutorial, in case you need info about the requirements while Roan isn't around
[17:43:10] all I need to do is make the project
[17:43:11] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours
[17:43:12] * Ryan_Lane shrugs
[17:43:33] Really? Hmm, Roan made it sound like he was waiting on you for a lot more than that
[17:43:40] I'd hope not
[17:43:55] we were hoping to do this on real hardware with the user databases
[17:44:00] alas, still no hardware
[17:44:23] Can you give Roan & me the necessary access to add multiple Labs users to work within the "tutorial" Labs project?
[17:44:29] already did
[17:44:54] OK. So, in order to do that, I go to https://labsconsole.wikimedia.org/wiki/Special:NovaProject ?
[17:44:59] yep
[17:45:05] they only need member access
[17:45:12] * sumanah logs out, in again
[17:45:19] roan will need to adjust the sudo policy
[17:45:22] ^^ most annoying bug
[17:45:29] the logging in and out I mean
[17:45:33] you only need to give people access as members
[17:45:47] I'll gladly take patches to LdapAuthentication to fix the problem ;)
[17:46:51] Ryan_Lane: ok, so: "roan will need to adjust the sudo policy" - that is, he needs to log into the tutorial Labs project and adjust the sudo policy across that whole Labs project, right?
[17:47:06] well, to give himself sudo
[17:47:19] no one else needs it
[17:47:40] also, Ryan_Lane, when I look at https://labsconsole.wikimedia.org/wiki/Nova_Resource:Tutorial I see no mention of any instances for the "tutorial" Labs project. Roan or I can create an instance, right?
[17:47:48] yes
[17:47:51] it's an empty project
[17:49:11] * sumanah looks at the instance creation page, sees lots of options, decides to let Roan do it because he knows what options he'll want
[17:50:38] ok. Thanks Ryan_Lane - I'll email him about this
[17:54:39] <^demon> Ugh, http://code.google.com/p/gerrit/issues/detail?id=1052
[17:54:46] <^demon> I want to be able to change what HEAD points to.
[18:10:09] PROBLEM HTTP is now: CRITICAL on deployment-apache20 i-0000026c output: CRITICAL - Socket timeout after 10 seconds
[18:14:59] PROBLEM HTTP is now: WARNING on deployment-apache20 i-0000026c output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 5.810 second response time
[18:49:49] PROBLEM Free ram is now: WARNING on ganglia-test2 i-00000250 output: Warning: 19% free memory
[20:33:37] RECOVERY Free ram is now: OK on aggregator-test i-0000024d output: OK: 89% free memory
[20:34:09] RECOVERY Free ram is now: OK on aggregator-test2 i-0000024e output: OK: 91% free memory
[20:34:49] RECOVERY Free ram is now: OK on ganglia-test2 i-00000250 output: OK: 84% free memory
[22:25:06] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours
[22:32:33] !log testlabs test
[22:32:34] Logged the message, Master
[23:42:31] PROBLEM Puppet freshness is now: CRITICAL on localpuppet1 i-0000020b output: Puppet has not run in last 20 hours
[23:53:34] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 13% free memory
[23:59:43] !labs Access
[23:59:43] https://labsconsole.wikimedia.org/wiki/Access
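[Editor's note] The Gerrit issue ^demon links above is about changing what HEAD points to on a hosted repository, i.e. which branch new clones check out by default. At the plain-git level that is a one-line change on the server's bare repository; the sketch below only illustrates that underlying operation, with a hypothetical repository path and branch name, and does not reflect how Gerrit itself later exposed the feature.

import subprocess

# "Change what HEAD points to": on a bare repository, HEAD is a symbolic ref
# naming the branch that new clones check out by default.
# Path and branch name below are hypothetical.
BARE_REPO = "/srv/git/test/example-tool.git"
NEW_DEFAULT_BRANCH = "refs/heads/production"

subprocess.run(
    ["git", "symbolic-ref", "HEAD", NEW_DEFAULT_BRANCH],
    cwd=BARE_REPO,
    check=True,
)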