Thursday, November 20, 2014

Enabling SSH for a Hadoop Cluster

This article is part of the Hadoop Masterpage.

To set up HDFS in distributed mode, SSH must be enabled from master nodes to slave nodes.

The purpose of SSH is to create a secure channel across an insecure network. SSH relies on asymmetric (public-key) cryptography for authentication. This means there are two keys. The public key can be disseminated to the world; a server uses it to verify that a client holds the matching private key. The private key proves your identity and must be kept hidden at all times.


Getting Started


Since we haven’t defined our master and slave nodes yet, and are still working on the stock image, we want to do some initial setup around SSH. The first thing we’ll do is try to ssh into localhost; if this command gives an error, we know we have work to do.

From the terminal, run
 ssh localhost  

If that command gives you an error, typically
 craigtrim@CVB:~$ ssh localhost  
 ssh: connect to host localhost port 22: Connection refused  

then you need to follow the steps below.
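
The error text itself usually says what is wrong. As a rough sketch, the hypothetical helper below (diagnose_ssh_error is my own name, not part of OpenSSH) maps a few common ssh client messages to a likely cause. The list of messages is an assumption and far from complete.

```shell
# Hypothetical helper: map an ssh client error message to a likely cause.
diagnose_ssh_error() {
  case "$1" in
    *"Connection refused"*)
      echo "no SSH server is listening: install or start openssh-server" ;;
    *"Permission denied"*)
      echo "server reachable, but authentication failed: check your keys" ;;
    *"Could not resolve hostname"*)
      echo "hostname unknown: check /etc/hosts or DNS" ;;
    *)
      echo "unrecognized error" ;;
  esac
}

# Usage (BatchMode makes ssh fail instead of prompting for a password):
# err=$(ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>&1) \
#   || diagnose_ssh_error "$err"
```

With the error shown above, the helper reports that no SSH server is listening, which is the case the next section deals with.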


Installing OpenSSH (Optional)

At the time of this article, I was running both Red Hat 5.2 and Ubuntu 14.04.

For many Ubuntu installations (including my own), the OpenSSH server may not be installed by default. If you're using Ubuntu and you followed the steps outlined above but still get refused when attempting to SSH into localhost, the following three commands should be sufficient to install it:
 sudo rm /etc/ssh/sshd_config  
 sudo apt-get purge openssh-server  
 sudo apt-get install openssh-server  

For my particular configuration, the first two commands accomplished nothing.

I had no "sshd_config" file in my /etc/ssh/ directory, and openssh-server wasn't installed, so the purge command had nothing to remove.

In my case, the third command (the installation of openssh-server) corrected the problem.

 craigtrim@CVB:~$ sudo apt-get install openssh-server   
 [sudo] password for craigtrim:   
 Reading package lists... Done  
 Building dependency tree      
 Reading state information... Done  
 The following extra packages will be installed:  
  libck-connector0 ncurses-term openssh-sftp-server python-requests  
  python-urllib3 ssh-import-id  
 Suggested packages:  
  rssh molly-guard monkeysphere  
 The following NEW packages will be installed:  
  libck-connector0 ncurses-term openssh-server openssh-sftp-server  
  python-requests python-urllib3 ssh-import-id  
 0 upgraded, 7 newly installed, 0 to remove and 225 not upgraded.  
 Need to get 698 kB of archives.  
 After this operation, 3,834 kB of additional disk space will be used.  
 Do you want to continue? [Y/n] y  
 Get:1 http://us.archive.ubuntu.com/ubuntu/ trusty/main libck-connector0 amd64 0.4.5-3.1ubuntu2 [10.5 kB]  
 Get:2 http://us.archive.ubuntu.com/ubuntu/ trusty/main ncurses-term all 5.9+20140118-1ubuntu1 [243 kB]  
 Get:3 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main openssh-sftp-server amd64 1:6.6p1-2ubuntu2 [34.1 kB]  
 Get:4 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main openssh-server amd64 1:6.6p1-2ubuntu2 [319 kB]  
 Get:5 http://us.archive.ubuntu.com/ubuntu/ trusty/main python-urllib3 all 1.7.1-1build1 [38.9 kB]  
 Get:6 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main python-requests all 2.2.1-1ubuntu0.1 [42.9 kB]  
 Get:7 http://us.archive.ubuntu.com/ubuntu/ trusty/main ssh-import-id all 3.21-0ubuntu1 [9,624 B]  
 Fetched 698 kB in 1s (498 kB/s)       
 Preconfiguring packages ...  
 Selecting previously unselected package libck-connector0:amd64.  
 (Reading database ... 165818 files and directories currently installed.)  
 Preparing to unpack .../libck-connector0_0.4.5-3.1ubuntu2_amd64.deb ...  
 Unpacking libck-connector0:amd64 (0.4.5-3.1ubuntu2) ...  
 Selecting previously unselected package ncurses-term.  
 Preparing to unpack .../ncurses-term_5.9+20140118-1ubuntu1_all.deb ...  
 Unpacking ncurses-term (5.9+20140118-1ubuntu1) ...  
 Selecting previously unselected package openssh-sftp-server.  
 Preparing to unpack .../openssh-sftp-server_1%3a6.6p1-2ubuntu2_amd64.deb ...  
 Unpacking openssh-sftp-server (1:6.6p1-2ubuntu2) ...  
 Selecting previously unselected package openssh-server.  
 Preparing to unpack .../openssh-server_1%3a6.6p1-2ubuntu2_amd64.deb ...  
 Unpacking openssh-server (1:6.6p1-2ubuntu2) ...  
 Selecting previously unselected package python-urllib3.  
 Preparing to unpack .../python-urllib3_1.7.1-1build1_all.deb ...  
 Unpacking python-urllib3 (1.7.1-1build1) ...  
 Selecting previously unselected package python-requests.  
 Preparing to unpack .../python-requests_2.2.1-1ubuntu0.1_all.deb ...  
 Unpacking python-requests (2.2.1-1ubuntu0.1) ...  
 Selecting previously unselected package ssh-import-id.  
 Preparing to unpack .../ssh-import-id_3.21-0ubuntu1_all.deb ...  
 Unpacking ssh-import-id (3.21-0ubuntu1) ...  
 Processing triggers for man-db (2.6.7.1-1) ...  
 Processing triggers for ureadahead (0.100.0-16) ...  
 ureadahead will be reprofiled on next reboot  
 Processing triggers for ufw (0.34~rc-0ubuntu2) ...  
 Setting up libck-connector0:amd64 (0.4.5-3.1ubuntu2) ...  
 Setting up ncurses-term (5.9+20140118-1ubuntu1) ...  
 Setting up openssh-sftp-server (1:6.6p1-2ubuntu2) ...  
 Setting up openssh-server (1:6.6p1-2ubuntu2) ...  
 Creating SSH2 RSA key; this may take some time ...  
 Creating SSH2 DSA key; this may take some time ...  
 Creating SSH2 ECDSA key; this may take some time ...  
 Creating SSH2 ED25519 key; this may take some time ...  
 ssh start/running, process 3998  
 Setting up python-urllib3 (1.7.1-1build1) ...  
 Setting up python-requests (2.2.1-1ubuntu0.1) ...  
 Setting up ssh-import-id (3.21-0ubuntu1) ...  
 Processing triggers for libc-bin (2.19-0ubuntu6) ...  
 Processing triggers for ureadahead (0.100.0-16) ...  
 Processing triggers for ufw (0.34~rc-0ubuntu2) ...  
 craigtrim@CVB:~$   
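
Since the rm and purge steps had nothing to do on a clean machine, a gentler variant is to install only when the server is missing. This is a sketch under assumptions: have_command and ensure_sshd are hypothetical helper names, and the check simply looks for an sshd binary on the PATH or in /usr/sbin.

```shell
# Hypothetical helper: succeed if the named command is on the PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: install openssh-server only when sshd is missing.
# Assumes a Debian/Ubuntu system with apt-get and sudo available.
ensure_sshd() {
  if have_command sshd || [ -x /usr/sbin/sshd ]; then
    echo "sshd already installed"
  else
    sudo apt-get install -y openssh-server
  fi
}

# Usage:
# ensure_sshd
```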



Key Generation


ssh-keygen generates, manages and converts authentication keys for ssh.

Run this command to generate SSH keys:
 ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa  

What does this command do?
   -t dsa   
     Specifies the type of key to create. We create a “dsa” key for protocol version 2.   
   -P ''   
     Provides the passphrase (the man page calls it the old passphrase); here it is empty, so the key has no passphrase   
     Note the use of two single quotes  
   -f ~/.ssh/id_dsa   
     Specifies the filename of the key file.   
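
Newer OpenSSH releases no longer accept DSA keys by default, so on a current system you may need a different key type. The sketch below is one way to handle that, under stated assumptions: ensure_key is a hypothetical helper, rsa is my choice of default type rather than the article's, and -N '' sets an empty new passphrase. It also leaves an existing key file untouched.

```shell
# Hypothetical helper: create a passphrase-less key only if none exists yet.
#   $1  path of the private key file
#   $2  key type (assumption: defaults to rsa instead of the article's dsa)
ensure_key() {
  key_file="$1"
  key_type="${2:-rsa}"
  if [ -f "$key_file" ]; then
    echo "key already exists: $key_file"
    return 0
  fi
  key_dir=$(dirname "$key_file")
  # The key directory should be private to the user.
  mkdir -p "$key_dir" && chmod 700 "$key_dir"
  ssh-keygen -t "$key_type" -N '' -f "$key_file"
}

# Usage:
# ensure_key ~/.ssh/id_rsa
```

If you generate a key under a different name this way, substitute that name (for example id_rsa.pub) wherever the article refers to id_dsa.pub.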



SSH localhost


If this has all worked out, you should be able to type
 ssh localhost  

and get a connection.

If this is your first time running the command, you’ll get something that looks like this:
 craigtrim@CVB:~$ ssh localhost  
 The authenticity of host 'localhost (127.0.0.1)' can't be established.  
 ECDSA key fingerprint is b5:62:45:21:ba:b8:75:64:2d:d5:ba:d8:36:f5:d3:3d.  
 Are you sure you want to continue connecting (yes/no)? yes  
 Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.  
 craigtrim@localhost's password:   
 Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)  
  * Documentation: https://help.ubuntu.com/  
 226 packages can be updated.  
 85 updates are security updates.  
 The programs included with the Ubuntu system are free software;  
 the exact distribution terms for each program are described in the  
 individual files in /usr/share/doc/*/copyright.  
 Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by  
 applicable law.  
 craigtrim@CVB:~$   


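At this point, the ssh-keygen command from the previous section has created a key pair (id_dsa and id_dsa.pub) in the .ssh/ directory of the home directory, and the first connection has added the host key for localhost to the list of known hosts. The public key is not yet registered as a trusted key, which is why the session above still asks for a password.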


However, to properly get this ready for an HDFS cluster, we don't want to be prompted for a password each time we attempt a connection.



~/.ssh/authorized_keys


This file holds the list of public keys that are authorized to log in to this account on the server.

When a client connects, the server authenticates it by checking that the client holds the private key matching one of the public keys stored in this file.

Each line of the file contains one key, up to a limit of 8 kilobytes, which permits DSA keys up to 8 kilobits and RSA keys up to 16 kilobits. You don't want to type them in; instead, copy the contents of id_dsa.pub into the file.

Does the authorized_keys file in the home directory exist?

If it does, you’ll see it inside the ~/.ssh/ directory:
 ~/.ssh/authorized_keys  

If the authorized_keys file does not exist, create it:
 touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys  
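
The permissions matter: with its default StrictModes setting, sshd ignores authorized_keys if the file or the .ssh directory is writable by other users. A small sketch that sets both, where fix_ssh_permissions is a hypothetical helper name:

```shell
# Hypothetical helper: make sure the .ssh directory and authorized_keys
# exist and are accessible by the owner only.
#   $1  path of the .ssh directory
fix_ssh_permissions() {
  ssh_dir="$1"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage:
# fix_ssh_permissions ~/.ssh
```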

Now with your permissions set, add your key to the authorized_keys file
 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys  
What does this command do? 

cat is a common command that reads the contents of a file. In this case, the file (id_dsa.pub) holds the public key, and >> appends its contents to the authorized_keys file we created. This should be done on both the master and the slave nodes; for Hadoop, what matters is that the master's public key ends up in the authorized_keys file on every slave.
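
Because >> always appends, running the command twice leaves two copies of the same key in the file. That is harmless, but a guarded version keeps the file tidy. A minimal sketch, with authorize_key as a hypothetical helper and assuming the .pub file holds a single line:

```shell
# Hypothetical helper: append a public key to authorized_keys unless an
# identical line is already present.
#   $1  public key file (one line)
#   $2  authorized_keys file
authorize_key() {
  pub_file="$1"
  auth_file="$2"
  touch "$auth_file" && chmod 600 "$auth_file"
  # -F fixed string, -x whole-line match, -q quiet, -f read pattern from file
  if grep -qxF -f "$pub_file" "$auth_file"; then
    echo "already authorized"
  else
    cat "$pub_file" >> "$auth_file"
    echo "key added"
  fi
}

# Usage:
# authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```
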
If we try to SSH into this machine again (the session below reaches it under the hostname master), we should be able to make the connection without being asked for a password.

  craigtrim@CVB:~/.ssh$ ssh master   
  The authenticity of host 'master (192.168.1.43)' can't be established.   
  ECDSA key fingerprint is b5:62:45:21:ba:b8:75:64:2d:d5:ba:d8:36:f5:d3:3d.   
  Are you sure you want to continue connecting (yes/no)? yes   
  Warning: Permanently added 'master,192.168.1.43' (ECDSA) to the list of known hosts.   
  Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)   
  * Documentation: https://help.ubuntu.com/   
  Last login: Fri Nov 21 09:59:45 2014 from localhost   
  craigtrim@CVB:~$    

When the HDFS cluster is started, the start-up scripts attempt to start the datanodes (slaves) in the cluster. They do this by first SSH'ing into each datanode and, once inside, starting a daemon on that instance.

If the datanode prompts for a password, all is lost.
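
One way to catch this before starting the cluster is to try each node with ssh in BatchMode, which fails instead of prompting. The sketch below assumes the node hostnames resolve from the master (slave1 is made up for illustration), and check_passwordless and SSH_CMD are hypothetical names; SSH_CMD exists only so the function can be exercised without a real server.

```shell
# Assumption: SSH_CMD can be overridden for a dry run; it defaults to ssh.
SSH_CMD="${SSH_CMD:-ssh}"

# Hypothetical helper: report whether each node accepts a key-based login.
# Returns non-zero if any node would have prompted or refused.
check_passwordless() {
  failed=0
  for node in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key is an error.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
        </dev/null >/dev/null 2>&1; then
      echo "$node: ok"
    else
      echo "$node: FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
# check_passwordless slave1 slave2
```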




Troubleshooting



ECDSA host key differs

If you're working with VMs, and the IP addresses change for nodes in your cluster, you may get this warning:
 craigtrim@CVB:~$ ssh slave2  
 Warning: the ECDSA host key for 'slave2' differs from the key for the IP address '192.168.1.12'  
 Offending key for IP in /home/craigtrim/.ssh/known_hosts:10  
 Matching host key in /home/craigtrim/.ssh/known_hosts:7  
 Are you sure you want to continue connecting (yes/no)? yes  
 Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)  
  * Documentation: https://help.ubuntu.com/  
 Last login: Wed Nov 26 11:34:48 2014 from slave2  

The solution is simple.

Remove the host (the cached key) from the known_hosts file:
 ssh-keygen -R 192.168.1.12  

Then SSH into the node again.  You should not see this warning message anymore.
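
If the node's own host key changed as well (for example, after rebuilding the VM), the entry stored under the hostname has to go too. A small sketch that removes several entries in one call; forget_host and KEYGEN_CMD are hypothetical names, and KEYGEN_CMD is only there so the loop can be tried without touching a real known_hosts file:

```shell
# Assumption: KEYGEN_CMD can be overridden for a dry run; defaults to ssh-keygen.
KEYGEN_CMD="${KEYGEN_CMD:-ssh-keygen}"

# Hypothetical helper: remove every given hostname or IP from known_hosts.
forget_host() {
  for entry in "$@"; do
    "$KEYGEN_CMD" -R "$entry"
  done
}

# Usage:
# forget_host slave2 192.168.1.12
```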


Next: Configure the Base.

1 comment:

  1. Hi, I am able to SSH from the master to the slaves, but I get a connection refused error when I SSH from the slaves to the master.
