Tuesday, March 24, 2015

Hadoop Architecture

The Map/Reduce Paradigm


[Figure: "How many times does this pattern occur in the data?" (the mapping step)]

Introduction:
  1. Each Map task outputs data in the form of key/value pairs. The parameters listed below govern the map and shuffle phases; see the command-line sketch after this list for one way to override them per job.
    1. mapreduce.tasktracker.map.tasks.maximum: 8
      1. The maximum number of map tasks that will be run simultaneously by a task tracker
    2. mapreduce.map.memory.mb: 128
      1. The amount of memory to request from the scheduler for each map task.
  2. The output is stored in a Ring Buffer rather than being written directly to the disk.
  3. When the Ring Buffer reaches 80% capacity, the content is "spilled" to disk.
    1. This process will create multiple files on the datanode (shuffle spill files).
    2. mapreduce.map.sort.spill.percent: 0.80
      1. The soft limit in the serialization buffer. Once reached, a thread will begin to spill the contents to disk in the background. Note that collection will not block if this threshold is exceeded while a spill is already in progress, so spills may be larger than this threshold when it is set to less than .5
  4. Hadoop will merge all the spill files on a given datanode into a single file
    1. This single file is both sorted and partitioned based on the number of reducers.
    2. mapreduce.task.io.sort.mb: 512
      1. The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.
    3. mapreduce.task.io.sort.factor: 64
      1. The number of streams to merge at once while sorting files. This determines the number of open file handles.
    4. mapreduce.reduce.shuffle.input.buffer.percent: 0.70
      1. The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
    5. mapreduce.reduce.input.buffer.percent: 0.70
      1. The percentage of memory, relative to the maximum heap size, to retain map outputs during the reduce. When the shuffle is concluded, any remaining map outputs in memory must consume less than this threshold before the reduce can begin.
    6. mapreduce.reduce.shuffle.parallelcopies: 128
      1. The default number of parallel transfers run by reduce during the copy(shuffle) phase.
    7. mapreduce.reduce.memory.mb: 1024
      1. The amount of memory to request from the scheduler for each reduce task.
    8. mapreduce.reduce.shuffle.merge.percent: 0.66
      1. The usage threshold at which an in-memory merge will be initiated, expressed as a percentage of the total memory allocated to storing in-memory map outputs, as defined by mapreduce.reduce.shuffle.input.buffer.percent.
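
These values normally live in mapred-site.xml on each node, but most of them can also be overridden for a single job with Hadoop's generic -D options, provided the driver implements the Tool interface and is launched through ToolRunner (note the ToolRunner warning in the job log later on this page when that is not the case). A minimal sketch; the jar name, class name, and paths are placeholders:

 hadoop jar myjob.jar com.example.MyRunner \
   -D mapreduce.task.io.sort.mb=512 \
   -D mapreduce.map.sort.spill.percent=0.80 \
   -D mapreduce.reduce.shuffle.parallelcopies=128 \
   /input /output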



Ring Buffer


The Ring Buffer (aka Circular Buffer) is a key concept in the MapReduce ecosystem.

We face a few major challenges in any map/reduce program:

  1. We are dealing with a massive amount of data
    1. If this isn't true, we don't need to use map/reduce
  2. The result of the map tasks cannot be continuously written to disk
    1. This would be too slow
  3. Nor can it be stored entirely within memory
    1. Most systems would not have a sufficient amount of memory

We have to use a combination of memory and disk efficiently.

The circular buffer is fast. Writing to memory is much faster than doing an I/O to disk. Flushing the data is only performed when needed.
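
To make the spill-and-merge pattern concrete, here is a toy shell sketch (illustrative only; Hadoop's real implementation is a byte-oriented circular buffer inside the map task's JVM with a background spill thread). It buffers records in memory, writes a sorted spill file whenever the buffer passes its soft limit, and merges the sorted spills into one sorted output at the end. The 80% threshold mirrors mapreduce.map.sort.spill.percent, and the file names are made up.

 #!/bin/bash
 # Toy illustration only - not Hadoop code.
 # Usage: ./spill-sketch.sh < records.txt
 CAPACITY=100                           # pretend in-memory buffer size, in records
 THRESHOLD=$(( CAPACITY * 80 / 100 ))   # soft spill limit (cf. mapreduce.map.sort.spill.percent)
 buffer=()
 spill=0
 
 flush() {                              # write the current buffer as one sorted spill file
   printf '%s\n' "${buffer[@]}" | sort > "spill-$spill.txt"
   buffer=()
   spill=$(( spill + 1 ))
 }
 
 while read -r record; do               # "map output" arrives one record at a time
   buffer+=("$record")
   (( ${#buffer[@]} >= THRESHOLD )) && flush
 done
 (( ${#buffer[@]} > 0 )) && flush       # final spill when the map task finishes
 
 sort -m spill-*.txt > map-output.txt   # merge the sorted spills into a single sorted file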

Ring buffers solve a similar problem outside Hadoop. Continuous logging can fill up disk space, causing other programs to run out of space and fail; in such cases, either the logs have to be removed manually or a log rotation policy has to be implemented. A fixed-size circular buffer sidesteps the problem by overwriting its oldest entries.



References

  1. Hadoop Internals
    1. One of the best all-in-one overviews of Hadoop Architecture I have read.
    2. The documentation appears to be up to date with YARN and other ecosystem improvements.
  2. Advantages of a Ring Buffer
    1. Map Tasks write to ring (aka Circular) buffers while executing
    2. This article is unrelated to Hadoop, but a knowledge of how this buffer works will aid in understanding mapred-site.xml configuration parameters
      1. Property: mapreduce.map.sort.spill.percent
      2. Description: The soft limit in the serialization buffer. Once reached, a thread will begin to spill the contents to disk in the background. Note that collection will not block if this threshold is exceeded while a spill is already in progress, so spills may be larger than this threshold when it is set to less than .5
      3. Default Value: 0.80
  3. [Quora] Apache Spark vs Hadoop
    1. A good discussion of both the map-side and reduce-side differences.  
    2. Helpful for an understanding of Hadoop's design independent of Spark.

Monday, December 1, 2014

Hadoop Troubleshooting: GC overhead limit exceeded

GC overhead limit exceeded


 craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$ hadoop jar sandbox-1.0-SNAPSHOT.jar dev.hadoop.sandbox.counter.WordCountRunner /nyt/1987/ /out2  
 Input Directory = /nyt/1987/  
 Output Directory = /out2  
 2014-12-01 18:37:07,381 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - session.id is deprecated. Instead, use dfs.metrics.session-id  
 2014-12-01 18:37:07,385 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=  
 2014-12-01 18:37:07,743 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.  
 2014-12-01 18:37:14,857 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 106105  
 2014-12-01 18:37:17,228 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:106105  
 2014-12-01 18:37:17,329 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local1925425636_0001  
 2014-12-01 18:37:17,359 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-craigtrim/mapred/staging/craigtrim1925425636/.staging/job_local1925425636_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.  
 2014-12-01 18:37:17,362 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-craigtrim/mapred/staging/craigtrim1925425636/.staging/job_local1925425636_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.  
 2014-12-01 18:37:17,465 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/home/craigtrim/HADOOP_DATA_DIR/local/localRunner/craigtrim/job_local1925425636_0001/job_local1925425636_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.  
 2014-12-01 18:37:17,468 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/home/craigtrim/HADOOP_DATA_DIR/local/localRunner/craigtrim/job_local1925425636_0001/job_local1925425636_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.  
 2014-12-01 18:37:17,476 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://localhost:8080/  
 2014-12-01 18:37:17,479 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_local1925425636_0001  
 2014-12-01 18:37:17,482 INFO [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null  
 2014-12-01 18:37:17,489 INFO [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  
 2014-12-01 18:37:18,484 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_local1925425636_0001 running in uber mode : false  
 2014-12-01 18:37:18,486 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 0% reduce 0%  
 2014-12-01 18:37:37,400 WARN [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1925425636_0001  
 java.lang.OutOfMemoryError: GC overhead limit exceeded  
      at java.util.HashMap.newNode(HashMap.java:1734)  
      at java.util.HashMap.putVal(HashMap.java:630)  
      at java.util.HashMap.putMapEntries(HashMap.java:514)  
      at java.util.HashMap.<init>(HashMap.java:489)  
      at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:673)  
      at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:440)  
      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.<init>(LocalJobRunner.java:217)  
      at org.apache.hadoop.mapred.LocalJobRunner$Job.getMapTaskRunnables(LocalJobRunner.java:272)  
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:517)  
 2014-12-01 18:37:38,354 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_local1925425636_0001 failed with state FAILED due to: NA  
 2014-12-01 18:37:38,432 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0  
 craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$   


Potential Causes

  1. The NameNode was formatted, and the DataNodes already had data in the HADOOP_DATA_DIR
    1. Triggered when I attempted to load data into the HDFS


Solution

  1. Re-create the HADOOP_DATA_DIR
     mkdir -p $HADOOP_DATA_DIR/data  
     mkdir -p $HADOOP_DATA_DIR/name  
     mkdir -p $HADOOP_DATA_DIR/local  
     sudo chmod 755 $HADOOP_DATA_DIR  
    
  2. Format your NameNode
     cd $HADOOP_HOME/bin  
     ./hdfs namenode -format  
    
  3. Re-start Services


Shortcomings

  1. If your HADOOP_DATA_DIR has data that you don't want to lose, this isn't a great solution (a less destructive alternative is sketched below)
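
A less destructive experiment is suggested by the stack trace itself: the OutOfMemoryError is thrown in the client-side LocalJobRunner while it builds a JobConf for each of the 106,105 input splits, so simply giving the client JVM a larger heap may be enough to get the job through. A sketch only; the 2g heap size is an arbitrary example:

 # Give the hadoop client JVM more heap, then re-run the job  
 export HADOOP_CLIENT_OPTS="-Xmx2g $HADOOP_CLIENT_OPTS"  
 hadoop jar sandbox-1.0-SNAPSHOT.jar dev.hadoop.sandbox.counter.WordCountRunner /nyt/1987/ /out2  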


References

  1. http://stackoverflow.com/questions/16020334/hadoop-datanode-process-killed
    1. "Problem could be that NN was formatted after cluster was set up and DN were not, so slaves are still referring to old NN."
  2. http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
    1. A more fine-grained approach, involving formatting individual clusterIDs rather than brute-force removal of the entire HADOOP_DATA_DIR

Hadoop Troubleshooting: Incompatible Cluster IDs

Incompatible clusterIDs



2015-03-23 21:09:06,824 WARN  [IPC Server handler 5 on 9000] namenode.NameNode (NameNodeRpcServer.java:verifyRequest(1177)) - Registration IDs mismatched: the DatanodeRegistration ID is NS-1432439258-CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5-0 but the expected ID is NS-691282619-CID-efa3290f-7776-4cfc-8e92-a438d11abdd8-0
2015-03-23 21:09:06,825 INFO  [IPC Server handler 5 on 9000] ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 192.168.1.201:37895 Call#13731 Retry#0
org.apache.hadoop.hdfs.protocol.UnregisteredNodeException: Unregistered server: DatanodeRegistration(192.168.1.201, datanodeUuid=5c2239a4-d14c-4549-8b87-f956ece5d946, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5;nsid=1432439258;c=0)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.verifyRequest(NameNodeRpcServer.java:1180)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1074)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:107)
 at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26380)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)


Potential Causes

  1. The NameNode was formatted, and the DataNodes already had data in the HADOOP_DATA_DIR
    1. Triggered when I attempted to load data into the HDFS



Solution 1


The ClusterID from the NameNode needs to be copied onto the DataNode.

The first line of the NameNode log should contain the error message that shows the incompatible cluster IDs:
the DatanodeRegistration ID is NS-1432439258-CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5-0 but the expected ID is NS-691282619-CID-efa3290f-7776-4cfc-8e92-a438d11abdd8-0
Copy the NameNode ClusterID to the VERSION file in the $HADOOP_DATA_DIR/data/current directory.

craig@dn02:~/HADOOP_DATA_DIR/data/current$ cat VERSION
#Mon Mar 23 09:37:50 PDT 2015
storageID=DS-821e9b0e-c3c8-478d-9829-9d1f721b84ed
clusterID=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5
cTime=0
datanodeUuid=c9363e06-81bf-467f-958c-042292edd3bf
storageType=DATA_NODE
layoutVersion=-56

The clusterID value (shown above) needs to be changed to match the "expected ID" found in the NameNode log file (the CID portion of that ID).
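
One way to do this is a single sed edit on the DataNode (a sketch, assuming $HADOOP_DATA_DIR points at the data directory shown above). It rewrites only the clusterID line, leaving storageID and datanodeUuid untouched; restart the DataNode afterwards.

 # The CID value is the "expected ID" from the NameNode log above; substitute your own  
 sed -i 's/^clusterID=.*/clusterID=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8/' \
   $HADOOP_DATA_DIR/data/current/VERSION  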

IMPORTANT:  If you are operating a multi-node cluster, do not copy this file onto every data node in the cluster (replacing all the existing VERSION files).  Doing so will overwrite the datanodeUuid and storageID values, which must remain unique to each node.

Doing this will trigger an UnregisteredNodeException error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.1.204, datanodeUuid=5c2239a4-d14c-4549-8b87-f956ece5d946, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8;nsid=691282619;c=0) is attempting to report storage ID 5c2239a4-d14c-4549-8b87-f956ece5d946. Node 192.168.1.203:50010 is expected to serve this storage.
 at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:477)
 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1780)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1097)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
 at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy12.blockReport(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:175)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:492)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:715)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
 at java.lang.Thread.run(Thread.java:745)
2015-03-23 21:48:49,812 WARN  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool BP-494263941-10.0.4.15-1426797442562 (Datanode Uuid 5c2239a4-d14c-4549-8b87-f956ece5d946) service to master/192.168.1.70:9000
2015-03-23 21:48:49,913 INFO  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool BP-494263941-10.0.4.15-1426797442562 (Datanode Uuid 5c2239a4-d14c-4549-8b87-f956ece5d946)
2015-03-23 21:48:49,914 INFO  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] datanode.DataBlockScanner (DataBlockScanner.java:removeBlockPool(273)) - Removed bpid=BP-494263941-10.0.4.15-1426797442562 from blockPoolScannerMap
2015-03-23 21:48:49,914 INFO  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] impl.FsDatasetImpl (FsDatasetImpl.java:shutdownBlockPool(2217)) - Removing block pool BP-494263941-10.0.4.15-1426797442562
2015-03-23 21:48:51,915 WARN  [main] datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2015-03-23 21:48:51,918 INFO  [main] util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0


and cause the DataNode to shut down.



Solution 2 (Last Resort)

  1. Re-create the HADOOP_DATA_DIR
    mkdir -p $HADOOP_DATA_DIR/data  
    mkdir -p $HADOOP_DATA_DIR/name  
    mkdir -p $HADOOP_DATA_DIR/local  
    chmod 755 $HADOOP_DATA_DIR  
    

  2. Format your NameNode
    ./hdfs namenode -format
    

  3. Re-start Services

If your HADOOP_DATA_DIR has data that you don't want to lose, this isn't a great solution


References

  1. http://stackoverflow.com/questions/16020334/hadoop-datanode-process-killed
    1. "Problem could be that NN was formatted after cluster was set up and DN were not, so slaves are still referring to old NN."
  2. http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
    1. A more fine-grained approach, involving formatting individual clusterIDs rather than brute-force removal of the entire HADOOP_DATA_DIR

Wednesday, November 26, 2014

Adding a new DataNode to the Cluster

This article is part of the Hadoop Masterpage.


If you're initializing a cluster for the first time, it's likely that you'll want to start small with just one or two slaves (DataNodes).

Over time, as you operate your cluster, gaining more experience and confidence, you'll likely want to add more slaves.

I started with a simple NameNode and dual-DataNode cluster configuration.  This article details the steps I took to add an extra node to my cluster.


Outline

  1. The first step I took was to clone an existing VM.
  2. Update the /etc/hosts file
    1. Add the new slave and IP address
    2. Copy this file to each DataNode in the cluster
  3. Start the DataNode


Clone an existing DataNode


If you're cloning a DataNode that's already been used within a cluster, you'll want to clean out the $HADOOP_DATA_DIR. An easy way to do this is simply delete the existing directory, and re-create it (reference "Configuring your Base" for more information on this step):

 sudo rm -rf $HADOOP_DATA_DIR
 mkdir -p $HADOOP_DATA_DIR/data  
 mkdir -p $HADOOP_DATA_DIR/name  
 mkdir -p $HADOOP_DATA_DIR/local  
 sudo chmod 755 $HADOOP_DATA_DIR  

If you plan to clone more than one node, I recommend cloning the additional nodes from this step onward.

Once my data directory is cleared, I use VirtualBox to create my clone:
 VBoxManage clonevm "%~1" --name "%~2" --register --basefolder %vm%  

Substitute the node names for the parameters above. Reference "VirtualBox for Virtualization" for automated cloning recipes.


Updating the Hosts File


On your NameNode, update the hosts file:
 sudo gedit /etc/hosts  

and add the node name and IP address for each newly created slave node.

My /etc/hosts file now looks like this:
 127.0.0.1     localhost  
 127.0.1.1     CVB  
 192.168.1.10     master  
 192.168.1.11     slave1  
 192.168.1.12     slave2  
 192.168.1.15     slave3  
 192.168.1.17     slave4  
 192.168.1.18     slave5  
 192.168.1.14     dev  
 # The following lines are desirable for IPv6 capable hosts  
 ::1   ip6-localhost ip6-loopback  
 fe00::0 ip6-localnet  
 ff00::0 ip6-mcastprefix  
 ff02::1 ip6-allnodes  
 ff02::2 ip6-allrouters  

The new lines in my file are the entries for the newly added slaves.  The information in your file will not be identical.



Copying the Host File


Do I need to copy the hosts file on my NameNode to each DataNode in the cluster?

Yes, you should do this.  There are cases where DataNodes will talk to each other: they do this when they are replicating data.  Also, when adding a new DataNode to an existing cluster, data re-balancing (addressed below) will occur.  This requires DataNodes to address one another.

It is important that each DataNode be able to address the others in a consistent fashion.  I recommend maintaining (and editing) the master copy of the hosts file on the NameNode.

On my NameNode, I've created a script in my home directory that will copy the hosts file to each DataNode in my cluster:

 cd ~  
 gedit copy-hosts.sh  

Copy this script with the appropriate modifications:
 #!/bin/bash  
 for i in {1..5}  
 do  
   scp /etc/hosts craigtrim@slave$i:/etc/hosts  
 done  

Don't forget to make your shell script executable:
 chmod +x copy-hosts.sh  


Operational Output

When I execute this script, it simply reports back to me that all the files were copied successfully:
 craigtrim@CVB:~$ ./copy-hosts.sh  
 hosts                     100% 355   0.4KB/s  00:00    
 hosts                     100% 355   0.4KB/s  00:00    
 hosts                     100% 355   0.4KB/s  00:00    
 hosts                     100% 355   0.4KB/s  00:00    
 hosts                     100% 355   0.4KB/s  00:00    

It doesn't hurt to logon to at least one of the slaves and verify that the file was copied correctly:
 craigtrim@CVB:~$ ssh slave3  
 Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)  
  * Documentation: https://help.ubuntu.com/  
 229 packages can be updated.  
 87 updates are security updates.  
 Last login: Wed Nov 26 14:26:43 2014 from master  
 craigtrim@CVB:~$ cat /etc/hosts  
 127.0.0.1     localhost  
 127.0.1.1     CVB  
 192.168.1.10     master  
 192.168.1.11     slave1  
 192.168.1.12     slave2  
 192.168.1.15     slave3  
 192.168.1.17     slave4  
 192.168.1.18     slave5  
 192.168.1.14     dev  
 # The following lines are desirable for IPv6 capable hosts  
 ::1   ip6-localhost ip6-loopback  
 fe00::0 ip6-localnet  
 ff00::0 ip6-mcastprefix  
 ff02::1 ip6-allnodes  
 ff02::2 ip6-allrouters  

Once you gain confidence that the script is doing what it's supposed to, you can likely skip this verification step in the future.


Updating the Slaves File


This section is almost identical to the one above. Just as we maintain a master /etc/hosts on the designated NameNode, we'll maintain our master slaves file there.

Let's go ahead and edit it, and add the two new slaves (DataNodes):
 cd $HADOOP_CONF_DIR  
 gedit slaves  

My slaves file now looks like this:
 slave1  
 slave2  
 slave3  
 slave4  
 slave5  

The new lines in my file are the newly added slave entries. That's about as simple as it gets. Use of the hosts file prevents us from having to enter actual IP addresses in this file.

In like manner, we can also write a script to copy our slaves file to each DataNode in the cluster.  I call this script "copy-slaves.sh", make it executable in the same manner as the last script we wrote, and enter this:
 #!/bin/bash  
 for i in {1..5}  
 do  
   scp $HADOOP_CONF_DIR/slaves craigtrim@slave$i:$HADOOP_CONF_DIR  
 done  


Operational Output

The script output contains nothing surprising:
 craigtrim@CVB:~$ ./copy-slaves.sh  
 slaves                    100%  35   0.0KB/s  00:00    
 slaves                    100%  35   0.0KB/s  00:00    
 slaves                    100%  35   0.0KB/s  00:00    
 slaves                    100%  35   0.0KB/s  00:00    
 slaves                    100%  35   0.0KB/s  00:00    



Starting Up


If the entire cluster has been stopped, you should just start the cluster using the start-dfs.sh and start-yarn.sh shell scripts, as described in the article "Flipping the Switch".

If your cluster is already operational, and you want to "hot deploy" a new DataNode, then execute this command on the DataNode:
 hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode  

Once the DataNode starts up, notice in the log below that it registers with the NameNode and begins sending heartbeats:
 2014-11-26 13:08:56,141 INFO [main] datanode.DataNode (StringUtils.java:startupShutdownMessage(619)) - STARTUP_MSG:   
 /************************************************************  
 STARTUP_MSG: Starting DataNode  
 STARTUP_MSG:  host = CVB/127.0.1.1  
 STARTUP_MSG:  args = []  
 STARTUP_MSG:  version = 2.5.2  
 STARTUP_MSG:  classpath = /usr/lib/apache/hadoop/2.5.2/conf:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-net-3.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jettison-1.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/junit-4.11.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/activation-1.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/paranamer-2.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/hadoop-auth-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-el-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/hadoop-annotations-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-collections-3.2.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/xz-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/asm-3.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/apache/hado
op/2.5.2/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/avro-1.7.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/commons-io-2.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jsr305-1.3.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/lib/guava-11.0.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/hadoop-common-2.5.2-tests.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/hadoop-nfs-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/common/hadoop-common-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-el-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/hadoop-hdfs-nfs-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/hadoop-hdfs-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/hdfs/hadoop-hdfs-2.5.2-tests.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/l
ib/aopalliance-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/activation-1.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jline-0.9.94.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/guice-3.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/xz-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/asm-3.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-client-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-common-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-server-common-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-api-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/yarn/hadoop-yarn-server-tests-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/l
ib/leveldbjni-all-1.8.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/hadoop-annotations-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.2-tests.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.5.2.jar:/usr/lib/apache/hadoop/2.5.2/contrib/capacity-scheduler/*.jar:/usr/lib/apache/hadoop/2.5.2/contrib/capacity-scheduler/*.jar:/usr/lib/apache/hadoop/2.5.2/contrib/capacity-scheduler/*.jar  
 STARTUP_MSG:  build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r cc72e9b000545b86b75a61f4835eb86d57bfafc0; compiled by 'jenkins' on 2014-11-14T23:45Z  
 STARTUP_MSG:  java = 1.8.0_25  
 ************************************************************/  
 2014-11-26 13:08:56,157 INFO [main] datanode.DataNode (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT]  
 2014-11-26 13:08:56,699 WARN [main] impl.MetricsConfig (MetricsConfig.java:loadFirst(124)) - Cannot locate configuration: tried hadoop-metrics2-datanode.properties,hadoop-metrics2.properties  
 2014-11-26 13:08:56,794 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(345)) - Scheduled snapshot period at 10 second(s).  
  2014-11-26 13:08:56,794 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:start(184)) - DataNode metrics system started  
 2014-11-26 13:08:56,799 INFO [main] datanode.DataNode (DataNode.java:<init>(291)) - Configured hostname is CVB  
 2014-11-26 13:08:56,800 INFO [main] datanode.DataNode (DataNode.java:startDataNode(766)) - Starting DataNode with maxLockedMemory = 0  
 2014-11-26 13:08:56,825 INFO [main] datanode.DataNode (DataNode.java:initDataXceiver(567)) - Opened streaming server at /0.0.0.0:50010  
 2014-11-26 13:08:56,829 INFO [main] datanode.DataNode (DataXceiverServer.java:<init>(75)) - Balancing bandwith is 1048576 bytes/s  
 2014-11-26 13:08:56,829 INFO [main] datanode.DataNode (DataXceiverServer.java:<init>(76)) - Number threads for balancing is 5  
 2014-11-26 13:08:57,069 INFO [main] mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog  
 2014-11-26 13:08:57,073 INFO [main] http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined  
 2014-11-26 13:08:57,084 INFO [main] http.HttpServer2 (HttpServer2.java:addGlobalFilter(699)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)  
 2014-11-26 13:08:57,086 INFO [main] http.HttpServer2 (HttpServer2.java:addFilter(677)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode  
 2014-11-26 13:08:57,087 INFO [main] http.HttpServer2 (HttpServer2.java:addFilter(684)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static  
 2014-11-26 13:08:57,087 INFO [main] http.HttpServer2 (HttpServer2.java:addFilter(684)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs  
 2014-11-26 13:08:57,103 INFO [main] http.HttpServer2 (HttpServer2.java:addJerseyResourcePackage(603)) - addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*  
 2014-11-26 13:08:57,106 INFO [main] http.HttpServer2 (HttpServer2.java:openListeners(887)) - Jetty bound to port 50075  
 2014-11-26 13:08:57,106 INFO [main] mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26  
 2014-11-26 13:08:57,394 INFO [main] mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50075  
 2014-11-26 13:08:57,559 INFO [main] datanode.DataNode (DataNode.java:startDataNode(783)) - dnUserName = craigtrim  
 2014-11-26 13:08:57,559 INFO [main] datanode.DataNode (DataNode.java:startDataNode(784)) - supergroup = supergroup  
 2014-11-26 13:08:57,608 INFO [main] ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue  
 2014-11-26 13:08:57,626 INFO [Socket Reader #1 for port 50020] ipc.Server (Server.java:run(593)) - Starting Socket Reader #1 for port 50020  
 2014-11-26 13:08:57,660 INFO [main] datanode.DataNode (DataNode.java:initIpcServer(442)) - Opened IPC server at /0.0.0.0:50020  
 2014-11-26 13:08:57,673 INFO [main] datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null  
 2014-11-26 13:08:57,698 INFO [main] datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(197)) - Starting BPOfferServices for nameservices: <default>  
 2014-11-26 13:08:57,708 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:run(805)) - Block pool <registering> (Datanode Uuid unassigned) service to master/192.168.1.10:9000 starting to offer service  
 2014-11-26 13:08:57,713 INFO [IPC Server Responder] ipc.Server (Server.java:run(815)) - IPC Server Responder: starting  
 2014-11-26 13:08:57,713 INFO [IPC Server listener on 50020] ipc.Server (Server.java:run(662)) - IPC Server listener on 50020: starting  
 2014-11-26 13:08:57,947 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (DataStorage.java:recoverTransitionRead(173)) - Data-node version: -55 and name-node layout version: -57  
 2014-11-26 13:08:57,962 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (Storage.java:tryLock(706)) - Lock on /home/craigtrim/HADOOP_DATA_DIR/data/in_use.lock acquired by nodename 5450@CVB  
 2014-11-26 13:08:57,964 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (DataStorage.java:recoverTransitionRead(197)) - Storage directory /home/craigtrim/HADOOP_DATA_DIR/data is not formatted  
 2014-11-26 13:08:57,964 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (DataStorage.java:recoverTransitionRead(198)) - Formatting ...  
 2014-11-26 13:08:58,000 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(108)) - Analyzing storage directories for bpid BP-1847084755-127.0.1.1-1416961177933  
 2014-11-26 13:08:58,000 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (Storage.java:lock(666)) - Locking is disabled  
 2014-11-26 13:08:58,001 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(130)) - Storage directory /home/craigtrim/HADOOP_DATA_DIR/data/current/BP-1847084755-127.0.1.1-1416961177933 is not formatted.  
 2014-11-26 13:08:58,001 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(131)) - Formatting ...  
 2014-11-26 13:08:58,001 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (BlockPoolSliceStorage.java:format(183)) - Formatting block pool BP-1847084755-127.0.1.1-1416961177933 directory /home/craigtrim/HADOOP_DATA_DIR/data/current/BP-1847084755-127.0.1.1-1416961177933/current  
 2014-11-26 13:08:58,004 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] common.Storage (BlockPoolSliceStorage.java:doTransition(254)) - Restored 0 block files from trash.  
 2014-11-26 13:08:58,006 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (DataNode.java:initStorage(977)) - Setting up storage: nsid=1137176454;bpid=BP-1847084755-127.0.1.1-1416961177933;lv=-55;nsInfo=lv=-57;cid=CID-822421ef-e4d6-49d7-9b25-e0b81ae32b7d;nsid=1137176454;c=0;bpid=BP-1847084755-127.0.1.1-1416961177933;dnuuid=null  
 2014-11-26 13:08:58,009 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (DataNode.java:checkDatanodeUuid(812)) - Generated and persisted new Datanode UUID 68bd5310-85cb-49bf-9bfa-a34be7531d58  
 2014-11-26 13:08:58,026 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] impl.FsDatasetImpl (FsDatasetImpl.java:<init>(214)) - Added volume - /home/craigtrim/HADOOP_DATA_DIR/data/current, StorageType: DISK  
 2014-11-26 13:08:58,036 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] impl.FsDatasetImpl (FsDatasetImpl.java:registerMBean(1376)) - Registered FSDatasetState MBean  
 2014-11-26 13:08:58,041 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DirectoryScanner (DirectoryScanner.java:start(329)) - Periodic Directory Tree Verification scan starting at 1417045060041 with interval 21600000  
 2014-11-26 13:08:58,042 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] impl.FsDatasetImpl (FsDatasetImpl.java:addBlockPool(1745)) - Adding block pool BP-1847084755-127.0.1.1-1416961177933  
 2014-11-26 13:08:58,044 INFO [Thread-30] impl.FsDatasetImpl (FsVolumeList.java:run(208)) - Scanning block pool BP-1847084755-127.0.1.1-1416961177933 on volume /home/craigtrim/HADOOP_DATA_DIR/data/current...  
 2014-11-26 13:08:58,061 INFO [Thread-30] impl.FsDatasetImpl (FsVolumeList.java:run(213)) - Time taken to scan block pool BP-1847084755-127.0.1.1-1416961177933 on /home/craigtrim/HADOOP_DATA_DIR/data/current: 17ms  
 2014-11-26 13:08:58,062 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] impl.FsDatasetImpl (FsVolumeList.java:addBlockPool(237)) - Total time to scan all replicas for block pool BP-1847084755-127.0.1.1-1416961177933: 19ms  
 2014-11-26 13:08:58,062 INFO [Thread-32] impl.FsDatasetImpl (FsVolumeList.java:run(109)) - Adding replicas to map for block pool BP-1847084755-127.0.1.1-1416961177933 on volume /home/craigtrim/HADOOP_DATA_DIR/data/current...  
 2014-11-26 13:08:58,063 INFO [Thread-32] impl.FsDatasetImpl (FsVolumeList.java:run(114)) - Time to add replicas to map for block pool BP-1847084755-127.0.1.1-1416961177933 on volume /home/craigtrim/HADOOP_DATA_DIR/data/current: 0ms  
 2014-11-26 13:08:58,063 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] impl.FsDatasetImpl (FsVolumeList.java:getAllVolumesMap(137)) - Total time to add all replicas to map: 2ms  
 2014-11-26 13:08:58,067 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:register(765)) - Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid null) service to master/192.168.1.10:9000 beginning handshake with NN  
 2014-11-26 13:08:58,084 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:register(778)) - Block pool Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid null) service to master/192.168.1.10:9000 successfully registered with NN  
 2014-11-26 13:08:58,085 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:offerService(637)) - For namenode master/192.168.1.10:9000 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000  
 2014-11-26 13:08:58,127 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPOfferService.java:updateActorStatesFromHeartbeat(439)) - Namenode Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid 68bd5310-85cb-49bf-9bfa-a34be7531d58) service to master/192.168.1.10:9000 trying to claim ACTIVE state with txid=1193283  
 2014-11-26 13:08:58,127 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPOfferService.java:updateActorStatesFromHeartbeat(451)) - Acknowledging ACTIVE Namenode Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid 68bd5310-85cb-49bf-9bfa-a34be7531d58) service to master/192.168.1.10:9000  
 2014-11-26 13:08:58,153 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:blockReport(498)) - Sent 1 blockreports 0 blocks total. Took 1 msec to generate and 24 msecs for RPC and NN processing. Got back commands org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@7c7470f4  
 2014-11-26 13:08:58,154 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPOfferService.java:processCommandFromActive(618)) - Got finalize command for block pool BP-1847084755-127.0.1.1-1416961177933  
 2014-11-26 13:08:58,160 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] util.GSet (LightWeightGSet.java:computeCapacity(354)) - Computing capacity for map BlockMap  
 2014-11-26 13:08:58,160 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] util.GSet (LightWeightGSet.java:computeCapacity(355)) - VM type    = 64-bit  
 2014-11-26 13:08:58,161 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] util.GSet (LightWeightGSet.java:computeCapacity(356)) - 0.5% max memory 889 MB = 4.4 MB  
 2014-11-26 13:08:58,161 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] util.GSet (LightWeightGSet.java:computeCapacity(361)) - capacity   = 2^19 = 524288 entries  
 2014-11-26 13:08:58,162 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:<init>(186)) - Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1847084755-127.0.1.1-1416961177933  
 2014-11-26 13:08:58,166 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataBlockScanner (DataBlockScanner.java:addBlockPool(264)) - Added bpid=BP-1847084755-127.0.1.1-1416961177933 to blockPoolScannerMap, new size=1  
 2014-11-26 13:09:01,671 INFO [DataXceiver for client at /192.168.1.12:47651 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835150_94326]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835150_94326 src: /192.168.1.12:47651 dest: /192.168.1.15:50010  
 2014-11-26 13:09:01,671 INFO [DataXceiver for client at /192.168.1.12:47650 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835151_94327]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835151_94327 src: /192.168.1.12:47650 dest: /192.168.1.15:50010  
 2014-11-26 13:09:01,672 INFO [DataXceiver for client at /192.168.1.11:38074 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835153_94329]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835153_94329 src: /192.168.1.11:38074 dest: /192.168.1.15:50010  
 2014-11-26 13:09:01,675 INFO [DataXceiver for client at /192.168.1.11:38075 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835152_94328]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835152_94328 src: /192.168.1.11:38075 dest: /192.168.1.15:50010  
 2014-11-26 13:09:01,702 INFO [DataXceiver for client at /192.168.1.12:47651 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835150_94326]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835150_94326 src: /192.168.1.12:47651 dest: /192.168.1.15:50010 of size 923  
 2014-11-26 13:09:01,702 INFO [DataXceiver for client at /192.168.1.12:47650 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835151_94327]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835151_94327 src: /192.168.1.12:47650 dest: /192.168.1.15:50010 of size 1367  
 2014-11-26 13:09:01,703 INFO [DataXceiver for client at /192.168.1.11:38074 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835153_94329]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835153_94329 src: /192.168.1.11:38074 dest: /192.168.1.15:50010 of size 1902  
 2014-11-26 13:09:01,702 INFO [DataXceiver for client at /192.168.1.11:38075 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835152_94328]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835152_94328 src: /192.168.1.11:38075 dest: /192.168.1.15:50010 of size 1980  
 2014-11-26 13:09:03,055 INFO [Thread-26] datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(430)) - Verification succeeded for BP-1847084755-127.0.1.1-1416961177933:blk_1073835151_94327  
 2014-11-26 13:09:03,057 INFO [Thread-26] datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(430)) - Verification succeeded for BP-1847084755-127.0.1.1-1416961177933:blk_1073835153_94329  
 2014-11-26 13:09:03,059 INFO [Thread-26] datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(430)) - Verification succeeded for BP-1847084755-127.0.1.1-1416961177933:blk_1073835150_94326  
 2014-11-26 13:09:03,060 INFO [Thread-26] datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(430)) - Verification succeeded for BP-1847084755-127.0.1.1-1416961177933:blk_1073835152_94328  
 2014-11-26 13:09:04,479 INFO [DataXceiver for client at /192.168.1.12:47652 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835155_94331]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835155_94331 src: /192.168.1.12:47652 dest: /192.168.1.15:50010  
 2014-11-26 13:09:04,479 INFO [DataXceiver for client at /192.168.1.12:47653 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835154_94330]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835154_94330 src: /192.168.1.12:47653 dest: /192.168.1.15:50010  
 2014-11-26 13:09:04,482 INFO [DataXceiver for client at /192.168.1.12:47653 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835154_94330]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835154_94330 src: /192.168.1.12:47653 dest: /192.168.1.15:50010 of size 776  
 2014-11-26 13:09:04,484 INFO [DataXceiver for client at /192.168.1.12:47652 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835155_94331]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835155_94331 src: /192.168.1.12:47652 dest: /192.168.1.15:50010 of size 381  
 2014-11-26 13:09:04,617 INFO [DataXceiver for client at /192.168.1.11:38076 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835157_94333]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835157_94333 src: /192.168.1.11:38076 dest: /192.168.1.15:50010  
 2014-11-26 13:09:04,619 INFO [DataXceiver for client at /192.168.1.11:38077 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835156_94332]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835156_94332 src: /192.168.1.11:38077 dest: /192.168.1.15:50010  
 2014-11-26 13:09:04,621 INFO [DataXceiver for client at /192.168.1.11:38076 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835157_94333]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835157_94333 src: /192.168.1.11:38076 dest: /192.168.1.15:50010 of size 555  
 2014-11-26 13:09:04,624 INFO [DataXceiver for client at /192.168.1.11:38077 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835156_94332]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835156_94332 src: /192.168.1.11:38077 dest: /192.168.1.15:50010 of size 7514  
 2014-11-26 13:09:07,480 INFO [DataXceiver for client at /192.168.1.12:47654 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835163_94339]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835163_94339 src: /192.168.1.12:47654 dest: /192.168.1.15:50010  
 2014-11-26 13:09:07,486 INFO [DataXceiver for client at /192.168.1.12:47655 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835162_94338]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835162_94338 src: /192.168.1.12:47655 dest: /192.168.1.15:50010  
 2014-11-26 13:09:07,488 INFO [DataXceiver for client at /192.168.1.12:47654 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835163_94339]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835163_94339 src: /192.168.1.12:47654 dest: /192.168.1.15:50010 of size 6221  
 2014-11-26 13:09:07,496 INFO [DataXceiver for client at /192.168.1.12:47655 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835162_94338]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835162_94338 src: /192.168.1.12:47655 dest: /192.168.1.15:50010 of size 7424  
 2014-11-26 13:09:07,621 INFO [DataXceiver for client at /192.168.1.11:38079 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835158_94334]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835158_94334 src: /192.168.1.11:38079 dest: /192.168.1.15:50010  
 2014-11-26 13:09:07,621 INFO [DataXceiver for client at /192.168.1.11:38078 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835159_94335]] datanode.DataNode (DataXceiver.java:writeBlock(600)) - Receiving BP-1847084755-127.0.1.1-1416961177933:blk_1073835159_94335 src: /192.168.1.11:38078 dest: /192.168.1.15:50010  
 2014-11-26 13:09:07,632 INFO [DataXceiver for client at /192.168.1.11:38078 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835159_94335]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835159_94335 src: /192.168.1.11:38078 dest: /192.168.1.15:50010 of size 4788  
 2014-11-26 13:09:07,633 INFO [DataXceiver for client at /192.168.1.11:38079 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835158_94334]] datanode.DataNode (DataXceiver.java:writeBlock(766)) - Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835158_94334 src: /192.168.1.11:38079 dest: /192.168.1.15:50010 of size 9118  
 ... etc ...  
Notice how the auto-rebalancing occurs: blocks are replicated from the existing DataNodes to the new node. A single log entry looks like this:
2014-11-26 13:09:07,488 INFO [DataXceiver for client at /192.168.1.12:47654 [Receiving block BP-1847084755-127.0.1.1-1416961177933:blk_1073835163_94339]] datanode.DataNode (DataXceiver.java:writeBlock(766)) -
Received BP-1847084755-127.0.1.1-1416961177933:blk_1073835163_94339
src: /192.168.1.12:47654
dest: /192.168.1.15:50010
of size 6221

Likewise, the summary in the web interface should now show the extra node:



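The same summary is available from the command line. The dfsadmin report lists each DataNode along with its capacity, usage, and last heartbeat (this assumes the Hadoop binaries are on your PATH):
hdfs dfsadmin -report
The new node (192.168.1.15 in the logs above) should appear in the per-node report.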
For more information on how DataNodes join the cluster, read up on the heartbeat mechanism in the Hadoop Architecture. Brad Hedlund has one of the best posts on this topic I've come across.
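As a quick check on the heartbeat settings themselves, the interval (in seconds) can be read back from the cluster configuration. This is a sketch using the dfs.heartbeat.interval property, which defaults to 3 seconds unless you have overridden it:
hdfs getconf -confKey dfs.heartbeat.interval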


Troubleshooting


Node is expected to serve this storage

If you create a new DataNode by cloning an existing VM and don't clean out $HADOOP_DATA_DIR, the clone reports the same storage ID (Datanode UUID) as the original node, and you're likely to get this error:
 2014-11-26 12:56:16,660 WARN [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:offerService(731)) - Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid 9b460762-eba0-45f2-b0b4-f00e11572ed6) service to master/192.168.1.10:9000 is shutting down  
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.1.15, datanodeUuid=9b460762-eba0-45f2-b0b4-f00e11572ed6, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-822421ef-e4d6-49d7-9b25-e0b81ae32b7d;nsid=1137176454;c=0) is attempting to report storage ID 9b460762-eba0-45f2-b0b4-f00e11572ed6. Node 192.168.1.12:50010 is expected to serve this storage.  
      at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:475)  
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1702)  
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1049)  
      at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)  
      at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)  
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)  
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)  
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)  
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)  
      at java.security.AccessController.doPrivileged(Native Method)  
      at javax.security.auth.Subject.doAs(Subject.java:422)  
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)  
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)  
      at org.apache.hadoop.ipc.Client.call(Client.java:1411)  
      at org.apache.hadoop.ipc.Client.call(Client.java:1364)  
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)  
      at com.sun.proxy.$Proxy12.blockReport(Unknown Source)  
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)  
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
      at java.lang.reflect.Method.invoke(Method.java:483)  
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)  
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)  
      at com.sun.proxy.$Proxy12.blockReport(Unknown Source)  
      at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:214)  
      at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:476)  
      at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:699)  
      at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.(BPServiceActor.java:834)  
      at java.lang.Thread.run(Thread.java:745)  
 2014-11-26 12:56:16,664 WARN [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BPServiceActor.java:run(845)) - Ending block pool service for: Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid 9b460762-eba0-45f2-b0b4-f00e11572ed6) service to master/192.168.1.10:9000  
 2014-11-26 12:56:16,766 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool BP-1847084755-127.0.1.1-1416961177933 (Datanode Uuid 9b460762-eba0-45f2-b0b4-f00e11572ed6)  
 2014-11-26 12:56:16,766 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] datanode.DataBlockScanner (DataBlockScanner.java:removeBlockPool(273)) - Removed bpid=BP-1847084755-127.0.1.1-1416961177933 from blockPoolScannerMap  
 2014-11-26 12:56:16,766 INFO [DataNode: [[[DISK]file:/home/craigtrim/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.10:9000] impl.FsDatasetImpl (FsDatasetImpl.java:shutdownBlockPool(1755)) - Removing block pool BP-1847084755-127.0.1.1-1416961177933  
 2014-11-26 12:56:18,768 WARN [main] datanode.DataNode (DataNode.java:secureMain(2019)) - Exiting Datanode  
 2014-11-26 12:56:18,770 INFO [main] util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0  
 2014-11-26 12:56:18,772 INFO [Thread-1] datanode.DataNode (StringUtils.java:run(645)) - SHUTDOWN_MSG:   
 /************************************************************  
 SHUTDOWN_MSG: Shutting down DataNode at CVB/127.0.1.1  
 ************************************************************/  

The solution is to clear out the directory as mentioned above, and restart the DataNode.
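A minimal sequence for doing so, run on the cloned node, is sketched below. It assumes the data directory shown in the log above ($HADOOP_DATA_DIR/data) and the standard sbin scripts; adjust the paths to your own installation:
$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
rm -rf $HADOOP_DATA_DIR/data/*
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
On restart, the DataNode should register with a fresh storage ID, and the block pool directory will be recreated under the data directory.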


References


  1. Inter DataNode Communication
    1. https://wiki.apache.org/hadoop/DataNode
    2. http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html
      1. The NameNode orchestrates the replication of data blocks from one datanode to another. The replication data transfer happens directly between datanodes and the data never passes through the namenode.
    3. http://stackoverflow.com/questions/9790784/how-are-the-datanodes-communicates-with-each-other-in-hadoop
      1. "the main communications between datanodes is data transfers, replicating blocks from one datanode to one or more other nodes"
    4. http://opensource.com/life/14/8/intro-apache-hadoop-big-data
      1. Data nodes can talk to each other to rebalance data, to move copies around, and to keep the replication of data high
  2. The Heartbeat Mechanism
    1. http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/
      1. Data Nodes send heartbeats to the Name Node every 3 seconds via a TCP handshake, using the same port number defined for the Name Node daemon, usually TCP 9000.

Tuesday, November 25, 2014

Running the WordCount Program

Environment

  1. Configuring a Hadoop Development Environment



Sample Code


Let's import the sample WordCount program and make sure we can get it to compile correctly. We'll then package this code into a JAR and copy it to the Hadoop cluster.

WordCountMapper

 package dev.hadoop.sandbox.counter;  

 import java.io.IOException;  
 import java.util.StringTokenizer;  
 import org.apache.hadoop.io.IntWritable;  
 import org.apache.hadoop.io.LongWritable;  
 import org.apache.hadoop.io.Text;  
 import org.apache.hadoop.mapreduce.Mapper;  

 public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {  
      
      private final IntWritable     one        = new IntWritable(1);  
      private Text                  word       = new Text();  
     
      public WordCountMapper() {  
           System.out.println("Init WordCount Mapper");  
      }  
     
      @Override  
      protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {  
           StringTokenizer iter = new StringTokenizer(value.toString());  
           while (iter.hasMoreTokens()) {  
                word.set(iter.nextToken());  
                context.write(word, one);  
           }  
      }  
 }  

WordCountReducer

 package dev.hadoop.sandbox.counter;  

 import java.io.IOException;  
 import java.util.Iterator;  
 import org.apache.hadoop.io.IntWritable;  
 import org.apache.hadoop.io.Text;  
 import org.apache.hadoop.mapreduce.Reducer;  

 public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {  
     
      private IntWritable     result     = new IntWritable();  
      
      public WordCountReducer() {  
           System.out.println("Init WordCountReducer");  
      }  
     
      @Override  
      protected void reduce(Text word, Iterable<IntWritable> intOne, Context context) throws IOException, InterruptedException {  
           int sum = 0;  
           Iterator<IntWritable> iter = intOne.iterator();  
           while (iter.hasNext())  
                sum += iter.next().get();  
           result.set(sum);  
           context.write(word, result);  
      }  
 }  

WordCountRunner

 package dev.hadoop.sandbox.counter;  

 import org.apache.hadoop.conf.Configuration;  
 import org.apache.hadoop.fs.Path;  
 import org.apache.hadoop.io.IntWritable;  
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;  
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;  
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;  
 import org.apache.hadoop.util.GenericOptionsParser;  

 public class WordCountRunner {  

      public static void main(String... args) throws Throwable {  
           Configuration conf = new Configuration();  
           Job job = new Job(conf, "word count");  
           String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();  
           job.setCombinerClass(WordCountReducer.class);  
           job.setReducerClass(WordCountReducer.class);  
           job.setMapperClass(WordCountMapper.class);  
           job.setJarByClass(WordCountRunner.class);  
           job.setOutputKeyClass(Text.class);  
           job.setOutputValueClass(IntWritable.class);  
           FileInputFormat.addInputPath(job, new Path(otherArgs[0]));  
           FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  
           System.exit(job.waitForCompletion(true) ? 0 : 1);  
      }  
 }  


Import the code, and make sure it can compile successfully.

It should look something like this:


We're going to re-compile this using Maven.  Eclipse is useful as an IDE, but it's better to rely on a tool like Maven for managing the project build cycles.


Building the Sample


I type this command in a terminal window:
mvn clean package

This is essentially the same as calling "mvn package" in terms of build lifecycle execution, with the added advantage of invoking the clean plugin first. As the name implies, this plugin removes the files and directories generated by Maven during previous builds.

Note that if your POM declares dependencies on other modules or third-party libraries, you may need to build a fat (uber) JAR so that those classes are bundled with your job:
<plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-shade-plugin</artifactId>
 <executions>
  <execution>
   <phase>package</phase>
   <goals>
    <goal>shade</goal>
   </goals>
  </execution>
 </executions>
 <configuration>
  <finalName>uber-${artifactId}-${version}</finalName>
 </configuration>
</plugin>
This is the JAR that you'll want to copy to the NameNode and execute.
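Before copying it anywhere, you can confirm that your classes actually made it into the shaded artifact by listing the JAR contents. This is a sketch that assumes the uber- prefix configured in finalName above:
jar tf target/uber-*.jar | grep dev/hadoop/sandbox/counter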


Running the JAR


I'm going to copy the JAR onto my NameNode:
scp target/*.jar craig@master:~

Then SSH into my NameNode and execute the JAR:
hadoop jar ~/sandbox-1.0-SNAPSHOT.jar dev.hadoop.sandbox.counter.WordCountRunner /nyt /out

I'm passing two positional parameters:
/nyt is the input directory
/out is the output directory

If the output directory still exists from a prior run of this program, the job will fail immediately (MapReduce refuses to overwrite an existing output directory), so delete it first using this command:
hdfs dfs -rm -r /out

Or simply specify a new output directory (e.g. /out2).

Operational Output

The (partial) operational output from a successful run of the WordCount program looks like this:
 craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$ hadoop jar sandbox-1.0-SNAPSHOT.jar dev.hadoop.sandbox.counter.WordCountRunner /nyt /out  
 Input Directory = /nyt  
 Output Directory = /out  
 2014-11-25 11:14:05,828 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - session.id is deprecated. Instead, use dfs.metrics.session-id  
 2014-11-25 11:14:05,832 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=  
 2014-11-25 11:14:06,188 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.  
 2014-11-25 11:14:06,389 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 180  
 2014-11-25 11:14:06,451 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:180  
 2014-11-25 11:14:06,542 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local1043625416_0001  
 2014-11-25 11:14:06,569 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-craigtrim/mapred/staging/craigtrim1043625416/.staging/job_local1043625416_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.  
 2014-11-25 11:14:06,572 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-craigtrim/mapred/staging/craigtrim1043625416/.staging/job_local1043625416_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.  
 2014-11-25 11:14:06,662 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/home/craigtrim/HADOOP_DATA_DIR/local/localRunner/craigtrim/job_local1043625416_0001/job_local1043625416_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.  
snip ...
 2014-11-25 11:14:15,549 INFO [pool-6-thread-1] mapred.Task (Task.java:done(1001)) - Task:attempt_local1043625416_0001_r_000000_0 is done. And is in the process of committing  
 2014-11-25 11:14:15,552 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 180 / 180 copied.  
 2014-11-25 11:14:15,552 INFO [pool-6-thread-1] mapred.Task (Task.java:commit(1162)) - Task attempt_local1043625416_0001_r_000000_0 is allowed to commit now  
 2014-11-25 11:14:15,588 INFO [pool-6-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(439)) - Saved output of task 'attempt_local1043625416_0001_r_000000_0' to hdfs://master:9000/out/_temporary/0/task_local1043625416_0001_r_000000  
 2014-11-25 11:14:15,589 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce  
 2014-11-25 11:14:15,590 INFO [pool-6-thread-1] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local1043625416_0001_r_000000_0' done.  
 2014-11-25 11:14:15,590 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local1043625416_0001_r_000000_0  
 2014-11-25 11:14:15,590 INFO [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.  
 2014-11-25 11:14:15,843 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 100%  
 2014-11-25 11:14:15,844 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1373)) - Job job_local1043625416_0001 completed successfully  
 2014-11-25 11:14:15,932 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 38  
      File System Counters  
           FILE: Number of bytes read=71500803  
           FILE: Number of bytes written=129250399  
           FILE: Number of read operations=0  
           FILE: Number of large read operations=0  
           FILE: Number of write operations=0  
           HDFS: Number of bytes read=67981524  
           HDFS: Number of bytes written=199881  
           HDFS: Number of read operations=33667  
           HDFS: Number of large read operations=0  
           HDFS: Number of write operations=183  
      Map-Reduce Framework  
           Map input records=4372  
           Map output records=83529  
           Map output bytes=837472  
           Map output materialized bytes=614722  
           Input split bytes=17820  
           Combine input records=83529  
           Combine output records=46932  
           Reduce input groups=19209  
           Reduce shuffle bytes=614722  
           Reduce input records=46932  
           Reduce output records=19209  
           Spilled Records=93864  
           Shuffled Maps =180  
           Failed Shuffles=0  
           Merged Map outputs=180  
           GC time elapsed (ms)=1299  
           CPU time spent (ms)=0  
           Physical memory (bytes) snapshot=0  
           Virtual memory (bytes) snapshot=0  
           Total committed heap usage (bytes)=134581059584  
      Shuffle Errors  
           BAD_ID=0  
           CONNECTION=0  
           IO_ERROR=0  
           WRONG_LENGTH=0  
           WRONG_MAP=0  
           WRONG_REDUCE=0  
      File Input Format Counters   
           Bytes Read=503395  
      File Output Format Counters   
           Bytes Written=199881  

Pay particular attention to the text at the end of the operational output.
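As a quick sanity check, the Reduce output records counter (19209 above) should match the number of lines the single reducer wrote to /out. A sketch, assuming the default part-r-00000 file naming:
hdfs dfs -ls /out
hdfs dfs -cat /out/part-r-00000 | wc -l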