Incompatible clusterIDs
2015-03-23 21:09:06,824 WARN [IPC Server handler 5 on 9000] namenode.NameNode (NameNodeRpcServer.java:verifyRequest(1177)) - Registration IDs mismatched: the DatanodeRegistration ID is NS-1432439258-CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5-0 but the expected ID is NS-691282619-CID-efa3290f-7776-4cfc-8e92-a438d11abdd8-0
2015-03-23 21:09:06,825 INFO [IPC Server handler 5 on 9000] ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 192.168.1.201:37895 Call#13731 Retry#0
org.apache.hadoop.hdfs.protocol.UnregisteredNodeException: Unregistered server: DatanodeRegistration(192.168.1.201, datanodeUuid=5c2239a4-d14c-4549-8b87-f956ece5d946, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5;nsid=1432439258;c=0)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.verifyRequest(NameNodeRpcServer.java:1180)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1074)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26380)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
Potential Causes
- The NameNode was formatted while the DataNodes still had data in HADOOP_DATA_DIR, so each DataNode's VERSION file retained the old clusterID
- The error surfaced when I attempted to load data into HDFS
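A quick way to confirm the mismatch is to compare the clusterID recorded on each side. This is a minimal sketch, assuming the HADOOP_DATA_DIR layout used throughout this page (name/ on the NameNode, data/ on the DataNodes):
# On the NameNode:
grep clusterID $HADOOP_DATA_DIR/name/current/VERSION
# On each DataNode:
grep clusterID $HADOOP_DATA_DIR/data/current/VERSION
If the two values differ, you are hitting this error.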
Solution 1
The clusterID from the NameNode needs to be copied onto each affected DataNode.
The first lines of the NameNode log should contain the error message showing the incompatible cluster IDs:
the DatanodeRegistration ID is NS-1432439258-CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5-0 but the expected ID is NS-691282619-CID-efa3290f-7776-4cfc-8e92-a438d11abdd8-0
Copy the NameNode's clusterID into the VERSION file in the DataNode's $HADOOP_DATA_DIR/data/current directory:
craig@dn02:~/HADOOP_DATA_DIR/data/current$ cat VERSION
#Mon Mar 23 09:37:50 PDT 2015
storageID=DS-821e9b0e-c3c8-478d-9829-9d1f721b84ed
clusterID=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5
cTime=0
datanodeUuid=c9363e06-81bf-467f-958c-042292edd3bf
storageType=DATA_NODE
layoutVersion=-56
The clusterID value shown above needs to be changed to the clusterID portion of the "expected ID" found in the NameNode log file (CID-efa3290f-7776-4cfc-8e92-a438d11abdd8 in this example).
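Rather than editing the file by hand, a one-line sed does the job. This sketch assumes the directory layout shown above; the CID value is the expected clusterID from this example's NameNode log:
# Replace the stale clusterID with the NameNode's expected value:
sed -i 's/^clusterID=.*/clusterID=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8/' \
    $HADOOP_DATA_DIR/data/current/VERSION
Restart the DataNode afterwards so it re-registers with the new clusterID.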
IMPORTANT: If you are operating a multi-node cluster, do not copy this file wholesale to every DataNode, replacing the existing VERSION files. Doing so overwrites each node's unique datanodeUuid and storageID values,
which will trigger an UnregisteredNodeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.1.204, datanodeUuid=5c2239a4-d14c-4549-8b87-f956ece5d946, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8;nsid=691282619;c=0) is attempting to report storage ID 5c2239a4-d14c-4549-8b87-f956ece5d946. Node 192.168.1.203:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:477)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1780)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1097)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:175)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:492)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:715)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
at java.lang.Thread.run(Thread.java:745)
2015-03-23 21:48:49,812 WARN [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.70:9000] datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool BP-494263941-10.0.4.15-1426797442562 (Datanode Uuid 5c2239a4-d14c-4549-8b87-f956ece5d946) service to master/192.168.1.70:9000
2015-03-23 21:48:49,913 INFO [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.70:9000] datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool BP-494263941-10.0.4.15-1426797442562 (Datanode Uuid 5c2239a4-d14c-4549-8b87-f956ece5d946)
2015-03-23 21:48:49,914 INFO [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.70:9000] datanode.DataBlockScanner (DataBlockScanner.java:removeBlockPool(273)) - Removed bpid=BP-494263941-10.0.4.15-1426797442562 from blockPoolScannerMap
2015-03-23 21:48:49,914 INFO [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]] heartbeating to master/192.168.1.70:9000] impl.FsDatasetImpl (FsDatasetImpl.java:shutdownBlockPool(2217)) - Removing block pool BP-494263941-10.0.4.15-1426797442562
2015-03-23 21:48:51,915 WARN [main] datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2015-03-23 21:48:51,918 INFO [main] util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
and cause the DataNode to shut down.
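For a multi-node cluster, the safe approach is to change only the clusterID line on each node, leaving datanodeUuid and storageID untouched. A hypothetical loop (the host names are placeholders for your slave nodes; the path matches the one seen in the logs above):
for host in dn01 dn02 dn03; do
  # Edit only the clusterID line; datanodeUuid and storageID stay unique per node
  ssh $host "sed -i 's/^clusterID=.*/clusterID=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8/' /home/craig/HADOOP_DATA_DIR/data/current/VERSION"
done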
Solution 2 (Last Resort)
- Re-create the HADOOP_DATA_DIR
mkdir -p $HADOOP_DATA_DIR/data
mkdir -p $HADOOP_DATA_DIR/name
mkdir -p $HADOOP_DATA_DIR/local
chmod 755 $HADOOP_DATA_DIR
- Format your NameNode
- Restart services
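A minimal sketch of the full reset, assuming a stock Apache Hadoop install with the sbin scripts on your PATH (run the directory steps on every node; the format and start/stop scripts run from the master):
stop-dfs.sh                      # stop HDFS daemons cluster-wide
rm -rf $HADOOP_DATA_DIR/data $HADOOP_DATA_DIR/name $HADOOP_DATA_DIR/local
mkdir -p $HADOOP_DATA_DIR/data $HADOOP_DATA_DIR/name $HADOOP_DATA_DIR/local
chmod 755 $HADOOP_DATA_DIR
hdfs namenode -format            # NameNode only; assigns a fresh clusterID
start-dfs.sh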
If your HADOOP_DATA_DIR holds data you don't want to lose, this approach destroys it, which is why it is a last resort.
References
- http://stackoverflow.com/questions/16020334/hadoop-datanode-process-killed
- "Problem could be that NN was formatted after cluster was set up and DN were not, so slaves are still referring to old NN."
- http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
- A more fine-grained approach: fixing individual clusterIDs rather than brute-force removal of the entire HADOOP_DATA_DIR