Monday, December 1, 2014

Hadoop Troubleshooting: Incompatible Cluster IDs

The NameNode log reports the following registration ID mismatch:

2015-03-23 21:09:06,824 WARN  [IPC Server handler 5 on 9000] namenode.NameNode (NameNodeRpcServer.java:verifyRequest(1177)) - Registration IDs mismatched: the DatanodeRegistration ID is NS-1432439258-CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5-0 but the expected ID is NS-691282619-CID-efa3290f-7776-4cfc-8e92-a438d11abdd8-0
2015-03-23 21:09:06,825 INFO  [IPC Server handler 5 on 9000] ipc.Server (Server.java:run(2060)) - IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 192.168.1.201:37895 Call#13731 Retry#0
org.apache.hadoop.hdfs.protocol.UnregisteredNodeException: Unregistered server: DatanodeRegistration(192.168.1.201, datanodeUuid=5c2239a4-d14c-4549-8b87-f956ece5d946, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5;nsid=1432439258;c=0)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.verifyRequest(NameNodeRpcServer.java:1180)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1074)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:107)
 at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26380)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)


Potential Causes

  1. The NameNode was formatted while the DataNodes still had data in the HADOOP_DATA_DIR
    1. In my case, the error was triggered when I attempted to load data into HDFS
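To confirm this cause, compare the clusterID recorded on the NameNode with the one on a DataNode. As a sketch (the paths assume this post's $HADOOP_DATA_DIR layout; the demo fakes the two VERSION files so the comparison can run anywhere):

```shell
# Fake the two VERSION files so the comparison can be demonstrated.
# On a real cluster, read $HADOOP_DATA_DIR/name/current/VERSION (NameNode)
# and $HADOOP_DATA_DIR/data/current/VERSION (DataNode) instead.
WORK=$(mktemp -d)
echo 'clusterID=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8' > "$WORK/nn_VERSION"
echo 'clusterID=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5' > "$WORK/dn_VERSION"

# Extract the clusterID line from each side.
NN_CID=$(grep '^clusterID=' "$WORK/nn_VERSION")
DN_CID=$(grep '^clusterID=' "$WORK/dn_VERSION")

# A mismatch here is exactly what produces the error above.
if [ "$NN_CID" != "$DN_CID" ]; then
  echo "Mismatch: NameNode $NN_CID vs DataNode $DN_CID"
fi
```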



Solution 1


The ClusterID from the NameNode needs to be copied onto the DataNode.

The NameNode log should contain an error message showing the incompatible cluster IDs:
the DatanodeRegistration ID is NS-1432439258-CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5-0 but the expected ID is NS-691282619-CID-efa3290f-7776-4cfc-8e92-a438d11abdd8-0
Copy the NameNode's clusterID into the VERSION file in the $HADOOP_DATA_DIR/data/current directory on the DataNode.

craig@dn02:~/HADOOP_DATA_DIR/data/current$ cat VERSION
#Mon Mar 23 09:37:50 PDT 2015
storageID=DS-821e9b0e-c3c8-478d-9829-9d1f721b84ed
clusterID=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5
cTime=0
datanodeUuid=c9363e06-81bf-467f-958c-042292edd3bf
storageType=DATA_NODE
layoutVersion=-56

The clusterID value above needs to be changed to the "expected ID" CID value found in the NameNode log file (CID-efa3290f-7776-4cfc-8e92-a438d11abdd8 in this example).
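As a sketch, the edit can be scripted with sed so that only the clusterID line changes. The demo below works on a throwaway copy of the VERSION file shown above; on a real DataNode, point VERSION_FILE at $HADOOP_DATA_DIR/data/current/VERSION instead:

```shell
# Demo on a throwaway copy; on a real DataNode the file lives at
# $HADOOP_DATA_DIR/data/current/VERSION.
WORK=$(mktemp -d)
VERSION_FILE="$WORK/VERSION"

# Sample VERSION contents, taken from the DataNode above.
cat > "$VERSION_FILE" <<'EOF'
storageID=DS-821e9b0e-c3c8-478d-9829-9d1f721b84ed
clusterID=CID-e5f1aae5-2c67-487a-aa0e-5710e3b679e5
cTime=0
datanodeUuid=c9363e06-81bf-467f-958c-042292edd3bf
storageType=DATA_NODE
layoutVersion=-56
EOF

# The clusterID the NameNode expects (the "expected ID" in its log).
EXPECTED_CID="CID-efa3290f-7776-4cfc-8e92-a438d11abdd8"

# Back up the file, then rewrite only the clusterID line,
# leaving datanodeUuid and storageID untouched.
cp "$VERSION_FILE" "$VERSION_FILE.bak"
sed -i "s/^clusterID=.*/clusterID=$EXPECTED_CID/" "$VERSION_FILE"

grep '^clusterID=' "$VERSION_FILE"
```

Because only the clusterID line is rewritten, this is safe to run per-node, which avoids the multi-node pitfall described below.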

IMPORTANT:  If you are operating a multi-node cluster, do not copy this corrected VERSION file over the existing VERSION files on the other DataNodes.  Doing so would overwrite each node's unique datanodeUuid and storageID values.

Doing this will trigger an UnregisteredNodeException error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(192.168.1.204, datanodeUuid=5c2239a4-d14c-4549-8b87-f956ece5d946, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-efa3290f-7776-4cfc-8e92-a438d11abdd8;nsid=691282619;c=0) is attempting to report storage ID 5c2239a4-d14c-4549-8b87-f956ece5d946. Node 192.168.1.203:50010 is expected to serve this storage.
 at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:477)
 at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1780)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1097)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
 at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy12.blockReport(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:175)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:492)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:715)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
 at java.lang.Thread.run(Thread.java:745)
2015-03-23 21:48:49,812 WARN  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] datanode.DataNode (BPServiceActor.java:run(861)) - Ending block pool service for: Block pool BP-494263941-10.0.4.15-1426797442562 (Datanode Uuid 5c2239a4-d14c-4549-8b87-f956ece5d946) service to master/192.168.1.70:9000
2015-03-23 21:48:49,913 INFO  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool BP-494263941-10.0.4.15-1426797442562 (Datanode Uuid 5c2239a4-d14c-4549-8b87-f956ece5d946)
2015-03-23 21:48:49,914 INFO  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] datanode.DataBlockScanner (DataBlockScanner.java:removeBlockPool(273)) - Removed bpid=BP-494263941-10.0.4.15-1426797442562 from blockPoolScannerMap
2015-03-23 21:48:49,914 INFO  [DataNode: [[[DISK]file:/home/craig/HADOOP_DATA_DIR/data/]]  heartbeating to master/192.168.1.70:9000] impl.FsDatasetImpl (FsDatasetImpl.java:shutdownBlockPool(2217)) - Removing block pool BP-494263941-10.0.4.15-1426797442562
2015-03-23 21:48:51,915 WARN  [main] datanode.DataNode (DataNode.java:secureMain(2392)) - Exiting Datanode
2015-03-23 21:48:51,918 INFO  [main] util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0


and will cause the DataNode to shut down, as the end of the log above shows.



Solution 2 (Last Resort)

  1. Remove and re-create the HADOOP_DATA_DIR
    rm -rf $HADOOP_DATA_DIR  
    mkdir -p $HADOOP_DATA_DIR/data  
    mkdir -p $HADOOP_DATA_DIR/name  
    mkdir -p $HADOOP_DATA_DIR/local  
    chmod 755 $HADOOP_DATA_DIR  
    

  2. Format your NameNode
    ./hdfs namenode -format
    

  3. Re-start Services

If your HADOOP_DATA_DIR holds data that you can't afford to lose, this isn't a great solution: wiping the directory destroys every block stored on that node.
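The directory re-creation in step 1 can be scripted. This sketch runs against a throwaway root so it can be tried safely; on a real cluster, point HADOOP_DATA_DIR at the actual location, and follow up with the format and restart steps (which the demo cannot perform):

```shell
# Demo under a temp root; on an actual cluster, set HADOOP_DATA_DIR
# to the real location. WARNING: on a real cluster this destroys
# all HDFS data on the node.
HADOOP_DATA_DIR=$(mktemp -d)/hadoop_data

rm -rf "$HADOOP_DATA_DIR"          # drop stale storage and its VERSION files
mkdir -p "$HADOOP_DATA_DIR/data" \
         "$HADOOP_DATA_DIR/name" \
         "$HADOOP_DATA_DIR/local"
chmod 755 "$HADOOP_DATA_DIR"

ls "$HADOOP_DATA_DIR"

# On the real cluster, follow with (not run here):
#   hdfs namenode -format     # assigns a fresh clusterID
#   restart HDFS services     # DataNodes adopt the new clusterID on first start
```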


References

  1. http://stackoverflow.com/questions/16020334/hadoop-datanode-process-killed
    1. "Problem could be that NN was formatted after cluster was set up and DN were not, so slaves are still referring to old NN."
  2. http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
    1. A more fine-grained approach that fixes individual clusterIDs, rather than brute-force removal of the entire HADOOP_DATA_DIR
