Monday, December 1, 2014

Hadoop Troubleshooting: GC overhead limit exceeded

The following WordCount job failed with java.lang.OutOfMemoryError: GC overhead limit exceeded:


 craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$ hadoop jar sandbox-1.0-SNAPSHOT.jar dev.hadoop.sandbox.counter.WordCountRunner /nyt/1987/ /out2  
 Input Directory = /nyt/1987/  
 Output Directory = /out2  
 2014-12-01 18:37:07,381 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - session.id is deprecated. Instead, use dfs.metrics.session-id  
 2014-12-01 18:37:07,385 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=  
 2014-12-01 18:37:07,743 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.  
 2014-12-01 18:37:14,857 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 106105  
 2014-12-01 18:37:17,228 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:106105  
 2014-12-01 18:37:17,329 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local1925425636_0001  
 2014-12-01 18:37:17,359 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-craigtrim/mapred/staging/craigtrim1925425636/.staging/job_local1925425636_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.  
 2014-12-01 18:37:17,362 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-craigtrim/mapred/staging/craigtrim1925425636/.staging/job_local1925425636_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.  
 2014-12-01 18:37:17,465 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/home/craigtrim/HADOOP_DATA_DIR/local/localRunner/craigtrim/job_local1925425636_0001/job_local1925425636_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.  
 2014-12-01 18:37:17,468 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/home/craigtrim/HADOOP_DATA_DIR/local/localRunner/craigtrim/job_local1925425636_0001/job_local1925425636_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.  
 2014-12-01 18:37:17,476 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://localhost:8080/  
 2014-12-01 18:37:17,479 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_local1925425636_0001  
 2014-12-01 18:37:17,482 INFO [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null  
 2014-12-01 18:37:17,489 INFO [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  
 2014-12-01 18:37:18,484 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_local1925425636_0001 running in uber mode : false  
 2014-12-01 18:37:18,486 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 0% reduce 0%  
 2014-12-01 18:37:37,400 WARN [Thread-5] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1925425636_0001  
 java.lang.OutOfMemoryError: GC overhead limit exceeded  
      at java.util.HashMap.newNode(HashMap.java:1734)  
      at java.util.HashMap.putVal(HashMap.java:630)  
      at java.util.HashMap.putMapEntries(HashMap.java:514)  
      at java.util.HashMap.<init>(HashMap.java:489)  
      at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:673)  
      at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:440)  
      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.<init>(LocalJobRunner.java:217)  
      at org.apache.hadoop.mapred.LocalJobRunner$Job.getMapTaskRunnables(LocalJobRunner.java:272)  
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:517)  
 2014-12-01 18:37:38,354 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_local1925425636_0001 failed with state FAILED due to: NA  
 2014-12-01 18:37:38,432 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0  
 craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$   


Potential Causes

  1. The NameNode was re-formatted while the DataNodes still had data in the HADOOP_DATA_DIR
    1. The error was triggered when attempting to load data into HDFS
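A quick way to check for this cause is to look for a clusterID mismatch in the DataNode log. This is a hedged sketch: the log path assumes the stock $HADOOP_HOME/logs layout, and the exact message text ("Incompatible clusterIDs") may vary by version.

```shell
# Guarded so this is a no-op on machines without a running cluster.
if ls "${HADOOP_HOME:-/nonexistent}"/logs/hadoop-*-datanode-*.log >/dev/null 2>&1; then
  # A re-formatted NameNode typically surfaces as "Incompatible clusterIDs"
  # when a DataNode with old data tries to register.
  grep -i "incompatible clusterids" "$HADOOP_HOME"/logs/hadoop-*-datanode-*.log || true
else
  echo "no DataNode log found under \$HADOOP_HOME/logs"
fi
```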


Solution

  1. Re-create the HADOOP_DATA_DIR
     mkdir -p $HADOOP_DATA_DIR/data  
     mkdir -p $HADOOP_DATA_DIR/name  
     mkdir -p $HADOOP_DATA_DIR/local  
     sudo chmod 755 $HADOOP_DATA_DIR  
    
  2. Format your NameNode
     cd $HADOOP_HOME/bin  
     hdfs namenode -format  
    
  3. Re-start services
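For context on why step 1 creates the name, data, and local directories: they only matter because the Hadoop configuration points at them. The mapping below is an illustrative example, not taken from the original post; the property names are the standard Hadoop 2.x ones, and the paths should be replaced with your actual HADOOP_DATA_DIR.

```xml
<!-- hdfs-site.xml (illustrative paths; substitute your HADOOP_DATA_DIR) -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/craigtrim/HADOOP_DATA_DIR/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/craigtrim/HADOOP_DATA_DIR/data</value>
</property>
```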
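The re-start step can be sketched as follows. This is a hedged example assuming a stock Hadoop 2.x tarball layout with the stop/start scripts under $HADOOP_HOME/sbin; it is guarded so it does nothing outside a real cluster.

```shell
# Restart HDFS and YARN after the re-format (Hadoop 2.x sbin layout assumed).
if [ -d "${HADOOP_HOME:-/nonexistent}/sbin" ]; then
  "$HADOOP_HOME/sbin/stop-yarn.sh"
  "$HADOOP_HOME/sbin/stop-dfs.sh"
  "$HADOOP_HOME/sbin/start-dfs.sh"
  "$HADOOP_HOME/sbin/start-yarn.sh"
  jps    # expect NameNode, DataNode, ResourceManager, NodeManager
else
  echo "HADOOP_HOME not set; nothing to restart"
fi
```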


Shortcomings

  1. If your HADOOP_DATA_DIR has data that you don't want to lose, this isn't a great solution


References

  1. http://stackoverflow.com/questions/16020334/hadoop-datanode-process-killed
    1. "Problem could be that NN was formatted after cluster was set up and DN were not, so slaves are still referring to old NN."
  2. http://stackoverflow.com/questions/22316187/datanode-not-starts-correctly
    1. A more fine-grained approach that reconciles the DataNode's clusterID with the NameNode's, rather than brute-force removal of the entire HADOOP_DATA_DIR
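The approach from reference 2 can be sketched as: copy the clusterID from the re-formatted NameNode's VERSION file into the stale DataNode's VERSION file so the two agree, instead of deleting the DataNode's data. All paths and IDs below are illustrative (a simulated layout in a scratch directory), not actual dfs.*.dir values.

```shell
# Simulate a name dir and a stale data dir in a scratch location.
demo=$(mktemp -d)
mkdir -p "$demo/name/current" "$demo/data/current"
echo "clusterID=CID-after-reformat" > "$demo/name/current/VERSION"
echo "clusterID=CID-stale"          > "$demo/data/current/VERSION"

# Take the authoritative clusterID from the (re-formatted) NameNode ...
cid=$(grep '^clusterID=' "$demo/name/current/VERSION" | cut -d= -f2)

# ... and rewrite the DataNode's entry to match, preserving its block data.
sed -i "s/^clusterID=.*/clusterID=$cid/" "$demo/data/current/VERSION"
cat "$demo/data/current/VERSION"    # clusterID=CID-after-reformat
```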
