
Monday, November 24, 2014

Working with the Hadoop Distributed File System

The HDFS File System


HDFS is a distributed filesystem that stores large files across multiple machines. Just like a Unix filesystem, HDFS can be manipulated with shell commands, and many hdfs dfs subcommands (such as -ls, -mkdir, -rm, -cat and -chmod) mirror their Unix counterparts.

HDFS is not fully POSIX-compliant, however. The requirements of a POSIX filesystem differ from the target goals of a Hadoop application: HDFS favors high-throughput streaming access to large files, so, for example, a file can be appended to but not modified in place.
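To make that correspondence concrete, the sketch below runs a few hdfs dfs subcommands, each annotated with the Unix command it mirrors. The helper names (hdfs_dfs, unix_style_tour) and the HDFS_CMD variable are hypothetical, not part of Hadoop; setting HDFS_CMD=echo prints the commands instead of sending them to a cluster.

```shell
#!/usr/bin/env bash
# Run an hdfs dfs subcommand. HDFS_CMD is a hypothetical override:
# HDFS_CMD=echo prints each command instead of contacting a cluster.
hdfs_dfs() {
  "${HDFS_CMD:-hdfs}" dfs "$@"
}

# Create, inspect and remove a directory; each line is annotated with
# the Unix command it mirrors. DIR is a placeholder path.
unix_style_tour() {
  local dir="$1"
  hdfs_dfs -mkdir -p "$dir"      # mkdir -p DIR
  hdfs_dfs -chmod 750 "$dir"     # chmod 750 DIR
  hdfs_dfs -ls "$dir"            # ls DIR
  hdfs_dfs -du -h "$dir"         # du -h DIR
  hdfs_dfs -rm -r "$dir"         # rm -r DIR
}
```

For example, HDFS_CMD=echo unix_style_tour /tmp/demo prints the five commands without touching HDFS; leave HDFS_CMD unset to run them for real.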



Assumptions


This section assumes that:
  1. you are logged on to a Linux NameNode
    1. and are transferring files from that NameNode's local filesystem into HDFS.
    2. For instructions on copying files onto the NameNode itself (for example, from a Windows machine), please read this article.
  2. Hadoop has been started.
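One quick way to confirm the second assumption is to ask HDFS for a listing of its root directory; if the NameNode is not running, the command fails. This is a minimal sketch: hdfs_is_up and HDFS_CMD are hypothetical names. Running jps (which ships with the JDK) on the NameNode should also show a NameNode process.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if HDFS answers a listing of "/",
# fails otherwise. HDFS_CMD (hypothetical) substitutes another command
# for hdfs, which is useful for testing the helper itself.
hdfs_is_up() {
  if "${HDFS_CMD:-hdfs}" dfs -ls / >/dev/null 2>&1; then
    echo "HDFS is reachable"
    return 0
  else
    echo "HDFS is not reachable; has Hadoop been started?" >&2
    return 1
  fi
}

# Usage at the top of a script:
#   hdfs_is_up || exit 1
```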



Copying Files into HDFS


In this example, I have some news article data in my home directory.

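I'm going to copy this data into HDFS. Because /nyt already exists once -mkdir has run, the trailing /* copies the contents of ~/nyt into /nyt; without it, -put would create /nyt/nyt instead:
hdfs dfs -mkdir /nyt
hdfs dfs -put ~/nyt/* /nyt
hdfs dfs -ls /nyt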
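The three commands above can be wrapped in a reusable function. Because hdfs dfs -put, like cp, places a source directory inside an existing destination directory, this sketch copies the directory's contents (src/*) so they land directly under the destination. put_dir_contents, hdfs_dfs and HDFS_CMD are hypothetical names.

```shell
#!/usr/bin/env bash
# Run an hdfs dfs subcommand; HDFS_CMD=echo gives a dry run (hypothetical override).
hdfs_dfs() { "${HDFS_CMD:-hdfs}" dfs "$@"; }

# Copy the contents of a local directory into an HDFS directory,
# creating the HDFS directory first if necessary.
put_dir_contents() {
  local src="$1" dest="$2"
  if [ ! -d "$src" ]; then
    echo "not a local directory: $src" >&2
    return 1
  fi
  hdfs_dfs -mkdir -p "$dest" || return 1
  # The glob expands on the local side, so each entry lands directly
  # under $dest. Hidden (dot) files are not matched by the glob.
  hdfs_dfs -put "$src"/* "$dest" || return 1
  hdfs_dfs -ls "$dest"
}

# Usage with the paths from this example:
#   put_dir_contents ~/nyt /nyt
```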



Removing Files from HDFS


The following command recursively deletes the directory created above, along with everything in it:
craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$ hdfs dfs -rm -r /nyt
2014-11-24 14:21:03,517 INFO [main] fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.  
Deleted /nyt  
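The log line above reports a trash deletion interval of 0 minutes, which means trash is disabled and -rm deletes data immediately. When fs.trash.interval is set above 0, -rm -r instead moves the data into the user's .Trash directory, and the -skipTrash flag bypasses that. The sketch below checks that the directory exists (with hdfs dfs -test -d) before removing it; remove_hdfs_dir, hdfs_dfs and HDFS_CMD are hypothetical names.

```shell
#!/usr/bin/env bash
# Run an hdfs dfs subcommand; HDFS_CMD=echo gives a dry run (hypothetical override).
hdfs_dfs() { "${HDFS_CMD:-hdfs}" dfs "$@"; }

# Recursively remove an HDFS directory, but only if it exists.
# A second argument of --skip-trash adds -skipTrash, which only makes
# a difference when trash is enabled (fs.trash.interval > 0).
remove_hdfs_dir() {
  local dir="$1"
  if ! hdfs_dfs -test -d "$dir"; then
    echo "no such HDFS directory: $dir" >&2
    return 1
  fi
  if [ "$2" = "--skip-trash" ]; then
    hdfs_dfs -rm -r -skipTrash "$dir"
  else
    hdfs_dfs -rm -r "$dir"
  fi
}

# Usage:
#   remove_hdfs_dir /nyt
```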



The Web Interface


It is possible to browse the HDFS filesystem using the NameNode Web Interface.

In Hadoop 2.x the NameNode Web Interface listens on port 50070 by default, so its URL looks like:
http://192.168.x.y:50070

From the menu header, choose Utilities > Browse the file system to browse the filesystem visually in read-only mode.
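The web interface is read-only, but the NameNode's HTTP port also serves the WebHDFS REST API, which returns the same directory listings as JSON. This sketch assumes WebHDFS is enabled (dfs.webhdfs.enabled, true by default in Hadoop 2.x) and that curl is installed; webhdfs_url and webhdfs_ls are hypothetical helpers, and 192.168.x.y stands for your NameNode's address.

```shell
#!/usr/bin/env bash
# Build a WebHDFS REST URL. The port defaults to 50070, the Hadoop 2.x
# NameNode HTTP port (Hadoop 3.x uses 9870). PATH must start with "/".
webhdfs_url() {
  local host="$1" path="$2" op="$3" port="${4:-50070}"
  printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$host" "$port" "$path" "$op"
}

# List a directory as JSON, much like the read-only browser view.
# Arguments: HOST PATH [PORT]
webhdfs_ls() {
  curl -s "$(webhdfs_url "$1" "$2" LISTSTATUS "$3")"
}

# Usage:
#   webhdfs_ls 192.168.x.y /nyt
```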