The HDFS File System
HDFS is a distributed filesystem that stores large files across multiple machines. It is not fully POSIX-compliant: the requirements of a POSIX filesystem differ from the goals of a Hadoop application, so HDFS relaxes some of them in favor of streaming access to large data sets.
Like a Unix filesystem, HDFS can be manipulated with shell commands, and many HDFS commands correspond one-to-one to a Unix command.
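To make the correspondence with Unix commands concrete, here is a small sketch that prints a few familiar commands next to their HDFS counterparts. The function name print_hdfs_equivalents and the /data paths are made up for illustration; the hdfs dfs subcommands themselves are standard.

```shell
# Prints a few Unix commands next to their HDFS shell counterparts.
# The /data paths are placeholders, not paths from this article.
print_hdfs_equivalents() {
  cat <<'EOF'
ls -l /data          hdfs dfs -ls /data
mkdir -p /data/in    hdfs dfs -mkdir -p /data/in
cat /data/a.txt      hdfs dfs -cat /data/a.txt
cp /data/a /data/b   hdfs dfs -cp /data/a /data/b
mv /data/a /data/b   hdfs dfs -mv /data/a /data/b
rm -r /data/old      hdfs dfs -rm -r /data/old
du -h /data          hdfs dfs -du -h /data
EOF
}
```

Calling print_hdfs_equivalents prints the table; the left column runs against the local filesystem, the right column against HDFS.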
Assumptions
This section assumes that:
- you are already logged onto a Linux NameNode,
- you are transferring files from the local filesystem of that NameNode onto HDFS (for instructions on how to copy files onto the NameNode itself, perhaps from a Windows machine, please read this article), and
- Hadoop has been started.
Copying Files into HDFS
In this example, I have some news article data in my home directory (~/nyt).
I'm going to copy this data into my HDFS filesystem and then list the result:
hdfs dfs -mkdir /nyt
hdfs dfs -put ~/nyt /nyt
hdfs dfs -ls /nyt
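The three commands above can be wrapped in a small helper, which is handy when the copy is repeated for several directories. This is only a sketch: hdfs_upload is a hypothetical name, and it assumes the hdfs command is on the PATH. Note that when the destination directory already exists, -put copies the source directory into it, so ~/nyt lands at /nyt/nyt.

```shell
# Sketch: copy a local directory into HDFS and list the result.
# hdfs_upload is a hypothetical helper name, not part of Hadoop.
hdfs_upload() {
  local src="$1" dest="$2"
  # Fail early if the local source directory is missing.
  if [ ! -d "$src" ]; then
    echo "local directory not found: $src" >&2
    return 1
  fi
  # -p: do not fail if the destination directory already exists.
  hdfs dfs -mkdir -p "$dest" || return 1
  # Copies src *into* dest, e.g. ~/nyt ends up at /nyt/nyt.
  hdfs dfs -put "$src" "$dest" || return 1
  hdfs dfs -ls "$dest"
}
```

For the example in this section, the call would be: hdfs_upload ~/nyt /nyt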
Removing Files from HDFS
This command recursively deletes the directory I created in the previous step, along with everything in it:
craigtrim@CVB:/usr/lib/apache/hadoop/2.5.2/bin$ hdfs dfs -rm -r /nyt
2014-11-24 14:21:03,517 INFO [main] fs.TrashPolicyDefault (TrashPolicyDefault.java:initialize(92)) - Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /nyt
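The "Deletion interval = 0 minutes" message in the log means the HDFS trash is disabled, so the delete is immediate and cannot be undone. Given that, a small guard around the command may be worthwhile. The following is a sketch only: hdfs_remove is a hypothetical helper, and it assumes the hdfs command is on the PATH.

```shell
# Sketch: remove an HDFS path only if it exists, and never the root.
# hdfs_remove is a hypothetical helper name, not part of Hadoop.
hdfs_remove() {
  local target="$1"
  # Refuse an empty argument or the filesystem root.
  case "$target" in
    ""|"/") echo "refusing to remove '$target'" >&2; return 1 ;;
  esac
  # hdfs dfs -test -e exits with 0 when the path exists.
  if hdfs dfs -test -e "$target"; then
    hdfs dfs -rm -r "$target"
  else
    echo "nothing to remove at $target" >&2
  fi
}
```

For the example in this section, the call would be: hdfs_remove /nyt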
The Web Interface
It is possible to browse the HDFS filesystem using the NameNode Web Interface.
In Hadoop 2.x the NameNode Web Interface listens on port 50070 by default, so the URL has this form (substitute the IP address of your NameNode):
http://192.168.x.y:50070
Click Utilities > Browse the File System in the menu header to browse the filesystem visually, in read-only mode.
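The same HTTP port also serves the WebHDFS REST API when it is enabled, which gives a scriptable alternative to clicking through the browser. As a sketch, the helper below builds a directory-listing URL; webhdfs_list_url is a hypothetical name, and the default port of 50070 is an assumption that matches the Hadoop 2.x URL above.

```shell
# Sketch: build a WebHDFS URL that lists an HDFS directory.
# webhdfs_list_url is a hypothetical helper name; the port defaults
# to 50070 (an assumption matching Hadoop 2.x).
webhdfs_list_url() {
  local host="$1" path="$2" port="${3:-50070}"
  echo "http://${host}:${port}/webhdfs/v1${path}?op=LISTSTATUS"
}
```

With the NameNode address filled in, the listing can be fetched as JSON with: curl -s "$(webhdfs_list_url 192.168.x.y /nyt)"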