Most popular Hadoop commands

Here is a list of the most commonly used hadoop fs and hdfs dfs commands for managing files in HDFS.

List the files in an HDFS directory

hadoop fs -ls hdfs_path
hdfs dfs -ls hdfs_path

Hadoop create a directory

hadoop fs -mkdir /user/new_folder/

Hadoop create a directory tree

hadoop fs -mkdir -p /user/new_folder/new_subfolder/

Hadoop copy hdfs files

hadoop fs -cp source_hdfs_path target_hdfs_path

cp command usage detail:

hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>

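This command accepts multiple sources, in which case the destination must be a directory.

Options:

  • The -f option overwrites the destination if it already exists.
  • The -p option preserves file attributes [topax] (timestamps, ownership, permission, ACL, XAttr). With no argument, -p preserves timestamps, ownership and permission.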

Example:

  • hadoop fs -cp /user/xxx/file1 /user/xxx/file2
  • hadoop fs -cp /user/xxx/file1 /user/xxx/file2 /user/xxx/dir

Hadoop move a file to new location

hadoop fs -mv source_file new_folder/

mv Usage:

hadoop fs -mv URI [URI ...] <dest>

This command accepts multiple sources, in which case the destination must be a directory. Moving files across file systems is not permitted.

Example:

  • hadoop fs -mv /user/xx/file1 /user/xx/file2
  • hadoop fs -mv hdfs://domainA/file1 hdfs://domainA/file2 hdfs://domainA/file3 hdfs://domainA/dir1

Hadoop copy a file or directory from local to HDFS

hadoop fs -put local_file hdfs_folder/

hadoop fs -put -f local_file hdfs_folder/   # to overwrite if the file already exists

hadoop fs -copyFromLocal local_file hdfs_folder/

put Usage:

hadoop fs -put <localsrc> ... <dst>

Copies a single source, or multiple sources, from the local file system to the destination file system. It can also read input from stdin and write it to the destination file system.

  • hadoop fs -put localfile /user/xxx/hadoopfile
  • hadoop fs -put localfile1 localfile2 /user/xxx/hadoopdir
  • hadoop fs -put localfile hdfs://domainA/xxx/hadoopfile
  • hadoop fs -put - hdfs://domainA/xxx/hadoopfile   # reads the input from stdin
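
Because -put accepts "-" as its source, the output of a local pipeline can be written to HDFS without a temporary file. The following is a minimal sketch; the function name stream_to_hdfs and the paths are hypothetical, and it assumes the hadoop client is on the PATH.

```shell
#!/usr/bin/env bash
# stream_to_hdfs is a hypothetical helper name, not a Hadoop command.
# It writes whatever arrives on stdin to the given HDFS path.
stream_to_hdfs() {
    local dest="$1"
    # "-" tells -put to read from stdin; -f overwrites an existing file.
    hadoop fs -put -f - "$dest"
}

# Usage (paths are placeholders):
#   gzip -dc local_file.gz | stream_to_hdfs /user/xxx/hadoopfile
```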

Hadoop copy hdfs file to local

hadoop fs -copyToLocal hadoop/file.txt /home/xxx/data

Hadoop show contents of a file

hadoop fs -text any_hdfs_file | less

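hadoop fs -cat hdfs_gzip_file | gzip -dc | less    # hadoop fs has no -zcat, so decompress locally

hadoop fs -cat hdfs_bzip2_file | bzip2 -dc | less   # hadoop fs has no -bzcat, so decompress locally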

hadoop fs -cat hdfs_text_file | less
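
When the file type is only known from its name, the choice of viewer can be wrapped in a small function. This is a sketch under assumptions: hdfs_view is a hypothetical helper, the extensions .gz and .bz2 are taken to indicate the compression, and gzip and bzip2 are assumed to be installed locally.

```shell
#!/usr/bin/env bash
# hdfs_view is a hypothetical helper: it picks a decoder from the file
# extension and writes the decoded content to stdout.
hdfs_view() {
    local path="$1"
    case "$path" in
        *.gz)  hadoop fs -cat "$path" | gzip -dc ;;   # decompress gzip locally
        *.bz2) hadoop fs -cat "$path" | bzip2 -dc ;;  # decompress bzip2 locally
        *)     hadoop fs -text "$path" ;;             # let Hadoop decode the file
    esac
}

# Usage (path is a placeholder):
#   hdfs_view /user/xxx/data.gz | less
```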

Hadoop grant permission to one file or folder

hadoop fs -chmod 755 hdfs_file_or_folder_path

Hadoop grant permissions to a folder and everything under it

hadoop fs -chmod -R 755 hdfs_folder/

Hadoop check file or folder size

hadoop fs -du -h hdfs_file_or_folder

Hadoop check whether a file exists

hadoop fs -test -e filename

if [[ $? -eq 0 ]]; then echo "exists"; else echo "does not exist"; fi

Usage:

hadoop fs -test -[defsz] URI

Options:

  • -d: if the path is a directory, return 0.
  • -e: if the path exists, return 0.
  • -f: if the path is a file, return 0.
  • -s: if the path is not empty, return 0.
  • -z: if the file is zero length, return 0.
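
The options above can be combined to classify a path from a script, since each check reports its answer through the exit status. A minimal sketch, assuming the hadoop client is on the PATH; hdfs_kind is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# hdfs_kind is a hypothetical helper that classifies an HDFS path
# using the exit status of "hadoop fs -test".
hdfs_kind() {
    local path="$1"
    if hadoop fs -test -d "$path"; then
        echo "directory"
    elif hadoop fs -test -z "$path"; then
        echo "empty file"
    elif hadoop fs -test -f "$path"; then
        echo "file"
    else
        echo "missing"
    fi
}

# Usage (path is a placeholder):
#   hdfs_kind /user/xxx/hadoopfile
```

Each check is a separate call to the client, so this is convenient for occasional use rather than for loops over many paths.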

Hadoop copy a directory from one cluster to another

Use the distcp command to copy. Note that it is invoked as hadoop distcp, not as hadoop fs -distcp.
# -overwrite option to overwrite existing files at the target
# -update option to copy only files that are missing or differ at the target
#
hadoop distcp hdfs://namenodeA/xxx hdfs://namenodeB/xxx
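
The choice between a plain copy, -update and -overwrite can be made explicit in a small wrapper. This is only a sketch: copy_between_clusters and its mode argument are hypothetical, and the namenode URIs are placeholders. Keep in mind that with -update or -overwrite, distcp copies the contents of the source directory into the target rather than the source directory itself.

```shell
#!/usr/bin/env bash
# copy_between_clusters is a hypothetical wrapper around "hadoop distcp".
# mode: "update" copies only files that are missing or differ at the target,
# "overwrite" replaces files at the target, anything else runs a plain copy.
copy_between_clusters() {
    local mode="$1" src="$2" dst="$3"
    case "$mode" in
        update)    hadoop distcp -update "$src" "$dst" ;;
        overwrite) hadoop distcp -overwrite "$src" "$dst" ;;
        *)         hadoop distcp "$src" "$dst" ;;
    esac
}

# Usage (URIs are placeholders):
#   copy_between_clusters update hdfs://namenodeA/xxx hdfs://namenodeB/xxx
```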

Hadoop remove or delete a folder or file

hadoop fs -rm -r hadoop/folder

hadoop fs -rm -r -skipTrash hadoop/folder
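
Recursive deletes are easy to get wrong when the path comes from a variable that may be empty. One possible guard is sketched below; safe_rm is a hypothetical helper, and the list of refused paths is only an example that you would extend for your own cluster.

```shell
#!/usr/bin/env bash
# safe_rm is a hypothetical helper that refuses obviously dangerous
# targets before running a recursive delete.
safe_rm() {
    local path="${1:-}"
    case "$path" in
        ""|"/")
            echo "refusing to delete '$path'" >&2
            return 2
            ;;
    esac
    # Deleted data goes to the trash if trash is enabled on the cluster.
    hadoop fs -rm -r "$path"
}

# Usage (path is a placeholder):
#   safe_rm hadoop/folder
```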

Hadoop create an empty file directly

hadoop fs -touchz hdfs_empty_file_path

 

Each of the commands above returns 0 on success and a nonzero code on failure, so a shell script can check the return code to see whether the command succeeded.
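
As an illustration of checking return codes, the sketch below wraps any command, reports a failure on stderr, and passes the exit status back so that steps can be chained with &&. The helper name run_step and the paths in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# run_step is a hypothetical helper: it runs the given command and, if the
# exit status is nonzero, reports it on stderr before returning it.
run_step() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*" >&2
    fi
    return "$status"
}

# Usage (paths are placeholders); the second step runs only if the first succeeds:
#   run_step hadoop fs -mkdir -p /user/xxx/new_folder &&
#   run_step hadoop fs -put local_file /user/xxx/new_folder/
```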

Reference:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html

http://stackoverflow.com/questions/31674333/how-to-find-if-a-folder-exists-in-hadoop-or-not