• shell script replace variable of linux path

    sed is a popular tool for replacing strings in a file.

    For example, given a file, we can use the following command to replace every occurrence of the string XXX with YY:

    sed -i s/XXX/YY/g fileName

    or, to write the result to a new file instead of editing in place:

    sed s/XXX/YY/g fileName > newFile

    However, when the string is a Linux path, this form fails: the slashes in the path collide with the / delimiter of the s command. You can try it yourself.

    Here is a working solution, which uses a comma as the delimiter instead:

    pathA="/user/xx/zz/"

    pathB="/user/aa/zz/dd"

    sed "s,$pathA,$pathB,g" fileName > newFileName

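    Then every occurrence of /user/xx/zz/ in the file is replaced by /user/aa/zz/dd. Note that pathA ends with a slash while pathB does not, so /user/xx/zz/file becomes /user/aa/zz/ddfile; keep the trailing slashes consistent if that is not what you want.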
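
    The substitution can be tried end to end with a small sketch like the one below. The temporary directory and the sample file content are assumptions made up for illustration; only the sed command itself comes from the post.

```shell
#!/bin/sh
# Paths taken from the example above.
pathA="/user/xx/zz/"
pathB="/user/aa/zz/dd"

# Hypothetical sample input, created in a scratch directory.
workdir=$(mktemp -d)
printf 'cd /user/xx/zz/\n' > "$workdir/fileName"

# "," is the delimiter, so the slashes inside the paths are ordinary text.
sed "s,$pathA,$pathB,g" "$workdir/fileName" > "$workdir/newFileName"

cat "$workdir/newFileName"
```

    Any character that does not occur in either path can serve as the delimiter; a comma is simply a convenient choice for typical Linux paths.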

    [Read More...]
  • Most popular hadoop commands

    Here is a list of the most popular hadoop and hdfs commands for managing your HDFS files.

    List the files in a hdfs directory

    hadoop fs -ls hdfs_path
    hdfs dfs -ls hdfs_path

    Hadoop Create Directory 

    hadoop fs -mkdir hdfs://user/new_folder/

    Hadoop Create a directory tree

    hadoop fs -mkdir -p hdfs://user/new_folder/new_subfolder/

    Hadoop copy hdfs files

    hadoop fs -cp source_hdfs_path target_hdfs_path

    cp command usage detail:

    hadoop fs -cp [-f] [-p | -p[topax]] URI [URI …] <dest>

    This command also accepts multiple sources, in which case the destination must be a directory.

    [Read More...]
  • untar (decompress) a tgz or tbz file in linux

    We already know how to tar all the files under a directory into a tgz file. This post uses simple examples to show how to untar a tgz or tar.bz2 file.

    untar a .tar file

    x: tells tar to extract the files.

    v: lists the files in the archive one by one as they are processed. The "v" stands for "verbose."

    f: tells tar that the archive file name follows.
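
    Put together, the three options give a command of the form tar -xvf. The sketch below builds a small archive first so that it can run on its own; the names example.tar, src, out and a.txt are hypothetical and not taken from the post.

```shell
#!/bin/sh
# Scratch area with one sample file (all names are illustrative).
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"
echo "hello" > "$workdir/src/a.txt"

# Create a plain .tar archive to extract from.
tar -cf "$workdir/example.tar" -C "$workdir/src" a.txt

# x = extract, v = list each file as it is processed, f = archive name follows.
# -C chooses the directory the files are extracted into.
tar -xvf "$workdir/example.tar" -C "$workdir/out"
```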

    untar a .tgz or tar.gz file

    If the file extension is .tgz or tar.gz,

    [Read More...]
  • Test whether a file or directory exists in shell

    How do you test whether a file exists in shell on Linux? We can use the test command to check the type of a file and whether it exists.

    How to test whether a path is a regular file in shell on Linux

    We can use the following command to test whether a path exists and is a regular file.
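
    A minimal sketch of such a check uses the -f operator of test. The helper name is_regular_file and the sample path are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/a.txt"

# test -f succeeds only if the path exists and is a regular file.
is_regular_file() {
    test -f "$1"
}

if is_regular_file "$workdir/a.txt"; then
    echo "$workdir/a.txt is a regular file"
fi

# A directory exists but is not a regular file, so the check fails.
if ! is_regular_file "$workdir"; then
    echo "$workdir is not a regular file"
fi
```

    The bracket form [ -f "$path" ] is equivalent to test -f "$path".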

    How to test whether a path is a folder in shell

    Using the following command, we can test whether a path is a folder in shell on linux.
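
    For directories, the corresponding operator is -d. The sketch below assumes a hypothetical helper is_directory and a scratch directory created just for the demonstration.

```shell
#!/bin/sh
# Hypothetical scratch directory containing one file.
workdir=$(mktemp -d)
touch "$workdir/a.txt"

# test -d succeeds only if the path exists and is a directory.
is_directory() {
    test -d "$1"
}

if is_directory "$workdir"; then
    echo "$workdir is a directory"
fi

if ! is_directory "$workdir/a.txt"; then
    echo "$workdir/a.txt is not a directory"
fi
```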

    How to test whether a file is a symbolic link on linux?

    [Read More...]
  • tar all the files under a directory on linux

    We often need to pack all the files under a directory into a single archive file. This post gives two examples to show how to tar all the contents of a directory into a tgz file.

    Suppose we have directory structures like

    By running the following command, we can tar all the files of my_folder into the tgz file.

    However, you will find that my_folder itself is included in the tgz file. The structure of the tgz file will look like this.

    What if you want the file structures like this:

    Two methods to tar all the files under a directory into a tgz file

    You can use the following command to include only the files under my_folder in the tgz file:

    Using the -C option to tar all the files under a directory into a tgz file

    You can also use the -C option to include all the files under your directory in a tgz file:
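
    The difference between archiving the directory by name and archiving through -C can be seen in a short sketch. The file names and archive names below are assumptions for illustration; only my_folder comes from the post.

```shell
#!/bin/sh
# Build a small sample tree (file names are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/my_folder/sub"
echo one > "$workdir/my_folder/a.txt"
echo two > "$workdir/my_folder/sub/b.txt"
cd "$workdir" || exit 1

# Naming the directory stores every entry with a my_folder/ prefix.
tar -czf with_folder.tgz my_folder

# -C my_folder changes into my_folder first; "." then adds its contents
# without the my_folder/ prefix.
tar -czf contents_only.tgz -C my_folder .

# List both archives to compare their structure.
tar -tzf with_folder.tgz
tar -tzf contents_only.tgz
```

    With "." as the member, tar stores the entries with a leading ./ rather than with the directory name.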

    The -C my_folder tells tar to change the current directory to my_folder,

    [Read More...]
  • check the size of directory or file on linux

    We often need to check the size of a file or of subdirectories on Linux. This post shows how to use the du command to check the size of a file or directory on a Linux system. We also show how to sort folders or files by size.

    The two most useful options of the du command are -s and -h: -s summarizes, printing only one total per argument, and -h prints sizes in human-readable units such as K, M and G.

    Get the human readable size of a file

    We can simply use the du -h command to get the human-readable size of a file:

    du -h <file_name>
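
    A quick way to see both options in action is the sketch below. The folder my_folder and the 64 KB file data.bin are hypothetical, created only so the commands have something to measure.

```shell
#!/bin/sh
# Hypothetical sample data: one 64 KB file inside a folder.
workdir=$(mktemp -d)
mkdir "$workdir/my_folder"
dd if=/dev/zero of="$workdir/my_folder/data.bin" bs=1024 count=64 2>/dev/null

# -h: human-readable size of a single file.
du -h "$workdir/my_folder/data.bin"

# -s with -h: a single summarized total for the folder.
du -sh "$workdir/my_folder"
```

    The exact figures depend on the file system's block size, so the reported size may be somewhat larger than 64K.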

    Get the size of a folder

    To get the size of a folder only,

    [Read More...]
  • Get shell output when calling shell from Python

    We have shown how to call a shell command in Python using the subprocess communicate method. However, that method has a drawback: the output is buffered in memory until the command finishes. When the output is large, we need to print it as it is produced instead. Also, when a shell script runs for a long time, we may want to examine its output in real time to check its status. To get real-time shell output from Python, we can use the following method:

    Suppose we have a shell script, long_shell.sh,  like this:
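
    The original script is not reproduced in this excerpt. As a stand-in, a long-running script might look like the sketch below; the function name long_task, the step count and the one-second delay are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for long_shell.sh: prints one line per step,
# pausing between steps so the output arrives gradually.
long_task() {
    steps=${1:-3}
    i=1
    while [ "$i" -le "$steps" ]; do
        echo "step $i of $steps"
        sleep 1
        i=$((i + 1))
    done
    echo "done"
}

long_task 3
```

    A script like this makes buffering easy to notice: with buffered output all lines appear together at the end, while with real-time output one line appears per second.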

    Then we have the following Python program to call the shell command and print its output to the screen in real time.

    [Read More...]
  • Shell for loop: generate a sequence of numbers in Shell

    The for loop is one of the most frequently used commands in shell. In this post, we show how to generate a sequence of numbers in shell and use a for loop to print out the numbers.

    How to generate a range of numbers in Shell or Bash

    The answer is to use the seq command. Here is the seq command syntax:

    seq LAST
    seq FIRST LAST
    seq FIRST INCREMENT LAST

    Generate a sequence from 1 to 10

    To generate a sequence of numbers from 1 to 10, we can use either seq 10 or seq 1 10.
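
    Combining seq with a for loop, as the post describes, can be sketched as follows. The second loop, with an increment of 5, is an added illustration of the three-argument form.

```shell
#!/bin/sh
# Print the numbers 1 to 10, one per line.
for i in $(seq 1 10); do
    echo "$i"
done

# FIRST INCREMENT LAST: count from 0 to 20 in steps of 5.
for i in $(seq 0 5 20); do
    echo "value: $i"
done
```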

    [Read More...]
  • Run external shell command in Python

    There is often a need to call a shell command from Python directly. In this post, I use examples to show how to run external shell commands in Python.

    The recommended way to call a shell command from Python is to use the subprocess library. See the following example:

    However, this method has a drawback: the output is buffered in memory.

    When the output is large, we need to print it on the fly.

    [Read More...]
  • Start, Restart and Stop Apache web server on Linux

    We often need to start, stop or restart the Apache web server on Linux systems such as Debian and Ubuntu.

    This post uses examples to show how to Start, Restart, or Stop apache web server by using service, systemctl or apache2ctl commands.

    service command

    This method works in most Linux distributions including Debian and Ubuntu.

    To start Apache 2, run:
    $ sudo service apache2 start

    To restart Apache 2, run:
    $ sudo service apache2 restart

    To stop Apache 2, run:
    $ sudo service apache2 stop

    To gracefully reload Apache 2,

    [Read More...]