Big Data | Learn for Master - Part 5
  • How to load data from a text file to Hive table

    In this post, I describe how to load data from a text file into a Hive table.

    Suppose you have a tab-delimited file:

    Create a Hive table stored as a text file.

    Load the text file (stored locally) into the Hive table:

    Create a Hive table stored as a sequence file.

    Now you can load into the sequence table from the text table:
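
    The four steps above could be sketched as follows. This is only an illustration under stated assumptions: the table names, column list, and file path are hypothetical placeholders, and the script merely assembles HiveQL text, which you would then run yourself (for example with the Hive CLI).

```python
# Hypothetical placeholders: none of these names come from the original post.
TEXT_TABLE = "employee_text"
SEQ_TABLE = "employee_seq"
LOCAL_FILE = "/tmp/employee.tsv"
COLUMNS = "id INT, name STRING"


def build_statements(text_table, seq_table, local_file, columns):
    """Return the HiveQL statements for the text-to-sequence-file flow."""
    return [
        # 1. Text table whose fields are separated by tabs.
        f"CREATE TABLE {text_table} ({columns}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TEXTFILE",
        # 2. Load the file from the local file system into the text table.
        f"LOAD DATA LOCAL INPATH '{local_file}' INTO TABLE {text_table}",
        # 3. Table with the same columns, stored as a sequence file.
        f"CREATE TABLE {seq_table} ({columns}) STORED AS SEQUENCEFILE",
        # 4. Copy the rows; Hive rewrites them in the sequence file format.
        f"INSERT OVERWRITE TABLE {seq_table} SELECT * FROM {text_table}",
    ]


statements = build_statements(TEXT_TABLE, SEQ_TABLE, LOCAL_FILE, COLUMNS)
for statement in statements:
    print(statement + ";")
```

    The sequence table is filled with INSERT ... SELECT rather than LOAD DATA because LOAD DATA only moves files into place without converting their format.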


  • Apache Hive Usage Example – Create Hive Table

    In this post, I describe how to create a Hive Table.

    Create Table Statement

    The syntax for creating a Hive table, together with an example, is as follows:


    For example, suppose you want to create a table named student with the following attributes:

    1. id, the student’s ID number, the type is int,
    2. name, the type is String,
    3. age, the type is int,
    4. score, the type is Float.
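
    As a rough sketch, the attribute list above maps onto a CREATE TABLE statement like the one assembled below. The helper create_table_ddl is a hypothetical name used only for illustration, and the statement leaves out storage and row-format clauses, which the full post may specify differently.

```python
# Column names and types taken from the attribute list above.
STUDENT_COLUMNS = [
    ("id", "INT"),
    ("name", "STRING"),
    ("age", "INT"),
    ("score", "FLOAT"),
]


def create_table_ddl(table, columns, if_not_exists=True):
    """Build a basic HiveQL CREATE TABLE statement (hypothetical helper)."""
    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    guard = "IF NOT EXISTS " if if_not_exists else ""
    return f"CREATE TABLE {guard}{table} ({column_list})"


ddl = create_table_ddl("student", STUDENT_COLUMNS)
print(ddl)
# prints: CREATE TABLE IF NOT EXISTS student (id INT, name STRING, age INT, score FLOAT)
```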


    1.  If you’re not currently working in the target database,
  • Apache Hive Usage Example – How to Check the Current Hive Database

    To see which Hive database you are currently using, use the following command:

    The prompt then displays the database name, as in hive (DB_name).

    Another method is to edit hive-site.xml and paste in this code:

    With the second method, the Hive database name is displayed automatically whenever you open a terminal.

  • Apache Hive Usage Example – Create and Use Database

    In this post, I describe how to create a Hive database, create a database using JDBC, and describe and show Hive databases.

    Create Database Statement

    A database in Hive is a namespace, that is, a collection of tables. To create a database in Hive, use the CREATE DATABASE statement. The syntax for this statement is as follows:


    Here, IF NOT EXISTS is an optional clause that makes Hive skip the creation, instead of raising an error, when a database with the same name already exists. SCHEMA can be used in place of DATABASE in this command.

  • What is MapReduce and how it works

    MapReduce for word count

    We usually develop programs on open source MapReduce frameworks such as Hadoop, Apache Pig, Apache Hive, and Spark to solve Big Data problems. In this post, I use an example to describe what MapReduce is and how it works. I hope this helps you learn Big Data technologies such as Hadoop, Pig, Hive, and Spark more easily.

    What is MapReduce?

    MapReduce is a key technology for Big Data. It was introduced by Google, and it is the processing model at the heart of Hadoop.

    It is a programming paradigm that allows engineers and scientists to build scalable systems that can run on hundreds or thousands of servers.
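
    To make the paradigm concrete, here is a minimal single-machine sketch of word count written in the MapReduce style. It is plain Python, not Hadoop code: the function names are illustrative, and a real framework would run the map and reduce steps in parallel across many servers and handle the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

    Because each line is mapped independently and each word is reduced independently, both phases can be split across machines, which is where the scalability comes from.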

  • How to set up an IPython notebook server to run Spark in local or YARN mode

    IPython Notebook is a powerful tool for learning Python programming. In this post, I demonstrate how to set up an IPython notebook server to run Spark programs in Python.

    1. Install Spark.
      Suppose Spark is installed in the directory ~/spark, then execute:
    2. Install Anaconda at ~/anaconda.

      This will compress all the Anaconda files into a zip file.

      Run IPython Notebook for PySpark in local mode

    3. Now you can start an IPython notebook server in local mode. WORKSPACE_DIR is the directory where you want to save your code.
      CONFIG_FILE is the location of the jupyter_notebook_config file.
  • Spark MLlib Example

    In this post, I use an example to describe how to use PySpark, and show how to train a Support Vector Machine and use the model to make predictions with Spark MLlib.

    The following program was developed using IPython Notebook. If you want to set up an IPython notebook server for PySpark, please refer to this article. You can also run the program using other Python IDEs such as Spyder or PyCharm.

    A simple example to demonstrate how to use sc,
