Big Data | Learn for Master - Part 2
  • Apache Hive Usage Example – Create Hive Table

    In this post, I describe how to create a Hive table.

    Create Table Statement

    The syntax and an example for creating a Hive table are as follows:


    For example, suppose you want to create a table named student with the following columns:

    1. id, the student’s ID number, of type INT;
    2. name, of type STRING;
    3. age, of type INT;
    4. score, of type FLOAT.
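    The CREATE TABLE statement for this student table can be sketched as a small Python helper that renders the HiveQL text, which you can then paste into the Hive shell. This is a minimal sketch under stated assumptions: the helper name create_table_sql is hypothetical, and the comma delimiter and TEXTFILE storage format are assumed choices, not details taken from the original post.

```python
from typing import List, Tuple


def create_table_sql(table: str, columns: List[Tuple[str, str]],
                     delimiter: str = ",", if_not_exists: bool = True) -> str:
    """Render a HiveQL CREATE TABLE statement for a delimited text table.

    Hypothetical helper: the delimiter and STORED AS TEXTFILE are assumptions.
    The delimiter is inserted verbatim, so pass "\\t" (backslash + t) for tabs.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    clause = "IF NOT EXISTS " if if_not_exists else ""
    return (
        f"CREATE TABLE {clause}{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE;"
    )


# Column names and Hive types from the student example
STUDENT_COLUMNS = [("id", "INT"), ("name", "STRING"), ("age", "INT"), ("score", "FLOAT")]

if __name__ == "__main__":
    print(create_table_sql("student", STUDENT_COLUMNS))
```

    With these defaults it prints a statement that begins with CREATE TABLE IF NOT EXISTS student (, lists id INT, name STRING, age INT, and score FLOAT, and ends with the ROW FORMAT DELIMITED, FIELDS TERMINATED BY ',' and STORED AS TEXTFILE clauses.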


    1. If you’re not currently working in the target database, …
  • Apache Hive Usage Example – How to Check the Current Hive Database

    To see which Hive database you are currently using, run the following command:

    Then the prompt will display the current database name, in the form hive (DB_name).

    Another method is to edit hive-site.xml and paste in the following code:

    The second method makes the setting permanent: the current Hive database name is displayed automatically every time you start Hive from a terminal.
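    To illustrate the hive-site.xml method, the sketch below uses Python's standard xml.etree.ElementTree module to add the property entry. It assumes the setting in question is hive.cli.print.current.db, the Hive CLI property that controls whether the prompt shows the current database; verify it against your Hive version. The function name enable_current_db_prompt is hypothetical, and because ElementTree drops comments and the XML declaration when it re-serializes the file, editing hive-site.xml by hand is just as good for a real installation.

```python
import xml.etree.ElementTree as ET

# Assumed property name: the Hive CLI setting that shows the current DB in the prompt
PROPERTY = "hive.cli.print.current.db"


def enable_current_db_prompt(xml_text: str) -> str:
    """Return hive-site.xml content with hive.cli.print.current.db set to true.

    Hypothetical helper; expects a <configuration> root element.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY:
            # Property already present: force its value to "true"
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = "true"
            break
    else:
        # Property not present yet: append a new <property> entry
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY
        ET.SubElement(prop, "value").text = "true"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enable_current_db_prompt("<configuration></configuration>"))
```

    Running it twice leaves a single property entry, so the edit is safe to repeat.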

  • Apache Hive Usage Example – Create and Use Database

    In this post, I describe how to create a Hive database, create a database using JDBC, and describe and show Hive databases.

    Create Database Statement

    A database in Hive is a namespace, or a collection of tables. To create a database in Hive, use the CREATE DATABASE statement. The syntax for this statement is as follows:


    Here, IF NOT EXISTS is an optional clause: if a database with the same name already exists, Hive skips the statement instead of raising an error. You can use SCHEMA in place of DATABASE in this command.
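    In Hive, creating a database that already exists fails with an error unless IF NOT EXISTS is given, in which case the statement does nothing. That rule can be modeled with a toy, in-memory catalog. This is not Hive code: HiveCatalogModel, create_database_sql, and the database name userdb are hypothetical, and the model only mirrors the IF NOT EXISTS rule. The helper also shows that SCHEMA can stand in for DATABASE.

```python
class HiveCatalogModel:
    """Hypothetical toy model of Hive's database namespace (not Hive code)."""

    def __init__(self) -> None:
        # Hive always has a database named "default"
        self.databases = {"default"}

    def create_database(self, name: str, if_not_exists: bool = False) -> bool:
        """Return True if created, False if skipped because of IF NOT EXISTS."""
        if name in self.databases:
            if if_not_exists:
                return False  # IF NOT EXISTS: do nothing, no error
            raise ValueError(f"Database {name} already exists")
        self.databases.add(name)
        return True


def create_database_sql(name: str, if_not_exists: bool = False,
                        use_schema: bool = False) -> str:
    """Render the matching HiveQL statement; SCHEMA and DATABASE are interchangeable."""
    keyword = "SCHEMA" if use_schema else "DATABASE"
    clause = " IF NOT EXISTS" if if_not_exists else ""
    return f"CREATE {keyword}{clause} {name};"


if __name__ == "__main__":
    catalog = HiveCatalogModel()
    print(create_database_sql("userdb", if_not_exists=True))
    print(catalog.create_database("userdb"))                      # created
    print(catalog.create_database("userdb", if_not_exists=True))  # skipped
```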

  • What is MapReduce and how it works

    MapReduce for word count

    To solve Big Data problems, we usually develop programs on open-source frameworks built on or inspired by MapReduce, such as Hadoop, Apache Pig, Apache Hive, and Spark. In this post, I use an example to describe what MapReduce is and how it works. I hope this makes Big Data technologies such as Hadoop, Pig, Hive, and Spark easier to learn.

    What is MapReduce?

    MapReduce is a key technique of Big Data processing. It was introduced by Google, and it is at the heart of Hadoop.

    It is a programming paradigm that allows engineers and scientists to build scalable systems that run on hundreds or thousands of servers.
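    The word-count example can be simulated in a single Python process to show the three stages: map emits (word, 1) pairs, shuffle groups the pairs by word, and reduce sums each group. This is a sketch, not Hadoop code: in a real cluster the map and reduce calls run in parallel on many machines and the framework performs the shuffle over the network. The function names are hypothetical, and lowercasing words is an assumed normalization step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input line
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Sum the counts collected for one word
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(docs))
```

    For the three input lines above, word_count returns {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}. Because each map call only looks at its own line and each reduce call only at one word's group, both stages can be spread across servers independently.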

  • Spark MLlib Example

    In this post, I use an example to show how to use PySpark to train a Support Vector Machine (SVM) with Spark MLlib, and then use the model to make predictions.

    The following program was developed in IPython Notebook. If you want to set up an IPython Notebook server for PySpark, please refer to this article. You can also run the program in other Python IDEs such as Spyder or PyCharm.

    A simple example to demonstrate how to use sc (the SparkContext), …
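    Spark MLlib's SVMWithSGD trains a linear SVM by running stochastic gradient descent on the L2-regularized hinge loss, with labels encoded as 0 and 1. The pure-Python sketch below illustrates that idea on a toy dataset without Spark, so it runs anywhere. It is not MLlib's implementation: it simplifies the optimizer to full-batch subgradient descent with a fixed step size, no intercept, and a prediction threshold of 0. The names train_linear_svm and predict, the toy data, and the default parameter values are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# (label in {0, 1}, feature vector), mirroring the 0/1 labels MLlib's SVM expects
LabeledPoint = Tuple[float, List[float]]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(data: List[LabeledPoint], iterations: int = 100,
                     step: float = 0.1, reg_param: float = 0.01) -> List[float]:
    """Full-batch subgradient descent on L2-regularized hinge loss (no intercept)."""
    n = len(data)
    n_features = len(data[0][1])
    w = [0.0] * n_features
    for _ in range(iterations):
        grad = [reg_param * wi for wi in w]  # gradient of (reg_param / 2) * ||w||^2
        for label, x in data:
            y = 1.0 if label > 0 else -1.0  # map {0, 1} labels to {-1, +1}
            if y * dot(w, x) < 1.0:  # point inside the margin: hinge term is active
                for j in range(n_features):
                    grad[j] -= y * x[j] / n
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float], threshold: float = 0.0) -> float:
    # Class 1.0 if the margin w.x exceeds the threshold, else 0.0
    return 1.0 if dot(w, x) > threshold else 0.0


if __name__ == "__main__":
    training = [(1.0, [2.0, 3.0]), (1.0, [3.0, 3.0]),
                (0.0, [-2.0, -1.0]), (0.0, [-3.0, -2.0])]
    weights = train_linear_svm(training)
    print(weights)
    print([predict(weights, x) for _, x in training])
```

    On the four training points it recovers the labels [1.0, 1.0, 0.0, 0.0]. In PySpark itself, the corresponding steps are SVMWithSGD.train on an RDD of LabeledPoint objects, followed by model.predict.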