Apache Hive | Learn for Master
  • Hive partitioning vs Bucketing

    Hive Bucketing and Partitioning

    To understand how partitioning and bucketing work, first look at how Hive stores data. Let’s say you have a table:

    CREATE TABLE mytable (
      name STRING,
      city STRING,
      employee_id INT
    )
    PARTITIONED BY (year STRING, month STRING, day STRING)
    CLUSTERED BY (employee_id) INTO 256 BUCKETS;

    You insert some data into a partition for 2015-12-02. Hive will then store data in a directory hierarchy, such as:

    /user/hive/warehouse/mytable/year=2015/month=12/day=02

    As such, it is important to be careful when partitioning.
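
    As a rough sketch of the insert described above, the statements below write into the 2015-12-02 partition and then read it back. The source table staging_employees is a hypothetical name, not something from the original post, and the commented-out SET line applies only to Hive releases before 2.0.

```sql
-- On Hive releases before 2.0, enable this first so that inserts honour the
-- bucket definition. Leave it out on 2.0 and later, where the property is gone.
-- SET hive.enforce.bucketing = true;

-- staging_employees is a hypothetical, unpartitioned source table
-- holding the same three data columns as mytable.
INSERT INTO TABLE mytable
PARTITION (year = '2015', month = '12', day = '02')
SELECT name, city, employee_id
FROM staging_employees;

-- Filtering on all partition columns lets Hive read only that
-- partition's directory instead of scanning the whole table.
SELECT name, city
FROM mytable
WHERE year = '2015' AND month = '12' AND day = '02';
```

    Every distinct combination of partition values becomes its own directory, which is one reason to choose partition columns with care.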

    [Read More...]
  • Append to a Hive partition from Pig

    In Hive it is easy to append data to a table. When we use Pig (through HCatalog) to insert data into a Hive table, however, we are not allowed to append to a partition that already contains data.

    In this post, I describe a method for appending data to an existing partition by adding a dummy partition column named run, which holds the run number of each append.
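
    To make the idea concrete, here is a minimal sketch of such a table. The table name events_part and its data columns are placeholders of mine. Only the run partition column and the DATE=20160605 value come from the text.

```sql
-- Hypothetical table. The data columns are placeholders.
-- date is quoted with backticks because DATE is also a type name in Hive.
CREATE TABLE IF NOT EXISTS events_part (
  id INT,
  payload STRING
)
PARTITIONED BY (`date` STRING, run STRING);

-- Each append targets a new (date, run) pair, for example
-- date=20160605/run=1 and later date=20160605/run=2,
-- so no existing partition is ever written twice.

-- Readers filter on date only and see the rows from every run.
SELECT id, payload
FROM events_part
WHERE `date` = '20160605';
```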

    For example, we create the following partitioned Hive table:

    The Pig script then looks like the following:

    Now we can run the Pig script using the following command:

    Then we have the following content in the table:

    Each time you want to append data to the partition DATE=20160605,

    [Read More...]
  • Set variable for hive script

    When we run Hive scripts, for example to load data into a Hive table, we often need to pass parameters to the scripts by defining our own variables.

    Here are some examples that show how to pass parameters or user-defined variables to Hive.

    Use hiveconf for variable substitution

    For example, you can define a variable DATE, then use it as ${hiveconf:DATE}.

    You can even pass the variable from the command line:
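
    A minimal sketch of both approaches follows. The table imps_part, its date column and the script name load_data.hql are assumed names used only for illustration.

```sql
-- Define the variable inside the script or session...
SET DATE=20150321;

-- ...and reference it with the hiveconf: prefix. Substitution is plain
-- text replacement, so string values need their own quotes.
-- imps_part and its date column are assumed names for illustration.
SELECT *
FROM imps_part
WHERE `date` = '${hiveconf:DATE}';

-- To pass the value from the command line instead, drop the SET line
-- and start Hive like this (the script name is hypothetical):
--   hive --hiveconf DATE=20150321 -f load_data.hql
```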

    Use env and system variables

    You can also use env and system variables, for example ${env:USER}.

    You can run the following command to see all the available variables:
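
    The statements below sketch how the env and system prefixes are referenced and how the variables can be listed from the Hive prompt. The SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
-- Environment variables use the env: prefix and JVM system
-- properties use the system: prefix.
SELECT '${env:USER}' AS os_user, '${system:user.name}' AS jvm_user;

-- List the variables that are currently set, with their values.
SET;

-- Add -v to also include all Hadoop and Hive configuration defaults.
SET -v;
```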

    If you are on the Hive prompt,

    [Read More...]
  • An Example to Create a Partitioned Hive Table

    Partitioning is a very useful feature of Hive. Without partitions, it is hard to reuse a Hive table when you store data into it from Apache Pig through HCatalog, because you get exceptions when you insert data into a non-partitioned Hive table that is not empty.

    In this post, I use an example to show how to create a partitioned table and populate it with data.

    Let’s suppose you have a data set of user impressions. A sample of the data set might look like this:

    id  user_id  user_lang  user_device  time_stamp    url               date      country
    1   u1       en         iphone       201503210011  http://xxx/xxx/1  20150321  US
    2   u1       en         ipad         201503220111  http://xxx/xxx/2  20150322  US
    3   u2       en         desktop      201503210051  http://xxx/xxx/3  20150321  CA
    4   u3       en         iphone       201503230021  http://xxx/xxx/4  20150323  HK
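
    A partitioned table for this data set could be declared roughly as follows. This is a sketch under assumptions: the table name imps_part, the choice of date and country as partition columns, the comma delimiter and the staging table imps_raw are all mine, not confirmed by the text above.

```sql
-- Assumed layout: date and country become partition columns, so they are
-- declared in PARTITIONED BY and not in the main column list.
CREATE TABLE IF NOT EXISTS imps_part (
  id INT,
  user_id STRING,
  user_lang STRING,
  user_device STRING,
  time_stamp STRING,
  url STRING
)
PARTITIONED BY (`date` STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Populate it from a hypothetical unpartitioned staging table imps_raw
-- using dynamic partitioning. The partition columns go last in the SELECT.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE imps_part PARTITION (`date`, country)
SELECT id, user_id, user_lang, user_device, time_stamp, url, `date`, country
FROM imps_raw;
```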

    If you use Pig to analyze the data,

    [Read More...]
  • Exceptions When Delete rows from Hive Table

    It’s straightforward to delete data from a traditional relational table using SQL. However, deleting rows from a Hive table can raise several exceptions.

    For example, say we have an imps_part table and want to delete some of its rows. Running a simple DELETE command fails with:

    FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations

    Hive supports DELETE only on transactional (ACID) tables, and only when a transaction manager that supports it is configured.

    Some people suggest using the following command instead:

    This will result in the following exception:
    FAILED: SemanticException 1:23 Need to specify partition columns because the destination table is partitioned.
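
    The statements themselves are not shown above, so the following is only a sketch of what they might look like, together with one way to satisfy the partition requirement. The column names, the partition columns date and country, and the id filter are all assumptions.

```sql
-- A plain delete like this fails with Error 10294 on a non-transactional table:
--   DELETE FROM imps_part WHERE id = 1;

-- Rewriting the table without a PARTITION clause fails with
-- "Need to specify partition columns":
--   INSERT OVERWRITE TABLE imps_part SELECT * FROM imps_part WHERE id <> 1;

-- Naming the partition columns lets the rewrite go through.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE imps_part PARTITION (`date`, country)
SELECT id, user_id, user_lang, user_device, time_stamp, url, `date`, country
FROM imps_part
WHERE id <> 1;

-- Caveat: only partitions that receive at least one row are overwritten.
-- A partition whose rows are all filtered out keeps its old data.
```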

    [Read More...]
  • Save data to Hive table Using Apache Pig

    We have already described how to load data from a Hive table using Apache Pig. In this post, I use an example to show how to save data to a Hive table using Pig.

    Before saving data to Hive, you first need to create a Hive table. Please refer to this post on how to create a Hive table.

    Suppose we use Apache Pig to load some data from a text file. We can then save the data to the Hive table using the following script.

    The store_student.pig script looks like this:

    Note: You must specify the table name in single quotes: STORE data INTO 'tablename'.

    [Read More...]
  • Apache Pig Load ORC data from Hive Table

    Sometimes your data is stored in a Hive table and you want to process it with Apache Pig. In this post, I use an example to describe how to read Hive ORC data using Apache Pig.

    1. We first create a Hive table stored as ORC and load some data into it.
    2. Then, we develop an Apache Pig script to load the data from the Hive ORC table.
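
    Step 1 could look roughly like this in HiveQL. The table names student_orc and student_text and their columns are hypothetical.

```sql
-- Hypothetical table and column names.
CREATE TABLE IF NOT EXISTS student_orc (
  id INT,
  name STRING,
  age INT,
  score FLOAT
)
STORED AS ORC;

-- LOAD DATA only moves files and does not convert formats, so ORC tables are
-- usually filled with INSERT ... SELECT from a table in another format.
-- student_text is an assumed text-format table with the same columns.
INSERT INTO TABLE student_orc
SELECT id, name, age, score
FROM student_text;
```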

    Optimized Row Columnar (ORC) file format

    The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.

    [Read More...]
  • How to get hive table delimiter or schema

    When you have a Hive table, you may want to check its delimiter or more detailed information such as its schema. There are two solutions:

    Get the delimiter of a Hive Table

    To get the field delimiter of a Hive table, we can use the following command:

    Here is an example:
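
    One statement that exposes the delimiter is SHOW CREATE TABLE. Whether this is the command the original post used is an assumption, and student is a placeholder table name.

```sql
-- student is a placeholder table name.
SHOW CREATE TABLE student;

-- For a delimited text table, look in the output for a clause such as
--   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
-- or for a 'field.delim' entry under SERDEPROPERTIES.
```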

    Get the schema of Hive Table

    Another solution is to use: 
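
    DESCRIBE FORMATTED is one statement that fits this description. Again, this is an assumption about which command was meant, and student is a placeholder table name.

```sql
-- student is a placeholder table name.
DESCRIBE FORMATTED student;

-- DESCRIBE EXTENDED student prints similar details in a more compact form.
```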

    This prints complete information about the table.

    [Read More...]
  • How to load data from a text file to Hive table

    In this post, I describe how to insert data from a text file into a Hive table.

    Suppose you have a tab-delimited file:

    Create a Hive table stored as a text file.

    Load the text file (stored locally) into the Hive table:

    Create a Hive table stored as sequence file.

    Now you can load into the sequence table from the text table:
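
    The four steps above can be sketched as follows. The table names, the two columns and the file path /tmp/students.tsv are hypothetical.

```sql
-- Text table whose delimiter matches the tab-delimited input file.
CREATE TABLE IF NOT EXISTS student_text (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOCAL reads from the file system of the client machine.
-- Without it, the path refers to HDFS.
LOAD DATA LOCAL INPATH '/tmp/students.tsv' INTO TABLE student_text;

-- Sequence-file table with the same columns.
CREATE TABLE IF NOT EXISTS student_seq (
  id INT,
  name STRING
)
STORED AS SEQUENCEFILE;

-- LOAD DATA does not convert formats, so copy the rows with a query.
INSERT OVERWRITE TABLE student_seq
SELECT id, name
FROM student_text;
```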

     

    [Read More...]
  • Apache Hive Usage Example – Create Hive Table

    In this post, I describe how to create a Hive Table.

    Create Table Statement

    The syntax and an example for creating a Hive table are as follows:

    Syntax

    For example, suppose you want to create a table student with the following attributes:

    1. id, the student’s ID number, of type INT,
    2. name, of type STRING,
    3. age, of type INT,
    4. score, of type FLOAT.
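
    A table with these four attributes could be created along the following lines. The tab delimiter and the text storage format are assumptions, since the list above specifies only the column names and types.

```sql
CREATE TABLE IF NOT EXISTS student (
  id INT COMMENT 'student ID number',
  name STRING,
  age INT,
  score FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Check the result.
DESCRIBE student;
```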

    Note:

    1.  If you’re not currently working in the target database,
    [Read More...]