Append to a Hive partition from Pig

With Hive itself, appending data to a table is easy. When we use Pig (through HCatalog) to insert data into a Hive table, however, we are not allowed to append to a partition that already contains data.

In this post, I describe a method that lets you append data to an existing partition by adding a dummy partition column named run. Its value is the run number, that is, which load wrote the data. Because every load gets its own run value, each one lands in a new, empty sub-partition.

For example, we create the following partitioned Hive table:
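A minimal sketch of such a table, written as a shell snippet that saves the DDL to a file. The table name test_append and the columns id and name are hypothetical; only the date and run partition keys come from this post. Declaring both partition columns as STRING and storing the table as ORC are assumptions, and date is quoted with backticks because newer Hive versions may treat it as a reserved word.

```shell
# Save the DDL to a file; the quoted 'EOF' keeps the shell from touching the backticks.
cat > create_table.hql <<'EOF'
-- Hypothetical table: id and name are illustrative columns.
CREATE TABLE IF NOT EXISTS test_append (
  id   INT,
  name STRING
)
PARTITIONED BY (`date` STRING, run STRING)
STORED AS ORC;
EOF

# Create the table only if the Hive CLI is installed on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f create_table.hql
fi
```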

Then the Pig script looks like the following:
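A sketch of what such a script could look like, saved from the shell as append.pig. The file name, the table test_append, the tab-separated input and its two columns are assumptions for illustration. INPUT, DATE and RUN are Pig parameters supplied on the command line, and the class path org.apache.hive.hcatalog.pig.HCatStorer applies to recent Hive releases (older HCatalog builds used org.apache.hcatalog.pig.HCatStorer).

```shell
# Write the Pig script; the quoted 'EOF' leaves the $PARAM placeholders for Pig to fill in.
cat > append.pig <<'EOF'
-- Load a tab-separated file; the schema must match the table's non-partition columns.
raw = LOAD '$INPUT' USING PigStorage('\t') AS (id:int, name:chararray);

-- Store into the (date, run) partition, which must not exist yet.
STORE raw INTO 'default.test_append'
    USING org.apache.hive.hcatalog.pig.HCatStorer('date=$DATE,run=$RUN');
EOF
```

Because the run value is part of the partition specification, each new run number targets a partition that HCatalog still considers empty.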

Now we can run the Pig script using the following command:
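As a sketch, the invocation can be wrapped in a small shell function so that only the date and the run number change between loads. The script name append.pig, the INPUT path and the function name run_append are hypothetical; -useHCatalog, -param and -f are standard options of the pig command.

```shell
# Hypothetical wrapper: append.pig and the INPUT path are placeholders.
run_append() {
  # $1 = value of the date partition, $2 = run number
  pig -useHCatalog \
      -param INPUT=/tmp/input.tsv \
      -param DATE="$1" \
      -param RUN="$2" \
      -f append.pig
}
```

The first load is then run_append 20160605 1, which expands to pig -useHCatalog -param INPUT=/tmp/input.tsv -param DATE=20160605 -param RUN=1 -f append.pig.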

Then we have the following content in the table:

Each time you want to append data to the partition DATE=20160605, you just change the value of RUN.
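Picking the next value of RUN by hand is error-prone, so here is a hedged sketch of a helper that derives it from the existing partitions. It assumes the partition list arrives on standard input in the form that SHOW PARTITIONS prints (one line such as date=20160605/run=1 per partition) and that run values are plain integers. The function name next_run is hypothetical.

```shell
# Print the next unused run number for one date.
next_run() {
  # $1 = date value; stdin = lines like "date=20160605/run=1"
  awk -F'[=/]' -v d="$1" '
    $2 == d && $4 + 0 > max { max = $4 + 0 }   # track the highest run seen for this date
    END { print max + 1 }                      # 1 when the date has no partitions yet
  '
}
```

With a table named test_append (an assumed name), it could be fed with hive -S -e 'SHOW PARTITIONS test_append' | next_run 20160605.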

Let’s run it again to put the same data into the partition, using the following command with RUN=2:

You can see that the data is successfully appended. 
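One way to confirm the append is to count rows per run for that date. This is a sketch that assumes the hypothetical table name test_append and STRING partition columns; it saves the query to a file and runs it only when the Hive CLI is present.

```shell
# Save a per-run row count query; the quoted 'EOF' protects the backticks from the shell.
cat > count_by_run.hql <<'EOF'
SELECT run, COUNT(*) AS row_count
FROM test_append
WHERE `date` = '20160605'
GROUP BY run;
EOF

# Execute the query only if the Hive CLI is installed on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f count_by_run.hql
fi
```

After two loads of the same input, the two run values should report the same row count.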

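When you use this method, you need to add some checks of your own to make sure duplicate data is not inserted into the table. Because every run writes to a new partition, HCatalog will not stop you from loading the same data twice.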
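One possible check, shown here only as a sketch, is to keep a manifest of checksums for the inputs that were already loaded and to skip a load whose checksum is already listed. It assumes the input is available as a local file. The function names and the manifest file are hypothetical and are not part of Pig, Hive or HCatalog.

```shell
# Hypothetical helpers: the manifest is a plain text file with one checksum per line.
input_checksum() {
  # cksum prints "<crc> <bytes>" for stdin; join both so the size is part of the key.
  cksum < "$1" | tr ' ' '-'
}

already_loaded() {
  # $1 = local input file, $2 = manifest file
  [ -f "$2" ] && grep -qxF "$(input_checksum "$1")" "$2"
}

record_load() {
  # Call this only after the Pig job has finished successfully.
  input_checksum "$1" >> "$2"
}
```

A load script would call already_loaded before starting Pig and record_load after the job succeeds, so that a second attempt with the same file is skipped instead of being appended under a new run number.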