In this post, I describe how to create a Hive table.
Create Table Statement
The syntax for creating a Hive table is as follows:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
For example, suppose you want to create a table named student with the following attributes:
- id, the student’s ID number, of type int
- name, the student’s name, of type String
- age, the student’s age, of type int
- score, the student’s score, of type Float
hive (user_db)> CREATE TABLE IF NOT EXISTS students_db.student (id int,
> name String,
> age int, score Float)
> COMMENT 'Description of the table'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
Time taken: 2.357 seconds
hive (user_db)> describe students_db.student;
id                      int
name                    string
age                     int
score                   float
Time taken: 0.04 seconds, Fetched: 4 row(s)
If you’re not currently working in the target database, qualify the table name with the database name, as in students_db.student above.
To know which Hive database you are currently using, run the following command in the Hive CLI:
set hive.cli.print.current.db=true;
The prompt will then display the current database as hive (db_name)>. To make this behavior permanent, so the database name is displayed automatically whenever you open a Hive session, add the corresponding property to hive-site.xml.
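The hive-site.xml snippet referenced above was omitted from the original post; the standard Hive property that controls this behavior is hive.cli.print.current.db, which can be enabled with an entry like the following:

```xml
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>
```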
In this post, I describe how to create a Hive database, create a database using JDBC, and describe and show Hive databases.
Create Database Statement
A database in Hive is a namespace or a collection of tables. To create a database in Hive, we use the CREATE DATABASE statement. The syntax for this statement is as follows:
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
Here, IF NOT EXISTS is an optional clause that prevents an error if a database with the same name already exists. We can use SCHEMA in place of DATABASE in this command.
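For example, the following statements create a database and then list all databases so you can verify that it exists (the name students_db is just an illustration):

```sql
CREATE DATABASE IF NOT EXISTS students_db;
SHOW DATABASES;
```

Because IF NOT EXISTS is included, re-running the first statement is harmless even if students_db already exists.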
We usually develop programs based on open-source MapReduce frameworks such as Hadoop, Apache Pig, Apache Hive, and Spark to solve Big Data problems. In this post, I will use an example to describe what MapReduce is and how it works. I hope this will make it easier for you to learn Big Data technologies such as Hadoop, Pig, Hive, and Spark.
What is MapReduce?
MapReduce is a key concept in Big Data. It was invented by Google, and it is the heart of Hadoop.
It is a programming paradigm that allows engineers or scientists to build scalable systems that can run on hundreds or thousands of servers.
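The paradigm is easiest to see in word count, the canonical MapReduce example. The sketch below is a minimal single-machine illustration in plain Python (not Hadoop code): the map phase emits a (word, 1) pair for every word, and the reduce phase groups the pairs by word and sums the counts.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle: sort and group the pairs by key (the word).
    # Reduce: sum the counts within each group.
    result = {}
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        result[word] = sum(count for _, count in group)
    return result

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(map_phase(lines))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In a real MapReduce framework, the map calls run in parallel across many servers, the framework performs the sort-and-group shuffle over the network, and the reduce calls run in parallel per key; the program logic stays the same.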
In this post, I will use an example to describe how to use PySpark, showing how to train a Support Vector Machine and use the model to make predictions with Spark MLlib.
The following program was developed in an IPython Notebook. Please refer to this article on how to set up an IPython Notebook server for PySpark if you want to set one up. You can also run the program in other Python IDEs such as Spyder or PyCharm.
A simple example to demonstrate how to use sc, the SparkContext: