Note: This tutorial is a work in progress; more articles will be added in the near future.
What is Apache Hive?
Apache Hive is an open-source data warehouse solution that runs on Hadoop infrastructure. It is used to process large structured datasets and lets you query them with HiveQL, a SQL-like query language.
What is Hive not?
- Hive is not designed for OLTP (online transaction processing)
- It’s not a relational database (RDBMS)
- It’s not intended for row-level updates in real-time systems
Apache Hive Advantages
- Supports large datasets
- Runs on Hadoop infrastructure, which uses commodity hardware
- Supports SQL-like syntax through HiveQL
- Provides HiveServer2, which accepts JDBC, ODBC, and Thrift connections from Java, Scala, C#, Python, and many other languages, along with the Beeline command-line client
Different ways to process Hive data
- MapReduce applications
- Pig scripts
- HiveQL
Hive Installation
Start HiveServer2 & Connect Beeline
- Hive – Start HiveServer2 & Beeline
- Where does Hive store data files in HDFS?
- Connect to Hive using JDBC Connection URL
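As a quick illustration of the connection step listed above, here is a minimal Python sketch that assembles a HiveServer2 JDBC URL and the matching Beeline command line. It assumes the common `jdbc:hive2://host:port/database` form with HiveServer2's default port 10000 and no authentication or transport options. The helper names `hive_jdbc_url` and `beeline_command`, the host `localhost`, and the user `hiveuser` are hypothetical and used only for illustration.

```python
from typing import List


def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a basic HiveServer2 JDBC URL: jdbc:hive2://host:port/database.

    Assumes no extra session or transport settings (Kerberos, SSL, HTTP mode).
    """
    if not host:
        raise ValueError("host must not be empty")
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url: str, user: str) -> List[str]:
    """Return a Beeline invocation as an argument list (-u = URL, -n = username)."""
    return ["beeline", "-u", url, "-n", user]


# Hypothetical host and user, for illustration only.
url = hive_jdbc_url("localhost")
print(url)
print(" ".join(beeline_command(url, "hiveuser")))
```

Secured clusters usually need additional URL parameters, so treat this as a starting point rather than a complete connection recipe.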
Hive Clients
- Hive CLI (Deprecated in newer Hive versions)
- Hive Connect to Beeline
HiveQL DDL Commands
- Hive – Create Database Examples
- Hive – Create Table syntax and usage
- Hive – Drop Table & Database Explained with Examples
- Hive – How to Create Temporary Table Examples
- Hive – Difference Between Managed vs External Tables
HiveQL DML Commands
- Hive – INSERT INTO vs OVERWRITE
- Hive – Load CSV file into Hive Table
- Hive – Export Table into a CSV file
- Hive – Using Variables in Scripts
Hive Partition and Bucket
- Create Partitioned Hive Table
- Load or Insert files into Partitioned Table
- Update and Drop Partition on Partitioned Table
- Show all partitions of the Table
- Hive Bucketing and its Advantages
- Hive Partitioning vs Bucketing
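To make the difference between partitioning and bucketing concrete, the following Python sketch models the two ideas side by side. Partitioning maps each distinct partition value to its own `column=value` subdirectory under the table directory, while bucketing assigns each row to one of a fixed number of buckets by hashing a column. The table path `/user/hive/warehouse/zipcodes`, the column name `state`, and both helper functions are hypothetical, and CRC32 is only a stand-in: Hive uses its own hash function, so the actual bucket numbers will differ.

```python
import zlib
from typing import Dict


def partition_path(table_dir: str, partitions: Dict[str, str]) -> str:
    """Append one key=value directory level per partition column, in order."""
    levels = [f"{col}={val}" for col, val in partitions.items()]
    return "/".join([table_dir.rstrip("/")] + levels)


def bucket_index(key: str, num_buckets: int) -> int:
    """Pick a bucket as hash(key) mod num_buckets.

    CRC32 is an illustrative stand-in, NOT the hash Hive actually uses.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_buckets


# Hypothetical table directory and partition column.
print(partition_path("/user/hive/warehouse/zipcodes", {"state": "AL"}))

# The same key always lands in the same bucket; the bucket count is fixed up front.
for zipcode in ["35004", "35005", "36006"]:
    print(zipcode, "-> bucket", bucket_index(zipcode, 4))
```

The sketch shows why the number of partitions grows with the number of distinct values, whereas the number of buckets stays fixed no matter how many distinct keys exist.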
Hive Java Examples
Hive Scala Examples
Hive Spark Examples
Hive PySpark Examples
Hive Errors and Exceptions
- Hive – HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
- Why Hive Tables Load with Null Values