Hive is one of several sub-projects in the Hadoop ecosystem. Hadoop is an open-source framework for storing and analyzing big data on a distributed platform. It has two main modules:
- Hadoop Distributed File System (HDFS)
This module stores large datasets across a cluster. It is a fault-tolerant file system that runs on commodity hardware.
- MapReduce
This module is a parallel programming model that processes large amounts of data on large clusters of commodity hardware. The data processed by MapReduce can be structured, semi-structured, or unstructured.
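To make the MapReduce model more concrete, here is a minimal in-memory sketch of a word count in plain Python. It uses no Hadoop APIs. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_word_count`) are hypothetical and only mimic the three stages: mapping each record to key/value pairs, grouping the pairs by key, and reducing each group.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical, single-process sketch of the MapReduce model; no Hadoop APIs are used.

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values that share the same key."""
    return key, sum(values)

def run_word_count(records):
    # On a real cluster, different input splits would be mapped on different nodes in parallel.
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    lines = ["Hive runs on Hadoop", "Hadoop stores big data"]
    print(run_word_count(lines))
    # {'hive': 1, 'runs': 1, 'on': 1, 'hadoop': 2, 'stores': 1, 'big': 1, 'data': 1}
```

The mapper and reducer never see each other's data directly. That separation is what allows the framework to run many mappers and reducers in parallel.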
Architecture of Hive
Hive is made up of several components, each performing a specific function. These components are described below:
- User interface
The user interface lets users interact with Hive and, through it, with HDFS. The user interfaces supported by Hive include the Hive command line, the Hive Web UI, and Hive HDInsight (on Windows Server).
- Metastore
Hive stores metadata, such as the schemas of tables and databases, in a separate database. This store is known as the Metastore.
- HiveQL Process Engine
HiveQL is similar to SQL and is used to query schema information in the Metastore. The HiveQL process engine replaces the traditional approach of writing MapReduce programs. Instead of coding a MapReduce job in Java, users write a HiveQL query, and the query is then processed as a MapReduce job.
- Execution Engine
The execution engine connects the HiveQL process engine to MapReduce. It processes the query and generates the same results that an equivalent MapReduce program would produce.
- HDFS or HBase
HDFS or HBase serves as the storage layer in which Hive stores its data.
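The components above can be tied together in a toy, single-process sketch. A dict stands in for the Metastore, and another dict of rows stands in for files in HDFS. `compile_query` plays the role of the HiveQL process engine, and `execute` plays the role of the execution engine. All names, the table, and the storage path are hypothetical. The "compiler" understands only one query shape (`SELECT col, COUNT(*) FROM table GROUP BY col`). Real Hive uses a full parser and query planner and is considerably more involved.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins: a dict for the Metastore and a dict of "files" for HDFS.
METASTORE = {
    "employees": {"columns": ["name", "dept"], "location": "/warehouse/employees"},
}
STORAGE = {
    "/warehouse/employees": [("ana", "eng"), ("bo", "ops"), ("cy", "eng")],
}

# The only query shape this toy compiler accepts (case-insensitive).
QUERY_PATTERN = re.compile(
    r"SELECT\s+(\w+)\s*,\s*COUNT\(\*\)\s+FROM\s+(\w+)\s+GROUP\s+BY\s+(\w+)",
    re.IGNORECASE,
)

def compile_query(sql):
    """Toy 'HiveQL process engine': turn a GROUP BY count into map and reduce functions."""
    match = QUERY_PATTERN.fullmatch(sql.strip())
    if match is None or match.group(1) != match.group(3):
        raise ValueError("unsupported query")
    column, table = match.group(1), match.group(2)
    schema = METASTORE[table]                # schema lookup in the (fake) Metastore
    index = schema["columns"].index(column)  # resolve the column name to a row position

    def mapper(row):
        return [(row[index], 1)]

    def reducer(key, values):
        return key, sum(values)

    return schema["location"], mapper, reducer

def execute(location, mapper, reducer):
    """Toy 'execution engine': run map, shuffle, and reduce over the stored rows."""
    groups = defaultdict(list)
    for row in STORAGE[location]:
        for key, value in mapper(row):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

if __name__ == "__main__":
    plan = compile_query("SELECT dept, COUNT(*) FROM employees GROUP BY dept")
    print(execute(*plan))  # {'eng': 2, 'ops': 1}
```

The sketch separates query translation from execution. This mirrors the division of labour described above: users only write the query, and the map and reduce functions are derived from it.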
Column Types
Column types specify the data types of a table's columns when the table is created in Hive. Some of the column types supported by Hive include:
- Integral types
In order of increasing range, TINYINT, SMALLINT, INT, and BIGINT are signed 1-, 2-, 4-, and 8-byte integers. INT is the usual choice for integer data. Use BIGINT when values exceed the INT range, and SMALLINT or TINYINT when a smaller range is sufficient.
- String types
String literals are enclosed in single or double quotes. In addition to STRING, Hive provides two string data types, VARCHAR and CHAR. Hive supports C-style escape characters in strings.
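The following Python sketch illustrates both column-type rules. It is not Hive code. `smallest_integral_type` picks the narrowest integral type whose signed range holds a value. `parse_string_literal` strips matching single or double quotes and decodes a small, assumed subset of C-style escapes (`\n`, `\t`, `\\`, `\'`, `\"`). Both helper names are hypothetical, and Hive's actual literal parsing recognises more escapes than this.

```python
# Signed widths of Hive's integral types, narrowest first (1, 2, 4, and 8 bytes).
INTEGRAL_TYPES = [("TINYINT", 8), ("SMALLINT", 16), ("INT", 32), ("BIGINT", 64)]

def smallest_integral_type(value):
    """Return the narrowest Hive integral type whose signed range contains value."""
    for name, bits in INTEGRAL_TYPES:
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in BIGINT")

# Assumed subset of C-style escapes; unknown escapes fall back to the escaped character itself.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "'": "'", '"': '"'}

def parse_string_literal(literal):
    """Strip matching single or double quotes and decode the escapes listed above."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("literal must be enclosed in matching quotes")
    body, out, i = literal[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(smallest_integral_type(100))    # TINYINT
    print(smallest_integral_type(40000))  # INT
    print(parse_string_literal(r"'it\'s\tHive'"))
```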