Table Of Contents
  • Hadoop
  • Architecture of Hive
  • Column Types

Hadoop

Hive is one of several sub-projects in the Hadoop ecosystem. Hadoop itself is an open-source framework for storing and analyzing big data on a distributed platform. It has two main modules:

  • Hadoop Distributed File System (HDFS)

This module handles the storage of datasets and makes them available for processing. It is a fault-tolerant file system that runs on commodity hardware.

  • MapReduce

This module is a parallel programming model for processing large amounts of data on large clusters of commodity hardware. The data it processes can be structured, semi-structured, or unstructured.
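The MapReduce model can be illustrated with a word count. The following is a minimal single-process sketch in Python, not Hadoop's own API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Run the map step over every line, then group and reduce the pairs.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["hive runs on hadoop", "hadoop stores data"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across the nodes of a cluster.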

Architecture of Hive

Hive consists of several components, each with its own function. They are described below:

  • User interface

The user interface lets users interact with Hive and, through it, with data stored in the Hadoop Distributed File System. The interfaces Hive supports include the Hive command line, the Hive Web UI, and Hive HDInsight (on Windows Server).

  • Metastore

Hive stores the schema (the metadata of tables, databases, and so on) in a dedicated database store. This store is known as the Metastore.

  • HiveQL Process Engine

HiveQL is similar to SQL and is used to query the schema information in the Metastore. It is one of the replacements for the traditional approach of writing a MapReduce program: instead of coding a MapReduce job in Java, users write a query, and Hive processes it as a MapReduce task.

  • Execution Engine

The execution engine connects the HiveQL process engine to MapReduce. It processes the query and generates the same results a MapReduce job would produce.

  • HBase or HDFS

Hive stores its data in either HDFS or HBase.
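To make the role of the HiveQL process engine more concrete, the sketch below sets a one-line query beside the kind of map and reduce logic a programmer would otherwise write by hand. It is an illustration only, written in Python rather than Java: the `employees` table, its columns, and the `count_by_dept` helper are all hypothetical, and the code does not show how Hive actually compiles a query.

```python
from collections import Counter

# Hypothetical HiveQL query; the table and column names are illustrative only.
HIVEQL_QUERY = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"

# Hypothetical rows standing in for the contents of the employees table.
employees = [
    {"name": "Asha", "dept": "sales"},
    {"name": "Ben", "dept": "hr"},
    {"name": "Chen", "dept": "sales"},
]


def count_by_dept(rows):
    """Hand-written equivalent of the query above."""
    mapped = (row["dept"] for row in rows)  # map: extract the grouping key
    return dict(Counter(mapped))            # reduce: count rows per key


result = count_by_dept(employees)
print(result)
```

The query states what result is wanted, while the hand-written version spells out how to compute it. That difference is the main convenience HiveQL offers over coding MapReduce jobs directly.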

Column Types

Column types are the data types assigned to columns when a table is created in Hive. They include:

  • Integral types

INT specifies integer data. BIGINT is used when values exceed the range of INT, and SMALLINT when a smaller range is sufficient. TINYINT has the smallest range of all.

  • String types

String values are written in single or double quotes. In addition to the general STRING type, Hive has two string data types with a declared length, CHAR and VARCHAR. C-style escape characters are supported in string literals.
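For the integral types described above, the ranges follow from their storage widths: TINYINT, SMALLINT, INT, and BIGINT are signed integers of 1, 2, 4, and 8 bytes respectively. The Python sketch below derives each range from its width. The helper names `value_range` and `smallest_type_for` are hypothetical and are not part of Hive.

```python
# Storage width in bytes of each Hive integral type (signed two's complement).
INTEGRAL_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def value_range(type_name):
    """Return the (minimum, maximum) value a signed integer of this width holds."""
    bits = INTEGRAL_BYTES[type_name] * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(value):
    """Return the narrowest integral type able to hold value (hypothetical helper)."""
    for name in ("TINYINT", "SMALLINT", "INT", "BIGINT"):
        low, high = value_range(name)
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in any Hive integral type")


for name in INTEGRAL_BYTES:
    print(name, value_range(name))
```

Choosing the narrowest type that fits the data keeps storage small, while BIGINT is reserved for values beyond the 32-bit range of INT.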