Table Of Contents
  • Apache Pig Components
  • Pig Latin Runtime Environment
  • Pig Latin Data Model

Apache Pig Components

Apache Pig is built on top of Apache Hadoop and its MapReduce engine. It was developed to handle large and varied datasets. Pig was created at Yahoo and later became an open-source project. The fundamental components of Apache Pig are described below:

  • Parser

The parser handles the Pig Latin script. It checks the syntax, performs type checking, and carries out other validation. Its output is a directed acyclic graph (DAG) that represents the Pig Latin statements: the nodes are logical operators and the edges are the data flows between them.

  • Optimizer

The logical plan (the DAG produced by the parser) is passed to the logical optimizer, which applies optimizations such as projection pushdown. The optimized plan is still represented as a directed acyclic graph.

  • Compiler

The compiler converts the optimized logical plan from the optimizer into a series of MapReduce jobs.
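The flow through these three components can be sketched with a toy model. The Python below is only an analogy, not Pig's internal API: the operator names, the single rewrite rule (moving a filter ahead of a sort), and the map/reduce tagging are all simplifying assumptions, and the "plan" is a plain list, which is the simplest possible DAG (a linear chain).

```python
# Toy model of the parse -> optimize -> compile flow.
# Illustrative only: operator names and rules are assumptions, not Pig's internals.

KNOWN_OPS = {"LOAD", "FILTER", "ORDER", "STORE"}


def parse(script):
    """Check each statement and return a logical plan as a list of (op, arg) steps."""
    plan = []
    for statement in script.strip().splitlines():
        op, _, arg = statement.strip().partition(" ")
        if op not in KNOWN_OPS:
            raise SyntaxError(f"unknown operator: {op!r}")
        plan.append((op, arg))
    return plan


def optimize(plan):
    """Pushdown-style rewrite: move a FILTER ahead of an ORDER that directly precedes it."""
    plan = list(plan)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(plan) - 1):
            if plan[i][0] == "ORDER" and plan[i + 1][0] == "FILTER":
                plan[i], plan[i + 1] = plan[i + 1], plan[i]
                swapped = True
    return plan


def compile_plan(plan):
    """Tag each step with a phase: everything from the first ORDER onward is 'reduce'."""
    phase = "map"
    tasks = []
    for op, arg in plan:
        if op == "ORDER":
            phase = "reduce"  # assumption: sorting is where the reduce side begins
        tasks.append((phase, op, arg))
    return tasks


# Hypothetical four-step script in the toy syntax above (not real Pig Latin).
script = """
LOAD visits
ORDER by_time
FILTER clicks>10
STORE out
"""
tasks = compile_plan(optimize(parse(script)))
for task in tasks:
    print(task)
```

In this sketch the optimizer moves the filter ahead of the sort so that fewer rows need sorting, which is the general idea behind pushdown optimizations. A real compiler would make far more detailed decisions about how steps map onto MapReduce jobs.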

Pig Latin Runtime Environment

Pig Latin is a scripting language that supports Extract, Transform, and Load (ETL) operations. It can also be used to analyze raw data. Like SQL, Pig Latin can load data and dump it in the required structure, typically after applying a variety of filters and constraints. Programs written in Pig Latin require a Java Runtime Environment (JRE). Execution happens on Hadoop: the operations in a script are translated into Map and Reduce stages.
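The load, filter, and dump pattern described above can be illustrated with a small Python analogy. This is not how Pig executes a script; it only mirrors the shape of the data flow. The sample data, the file name in the comments, and the function names are all hypothetical, and the comments show roughly equivalent Pig Latin statements for comparison.

```python
import csv
import io

# Hypothetical input: one "name,age" record per line.
RAW = "alice,34\nbob,17\ncarol,52\n"


def load(text):
    """Read comma-separated records into (name, age) tuples.

    Rough Pig Latin counterpart (file name is an assumption):
    A = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
    """
    return [(name, int(age)) for name, age in csv.reader(io.StringIO(text))]


def filter_adults(rows):
    """Keep only records whose age is at least 18.

    Rough Pig Latin counterpart: B = FILTER A BY age >= 18;
    """
    return [row for row in rows if row[1] >= 18]


def dump(rows):
    """Format each record the way Pig prints a tuple, e.g. (alice,34).

    Rough Pig Latin counterpart: DUMP B;
    """
    return ["(%s,%d)" % row for row in rows]


for line in dump(filter_adults(load(RAW))):
    print(line)
```

Each function corresponds to one statement in the script, which is the main difference from SQL in style: a Pig Latin program is written as a sequence of named steps rather than a single declarative query.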

Pig Latin Data Model

Pig Latin has a fully nested data model. In addition to atomic values, it supports non-atomic (complex) data types such as tuples, bags, and maps, which can be nested inside one another. These non-atomic data types are described below:

  • Tuple

A tuple is a record made up of an ordered set of fields. It is comparable to a row in a Relational Database Management System (RDBMS). Each field can be of any type.

  • Bag

A bag is an unordered collection of tuples, and it may contain duplicates. Each tuple in a bag is flexible and can have any number of fields. A bag is comparable to a table in a Relational Database Management System (RDBMS).

  • Map

Sometimes referred to as a data map, a map is a set of key-value pairs. Each key must be of type chararray and must be unique within the map. The values can be of any type.
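To make the nesting concrete, the three types can be modeled with Python's built-in containers. This is a sketch by analogy, assuming a Python tuple stands in for a Pig tuple, a list of tuples for a bag, and a dict with string keys for a map. The helper `to_pig` is a hypothetical function written for this example; it renders values in the textual notation Pig uses for tuples `( )`, bags `{ }`, and maps `[key#value]`.

```python
def to_pig(value):
    """Render a nested Python value in Pig's textual notation (illustrative helper)."""
    if isinstance(value, tuple):   # tuple: ordered fields of any type
        return "(" + ",".join(to_pig(v) for v in value) + ")"
    if isinstance(value, list):    # bag: collection of tuples, duplicates allowed
        return "{" + ",".join(to_pig(v) for v in value) + "}"
    if isinstance(value, dict):    # map: string keys, values of any type
        for key in value:
            if not isinstance(key, str):
                raise TypeError("map keys must be strings (chararray)")
        return "[" + ",".join(f"{k}#{to_pig(v)}" for k, v in value.items()) + "]"
    return str(value)              # atomic value


# Sample values are made up for illustration.
row = ("alice", 34)                               # a tuple with two fields
bag = [("alice", 34), ("bob",), ("alice", 34)]    # tuples of different lengths, one duplicate
props = {"city": "Atlanta", "visits": 3}          # a map with chararray keys
nested = ("alice", bag, props)                    # a tuple containing a bag and a map

print(to_pig(row))
print(to_pig(bag))
print(to_pig(props))
print(to_pig(nested))
```

A Python dict already enforces key uniqueness, which matches the map rule above. A list is used for the bag because, unlike a Python set, it can hold duplicate tuples. The list does keep insertion order, which a real bag does not guarantee.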