Monday, August 3, 2015

RELATIONS, BAGS, TUPLES, FIELDS - PIG TUTORIAL

In this article, we will see what a relation, a bag, a tuple and a field are in Pig. Let's look at each of these in detail.

Let's consider the following products dataset as an example:

Id, product_name
-----------------------
10, iphone
20, samsung
30, Nokia

  • Field: A field is a piece of data. In the data set above, product_name is a field. 
  • Tuple: A tuple is an ordered set of fields. Here Id and product_name together form a tuple. Tuples are represented by parentheses. Example: (10,iphone). 
  • Bag: A bag is a collection of tuples. A bag is represented by curly braces. Example: {(10,iphone),(20,samsung),(30,Nokia)}. 
  • Relation: A relation represents the complete data set. A relation is a bag; to be precise, it is an outer bag. We can think of a relation as a bag of tuples.
To compare with an RDBMS, a relation is like a table, and the tuples in the bag correspond to the rows of the table. Unlike a table, however, a Pig relation does not require every tuple to contain the same number of fields, or fields in the same position to have the same data type.
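
The four terms can also be pictured with ordinary Python values. This is only an analogy and not Pig code: the helper names format_tuple and format_bag are made up for this sketch, a Python list stands in for a bag, and the extra tuple (40,) is an invented row added to show that tuples in one bag may differ in length.

```python
def format_tuple(fields):
    """Render a tuple in Pig-style notation, e.g. (10,iphone)."""
    return "(" + ",".join(str(field) for field in fields) + ")"


def format_bag(bag):
    """Render a collection of tuples in Pig-style notation, e.g. {(10,iphone)}."""
    return "{" + ",".join(format_tuple(t) for t in bag) + "}"


# Field: a single piece of data.
product_name = "iphone"

# Tuple: an ordered set of fields.
row = (10, product_name)

# Bag: a collection of tuples. The relation is the outer bag
# that holds every row of the products data set.
products = [row, (20, "samsung"), (30, "Nokia")]

rendered = format_bag(products)
print(rendered)

# Unlike rows in an RDBMS table, tuples in a bag may have
# different numbers of fields. (40,) is a hypothetical row.
ragged = products + [(40,)]
field_counts = [len(t) for t in ragged]
print(field_counts)
```

Printing the bag gives the same notation used in the bullet list above, and field_counts shows that the last tuple carries only one field while the others carry two.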
