Monday, August 3, 2015

PIG DATA TYPES - PRIMITIVE AND COMPLEX

Pig has a small set of data types, which fall into two categories:
  • Primitive
  • Complex

Primitive Data Types: The primitive data types are also called simple data types. The simple data types that Pig supports are:
  • int : A signed 32-bit integer, similar to Java's Integer.
  • long : A signed 64-bit integer, similar to Java's Long.
  • float : A 32-bit floating-point number, similar to Java's Float.
  • double : A 64-bit floating-point number, similar to Java's Double.
  • chararray : A character array (string) in Unicode UTF-8 format. This corresponds to Java's String.
  • bytearray : A blob of bytes. It is the default data type: if you don't specify a data type for a field, Pig assigns it bytearray.
  • boolean : Represents true/false values.
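
To make the widths above concrete, here is a minimal Python sketch. It is not Pig code: the table PIG_NUMERIC_FORMATS and the helpers bit_width and signed_range are hypothetical names used only for illustration. Each numeric Pig type is paired with a Python struct format code of the same fixed size.

```python
import struct

# Hypothetical lookup table (not part of Pig): each numeric Pig type is paired
# with a struct format code that has the same fixed width.
PIG_NUMERIC_FORMATS = {
    "int": ">i",     # 32-bit signed integer
    "long": ">q",    # 64-bit signed integer
    "float": ">f",   # 32-bit floating point
    "double": ">d",  # 64-bit floating point
}


def bit_width(pig_type):
    """Return the width in bits of the given numeric Pig type."""
    return struct.calcsize(PIG_NUMERIC_FORMATS[pig_type]) * 8


def signed_range(bits):
    """Return (min, max) for a two's-complement signed integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    for name in PIG_NUMERIC_FORMATS:
        print(name, bit_width(name))
    print("int range:", signed_range(bit_width("int")))
```

The range helper shows why a value such as 3000000000 needs long rather than int: it is larger than the 32-bit maximum of 2147483647.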

Complex Types: Pig supports three complex data types. They are listed below:
  • Tuple : An ordered set of fields. A tuple is enclosed in parentheses. Example: (1,2)
  • Bag : An unordered collection of tuples. A bag is enclosed in curly braces. Example: {(1,2),(3,4)}
  • Map : A set of key-value pairs. A map is enclosed in square brackets, and # separates each key from its value. Example: [key#value]

Pig allows nesting of complex data structures. Example: You can nest a tuple inside a tuple, a bag, or a map.
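
The notation for complex types and their nesting can be sketched in Python. This is only a model, not Pig code: a Python tuple stands in for a Pig tuple, a list of tuples for a bag, and a dict for a map, and the function name to_pig_text is hypothetical. A Python list keeps its order, whereas a Pig bag does not guarantee one.

```python
def to_pig_text(value):
    """Render a Python stand-in value using Pig's textual notation.

    Assumed mapping (illustration only):
      tuple -> Pig tuple, written (a,b)
      list  -> Pig bag,   written {(a,b),(c,d)}
      dict  -> Pig map,   written [key#value]
    """
    if isinstance(value, tuple):
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    if isinstance(value, list):
        return "{" + ",".join(to_pig_text(t) for t in value) + "}"
    if isinstance(value, dict):
        pairs = (f"{k}#{to_pig_text(v)}" for k, v in value.items())
        return "[" + ",".join(pairs) + "]"
    return str(value)


if __name__ == "__main__":
    print(to_pig_text((1, 2)))                      # (1,2)
    print(to_pig_text([(1, 2), (3, 4)]))            # {(1,2),(3,4)}
    print(to_pig_text({"key": "value"}))            # [key#value]
    # A tuple that contains a bag and a map, to show nesting.
    print(to_pig_text((1, [(2, 3)], {"k": (4, 5)})))  # (1,{(2,3)},[k#(4,5)])
```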

Null: Null is not a data type. A null represents an unknown or undefined value, which includes values that could not be read as the declared type. Example: Suppose you have declared a field as int, but the data for that field contains character values. When reading this field, Pig converts those corrupted values into nulls. Arithmetic and comparison operations on a null produce null. Null in Pig is similar to NULL in SQL.
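
The null behaviour described above can be imitated in a few lines of Python, with None standing in for a Pig null. This is a rough sketch of the semantics, not how Pig implements them, and the function names cast_to_int and null_safe_add are hypothetical.

```python
def cast_to_int(raw):
    """Imitate reading a field declared as int: bad input becomes None (null)."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        # The value cannot be read as an int, so it is treated as null.
        return None


def null_safe_add(a, b):
    """Imitate null propagation: if either operand is null, the result is null."""
    if a is None or b is None:
        return None
    return a + b


if __name__ == "__main__":
    rows = ["10", "abc", "25"]           # "abc" is the corrupted value
    values = [cast_to_int(r) for r in rows]
    print(values)                         # [10, None, 25]
    print([null_safe_add(v, 1) for v in values])  # [11, None, 26]
```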
