Hadoop is an open-source framework from Apache that is used to store and process huge volumes of data. Hadoop is written in Java. It is not an online analytical processing (OLAP) system; it is used for batch/offline processing. Hadoop is widely used by Facebook, Yahoo, Google, Twitter, LinkedIn and many more. A Hadoop cluster can be scaled out simply by adding nodes to it.
In Hive tables, data types specify the type of each column (field). All Hive data types fall into the following categories:
1). Primitive Data Types
Primitive data types are divided into four categories: numeric, date/time, string and miscellaneous.
A). Numeric Data Types
Hive numeric data types are further classified into two types: integral and floating.
i). Integral Data Types
The Hive integral data types are as follows:
TINYINT (1-byte (8-bit) signed integer, from -128 to 127)
SMALLINT (2-byte (16-bit) signed integer, from -32,768 to 32,767)
INT/INTEGER (4-byte (32-bit) signed integer, from -2,147,483,648 to 2,147,483,647)
BIGINT (8-byte (64-bit) signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
ii). Floating Data Types
The Hive floating data types are as follows:
FLOAT (4-byte (32-bit) single-precision floating-point number)
DOUBLE (8-byte (64-bit) double-precision floating-point number)
DECIMAL (arbitrary-precision signed decimal number)
B). Date/Time Data Types
Date/time types are the second category of Hive primitive data types. The following data types fall into this category:
TIMESTAMP (date and time, with optional nanosecond precision)
DATE (year, month and day, in the form YYYY-MM-DD)
INTERVAL (a span of time)
C). String Data Types
String types are the third category of Hive primitive data types:
STRING (unbounded variable-length character string)
VARCHAR (variable-length character string with a declared maximum length of 1 to 65,535 characters)
CHAR (fixed-length character string of up to 255 characters)
D). Miscellaneous Data Types
The miscellaneous category contains two data types:
BOOLEAN (true/false value)
BINARY (byte array)
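Each integral range above follows from two's-complement arithmetic: an n-bit signed integer spans -2^(n-1) to 2^(n-1) - 1. The minimal Python sketch below reproduces the ranges from the byte widths and picks the narrowest type that can hold a given value. The names HIVE_INTEGRAL_BYTES, signed_range and smallest_integral_type are hypothetical helpers for illustration, not part of Hive.

```python
# Byte widths of the Hive integral types, narrowest first (from the list above).
HIVE_INTEGRAL_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer num_bytes wide."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_integral_type(value):
    """Return the narrowest Hive integral type whose range contains value, else None."""
    for type_name, num_bytes in HIVE_INTEGRAL_BYTES.items():  # dicts keep insertion order
        low, high = signed_range(num_bytes)
        if low <= value <= high:
            return type_name
    return None  # wider than BIGINT; a DECIMAL column would be needed


for type_name, num_bytes in HIVE_INTEGRAL_BYTES.items():
    print(type_name, signed_range(num_bytes))
```

For example, smallest_integral_type(300) returns "SMALLINT", because 300 lies outside TINYINT's range of -128 to 127.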
2). Complex Data Types
Hive supports the following complex data types:
An array is an ordered collection of fields, all of which must be of the same type. Syntax: ARRAY<data_type>, e.g. array(1, 2).
A map is an unordered collection of key-value pairs. Keys must be of a primitive type; values can be of any type. Syntax: MAP<primitive_type, data_type>, e.g. map('a', 1, 'b', 2).
A struct is a collection of named fields, which may be of different types. Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>, e.g. struct('a', 1, 1.0) or named_struct('col1', 'a', 'col2', 1, 'col3', 1.0).
A union holds a value that may be any one of a number of defined data types. The value is tagged with an integer (zero-indexed) indicating which of the union's data types it has. Syntax: UNIONTYPE<data_type, data_type, ...>, e.g. create_union(1, 'a', 63).
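To make the array, map and struct constructors above concrete, here is a small Python sketch that mimics array(), map(), struct() and named_struct() with plain lists and dicts. The function names are hypothetical stand-ins, not Hive code. The element-type check in hive_array is a deliberate simplification: Hive may instead convert compatible elements to a common type. The col1, col2, ... field names produced by hive_struct follow Hive's naming for struct().

```python
# Hypothetical Python stand-ins for Hive's complex-type constructors.
PRIMITIVE_TYPES = (bool, int, float, str, bytes)


def hive_array(*items):
    """ARRAY<T>: an ordered list whose elements all share one type."""
    if len({type(item) for item in items}) > 1:
        raise TypeError("ARRAY elements must all have the same type")
    return list(items)


def _pairs(args, what):
    """Split alternating name/value arguments into two tuples."""
    if len(args) % 2:
        raise ValueError(f"{what} takes alternating names and values")
    return args[0::2], args[1::2]


def hive_map(*keys_and_values):
    """MAP<K, V>: like map('a', 1, 'b', 2); keys must be primitive values."""
    keys, values = _pairs(keys_and_values, "map()")
    if not all(isinstance(key, PRIMITIVE_TYPES) for key in keys):
        raise TypeError("MAP keys must be of a primitive type")
    return dict(zip(keys, values))


def named_struct(*names_and_values):
    """STRUCT with explicit field names; fields may have different types."""
    names, values = _pairs(names_and_values, "named_struct()")
    return dict(zip(names, values))


def hive_struct(*values):
    """struct(v1, v2, ...): fields are named col1, col2, ..."""
    return {f"col{i}": value for i, value in enumerate(values, start=1)}
```

With these helpers, hive_struct('a', 1, 1.0) and named_struct('col1', 'a', 'col2', 1, 'col3', 1.0) produce the same value, matching the two struct examples above.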
3). Column Types
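Hive has four integral types. An integer literal is treated as INT by default, unless it exceeds the range of INT, in which case it is treated as BIGINT. The postfixes Y, S and L mark TINYINT, SMALLINT and BIGINT literals respectively:
TINYINT, e.g. 100Y
SMALLINT, e.g. 100S
INT/INTEGER, e.g. 100
BIGINT, e.g. 100L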
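The typing rule for integral literals can be sketched in Python as follows. It assumes that a literal with a Y, S or L postfix gets TINYINT, SMALLINT or BIGINT, and that an unsuffixed literal is INT unless it falls outside INT's range, in which case it becomes BIGINT. integral_literal_type is a hypothetical helper, not Hive's parser, and it simply rejects values this rule does not cover.

```python
# Postfix letter -> (Hive type, width in bits)
POSTFIX_TYPES = {"Y": ("TINYINT", 8), "S": ("SMALLINT", 16), "L": ("BIGINT", 64)}


def fits(value, bits):
    """True if value fits in a two's-complement signed integer of the given width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


def integral_literal_type(literal):
    """Return (hive_type, value) for an integral literal such as '100Y' or '100'."""
    if literal and literal[-1] in POSTFIX_TYPES:
        type_name, bits = POSTFIX_TYPES[literal[-1]]
        value = int(literal[:-1])
        if not fits(value, bits):
            raise ValueError(f"{literal} is out of range for {type_name}")
        return type_name, value
    value = int(literal)
    if fits(value, 32):
        return "INT", value
    if fits(value, 64):
        return "BIGINT", value  # too big for INT, so it is promoted
    raise ValueError(f"{literal} does not fit in BIGINT; not covered by this sketch")
```

For example, integral_literal_type("3000000000") returns ("BIGINT", 3000000000) because the value exceeds INT's maximum of 2,147,483,647.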
String literals can be enclosed in either single quotes (') or double quotes ("). Hive uses C-style escaping within strings.
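C-style escaping means a backslash changes the meaning of the character after it, for example \n for a newline or \' for a quote inside a single-quoted string. The Python sketch below unescapes a quoted literal using only a small, assumed subset of escapes. Hive recognizes more than these, and unescape_hive_string is a hypothetical name, not a Hive function.

```python
# A small subset of C-style escapes (assumption: Hive supports more than these).
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}


def unescape_hive_string(literal):
    """Strip the matching quotes from literal and resolve backslash escapes."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("literal must be enclosed in matching single or double quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            escaped = body[i + 1]
            # Unknown escapes are kept verbatim in this sketch.
            out.append(ESCAPES.get(escaped, "\\" + escaped))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Because either quote style is accepted, 'x' and "x" unescape to the same string.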
Hive supports the traditional UNIX timestamp with optional nanosecond precision. Timestamps in text files must use the format "YYYY-MM-DD HH:MM:SS.fffffffff", where the fractional seconds (up to nine digits) are optional.
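Parsing this text format needs some care, because the fractional part can carry up to nine digits while Python's datetime only resolves microseconds. A minimal sketch, assuming the format described above; parse_hive_timestamp is a hypothetical helper that keeps the nanoseconds as a separate integer.

```python
import re
from datetime import datetime

# "YYYY-MM-DD HH:MM:SS" with an optional fraction of up to nine digits (nanoseconds).
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?")


def parse_hive_timestamp(text):
    """Return (datetime to the second, nanoseconds) for a text-file timestamp.

    Python's datetime stops at microseconds, so the nanoseconds are kept separately.
    """
    match = TIMESTAMP_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"expected YYYY-MM-DD HH:MM:SS[.fffffffff], got {text!r}")
    whole_seconds, fraction = match.groups()
    base = datetime.strptime(whole_seconds, "%Y-%m-%d %H:%M:%S")
    # Right-pad the fraction to nine digits: ".5" means 500,000,000 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return base, nanos
```

Keeping the nanoseconds apart avoids silently truncating the last three digits, which would happen if the value were forced into a datetime.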
Hive's DECIMAL type is based on Java's BigDecimal, which represents immutable arbitrary-precision decimal numbers. In Apache Hive 0.11 and 0.12, the precision of the DECIMAL type is fixed and limited to 38 digits. Starting with Apache Hive 0.13, users can specify the precision and scale when creating tables with the DECIMAL data type, using the DECIMAL(precision, scale) syntax. If the scale is not specified, it defaults to 0 (no fractional digits); if the precision is not specified, it defaults to 10. The syntax and an example are below:
CREATE TABLE foo (
  a DECIMAL,        -- defaults to DECIMAL(10,0)
  b DECIMAL(9, 7)
);
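The effect of DECIMAL(precision, scale) can be sketched with Python's decimal module. The sketch assumes that a value with more fractional digits than the scale is rounded half-up, that a value with more than (precision - scale) integer digits cannot be stored (signalled here by returning None as a stand-in for NULL), and that precision is capped at the 38 digits quoted above. to_hive_decimal is a hypothetical helper, not Hive's implementation.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# Generous working precision so 38-digit values never hit the default 28-digit limit.
WORK_CONTEXT = Context(prec=80)


def to_hive_decimal(value, precision=10, scale=0):
    """Sketch of storing value in DECIMAL(precision, scale).

    The defaults mirror a bare DECIMAL, i.e. DECIMAL(10, 0). Returns None
    (standing in for NULL) when the integer digits do not fit.
    """
    if not (1 <= precision <= 38 and 0 <= scale <= precision):
        raise ValueError("need 1 <= precision <= 38 and 0 <= scale <= precision")
    step = Decimal(1).scaleb(-scale)  # scale=2 -> 0.01
    rounded = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP, context=WORK_CONTEXT)
    # At most (precision - scale) digits may sit left of the decimal point.
    if abs(rounded) >= Decimal(1).scaleb(precision - scale):
        return None
    return rounded
```

For column b DECIMAL(9, 7) from the example, only two integer digits remain (9 - 7), so to_hive_decimal(100, 9, 7) returns None while to_hive_decimal("3.14159265", 9, 7) rounds to 3.1415927.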
Union types hold one value from a collection of heterogeneous data types. An instance can be created with create_union. The syntax and example are below:
CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
SELECT foo FROM union_test;
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}
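To illustrate how the tag works, the Python sketch below pairs each value with its zero-based member index and prints it in the {tag:value} form shown in the SELECT output above. The mapping of the union's member types to Python types (INT to int, DOUBLE to float, ARRAY to list, STRUCT to dict) and the names make_union and render_union are assumptions for illustration only.

```python
import json

# Python stand-ins for UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
# (hypothetical mapping: INT -> int, DOUBLE -> float, ARRAY -> list, STRUCT -> dict).
MEMBER_TYPES = (int, float, list, dict)


def make_union(tag, value):
    """Pair value with its zero-based member index, checking it matches that member."""
    if not 0 <= tag < len(MEMBER_TYPES):
        raise ValueError(f"tag {tag} is not a member of this union")
    if type(value) is not MEMBER_TYPES[tag]:
        raise TypeError(f"tag {tag} expects {MEMBER_TYPES[tag].__name__}")
    return tag, value


def render_union(union_value):
    """Format as {tag:value}, writing the value as compact JSON."""
    tag, value = union_value
    return "{%d:%s}" % (tag, json.dumps(value, separators=(",", ":")))


rows = [make_union(0, 1), make_union(1, 2.0),
        make_union(2, ["three", "four"]), make_union(3, {"a": 5, "b": "five"})]
for row in rows:
    print(render_union(row))
```

The tag is stored alongside the value because the same column can hold, for example, an int in one row and an array in the next.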
4). Literals
Hive supports the following literals:
Floating-point literals are numbers with a decimal point. Hive treats them as DOUBLE values.
Decimal literals are floating-point values with a greater range than the DOUBLE type. They store exact representations of numeric values, whereas DOUBLE stores very close approximations within a range of approximately -10^308 to 10^308.
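The difference between approximate DOUBLE values and exact decimal values can be seen in Python, whose float is an IEEE 754 double-precision number like Hive's DOUBLE and whose decimal module does exact decimal arithmetic. This is a Python illustration, not Hive code; the struct round trip emulates a 4-byte FLOAT.

```python
import struct
import sys
from decimal import Decimal

# Python's float uses the same IEEE 754 double format as Hive's DOUBLE.
double_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(double_sum == 0.3)              # False: 0.1 and 0.2 are only approximated in binary
print(decimal_sum == Decimal("0.3"))  # True: decimal arithmetic is exact here
print(sys.float_info.max)             # 1.7976931348623157e+308, the largest double

# Packing into 4 bytes and back shows how much precision FLOAT gives up.
as_float32 = struct.unpack("f", struct.pack("f", 0.1))[0]
print(as_float32)                     # 0.10000000149011612
```

This is why exact decimal values are preferred for money and other values that must compare equal after arithmetic.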
5). Null Value
In Hive, missing values are represented by the special value NULL.
Conclusion
In this blog on Hive data types, we have discussed the primitive and complex data types, column types, literals and NULL values in detail, with examples. This should give you a solid understanding of the data types available in Hive and help you choose the right one for each column.