What is Hadoop?
Before we get to know what big data is, let's first understand what data is.
"The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media."
So 'Big Data' is also data, but of huge size. 'Big Data' is a term used to describe a collection of data that is huge in size and yet keeps growing exponentially with time.
Big data comes in 3 formats:
1- Structured – tabular form with a fixed schema, e.g. MySQL or Oracle tables
2- Unstructured – no predefined model, e.g. free text, images, video
3- Semi-structured – self-describing tags but no rigid schema, e.g. XML
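To make the three formats concrete, here is a small sketch using only Python's standard library; the field names and values are made up for illustration:

```python
import csv, io
import xml.etree.ElementTree as ET

# Structured: tabular rows with a fixed schema (like a MySQL/Oracle table)
structured = io.StringIO("id,name\n1,Alice\n2,Bob\n")
rows = list(csv.DictReader(structured))
print(rows)  # every row has the same columns

# Semi-structured: XML carries its own tags, but no rigid table schema
xml_doc = "<users><user id='1'>Alice</user><user id='2'>Bob</user></users>"
root = ET.fromstring(xml_doc)
print([u.text for u in root])

# Unstructured: free text with no predefined model at all
unstructured = "Alice emailed Bob a photo from the monitoring device."
print(len(unstructured.split()))  # only ad-hoc processing is possible
```

The point of the sketch: structured data can be queried by column, semi-structured data must be navigated by its tags, and unstructured data needs custom parsing.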
Characteristics of big data:
1- Volume – the sheer size of the data
2- Variety – data in many forms: emails, photos, videos, monitoring devices, PDFs, audio, etc.
3- Velocity – the speed at which data flows in
4- Variability – inconsistency in the data, which can hamper processing
That covers big data at a basic level; now let's look at Hadoop itself.
Basic Terminology used in Hadoop:
-> What is Apache Hadoop?
It is an open-source framework for storing and processing large datasets across a cluster of machines.
-> What is HDFS?
Just as data on a personal computer resides in the local file system, data in Hadoop resides in a distributed file system, which we call HDFS (Hadoop Distributed File System).
Traditionally, when we wanted to run an operation or computational logic on some data, we wrote the code and executed it on our local system, moving the data to the program. Hadoop turns this around with the "data locality" concept: the computational logic is sent to the cluster nodes (servers) that hold the data. That computational logic is simply a compiled version of a program written in a high-level language such as Java or Python.
Components
Apache Hadoop consists of two components:
1- MapReduce – the computation model, in which we write the map and reduce logic in Java or another language
2- HDFS – the storage part of Hadoop, made up of a NameNode and DataNodes
MapReduce and HDFS together form the heart of Hadoop.
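To make the MapReduce model concrete, here is a minimal word-count sketch in plain Python (no Hadoop required). The input lines are made up; the map step emits (word, 1) pairs and the reduce step sums the counts per word, with the grouping step standing in for Hadoop's shuffle phase:

```python
from collections import defaultdict

lines = ["big data hadoop", "hadoop hdfs", "big data"]

# Map: emit a (key, value) pair for every word in every input line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group values by key (Hadoop does this between map and reduce).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)
```

In real Hadoop, the map and reduce functions run in parallel on different nodes, but the logical flow is exactly this: map, shuffle, reduce.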
Other Hadoop-related technologies include Hive, HBase, Mahout, Sqoop, Flume, and ZooKeeper.