HADOOP INSTALLATION:
Step 1-> Check whether Hadoop is already installed by running this command:
hadoop version
If Hadoop is already installed on your system, then you will get a response similar to the following:
Hadoop 2.4.1 Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
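If you prefer to script this check, a minimal sketch (plain POSIX shell; nothing is assumed beyond hadoop being on the PATH when installed):
# command -v succeeds only when the hadoop binary is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
    echo "Hadoop is installed: $(hadoop version | head -n 1)"
else
    echo "Hadoop is not installed; continue with Step 2."
fi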
Step 2-> If Hadoop is not installed on your system, then proceed with the following steps:
use these commands (as root, since /usr/local normally needs superuser rights):
# cd /usr/local
# wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz    (or any later release tarball)
# tar xzf hadoop-2.4.1.tar.gz
# mkdir -p hadoop
# mv hadoop-2.4.1/* hadoop/
OR do it manually: first download the Hadoop tar file hadoop-2.4.1.tar.gz from the Hadoop website,
then extract it and put the contents in the /usr/local/hadoop folder.
Finally set the Hadoop path as described in Step 3 (for Ubuntu use this command: sudo gedit ~/.bashrc).
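As an optional sanity check, the Hadoop tree should now sit directly under /usr/local/hadoop, with bin/, etc/, and sbin/ at the top level (listing shown is for the 2.4.1 tarball used above):
$ ls /usr/local/hadoop
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share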
Step 3-> Set the path with this command: sudo gedit ~/.bashrc
and copy-paste this text (the path must match where you put Hadoop in Step 2, i.e. /usr/local/hadoop; sbin is added because the start/stop scripts live there):
export HADOOP_HOME="/usr/local/hadoop"
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
Now reload the bashrc with this command: . ~/.bashrc
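To confirm the new path took effect, a quick check (the output shown assumes the 2.4.1 install from Step 2; yours will match your version):
$ echo $HADOOP_HOME
/usr/local/hadoop
$ hadoop version
Hadoop 2.4.1
...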
Step 4-> Hadoop Configuration
1-> cd $HADOOP_HOME/etc/hadoop
(this goes into the Hadoop folder, then etc -> hadoop; all the configuration files are inside this folder)
2-> You can find all the Hadoop configuration files in the location “$HADOOP_HOME/etc/hadoop”. You need to make suitable changes in those configuration files according to your Hadoop infrastructure.
3-> In order to develop Hadoop programs in Java, you have to reset the Java environment variable in the hadoop-env.sh file by replacing the JAVA_HOME value with the location of Java on your system. For example:
hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_71
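If you are not sure where Java is installed, one way to find it (assuming java is already on your PATH; the jdk1.7.0_71 path is just the example used above):
# readlink resolves the symlink chain behind the java command;
# dropping the trailing /bin/java gives a JAVA_HOME candidate.
$ readlink -f $(which java)
/usr/local/jdk1.7.0_71/bin/java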
4-> Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration:
Use the following:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
(fs.defaultFS is the Hadoop 2.x name for the deprecated fs.default.name property.)
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml (in Hadoop 2.x, copy mapred-site.xml.template to mapred-site.xml first; MapReduce now runs on YARN, so the old Hadoop 1.x mapred.job.tracker property gives way to mapreduce.framework.name):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
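For MapReduce jobs to actually run on YARN, the Hadoop 2.x single-node guide also enables the shuffle service; a minimal sketch:
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>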
NOTE- Step 5 (setting up passphraseless ssh) is only needed if you cannot already ssh to localhost without a passphrase; the start/stop scripts below use ssh to launch the daemons.
5->Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
(The original Hadoop docs generate a DSA key, but recent OpenSSH releases disable DSA keys, so RSA is the safer choice.)
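To test the result non-interactively (BatchMode makes ssh fail instead of prompting for a passphrase, which is handy in scripts):
$ ssh -o BatchMode=yes localhost echo ok
ok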
6-> Execution
Format a new distributed filesystem:
$ bin/hdfs namenode -format
(bin/hadoop namenode -format still works in Hadoop 2.x but is deprecated in favor of bin/hdfs.)
The expected result is as follows:
10/24/14 21:30:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = [-format]
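By default (with hadoop.tmp.dir left unset), the NameNode formats its storage under /tmp/hadoop-<your username>/dfs/name. To confirm the format worked:
$ ls /tmp/hadoop-$USER/dfs/name/current
# expect a VERSION file and an initial fsimage here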
7-> Start the Hadoop daemons:
$ sbin/start-all.sh
(In Hadoop 2.x the start/stop scripts live in sbin/, not bin/; start-all.sh simply calls start-dfs.sh and start-yarn.sh.)
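Once the scripts finish, jps (which ships with the JDK) should list one Java process per daemon; the process IDs will of course differ on your machine:
$ jps
4851 NameNode
4978 DataNode
5150 SecondaryNameNode
5300 ResourceManager
5410 NodeManager
5740 Jps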
8-> Browse the web interfaces for the daemons; by default they are available at:
NameNode - http://localhost:50070/
ResourceManager - http://localhost:8088/
(Hadoop 1.x had a JobTracker UI on port 50030; in Hadoop 2.x the ResourceManager on port 8088 takes its place.)
If you open http://localhost:50070/ and the NameNode status page loads, then Hadoop has been configured correctly.
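A quick command-line smoke test (the target directory and file are arbitrary examples):
# The NameNode UI answers plain HTTP, so curl works as a headless check.
$ curl -s http://localhost:50070/ >/dev/null && echo "NameNode UI is up"
# Round-trip a small file through HDFS.
$ bin/hdfs dfs -mkdir -p /user/$USER
$ bin/hdfs dfs -put etc/hadoop/core-site.xml /user/$USER/
$ bin/hdfs dfs -ls /user/$USER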
NOTE:
To stop Hadoop:
$HADOOP_HOME/sbin/stop-all.sh