Hadoop Cluster Setup in CentOS 6
Requirements:
1. Java 1.6.x installed.
2. SSH installed.
Installation & Configuration [MUST be done as the root user]
1. Download the Hadoop RPM file from the official Apache Hadoop website.
2. Install Hadoop:
# rpm -i hadoop_version.rpm
3. Edit the file /etc/hosts on all servers:
192.168.1.40 master
192.168.1.41 slave1
192.168.1.42 slave2
4. We must configure passwordless login from the name node (master) to all data nodes (slave1 and slave2). On all servers do the following:
# ssh-keygen -t dsa
Keep pressing ENTER until the id_dsa.pub file is generated.
We now have three .pub files: one on the master and one on each of the two slaves. Copy the contents of all three .pub files into the authorized_keys file; every server's authorized_keys file should end up with the same content.
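A minimal sketch of doing this from the master (assuming the default key path ~/.ssh/id_dsa.pub and that password-based root SSH still works at this point):
# cat ~/.ssh/id_dsa.pub > ~/.ssh/authorized_keys
# ssh root@slave1 cat .ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# ssh root@slave2 cat .ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# scp ~/.ssh/authorized_keys root@slave1:.ssh/
# scp ~/.ssh/authorized_keys root@slave2:.ssh/
Make sure authorized_keys is readable and writable only by its owner (chmod 600), otherwise sshd may ignore it.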
5. Open the file /etc/hadoop/hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.6.0_38
6. Open the file /etc/hadoop/core-site.xml and add the following properties. This file configures the name node (default file system) information:
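For example, a minimal core-site.xml for a Hadoop 1.x cluster with this master/slave layout (the port 9000 is an assumption; use whatever port you prefer):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>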
7. Open the file /etc/hadoop/hdfs-site.xml and add the following properties:
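For example, a minimal hdfs-site.xml that matches the replication factor of 2 visible in the HDFS file listing later in this guide:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>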
8. Open the file /etc/hadoop/mapred-site.xml and add the following properties. This file configures the host and port of the MapReduce JobTracker in the name node of the Hadoop setup:
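For example, a minimal Hadoop 1.x mapred-site.xml pointing the JobTracker at the master (the port 9001 is an assumption):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>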
9. Open the file /etc/hadoop/masters and add the namenode name: [NAMENODE SERVER ONLY]
master
10. Open the file /etc/hadoop/slaves and add all the datanode names: [NAMENODE SERVER ONLY]
/* In case you want the namenode to also store data (i.e. the namenode also behaves like a datanode), it can be listed in the slaves file as well, which is why master appears below. */
master
slave1
slave2
11. Modify file permissions.
Once Hadoop is installed, start-all.sh, stop-all.sh and several other scripts are placed under /usr/sbin/; we must make all of them executable:
# sudo chmod a+x file_name
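For example (the exact set of scripts may vary slightly between Hadoop 1.x versions):
# sudo chmod a+x /usr/sbin/start-all.sh /usr/sbin/stop-all.sh
# sudo chmod a+x /usr/sbin/start-dfs.sh /usr/sbin/stop-dfs.sh
# sudo chmod a+x /usr/sbin/start-mapred.sh /usr/sbin/stop-mapred.sh
# sudo chmod a+x /usr/sbin/hadoop-daemon.sh /usr/sbin/hadoop-daemons.sh /usr/sbin/slaves.sh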
Notice: Steps 9, 10, and 11 are for the master server only; the slaves do not need to do anything for those steps.
Start and Stop the Hadoop Cluster (done on the master server)
1. Formatting the namenode:
# hadoop namenode -format
2. Starting the Hadoop Cluster
# start-all.sh
Run JPS command on master server:
# jps
922 JobTracker
815 SecondaryNameNode
1062 TaskTracker
521 NameNode
1136 Jps
Run JPS command on slaves:
# jps
7407 DataNode
7521 TaskTracker
7583 Jps
3. Checking the status of the Hadoop Cluster:
(1) Type the command:
# hadoop dfsadmin -report
(2) Browse the web interfaces for the NameNode (master server) and the JobTracker:
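With the default Hadoop 1.x ports these are:
NameNode: http://master:50070/
JobTracker: http://master:50030/
(If you changed the HTTP addresses in hdfs-site.xml or mapred-site.xml, use those values instead.)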
4. Process a sample to test the Hadoop Cluster (wordcount example):
(1) Create a directory on the master server:
# mkdir input
(2) Create three test files under the ‘input’ directory and add the following text to them:
echo "Hello haifzhan" >> text1.txt
echo "Hello hadoop" >> text2.txt
echo "Hello hadoop again" >> text3.txt
(3) Copy the test files from the master server to Hadoop’s HDFS under the ‘input’ directory:
# hadoop dfs -put ./ input
(4) Now you can check the files on Hadoop’s HDFS:
# hadoop dfs -ls input/*
-rw-r--r-- 2 root supergroup 15 2013-04-01 15:03 /user/root/input/text1.txt
-rw-r--r-- 2 root supergroup 13 2013-04-01 15:03 /user/root/input/text2.txt
-rw-r--r-- 2 root supergroup 19 2013-04-01 15:03 /user/root/input/text3.txt
(5) Run the MapReduce job:
# hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar wordcount input output
(6) Check the result:
# hadoop dfs -cat output/part-r-00000
Hello 3
again 1
hadoop 2
haifzhan 1
5. Stopping the Hadoop Cluster
# stop-all.sh
Other useful resources:
1. The log files are located in /var/log/hadoop/root
2. Useful websites:
Error Solving:
1. Datanode: No route to host (the datanode starts but then shuts down automatically after a while)
Close the firewall on both the master and the slave machines:
# service iptables stop
2. Namenode: How to exit safe mode
# hadoop dfsadmin -safemode leave
3. How to start the datanode or tasktracker independently
# hadoop-daemon.sh start datanode
# hadoop-daemon.sh start tasktracker
4. How to check the current Java version and path on your local machine
# echo $JAVA_HOME
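The JDK version itself can be checked with:
# java -version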
5. Process information unavailable (in jps output)
Remove all files under /tmp, reformat the namenode, and restart all servers.
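A minimal sketch of that recovery sequence (assuming nothing else important is kept under /tmp):
# stop-all.sh
then on every server:
# rm -rf /tmp/*
then back on the master:
# hadoop namenode -format
# start-all.sh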