Hadoop Cluster Setup on CentOS 6
Requirements:
1. Java 1.6.x installed.
2. SSH installed.
Installation & Configuration [MUST be done as the root user]
1. Download the Hadoop RPM file from the official Apache Hadoop website.
2. Install Hadoop:
# rpm -i hadoop_version.rpm
3. Edit the file /etc/hosts on the servers:
192.168.1.40 master
192.168.1.41 slave1
192.168.1.42 slave2
4. We must configure passwordless login from the name node (master) to all data nodes (slave1 and slave2). On all servers, do the following:
- Command: ssh-keygen -t dsa
- Keep pressing ENTER until the id_dsa.pub file is generated.
We now have three .pub files: one on the master and one on each of the two slaves. Copy the contents of all three .pub files into the authorized_keys file on every server; every server's authorized_keys file should end up with the same content (see the command sketch below).
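A minimal sketch of the key exchange, assuming the default ~/.ssh paths and that every node runs as root (the copy mechanism shown here is just one option). After running ssh-keygen on every node, gather and distribute the keys from the master:
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# ssh root@slave1 cat /root/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# ssh root@slave2 cat /root/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# scp ~/.ssh/authorized_keys root@slave1:/root/.ssh/
# scp ~/.ssh/authorized_keys root@slave2:/root/.ssh/
# ssh slave1 hostname    (should now log in without a password prompt)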
5. Open the file /etc/hadoop/hadoop-env.sh and set $JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.6.0_38
6. Open the file /etc/hadoop/core-site.xml and add the following properties. This file configures the name node (default filesystem) information:
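The property values are not reproduced here; a minimal sketch of what core-site.xml typically contains in a Hadoop 1.x setup like this, assuming the "master" hostname from /etc/hosts and HDFS on port 9000:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>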
7. Open the file /etc/hadoop/hdfs-site.xml and add the following properties:
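The properties are not listed in the post; a typical minimal hdfs-site.xml for a small cluster, assuming a replication factor of 2 (which matches the replication column in the HDFS listing shown later), might be:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>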
8. Open the file /etc/hadoop/mapred-site.xml and add the following properties. This file configures the host and port of the MapReduce JobTracker on the name node of the Hadoop setup:
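Again, the exact values are not shown in the post; a common minimal configuration, assuming the conventional JobTracker port 9001 on the master, would be:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>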
9. Open the file /etc/hadoop/masters and add the name node's name: [NAMENODE SERVER ONLY]
master
10. Open the file /etc/hadoop/slaves and add all the data node names: [NAMENODE SERVER ONLY]
/* In case you want the name node to also store data (i.e., the name node also behaves like a data node), it can be listed in the slaves file as well. */
master
slave1
slave2
11. Modify file permissions.
Once Hadoop is installed, start-all.sh, stop-all.sh, and several other scripts are generated under /usr/sbin/; we must make all of them executable:
# sudo chmod a+x file_name
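For example, to cover the scripts used later in this post (the exact set of files under /usr/sbin/ may differ between Hadoop RPM versions):
# sudo chmod a+x /usr/sbin/start-all.sh /usr/sbin/stop-all.sh /usr/sbin/hadoop-daemon.sh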
Notice: steps 9, 10, and 11 are for the master server only; the slaves do not need to perform them.
Start and Stop the Hadoop Cluster (run these commands on the master server)
1. Formatting the namenode:
# hadoop namenode -format
2. Starting the Hadoop Cluster
# start-all.sh
Run the jps command on the master server:
# jps
922 JobTracker
815 SecondaryNameNode
1062 TaskTracker
521 NameNode
1136 Jps
Run the jps command on the slaves:
# jps
7407 DataNode
7521 TaskTracker
7583 Jps
3. Checking the status of the Hadoop Cluster:
(1) Type the command:
# hadoop dfsadmin -report
(2) Browse the web interfaces for the NameNode (master server) and the JobTracker:
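The URLs are not listed in the original post; assuming the default Hadoop 1.x web UI ports, they are typically:
NameNode web UI:   http://master:50070/
JobTracker web UI: http://master:50030/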
4. Process a sample to test the Hadoop Cluster (wordcount example):
(1) Create a directory on the master server:
# mkdir input
(2) Create three test files under the 'input' directory and add the following text to them:
echo "Hello haifzhan" >> text1.txt
echo "Hello hadoop" >> text2.txt
echo "Hello hadoop again" >> text3.txt
(3) Copy the three test files from the master server into the 'input' directory on Hadoop's HDFS:
# hadoop dfs -put ./ input
(4) Now you can check the files on Hadoop's HDFS:
# hadoop dfs -ls input/*
-rw-r--r-- 2 root supergroup 15 2013-04-01 15:03 /user/root/input/text1.txt
-rw-r--r-- 2 root supergroup 13 2013-04-01 15:03 /user/root/input/text2.txt
-rw-r--r-- 2 root supergroup 19 2013-04-01 15:03 /user/root/input/text3.txt
(5) Run the MapReduce job:
# hadoop jar /usr/share/hadoop/hadoop-example-1.0.3.jar wordcount input output
(6) Check the result:
# hadoop dfs -cat output/part-r-00000
Hello 3
again 1
hadoop 2
haifzhan 1
5. Stopping the Hadoop Cluster
# stop-all.sh
Other useful resources:
1. The log files are located in /var/log/hadoop/root.
2. Useful websites:
Error Solving:
1. Datanode: "No route to host" (the datanode starts but then shuts down automatically after a while).
Close the firewall on both the master and slave machines:
# service iptables stop
2. Namenode: How to exit safe mode:
# hadoop dfsadmin -safemode leave
3. How to start the datanode or tasktracker independently:
# hadoop-daemon.sh start datanode
# hadoop-daemon.sh start tasktracker
4. How to check the current Java version and path on your local machine:
# echo $JAVA_HOME
5. "Process information unavailable":
Remove all files under /tmp, reformat the namenode, and restart all servers.