How to Set Up a Multi-Node Hadoop 2.0.x (YARN) Cluster

This post describes the steps required to set up a 2-node Hadoop YARN cluster using the Hadoop 2.0.6-alpha release. It is based on these1 posts2 and can be considered a combination of both, with the extra steps needed for Hadoop 2.0.6-alpha. The steps discussed here should also work for Hadoop 2.1.0-beta.

 User accounts, /etc/hosts file modifications and passwordless SSH

You can follow the steps described in this post1, under the section “User creation and other configurations steps”, to set up the necessary user accounts and passwordless SSH. One thing to make sure of is that passwordless SSH works for localhost in addition to the slaves. The other post2 doesn’t mention passwordless SSH for localhost, but it is important: without it, the startup scripts will prompt for a password when starting the data node and node manager.

 Configuring Hadoop

You can follow steps 4, 5, 6, 7 and 8 described in this post2 to configure Hadoop, with the small modifications noted below.

  1. During step 5 in this post2, you also need to add the JAVA_HOME environment variable to hadoop-env.sh.
  2. One of the most important configuration steps is setting yarn.nodemanager.address and yarn.nodemanager.localizer.address in yarn-site.xml during step 7 of this post2. This change is needed only on the master node, because both the resource manager and a node manager run there. Without an address and localizer address specific to the node manager, the node manager will try to bind to the same ports that the resource manager uses.

With the change mentioned above, yarn-site.xml on the master node will look like the following.

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8040</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>master:8050</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.address</name>
    <value>master:8060</value>
  </property>
</configuration>
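
Since the two node manager address properties are needed only on the master, the slave nodes can use a shorter file. The following is only a sketch, assuming the slaves reach the resource manager under the same host name (master) and ports as above. It is the master file minus yarn.nodemanager.address and yarn.nodemanager.localizer.address, so the node manager on each slave falls back to Hadoop's default values for those two settings.

```xml
<?xml version="1.0"?>
<!-- Sketch of yarn-site.xml for a slave node (an assumed layout, not copied from the referenced posts). -->
<configuration>
  <!-- Shuffle service settings, same as on the master -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Where the slave's node manager finds the resource manager on the master -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8040</value>
  </property>
  <!-- No yarn.nodemanager.address or yarn.nodemanager.localizer.address here:
       those overrides are needed only on the master node. -->
</configuration>
```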

 Running Hadoop YARN and Checking Installation

You can follow steps 9, 10, 11 and 12 in this post2 to start and test the installation. The last part of the same post2 lists the web interface URLs and explains how to stop the Hadoop YARN cluster.

This post was moved from my old blog.

 