In this article, I would like to walk through the basic steps for installing a single-node Hadoop setup on Ubuntu 14.04 LTS. Hadoop is an open-source framework for distributed storage and processing of big data sets using the MapReduce model. Click here to see more detail.

Prerequisites

You can certainly install Hadoop on Windows, but I recommend Ubuntu or another Linux OS that Hadoop supports. If you only have Windows on your local machine, create a virtual machine first.

Install Hadoop

After setting up the virtual machine, we log in to it under an administrator account.

Update packages

This step updates Ubuntu's package lists so that the packages we install below come from the latest available versions.

sudo apt-get update

Ubuntu updating the latest packages

Install the Java JDK

sudo apt-get install default-jdk

Installing the Java JDK libraries
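A quick way to confirm the JDK installed correctly is to print its version; on Ubuntu 14.04 the default-jdk package should report OpenJDK 7:

java -version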

Add Hadoop user group and Hadoop users

This step creates a Hadoop user group and a dedicated user. After the installation is finished, we use this user to work on the Hadoop system.

Hadoop user group: hadoop

Hadoop user: hduser

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser

Installing Hadoop User Group and Hadoop Users
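If you want to double-check that the new account landed in the right group, id should list hadoop among hduser's groups:

id hduser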

Install SSH

To allow the user to access the Hadoop system from client tools without entering a user name and password, we install SSH and generate a key pair for hduser:

sudo apt-get install ssh

su - hduser

ssh-keygen -t rsa -P ""

Then append the public key to the authorized keys so that hduser can reach localhost over SSH without a password:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Installing SSH for the hduser user

Now the system is ready for installing the Hadoop platform on Ubuntu.

Install Hadoop

Download the Hadoop package

ssh localhost

wget -c http://mirrors.maychuviet.vn/apache/hadoop/core/hadoop-2.7.0/hadoop-2.7.0.tar.gz
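Optionally, before extracting, you can list the first few entries of the archive; if the download was cut short, this command fails with an error instead:

tar -tzf hadoop-2.7.0.tar.gz | head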

Create the folder for installing Hadoop

This step creates the /usr/local/hadoop folder that will hold the installed Hadoop files.

sudo mkdir -p /usr/local/hadoop

Extract the downloaded package hadoop-2.7.0.tar.gz:

sudo tar -xzf hadoop-2.7.0.tar.gz

Move the extracted folder to /usr/local/hadoop:

sudo mv hadoop-2.7.0 /usr/local/hadoop

Check the Java installation path

This command lists the installed Java alternatives and shows where the JDK lives; we need that path for the JAVA_HOME variable in the next step.

update-alternatives --config java

Update Hadoop variables

1- Edit .bashrc file

gedit ~/.bashrc

Enter

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Reload .bashrc to update changes

source ~/.bashrc
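As a sanity check that the new variables are active, the following should print the two paths configured above and the Hadoop version (this assumes the package was moved to /usr/local/hadoop/hadoop-2.7.0 as in the earlier step):

echo $JAVA_HOME
echo $HADOOP_HOME
hadoop version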

2- Update the JAVA_HOME path in the Hadoop environment

cd /usr/local/hadoop/hadoop-2.7.0/etc/hadoop/

sudo gedit hadoop-env.sh

Enter

export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"

3- Edit core-site.xml

This step is to update two parameters:

hadoop.tmp.dir: the directory Hadoop uses as the parent for its temporary data files.

fs.defaultFS: the URI of the default file system.

cd /usr/local/hadoop/hadoop-2.7.0/etc/hadoop

sudo gedit core-site.xml

Enter

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>Parent directory for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system.</description>
</property>

Updating core-site.xml
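This is not a required step, but once the variables from .bashrc are loaded you can ask Hadoop which file system it resolved; it should print hdfs://localhost:9000:

hdfs getconf -confKey fs.defaultFS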

Next, we need to create the temporary folder configured in core-site.xml and grant permissions on it to hduser:

sudo mkdir -p /app/hadoop/tmp

sudo chown -R hduser:hadoop /app/hadoop/tmp

sudo chmod 750 /app/hadoop/tmp

MapReduce Configuration

MapReduce is the programming model Hadoop uses to process large data sets in parallel: a map phase turns input records into key/value pairs, and a reduce phase aggregates the values collected for each key.
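As a rough single-machine analogy (not how Hadoop actually runs jobs, just an illustration of the model), a word count can be sketched in plain shell: tr acts as the map step by emitting one word per line, sort acts as the shuffle step by grouping identical keys, and uniq -c acts as the reduce step by aggregating each group:

echo "big data needs big clusters" | tr ' ' '\n' | sort | uniq -c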

Create mapred-site.xml from the bundled template

cd /usr/local/hadoop/hadoop-2.7.0/etc/hadoop

sudo cp mapred-site.xml.template mapred-site.xml

sudo gedit mapred-site.xml

Enter

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Update hdfs-site.xml

This step sets the block replication factor (1, since we have a single node) and the folders where the NameNode and DataNode store their data.

sudo gedit hdfs-site.xml

Enter

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop-2.7.0/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop-2.7.0/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

We need to create the folders based on the configuration above and make hduser their owner:

sudo mkdir -p /usr/local/hadoop/hadoop-2.7.0/hadoop_data/hdfs/namenode

sudo mkdir -p /usr/local/hadoop/hadoop-2.7.0/hadoop_store/hdfs/datanode

sudo chown -R hduser:hadoop /usr/local/hadoop/hadoop-2.7.0

Start Hadoop

Before starting Hadoop, we need to format the HDFS file system (required on the first run only):

hdfs namenode -format

Then start Hadoop by running the command:

start-all.sh
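You can confirm the daemons are up with jps (part of the JDK); after a successful start it should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager alongside their process IDs:

jps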

Check the running processes by browsing to http://localhost:8088, the YARN ResourceManager web UI.

Hadoop Processes
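As a final smoke test, you can run one of the example jobs bundled with the release; the jar path below assumes the standard Hadoop 2.7.0 binary layout under HADOOP_HOME. If everything is wired up, the job finishes and prints an estimated value of Pi:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 2 5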