
After Java is installed, we can install Hadoop.

Create User and Group

1. Create Hadoop Group

    sudo addgroup hadoop

2. Create the Hadoop user and set a password

    sudo adduser --ingroup hadoop hduser

3. Allow hduser to use sudo

    sudo adduser hduser sudo

4. Log out, then log in as hduser

5. Create an SSH key and authorize it for passwordless login to localhost

    ssh-keygen -t rsa -P ""

    cat ~/.ssh/ >> ~/.ssh/authorized_keys

    ssh localhost
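
Before moving on, it can help to confirm that steps 1 to 3 took effect. The following is a minimal sketch, not part of the original procedure: `in_group` is a hypothetical helper that uses the standard `id -nG` command to list a user's groups.

```shell
# Succeeds if user $1 is a member of group $2 (hypothetical helper).
in_group() {
  # id -nG prints the names of all groups the user belongs to.
  for g in $(id -nG "$1" 2>/dev/null); do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Report whether hduser is in the two groups set up above.
for group in hadoop sudo; do
  if in_group hduser "$group"; then
    echo "hduser is in $group"
  else
    echo "hduser is NOT in $group"
  fi
done
```

If either line reports NOT, repeat the corresponding adduser step and log in again so the new group membership is picked up.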

Download and Install Hadoop

1. Download hadoop-1.2.1.tar.gz

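2. Extract the archive into /usr/local

    sudo tar -zxvf hadoop-1.2.1.tar.gz -C /usr/local

3. Rename the extracted directory to hadoop

    cd /usr/local

    sudo mv hadoop-1.2.1/ hadoop

4. Give hduser ownership of the installation

    sudo chown -R hduser:hadoop hadoop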

Configure the User Environment

1. Add the Java and Hadoop paths to .bashrc

    cd ~

    nano .bashrc

        export JAVA_HOME=/opt/jdk1.8.0

        export HADOOP_INSTALL=/usr/local/hadoop

        export PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH

2. Reboot so the new environment takes effect

    sudo reboot

3. Log in as hduser and check that hadoop is on the PATH

    $ hadoop version

    Hadoop 1.2.1
    Subversion -r 1503152
    Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
    From source with checksum 6923c86528809c4e7e6f493b6b413a9a
    This command was run using /usr/local/hadoop/hadoop-core-1.2.1.jar

4. Set JAVA_HOME in the Hadoop configuration

    nano /usr/local/hadoop/conf/

        export JAVA_HOME=/opt/jdk1.8.0

5. Edit core-site.xml

    nano /usr/local/hadoop/conf/core-site.xml

        <?xml version="1.0"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

        <!-- Put site-specific property overrides in this file. -->

        <description>Sets the operating directory for Hadoop data.</description>

        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation.
        The URI's scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The URI's authority is used to
        determine the host, port, etc. for a filesystem.</description>

6. Edit mapred-site.xml

    nano /usr/local/hadoop/conf/mapred-site.xml

        <?xml version="1.0"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

        <!-- Put site-specific property overrides in this file. -->

        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map
        and reduce task.</description>
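
The description above is the one Hadoop 1.x uses for the mapred.job.tracker property, so a complete mapred-site.xml might look like this sketch. The value localhost:54311 is an assumption chosen for illustration, and CONF_DIR is a hypothetical variable that points at a scratch directory by default.

```shell
# Scratch target by default; set CONF_DIR=/usr/local/hadoop/conf on the real machine.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- Assumed host and port; "local" would run jobs in-process instead. -->
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/mapred-site.xml"
```

With a host:port value the JobTracker runs as its own daemon, which is what the JobTracker and TaskTracker entries in the later jps listing suggest.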

7. Edit hdfs-site.xml

    nano /usr/local/hadoop/conf/hdfs-site.xml

        <?xml version="1.0"?>
        <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

        <!-- Put site-specific property overrides in this file. -->

        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.</description>
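
The description above is the one Hadoop uses for dfs.replication. As a hedged sketch, a single-node hdfs-site.xml could set it to 1; that value is an assumption based on this guide running only one DataNode, not something stated in the original file. CONF_DIR is again a hypothetical scratch location.

```shell
# Scratch target by default; set CONF_DIR=/usr/local/hadoop/conf on the real machine.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- Assumed: one copy per block, since there is only one DataNode. -->
    <value>1</value>
    <description>Default block replication.</description>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hdfs-site.xml"
```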



8. Create working directory

    sudo mkdir -p /fs/hadoop/tmp

    sudo chown hduser:hadoop /fs/hadoop/tmp

    sudo chmod 750 /fs/hadoop/tmp/

9. Format the HDFS NameNode

    /usr/local/hadoop/bin/hadoop namenode -format

10. Comment out the "-server" option in /usr/local/hadoop/bin/hadoop

Start Hadoop and Run First Job

1. cd /usr/local/hadoop


2. Run jps; you should see all five Hadoop daemons

    jps

    6828 Jps
    6434 DataNode
    6334 NameNode
    6727 TaskTracker
    6617 JobTracker
    6545 SecondaryNameNode
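
Checking the jps listing by eye is easy to get wrong, so the comparison can be scripted. This is a minimal sketch: `missing_daemons` is a hypothetical helper that reads jps output on standard input and prints any of the five daemons listed above that are absent.

```shell
# Print each expected Hadoop 1.x daemon that does not appear in the jps output on stdin.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # jps prints "<pid> <name>"; compare the second column exactly so that
    # NameNode is not satisfied by SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Demo with the listing shown above; on the Hadoop machine use: jps | missing_daemons
missing=$(missing_daemons <<'EOF'
6828 Jps
6434 DataNode
6334 NameNode
6727 TaskTracker
6617 JobTracker
6545 SecondaryNameNode
EOF
)
if [ -z "$missing" ]; then
  echo "all daemons running"
else
  echo "missing: $missing"
fi
```

If a daemon is reported missing, its log file under the Hadoop logs directory is usually the first place to look.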

3. Download plain text books from Project Gutenberg

4. Copy the books into HDFS

    ./bin/hadoop dfs -copyFromLocal ~pi/books /fs/hduser/books

5. Run the wordcount example

    ./bin/hadoop jar hadoop*examples*.jar wordcount /fs/hduser/books /fs/hduser/books-output

    13/08/25 20:00:21 INFO mapred.JobClient: Running job: job_201308251952_0001
    13/08/25 20:00:22 INFO mapred.JobClient: map 0% reduce 0%
    13/08/25 20:02:06 INFO mapred.JobClient: map 46% reduce 0%
    13/08/25 20:02:11 INFO mapred.JobClient: map 66% reduce 0%
    13/08/25 20:03:19 INFO mapred.JobClient: map 100% reduce 0%
    13/08/25 20:03:32 INFO mapred.JobClient: map 100% reduce 77%
    13/08/25 20:03:35 INFO mapred.JobClient: map 100% reduce 100%
    13/08/25 20:03:59 INFO mapred.JobClient: Job complete: job_201308251952_0001
    13/08/25 20:03:59 INFO mapred.JobClient: Counters: 29
    13/08/25 20:03:59 INFO mapred.JobClient: Job Counters
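
To sanity-check what the job produced, the same count can be approximated locally on a small file. The sketch below assumes the wordcount example splits input on whitespace and writes one "word, tab, count" line per distinct word; `local_wordcount` is a hypothetical helper, not part of Hadoop.

```shell
# Count whitespace-separated words on stdin and print "word<TAB>count" lines.
local_wordcount() {
  tr -s '[:space:]' '\n' |      # one word per line
    grep -v '^$' |              # drop the empty line left by leading whitespace
    LC_ALL=C sort |             # byte-order sort for stable results
    uniq -c |                   # "count word"
    awk '{ print $2 "\t" $1 }'  # reorder to "word<TAB>count"
}

printf 'to be or not to be\n' | local_wordcount
# be	2
# not	1
# or	1
# to	2
```

Running one of the downloaded books through `local_wordcount` and comparing a few words against the files under /fs/hduser/books-output (for example with ./bin/hadoop dfs -cat) gives a quick check that the cluster result is plausible.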


