
Installing Apache Hadoop, Hive, Pig, and Mahout + m2e for Eclipse

Updated on April 01, 2012 at the 14th hour

DISCLAIMER: Expressed views on this blog are my own.

Ever wanted to install Hadoop or Mahout? Let's go. Keep in mind, I'm using Fedora 16.

Apache Hadoop

1. sudo yum install maven java hadoop eclipse --> Eclipse is optional, but I recommend it. I believe installing Eclipse pulls in all the Java dev packages you'd need.

2. vi /usr/etc/hadoop/ and edit the export JAVA_HOME line to reflect where Java is installed. On F16 it is installed in /usr/lib/jvm/java

3. Type '/usr/etc/hadoop/'; there should be no output.

4. I'm assuming a single-node setup here, so type '/usr/sbin/' and you will be asked about 5 questions. I answered y to all of them.

5. Hadoop should now be started.
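If you are not on F16 and are unsure what to put on the JAVA_HOME line in step 2, here is a minimal sketch for finding a candidate. It assumes the java binary is on your PATH and that your system has GNU readlink; the helper name java_home_from_bin is made up for this example and is not part of Hadoop.

```shell
# Hypothetical helper: given the full path to a java binary,
# print the directory that would serve as JAVA_HOME
# (the path with the trailing /bin/java removed).
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Resolve the symlink chain behind `java`, then derive the candidate.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
else
    echo "java not found on PATH" >&2
fi
```

Whatever it prints is only a candidate: check that the directory really contains bin/java before pasting it into the export JAVA_HOME line.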

Additionally, you can install Apache Pig quite easily with sudo yum install pig.

Apache Mahout

I find it easier to install from svn

1. cd /usr or wherever you want

2. mkdir mahout and change directory into it

Straight from the docs

3. svn co ./

4. mvn install or, because there are a number of tests, you can skip them with mvn -DskipTests install

5. Congrats, Mahout is installed. Add Mahout to the path: export PATH=$PATH:/usr/mahout/bin
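The export in step 5 only lasts for the current shell, so you will probably end up putting it in a startup file such as ~/.bashrc. Here is a small sketch that avoids adding the same directory twice each time the file is sourced. The function name path_append is my own invention, and /usr/mahout/bin assumes you checked out into /usr/mahout as above.

```shell
# Append a directory to PATH only if it is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # otherwise add it to the end
    esac
}

# Assumes Mahout lives in /usr/mahout; adjust if you chose another directory.
path_append /usr/mahout/bin
export PATH
```

Running it a second time leaves PATH unchanged, which keeps things tidy if your startup file gets sourced more than once.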

If you want to install Apache Hive, the instructions are practically the same as for Mahout.


Start up Eclipse because we want to install the m2e project.

1. Go to Help > Install New Software

2. Enter - I have 3.7 (indigo)

3. Go to General Purpose Tools and check m2e, which is the Maven integration for Eclipse.

Now you should be able to go to File > Import and see Maven there. Perfecto.


Easy peasy. Now you can attempt those tutorials you've been looking at.

Where might you want to go next?

- Getting started with Apache Pig
- Intro to Mahout
- Creating your first Recommender
- Hadoop Map/Reduce Tutorial

Oh, and grab this book: Mahout in Action

Heck, let me give you some ideas:

Let's say you want to create a recommendation engine. Try implementing one using Mahout, some Java web scraper, and a bit of data labeling with the Naive Bayes classifier. That should be a good enough start to make it interesting.

Or if you already have a bunch of movies you like, implement a recommendation engine using Mahout and IMDb or some other movie rating site.

Want something more complex? Sentiment analysis? Grab a bunch of tweets and figure out a way to identify each tweet as Positive, Negative or Neutral. Heck, even better, grab Amazon reviews (or reviews from your favorite shopping site) and figure out a way to create ratings from reviews or extract important information from them, such as durable / not durable or very cool / traditional, and the list goes on.

