
Installing Apache Hadoop, Hive, Pig, and Mahout + m2e for Eclipse

Updated on April 01, 2012 at the 14th hour

DISCLAIMER: All views are my own, and you should not draw any conclusions about my associates from them.

Ever wanted to install Hadoop or Mahout? Let's go. Keep in mind, I'm using Fedora 16.

Apache Hadoop


1. sudo yum install maven java hadoop eclipse --> Eclipse is optional, but I recommend it. I believe installing Eclipse also pulls in all the Java dev packages you'd need.

2. vi /usr/etc/hadoop/hadoop-env.sh and edit the export JAVA_HOME line to reflect where Java is installed. On F16 it is installed in /usr/lib/jvm/java

3. Type '/usr/etc/hadoop/hadoop-env.sh'; there should be no output.

4. I'm assuming a single-node setup here, so type in '/usr/sbin/hadoop-setup-single-node.sh' and answer the questions (there will be about 5). I answered y to all.

5. Hadoop should now be started.
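
If you would rather script steps 2 and 3 than edit the file by hand, here is a minimal sketch. It is not part of the original walkthrough: set_java_home is a made-up helper name, and it assumes the env file already contains an "export JAVA_HOME=" line (possibly commented out). It needs write access to the file.

```shell
#!/bin/sh
# Hypothetical helper: point the JAVA_HOME line of a hadoop-env.sh style file
# at the given JDK directory. Assumes the file already contains an
# (optionally commented-out) "export JAVA_HOME=" line.
set_java_home() {
    env_file=$1
    java_home=$2

    # Refuse to write a path that has no java binary under it.
    if [ ! -x "$java_home/bin/java" ]; then
        echo "no executable java found under $java_home" >&2
        return 1
    fi

    # Rewrite the line in a temporary copy, then copy the result back
    # so the original file keeps its permissions.
    tmp_file=$(mktemp) || return 1
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Example with the Fedora 16 paths from the steps above:
#   set_java_home /usr/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java
```

The check for bin/java is there so a mistyped path fails before the file is touched.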

Additionally, you can install Apache Pig quite easily with sudo yum install pig.
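
After the yum installs, a quick sanity check is to see which commands actually landed on your PATH. This is only a sketch: missing_tools is a made-up helper, and the command names in the example (java, mvn, hadoop, pig) are my assumption about what the packages provide.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command that
# cannot be found on the PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (command names are assumptions about the installed packages):
#   missing_tools java mvn hadoop pig
```

No output means everything was found; any name printed is a command the shell could not locate.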

Apache Mahout


I find it easier to install from svn

1. cd /usr or wherever you want

2. mkdir mahout and change directory into it

Straight from the docs

3. svn co http://svn.apache.org/repos/asf/mahout/trunk ./

4. mvn install or, because there are a number of tests and they take a while, you can skip them with mvn -DskipTests install

5. Congrats, Mahout is installed. Add Mahout to the path: export PATH=$PATH:/usr/mahout/bin
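
Keep in mind that an export typed at the prompt only lasts for the current shell session. If you put it in a startup file such as ~/.bashrc, a small guard keeps the entry from being appended twice. A minimal sketch, where path_append is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH unless it is already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # not present, add it at the end
    esac
    export PATH
}

# Example with the checkout location used above:
#   path_append /usr/mahout/bin
```

Wrapping PATH in colons before matching means /usr/mahout/bin is only treated as present when it is a whole entry, not a piece of a longer one.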

If you want to install Apache Hive, the instructions are practically the same as these.

Eclipse


Start up Eclipse because we want to install the m2e project.

1. Go to Help > Install New Software

2. Enter http://download.eclipse.org/releases/indigo/ - I have 3.7 (indigo)

3. Go to General Purpose Tools and check m2e, which is the Maven integration for Eclipse.

Now you should be able to go to File > Import and see Maven there. Perfecto.

 

Easy peasy. Now you can attempt those tutorials you've been looking at.

Where might you want to go next?

https://cwiki.apache.org/confluence/display/Hive/GettingStarted - Getting started with Apache Hive

http://www.ibm.com/developerworks/java/library/j-mahout/ - Intro to Mahout

http://code.google.com/p/unresyst/wiki/CreateMahoutRecommender - Creating your first Recommender

http://hadoop.apache.org/common/docs/current/mapred_tutorial.html - Hadoop Map/Reduce Tutorial

Oh, and grab the book Mahout in Action.

Heck, let me give you some ideas:


Let's say you want to create a recommendation engine. Attempt to implement one using Mahout, a Java web scraper to collect the data, and a bit of data labeling with the Naive Bayes classifier. This should be enough of a start to make it interesting.

Or, if you already have a bunch of movies you like, implement a recommendation engine using Mahout and IMDB or some other movie rating site.

Want something more complex? Sentiment analysis? Grab a bunch of tweets and figure out a way to identify each tweet as Positive, Negative, or Neutral. Heck, even better, grab Amazon reviews (or reviews from your favorite shopping site) and figure out a way to create ratings from reviews or extract important information from them, such as durable / not durable or very cool / traditional, and the list goes on.

 

Credits: http://androidyou.blogspot.com/2011/11/mahout-and-hadoop-are-all-java.html