Capping IT Off

What is Hadoop?

(or Hadoop for dummy architects like me)

I’m sure you’ve heard about Big Data. If not, I recommend my blog post “What is Big Data?”

The best-known technology used for Big Data is Hadoop. It is used by Yahoo, eBay, LinkedIn and Facebook. It was inspired by Google’s publications on MapReduce, the Google File System and BigTable. Because Hadoop can be hosted on commodity hardware (usually Intel PCs running Linux, with one or two CPUs and a few TB of disk, without any RAID replication technology), it allows these companies to store huge quantities of data (petabytes or even more) at very low cost compared to SAN storage arrays.

Hadoop is an open source suite, hosted by the Apache Software Foundation: http://hadoop.apache.org/.

The Hadoop “brand” contains many different tools. Two of them are core parts of Hadoop:

  • Hadoop Distributed File System (HDFS) is a virtual file system that looks like any other file system, except that when you move a file onto HDFS, the file is split into blocks, and each block is replicated and stored on 3 servers (the usual default, which can be customized) for fault tolerance.
  • Hadoop MapReduce is a way to split every request into smaller requests which are sent to many small servers, allowing a truly scalable use of CPU power (describing MapReduce would be worth a dedicated post).
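
To make these two core parts more concrete, here is a minimal Python sketch. It is a toy simulation, not Hadoop code. It assumes a file is just a list of lines, cuts it into blocks of two lines (real HDFS blocks are fixed-size chunks of bytes, and far larger), places each block on 3 servers with a simple round-robin rule, then counts words the MapReduce way. The server names, the function names and the placement rule are all invented for illustration.

```python
from collections import defaultdict

REPLICATION = 3  # usual default replication factor
SERVERS = ["node1", "node2", "node3", "node4"]  # hypothetical server names


def split_into_blocks(lines, lines_per_block=2):
    """Cut a 'file' (a list of lines) into blocks. Toy rule: N lines per block."""
    return [lines[i:i + lines_per_block]
            for i in range(0, len(lines), lines_per_block)]


def place_replicas(num_blocks, servers=SERVERS, replication=REPLICATION):
    """Assign each block to `replication` distinct servers (assumed round-robin)."""
    return {
        block_id: [servers[(block_id + r) % len(servers)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


def map_phase(block):
    """Map: emit a (word, 1) pair for every word found in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Shuffle: group together all the values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map on every block, then shuffle and reduce the combined output."""
    pairs = []
    for block in split_into_blocks(lines):
        pairs.extend(map_phase(block))  # each block is mapped independently
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    lines = ["big data is big", "hadoop stores big data", "hadoop is open source"]
    blocks = split_into_blocks(lines)
    print(place_replicas(len(blocks)))  # which servers hold each block
    print(word_count(lines))            # word -> number of occurrences
```

Each block is mapped independently of the others, which is why the work can be spread over many servers. On a real cluster, a map task normally runs on a server that already holds a replica of its block.
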
Some other components are often installed on Hadoop solutions:
  • HBase is inspired by Google’s BigTable. HBase is a non-relational, scalable, and fault-tolerant database that is layered on top of HDFS. HBase is written in Java. Each row is identified by a key and consists of an arbitrary number of columns that can be grouped into column families.
  • ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. ZooKeeper is used by HBase, and can be used by MapReduce programs.
  • Solr / Lucene as a search engine. The Lucene search library has been developed at Apache for more than 10 years.
  • Languages. Two languages are identified as original Hadoop languages: Pig and Hive. They let you develop MapReduce processes at a higher level than hand-written MapReduce programs. Other languages may be used, like C, Java or JAQL. SQL can be used too, through JDBC or ODBC connectors (or directly in these languages).
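
The HBase row model described in the list above (a key, then column families, then an arbitrary number of columns) can be pictured as nested dictionaries. The sketch below is only an in-memory illustration under that assumption. `ToyTable` and its methods are hypothetical names, not the HBase API, and the sample rows are made up.

```python
class ToyTable:
    """In-memory stand-in for an HBase-style table (illustration only)."""

    def __init__(self, column_families):
        # Column families are declared up front; columns inside them are free-form.
        self.column_families = set(column_families)
        self.rows = {}  # row key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value):
        """Store one cell, creating the row and the family entry if needed."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get(self, row_key, family=None):
        """Return a whole row, or just one column family of that row."""
        row = self.rows.get(row_key, {})
        return row if family is None else row.get(family, {})


if __name__ == "__main__":
    users = ToyTable(column_families=["info", "stats"])
    users.put("user#42", "info", "name", "Ada")
    users.put("user#42", "info", "city", "Paris")
    users.put("user#42", "stats", "logins", 7)
    users.put("user#43", "info", "name", "Linus")  # this row has fewer columns
    print(users.get("user#42", "info"))
    print(users.get("user#43"))
```

Two rows of the same table do not need to have the same columns, which is the main difference from a relational table.
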
 

 

[Figure: Hadoop Architecture]

 

Although the best-known Hadoop distribution is provided by a very specialized vendor named Cloudera (others come from MapR, Hortonworks and, of course, Apache itself), big vendors are positioning themselves on this technology:

  • IBM has BigInsights (the Cloudera distribution plus GPFS, IBM’s own file system, offered as an alternative to HDFS) and has recently acquired many niche players in the analytics and big data market (like Platform Computing, whose product enhances the capabilities and performance of MapReduce).
  • Oracle has launched the Big Data Appliance. Also based on Cloudera, this machine is dedicated to storing and using unstructured content (structured content stays on Exadata).
  • Informatica has a tool called HParser to complement PowerCenter. This tool is built to run Informatica processes in MapReduce mode, distributed across the Hadoop servers.
  • Microsoft has a dedicated Hadoop version, supported by Apache, for Microsoft Windows and for Azure (their cloud solution), with tight native integration with SQL Server 2012.
  • Some very large database solutions like EMC Greenplum (partnering with MapR), HP Vertica (partnering with Cloudera), Teradata Aster Data (partnering with Hortonworks) or SAP Sybase IQ are able to connect directly to HDFS.
Now that you know what Hadoop is, look at what No Hadoop is...

... and if you want to know more about other Big Data solutions, here is a blog post listing all big data vendors and technologies.

About the author

Manuel Sevilla
37 Comments
I stumbled upon your blog, and liked it so much that I read all your 5 blog posts here in one sitting.
Its crisp and clear; and oozes with end-to-end knowledge. Please continue to put more here.
Also can you please point me to other sites having blogs on BI, DW domain/ news?
".... Google uses a proprietary version of Hadoop." : Should it ideally be "Hadoop is a opensource version of Google's MapReduce framework" ?
msevilla:
You're right Ravi. I have corrected it. Thank you
sir, what is the use hadoop technology ???
In future what is the scope for that.
Got a very clear idea about this technology..
Manuel;
A very helpful entry. Good preparation to see your firms 2012 Road Show next week.
I was looking for some information on Hadoop, landed on your blog and found it justifying to the article name "What is Hadoop?" very crisp and clear.
It provides a good overview for a beginner. Thanks.
This is a very nice and brief way of telling what s hadoop. Very helpful for beginners like me thank you.
Your post is really brief to me. I have read a lot of articles about Hadoop and related topics. But when I read this, I found out that all things I have read at here. Nice job, continue to write about this, mate
msevilla:
Thanks to both of you for your support, I did this post to have something understandable on what is Hadoop, without having code lines inside :) it is good to read your remarks ;)
great , i would like to learn more about this

with regards
vinod
Thx for sharing ...NICE explanation.
This is very good and in simple words for beginners.thanx for sharing valuable information..
nice explanation...
very nyccc...very useful and clear xplaination..
Excellent article, upto the point, short and crisp...would be interested in knowing futher in this space.
I was searching for "What is Hadoop" navigated several sites and finally landed on this one, found the Topic which is brief, crisp and crystl crear
Thanks for your post.
BR
Ashok
Beautiful explanation ... i just loved it .. thank you sir .. now its clear what exactly hadoop is .. but sir can u please tell me how do i proceed with hadoop ... like what next should i do u know more ..
thank u sir .
Hi, its really good information for beginners. Thank you.
Thx for sharing the inforamation:)
I am just start learning hadoop, your blog is very much helpful to me. Thank you.
To add on my knowledge kitty it was a good read to get the basic information of hadoop. Thanks for a nice article.
very helpful site for begineers...thnkx a lot for such good basic explanation.
Hi Manuel,
Thanks for your clear information on Hadoop, but still I have a doubt.. If I wanted to do a prototype project in SAP BI or BO using
Bigdata what all softwares I require..
msevilla:
Kamaal,
The reason I like architecture (and BI) is that the answer to the first question is always the same : It depends :)
More precisely, Business Objects tools connect directly to Hadoop. But the first question, before the tools landscape, is, what will be the usage ?
Then my answer can be more efficient.
Manuel

PS:
And thanks to all these positive comments, really appreciate it
Thanks for u r clear information on hadoop
Hello Sir,
Thanks for your reply, but I wanted to do a project on SAP BI using Bigdata [ twitter,facebook] to generate Bex queries for a particular product, and from here SAP BI will become the source for BO for generating sentimental analysis kind of reprots.
Here my question is how to connect to Bigdata from SAP BI, like what softwares needed or any JDBC/ODBC drivers are availble etc, kindly suggest me.
Thanks in advance
I do accept as true with all the concepts you've presented for your post. They're really convincing and can certainly work. Still, the posts are very quick for starters. Could you please extend them a little from next time? Thanks for the post.
very nice sir, but i need more information about hadoop like implementation...
;But wat the things u presented is so good... I am cleared with the concepts...
Tnx for Shering information......
Hi, i think that i saw you visited my weblog thus i came to “return the favor”.I'm trying to find things to enhance my site!I suppose its ok to use some of your ideas!!
Hi,
thanks for sharing and really a good post with neat explanation!!!
Magnificent goods from you, man. I have understand your stuff previous to and you're just extremely great. I actually like what you have acquired here, really like what you are stating and the way in which you say it. You make it enjoyable and you still take care of to keep it smart. I can't wait to read much
more from you. This is actually a terrific web site.
You can learn hadoop technology in simple language and with example and tutorial from guruzon.com also.they have mentioned small programs from hadoop technology.You can also learn other java related technology from the guruzon.Its includes Hibernate,Spring,Java,J2EE,etc..
sir, where has hadoop being used till date and where it can be used in future??
This is very good artical.I have also find one best and user friendly website for learning java programming languages.Its included all major technologies like hadoop,hibernate,scala and interview questions.

http://guruzon.com/6/introduction
