Capping IT Off

What is Hadoop?

(or Hadoop for dummy architects like me)

I’m sure you’ve heard about Big Data. If not, I recommend my blog post “What is Big Data?”

The best-known technology used for Big Data is Hadoop. It is used by Yahoo, eBay, LinkedIn and Facebook, and it was inspired by Google’s publications on MapReduce, GoogleFS and BigTable. Because Hadoop can be hosted on commodity hardware (usually Intel PCs running Linux, with one or two CPUs and a few TB of hard disk, without any RAID replication technology), it allows them to store huge quantities of data (petabytes or even more) at very low cost compared to SAN bay systems.

Hadoop is an open source suite hosted by the Apache Software Foundation: http://hadoop.apache.org/.

The Hadoop “brand” contains many different tools. Two of them are core parts of Hadoop:

  • Hadoop Distributed File System (HDFS) is a virtual file system that looks like any other file system, except that when you move a file onto HDFS, the file is split into many smaller blocks, and each block is replicated and stored on several servers (3 by default, which can be customized) for fault tolerance.
  • Hadoop MapReduce is a way to split every request into smaller requests that are sent to many small servers, allowing a truly scalable use of CPU power (describing MapReduce would be worth a dedicated post).
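
To make these two core parts more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: the server names, the block size (counted in lines, whereas real HDFS cuts files by byte size) and the round-robin placement are assumptions made only for illustration. It splits a small "file" into blocks, gives each block 3 servers, then counts words with a map step per block and a reduce step that merges the partial results.

```python
from collections import Counter

REPLICATION = 3  # number of copies per block, as described in the text
SERVERS = ["node1", "node2", "node3", "node4"]  # hypothetical server names


def split_into_blocks(lines, lines_per_block=2):
    """Cut a file into blocks. Real HDFS cuts by byte size, not by line."""
    return [lines[i:i + lines_per_block]
            for i in range(0, len(lines), lines_per_block)]


def place_replicas(num_blocks, servers=SERVERS, replication=REPLICATION):
    """Give each block `replication` distinct servers (plain round-robin).

    Assumes replication <= len(servers); real placement is more elaborate.
    """
    return {
        block_id: [servers[(block_id + r) % len(servers)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


def map_block(block):
    """Map step: count the words of a single block."""
    return Counter(word.lower() for line in block for word in line.split())


def reduce_counts(partial_counts):
    """Reduce step: merge the partial counts produced for every block."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


file_lines = [
    "big data needs big storage",
    "hadoop stores data on many servers",
    "hadoop splits work across servers",
]

blocks = split_into_blocks(file_lines)
placement = place_replicas(len(blocks))
word_counts = reduce_counts(map_block(block) for block in blocks)

print(placement)
print(word_counts.most_common(3))
```

In a real cluster each map step would run on one of the servers holding a copy of the block, so the work moves to the data rather than the data to the work. Here everything runs in one process, which is enough to show the shape of the idea.
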
Some other components are often installed on Hadoop solutions:
  • HBase is inspired by Google’s BigTable. It is a non-relational, scalable and fault-tolerant database layered on top of HDFS, and it is written in Java. Each row is identified by a key and consists of an arbitrary number of columns, which can be grouped into column families.
  • ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization and providing group services. ZooKeeper is used by HBase, and can be used by MapReduce programs.
  • Solr / Lucene as the search engine. This query engine library has been developed at Apache for more than 10 years.
  • Languages. Two languages are identified as original Hadoop languages: Pig and Hive. For instance, you can use them to develop MapReduce processes at a higher level than hand-written MapReduce procedures. Other languages may be used, like C, Java or JAQL. SQL can be used too, through JDBC or ODBC connectors (or directly in the languages).
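
The HBase data model described in the list above (a row key, then column families, then an arbitrary number of columns) can be pictured with a small in-memory sketch. This is plain Python, not the HBase API: the class `ToyTable`, its `put` and `get` methods, and the sample row keys and values are all hypothetical names invented for the illustration.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase-style table (not the HBase API)."""

    def __init__(self, families):
        # Column families are declared when the table is created.
        self.families = set(families)
        # row key -> column family -> column name -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        """Store one cell; any column name is accepted inside a known family."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family=None):
        """Return one family of a row, or the whole row if no family is given."""
        row = self.rows.get(row_key, {})
        if family is None:
            return {fam: dict(cols) for fam, cols in row.items()}
        return dict(row.get(family, {}))


users = ToyTable(families=["profile", "activity"])
users.put("user#42", "profile", "name", "Ada")
users.put("user#42", "profile", "city", "Paris")
users.put("user#42", "activity", "last_login", "2012-03-01")
users.put("user#7", "profile", "name", "Linus")

print(users.get("user#42", "profile"))
print(users.get("user#7"))
```

Note that the two rows do not have the same columns: "user#7" has no city and no activity at all. That is the practical meaning of "an arbitrary number of columns", and it is the main difference from a relational table with a fixed schema.
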

Hadoop Architecture

Although the best-known Hadoop distribution is provided by a very specialized player named Cloudera (others come from MapR, Hortonworks and, of course, Apache), the big vendors are positioning themselves on this technology:

  • IBM has BigInsights (the Cloudera distribution plus their own file system, GPFS, as an alternative to HDFS) and has recently acquired many niche players in the analytics and big data market (like Platform Computing, which has a product that enhances the capabilities and performance of MapReduce).
  • Oracle has launched its Big Data Appliance. Also based on Cloudera, this machine is dedicated to storing and using unstructured content (structured content stays on Exadata).
  • Informatica has a tool called HParser that complements PowerCenter. It is built to launch Informatica processes in MapReduce mode, distributed across the Hadoop servers.
  • Microsoft has a dedicated Hadoop version, supported by Apache, for Microsoft Windows and for Azure (their cloud solution), with strong native integration with SQL Server 2012.
  • Some very large database solutions like EMC Greenplum (partnering with MapR), HP Vertica (partnering with Cloudera), Teradata Aster Data (partnering with Hortonworks) or SAP Sybase IQ are able to connect directly to HDFS.
Now that you know what Hadoop is, look at what No Hadoop is...

... and if you want to know more about other Big Data solutions, here is a blog post listing all big data vendors and technologies.

About the author

Manuel Sevilla
Comments
Very nice, sir, but I need more information about Hadoop, like implementation... But what you presented is so good... The concepts are clear to me now.
Thanks for sharing the information...
Hi, thanks for sharing and really a good post with neat explanation!!!
Sir, where has Hadoop been used to date, and where can it be used in the future?
Hadoop is rocking in the current market. Many people are showing interest in learning Big Data Hadoop, as it is an open source suite under Apache.
What are the job opportunities in Hadoop technology?