Lead Image © RFsole, Fotolia.com

Lead Image © RFsole, Fotolia.com

OpenStack Sahara brings Hadoop as a Service

Computing Machine

Article from ADMIN 30/2015
By
Apache Hadoop is currently the marketing favorite for Big Data; however, setting up a complete Hadoop environment is not easy. OpenStack Sahara, on the other hand, promises Hadoop at the push of a button.

Scalable cloud environments appear tailor-made for Big Data application Hadoop, putting it squarely in the cloud computing kingdom. Hadoop also has a very potent algorithm at its side: Google's MapReduce [1]. Moreover, the developer of Apache Lucene, Doug Cutting, is also the creator of Hadoop, so the project is not lacking in bona fides.

Hadoop, however, is a very complex structure composed of multiple services and various extensions. Much functionality means high complexity: You have to take many steps between planning a Hadoop installation and having a usable installation. A better, less complex idea is a well-prepared OpenStack service: The OpenStack component Sahara [2] offers Hadoop as a Service.

The promise is that administrators can click together a complete Hadoop environment quickly that is ready to use. Several questions arise: Will you see any benefits from Hadoop if you have not looked thoroughly into the solution in advance? Does Sahara work? Is the Hadoop installation that Sahara produces usable? I tested Sahara to find out.

Hadoop

The heart of Hadoop comprises two parts:

  • The Hadoop Distributed File System (HDFS), a scalable filesystem characterized by its inherent high availability. HDFS primarily works as object-based storage (e.g., Ceph, which might become a replacement for HDFS).
  • The MapReduce algorithm from Google. Map and Reduce are two functions within Hadoop that let you fish in the Big Data pond to find exactly the data needed.

Several components that fall into the "nice to have" category connect these two core components:

  • HBase, a DBMS that runs on top of HDFS. It serves up data from a Hadoop cluster to the outside world.
  • Hive, a data warehouse. The data stored in Hadoop can be
...
Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • The new OpenStack version 2014.1 alias "Icehouse"
    The new OpenStack version "Icehouse" comes with new features and new components, on top of numerous improvements to existing components.
  • Big data tools for midcaps and others
    Hadoop 2.x and its associated tools promise to deliver big data solutions not just to the IT-heavy big players, but to anyone with unstructured data and the need for multidimensional data analysis.
  • Hadoop for Small-to-Medium-Sized Businesses

    Hadoop 2.x and its associated tools promise to deliver big data solutions not just to the IT-heavy big players, but to anyone with unstructured data and the need for multidimensional data analysis.

  • Ubuntu Server 14.04 LTS, 64-Bit
    The 64-bit server install image on this month's CD is for computers with the AMD64 or EM64T architecture (e.g., Athlon64, Opteron, EM64T Xeon, Core 2). Ubuntu Server emphasizes scale-out computing, whether you are administering an OpenStack cloud, a Hadoop cluster, or a massive render farm.
  • The New Hadoop

    Hadoop version 2 expands Hadoop beyond MapReduce and opens the door to MPI applications operating on large parallel data stores.

comments powered by Disqus
Subscribe to our ADMIN Newsletters
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs



Support Our Work

ADMIN content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.

Learn More”>
	</a>

<hr>		    
			</div>
		    		</div>

		<div class=