OpenStack Sahara brings Hadoop as a Service
Computing Machine
Hardware for Sahara
Anyone who wants to offer Hadoop as a Service needs powerful CPUs, a generous helping of RAM, and, ideally, fast 10Gb network cards. However, this alone is still not enough; Hadoop is only really fast when it can use fast local storage.
As a reminder, the default configuration of OpenStack places persistent VMs on storage that is attached in the background via iSCSI. This is not very elegant from a technical point of view and, worse, it is very slow. Alternatives such as Ceph offer high throughput. What most of the alternatives have in common, though, is fairly high latency, because the packets always have to traverse the network.
Local storage helps. If the VM runs on a host and uses that host's local storage, the detour via the network is eliminated. Before the most recent OpenStack release (Kilo), however, OpenStack could not tie the placement of a VM to the location of a volume created with Cinder. Administrators therefore had to choose between running a VM on persistent network storage and running it on the local storage of an individual hypervisor, which was not persistent.
In Kilo, the developers retrofitted a long-desired function from which Sahara will also benefit: It is now possible to specify that Cinder should create a volume on the host on which the virtual machine runs. The operator of the Cinder storage might then have to take care of high availability separately, but that should be possible with a detour via DRBD9 [5], for example.
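To make the idea of instance-local volumes more concrete, the following minimal Python sketch builds the kind of request body a client might send to Cinder when creating such a volume. It rests on several assumptions that are not stated in this article: that the Cinder scheduler has its instance locality filter enabled, that the scheduler hint is named `local_to_instance`, and that hints travel in the top-level `OS-SCH-HNT:scheduler_hints` key. The function name, the volume name, and the instance UUID are hypothetical. Check the API reference for your release before relying on any of these details.

```python
import json
import uuid


def local_volume_request(instance_id: str, size_gb: int, name: str) -> dict:
    """Build a volume-create request body asking for instance-local placement."""
    # Reject malformed instance IDs early; uuid.UUID raises ValueError.
    uuid.UUID(instance_id)
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return {
        "volume": {"size": size_gb, "name": name},
        # Assumption: the hint key and the hint name below match your
        # Cinder release and the locality filter is enabled.
        "OS-SCH-HNT:scheduler_hints": {"local_to_instance": instance_id},
    }


if __name__ == "__main__":
    # Hypothetical instance UUID and volume name, for illustration only.
    body = local_volume_request(
        "11111111-2222-3333-4444-555555555555", 100, "hadoop-data"
    )
    print(json.dumps(body, indent=2))
```

The sketch only assembles the JSON document; it does not contact any OpenStack endpoint. Sending it would additionally require authentication against Keystone and the volume endpoint of your cloud.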
Conclusions
Cloud and Big Data go together like hand and glove. After all, it was precisely the large HPC setups that first heralded the triumph of cloud computing. Providers making large amounts of resources available so that customers can operate Hadoop dynamically and flexibly is certainly a very coherent approach. The cloud particularly offers the advantage that the customer can tap into a production environment immediately, instead of first having to deploy a hardware zoo in their racks.
Fortunately, Sahara developers have solved many problems from earlier times. Being able to start a VM on the very host where the volume provisioned by Cinder resides makes Hadoop useful from the outset, because Hadoop only works well with fast storage.
However, one big drawback remains: Currently, only a few providers operate publicly accessible OpenStack clouds, and those that do rarely support Hadoop and do not offer Sahara. Apart from a DIY cloud, you have virtually no option for using Hadoop's genuinely useful functionality in everyday work. This is a shame, because a provider that added Sahara to its portfolio could probably count on a multitude of customers rushing to sign up.
Infos
- [1] MapReduce: https://en.wikipedia.org/wiki/MapReduce
- [2] Sahara: http://docs.openstack.org/developer/sahara/
- [3] Figure 1 acknowledgment: http://docs.openstack.org/developer/sahara/architecture.html
- [4] Build images: http://docs.openstack.org/developer/sahara/userdoc/diskimagebuilder.html
- [5] DRBD: http://drbd.linbit.com/home/what-is-drbd/