Big data tools for midcaps and others

Arithmagician

Numbering

Hadoop developers have not done themselves any great favors with their choice of version numbers. If you identify remarkable development steps with a version number change in the second decimal place – say, from version 0.20 to version 0.23  – you should not be surprised if users take very little note of your progress.

Version 0.20 designates the first generation Hadoop (v1.0, Figure 6). Whenever you hear people refer to the 0.23 branch (Figure 7), they are talking about Hadoop 2.2.x. The version number 2.2.0 refers to the first release of the second generation with general availability.

Figure 6: Just for comparison's sake: In Hadoop 1.x, all applications were dependent on the use of MapReduce.
Figure 7: Hadoop 2.x stack with Apache Tez: performance improvements with data processing in cluster memory.

An enterprise-class product that has proven itself in the tough big data daily grind for years should have earned a higher version number. The seemingly minimal generation leap to version 2.2.0 does little justice to the significantly advanced maturity of Hadoop; however, the hesitant numbering does not detract from the quality of Hadoop.

The growing importance of Hadoop is evidenced by the many prominent providers that are adapting their commercial solutions for Hadoop. SAP sells the Intel distribution and the Hortonworks Data Platform for Hadoop. SAP Hana, a data analysis platform for big data, seamlessly integrates with Hadoop. DataStax provides a distribution of Hadoop and Solr with its own NoSQL solution, DataStax Enterprise. Users of DataStax Enterprise use Hadoop for data processing, Apache Cassandra as a database for transactional data, and the Solr search engine for distributed searching. Incidentally, Cassandra lets you run Hadoop MapReduce jobs on a Cassandra cluster.

Conclusions

After four years of development, Hadoop 2.2.0 surprises its users with some groundbreaking innovations. Thanks to its modularity and HA HDFS, the Apache Foundation has succeeded in keeping ahead of the field and even extended its lead.

Because of its significantly improved management of workloads, Hadoop has become much more attractive for midcaps. Large companies have always been able to tailor any additional functions they needed, whereas painstaking development has always overtasked the midcaps. This situation has now finally changed with Hadoop 2.2.0.

Infos

  1. Apache Hadoop: http://hadoop.apache.org/
  2. YARN: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html
  3. Kognitio: http://kognitio.com
  4. Archer Technologies: http://www.archerits.com/
  5. Cloudera Hadoop distribution: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
  6. Hortonworks Hadoop distribution: http://hortonworks.com/products/hadoop-support
  7. Stratosphere Hadoop distribution by the Technical University of Berlin: https://github.com/stratosphere/stratosphere
  8. Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/
  9. Comparison of Hadoop distributions by MapR Technologies: http://www.mapr.com/products/mapr-editions

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • The New Hadoop

    Hadoop version 2 expands Hadoop beyond MapReduce and opens the door to MPI applications operating on large parallel data stores.

  • Hadoop for Small-to-Medium-Sized Businesses

    Hadoop 2.x and its associated tools promise to deliver big data solutions not just to the IT-heavy big players, but to anyone with unstructured data and the need for multidimensional data analysis.

  • Is Hadoop the New HPC?

    Hadoop has been growing clusters in data centers at a rapid pace. Is Hadoop the new corporate HPC?

  • Is Hadoop the new HPC?
    Hadoop has been growing clusters in data centers at a rapid pace. Is Hadoop the new corporate high-performance computing?
  • OpenStack Sahara brings Hadoop as a Service
    Apache Hadoop is currently the marketing favorite for Big Data; however, setting up a complete Hadoop environment is not easy. OpenStack Sahara, on the other hand, promises Hadoop at the push of a button.
comments powered by Disqus