Big data tools for midcaps and others
Arithmagician
Numbering
Hadoop developers have not done themselves any great favors with their choice of version numbers. If you identify remarkable development steps with a version number change in the second decimal place – say, from version 0.20 to version 0.23 – you should not be surprised if users take very little note of your progress.
Version 0.20 designates first-generation Hadoop (v1.0, Figure 6). Whenever you hear people refer to the 0.23 branch (Figure 7), they are talking about Hadoop 2.2.x. Version 2.2.0 is the first generally available release of the second generation.
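The mapping between the old branch numbers and the generations can be written down as a small lookup table. The following Python sketch is purely illustrative: the names `LEGACY_BRANCH_TO_GENERATION` and `describe_branch` are made up for this example, and the table contains only the two branches mentioned above.

```python
# Hypothetical lookup table; it holds only the two mappings stated in the text.
LEGACY_BRANCH_TO_GENERATION = {
    "0.20": ("first generation", "1.0"),
    "0.23": ("second generation", "2.2.x"),
}


def describe_branch(version: str) -> str:
    """Translate a legacy branch number such as '0.20.2' into its generation."""
    # Only the first two components (for example '0.20') identify the branch.
    branch = ".".join(version.split(".")[:2])
    entry = LEGACY_BRANCH_TO_GENERATION.get(branch)
    if entry is None:
        # Anything outside the two branches named in the article is left open.
        return f"{version}: not covered by this table"
    generation, modern = entry
    return f"{version}: {generation} Hadoop (v{modern})"


if __name__ == "__main__":
    for v in ("0.20.2", "0.23"):
        print(describe_branch(v))
```

Any branch that is not listed falls through to the "not covered" message instead of being guessed.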
An enterprise-class product that has proven itself in the tough big data daily grind for years should have earned a higher version number. The seemingly minimal generation leap to version 2.2.0 does little justice to the significantly advanced maturity of Hadoop; however, the hesitant numbering does not detract from the quality of Hadoop.
The growing importance of Hadoop is evidenced by the many prominent providers that are adapting their commercial solutions for Hadoop. SAP sells the Intel distribution and the Hortonworks Data Platform for Hadoop. SAP Hana, a data analysis platform for big data, seamlessly integrates with Hadoop. DataStax provides a distribution of Hadoop and Solr with its own NoSQL solution, DataStax Enterprise. Users of DataStax Enterprise use Hadoop for data processing, Apache Cassandra as a database for transactional data, and the Solr search engine for distributed searching. Incidentally, Cassandra lets you run Hadoop MapReduce jobs on a Cassandra cluster.
Conclusions
After four years of development, Hadoop 2.2.0 surprises its users with some groundbreaking innovations. Thanks to Hadoop's modularity and HA HDFS, the Apache Software Foundation has succeeded in staying ahead of the field and has even extended its lead.
Because of its significantly improved management of workloads, Hadoop has become much more attractive for midcaps. Large companies have always been able to build any additional functions they needed, whereas that kind of painstaking development has always overtaxed the midcaps. This situation has now finally changed with Hadoop 2.2.0.
Infos
- Apache Hadoop: http://hadoop.apache.org/
- YARN: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html
- Kognitio: http://kognitio.com
- Archer Technologies: http://www.archerits.com/
- Cloudera Hadoop distribution: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
- Hortonworks Hadoop distribution: http://hortonworks.com/products/hadoop-support
- Stratosphere Hadoop distribution by the Technical University of Berlin: https://github.com/stratosphere/stratosphere
- Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/
- Comparison of Hadoop distributions by MapR Technologies: http://www.mapr.com/products/mapr-editions