Getting started with the Apache Cassandra database
Believable
Cluster Playground
As a final check that everything is working as desired, you can use Portainer to open a Bash shell in one of the node containers and enter:
nodetool status
Nodetool is a comprehensive tool for managing, monitoring, and repairing the Cassandra cluster. The output of the status subcommand should look like Figure 5. All three nodes must appear, the status should be Up (U) and Normal (N), and each node should have an equal share of the data.
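With three healthy nodes, the output is structured roughly as follows; the addresses of the second and third nodes, the load figures, ownership percentages, and host IDs are placeholders and will differ on your system:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load   Tokens  Owns   Host ID   Rack
UN  172.21.0.2  ...    256     ...    ...       rack1
UN  172.21.0.3  ...    256     ...    ...       rack1
UN  172.21.0.4  ...    256     ...    ...       rack1

The UN in the first column is the combination you are looking for: Up and Normal.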
Finally, you can start playing around with the cluster. From the Docker host, log into the DC1N1 node created in Listing 7 by specifying its IP address and the port reserved for cqlsh in the configuration – by default, this is port 9042:
Listing 7
docker-compose.yml
version: '3'
services:
  # Configuration for the seed node DC1N1
  # The name could stand for datacenter 1, node 1
  DC1N1:
    image: cassandra:3.10
    command: bash -c 'if [ -z "$$(ls -A /var/lib/cassandra/)" ] ; then sleep 0; fi && /docker-entrypoint.sh cassandra -f'
    # Network for communication between nodes
    networks:
      - dc1ring
    # Map the volume to a local directory.
    volumes:
      - ./n1data:/var/lib/cassandra
    # Environment variables for the Cassandra configuration.
    # CASSANDRA_CLUSTER_NAME must be identical on all nodes.
    environment:
      - CASSANDRA_CLUSTER_NAME=Test Cluster
      - CASSANDRA_SEEDS=DC1N1
    # Expose ports for cluster communication
    expose:
      # Intra-node communication
      - 7000
      # TLS intra-node communication
      - 7001
      # JMX
      - 7199
      # CQL
      - 9042
      # Thrift service
      - 9160
    # Recommended Cassandra ulimit settings
    ulimits:
      memlock: -1
      nproc: 32768
      nofile: 100000

  DC1N2:
    image: cassandra:3.10
    command: bash -c 'if [ -z "$$(ls -A /var/lib/cassandra/)" ] ; then sleep 60; fi && /docker-entrypoint.sh cassandra -f'
    networks:
      - dc1ring
    volumes:
      - ./n2data:/var/lib/cassandra
    environment:
      - CASSANDRA_CLUSTER_NAME=Test Cluster
      - CASSANDRA_SEEDS=DC1N1
    depends_on:
      - DC1N1
    expose:
      - 7000
      - 7001
      - 7199
      - 9042
      - 9160
    ulimits:
      memlock: -1
      nproc: 32768
      nofile: 100000

  DC1N3:
    image: cassandra:3.10
    command: bash -c 'if [ -z "$$(ls -A /var/lib/cassandra/)" ] ; then sleep 120; fi && /docker-entrypoint.sh cassandra -f'
    networks:
      - dc1ring
    volumes:
      - ./n3data:/var/lib/cassandra
    environment:
      - CASSANDRA_CLUSTER_NAME=Test Cluster
      - CASSANDRA_SEEDS=DC1N1
    depends_on:
      - DC1N1
    expose:
      - 7000
      - 7001
      - 7199
      - 9042
      - 9160
    ulimits:
      memlock: -1
      nproc: 32768
      nofile: 100000

  # A web-based GUI for managing containers.
  portainer:
    image: portainer/portainer
    networks:
      - dc1ring
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./portainer-data:/data
    # Access to the web interface from the host via
    # http://localhost:10001
    ports:
      - "10001:9000"

networks:
  dc1ring:
$ cqlsh -uadmin 172.21.0.2 9042
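The IP address of the seed node depends on your setup; one way to look it up from the Docker host is docker inspect. The container name below is an assumption and follows the <project>_<service>_1 pattern that Docker Compose uses by default:

$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' cassandra_DC1N1_1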
Running cqlsh on the Docker host in this way requires a local Cassandra installation in addition to the cluster, because the CQL shell ships with Cassandra. Alternatively, you can use Portainer to open a Linux shell on one of the nodes and then launch cqlsh there. Next, entering
CREATE KEYSPACE pingtest WITH replication = {'class':'SimpleStrategy','replication_factor':3};
creates the keyspace for the ping run-time example on the three-node cluster with three replicas.
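To double-check that the keyspace really was created with three replicas, you can ask the cluster itself; both of the following work in cqlsh on Cassandra 3.x:

DESCRIBE KEYSPACE pingtest;
SELECT replication FROM system_schema.keyspaces WHERE keyspace_name = 'pingtest';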
High Availability, No Hard Work
Without having to do anything else, the database now stores copies of every row in the pingtest keyspace on every node in the cluster. For experimental purposes, the consistency level can be set to different values, either interactively for each session or by using the query in pingtest.pl.
A value of one (Listing 6, line 17) means, for example, that only one node needs to confirm the read or write operation for the transaction to be confirmed. More nodes are required for settings of two or three, and all of them if you select all. A quorum setting means that a majority of the replicas across all the data centers over which the cluster is distributed must confirm the operation, whereas local_quorum means that a majority of the replicas in the data center that hosts the coordinator node for this row is sufficient.
In this way, Cassandra achieves what is known as tunable consistency, which means that users can specify for each individual query what is more important to them. In an Internet of Things application, simple confirmation of a node might be sufficient, the upside being database performance boosts because the database does not have to wait for more nodes to confirm the results. With a financial application, on the other hand, you might prefer to play it safe and have the result of the operation confirmed by a majority of the nodes. In this case, you also have to accept that this will take a few milliseconds longer.
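In cqlsh, for example, you can switch the consistency level for the current session and rerun a query against the pingtime table from Listing 6 to compare behavior at both ends of the spectrum (the LIMIT query is just an illustration):

CONSISTENCY ONE;
SELECT * FROM pingtest.pingtime LIMIT 5;
CONSISTENCY ALL;
SELECT * FROM pingtest.pingtime LIMIT 5;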
Once you have set up the pingtime table (Listing 6, line 15) in the distributed pingtest keyspace and have enabled the cron job described earlier so that data is received, you should be able to retrieve the same data from the table on all nodes. If this is the case, then replication is working as intended.
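A quick way to compare the nodes, assuming you are in the directory containing the docker-compose.yml from Listing 7, is to run the same query through each service (a count is good enough as a rough check, even though counting is expensive on large tables):

$ docker-compose exec DC1N1 cqlsh -e "SELECT count(*) FROM pingtest.pingtime;"
$ docker-compose exec DC1N2 cqlsh -e "SELECT count(*) FROM pingtest.pingtime;"
$ docker-compose exec DC1N3 cqlsh -e "SELECT count(*) FROM pingtest.pingtime;"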
You could now use Portainer to shut down individual nodes. Depending on the consistency setting, the selects will provoke an error message if you address a surviving node that has the data available but cannot find a sufficient number of peers to confirm the result.
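If you prefer the command line to Portainer, stopping and later restarting a node is a one-liner each (again assuming the Compose project from Listing 7):

$ docker-compose stop DC1N3
$ docker-compose start DC1N3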
If you then reactivate a node some time later, no further steps are required in the ideal case. Cassandra takes care of the resynchronization on its own. The prerequisite is that the hinted_handoff_enabled variable in the central cassandra.yaml configuration file is set to true. Cassandra stores, in the form of hints, the write operations the node missed because of its temporary failure and automatically retrofits them as soon as the node becomes available once again. For more complicated cases, such as the node having been down for so long that you run out of space for hints, nodetool has a repair command.
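Both can be handled from a shell on the affected node; the configuration path shown here is the default in the official Cassandra Docker image and may differ in other installations:

$ grep hinted_handoff_enabled /etc/cassandra/cassandra.yaml
$ nodetool repair pingtest

The keyspace argument restricts the repair to pingtest; called without arguments, nodetool repair processes all keyspaces on the node.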
If you browse the Cassandra documentation [5], you will find many interesting commands and procedures that you can try out on a small test cluster; getting a feel for the database in this way will certainly pay dividends when you go live.
Conclusions
Cassandra is a powerful distributed NoSQL database especially suited for environments that need to handle large volumes of data and grow rapidly. Several advanced features, such as built-in failover, automatic replication, and self-healing after node failure, make it interesting for a wide range of application scenarios. However, migration from a traditional relational database management system (RDBMS) to Cassandra is unlikely to be possible without some overhead because of the need to redesign the data structure and queries.
Infos
- Lakshman, A., and P. Malik. Cassandra – A decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010; 44(2):35-40, http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
- Apache Cassandra: http://cassandra.apache.org
- DataStax Constellation: https://constellation.datastax.com
- Cassandra image: https://hub.docker.com/_/cassandra
- Cassandra documentation: https://docs.datastax.com/en/cassandra/3.0/index.html