DynamoDB: Scalable NoSQL Database Service
In addition to RDS, AWS offers further databases for specific use cases. Amazon DynamoDB, for example, is a fast, flexible NoSQL database service for applications that require consistent single-digit-millisecond latency at any scale. This fully managed cloud database supports both document and key-value storage models.
The data is stored in tables without a schema. Tables contain items, and items have attributes (key-value pairs). One attribute is defined as the partition key, which determines the physical storage location; because data is distributed across partitions by this key, tables can scale without limits while offering consistently fast performance at any size.
Indexes allow quick searches against other attributes. Attributes can be lists or maps (Listing 1), so arbitrarily complex and deeply nested object structures can be created. This flexible structure suits many applications. In Listing 1, "content" is a map and "tags" is a list. The "article_id" attribute is the partition key, and "author" is used in a global secondary index to find articles by a given author.
Listing 1
DynamoDB Item with Map and List Attributes
{
  "article_id": 123,
  "title": "Databases in AWS",
  "author": "Steffen Grunwald",
  "content": {
    "description": "...",
    "header": "...",
    "content": "..."
  },
  "tags": [ "rds", "aurora", "dynamodb" ]
}
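To illustrate how the global secondary index on "author" would be used, the following sketch builds the parameters of a low-level DynamoDB Query request (as one would pass to an AWS SDK, e.g., boto3's DynamoDB client). The table name "articles" and index name "author-index" are assumptions for the example:

```python
# Sketch of a Query request against the assumed global secondary
# index "author-index"; no AWS call is made here, only the request
# parameters are assembled in the low-level API format.
query_params = {
    "TableName": "articles",
    "IndexName": "author-index",
    "KeyConditionExpression": "author = :a",
    "ExpressionAttributeValues": {
        ":a": {"S": "Steffen Grunwald"}  # DynamoDB type descriptor: string
    },
}

print(query_params["IndexName"])
```

Queries against a global secondary index look exactly like queries against the table itself; only the `IndexName` parameter selects the index.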
When a table is created, its provisioned throughput is defined as a set of read and write capacity units, which determine how many operations per second can be performed against the table. This value can be changed later, either manually or through an automatic mechanism that decides on the basis of metrics whether more or fewer capacity units are needed. An example is presented online [4].
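The arithmetic behind capacity units can be sketched as follows (the function names are my own; the sizing rules are DynamoDB's documented ones: one write capacity unit covers one write per second of an item up to 1KB, and one read capacity unit covers one strongly consistent read per second of an item up to 4KB, with eventually consistent reads needing only half a unit):

```python
import math

def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    # One WCU per started 1KB of item size, per write per second.
    return math.ceil(item_kb / 1.0) * writes_per_sec

def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    # One RCU per started 4KB of item size, per strongly consistent
    # read per second; eventually consistent reads cost half.
    units = math.ceil(item_kb / 4.0) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

# 3KB items, 100 strongly consistent reads/s, 20 writes/s:
print(read_capacity_units(3, 100))   # 100
print(write_capacity_units(3, 20))   # 60
```

Estimates like these feed directly into the provisioned-throughput settings of a table, whether set manually or as bounds for the automatic mechanism.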
Events let users respond to data changes (e.g., to replicate the data or create derived statistics). Using the AWS Management Console, performance metrics also can be visualized, and items and tables can be edited conveniently (Figure 3).
In contrast to RDS, IAM access control for Amazon DynamoDB covers all actions down to the attribute level. In combination with the large number of SDKs, a browser can therefore write directly to Amazon DynamoDB tables without going through an application's data access layer. The advantage is that such an access layer does not have to scale for load-intensive read and write operations.
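Attribute-level control is expressed through IAM condition keys. The following policy fragment is a sketch of this pattern (the table name "articles" and the attribute list match the example in Listing 1; adapt the account and region in the ARN): it allows reads, but only of the listed attributes, by requiring that the request select specific attributes and that all requested attributes appear in the allowed set.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "dynamodb:GetItem", "dynamodb:Query" ],
      "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/articles",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:Attributes": [ "article_id", "title", "author" ]
        },
        "StringEquals": { "dynamodb:Select": "SPECIFIC_ATTRIBUTES" }
      }
    }
  ]
}
```

Attached to a role that browser clients assume (e.g., via Amazon Cognito), such a policy keeps the "content" map out of reach even though clients talk to the table directly.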
Redshift: Data Warehouse for Analytical Processes
Amazon Redshift is a fast, fully managed data warehouse for data volumes from 100GB to the petabyte range, which, together with existing business intelligence tools, enables easy and economical analyses of all data. As a SQL data warehouse solution, Redshift supports the industry-standard Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) interfaces.
Column-based storage, data compression, and zone maps reduce the I/O overhead when executing queries. Amazon Redshift has a massively parallel processing data warehouse architecture that parallelizes and distributes the SQL operations to make optimum use of all available resources. The underlying hardware is geared to high-performance computing; locally attached storage maximizes the throughput between the CPUs and the drives, and a 10Gbps Ethernet mesh network maximizes the throughput between the deployed nodes.
Amazon Redshift scales according to the number and size of compute nodes; each data warehouse manages up to 128 compute nodes in a cluster, distributing the data and computing tasks across them. If more than one compute node is used, a leader node serves as the endpoint for requests from the clients and coordinates the execution of queries on all compute nodes. A node type is set for the entire cluster. Two groups of node types are available, optimized either for performance (dense compute [DC]) or storage (dense storage [DS]).
Node sizes range up to 36 vCPUs and 16TB of storage per node. The node type can be changed even after creating the cluster. When resizing an existing cluster, Amazon Redshift switches it to read-only mode, provisions a new cluster of the desired size, and copies the data from the old cluster to the new one in parallel. While the new cluster is being deployed, the old cluster remains available for read queries. After the data is copied, Amazon Redshift automatically forwards queries to the new cluster and removes the old one. Much like RDS, the switch is handled by changing the DNS CNAME record.
Choosing the Correct Tool
When choosing among the AWS database services, a few questions provide guidance. First, investigate whether the data is processed analytically or transactionally; if analytically, Amazon Redshift is a potential candidate. You also need to consider whether the data matches a relational or non-relational structure and whether the database can scale with the traffic in the long term. Finally, of course, price plays a role. The Simple Monthly Calculator [5] helps you predict the monthly costs.
Relational or non-relational databases often form the basis of an application, but additional functions may be needed depending on the requirements, such as data lifecycle management or aggregation and analysis. Dedicated AWS services are available for these purposes. For caching, for example, Amazon ElastiCache lets the user choose between a Redis engine and a Memcached-compatible engine.
If the data to be stored consists of events that will be processed and analyzed in large quantities as a stream, Amazon Kinesis Streams and Amazon Kinesis Analytics are the right choices. You can quickly perform experiments and feasibility studies, without getting lost in the details of the underlying technology.