Databases in the Google Cloud
Spoiled for Choice
Even in the cloud age, a classic IT task is operating databases. In addition to SQL relational databases, non-relational databases are making inroads into the corporate landscape. Today, companies likely use more than just one kind of database.
In principle, cloud databases can be operated in the same way as databases in an organization's data center. The admin team simply installs the database as a virtual machine. Administration in this case is no different from local administration. Aspects such as high availability, patching, backup and restore, disaster recovery, and scaling remain the administrator's responsibility.
To make it as easy as possible for companies to operate databases, cloud providers also offer databases as managed services, which means the customer is, by and large, only responsible for the application and the database logic but does not need to worry about operations issues.
Choosing a Database
Google is one of the cloud pioneers: Google Cloud Platform (GCP) is one of the three largest cloud providers alongside Amazon Web Services (AWS) and Microsoft Azure. As such, it offers a comprehensive portfolio of database technologies, but choosing the right database is not always easy and needs to be carefully considered. A catalog of criteria can help:
- The access method is important: Legacy applications such as billing software use SQL to access the database. The database needs to have an appropriate relational schema, support highly available operation, and, of course, maintain a consistent dataset at all times.
- Access techniques can differ: Relational databases rely on SQL commands (e.g.,
SELECT
,INSERT
,UPDATE
,DELETE
) and usually support transactions. In contrast, NoSQL databases often use a REST interface, and object stores use files in the form of objects. Alternatively, a database can also support streaming data and batch operations. - The storage volume is relevant: Relational databases are usually in the range of a few terabytes, whereas data warehouse systems and NoSQL databases break through the petabyte barrier, with no conceptual size restrictions for object stores.
Cloud storage is not a database in the purest sense, but because it is used as a storage back end in many NoSQL and big data scenarios, it still makes sense to look at it as such. Cloud storage keeps data in buckets. If you compare this to a filesystem, all the objects are stored in a flat namespace with no folders. Objects are replicated and are therefore always stored with high availability. Thanks to replication, durability is very high: up to 99.9999999 percent. The object store is accessed over REST or API interfaces. As a result, and because of the architecture, latencies when retrieving objects are fairly high – almost always in the double-digit millisecond range. Therefore, object storage is generally used for unstructured data.
Cloud SQL Databases
The Cloud SQL service offers both MySQL, PostgreSQL, and Microsoft SQL Server as managed services. Google operates the database, which means that patches and updates are Google's responsibility. Backup and restore tasks can be easily configured when you create the database – but you still have to set up user access yourself.
Creating a database is relatively easy in the graphical user interface (Figure 1). To do this, click on Databases | SQL in the menu at top left. If no database exists as yet, you can continue directly with the wizard and select Create Instance . MySQL versions 8.0, 5.7, and 5.8 and PostgreSQL versions 9.6, 10, 11, 12, 13, and 14 are now available for selection. Alternatively, a managed database instance with Microsoft SQL Server versions 2019 and 2017 is available.
A unique name is required to create the database. It's a good idea to use random characters, if possible, because after you delete an instance, you cannot reuse the name for seven days. The region is also important. If you want to provide the database in Frankfurt, Germany, for example, select europe-west3 as the region. If you want the database to be highly available, use the Multiple zones (Highly available) option under Zonal Availability . You can specify the Primary Zone and the Secondary Zone , as well. High availability incurs higher costs because a standby server is required to ensure that failover is possible at all times.
When it comes to hardware sizing, you have a wide choice; you can run a small test database with PostgreSQL or MySQL on a shared core with a virtual (v)CPU and 614MB of RAM. On the other side of the scale, the maximum limit is 96 vCPUs and 624GB of RAM (at the time of writing). Hard disk storage is available in the form of hard drives (HDDs) and solid-state disks (SSDs).
Size also determines disk performance; for each gigabyte of storage space you have 30 input/output operations per second (IOPS) for reading from and writing to SSDs, but these values are lower for writing on legacy magnetic drives. The data throughput is similar and also linked to size.
By default, databases are given public IP addresses. If this setup is undesirable for security reasons, deploy the database to a virtual private cloud (VPC) with private IP addresses only. Both normal and shared VPCs are supported.
In the backup options, you can set a time window in which an automatic backup will take place and enable point-in-time recovery to restore an arbitrary database status in a granular way in the event of a problem. By default, Cloud SQL stores the last seven backups. You also configure the backup region at this point. In addition to selecting a dedicated region, multiregional backup is available. For example, you can choose to create backups in various regions within the European Union.
Because Cloud SQL is a managed database service, you do not need to set time windows for patching and updating; instead, this process is handled, like the other options, in the graphical user interface, where you can configure database-specific options, as well. For PostgreSQL, this process includes auditing queries or the number of possible concurrent connections, for example. One item of interest to developers is the Insights option, which can be used to identify performance problems at database runtime.
Of course, Cloud SQL databases can be created with tools other than the graphical user interface. The command-line interface or Terraform are especially popular. A simple call to create a MySQL database would be:
gcloud sql instances create myinstance --database -version=MYSQL_8_0 --cpu=2 --memory=7680MB --region=europe-west3
The corresponding Terraform resource definition,
resource "google_sql_database_instance" "instance" { name = "mysql-instance" region = "us-central1" database_version = "MYSQL_8_0" settings { tier = "db-n1-standard-2" } deletion_protection = "true" }
is not difficult either.
Cloud Spanner Worldwide
Cloud Spanner is an interesting choice for companies that want the benefits of a relational database and the scale of a NoSQL database, with capacities in the petabyte range and an operational scope across multiple regions at the same time. This feature is especially interesting for applications that tolerate very little or no downtime. Unlike classic relational databases, Cloud Spanner scales horizontally rather than vertically. Replication takes place automatically, and the database remains in a globally consistent state at all times.
Cloud Spanner supports a wide range of functions of classic databases, including system compatibility with ANSI 2021 standard SQL and support for transactions. Like other databases in the Google Cloud, Cloud Spanner is integrated into the GCP ecosystem and offers encryption with key management service (KMS), audit logging, and integration with identity and access management. In addition to typical interfaces such as Java Database Connectivity (JDBC), client libraries exist that support access in the Java, Python, and Node.js programming languages.
When migrating an existing application, you need to keep in mind that not all features of classic relational databases are supported. For example, sequences cannot be used for primary keys. Sequences make unique lines with numeric values. However, because Cloud Spanner distributes the data to nodes on the basis of the primary key, and the primary key must be alphanumeric, sequences are ruled out.
Another downer is the lack of support for triggers, stored procedures, and user-defined functions (UDFs). For this reason, Cloud Spanner might not be suitable for your application. However, a number of criteria still make Cloud Spanner interesting, including applications that have become too large for a single database or that only run at all as classic databases after optimization measures such as database sharding. Transactional consistency or global availability requirements also favor Cloud Spanner.
Buy this article as PDF
(incl. VAT)
Buy ADMIN Magazine
Subscribe to our ADMIN Newsletters
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Most Popular
Support Our Work
ADMIN content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.