Distributed MySQL with Vitess

Ubiquitous

The Vexed Subject of the Quorum

Anyone familiar with distributed systems, and specifically with distributed storage, will by now be wondering how Vitess solves the problem of the quorum, which is inherent in every distributed system. The quorum also plays a major role when working with shards, but the Vitess developers make things very easy for themselves. VTGate generates the view for clients with the help of the Topology Service, and instead of implementing a consensus algorithm of their own, the developers simply added an interface to external consensus-based key-value stores. By default, Vitess uses etcd, which runs in many Kubernetes clusters anyway; alternatively, ZooKeeper can be used.
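The arithmetic behind a majority quorum is easy to sketch. A consensus store such as etcd or ZooKeeper with n voting members can only make progress while a strict majority of floor(n/2) + 1 members is reachable. The following Go sketch uses hypothetical helper names (`majorityQuorum`, `toleratedFailures`) that are not part of Vitess or etcd; it merely illustrates the rule.

```go
package main

import "fmt"

// majorityQuorum (hypothetical helper) returns the smallest number of
// members that forms a strict majority of n voting members: floor(n/2) + 1.
func majorityQuorum(n int) int {
	return n/2 + 1
}

// toleratedFailures (hypothetical helper) returns how many members can
// fail while the remaining ones still form a majority quorum.
func toleratedFailures(n int) int {
	return n - majorityQuorum(n)
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n",
			n, majorityQuorum(n), toleratedFailures(n))
	}
}
```

The numbers also show why such ensembles usually have an odd number of members: A fourth member raises the quorum from two to three without tolerating any additional failure.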

Performance and Redundancy

A running Vitess cluster must be thought of as a set of keyspaces spread widely, in the form of shards, across all existing VTTablet and MySQL instances. Each shard always exists several times: One copy acts as the primary, and the replicas serve read operations. If an instance fails, the missing shards are automatically re-instantiated on other instances of the cluster. The replicas therefore not only improve performance but also guarantee redundancy for the application.
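How reads and writes might be spread across the copies of a shard can be illustrated with a short Go sketch. It assumes a hypothetical `Shard` type with one primary and several replicas, sends writes to the primary, and distributes reads round-robin across healthy replicas, skipping failed ones. This is not VTGate's actual routing code, which is considerably more involved; it only shows the basic idea.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Tablet is a hypothetical stand-in for a VTTablet/MySQL pair.
type Tablet struct {
	Name    string
	Healthy bool
}

// Shard (hypothetical) holds one primary and several replicas of the same data.
type Shard struct {
	Primary  Tablet
	Replicas []Tablet
	next     int // round-robin cursor for reads
}

// ErrNoTablet is returned when no healthy tablet can serve a query.
var ErrNoTablet = errors.New("no healthy tablet available")

// Route sends writes to the primary and spreads reads across healthy
// replicas in round-robin order, skipping failed ones.
func (s *Shard) Route(query string) (string, error) {
	if !isRead(query) {
		if !s.Primary.Healthy {
			return "", ErrNoTablet
		}
		return s.Primary.Name, nil
	}
	for i := 0; i < len(s.Replicas); i++ {
		t := s.Replicas[s.next%len(s.Replicas)]
		s.next++
		if t.Healthy {
			return t.Name, nil
		}
	}
	return "", ErrNoTablet
}

// isRead is a deliberately naive classifier: only SELECT counts as a read.
func isRead(query string) bool {
	return strings.HasPrefix(strings.ToUpper(strings.TrimSpace(query)), "SELECT")
}

func main() {
	s := &Shard{
		Primary:  Tablet{"primary-1", true},
		Replicas: []Tablet{{"replica-1", true}, {"replica-2", false}, {"replica-3", true}},
	}
	for _, q := range []string{"SELECT 1", "SELECT 2", "UPDATE t SET x = 1", "SELECT 3"} {
		target, err := s.Route(q)
		fmt.Println(q, "->", target, err)
	}
}
```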

At the same time, VTGate ensures that central consistency guarantees, such as those of the atomicity, consistency, isolation, durability (ACID) principle, continue to be upheld in Vitess. ACID compliance is the perennial question that comes up in the context of distributed databases, because whether a database is reliably consistent ultimately decides the success or failure of the application in many areas. The Vitess developers achieve precisely this consistency by using VTGate as a proxy server that speaks MySQL's language: It reimplements the MySQL protocol, which is quite impressive from a technical point of view.

This is all the more true because Vitess also supports the cell as a logical unit, which allows shards to be distributed deliberately across the boundaries of physical locations, such as data centers or availability zones. Consequently, disaster recovery setups are easy to implement with Vitess; again, this capability sets Vitess apart from other solutions such as Galera.
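The point of distributing copies across cells can be shown with a small placement sketch. Assuming two shards named in Vitess's key-range style (`-80` and `80-`) and three hypothetical cells, the Go code below spreads two copies of each shard over different cells and then checks whether every shard keeps at least one copy when a whole cell fails. The helpers `placeReplicas` and `survivesCellLoss` are illustrative only; in a real deployment, tablet placement per cell is part of the cluster configuration.

```go
package main

import "fmt"

// placeReplicas (hypothetical) assigns `copies` copies of each shard to
// cells round-robin, offset per shard, so the copies of one shard land in
// different cells whenever copies <= len(cells).
func placeReplicas(shards, cells []string, copies int) map[string][]string {
	placement := make(map[string][]string, len(shards))
	for i, shard := range shards {
		for c := 0; c < copies; c++ {
			cell := cells[(i+c)%len(cells)]
			placement[shard] = append(placement[shard], cell)
		}
	}
	return placement
}

// survivesCellLoss (hypothetical) reports whether every shard still has
// at least one copy after the given cell goes offline.
func survivesCellLoss(placement map[string][]string, lost string) bool {
	for _, cells := range placement {
		ok := false
		for _, c := range cells {
			if c != lost {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	shards := []string{"-80", "80-"}
	cells := []string{"zone1", "zone2", "zone3"} // hypothetical cell names
	p := placeReplicas(shards, cells, 2)
	fmt.Println(p)
	for _, lost := range cells {
		fmt.Printf("survives loss of %s: %v\n", lost, survivesCellLoss(p, lost))
	}
}
```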

Tricks and More Tricks

The Vitess developers note at various points in their documentation that, although replication was a primary design goal of Vitess, it was far from the only one. They also set out to address several MySQL issues that had been bothering them and that remain unsolved in MySQL to this day. One ongoing problem is the infamous "queries of death," which seem to take an eternity to execute. It's no secret that many companies would be better off investing money in a MySQL consultant than in increasingly powerful hardware for the database.

Over the years, for example, many applications have accumulated database queries that force MySQL to perform huge amounts of internal work and that can, in the worst case, take down the entire database. The problem with MySQL is that, once such a query is running, there is little you can do from the outside short of hunting it down in the process list and killing it by hand. All too often, you end up watching as MySQL maneuvers itself into a black hole or is eventually terminated by the kernel's out-of-memory (OOM) killer.

Vitess offers a starting point for tackling this problem. VTGate, too, can reach the limits of its performance when dealing with complex requests. Unlike with plain MySQL, however, a query can be terminated externally with vtctl, averting any danger to the functionality of the database. Prevention is also possible: Limits for certain query types can be stored at the VTGate level so that, transparently from the client's point of view, queries are terminated automatically after a certain time or are not executed at all.

Another, similarly useful capability targets performance optimization. Because all queries to the database pass through Vitess, it knows which queries are currently in flight, which is where deduplication (Vitess calls it query consolidation) comes in handy. If Vitess detects that several identical read operations are taking place at the same time, it forwards only one of them to the MySQL back end but delivers the result to all of the requesting clients. Strictly speaking, this consolidation takes place in VTTablet, the component that sits between VTGate and each MySQL instance.

Another thing worth mentioning is connection pooling, which Vitess supports out of the box. Traditionally, each client connecting to a MySQL instance requires its own standalone TCP/IP connection, with its own memory overhead on the server. In Vitess, however, clients do not talk directly to MySQL, but only to VTGate. Because Vitess is written in Go, it can use the language's lightweight concurrency to map these client connections onto a small pool of MySQL connections: Instead of opening a separate, self-sufficient connection to the MySQL back end behind each VTTablet instance for every client, it funnels all incoming requests through significantly fewer open back-end connections, while still achieving higher throughput and better performance.
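A buffered channel is a common Go idiom for a fixed-size pool, and it shows the principle in a few lines. In the following sketch, the `Pool` and `Conn` types are hypothetical stand-ins, not Vitess's own pool implementation: 100 concurrent clients share four simulated back-end connections, and the program records how many were in use at the same time.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Conn is a hypothetical stand-in for an expensive MySQL connection.
type Conn struct{ ID int }

// Pool (hypothetical) hands out at most cap(conns) back-end connections;
// callers beyond that block until a connection is returned.
type Pool struct {
	conns chan *Conn
}

// NewPool opens `size` back-end connections up front.
func NewPool(size int) *Pool {
	p := &Pool{conns: make(chan *Conn, size)}
	for i := 0; i < size; i++ {
		p.conns <- &Conn{ID: i}
	}
	return p
}

func (p *Pool) Get() *Conn  { return <-p.conns }
func (p *Pool) Put(c *Conn) { p.conns <- c }

func main() {
	pool := NewPool(4) // 4 back-end connections ...
	var wg sync.WaitGroup
	var inUse, peak int32
	for client := 0; client < 100; client++ { // ... shared by 100 clients
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := pool.Get()
			defer pool.Put(c)
			n := atomic.AddInt32(&inUse, 1)
			for { // record the highest concurrency observed
				old := atomic.LoadInt32(&peak)
				if n <= old || atomic.CompareAndSwapInt32(&peak, old, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // simulate running a query
			atomic.AddInt32(&inUse, -1)
		}()
	}
	wg.Wait()
	fmt.Println("clients: 100, peak back-end connections in use:", atomic.LoadInt32(&peak))
}
```

However many clients arrive, the peak never exceeds the pool size; surplus clients simply wait until a connection is returned.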
