Distributed MySQL with Vitess
Ubiquitous
Whereas IT organizations in the past maintained individual setups with separate infrastructures, today scalable platforms almost inevitably set the tone. Because customers take it for granted that their providers will continually reduce the cost of IT infrastructure and its operation, IT today can only be operated efficiently at scale.
In the past, admins might have had a few dozen systems in their care; today, they are more likely to manage hundreds of machines or more. Of course, this also affects the applications that run on these platforms: They need to be able to grow along with the environments. Several examples do this impressively, such as Ceph, which offers storage that grows with the usage scenario, or applications that adhere to a cloud-ready design and can scale seamlessly at every internal level.
The trend toward scaling does not stop at databases either. You might think this topic no longer needs further consideration. After all, you have Galera for MySQL and a cornucopia of solutions for PostgreSQL that retrofit scalability. Unfortunately, it's not that simple. Galera (Figure 1) has a fan base, but comes with architectural weaknesses that often make its use impossible. PostgreSQL is nowhere near as popular as MariaDB or MySQL; it is used in far fewer setups and is considered more complicated and cumbersome. Not to mention that quite a few of the replication solutions for PostgreSQL are also plagued with critical design flaws.
Vitess [1], a scalability solution for MySQL that has proven itself over the years at YouTube, deliberately does a few things differently from Galera. The authors promise seamless scalability with reliable consistency, as well as several additional features such as an internal engine that optimizes database queries to eke out more performance. At least according to the advertising, Vitess – in combination with MySQL – is the perfect database solution for scalable setups and, in turn, also for cloud setups.
Challenges
Before I turn directly to Vitess, it might be useful to briefly outline the basic challenges of a distributed database system to help understand why special software is needed at all to implement scaled databases. After all, the question of why a single MySQL instance on powerful hardware is not enough is quite legitimate and not at all easy to answer. The same applies to the question of why a single dataset from MySQL cannot be easily distributed to any number of MySQL instances on the back end.
Basically, databases in modern setups are one of the few places where data is stored persistently. Modern cloud-ready applications in particular often take a completely different approach to their data management: They logically separate asset data (i.e., static content such as images or videos) from user data. In modern setups, asset data is often outsourced to external storage, such as Amazon Simple Storage Service (S3), from where it is then referenced directly through an HTTP link. This approach reduces the load on the application and is generally faster for clients, because storage can be targeted geographically with the S3 protocol and, moreover, can be combined with caches to create a kind of mini content delivery network (CDN). It used to be quite common to store asset data in databases, too, but those days are long gone. Why, then, is a single instance of MySQL not enough in cloud setups?
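To make the asset offloading pattern concrete, the following minimal Go sketch generates a time-limited HTTP link to an object with the AWS SDK for Go v2. The bucket name, object key, and expiry are placeholder assumptions; the application would typically hand the resulting URL to the client instead of serving the bytes itself.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

	// Generate a time-limited HTTP link to an asset; bucket and key
	// are placeholders for illustration only.
	req, err := presigner.PresignGetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String("example-assets"),
		Key:    aws.String("images/logo.png"),
	}, s3.WithPresignExpires(15*time.Minute))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(req.URL)
}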
The short answer to this question is that a single MySQL would be a single point of failure and sooner or later could run into performance issues, depending on the size of the setup. Classic high availability (HA) solutions such as cluster managers are also anathema to the developers of cloud-ready setups. The crystal clear requirement for any application, simply put, is that it needs to scale seamlessly and run as an arbitrary number of connected instances. Solutions like MySQL are no exception to this rule.
Technically, though, a big challenge is in making a behemoth such as MySQL fit for horizontal scaling. In classic MySQL, everything is designed to have a central control authority. The master instance is the sole point of contact for write operations. It guarantees consistency, without which a database can hardly be used in a meaningful way. What this looks like under the hood is clear to any admin who has ever had to deal with MySQL. Somewhere in /var/lib/mysql/, you have the databases with their tables, with simply no way to interconnect multiple MySQL instances so that they can handle queries from multiple instances and uphold consistency guarantees at the same time.
Many Instances and Sharding
This problem is exactly what Vitess addresses for MySQL. However, Vitess is a totally different beast from Galera, which slots in as a MySQL plugin, implementing communication between nodes with its own consensus algorithm and keeping the contents of the cluster's database instances in sync.
The Vitess architecture (Figure 2) is nothing like that simple. Above all, Vitess can't really be called an add-on for MySQL anymore. Although MySQL still plays a role in the context of Vitess as a possible back end for storing data, the entire logic of data storage and data processing ultimately takes place at the Vitess level and therefore in services designed specifically for Vitess.
The linchpin of the solution is a component named VTGate, or Vitess Gateway, a pointer to the function of this component. All data directed to the database must pass through VTGate, which is supported by an additional Topology Service that manages and stores the metadata of a Vitess instance. To import changes to the metadata (e.g., when new databases or tables are created), you can use two tools: vtctl, which runs directly at the command line, or vtctld, which is the back end for a GUI that can also modify the data in the database.
When it comes to the database metadata, control occurs indirectly through the Topology Service, with which VTGate communicates constantly. To understand the way Vitess works, you must understand that its gateway implements its own MySQL interface in the form of the MySQL server protocol. Clients accessing the overall construct do not communicate with a genuine MySQL instance, but with a kind of abstraction layer that can handle the MySQL protocol (Figure 3).
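Because VTGate speaks the MySQL server protocol, any stock MySQL client or driver can talk to it without Vitess-specific code. The following minimal Go sketch uses the standard third-party github.com/go-sql-driver/mysql driver; the address, port (15306 appears in many Vitess examples, but your deployment may differ), credentials, and the commerce/customer names are placeholder assumptions:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // plain MySQL driver; no Vitess-specific code needed
)

func main() {
	// VTGate answers on the MySQL protocol, so the DSN looks like any
	// other MySQL connection string. Address and credentials are placeholders.
	db, err := sql.Open("mysql", "user:password@tcp(vtgate.example.com:15306)/commerce")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var count int
	// The query is answered by VTGate, which routes it to the
	// appropriate VTTablet instances behind the scenes.
	if err := db.QueryRow("SELECT COUNT(*) FROM customer").Scan(&count); err != nil {
		log.Fatal(err)
	}
	fmt.Println("customers:", count)
}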
Of course, a front end for storing data does not give you a database, because the information has to go somewhere. Vitess relies on MySQL, but in a different way than you might expect. A VTGate instance can have any number of VTTablet instances assigned to it, this being another service from the Vitess universe. A VTTablet instance always needs a real database on the back end where it can store its data locally, and Vitess relies on MySQL again for this task.
On the basis of this design alone, however, it is clear that the single MySQL instance here really just acts as a data store. The actual magic (i.e., the distribution of the data) is handled by a team of VTGate and VTTablet instances. A VTTablet instance can have several operating modes: In primary mode it allows write access; in replica mode it serves read-only traffic and stands by as a promotion candidate if the primary fails.
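Clients can make use of these tablet types directly, because Vitess lets you append a tablet type to the keyspace name (e.g., commerce@replica) in a USE statement or connection string. A minimal sketch, again with placeholder address, credentials, and table names, and assuming the standard Go MySQL driver passes the suffix through unchanged (the database name is opaque to it):

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Appending "@replica" to the keyspace routes queries to replica
	// tablets; "@primary" (the default) targets the writable primary.
	db, err := sql.Open("mysql",
		"user:password@tcp(vtgate.example.com:15306)/commerce@replica")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var name string
	// Reads that tolerate slight replication lag can be offloaded
	// to replicas this way, relieving the primary.
	if err := db.QueryRow("SELECT name FROM customer WHERE customer_id = 1").Scan(&name); err != nil {
		log.Fatal(err)
	}
	fmt.Println(name)
}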
Logical Databases
Administrators who want to run Vitess need to fully understand the implications of this implementation. The classic division of MySQL into databases, tables, and rows and columns does not occur natively in Vitess. Nothing in the MySQL and VTTablet processes running in the background corresponds one-to-one to what a client sees when it accesses Vitess through VTGate as if it were a MySQL instance. Instead, what a client sees through a VTGate instance is always the compiled, overall logical image of a data set that is widely distributed across the individual VTTablet instances in the background. Certain parallels to other distributed solutions such as Ceph are obvious: There, the Ceph Monitor (MON) servers implement the abstraction for the client, and the Object Storage Daemons (OSDs) store data on block storage devices in the background. Vitess works in quite a similar way, except the individual services go by the names VTGate and VTTablet.
The question then becomes: How does VTGate transpose the data in the background (i.e., how does Vitess implement the legacy MySQL data model)? When an arbitrary MySQL client connects to a VTGate instance, it will still see databases, tables, and rows and columns, but VTGate rewrites these structures internally. To make things a little more complicated for newcomers, the Vitess authors use a whole cornucopia of technical terms to describe their design, which prospective Vitess admins urgently need to study.
The most important term is the keyspace, which at the Vitess level is the counterpart of a classic database in MySQL; in the Vitess documentation, the authors describe a keyspace as a logical database. Where the contents of such a database ultimately end up depends on its configuration and mode of operation – and this is precisely where Vitess scalability comes into play. To achieve the desired scalability, Vitess relies on sharding: the splitting of a large data set into many smaller data sets that then end up on different storage back ends. Administrators are familiar with this principle from mail servers, among other things: If the space available on the back-end storage is not sufficient, the only solution is to acquire a second storage system and distribute the mailboxes across the two storage instances.
Vitess follows quite a similar approach with sharding. The VTGate instance distributes data that belongs together across all VTTablet instances it knows. Vitess explicitly refers to the shard (i.e., a data fragment) as a subunit of the keyspace, which again reveals a little more about Vitess's functionality. What looks like a database and a table to the outside (i.e., when viewed through VTGate) is in reality several chunks of data located in the various VTTablet instances.
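The routing logic behind this can be sketched in a few lines. In a sharded keyspace, each row is mapped to a 64-bit keyspace ID (in real Vitess, a vindex such as the hash vindex computes this value), and each shard owns a contiguous range of IDs named after its hexadecimal boundaries, such as -80 and 80-. The following Go sketch mimics the idea with FNV-1a as a simplified stand-in hash – an illustration of the principle, not Vitess's actual implementation:

package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// A shard owns a half-open range [start, end) of 64-bit keyspace IDs.
// A zero end means "up to the maximum," matching Vitess's "80-" notation.
type shard struct {
	name       string
	start, end uint64
}

// Two shards splitting the keyspace in half: "-80" and "80-".
var shards = []shard{
	{name: "-80", start: 0, end: 0x8000000000000000},
	{name: "80-", start: 0x8000000000000000, end: 0},
}

// keyspaceID derives a 64-bit ID from the sharding key. Vitess's hash
// vindex uses a different function; FNV-1a is only a stand-in here.
func keyspaceID(shardingKey uint64) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], shardingKey)
	h.Write(buf[:])
	return h.Sum64()
}

// route returns the shard whose range contains the keyspace ID.
func route(id uint64) string {
	for _, s := range shards {
		if id >= s.start && (s.end == 0 || id < s.end) {
			return s.name
		}
	}
	return "" // unreachable: the ranges cover the whole keyspace
}

func main() {
	for _, customerID := range []uint64{1, 2, 3, 4} {
		id := keyspaceID(customerID)
		fmt.Printf("customer %d -> keyspace ID %016x -> shard %s\n",
			customerID, id, route(id))
	}
}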
The operating modes of VTTablet, as previously mentioned, also play a role because they are inherited by the shards located on a VTTablet instance. Accordingly, shards can also be in the primary or replica state. The highlight is that the different roles of a shard are distributed across different VTTablet instances: VTTablet instance 1 can be the primary for one shard, instance 2 for another shard, and so on. This structure distributes the load and lets users benefit from the many storage devices on the different MySQL nodes in the background, because their bandwidth is logically pooled.
Now you can see the core function of VTGate: It knows which shards are located where and cooperates with the Topology Service for this purpose. Earlier, I mentioned the compiled, logical view of the cluster that the clients see. It is VTGate, in cooperation with the Topology Service, that compiles this purely logical view on the fly and delivers it to the client. Needless to say, VTGate is also scalable – many instances of VTGate can run in parallel.
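For queries that cannot be answered from a single shard, this compiling boils down to a scatter-gather pattern: fan the query out to the affected tablets, then merge the partial results. The following Go sketch reduces that idea to its bare bones – queryShard is a fake stand-in that returns canned rows, whereas a real VTGate plans, routes, and merges with far more sophistication:

package main

import (
	"fmt"
	"sync"
)

// queryShard stands in for sending a query to one shard's tablet;
// here it just returns canned rows per shard for illustration.
func queryShard(shard string) []string {
	fake := map[string][]string{
		"-80": {"alice", "bob"},
		"80-": {"carol"},
	}
	return fake[shard]
}

// scatterGather sends the same logical query to all shards in
// parallel and merges the partial results into one answer, which is
// roughly what a gateway does for queries spanning several shards.
func scatterGather(shards []string) []string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		merged []string
	)
	for _, s := range shards {
		wg.Add(1)
		go func(shard string) {
			defer wg.Done()
			rows := queryShard(shard)
			mu.Lock()
			merged = append(merged, rows...)
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return merged
}

func main() {
	fmt.Println(scatterGather([]string{"-80", "80-"}))
}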