« Previous 1 2 3
PostgreSQL Replication Update
Data Duplication
Bidirectional Replication
In some cases, you can't avoid bidirectional replication. A few years ago, the PostgreSQL developers implemented logical decoding, which made it possible to decode the transaction log using a plugin and reuse the content in SQL statements.
For example, suppose a user issues the following SQL statement:
DELETE FROM data WHERE id < 4;
In this case, decoding the transaction logs would deliver:
DELETE FROM data WHERE id = 1; DELETE FROM data WHERE id = 2; DELETE FROM data WHERE id = 3;
The resulting stream could then be replayed on a subscriber.
In the future, it should be possible to use this stream to synchronize slaves and deliver almost downtime-free updates. In upcoming versions of PostgreSQL, bidirectional replication will also be feasible. Bidirectional, asynchronous replication is currently only possible with a tool named BDR, which you would need to install as a package.
In this next example, bidirectional replication makes sense. Assume that an insurance company operates branches in the US and Europe. In this case, although the data line could conceivably fail, it's very unlikely that someone would try to insure the same car in the US and Europe at the same time (i.e., the conflict probability is low).
Premier Class: Automatic Failover
PostgreSQL is a modern database that offers numerous features specifically for replication. However, PostgreSQL is not cluster software – things like automatic failover and the like are not implemented, so external tools are needed.
One of the best known tools is Zalandos Patroni [2], which uses the Paxos algorithm to determine who should have what role in the cluster. If Patroni, on the basis of the consensus algorithm, detects that a failover must occur, it triggers the necessary operations. The configuration is relatively simple (Figure 2); note that Patroni starts and stops the services but does not manage the IP addresses. An application that accesses the cluster from the outside thus needs to connect a different IP address in the event of a fault.
One solution is to combine PostgreSQL and Patroni with Consul [3] to modify the DNS records directly in the case of a failover. Cybertec [4], a PostgreSQL database company, will be offering a freely available solution for automated failover and replication in the near future.
Decision-Making Aids
If you are facing a situation in which you need to find a solution for an existing problem, you first need to clarify what you really want to achieve: Is it about availability? Is it about scaling? Do you want to create a geographically distributed system? If availability is the focus, you should rely on simple master-slave replication. If read scaling is most important, you will first want to check the workloads on missing indexes and so forth and then rethink your options.
In many cases, missing indexes can make the workload appear greater than it really is. A modern database server can handle hundreds of thousands of queries per second – it is always useful to check whether the infrastructure is just being used in an inefficient way before scaling your servers. If the read load is still too high, you can rely on streaming replication. If the system is globally distributed, an asynchronous multimaster solution can be useful, but the chances of conflict must be small.
The present PostgreSQL environment is in flux. Hardly a month goes by without the addition of new features. Substantial improvements can be expected in all areas that are unforeseeable today. It is definitely worth your while to see what will happen in the future.
Infos
- PostgreSQL: https://www.postgresql.org
- Patroni: https://github.com/zalando/patroni
- Consul: https://www.consul.io
- Cybertec: http://www.cybertec.at/en/
« Previous 1 2 3
Buy this article as PDF
(incl. VAT)