Lead Image © Golkin Oleg, 123RF.com

Lead Image © Golkin Oleg, 123RF.com

PostgreSQL Replication Update

Data Duplication

Article from ADMIN 40/2017
By
High availability, replication, and scaling are everyday necessities in the database world. What features does PostgreSQL offer in this context, and how good are they?

If you have anything to do professionally with the topic of replication as a PostgreSQL consultant [1], you might easily gain the impression that colleagues and clients see only distributed, synchronous, multimaster replication as true replication. But are they right?

Synchronous vs. Asynchronous

Generally speaking, replication can be classified in different ways (e.g., synchronous or asynchronous) (Figure 1). In the case of synchronous replication, a transaction is not completed until it is acknowledged by the service. The advantage, particularly in the case of a crash, is that the data must have arrived on at least two systems. The drawback is that the data must be written to at least two systems. The advantage is therefore a disadvantage because the double write operations cost time.

Figure 1: In the case of asynchronous replication, a time offset occurs when executing the transactions.

Single- and Multimaster

The distinction between single-master and multimaster is important. Many users choose multimaster because they can use it to distribute the read load, although it does not make writing faster because both machines need to write all of the data. A different and often cited argument to choose multimaster is high availability.

Two asynchronous machines can have different data at any given time, in which case conflicts can arise that need to be resolved. Usually, you can only accomplish this by rolling back one of the two transactions ex post (i.e., after the commit). For example, suppose you have $100 in a joint bank account and two people want to withdraw $90. Their transactions end up on different machines in the asynchronous cluster. Both computers debit $90 and commit the transactions before synchronizing. Only then does it become clear that the money has been dispensed twice.

The possibility of conflicts always needs to be considered in asynchronous multimaster replication. The servers might "eventually" be consistent, but not always or at any given time. In some scenarios, this asynchronous condition is not a problem because it does not cause a conflict.

Look at global aircraft sales, for example. Say someone wants to buy a plane in Japan. How likely is it that a second customer in Germany decides to buy the same aircraft at the same second? The probability is close to zero. Asynchronous multimaster replication makes sense when the partners involved are geographically distributed and a conflict effectively cannot occur.

Synchronous multimaster replication is not useful under any circumstances in these cases, however, because it drags the performance down such that only a few transactions per second are possible. The problem is the need to ensure the delivery order of network packets to avoid deadlocks – a detrimental state in terms of latency.

When it comes to availability, keep it simple. When it comes to read performance, synchronous or asynchronous single-master replication is the choice. When an admin wants to scale write performance, it is essential to distribute the data in the cluster (sharding, horizontal partitioning). If availability is the sole focus at a site, then multimaster replication is not suitable, because you would need to accept unnecessary compromises. The simpler the setup, the easier the administration and ability to maintain the sites.

Transaction Log

If you are not familiar with the database transaction log, you will not understand PostgreSQL replication. Generally, PostgreSQL, in contrast to a database that uses the index sequential access method (ISAM), is not allowed to write directly to a data file. Every change first needs to be logged in the transaction log. The reason is obvious: A computer can crash at any time. If a crash occurs precisely when the database is writing to its data file, you will get a corrupt file with incomplete data.

The PostgreSQL solution for this problem is write-ahead logging (WAL). The idea is to log all changes before they reach the data file, which is similar in principle to how a modern journaling filesystem works. In PostgreSQL, the transaction log describes the binary changes to be made in case of a recovery.

The transaction log does not contain any SQL – it's all about changes in the data files, which is why it is also suitable for things like replication. When a slave is created on the basis of the transaction logs, it is almost binary identical.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus