New in PostgreSQL 9.3
Database Tuning
In version 9.3, the PostgreSQL community once again delivers several important new features in a major release. The work of many developers worldwide, this release went through the community's rigorous review process (Commitfest), which ensures the quality of the source code. Commitfests typically see up to 100 patches; because of this sheer volume, the feature freeze deadline was repeatedly postponed. The reward is an impressive list of new features [1]. In this article, I present the most important ones.
Parallel Dump
A feature long requested by users and developers alike is the ability to dump a database in parallel with multiple threads or processes. PostgreSQL uses a multiprocess architecture, which means it does not use threads (not even on platforms such as Windows, which actually prefers them). To dump multiple objects in parallel, pg_dump must therefore open multiple database connections. However, because pg_dump guarantees the consistency of a dump even in the face of concurrent transactions, these pg_dump connections now need to be synchronized with one another. This is what made developing the feature tricky.
The infrastructure implemented for this is called "snapshot cloning"; it allows several transactions on separate database connections to synchronize themselves to the same database state. If pg_dump backed up tables in parallel without synchronizing its transactions, data consistency between the tables could not be guaranteed: concurrent processes could add new data to one of the tables during the backup. Enter snapshot synchronization.
Basically, one transaction is started first and exports its snapshot; all other transactions then import the snapshot generated in the process and are thus synchronized with their "parent transaction." Both have the same view of the database state. Figure 1 illustrates the sequence. First, a transaction is started with the REPEATABLE READ or SERIALIZABLE isolation level; READ COMMITTED is not possible, because at this isolation level the snapshot changes after each command and is thus not stable for the duration of the transaction.
With the pg_export_snapshot() function, a snapshot can be exported in any transaction. The return value is a datum of the TEXT type containing the snapshot identifier. This identifier can be imported into another transaction with the SET TRANSACTION SNAPSHOT command.
Note that this must be the first command in the other transaction, or at least precede its first SELECT, INSERT, UPDATE, or DELETE command. Furthermore, a transaction running at the SERIALIZABLE isolation level cannot import a snapshot exported at a lesser isolation level. Once the snapshot has been imported, as in Figure 1, both transactions see the same database state. Records added to the bar table beforehand are no longer visible, because the snapshot had not yet seen them when it was exported.
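In psql, the sequence could look like the following sketch; the two sessions and the snapshot identifier (000003A1-1) are example values and will differ on your system:

```sql
-- Session 1: start a transaction and export its snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns e.g. '000003A1-1'

-- Session 2: import the snapshot before running any query
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';
-- Both sessions now see exactly the same database state,
-- regardless of rows other sessions insert in the meantime.
```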
The pg_dump utility implements precisely this infrastructure to synchronize multiple dump processes. Parallel dump only works with the new directory output format. A pg_dump call with several synchronized dump processes uses the new parallel dump command-line parameter, -j:
pg_dump -j 4 -Fd -f /srv/backups/db/
The important thing to note is the output format specified by the -Fd command-line parameter (d = directory). The -f option specifies the output directory; it must not already exist.
The pg_dump command in new major versions of PostgreSQL has always been backward compatible with older servers to allow migrations. Parallel dump (-j) is no exception and can also be used against older versions, for example, version 9.2 of PostgreSQL. Servers older than version 9.2, however, cannot provide synchronized snapshots; in that case, no changes should be permitted to the databases for the duration of the dump to avoid inconsistencies, and synchronized snapshots must be disabled explicitly using a command-line parameter (--no-synchronized-snapshots) so that dumps can still be created with multiple pg_dump processes.
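Put together, a parallel dump of a hypothetical database named mydb could look like this (the paths and the database name are examples):

```shell
# Four parallel workers, directory format, new target directory
pg_dump -j 4 -Fd -f /srv/backups/db mydb

# Against a server older than 9.2: disable snapshot synchronization;
# safe only if nothing modifies the database during the dump
pg_dump -j 4 -Fd --no-synchronized-snapshots -f /srv/backups/db_old mydb
```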
Writable FDWs
PostgreSQL 9.1 already had a partial implementation of the SQL/MED standard in place, which enabled the integration of external data sources in the form of foreign tables. Previously, this was only possible for read access, but in PostgreSQL 9.3, the API has been extended to include writes to these foreign tables.
External data sources are defined in such a way that they appear to the user to be local PostgreSQL tables. When queried, the records are converted at run time by a Foreign Data Wrapper (FDW) and returned as database rows in the result set, in line with the table definition. A large number of FDWs for other database systems are already available for PostgreSQL, including:
- Oracle
- MySQL
- Informix
- ODBC
- JDBC
- NoSQL sources: CouchDB, Redis, and MongoDB.
The FDW for PostgreSQL itself was added to the contrib branch of the database server in version 9.3; thus, tables from external PostgreSQL databases can now be integrated, too. Furthermore, the PostgreSQL FDW already supports write queries with the Data Manipulation Language (DML) commands INSERT, UPDATE, and DELETE.
The various distributions usually provide these additional modules in the form of a postgresql-contrib package. If everything is installed properly, the PostgreSQL FDW can simply be enabled as an extension (Figure 2). The \dx command in psql lists all the extensions currently installed in a database.
You don't have to install an FDW on the remote database server, but you must ensure via pg_hba.conf that the local database server is granted access to the desired remote database. You can also specify the usual PostgreSQL connection keywords (e.g., host, dbname, port) for the database connection. However, user, password, fallback_application_name, and client_encoding are prohibited as server options; the last two are set automatically by the FDW.
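On the remote server, such an entry in pg_hba.conf might look like the following fragment; the database name, role, and network address are examples:

```
# TYPE  DATABASE  USER   ADDRESS          METHOD
host    archive   bernd  192.168.10.0/24  md5
```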
To use the PostgreSQL FDW to access a remote database server, you need to configure a matching data source with the CREATE SERVER command. The example I use here accesses a remote PostgreSQL server, archives.mynet.internal. First, however, the FDW must be loaded into the local database using CREATE EXTENSION:
CREATE EXTENSION postgres_fdw;
The data source for the remote database server can then be defined as in:
CREATE SERVER pg_archive_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS(dbname 'archive', host 'archives.mynet.internal', port '5432');
To create a foreign table, you first need to tell the data source which combination of role name and password to use when logging in to the remote PostgreSQL instance. To do so, you need a user mapping; the CREATE USER MAPPING command takes care of this. The access credentials required here differ between individual FDWs; for PostgreSQL, the username is mandatory, and a password can be provided optionally. The password is, however, imperative for a mapping that belongs to a user without superuser privileges. The CURRENT_USER keyword is automatically replaced by the role currently used in the local database session:
CREATE USER MAPPING FOR CURRENT_USER SERVER pg_archive_server OPTIONS(user 'bernd', password 'bernd');
After creating a data source and a matching user mapping for its access credentials, you can then create a foreign table. The local definition should correspond to the schema of the remote data source as closely as possible. In Listing 1, the local database is given a foreign table that stores email in an archive.
Listing 1
Foreign Table
CREATE FOREIGN TABLE mail(
  id bigint not null,
  label_id bigint,
  mail_from text not null,
  mail_to text not null,
  msg text not null,
  subject text not null,
  msg_id text not null,
  date timestamp)
SERVER pg_archive_server;
This corresponds exactly to the definition on the archive server, naturally without the SERVER and OPTIONS keywords, which are not required there. Now, the local database can easily use the archive table in local queries:
SELECT COUNT(*) FROM mail WHERE mail_from LIKE 'Tom Lane%';

 count
-------
  6238
(1 row)
When a foreign table is first accessed, the FDW looks up the appropriate user mapping and sets up the database connection to the target server. This database connection is cached per user mapping and reused for the rest of the local database session.
The PostgreSQL FDW supports not only read operations but also DML writes. If transactions or savepoints are used in the local database, the PostgreSQL FDW links them with corresponding transactions or savepoints in the remote database. This means that a ROLLBACK in the local database also rolls back any changes to the remote PostgreSQL data source.
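Using the mail foreign table from Listing 1, a write through the FDW could be sketched as follows; the values are invented for illustration:

```sql
BEGIN;
INSERT INTO mail (id, mail_from, mail_to, msg, subject, msg_id, date)
VALUES (1, 'alice@example.org', 'bob@example.org',
        'Hello there', 'Test', '<1@example.org>', now());
-- The row has been inserted on the remote archive server ...
ROLLBACK;
-- ... and is discarded again: the remote transaction is rolled
-- back together with the local one.
```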
New Features with Streaming Replication
Introduced in version 9.0, built-in replication using the streaming replication method has seen steady improvements from one major version to the next. PostgreSQL 9.3 remains true to this tradition and improves the handling of streaming replication in the event of a primary server failover or recovery.
If the primary server is restored by means of an online backup, or a failover promotes one of its standbys, the affected PostgreSQL instance switches to a new timeline in its transaction log. New in 9.3, a standby can follow such a timeline switch directly over the streaming replication protocol; previously, this required access to an archive of the transaction log files.
Besides the possibility of promoting a PostgreSQL hot standby server to a full PostgreSQL instance using the trigger_file parameter, PostgreSQL also provides the option of doing this by running the pg_ctl promote command.
In version 9.3, a promoted standby also becomes a full, writable instance without first waiting for a checkpoint to complete: pg_ctl promote performs this fast promotion by default, which can save you a longer wait before the instance is available for writing.
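Such a promotion is a one-liner; the data directory path here is an example:

```shell
# Promote the standby running in /srv/pgsql/standby to a writable instance
pg_ctl -D /srv/pgsql/standby promote
```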