PostgreSQL 9.5 – What's new; what's better?
An Eye for Details
PostgreSQL 9.5 contains more than 100 changes, including dozens of performance improvements. The first highlight in PostgreSQL 9.5 is the extension of the INSERT
command to include the ON CONFLICT DO UPDATE
clause, which lets you stipulate that an UPDATE
is performed instead of an INSERT
if the line to be inserted already exists. This behavior is also known as "UPSERT."
UPSERT
In practical applications, the UPSERT case occurs quite often, but users of PostgreSQL previously had to implement the solution manually as a stored procedure or rely on the logic in the client application. The difficulty originates with the defaults in the SQL standard and other database systems, which cause considerable technical problems unacceptable to the PostgreSQL developers [1]. Although the logic of an UPSERT is quite simple, it becomes complicated if this logic needs to deliver high performance in parallel operations without generating deadlocks or sporadic, unique constraint errors, which is very much the case with other implementations.
For these reasons, the PostgreSQL project decided specifically to design a separate syntax with its own semantics. In this example, a table stores subscriptions from a service with a user ID, the subscription type, and the start and end dates:
CREATE TABLE subscriptions ( user_id int NOT NULL, subscription_type text NOT NULL, started date NOT NULL, expires date NOT NULL, PRIMARY KEY (user_id, subscription_type) );
An UPSERT operation is now required to either insert a new entry or update an existing entry. Previously, developers needed to rely on a loop that performed INSERT
or UPDATE
operations until one of them worked without error. You could do this, for example, with a PL/pgSQL function of the type shown in Listing 1. The loop is necessary in this example because another connection could add a conflicting line between the UPDATE
and the INSERT
.
Listing 1
UPSERT Replacement
01 CREATE FUNCTION set_subscription(in_user_id int, in_subscription_type text, in_started date, in_expires date) RETURNS void 02 LANGUAGE plpgsql 03 STRICT 04 AS $$ 05 BEGIN 06 <<upsert>> 07 LOOP 08 UPDATE subscriptions 09 SET (started, expires) = (in_started, in_expires) 10 WHERE (user_id, subscription_type) = (in_user_id, in_subscription_type); 11 EXIT upsert WHEN FOUND; 12 BEGIN 13 INSERT INTO subscriptions (user_id, subscription_type, started, expires) 14 VALUES (in_user_id, in_subscription_type, in_started, in_expires); 15 EXIT upsert; 16 EXCEPTION WHEN unique_violation THEN 17 END 18 END LOOP; 19 $$;
The logic of Listing 1 can be shortened to a single command with an UPSERT, as shown in Listing 2. The command initially comprises a normal INSERT
, followed by an ON CONFLICT
clause describing the conflict and the action to be performed. In this typical case, the conflict relates to the potential ambiguity of the primary key as soon as an attempt is made to insert the same user with the same subscription once again. The action is an UPDATE
command.
Listing 2
With UPSERT Command
01 INSERT INTO subscriptions (user_id, subscription_type, started, expires) 02 VALUES (in_user_id, in_subscription_type, in_started, in_expires) 03 ON CONFLICT (user_id, subscription_id) 04 DO UPDATE SET (started, expires) = (in_started, in_expires);
As you can see, the new command saves a great deal of code and offers better performance by avoiding the problem of DIY implementations, which need to be anticipated for cases in which the action fails in other connections.
If you want to avoid repeating the input values from the VALUES
clause in the UPDATE
command, you can also use the EXCLUDED
alias to point to the value of the line whose insert failed (Listing 3). On the other hand, you can also write completely different values in the UPDATE
than in the INSERT
if you somehow want to log the fact that a conflict occurred.
Listing 3
With EXCLUDED
01 INSERT INTO subscriptions (user_id, subscription_type, started, expires) 02 VALUES (in_user_id, in_subscription_type, in_started, in_expires) 03 ON CONFLICT (user_id, subscription_id) 04 DO UPDATE SET (started, expires) = (EXCLUDED.started, EXCLUDED.expires);
Although UPSERT is more likely to be useful to developers of web applications and other transactional applications, PostgreSQL 9.5 also shows innovations in the field of analytics.
Group Theory
Listing 4 shows some sample data and a number of applications of the grouping function. PostgreSQL supports grouping operations known from other SQL systems, as well, such as CUBE
and ROLLUP
, which extend the well-known GROUP BY
clause and support the execution of multiple grouping variants at the same time.
Listing 4
New Grouping Options
01 CREATE TABLE HR ( 02 country text, 03 city text, 04 department text, 05 name text 06 ); 07 08 INSERT INTO HR VALUES 09 ('DE', 'Berlin', 'Sales', 'Christian'), 10 ('DE', 'Berlin', 'Marketing', 'Sandra'), 11 ('DE', 'Frankfurt', 'Sales', 'Stefan'), 12 ('AT', 'Innsbruck', 'HR', 'Katrin'), 13 ('CH', 'Zurich', 'Sales', 'Alexander'); 14 15 SELECT country, count(*) FROM HR GROUP BY country; 16 SELECT country, city, count(*) FROM HR GROUP BY country, city; 17 SELECT department, count(*) FROM HR GROUP BY department; 18 SELECT country, department, count(*) FROM HR GROUP BY country, department;
CUBE
or ROLLUP
let you combine the scan queries. Listing 5 shows an example using ROLLUP
. The option groups on the specified columns – as a normal GROUP BY
would – and by all prefixes of the column list, including the empty list. This is particularly useful for hierarchical structures. In the example, you can see figures at the local departmental level and at the superordinate organizational levels, as well as the total number for each query. Without ROLLUP
, you need to do this with separate queries (or possibly glue them together with a UNION
), which would be troublesome and slower.
Listing 5
An Example with ROLLUP
> SELECT country, city, department, count(*) FROM HR GROUP BY ROLLUP (country, city, department) ORDER BY 1, 2, 3; +---------+-----------+------------+-------+ | country | city | department | count | +---------+-----------+------------+-------+ | AT | Innsbruck | HR | 1 | | AT | Innsbruck | | 1 | | AT | | | 1 | | CH | Zurich | Sales | 1 | | CH | Zurich | | 1 | | CH | | | 1 | | DE | Berlin | Marketing | 1 | | DE | Berlin | Sales | 1 | | DE | Berlin | | 2 | | DE | Frankfurt | Sales | 1 | | DE | Frankfurt | | 1 | | DE | | | 3 |
The CUBE
option groups by other possible combinations of the grouping columns. In the results for the example in Listing 6, you can see the global totals for the various departments. However, not all of the lines from the CUBE
results make sense in this example. The NULL, Berlin, NULL
line, for example, groups all cities by the name of Berlin in all countries. This may have some attraction as a curiosity, but very little practical use.
Listing 6
An Example with CUBE
> SELECT country, city, department, count(*) FROM HR GROUP BY CUBE (country, city, department) ORDER BY 1, 2, 3; +---------+-----------+------------+-------+ | country | city | department | count | +---------+-----------+------------+-------+ | AT | Innsbruck | HR | 1 | | AT | Innsbruck | | 1 | | AT | | HR | 1 | | AT | | | 1 | | CH | Zürich | Sales | 1 | | CH | Zürich | | 1 | | CH | | Sales | 1 | | CH | | | 1 | | DE | Berlin | Marketing | 1 | | DE | Berlin | Sales | 1 | | DE | Berlin | | 2 | | DE | Frankfurt | Sales | 1 | | DE | Frankfurt | | 1 | | DE | | Marketing | 1 | | DE | | Sales | 2 | | DE | | | 3 | | | Berlin | Marketing | 1 | | | Berlin | Sales | 1 | | | Berlin | | 2 | | | Frankfurt | Sales | 1 | | | Frankfurt | | 1 | | | Innsbruck | HR | 1 | | | Innsbruck | | 1 | | | Zürich | Sales | 1 | | | Zürich | | 1 | | | | Marketing | 1 | | | | HR | 1 | | | | Sales | 3 | | | | | 5 |
If you want even more control over the output, you can use a GROUPING SETS
clause to specify the combinations to be grouped. After all, CUBE
and ROLLUP
themselves are simply abbreviations for specific GROUPING SETS
. The example in Listing 7 shows a potential use case.
Listing 7
GROUPING SETS
> SELECT country, city, department, count(*) FROM HR GROUP BY GROUPING SETS ((country, city, department), (country, city), (country), (), (department), (country, department)) ORDER BY 1, 2, 3; +---------+-----------+------------+-------+ | country | city | department | count | +---------+-----------+------------+-------+ | AT | Innsbruck | HR | 1 | | AT | Innsbruck | | 1 | | AT | | HR | 1 | | AT | | | 1 | | CH | Zürich | Sales | 1 | | CH | Zürich | | 1 | | CH | | Sales | 1 | | CH | | | 1 | | DE | Berlin | Marketing | 1 | | DE | Berlin | Sales | 1 | | DE | Berlin | | 2 | | DE | Frankfurt | Sales | 1 | | DE | Frankfurt | | 1 | | DE | | Marketing | 1 | | DE | | Sales | 2 | | DE | | | 3 | | | | Marketing | 1 | | | | HR | 1 | | | | Sales | 3 | | | | | 5 |
A Little More Partitioning
PostgreSQL has no built-in solution to partitioning, and this does not change in PostgreSQL 9.5. The typical workaround is to construct your own partitioning manually. An inheritance hierarchy is a useful solution (Listing 8).
Listing 8
DIY Partitioning
> CREATE TABLE logins (tx timestamp, ip_adress, ...); > CREATE TABLE logins_201509 (CHECK (tx >= '2015-09-01' AND tx < '2015-10-01')) INHERITS (logins); > CREATE TABLE logins_201510 (CHECK (tx >= '2015-10-01' AND tx < '2015-11-01')) INHERITS (logins);
Using a combination of triggers and constraints, you can divide the data up into various physical tables, and the user only needs to cite the underlying parent table when reading. The database automatically accesses the matching child tables. Some PostgreSQL extension modules can help establish and manage these structures with less manual overhead – pg_partman
being one example.
One innovation in PostgreSQL 9.5 is that the subordinate child tables can also be foreign tables that reside on other computers. Foreign tables are tables the database accesses via plugins known as foreign data wrappers (FDWs), which can be data from other database systems, from files, and even from Internet services. Another PostgreSQL database is also an option.
Foreign tables can now have CHECK
constraints. In partitioned tables, the planner looks at the constraints in the database to skip the partitions whose CHECK
constraints rule out the query criteria. The combination of these two innovations thus makes it possible to use the same approach as previously for partitioning but also extend partitioning beyond computer boundaries.
Buy this article as PDF
(incl. VAT)