Asynchronous PostgreSQL
Sophisticated
Another Way
Preferably, you do not want to wait for the database inside the DBI driver. Instead, you should only initiate the operation there, thus avoiding the problems associated with signal handling. You also need a way to tell whether or not the operation has completed. The best technique is not to query this regularly but to check it at reasonable points in time. All of this can be implemented with DBD::Pg.
Listing 5 shows the results. Line 5 imports the constant PG_ASYNC, which is passed to the driver in line 19 together with the SQL command. This changes the behavior of the subsequent execute command; it no longer waits for the database operation to complete. It is now the program's responsibility to use $dbh->pg_ready to check periodically whether the request has completed and, if so, to read the result.
Listing 5
burn2.pl
01 #!/usr/bin/perl
02
03 use common::sense;
04 use DBI;
05 use DBD::Pg qw/PG_ASYNC/;
06 use IO::Select;
07
08 print "Status: 200\nContent-Type: text/plain\n\n";
09 $|=1; $|=0; # flush
10
11 my $dbh;
12
13 $SIG{TERM}=sub {
14   $dbh->pg_cancel if $dbh and $dbh->{pg_async_status}==1;
15 };
16
17 $dbh=DBI->connect('dbi:Pg:dbname=r2', 'ipp', undef, {RaiseError=>1});
18
19 my $sth=$dbh->prepare('select burncpu(?)', {pg_async=>PG_ASYNC});
20 $sth->execute((($ENV{QUERY_STRING}+0) || .5).'s');
21
22 my $sel=IO::Select->new($dbh->{pg_socket});
23 while( $sel->can_read and !$dbh->pg_ready) {
24   warn "not ready, yet. keep polling\n";
25 }
26 if( $dbh->{pg_async_status}==1 ) {
27   $dbh->pg_result;
28
29   while( my $row=$sth->fetchrow_arrayref ) {
30     print "@$row\n";
31   }
32 } elsif( $dbh->{pg_async_status}==-1 ) {
33   warn "query cancelled\n";
34 }
When I first learned of this option, I suspected that it could also execute multiple SQL commands in parallel on one database connection. Unfortunately, this is not the case. Each database process can run only one SQL statement at a time. Thanks to PG_ASYNC, execute immediately returns with an error if you try to run a second command while the first is still running. Figure 2 shows the effect with two statements: The first sleeps for one second and the second for two seconds. The complete call takes only one second, then an error message appears. So, only the first statement is executed.

The pg_async key in the hash passed to prepare is a bit field. PG_ASYNC can thus be combined with one of the other two constants. If you pass in

PG_ASYNC | PG_OLDQUERY_CANCEL

or

PG_ASYNC + PG_OLDQUERY_CANCEL

the command currently running is aborted and the new one started. The example in Figure 3 takes two seconds. Only the second statement is executed. PG_OLDQUERY_WAIT waits for the end of the statement currently running before the next one starts. The command in Figure 4, therefore, needs three seconds.
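As a sketch of the bit-field combination, the following fragment starts a slow statement and then replaces it with a second one. The connection parameters are taken from Listing 5; pg_sleep is PostgreSQL's built-in sleep function, used here as a stand-in for any long-running query.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use DBD::Pg qw/PG_ASYNC PG_OLDQUERY_CANCEL/;

my $dbh = DBI->connect('dbi:Pg:dbname=r2', 'ipp', undef, {RaiseError=>1});

# Start a long-running statement asynchronously.
my $slow = $dbh->prepare('select pg_sleep(?)', {pg_async=>PG_ASYNC});
$slow->execute(1);

# With PG_ASYNC alone this execute() would die, because the first
# statement is still running. OR-ing in PG_OLDQUERY_CANCEL makes the
# driver cancel the old statement first, then start the new one.
my $next = $dbh->prepare('select pg_sleep(?)',
                         {pg_async => PG_ASYNC | PG_OLDQUERY_CANCEL});
$next->execute(2);

$dbh->pg_result;   # wait for the second statement (about two seconds)
```

With PG_OLDQUERY_WAIT instead of PG_OLDQUERY_CANCEL, the same fragment would take roughly three seconds, because the first statement runs to completion before the second starts.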
One question remains in the CGI program: When is it reasonable to call the pg_ready function? In a loop, over and over again? That would work, but it would be a huge waste of CPU time. To counteract this, you could sleep in the loop for a while (see the example for pg_ready in the DBD::Pg documentation), but that would slow down the program unnecessarily.
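A sketch of that sleep-and-poll variant makes the trade-off clear (the statement handle setup is assumed to look like Listing 5; four-argument select as a sub-second sleep is a common Perl idiom):

```perl
# Naive polling: check pg_ready in a loop, sleeping 100 ms between
# checks. Without the sleep this busy loop burns CPU; with it, the
# program can waste up to 100 ms of latency after the query finishes.
until ($dbh->pg_ready) {
    select(undef, undef, undef, 0.1);   # sleep for 0.1 seconds
}
my $rv = $dbh->pg_result;
```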
Solutions
The right way to approach the problem is as follows: The medium for the exchange of data between DBD::Pg and the database is a TCP connection. Once the database is done processing, it usually sends data to the client, so you wait for I/O activity on the socket and call pg_ready every time something happens. The DBD::Pg driver conveniently provides the socket descriptor in $dbh->{pg_socket}. Caution: This is not a file handle, but a Unix file descriptor. Line 22 generates an IO::Select object.
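IO::Select accepts a plain integer file descriptor as well as a file handle, which is why line 22 works as written. If some other API insists on a real Perl file handle, the descriptor can be wrapped; a sketch, with error handling kept minimal:

```perl
use IO::Select;
use IO::Handle;

my $fd  = $dbh->{pg_socket};     # plain integer file descriptor
my $sel = IO::Select->new($fd);  # IO::Select takes raw descriptors

# Alternatively, turn the descriptor into a real Perl file handle:
my $fh = IO::Handle->new;
$fh->fdopen($fd, 'r') or die "fdopen: $!";
```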
The script waits in $sel->can_read (line 23) until data arrives on the connection and then calls pg_ready. If the function returns false, the program simply continues to wait.
The code at this point assumes that the socket is no longer readable after the pg_ready call. I am not sure whether that is always guaranteed. In my experiments, the loop body was never executed; that is, the results were in place when the data arrived. Lines 26 and 32 evaluate $dbh->{pg_async_status}. A value of 1 indicates an asynchronous SQL command is active, 0 means the last command was synchronous, and -1 tells you the last command was asynchronous and has been aborted.
If the command has not been canceled, $dbh->pg_result is called. This function returns exactly what $sth->execute would return for a synchronous command. The return value is ignored here, but the call itself is necessary; otherwise, fetchrow_arrayref does not return a result. For database updates especially, the return value is often important.
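For an UPDATE, the value returned by pg_result should be the affected row count, just as execute would report it for a synchronous statement. A sketch under that assumption; the table name jobs and its done column are made up for illustration:

```perl
# Fire an asynchronous UPDATE; the hypothetical table 'jobs' stands
# in for whatever table you are really updating.
my $sth = $dbh->prepare(
    q{update jobs set done=true where done=false},
    {pg_async => PG_ASYNC},
);
$sth->execute;

# ... do other work, wait for the socket to become readable ...

my $rows = $dbh->pg_result;   # number of rows changed by the UPDATE
warn "updated $rows rows\n";
```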
What happens now if Apache sends the SIGTERM signal while the program is waiting in $sel->can_read in line 23 (or, more precisely, in the underlying select system call)? The system call is aborted. Even before $sel->can_read returns, the program flow reaches a safe point where the signal handler can be called. The signal handler determines that an asynchronous request is active and therefore calls $dbh->pg_cancel (see the box "What Exactly Does pg_cancel Do?"). This function tells the database process to cancel the current query and waits for confirmation; then $sel->can_read returns. The following pg_ready call returns true, but because the query was canceled, the program flow ends up in line 33.
What Exactly Does pg_cancel Do?
When I was working on this article, the question arose as to whether pg_cancel only initiates the termination and returns immediately, or whether it waits for confirmation from the server; that is, whether I/O operations can be expected on the socket shortly after it returns. To answer this, you could read the source code; however, it is easier to run strace and watch. For this, I surrounded the call with two outputs to stderr:
warn '>>>'; $dbh->pg_cancel; warn '<<<';
The relevant snippet of strace output looks like this:
write(2, ">>> at -e line 1.\n", 18>>> at -e line 1.
) = 18
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(5432), ...
sendto(4, "\0\0\0\20\4\322\26.\0\0.\0016\252\36\264", 16, 0, NULL, 0) = 16
recvfrom(4, "", 1, 0, NULL, NULL) = 0
close(4) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "2\0\0\0\4T\0\0\0!\0\1pg_sleep\0"..., 16384, 0, NULL, NULL) = 143
write(2, "<<< at -e line 1.\n", 18<<< at -e line 1.
) = 18
Interestingly, an additional TCP connection is opened, and the database server is evidently told to send a signal to the backend process that is executing the asynchronous request. A parallel strace call on the server confirmed that a SIGINT is sent. This is also consistent with the description of the function pg_cancel_backend(int) [2].

What takes place on file descriptor 3 is also interesting. It shows that pg_cancel waits for the end of the operation. After it returns, no further I/O is to be expected.
Better But Not Good
Although the program now looks quite useful, it still has a weak point that is only obvious when you understand in detail how safe signal handling in Perl works. You will see that it is not really suitable for this case.
What problem are Perl's safe signals actually trying to solve? When a signal arrives, the signal handler is called, and this can happen at any time. If the program is about to change some global variables, and the signal handler wants to manipulate the same variables, chaos is inevitable.
The prime example of this is malloc. The program is trying to use this function to allocate more memory. Depending on the implementation, it manages several global lists or similar structures to do so; however, if the signal handler also needs memory, precisely the situation described has occurred. Languages such as Perl rely heavily on memory management to allocate and free memory for variables as needed. Therefore, it is hardly possible to write a signal handler in Perl that does not work with global structures.
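Perl avoids the chaos by deferring delivery: the C-level handler only records that a signal arrived, and the Perl handler runs later, at a safe point between opcodes. A minimal sketch of that behavior, sending the process a signal from within:

```perl
#!/usr/bin/perl
# Sketch: Perl's "safe signals" defer the handler to safe points
# between opcodes. The handler below only sets a flag; the main
# flow reacts to the flag where it is safe to do so.
use strict;
use warnings;

my $got_term = 0;
$SIG{TERM} = sub { $got_term = 1 };   # runs only at a safe point

kill 'TERM', $$;                      # send ourselves a SIGTERM

# By the time control reaches the next statement, Perl has
# dispatched the deferred handler and the flag is set.
print "flag: $got_term\n";
```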