Asynchronous PostgreSQL

Sophisticated

Another Way

Preferably, you do not want to wait for the database inside the DBI driver. Instead, you should only initiate the operation there, thus avoiding the problems associated with signal handling. Also, you need a feature that tells whether or not the operation has completed. The best technique is not to query this regularly but to poke it at reasonable points in time. All this can be implemented with DBD::Pg.

Listing 5 shows the results. Line 5 imports the constant PG_ASYNC, which is passed to the driver in line 19 together with the SQL command. Thus, the behavior of the subsequent execute command changes; it no longer waits for the database operation to complete. Now it's the responsibility of the program to use $Dbh-> pg_ready to check periodically whether the request has completed and, possibly, to read the result.

Listing 5

burn2.pl

01 #!/usr/bin/perl
02
03 use common::sense;
04 use DBI;
05 use DBD::Pg qw/PG_ASYNC/;
06 use IO::Select;
07
08 print "Status: 200\nContent-Type: text/plain\n\n";
09 $|=1; $|=0;                     # flush
10
11 my $dbh;
12
13 $SIG{TERM}=sub {
14   $dbh->pg_cancel if $dbh and $dbh->{pg_async_status}==1;
15 };
16
17 $dbh=DBI->connect('dbi:Pg:dbname=r2', 'ipp', undef, {RaiseError=>1});
18
19 my $ sth = $ dbh-> prepare ('select burncpu (?)' {pg_async=>PG_ASYNC});
20 $sth->execute((($ENV{QUERY_STRING}+0) || .5).'s');
21
22 my $sel=IO::Select->new($dbh->{pg_socket});
23 while( $sel->can_read and !$dbh->pg_ready) {
24   warn "not ready, yet. keep polling\n";
25 }
26 if( $dbh->{pg_async_status}==1 ) {
27   $dbh->pg_result;
28
29   while( my $row=$sth->fetchrow_arrayref ) {
30     print "@$row\n";
31   }
32 } elsif( $dbh->{pg_async_status}==-1 ) {
33   warn "query cancelled\n";
34 }

When I first learned of this option, I suspected that it could also execute multiple SQL commands in parallel on a database connection. Unfortunately, this is not true. Each database process can run only one SQL statement at a time. Thanks to PG_ASYNC, execute immediately returns with an error if you try to run a second command while the first is still running. Figure 2 shows the effect with two statements: The first sleeps for one second and the second for two seconds. The complete call only takes one second, then an error message appears. So, only the first statement is executed. The pg_async key in the hash passed to prepare is a bit field. PG_ASYNC can thus be modified with one of the other two constants. If you pass in

PG_ASYNC | PG_OLDQUERY_CANCEL

or

PG_ASYNC + PG_OLDQUERY_CANCEL

the command currently running is aborted and the new one started. The example in Figure 3 takes two seconds. Only the second statement is executed. PG_OLDQUERY_WAIT waits for the end of the statement currently running before the next one starts. The command in Figure 4, therefore, needs three seconds.

Figure 2: With the PG_ASYNC flag, the script can only run one SQL statement at any given time.
Figure 3: PG_ASYNC|PG_OLDQUERY_CANCEL tells the script to stop processing the old query and start a new one.
Figure 4: PG_ASYNC|PG_OLDQUERY_WAIT tells a Perl script to wait for the end of the first query.

One question remains in the CGI program: When is it reasonable to use the pg_ready function? In a loop, over and over again? That would work, but it would be a huge waste of CPU time. To counteract this, you could sleep in the loop for a while (see the example for pg_ready in the DBD::Pg documentation), but it would slow down the program unnecessarily.

Solutions

The right way to approach the problem is as follows: The medium for the exchange of data between DBD::Pg and the database is a TCP connection. Once the database is done processing, it usually sends data to the client, so you wait for I/O activity on the socket and call pg_ready every time something happens. The DBD::Pg driver conveniently provides the socket descriptor in $dbh->{pg_socket}. Caution: This is not a file handle, but a Unix file descriptor. Line 22 generates an IO::Select object.

The script waits in $sel->can_read (line 23) until data arrives in the connection and then calls pg_ready. If the function returns "false," the program simply continues to wait.

The code at this point assumes that the socket is no longer readable after the pg_ready call. I'm not sure whether that is always guaranteed. In my experiments, the cycle was never executed; that is, the results were in place when the data arrived. Lines 26 and 32 evaluate $dbh->{pg_async_status}. A value of 1 indicates an asynchronous SQL command is active, 0 means the last command was synchronous, and -1 tells you the last command was asynchronous and has been aborted.

If the command has not been canceled, $dbh->pg_result is called. This function returns exactly the same as $dbh->execute for synchronous commands. The return value is ignored here. The call itself is necessary though; otherwise, fetchrow_arrayref does not return a result. For database updates especially, the return value is often important.

What happens now if Apache sends the SIGTERM signal while the program is waiting in $sel->can_read in line 23 (or more precisely, in the underlying system call select)? The system call is aborted. Even before $sel->can_read returns, the program flow reaches a safe point where the signal handler can be called. The signal handler determines that an asynchronous request is active and therefore calls $dbh->pg_cancel (see the box "What Exactly Does pg_cancel Do?"). This function tells the database process to cancel the current query and waits for confirmation, then $sel->can_read returns. The following pg_ready call returns "true," but the query was canceled; therefore, the program flow ends up in line 33.

What Exactly Does pg_cancel Do?

When I was working on this article, the question arose as to whether pg_cancel only initiates the termination and returns immediately or whether it waits for confirmation from the server; that is, whether I/O operations can be expected on the socket shortly after it returns. To answer this, you could read the source code; however, it is easier to run strace and watch. For this, I surrounded the call with two outputs to stderr:

warn '>>>';
$dbh->pg_cancel;
warn '<<<';

The relevant snippet of strace output looks like this:

write(2, ">>> at -e line 1.\n", 18>>> at -e line 1.
)     = 18
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(5432), ...
sendto(4, "\0\0\0\20\4\322\26.\0\0.\0016\252\36\264", 16, 0, NULL, 0) = 16
recvfrom(4, "", 1, 0, NULL, NULL)       = 0
close(4)                                = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "2\0\0\0\4T\0\0\0!\0\1pg_sleep\0"..., 16384, 0, NULL, NULL) = 143
write(2, "<<< at -e line 1.\n", 18<<< at -e line 1.
)     = 18

Interestingly, an additional TCP connection is opened. Additionally, the database server is obviously told to send a signal to the back-end process that executes the asynchronous request. A parallel strace call on the server confirmed that a SIGINT is sent. This is also consistent with the description of the function pg_cancel_backend(int) [2].

What takes place on file descriptor 3, however, is interesting. It shows that pg_cancel waits for the end of the operation. After it returns, no I/O can be expected.

Better But Not Good

Although the program now looks quite useful, it still has a weak point that is only obvious when you understand in detail how safe signal handling in Perl works. You will see that it is not really suitable for this case.

What problem is it that the safe signals in Perl are actually trying to solve? When a signal arrives, the signal handler is called. This can happen at any time. Now, if the program is about to change some global variables, and the signal handler wants to manipulate the same variables, chaos is inevitable.

The prime example of this is malloc. The program is trying to use this function to allocate more memory. Depending on the implementation, it manages several global lists or similar structures to do so; however, if the signal handler also needs memory, precisely the situation described has occurred. Languages such as Perl rely heavily on memory management to allocate and free memory for variables as needed. Therefore, it is hardly possible to write a signal handler in Perl that does not work with global structures.

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus