Better compression of web pages

Appetizers

Article from ADMIN 42/2017
By
Google develops a software tool that is a genuine alternative to Gzip, with improved website compression rates that save bandwidth for server operators.

For nearly 20 years, web servers have relied on Gzip to compress HTML, CSS, and other text files, which the browser then receives and unpacks, speeding up data transfer at the network bottleneck.

Gzip compression of dynamic content takes place at the server on the fly. The web server reads the file from the filesystem, pipes it through Gzip, and then delivers the result to the browser. Before the Gzip step, the server often calls a PHP module.

For static (e.g., CSS) files, the server sometimes performs compression beforehand and then stores .gz files in the filesystem, saving CPU power on the server. Another option is Google's Zopfli [1] compression software. Although it is far slower than Gzip and uses more CPU capacity, it produces smaller files that remain compatible with Gzip.
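
As a minimal sketch of this kind of pre-compression, the following shell function walks a directory and stores a .gz copy next to each text asset. The function name, the list of file extensions, and the example path are assumptions made for illustration:

```shell
#!/bin/sh
# Sketch: pre-compress static text assets so the web server can ship the
# stored .gz files instead of compressing on every request.
# Function name and extension list are illustrative assumptions.

precompress_gzip() {
    dir=$1
    # -9 selects the highest (slowest) Gzip level, -k keeps the original
    # file, and -f overwrites a stale .gz from an earlier run.
    find "$dir" -type f \( -name '*.css' -o -name '*.js' -o -name '*.html' \) \
        -exec gzip -9 -k -f {} +
}

# Example call (hypothetical path):
# precompress_gzip /var/www/example
```

Nginx can deliver files prepared in this way through its gzip_static directive, so no CPU time is spent on compression at request time.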

Over the past 20 years, better compression methods than Gzip have emerged from time to time. However, they have never made it into the web browser, simply because they all took too much time for on-the-fly compression. After all, if it takes longer to compress than it does to transfer the original files, it hasn't helped anyone when all's said and done.

Kneaded

In 2015, Google provided a solution to this dilemma with the MIT-licensed Brotli [2] software, which compresses files at the same speed as Gzip, but with a higher compression rate – as demonstrated (Listing 1) with the use of both tools on this example HTML document:

<html>
    <head>
        <title>Brotli Test</title>
    </head>
    <body>
        <p>Hello World!</p>
    </body>
</html>

Listing 1

Gzip and Brotli Comparison

-rw-r--r--  1 sw  sw  124  8 Sep 16:52  hello-world.html
-rw-r--r--  1 sw  sw   77  8 Sep 16:53  hello-world.html.br
-rw-r--r--  1 sw  sw  113  8 Sep 16:53  hello-world.html.gz

The original file takes up 124 bytes of space on the hard drive. After compression with Gzip, this number drops to 113 bytes, which means a space savings of 9 percent. Compressed with Brotli, the file is only 77 bytes, or 38 percent smaller.
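
The percentages follow directly from the file sizes in Listing 1. A small helper, whose name is an assumption, makes the arithmetic explicit so you can repeat it for your own files:

```shell
#!/bin/sh
# Sketch: space savings in percent, rounded to the nearest whole number.
# savings = (original - compressed) * 100 / original

savings_percent() {
    original=$1
    compressed=$2
    awk -v o="$original" -v c="$compressed" \
        'BEGIN { printf "%.0f\n", (o - c) * 100 / o }'
}

savings_percent 124 113   # Gzip:   prints 9
savings_percent 124 77    # Brotli: prints 38
```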

The main reason for Brotli's good performance with HTML files is a 120KB dictionary built into the program that contains the character strings most frequently used on websites (e.g., HTML tags). Instead of storing such a string, Brotli simply refers to its dictionary entry and thus saves a good deal of space. On average, Brotli generates files that are 20 percent smaller than Gzip-generated files, which alone should please every server operator who has to pay for network traffic.

The key to success lies in browser support. Google has a home advantage thanks to its own Chrome browser, and after the developers set up Chrome with Brotli support, other browser manufacturers followed suit. Today, all major browsers support Brotli [3].

Brotli on the Server Side

Things look less rosy on servers. Of all the web servers provided in the stable branches of the major Linux distributions, not a single one supports Brotli. If you want to offer Brotli compression on Linux, you currently need to patch and recompile the web server. However, this leads to an operating system that is difficult to update. The next major release of your distribution will probably come with official support for Brotli, at which point you will probably want to switch to the official package.

Here, I show how you can combine the available stable Nginx web server [4] with Brotli on Debian Stable 9.1 (Stretch) [5]. I assume you have a newly installed server on which you have root privileges. Debian 9.1 already provides Brotli as standalone compression software. To begin, install this package along with Git:

apt-get install brotli git

You now need the development libraries for the Nginx module. On the downside, they are currently only in the experimental branch; on the upside, they do not have too many dependencies. Add the following line to the /etc/apt/sources.list file:

deb http://deb.debian.org/debian experimental main

You can then install the required package in the terminal:

apt-get update
apt-get install libbrotli-dev/experimental

Debian has no prebuilt package for the Nginx module, so you need to grab it from Google's repository and save it under /opt/ngx_brotli:

cd /opt
git clone https://github.com/google/ngx_brotli
cd /opt/ngx_brotli
git submodule update --init

Next, pick up the source code for the official Debian package from Nginx:

cd /usr/src
mkdir nginx
cd nginx
apt-get source nginx
cd nginx-1.10.3

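The sources contain a debian/rules file, which defines Debian-specific settings. Now, search for the configure flags of the extras entry (this is the Nginx variant with the most modules) and add the following line. Because the flags form one long Makefile variable, every line of the list except the last must end with a backslash: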

--add-module=/opt/ngx_brotli
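
If you prefer to script this edit, the following sketch inserts the flag at the top of the list, including the trailing backslash. It assumes that debian/rules opens the list with a line beginning extras_configure_flags := (check your copy of the file first); the function name is also an assumption:

```shell
#!/bin/sh
# Sketch: add the Brotli module to the configure flags of the extras variant.
# Assumption: the flag list starts with a line "extras_configure_flags := \".

add_brotli_module() {
    rules_file=$1
    # Do nothing if the flag is already present.
    if grep -q -- '--add-module=/opt/ngx_brotli' "$rules_file"; then
        return 0
    fi
    tmp_file=$(mktemp) || return 1
    # Print every line; directly after the line that opens the flag list,
    # print the new flag followed by a continuation backslash.
    awk '
        { print }
        /^extras_configure_flags[ \t]*:=/ {
            print "\t\t\t--add-module=/opt/ngx_brotli \\"
        }
    ' "$rules_file" > "$tmp_file" || return 1
    # Write back with cat so debian/rules keeps its executable bit.
    cat "$tmp_file" > "$rules_file"
    rm -f "$tmp_file"
}

# Example call from inside the unpacked source directory:
# add_brotli_module debian/rules
```

Inserting the flag at the top of the list avoids having to touch the last line, which is the only one without a backslash.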

Once all of the necessary files are in place and configured, build the new Debian packages:

dpkg-buildpackage -b

They then end up in the /usr/src/nginx directory and can be installed via dpkg:

cd /usr/src/nginx
dpkg -i nginx-common_*.deb libnginx-mod-*.deb nginx-extras_*.deb

To prevent the system from unintentionally updating the newly installed packages, you can set them to hold with Apt:

apt-mark hold $(dpkg --get-selections | grep nginx | sed "s/\t.*//" | xargs)

The stable Nginx with the additional Brotli module is now complete.
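
To confirm that the module really made it into the binary, you can search the configure arguments that Nginx reports about itself. The helper below is a sketch with an assumed name; it reads that output on standard input:

```shell
#!/bin/sh
# Sketch: succeed if the text on stdin mentions the ngx_brotli module.

has_brotli_module() {
    grep -q -- 'ngx_brotli'
}

# Usage on the server (nginx -V writes to stderr, hence 2>&1):
# nginx -V 2>&1 | has_brotli_module && echo "Brotli module compiled in"
#
# List the packages that are currently on hold:
# apt-mark showhold
```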

A Need for HTTPS

When requesting a page, the web browser tells the web server which compression methods it can process in the Accept-Encoding header of its HTTP request. To ensure that old HTTP proxies do not trip up over Brotli compression on the way from the server to the browser, browsers only ask for Brotli compression if TLS connections (SSL) are used.
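
The negotiation boils down to looking for the br token in the Accept-Encoding value. The following sketch shows the idea; the function name is an assumption, and the check deliberately ignores quality values such as q=0, which a full implementation would have to honor:

```shell
#!/bin/sh
# Sketch: does an Accept-Encoding header value offer Brotli ("br")?
# Simplification: quality parameters (";q=...") are stripped, not evaluated.

accepts_brotli() {
    # Split the value on commas, drop parameters and whitespace,
    # then look for a token that is exactly "br" (case-insensitive).
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/;.*//; s/[[:space:]]//g' | grep -qix 'br'
}

# A browser on an HTTPS connection typically sends something like:
# accepts_brotli "gzip, deflate, br" && echo "Brotli offered"
# Over plain HTTP, the br token is missing:
# accepts_brotli "gzip, deflate" || echo "Brotli not offered"
```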

Fortunately, the free Let's Encrypt [6] certification service makes light work of configuring SSL on the web server [7]. Install the Let's Encrypt client on Debian with apt-get:

apt-get install letsencrypt

The example uses the www.linux-magazin.de domain and creates a directory for it:

mkdir -p /var/www/www.linux-magazin.de/.well-known

To prove to Let's Encrypt that you have control over this domain, you first need to set up a simple HTTP server. Store the Nginx configuration (Listing 2) in the /etc/nginx/sites-available/www.linux-magazin.de file, enable the configuration using a link to the /etc/nginx/sites-enabled/ directory, and restart Nginx:

Listing 2

Nginx HTTP Server Config

server {
    listen 80;
    server_name www.linux-magazin.de;

    root /var/www/www.linux-magazin.de;
    index index.html index.htm;

    # Let's Encrypt Challenge
    #
    location ~ /.well-known {
        allow all;
    }

    location / {
        try_files $uri $uri/ =404;
    }
}

ln -s /etc/nginx/sites-available/www.linux-magazin.de /etc/nginx/sites-enabled/www.linux-magazin.de
service nginx restart

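Next, call the Let's Encrypt certbot program with the desired URL. The -w option must point to the document root of the server, because certbot creates the .well-known/acme-challenge path below it by itself:

certbot certonly --webroot -w /var/www/www.linux-magazin.de -d www.linux-magazin.de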

The software stores the SSL certificates in the directory structure below /etc/letsencrypt/. Now, you can once again tackle the Nginx configuration of the web server (Listing 3) by configuring HTTPS access, enabling HTTP/2 to improve performance, and setting up automatic redirection of all HTTP requests to HTTPS. Once the configuration is in place, restart Nginx:

service nginx restart

Listing 3

Nginx Web Server Config

# Redirection of HTTP requests to HTTPS
#
server {
    listen 80;
    server_name www.linux-magazin.de;

    root /var/www/www.linux-magazin.de;
    index index.html index.htm;

    # Let's Encrypt Challenge
    #
    location ~ /.well-known {
        allow all;
    }

    location / {
        rewrite ^/(.*)$ https://www.linux-magazin.de/$1 permanent;
        rewrite ^/$ https://www.linux-magazin.de/ permanent;
    }
}

# HTTPS configuration
#
server {
    listen 443 ssl http2;
    server_name www.linux-magazin.de;

    # Letsencrypt-SSL certificate
    ssl_certificate /etc/letsencrypt/live/www.linux-magazin.de/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/www.linux-magazin.de/privkey.pem;

    # Cache Connection-Credentials
    ssl_session_cache shared:SSL:20m;
    ssl_session_timeout 180m;

    root /var/www/www.linux-magazin.de/current;
    index index.html index.htm;

    # Brotli-Settings
    #
    brotli on;
    brotli_comp_level 5;
    brotli_static on;
    brotli_types text/html text/plain text/css application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript image/x-icon image/vnd.microsoft.icon image/bmp image/svg+xml;

    location / {
        try_files $uri $uri/ =404;
    }
}

Store an HTML file in the document root of the HTTPS server (/var/www/www.linux-magazin.de/current/ in Listing 3) and retrieve it over HTTPS with a Brotli-capable browser. To change the compression level of Brotli for on-the-fly compression, set brotli_comp_level to a value from 0 to 11.
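
To check from the command line whether the server really answers with Brotli, you can request the page with curl and inspect the Content-Encoding response header. The helper below is a sketch with an assumed name; it reads response headers on standard input and prints the encoding in lowercase:

```shell
#!/bin/sh
# Sketch: extract the Content-Encoding value from HTTP response headers.

content_encoding() {
    # Strip carriage returns, then match the header name case-insensitively
    # (HTTP/2 responses use lowercase header names).
    tr -d '\r' | awk -F': *' 'tolower($1) == "content-encoding" { print tolower($2) }'
}

# Usage against the example server: -D - dumps the response headers to
# stdout, -o /dev/null discards the body, and -H offers Brotli explicitly.
# curl -s -o /dev/null -D - -H 'Accept-Encoding: br' \
#     https://www.linux-magazin.de/ | content_encoding
```

If the setup works, the pipeline prints br; an empty result means the server delivered the page uncompressed.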

As with Gzip, higher compression levels in Brotli achieve better compression but require more time and CPU power. Values between 4 and 6 are generally considered a good compromise for on-the-fly compression. Anyone who pre-compresses static files for optimum performance and stores them with the .br file ending will do well with a value of 11, because the files are compressed only once and then delivered by brotli_static.
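
Pre-compression with level 11 could be scripted along the following lines. This is a sketch under assumptions: the function names and the extension list are made up for illustration, and the options -q, -k, and -f belong to the command-line tool of Brotli 1.x, so an older brotli package may expect different options:

```shell
#!/bin/sh
# Sketch: store a .br copy next to each static text asset.

list_compressible() {
    # Print the files below directory $1 that are worth pre-compressing.
    find "$1" -type f \( -name '*.html' -o -name '*.css' -o -name '*.js' -o -name '*.svg' \)
}

precompress_brotli() {
    # Assumes the Brotli 1.x CLI: -q 11 selects the highest quality,
    # -k keeps the original file, -f overwrites an existing .br file.
    # File names that contain newlines are not handled.
    list_compressible "$1" | while IFS= read -r file; do
        brotli -q 11 -k -f "$file"
    done
}

# Example call for the document root from Listing 3:
# precompress_brotli /var/www/www.linux-magazin.de/current
```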
