Verifying packages with Debian's ReproducibleBuilds

Identical Build

Toolchain

Many causes of these unpredictable deviations can be eliminated in the toolchain used to build packages. In addition, developers have to edit individual source packages to improve reproducibility.

In certain cases, the package maintainer even needs to patch the software's source code. This is necessary, for example, when the developers use the __TIME__ or __DATE__ preprocessor macros [9] to print the build date when the program is called with the --version flag.
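The effect is easy to reproduce in miniature. The following Python sketch is not taken from any Debian package: render_version_banner and fake_build are hypothetical names that stand in for a program that compiles a build date into its --version output, and the "binary" is simply the banner text.

```python
import hashlib


def render_version_banner(version, build_date=None):
    """Return the text a hypothetical program prints for --version."""
    banner = f"example {version}"
    if build_date is not None:
        # Mimics embedding __DATE__: the banner now depends on the build day.
        banner += f" (built {build_date})"
    return banner


def fake_build(version, build_date=None):
    """Stand-in for a compiler run: hash the banner as if it were the binary."""
    binary = render_version_banner(version, build_date).encode("utf-8")
    return hashlib.sha256(binary).hexdigest()


# Same source, different build days: the checksums diverge.
monday = fake_build("1.0", build_date="Mar  2 2015")
tuesday = fake_build("1.0", build_date="Mar  3 2015")
print(monday == tuesday)

# With the build date patched out, every build yields the same checksum.
print(fake_build("1.0") == fake_build("1.0"))
```

The first comparison fails because the only input that changed, the date, ends up in the output; removing it from the banner is the equivalent of the patch described above.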

Small things can play a decisive role when it comes to reproducibility. If you underestimate their importance, you can only build an identical package on another day by using a customized virtual environment. Having to rely on such virtual build environments is not the answer; it must be possible to reconstruct identical checksums for the binary packages under any conditions.

The ReproducibleBuilds working group therefore maintains an enhanced toolchain for deterministic package building [10]. This provides, for example, alternative packages for Doxygen and docbook-to-man, as well as the dh_strip_nondeterminism debhelper module, which, among other things, removes the timestamps from a series of objects. The toolchain is thus far purely experimental, but it is likely to be incorporated gradually into the official branch after the Debian 8 release.
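To illustrate what removing a timestamp from an object can mean in practice, the following Python sketch zeroes the modification time stored in a gzip header. It is not the dh_strip_nondeterminism implementation; the function names are made up for this example, and the sketch assumes a plain single-member gzip stream without a header checksum.

```python
import gzip
import io


def gzip_with_mtime(payload, mtime):
    """Compress payload, recording the given timestamp in the gzip header."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=mtime) as handle:
        handle.write(payload)
    return buffer.getvalue()


def strip_gzip_mtime(blob):
    """Zero the 4-byte MTIME field at offset 4 of the gzip header.

    Assumption: the optional header CRC (FHCRC) is not set; if it were,
    it would have to be recomputed after this change.
    """
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return blob[:4] + b"\x00\x00\x00\x00" + blob[8:]


payload = b"manual page contents\n"
first = gzip_with_mtime(payload, 1425254400)   # compressed on one day
second = gzip_with_mtime(payload, 1425340800)  # compressed a day later

# The raw files differ only in the embedded timestamp ...
print(first == second)
# ... so they become byte-identical once it is stripped.
print(strip_gzip_mtime(first) == strip_gzip_mtime(second))
```

The compressed data itself is untouched, so the stripped file still decompresses to the original payload.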

Another innovation is the new .buildinfo file. After a package is built, this file records the build dependencies with their version numbers, the build path, and the checksums of the source and of the generated binary package [11]. With this information, you can recreate the build path when rebuilding the package and, if the dependencies used have since become outdated, look up the exact versions in the snapshot archive [12].
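The kind of record described here can be sketched in a few lines of Python. The field names below are illustrative only and loosely follow Debian control-file syntax; they are not the actual .buildinfo specification (see [11] for that), and make_buildinfo is a hypothetical helper.

```python
import hashlib


def make_buildinfo(source_name, source_bytes, binary_name, binary_bytes,
                   build_path, build_depends):
    """Render an illustrative build record (not the real .buildinfo format)."""
    lines = [f"Build-Path: {build_path}", "Build-Depends:"]
    # Sort the dependencies so the record itself is reproducible.
    for package, version in sorted(build_depends.items()):
        lines.append(f" {package} (= {version})")
    lines.append("Checksums-Sha256:")
    for name, data in ((source_name, source_bytes), (binary_name, binary_bytes)):
        digest = hashlib.sha256(data).hexdigest()
        lines.append(f" {digest} {len(data)} {name}")
    return "\n".join(lines) + "\n"


# Placeholder inputs; a real record would hash the actual package files.
record = make_buildinfo(
    "hello_1.0.dsc", b"source package bytes",
    "hello_1.0_amd64.deb", b"binary package bytes",
    "/build/hello-1.0",
    {"gcc": "4.9.2-10", "debhelper": "9.20150101"},
)
print(record)
```

Anyone who later rebuilds the package in the recorded path with the recorded dependency versions can hash the result and compare it against the checksum line.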

The experimental toolchain already works in a chrooted build environment, for example, with pbuilder [13]. Package maintainers can thus use it for initial testing. The Debian Project also provides debbindiff [14], which can compare two packages built one after another.

Infrastructure

The project is now also integrated with Debian's infrastructure. The continuous integration platform [15] continuously checks the whole package archive to discover whether individual source packages can be built reproducibly [16]. To do so, it builds the packages one after another with intentionally modified parameters and then compares the resulting binary packages to find differences.
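The comparison step can be pictured as a checksum diff over the contents of two builds. The Python sketch below is a simplified stand-in, not the code used by the continuous integration platform or by debbindiff; it assumes each build has already been unpacked into a mapping from file path to file contents, and the function names are hypothetical.

```python
import hashlib


def checksums(files):
    """Map each path to the SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_builds(first, second):
    """Return a report of paths that differ between two builds."""
    a, b = checksums(first), checksums(second)
    report = {}
    for path in sorted(set(a) | set(b)):
        if path not in a:
            report[path] = "only in second build"
        elif path not in b:
            report[path] = "only in first build"
        elif a[path] != b[path]:
            report[path] = "content differs"
    return report


build_one = {"usr/bin/hello": b"\x7fELF...", "usr/share/doc/hello/NEWS": b"built Monday"}
build_two = {"usr/bin/hello": b"\x7fELF...", "usr/share/doc/hello/NEWS": b"built Tuesday"}

# An empty report would mean the package built reproducibly.
print(diff_builds(build_one, build_two))
```

A real comparison tool goes further and shows what differs inside each file, which is what makes the cause traceable.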

If differences occur, the developers file a bug report against the package; for now, these reports still carry a severity of wishlist. Additionally, a set of predefined user tags is available to categorize the problem, for example, timestamps, fileordering, randomness, and so on. Quite often, the project developers also provide patches to the package maintainer.
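As an illustration of the fileordering category: archives often store their members in whatever order the filesystem returns them, so the same files can produce different archive bytes. The following Python sketch shows one way to avoid that by sorting the member names and fixing the metadata. It is an assumption-laden example rather than a patch from the project, and deterministic_tar is a hypothetical helper.

```python
import io
import tarfile


def deterministic_tar(files):
    """Pack a {name: bytes} mapping into a tar archive with stable bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        # Sorting removes any dependence on insertion or directory order.
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp instead of the current time
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


# The same files, handed over in a different order.
one = deterministic_tar({"b.txt": b"second", "a.txt": b"first"})
two = deterministic_tar({"a.txt": b"first", "b.txt": b"second"})
print(one == two)
```

Owner and group fields are left at the TarInfo defaults here, so they do not vary with the user running the build either.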

Looking Forward

Fixed checksums for binary packages are just the start of optimizing security in Debian. In the future, package maintainers would no longer upload binary packages prebuilt on their computers but would instead send source packages along with the .buildinfo file.

If packages were always built on a Buildd machine [17], the checksums could be compared directly, and the project could reject packages when discrepancies occur [18]. At some point in the future, it might then be impossible to add non-reproducible packages to the official package archive. At the same time, checksum deviations in the Buildd network would be a reliable indicator of a manipulated build system. Sophisticated scenarios, such as Trusting Trust attacks [19], in which rogue build environments secretly propagate, could be effectively thwarted.

To monitor the integrity of the entire Debian system in this way, package reproducibility would need to work for all supported processor architectures. This is the next step of the project, but one in which unknown pitfalls might still be lurking. The goal of eventually building the entire package archive in a reproducible way thus may be too ambitious.

Some very subtle problems stand in the way, including build processes that return different results depending on the time, CPU usage, and memory configuration. For example, GCC chooses hash functions to reflect the RAM size. Nevertheless, the project would like to get as close as possible to its target and is therefore also covering packages without program code, such as documentation.

If the associated bugs were given a severity of important or serious, maintainers would have to explain why they are not pursuing the long-term objective of assuring software integrity right down to the hardware level with test systems. However, some building blocks are still missing. For example, the project would be forced to accept only signed upstream tarballs from developers.
