Four rescue systems compared
Emergency Response
One unpleasant experience for any system administrator is a server that does not respond as expected. Difficult questions follow: Why did the reboot fail? How can the server be resuscitated? The first question is especially difficult to answer, because if the computer won't even start, it is impossible to log in to diagnose the problem. Enter rescue systems: They come in many flavors, each with its own particular strengths. In this article, I sound out four flavors of rescue Linux: Grml, Knoppix, Rescatux, and SystemRescueCd.
Bootstrapping for an Emergency
A couple of questions need to be clarified before a test can produce meaningful results: What functions do rescue systems need to fulfill? What workflow should the administrator set up in advance so that the rescue system is ready in case of an emergency?
In the past, systems like Grml or Knoppix were useful companions for many administrators who needed to revive computers that would no longer boot. The systems resided on a CD or, in the case of Knoppix, a DVD. However, the rules of the game have changed in recent years: Five years ago, it was quite common for servers to be delivered with CD drives, but you will often search in vain for them on today's systems.
An optical disk drive is now irrelevant in practice: Optical media rarely play a role when administrators try to revive their systems in an emergency. All current servers can boot from flash media, such as a USB flash drive or SD card, although admins rarely boot from portable media now. Instead of hotfooting it around a data center with a USB flash drive, they normally use one of the management frameworks from the major manufacturers (e.g., HP iLO, Dell DRAC, IBM RSA).
These systems work independently of the operating system on the host and boot the server on demand from any medium. In case of an emergency, a reboot can also be performed using the generic IPMI protocol; a combination of PXE and TFTP servers then ensures that the computer boots to the rescue system and not to the broken OS. Administrators would do well to set up such an infrastructure.
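The IPMI-plus-PXE workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a finished tool: the BMC hostname, user name, and kernel and initrd paths are hypothetical placeholders, and the boot parameters a particular rescue image needs are deliberately left out because they differ from system to system. The sketch relies only on the PXELINUX convention of looking for a per-host file named `01-` plus the MAC address under `pxelinux.cfg/` and on the `ipmitool` subcommands `chassis bootdev pxe` and `chassis power cycle`.

```python
import re
from typing import List


def pxelinux_config_name(mac: str) -> str:
    """Return the per-host PXELINUX config filename for a MAC address.

    PXELINUX looks for pxelinux.cfg/01-<mac>, with the MAC in lowercase
    and its octets joined by hyphens ("01" is the ARP type for Ethernet).
    """
    octets = re.split(r"[:-]", mac.strip().lower())
    if len(octets) != 6 or not all(re.fullmatch(r"[0-9a-f]{2}", o) for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(octets)


def rescue_boot_entry(kernel: str, initrd: str, extra: str = "") -> str:
    """Build a minimal PXELINUX config that boots straight into the rescue image.

    `extra` carries image-specific kernel parameters (assumption: you look
    these up in the documentation of the rescue system you serve).
    """
    lines = [
        "DEFAULT rescue",
        "LABEL rescue",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} {extra}".rstrip(),
    ]
    return "\n".join(lines) + "\n"


def ipmi_pxe_reboot_commands(bmc_host: str, user: str) -> List[List[str]]:
    """Return ipmitool calls that select PXE as boot device, then power-cycle.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so it never appears on the command line.
    """
    base = ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E"]
    return [
        base + ["chassis", "bootdev", "pxe"],
        base + ["chassis", "power", "cycle"],
    ]


if __name__ == "__main__":
    # Hypothetical host data; the config text would be written to the
    # TFTP root as pxelinux.cfg/<name> before the reboot is triggered.
    print(pxelinux_config_name("88:99:AA:BB:CC:DD"))
    print(rescue_boot_entry("rescue/vmlinuz", "rescue/initrd.img"))
    for cmd in ipmi_pxe_reboot_commands("bmc.example.net", "admin"):
        print(" ".join(cmd))
```

The functions only build file names, config text, and command lines; nothing is executed. That makes it possible to review what would happen before passing the commands to `subprocess.run` on a machine that can reach the management network.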
Requirements
The purpose of a rescue system is always to allow access to the broken system for repairs. However, a few requirements must be fulfilled for this to work. First, the rescue system should support current hardware as well as possible. After all, a booted emergency system will not be much help if it lacks drivers for the RAID controller and therefore does not recognize the existing disks.
Most of the systems tested therefore regularly publish new versions with updated kernels. However, that is only half the battle: Current servers sometimes require special additional drivers or firmware that might not have made their way into the rescue system for licensing reasons.
In a worst-case scenario, administrators might have to build a suitable kernel themselves based on the rescue system. Rescue tools therefore must provide the option to download additional components or to distribute a modified version of the original image immediately. Finally, rescue systems need to support as many storage technologies as possible: Encrypted software RAIDs or LVM are the rule rather than the exception on servers.
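To illustrate why support for these storage layers matters, the following sketch lists the commands a rescue system typically needs in order to reach a root filesystem that sits on LVM inside an encrypted software RAID. It assumes one particular stacking order (RAID, then LUKS, then LVM); real servers may layer these differently. The device path, volume name, mapper name, and mount point are hypothetical.

```python
from typing import List


def storage_unlock_plan(luks_device: str, root_volume: str,
                        mapper_name: str = "rescue_crypt",
                        mountpoint: str = "/mnt") -> List[List[str]]:
    """Return the commands to reach a root filesystem, outermost layer first.

    Assumed stack (hypothetical): md RAID -> LUKS -> LVM -> filesystem.
    The rescue image must ship every tool named here, or the disks stay
    out of reach even though the system itself booted fine.
    """
    return [
        # Assemble all software RAID arrays found in the superblocks.
        ["mdadm", "--assemble", "--scan"],
        # Unlock the encrypted container (prompts for the passphrase).
        ["cryptsetup", "open", luks_device, mapper_name],
        # Activate every volume group that has become visible.
        ["vgchange", "-ay"],
        # Mount read-only first so the inspection cannot make things worse.
        ["mount", "-o", "ro", root_volume, mountpoint],
    ]


if __name__ == "__main__":
    for cmd in storage_unlock_plan("/dev/md0", "/dev/vg0/root"):
        print(" ".join(cmd))
```

The plan is only printed, not executed, so it can serve as a checklist: If any of the four tools is missing from a rescue image, that image cannot open this kind of storage stack.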
Knoppix – Oldie but Goldie
Knoppix [1] was the first rescue system to be tested (Figure 1). The system has been around for more than 13 years, and anyone who has visited the annual CeBIT lectures by developer Klaus Knopper should have one or two Knoppix DVDs in their collection. Interestingly, Knoppix was not initially designed to serve as a rescue system for server administrators. Instead, Knopper's intention was to familiarize inexperienced users with Linux and Debian without having to install an operating system.
In the days before Knoppix, distributions did not have the perfectly functioning graphical installation routines that are common today. Anyone who wanted to install Debian had to fight their way through a number of text dialogs. Newcomers were therefore faced with almost insurmountable obstacles.
Knoppix became successful very quickly, because the user only had to boot from the CD to get a functioning Linux system in the blink of an eye, without even touching a local hard disk or running the risk of bricking the Windows installation that often resided on the disk. Many observers are now convinced that Knoppix popularized the Live system principle.
However, Knopper was later confronted with requests to make Knoppix permanently installable on hard drives, and such installations often caused greater problems. Later, major Linux distributors (e.g., SUSE and Canonical) jumped onto the Live CD bandwagon. Anyone installing Ubuntu or SUSE today can boot from a Live system and start the installation routine for the operating system from a CD, DVD, or flash drive.