Confessions of a Patchaholic
Scan. Patch. Reboot. Scan. Repatch. Reboot. Lather. Rinse. Repeat. Patching is one of the many thankless and joyless duties of a system administrator. Everyone expects it to be done, but no one in the Monday morning staff meeting stops to say, “I’d like to take a moment and say ‘Thanks’ to our SAs for their efforts in the scheduled patch session that they performed last night in the Test and Development environment.” It just doesn’t happen that way.
The more likely scenario is that the designated team leader sends out an email blast at 4:00 a.m. to notify the distribution list that the latest patch session went well and that any remaining unpatched systems will receive their patches manually tonight.
Patching is a necessary evil, and if you have a good system for applying patches, patch night goes well most of the time. There are always a few stubborn systems that don’t come back up correctly or completely after a reboot, and there are those that won’t accept a particular patch. Some systems have to be coddled through patch night like a newborn baby during its periodic feedings, and some systems are needier than others. You get used to it.
It’s a Numbers Game
If you have a few systems, you can patch manually or use the operating system’s built-in tools for patch management. On Linux systems, you can use Debian’s apt-get, Red Hat’s yum, or SUSE’s YaST. Windows systems have Windows Update, and Apple has its own Mac OS software update service. Commercial Unix systems generally receive a quarterly patch bundle or individual security updates by manual download and install.
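For a handful of Linux machines, that can be as simple as a short wrapper around whichever stock tool the host provides. The Python sketch below is only an illustration, assuming the apt-get, yum, or zypper (SUSE’s command-line counterpart to YaST) commands shown; the exact flags are a choice, not a requirement.

#!/usr/bin/env python3
# Minimal sketch: run whichever built-in patch tool this Linux host provides.
import shutil
import subprocess

# The stock tools mentioned above, each with a (refresh, apply) command pair.
TOOLS = {
    "apt-get": (["apt-get", "update"], ["apt-get", "-y", "upgrade"]),
    "yum":     (["yum", "makecache"], ["yum", "-y", "update"]),
    "zypper":  (["zypper", "refresh"], ["zypper", "--non-interactive", "update"]),
}

def patch_local_host(dry_run=True):
    """Use the first package manager found on this host; return its name."""
    for tool, (refresh, apply_cmd) in TOOLS.items():
        if shutil.which(tool):
            subprocess.run(refresh, check=True)          # refresh package metadata
            if dry_run:
                print("Would run:", " ".join(apply_cmd))
            else:
                subprocess.run(apply_cmd, check=True)    # apply the patches
            return tool
    raise RuntimeError("no supported package manager found")

if __name__ == "__main__":
    print("Patched via", patch_local_host(dry_run=True))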
But for more than two dozen or so systems, patching becomes a daunting task, and certainly one for which you need a good plan. A good plan includes a patch schedule, a manual patch option, and a pre-patch test environment. The pre-patch test environment allows you to apply patches to systems that are similar to the production systems but are used for test or development purposes. If these systems fail, you don’t take a production system out, and you can decide to postpone or remove an errant patch.
A great plan also builds in third-party management software and automation. Automation does not mean allowing your systems to update themselves automatically from their vendor sites or repositories; it means applying patches to groups of systems with a patch management suite. Automated patching will usually net a success rate greater than 90 percent and often approaches 100 percent. Chances are very good, however, that you will never achieve 100 percent patch compliance with an automated system.
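To make the idea concrete, here is a rough Python sketch of a group-based patch run with a simple compliance report. The host names, group layout, remote command, and key-based SSH setup are all invented for illustration; a real patch management suite adds scheduling, retries, reporting, and reboot handling on top of this.

#!/usr/bin/env python3
# Rough sketch: patch a group of hosts over SSH and report compliance.
import subprocess

# Patch groups: in practice these come from your inventory or CMDB.
PATCH_GROUPS = {
    "test":       ["test-web01", "test-db01"],
    "production": ["prod-web01", "prod-web02", "prod-db01"],
}

# Assumed Debian-style hosts reachable with key-based SSH.
REMOTE_PATCH_CMD = "sudo apt-get update && sudo apt-get -y upgrade"

def patch_group(group):
    """Patch every host in the group; return {host: succeeded} results."""
    results = {}
    for host in PATCH_GROUPS[group]:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, REMOTE_PATCH_CMD],
            capture_output=True, text=True, timeout=1800,   # 30-minute guard per host
        )
        results[host] = (proc.returncode == 0)
    return results

if __name__ == "__main__":
    results = patch_group("test")
    ok = sum(results.values())
    print(f"Compliance: {ok}/{len(results)} hosts ({100 * ok / len(results):.0f}%)")
    for host, success in results.items():
        if not success:
            print(f"  {host}: add to tonight's manual patch list")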
Automated but Not Automatic
Automatic patching sounds good to some executives, but the loss of control and lack of tracking make it a very bad idea. You can’t allow your systems to update on their own because you’ll have no change control, and you won’t have time to research any unstable patches. Nor will you be able to schedule outages for the inevitable reboots that are part of the process. Automatic updating is OK until someone makes you pay a penalty for missing a service-level agreement (SLA).
Automatic updates take away control on two levels: change control and information control. Change control is the formal procedure associated with making changes to any computing environment. These changes comply with SLAs, federal compliance acts, and best practices for maintaining a stable computing infrastructure. Change control generally consists of a change proposal (description of activities) that includes a backout plan, a request for comments, a peer review, and approvals from all involved parties (SAs, network, management, project management).
Yes, it’s a painful and lengthy but necessary process that not only provides a documented history of changes to systems but also offers a veil of protection against client backlash over unauthorized changes.
As a system administrator, you need to know what’s going on in your support area. Change control helps you maintain some sanity in the environment. That documentation gives you the information you need to track when changes occurred relative to any problems that you encounter. For example, if your logs show that your web service began dumping core two weeks ago, you can go back into the change logs and see which patches were applied to the errant system just before the core dumps started.
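A minimal sketch of that lookup, assuming a change log you can read programmatically; the record format and dates below are invented for illustration:

#!/usr/bin/env python3
# Sketch: list changes made to a host just before a problem started.
from datetime import date

# Illustrative change records: (date applied, host, description).
CHANGE_LOG = [
    (date(2011, 5, 3),  "web01", "kernel security patch"),
    (date(2011, 5, 17), "web01", "httpd update"),
    (date(2011, 6, 1),  "db01",  "quarterly patch bundle"),
]

def changes_before(host, onset, lookback_days=14):
    """Return changes to `host` in the window just before the problem onset."""
    return [
        (applied, desc)
        for applied, h, desc in CHANGE_LOG
        if h == host and 0 <= (onset - applied).days <= lookback_days
    ]

if __name__ == "__main__":
    # Core dumps started on May 20; what changed on web01 just before that?
    for applied, desc in changes_before("web01", date(2011, 5, 20)):
        print(applied, desc)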
Without that documentation, you’ll spend hours troubleshooting and floundering about for an answer.