Image © Maksim Kabakou, 123RF.com

The Fine Art of Troubleshooting

Welcome

Article from ADMIN 53/2019

By Ken Hess

System troubleshooting is an art. It is a science. And, sometimes, it's brute force.

Junior system administrators have often asked, "How do you troubleshoot a problem when you have no clue where to start?" My answer has never changed: Start with the simple things first. This advice has helped me resolve every problem I've ever encountered over the past 20 years. Sure, some problems are difficult to solve, and some even seem impossible, but if you start with the simple things first, your chances of success are very high.

People in general tend to complicate problems and solutions. They reach for the least probable cause of a problem and then apply the least likely solution to resolve it. I guess it's just human nature to assume that there is no easy problem or easy solution. I have found just the opposite. Most of the problems that I've seen have a reasonable cause and a relatively simple solution. I've been on many root cause analysis and postmortem calls where I said, "I rebooted the system and everything came back as it should." Of course, I always had to explain why that resolution was the correct one, and it was usually met with unhealthy skepticism and much criticism.

I can't count the number of times I heard, "Well, rebooting fixed the issue temporarily, but you didn't really resolve the problem or apply a permanent fix to it." My task was to restore service, not to spend days or weeks researching a memory leak in an application. A reboot fixed the problem. Subsequent reboots will continue to resolve the problem. Until the developers fix the application, rebooting is the correct response to the problem.

System administrators, especially junior admins, love to see long uptimes for systems. It is impressive to see a system that has an uptime of 500+ days. Everyone loves the bragging rights that come with long uptimes. I once worked on a system that had an uptime of more than 1,300 days – a Sun Enterprise 450

...
