System logging for data-based answers
Log Everything
Recording storage logs is very similar to recording network logs. Grabbing each data packet going to and from the storage system and the drives results in a large amount of information, most of which is useless to you. Instead, think about running simple I/O tests in a job's prologue and epilogue scripts and recording that data. Of course, the results will vary depending on the I/O load, but it's worth understanding I/O performance when the job is getting ready to run.
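As a minimal sketch of such a prologue or epilogue test, the following Python writes a scratch file, reads it back, and formats the result as a single log line. The function names, the `iotest_` file prefix, the default sizes, and the record layout are all assumptions for illustration, not something a particular resource manager prescribes.

```python
import os
import socket
import tempfile
import time
from datetime import datetime, timezone


def timed_io_test(directory, size_mb=16, block_kb=1024):
    """Write then read back a scratch file; return (write_MBps, read_MBps)."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory, prefix="iotest_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push data to storage before stopping the clock
        write_s = max(time.perf_counter() - start, 1e-9)

        # Note: this read may be served from the page cache, so treat the
        # read figure as optimistic unless caches are dropped first.
        start = time.perf_counter()
        with open(path, "rb") as handle:
            while handle.read(block_kb * 1024):
                pass
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    total_mb = blocks * block_kb / 1024
    return total_mb / write_s, total_mb / read_s


def format_record(stage, write_mbps, read_mbps, node=None, now=None):
    """Build one log line carrying a time stamp and the node name."""
    node = node or socket.gethostname()
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")
    return (f"{stamp} {node} iotest stage={stage} "
            f"write_MBps={write_mbps:.1f} read_MBps={read_mbps:.1f}")
```

A prologue script could call `timed_io_test()` on the job's working directory and pass the output of `format_record("prologue", ...)` to whatever logging path you already use; the epilogue would do the same with `"epilogue"`.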
In addition to capturing performance information, you can grab I/O performance statistics from the servers and clients. A simple example is NFS. A great tool, nfsiostat, allows you to capture statistics about NFS client and server activity. With respect to clients, you can grab information such as:
- Number of blocks read or written
- Number of reads and writes (ops/sec)
With this information, you can get a histogram of the NFS performance of both clients and servers.
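Building the histogram itself takes only a few lines once the samples are collected. The sketch below assumes you already have a list of ops/sec readings (the helper names and the bin width are hypothetical) and simply counts how many samples fall into each bin.

```python
from collections import Counter


def histogram(samples, bin_width):
    """Count samples per bin; each key is the lower edge of its bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(int(value // bin_width) * bin_width for value in samples)
    return dict(sorted(counts.items()))


def render(hist, bin_width):
    """Return a plain-text bar chart, one line per occupied bin."""
    return "\n".join(
        f"{low:>6}-{low + bin_width:<6} {'#' * count}"
        for low, count in hist.items()
    )
```

Running the same function over samples from a client and from a server gives two histograms you can compare side by side.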
In addition to nfsiostat [8], you can use iostat, which collects lots of metrics on the storage server, such as CPU time, throughput, and I/O request times. You can also use iostat [9] to monitor I/O on client nodes.
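On Linux, iostat works from the kernel's per-device counters in `/proc/diskstats`. If you would rather log those counters directly from a script, a rough sketch follows: it parses two snapshots and turns the difference into per-second rates. The function names and the choice of fields are my own assumptions, and the code is no replacement for iostat's full output.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> selected cumulative counters from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        stats[fields[2]] = {
            "reads": int(fields[3]),
            "sectors_read": int(fields[5]),
            "writes": int(fields[7]),
            "sectors_written": int(fields[9]),
        }
    return stats


def rates(before, after, interval_s):
    """Turn two snapshots taken interval_s seconds apart into per-second rates."""
    result = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue  # device appeared between snapshots
        result[dev] = {
            "read_ops_s": (new["reads"] - old["reads"]) / interval_s,
            "write_ops_s": (new["writes"] - old["writes"]) / interval_s,
            "read_kB_s": (new["sectors_read"] - old["sectors_read"])
            * SECTOR_BYTES / 1024 / interval_s,
            "write_kB_s": (new["sectors_written"] - old["sectors_written"])
            * SECTOR_BYTES / 1024 / interval_s,
        }
    return result
```

In practice you would read `/proc/diskstats` twice, sleep between the reads, and write the resulting rates to your log with a time stamp and node name.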
You are likely already using filesystem tools, so you can easily script a search of the filesystem logs for errors and collect the results. These logs are specific to a filesystem, so be sure to read the manuals to learn what is being recorded.
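A script for this can be very small. The sketch below filters log lines against a keyword pattern; the pattern is a generic placeholder I made up, so replace it with the messages your filesystem's documentation says it emits.

```python
import re

# Placeholder keywords; real filesystems use their own message formats.
ERROR_PATTERN = re.compile(
    r"\b(error|fail(ed|ure)?|corrupt(ed|ion)?)\b", re.IGNORECASE
)


def find_errors(lines, pattern=ERROR_PATTERN):
    """Return the lines that match the error pattern, without newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

Because `find_errors()` accepts any iterable of lines, you can pass it an open log file and forward the matches to your central log.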
A number of system administrators are reluctant to log much more than the minimum necessary, which is primarily whatever compliance requires. However, I'm a big believer that having too much information is better than not having enough. More logs mean more space used and probably more network traffic, but in the end, you have a set of system logs that you can use to your advantage.
To review, here are four highlights:
- Log everything (within reason).
- Put a time stamp on it.
- Put a node name on every entry.
- Be a lumberjack, and you'll be OK.

[1] Managing Up principle: https://hbr.org/2015/01/what-everyone-should-know-about-managing-up
[2] logger: https://www.serverwatch.com/tutorials/article.php/3924816/Use-Logger-to-Write-Messages-to-Log-Files.htm
[3] rsyslog: http://www.rsyslog.com
[4] Linux standard logs: https://www.cyberciti.biz/faq/linux-log-files-location-and-how-do-i-view-logs-files/
[5] mpstat: http://sebastien.godard.pagesperso-orange.fr/man_mpstat.html
[6] sysstat: http://sebastien.godard.pagesperso-orange.fr
[7] "Finding and Recording Memory Errors" by Jeff Layton, ADMIN HPC, http://www.admin-magazine.com/HPC/Articles/Memory-Errors
[8] "Monitoring Client NFS Storage with nfsiostat" by Jeff Layton, ADMIN HPC, http://www.admin-magazine.com/HPC/Articles/Monitoring-NFS-Storage-with-nfsiostat
[9] "Monitoring Storage Devices with iostat" by Jeff Layton, ADMIN HPC, http://www.admin-magazine.com/HPC/Articles/Monitoring-Storage-with-iostat