It’s the Top; It’s iotop

When I consider which part of a slow system might be the bottleneck, the first thought that springs to mind is the main processor, or CPU (central processing unit). Adding some RAM to a slow machine is often worthwhile too, but a slowdown can equally occur because the system's hard drives can’t cope with the level of requests they’re receiving. This especially applies to servers that are hosted in a distant data center, where you can’t put your ear to the chassis to hear the drives hammering a steady beat, letting you know they are in constant use, or see the drive light flashing away alarmingly.

Where Were We?

You have almost certainly come across top for checking CPU usage in the past, and many will have used free to see whether the system is having to write to swap because RAM is running low. But how on earth would you begin to check what your disk drives are doing? And, more importantly, what is causing them to be so busy?

There has long been a way to follow disk activity on Linux systems, namely a tool (bundled with many Linux flavors) called vmstat. Running vmstat without any command-line parameters produces the following output:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   1360  27788 125860 269804    0    0   390    43  308  572  8  3 84  5

In this case, it’s the io section you’re interested in. I/O (input/output) is usually defined as communication between a computer and the outside world. Within a single server, however, you might consider the CPU the central brain, and I/O in this scenario refers to the data passing back and forth between it and, in most cases, a hard disk. At least that’s what I’ll be referring to when I discuss I/O in this article, so purists please be aware.

Just before I completely dismiss vmstat as a somewhat antiquated tool, let me remind you of its functionality. Not only is it fantastically small and fast, but it’s widely available and usually present without your having to install any packages. It offers near-instant reporting on your RAM and swap usage and your CPU’s status, and although the bi and bo columns aren’t what you might call verbose, the -D switch lets you glean some useful information that is not easily found elsewhere without much more effort.

Here the vmstat -D output offers the following disk activity summary statistics (trimmed for clarity):

            6 partitions
       130494 total reads
      1796014 merged reads
     15370134 read sectors
      1467900 milli reading
       211902 writes
      1267923 merged writes
     11997432 written sectors
     56956480 milli writing
          899 milli spent IO
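
As a rough illustration of what these counters can tell you, the following sketch divides the time spent reading and writing by the number of completed requests to give an average time per request. It assumes the labels shown above (some versions might print total writes rather than writes), and vmstat_avg_ms is simply a name I've made up for this example.

```shell
# Read "vmstat -D" output on stdin and print the average milliseconds
# per completed read and write request.
# Assumption: each line looks like "<count> <label>", as in the output above.
vmstat_avg_ms() {
    awk '
        # Index every count by its label, e.g. v["total reads"] = 130494
        { n = $1 + 0; $1 = ""; sub(/^ +/, ""); v[$0] = n }
        END {
            reads  = v["total reads"]
            # Accept either label for completed writes (hedged: versions differ)
            writes = ("total writes" in v) ? v["total writes"] : v["writes"]
            if (reads > 0)
                printf "read %.2f ms\n", v["milli reading"] / reads
            if (writes > 0)
                printf "write %.2f ms\n", v["milli writing"] / writes
        }'
}
```

You would run it as vmstat -D | vmstat_avg_ms. With the figures above, it works out to roughly 11.25ms per read and 268.79ms per write, which already hints that writes are where this machine spends its time waiting.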

I/O You Say?

In good old computing terms, disk drives are sometimes referred to as block devices, and this is exactly what vmstat above is reporting in the bi and bo columns. They simply refer to blocks in and blocks out. In other words, they are a measure of how much activity there was in the last period (when run without arguments, vmstat reports averages since the last boot). If you instead enter vmstat 1, you’ll get a second-by-second readout of the activity, updating automatically, which is invaluable in many scenarios.
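
If all you want from that readout is the two I/O columns, a small filter can pick them out. The following is a minimal sketch, assuming the header row starts with r as in the output shown earlier; vmstat_io is a hypothetical helper name, not part of vmstat.

```shell
# Print "bi bo" for each sample line of vmstat output read on stdin.
# Column positions are taken from the header row rather than hard-coded,
# because the layout can differ between versions.
vmstat_io() {
    awk '
        # Header row: remember which column each name lives in
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Sample rows start with a number; print their bi and bo fields
        ("bi" in col) && $1 ~ /^[0-9]+$/ { print $(col["bi"]), $(col["bo"]) }
    '
}
```

For example, vmstat 1 5 | vmstat_io prints five pairs of blocks in and blocks out, one per second.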

It’s Not Enough

Of course, what would be ideal is if you could tell exactly what was using the I/O. That would certainly help you find your system bottleneck. The CPU looks fine and, according to free -m, the RAM looks fine, so surely it must be I/O.

Step forward iotop (http://guichaz.free.fr/iotop/). A clever little piece of console-based software written in Python, iotop interfaces with the kernel to provide per-process I/O usage statistics. On a modern stock kernel that you haven't compiled yourself (Debian or Ubuntu, for example), you might find all of the prerequisites already enabled. If not, you might want to look (e.g., on a Gentoo system) at enabling the following kernel parameters (they might be named slightly differently depending on version and flavor):

  • Export task/process statistics through netlink
  • Enable extended accounting over taskstats
  • Enable per-task storage I/O accounting

If you are used to rummaging around your custom kernel configuration, then according to the man page: “At least the CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING, CONFIG_TASKSTATS, and CONFIG_VM_EVENT_COUNTERS options need to be enabled in your Linux kernel build configuration.”

If you fail to enable these parameters, when you run iotop, you might well be faced with one error or another, such as this error at the foot of the screen:

CONFIG_TASK_DELAY_ACCT not enabled in kernel, cannot determine SWAPIN and IO %
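
To check for the four options named in the man page before you run into that error, you could grep a saved kernel configuration file. The following is a minimal sketch: check_iotop_kconfig is a name I've invented, and where the configuration file lives is an assumption that varies by distribution.

```shell
# Report whether each kernel option named in the iotop man page is set
# to "y" in the kernel configuration file passed as $1.
check_iotop_kconfig() {
    for opt in CONFIG_TASK_DELAY_ACCT CONFIG_TASK_IO_ACCOUNTING \
               CONFIG_TASKSTATS CONFIG_VM_EVENT_COUNTERS; do
        if grep -q "^${opt}=y" "$1"; then
            echo "$opt enabled"
        else
            echo "$opt MISSING"
        fi
    done
}
```

On Debian or Ubuntu, the configuration is usually saved under /boot, so check_iotop_kconfig "/boot/config-$(uname -r)" would be a reasonable first attempt; other systems might keep it elsewhere or not at all.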

What Does It Do?

I’ll start this brief journey with iotop by looking first at using it without parameters. Just running

# iotop

presents a top-style interface that anyone who has ever used top will immediately recognize. It lays out the rapidly updating information in set columns and shows the I/O bandwidth for both reads and writes by process or thread. Usefully, it also displays the percentage of time each process spent swapping in during the last period displayed. If you’re used to top, look out for the additional PRIO column, which shows each task's I/O priority.

Total DISK READ: 687.61 K/s | Total DISK WRITE: 0.00 B/s

 TID  PRIO  USER    DISK READ  DISK WRITE  SWAPIN      IO>  COMMAND
 281  be/0  root   186.53 K/s    0.00 B/s  0.00 %  63.66 %  [loop0]
 274  be/4  root   369.41 K/s    0.00 B/s  0.00 %  63.62 %  mount.ntfs /dev/disk/by-uuid/A001D2F510B2CFE1 /root
1536  be/4  chris    0.00 B/s    0.00 B/s  0.00 %   0.00 %  indicator-applet-session --oa~Applet_Factory --oaf-ior-fd=43

The upper totals show what the drive on my laptop is doing – mostly reads and no writing – at the moment I copied iotop’s output.

You might raise an eyebrow if you notice my laptop is using a loop device, but fret not; it’s a long story. The DISK READ column shows which processes together make up the Total DISK READ figure at the top of the output.

Applied Science

On a busy Web server, you would most likely see Apache processes filling the top few places. Something along the lines of:

TID    PRIO  USER     DISK READ  DISK WRITE IO>    COMMAND
18981 be/4   www-data 0.00 B/s       3.92 K/s        apache2 -k start

Or, if you were trying to see which system user was utilizing all of your resources, and the CPU and RAM inspections showed nothing, then you might stumble across output like this:

8521 naughtyuser 17.21 K/s 8.31 M/s 0.00 % 0.00 % dd if=/dev/urandom of=/home/naughtyuser/destination_file

I hope by now you are beginning to see how useful and undeniably powerful iotop is. I always think of it as the missing jigsaw piece when you’re scratching your head in puzzlement at why a system is misbehaving.

Hitting r while iotop is running reverses the sort order, so you see those processes not using up I/O resources first. This might be useful occasionally, and you can flip the display order back by hitting r again.

If you feel a little overwhelmed, you can either start up the utility with

# iotop --only

or you can opt to hit the o key while it’s running to see just those processes or threads using I/O at that time.

You can also hide the individual threads and show only processes by using -P:

# iotop -P

For the purposes of adding a degree of automation, you can use batch mode, which doesn’t involve human interaction. You can write two iterations of iotop to a file like this:

# iotop --batch -n 2 > filename
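
Once the output is in a file, you can filter it for the heavy hitters. The following is a minimal sketch that assumes the default column layout shown earlier (TID, PRIO, USER, DISK READ, DISK WRITE, SWAPIN, IO>, COMMAND); iotop_busy is a hypothetical helper name, and the 50 percent default threshold is an arbitrary choice of mine.

```shell
# Read iotop batch output on stdin and print TID, user, IO percentage, and
# command for every task whose IO> value is at or above a threshold.
#   $1 = minimum IO percentage (optional, defaults to 50)
iotop_busy() {
    awk -v min="${1:-50}" '
        # Task rows start with a numeric TID; field 10 is the IO> value
        # and field 11 is its "%" sign in the layout assumed above.
        $1 ~ /^[0-9]+$/ && $11 == "%" && $10 + 0 >= min + 0 {
            cmd = $12; for (i = 13; i <= NF; i++) cmd = cmd " " $i
            print $1, $3, $10 "%", cmd
        }'
}
```

For example, iotop_busy 60 < filename lists only the tasks that spent at least 60 percent of the sampling period waiting on I/O. Options that change the layout would need the field numbers adjusting.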

It’s Nice To Be Nice

On a very busy server, you might look back to the PRIO column again. If you felt a process or service was out of control, you could use the ionice command to adjust its priority. In the same way the long-standing nice can lower or raise a process’s priority (CPU mostly, I surmise), ionice can adjust its disk access priority based on the type of class and priority it uses.

According to the ionice man page, this little piece of software gets the “scheduling class and priority for a program.” It goes on to say that a program running with Idle I/O priority will only get disk time when no other program is using disk resources, whereas Best Effort is the class for any program not specifically given a defined priority; it is served at a priority ranking from 0 to 7, where a lower number means a higher priority. If there is more than one Best Effort process at the same priority, they are allocated disk resources in a round-robin fashion. Finally, Real Time scheduling is the highest priority and must be used cautiously because it can degrade the system by starving other processes of disk access. Real Time must be run with superuser privileges, as you might expect.

You could adjust a process to become an Idle I/O process as follows. In this case, I’m targeting Process ID (PID) 4012:

# ionice -c 3 -p 4012

To check the current class and priority of a PID, simply use -p with the PID and nothing else (e.g., ionice -p 4012).

If Apache isn’t fast enough, you can also give ionice a command to launch rather than a PID, in this case starting it as a Best Effort program with the top priority available to it:

# ionice -c 2 -n 0 apache2
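
To keep the class numbers and priority ranges straight, you could wrap them in a small helper that only prints the command it would run. The following is a minimal sketch: ionice_cmd is a name I've made up, and the default level of 4 is my own choice, mirroring the be/4 seen in the iotop output.

```shell
# Print (do not run) an ionice command line for a PID after checking the
# class name and priority level.
#   $1 = class: idle | best-effort | realtime
#   $2 = PID
#   $3 = level 0-7 (optional, not used for idle)
ionice_cmd() {
    case "$1" in
        idle)        echo "ionice -c 3 -p $2"; return 0 ;;
        best-effort) class=2 ;;
        realtime)    class=1 ;;
        *)           echo "unknown class: $1" >&2; return 1 ;;
    esac
    level="${3:-4}"
    case "$level" in
        [0-7]) echo "ionice -c $class -n $level -p $2" ;;
        *)     echo "level must be 0-7" >&2; return 1 ;;
    esac
}
```

For example, ionice_cmd best-effort 4012 0 prints ionice -c 2 -n 0 -p 4012, which you can inspect before running it yourself. For processes that are already running, ionice works on PIDs, so one way to cover every process with a given name is a loop such as for pid in $(pgrep -x apache2); do ionice -c 2 -n 0 -p "$pid"; done.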

Back to iotop

There are also some useful interactive commands in iotop. You can use the left and right arrow keys to change the sorting column, press p to toggle the --processes option, press a to toggle the --accumulated option, and press any other key to force a refresh.

If the flood of information is just too much to bear, you can choose among three --quiet levels, which apply to batch mode. With -q, column names are only printed on the first iteration; with -qq, column names are never printed; and with -qqq, the I/O summary is never printed either.

Those familiar with top will know that pressing the s key lets you change the delay between screen refreshes. In iotop, the equivalent sampling period is set on the command line with

# iotop -d 3

where the number is seconds between each refresh.

When troubleshooting specific issues with a particular user there’s little more useful than just highlighting their disk access alone, and this can be achieved by specifying the user as follows:

# iotop -u chrisbinnie

The End

For any seasoned admin, there’s little to no chance you haven’t faced disk bottleneck issues in the past. Beginners might look to CPU and RAM, and hardened veterans might try to decipher the unhelpful statistics produced by vmstat, but for those in the know, iotop wins hands down. I hope this brief insight means you will be using it next time you have to deal with performance issues. It’s a worthy addition to any admin’s toolkit, I’m sure you’ll agree.
