Visualizing kernel scheduling
Behind Time
SchedViz [1], one of several open source tools recently released by Google, lets you visualize how the Linux kernel schedules your programs. The tool shows exactly how your system treats the various tasks it is running, so you can fine-tune the way resources are allotted to each task.
SchedViz is designed to overcome a specific problem: The basic Linux tools available for scheduling [2] don't allow you to see very much. In practice, this means that most people guess how to schedule system resources, and given the complexity of modern systems, these guesses are often wrong.
Multiprocessing
Modern operating systems (OSs) execute multiple processes simultaneously by splitting the computing load across multiple cores and running each process for a short time before switching to a different task (multiprocessing). This feature presents a significant challenge for engineers: Where should each process run and for how long? How should you assign priority to the various tasks that your system needs to run?
A round-robin approach would assign each task processing time in such a way that each would receive equal time. In practice, however, some tasks – such as those related to the core functions of your OS – are of higher priority than others.
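To make the round-robin idea concrete, the following minimal sketch simulates it on a single core. The task names, burst times, and time quantum are invented for illustration, and the model ignores priorities and multiple cores, so it is not how the Linux scheduler actually behaves.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling on one core.

    burst_times maps a task name to the CPU time it still needs.
    Returns a list of (task, start, end) time slices in execution order.
    """
    queue = deque(burst_times.items())
    timeline = []
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)      # run for one quantum at most
        timeline.append((name, clock, clock + run))
        clock += run
        if remaining > run:                # unfinished: back of the queue
            queue.append((name, remaining - run))
    return timeline


if __name__ == "__main__":
    # Hypothetical workload: two tasks sharing one core.
    for name, start, end in round_robin({"blue": 5, "green": 3}, quantum=2):
        print(f"{name:>5}: {start} -> {end}")
```

Every task gets the same slice length regardless of how important it is, which is exactly the limitation that priority-based policies address.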
SchedViz makes use of a basic feature of the Linux kernel: the ability to capture data in real time about what each core of a multicore system is doing. The kernel is instrumented with hooks called tracepoints; when certain actions occur, any code hooked to the relevant tracepoint is called with arguments that describe the action. This data is referred to as a "trace."
SchedViz captures these "traces" and allows you to visualize them. A command-line script can capture the data over a specified time and then load it into SchedViz for as much analysis as you care to apply. You can also keep saved traces to compare any modification you make.
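To give a feel for the arguments a scheduling tracepoint carries, the sketch below parses one sched_switch event in the human-readable form printed by the kernel's ftrace interface. The sample line (task names, PIDs, timestamp) is made up, and the parser is only an illustration; SchedViz's own collection script and loader are not shown here and do not depend on this code.

```python
import re

# Matches the CPU number, timestamp, and the outgoing/incoming task
# of a text-format sched_switch event.
SWITCH_RE = re.compile(
    r"\[(?P<cpu>\d+)\].*?(?P<ts>\d+\.\d+): sched_switch: "
    r"prev_comm=(?P<prev_comm>.+?) prev_pid=(?P<prev_pid>\d+) .*?==> "
    r"next_comm=(?P<next_comm>.+?) next_pid=(?P<next_pid>\d+)"
)


def parse_sched_switch(line):
    """Return a dict describing one context switch, or None if no match."""
    match = SWITCH_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "timestamp": float(match["ts"]),
        "prev_comm": match["prev_comm"],
        "prev_pid": int(match["prev_pid"]),
        "next_comm": match["next_comm"],
        "next_pid": int(match["next_pid"]),
    }


# Hypothetical event: "green" is switched out and "blue" is switched in on CPU 2.
SAMPLE = (
    "  green-4242  [002] d..2.  1034.567890: sched_switch: "
    "prev_comm=green prev_pid=4242 prev_prio=120 prev_state=S ==> "
    "next_comm=blue next_pid=4243 next_prio=120"
)

if __name__ == "__main__":
    print(parse_sched_switch(SAMPLE))
```

A stream of such events, one per context switch and per core, is enough to reconstruct which thread occupied which core at any moment.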
A basic example of a trace loaded into SchedViz for viewing is seen in Figure 1. Two processes are running simultaneously (green and blue). The blue process, known as the "victim thread," is likely to suffer a performance lag because it has been interrupted by the green process, which has swapped in to the blue thread's core.
In practice, behavior like that in Figure 1 is likely to result in suboptimal performance. There is no obvious reason why the green process swapped cores right at the end of its processing time, but by doing so, it interrupts another thread running on a different core. If the blue process needs to run quickly, particularly if it is a critical system process, you would like to stop this kind of behavior.
SchedViz allows you to see issues like this on a pannable, zoomable graph that shows all the cores of a multicore system. A more detailed trace of a three-core system is seen in Figure 2. Although it might seem inefficient to allocate resources in this way, with each process getting a short period of time before the core swaps to another process, this is how typical core-scheduling processes work.
The SchedViz visualization tool aims to achieve a number of key goals:
- Quantify task starvation caused by round-robin queueing. In the above example, it might be that the blue process is running slowly because the yellow process is assigned the same priority. This case is known as "task starvation," and can be a significant drain on performance in complex systems.
- Identify primary antagonists stealing work from critical threads. Some processes, as seen, steal a lot of resources from others that may be more important. The "primary antagonists" are the biggest drain on the performance of many systems, and finding out which processes are acting in this way is extremely useful.
- Determine when core allocation choices yield unnecessary waiting. In other situations, a process that you would like to prioritize is made to wait while another executes. SchedViz allows you to see this happening.
- Evaluate different scheduling policies. Linux has many ways of implementing scheduling policies [3] that determine which processes will run where and for how long. If you are seeking to improve system performance by manipulating these policies, SchedViz is invaluable, because it allows you to see a visual representation of how they are being applied.
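Before evaluating a change of policy, it helps to know which policy a process is running under. As a small sketch, assuming a Linux system and Python's standard os module, the following reports the policy of a process and the static priority range that policy allows. The helper name describe_policy is invented for this example.

```python
import os

# Human-readable names for the policy constants exposed by the os module.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_policy(pid=0):
    """Return (policy name, min priority, max priority); pid 0 is the caller."""
    # Strip the reset-on-fork flag, which may be OR-ed into the policy value.
    policy = os.sched_getscheduler(pid) & ~os.SCHED_RESET_ON_FORK
    name = POLICY_NAMES.get(policy, f"unknown({policy})")
    low = os.sched_get_priority_min(policy)
    high = os.sched_get_priority_max(policy)
    return name, low, high


if __name__ == "__main__":
    name, low, high = describe_policy()
    print(f"policy={name} priority range={low}..{high}")
```

Capturing one trace before and one after a policy change gives you two visualizations to compare.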
At the moment, the primary use that most system administrators will have for SchedViz is to manage the way tasks are assigned across multicore processors. As Google put it in their blog, "not all cores are equal" [1], and that's because the structure of the memory hierarchy found in most modern systems can make it costly to shift a thread from one core to another, especially if that shift moves it to a new non-uniform memory access (NUMA) node [4]. This move is particularly a problem when it comes to handling modern encryption algorithms [5] that are becoming an integral part of working with web services and cloud storage.
Users can already pin threads explicitly to a CPU or a set of CPUs or can exclude threads from specific CPUs with the use of features like sched_setaffinity() [6] or cgroups [7], which are both available in most Linux environments. However, such restrictions can also make scheduling even tougher. SchedViz allows you to see exactly how and when these rules are being enforced, allowing you to assess their effectiveness.
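As a minimal sketch of explicit pinning, Python's os module wraps the same system call on Linux. The helper run_pinned is a hypothetical name used only for this example; it pins the calling thread to one CPU, runs a function, and then restores the original affinity mask.

```python
import os


def run_pinned(cpu, func):
    """Run func() with the calling thread pinned to one CPU, then restore."""
    original = os.sched_getaffinity(0)     # 0 means the calling thread
    if cpu not in original:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(original)}")
    os.sched_setaffinity(0, {cpu})
    try:
        return func()
    finally:
        os.sched_setaffinity(0, original)  # always undo the restriction


if __name__ == "__main__":
    first_cpu = min(os.sched_getaffinity(0))
    # While pinned, the affinity mask contains only the chosen CPU.
    print(run_pinned(first_cpu, lambda: os.sched_getaffinity(0)))
```

Capturing a trace while such a pinned workload runs is one way to check whether the restriction helps or merely moves the contention elsewhere.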
Installing SchedViz
SchedViz is hosted on GitHub [8], and the process for installing it will be familiar to most advanced users. To begin, clone the repository:
git clone https://github.com/google/schedviz.git
Next, install the dependencies. Because SchedViz has quite a few of these, it requires yarn, so head to the Yarn website [9] and follow the instructions there. You should also make sure your version of Node.js is later than 10.9.0.
Now you need the GNU building tools and an unzip utility, so install them now if you don't have them. On Debian, you can run:
sudo apt-get update && sudo apt-get install build-essential unzip
Before running SchedViz for the first time, change to the location where the repo was cloned and install with Yarn:
cd schedviz
yarn install
Once it has finished, navigate to the root of the repo folder and start up your server:
yarn bazel run server -- -- -storage_path="<Path to folder that stores traces>"
This command takes several options (Table 1).
Table 1
bazel Options
Name | Type | Description |
---|---|---|
storage_path | String | Required. The folder where you want trace data to be stored. This should be an empty folder that is not used by anything else. |
cache_size | Int | Optional. The maximum number of collections to keep open in memory at once. |
port | Int | Optional. The port to run the server on. Defaults to 7402. |
resources_root | String | Optional. The folder where the static files (e.g., HTML and JavaScript) are stored. Default is client. If using bazel to run the server, you shouldn't need to change this. |
Using SchedViz
The most basic function of SchedViz is to collect a trace of activity from a particular machine. To do this, run the command:
sudo ./trace.sh -out '<Path to trace directory>' -capture_seconds '<No. of seconds>' [-buffer_size '<Size of trace buffer>'] [-copy_timeout '<Time to wait>']
The options taken by this command all have default values if you don't want to specify them. The default number of seconds to record a trace is 5, the default buffer size is 409KB, and the default number of seconds to wait for a copy to finish is 5.
Once the command is run on the target machine [10], move the tar.gz file to a machine that can use the SchedViz user interface. To look at the data you've collected, click Upload Trace on the SchedViz collections page (Figure 3). The page lists all the traces you have loaded into SchedViz, and you can sort them by the date they were collected, by their description, or by the user who collected them.
Opening a trace from the collections menu will open a visualization similar to those shown in the figures here, with a representation of all the cores on your system and what each is doing.
What you do with this information will depend on your requirements. Google has provided a guide [1] that explains how to spot common problems in SchedViz traces, where a particular process might be causing your system performance to lag. These issues can then be addressed by looking at your scheduling policies and adapting them to the needs of your system.