Parallelizing and memoizing Python programs with Joblib

A Library for Many Jobs

Article from ADMIN 20/2014
The Joblib Python library handles frequent problems – like parallelization, memoization, and saving and loading objects – in almost no time, giving programmers more freedom to push on with their core tasks.

In recent years, new programming concepts have enriched the computing world. Instead of processor speeds increasing, the number of processors has grown in many data centers. Parallel processing supports the handling of large amounts of data, but it often requires a delicate transition from traditional, sequential procedures to specially adapted methods. The Joblib Python library [1] saves much error-prone programming in typical procedures such as caching and parallelization.

A number of complex tasks come to mind when you think about parallel processing. Large data sets in which each entry is independent of the others are excellent candidates for simultaneous processing by many CPUs (Figure 1). Tasks like this are referred to as "embarrassingly" parallel. Where exactly the term comes from is unclear, but it does suggest that converting an algorithm to a parallelized version should not take long.

Figure 1: Problems involving input objects that can be independently and concurrently processed are referred to as "embarrassingly" parallel.

Experienced developers know, however, that any new implementation can run into problems in everyday programming practice and that you can quickly get bogged down in implementation details. The Joblib module offers an easy solution for embarrassingly parallel tasks: its Parallel class, which works with an arbitrary function that takes exactly one argument.

Parallel Decoration

To help Parallel cooperate with the function in question (I'll call it f(x)), Joblib comes with a delayed() method, which acts as a decorator. Listing 1 shows a simple example with an implementation of f(x) that returns x unchanged. The generator expression in Listing 1 iterates over the list l and passes the individual values to f(x); each list item in l results in a separate job.

Listing 1

Joblib: Embarrassingly Parallel

from joblib import Parallel, delayed

def f(x):
    return x

l = range(5)
results = Parallel(n_jobs=-1)(delayed(f)(i) for i in l)
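
Because f() simply returns its argument, the call yields the list [0, 1, 2, 3, 4] – the same elements as l, but each one computed in a separate job; Parallel returns the results in the order of the input.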

The most interesting part of the work is handled by the anonymous Parallel object, which is generated on the fly. It distributes the calls to f(x) across the computer's various CPUs or processor cores. The n_jobs argument determines how many of them it uses. By default, it is set to 1, so Parallel starts just a single process. Setting it to -1 uses all the available cores, -2 leaves one core unused, -3 leaves two unused, and so on. Alternatively, n_jobs takes a positive integer that directly defines the number of processes to use.
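
As a quick illustration (assuming, hypothetically, a machine with four cores), the following calls show how the different n_jobs values translate into process counts:

from joblib import Parallel, delayed

def f(x):
    return x

l = range(5)

Parallel(n_jobs=1)(delayed(f)(i) for i in l)   # default: a single process
Parallel(n_jobs=3)(delayed(f)(i) for i in l)   # exactly three processes
Parallel(n_jobs=-1)(delayed(f)(i) for i in l)  # all four cores
Parallel(n_jobs=-2)(delayed(f)(i) for i in l)  # all cores but one, i.e., three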

The value of n_jobs can also be higher than the number of available physical cores; the Parallel class simply starts the number of Python processes defined by n_jobs, and the operating system lets them run side by side. Incidentally, this also means that exchanging global variables between the individual jobs is impossible, because separate operating system processes do not share memory. Parallel bypasses this limitation by serializing and caching the necessary objects.
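
The following minimal sketch makes this visible, assuming the default multiprocessing-based backend: a worker increments a module-level counter, but the parent process never sees the change, because each job only modifies its own copy of the module state.

from joblib import Parallel, delayed

counter = 0  # global variable in the parent process

def increment(x):
    global counter
    counter += 1  # changes only the worker process's copy
    return x

Parallel(n_jobs=2)(delayed(increment)(i) for i in range(10))
print(counter)  # still 0 in the parent process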

The optimum number of processes depends primarily on the type of tasks to be performed. If your bottleneck is reading and writing data to the local hard disk or across the network, rather than processor power, the number of processes can be higher. As a rule of thumb, you can go for the number of available processor cores times 1.5; however, if each process fully loads a CPU permanently, you will not want to exceed the number of available physical processors.
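
In code, this rule of thumb could look like the following sketch; cpu_count() from the standard multiprocessing module reports the number of cores the operating system sees:

import multiprocessing

n_cores = multiprocessing.cpu_count()

n_jobs_io_bound = int(n_cores * 1.5)  # I/O-bound jobs: oversubscribe the cores
n_jobs_cpu_bound = n_cores            # CPU-bound jobs: one process per core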

See How They Run

Additionally, the Parallel class offers an optional verbose argument that produces regular status messages illustrating the overall progress. The messages show the number of processed and remaining jobs and, if possible, the estimated remaining and elapsed time.

The verbose option is by default set to 0; you can set it to an arbitrary positive number to increase the output frequency. Note that the higher the value of verbose, the more intermediate steps Joblib outputs. Listing 2 shows typical output.

Listing 2

Parallel with Status Reports

Parallel(n_jobs=2, verbose=5)(delayed(f)(i) for i in l)
[Parallel(n_jobs=2)]: Done    1 out of  181 | elapsed:    0.0s remaining:    4.5s
[Parallel(n_jobs=2)]: Done  198 out of 1000 | elapsed:    1.2s remaining:    4.8s
[Parallel(n_jobs=2)]: Done  399 out of 1000 | elapsed:    2.3s remaining:    3.5s
[Parallel(n_jobs=2)]: Done  600 out of 1000 | elapsed:    3.4s remaining:    2.3s
[Parallel(n_jobs=2)]: Done  801 out of 1000 | elapsed:    4.5s remaining:    1.1s
[Parallel(n_jobs=2)]: Done 1000 out of 1000 | elapsed:    5.5s finished

The exact number of interim reports varies. At the beginning of execution, it is often still unclear how many jobs are pending in total, so this number is only an approximation. If you set verbose to a value above 10, Parallel outputs the current status after each iteration. Additionally, the argument offers the option of redirecting the output: If you set verbose to a value of more than 50, Parallel writes status reports to standard output. If it is lower, Parallel uses stderr – that is, the standard error stream.
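
A short sketch (assuming a simple squaring function f) shows both cases; with a value above 50, the progress reports end up on stdout and can therefore be captured with an ordinary shell redirect:

from joblib import Parallel, delayed

def f(x):
    return x * x

l = range(1000)

# verbose=5: a handful of progress messages, written to stderr
Parallel(n_jobs=2, verbose=5)(delayed(f)(i) for i in l)

# verbose=60: one message per iteration, written to stdout
# (e.g., python script.py > progress.log captures it)
Parallel(n_jobs=2, verbose=60)(delayed(f)(i) for i in l)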

A third, optional argument that Parallel takes is pre_dispatch, which defines how many of the jobs the class should queue up for immediate processing. By default, Parallel directly loads all the list items into memory, and pre_dispatch is set to 'all'. However, if processing consumes a large amount of memory, a lower value provides an opportunity to save RAM. To do this, you can enter a positive integer.
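
The following sketch, with a hypothetical generator produce_items() as the data source, caps the number of pre-dispatched jobs at 10, so only a small part of the input needs to be held in memory at any one time:

from joblib import Parallel, delayed

def f(x):
    return x * x

def produce_items():
    # hypothetical lazy data source; items are generated on demand
    for i in range(1000000):
        yield i

results = Parallel(n_jobs=4, pre_dispatch=10)(
    delayed(f)(i) for i in produce_items())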

Convenient Multiprocessing Module

With its Parallel class, Joblib essentially provides a convenient interface to the Python multiprocessing module. It supports the same functionality, but the combination of Parallel and delayed() reduces the implementation overhead of simple parallelization tasks to a one-liner. Additionally, status output and further configuration options are available, each controlled by a single argument.
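
For comparison, a roughly equivalent version built directly on multiprocessing.Pool might look like the following sketch; it distributes the work in the same way, but without Joblib's progress reporting or pre_dispatch control:

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)          # counterpart to n_jobs=4
    results = pool.map(f, range(5))   # counterpart to the Parallel/delayed one-liner
    pool.close()
    pool.join()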
