Parallelizing and memorizing Python programs with Joblib
A Library for Many Jobs
Memory Mapping
Under normal circumstances, mmap_mode='r+' is recommended for enabling memory mapping. This value opens an existing file for both reading and writing, so changes to the mapped data are written back to disk. In the other modes, Memory either only reads from the existing file ('r') or creates the file anew and overwrites any existing data ('w+'). The 'c' (copy-on-write) mode tells Memory to treat the file on disk as immutable, as with 'r', but it keeps new assignments in memory.
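The copy-on-write behavior can be sketched as follows. This is a minimal illustration rather than code from the article: the temporary cache directory and the make_array() function are assumptions chosen for the example.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical cache directory; any writable path would do.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, mmap_mode='c', verbose=0)

@memory.cache
def make_array(n):
    """Stand-in for an expensive computation that returns a large array."""
    return np.arange(n, dtype=float)

result = make_array(1000)   # computed and written to the cache
again = make_array(1000)    # loaded from the cache as a memory map

# Copy-on-write: the assignment only changes the data in memory;
# the cached file on disk stays untouched.
again[0] = -1.0
fresh = make_array(1000)    # a new mapping of the unchanged file
```

With mmap_mode='r', the same assignment would fail, because the mapped array is read-only.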
If you need to save disk space rather than time, you can initialize the Memory object with the argument compress=True. This option tells Memory to compress results when saving them to disk; however, it rules out memory mapping.
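A short sketch of a compressed cache might look like the following. The temporary directory and the zeros() function are assumptions made for illustration only.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical cache directory; results are compressed before they are stored.
memory = Memory(tempfile.mkdtemp(), compress=True, verbose=0)

@memory.cache
def zeros(n):
    """Stand-in for a costly function with a large, compressible result."""
    return np.zeros(n)

first = zeros(100000)    # computed, compressed, and stored
second = zeros(100000)   # decompressed from the cache

# Compressed results come back as ordinary arrays, not as memory maps.
is_memmap = isinstance(second, np.memmap)
```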
The Memory class also lets you control status messages. Its verbose constructor argument defaults to 1, which means that a function wrapped with cache() outputs a status message every time it is called and the result has to be computed from scratch. If you set verbose=0, the potentially very numerous status reports are suppressed. Raising the value above the default tells Memory to report on each call of the function, whether the result was loaded from a file or recomputed.
Finally, cache() accepts the ignore parameter, a list of function arguments that it disregards during memoization. This functionality is useful if individual function arguments only affect the screen output but not the function result. Listing 4 shows the f(x) function with the additional verbose argument, whose value is irrelevant for the return value of the function.
Listing 4
Ignoring Individual Arguments
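from joblib import Memory

# Example cache directory (assumed); Memory needs a location to store results
memory = Memory('cache_dir')

@memory.cache(ignore=['verbose'])
def f(x, verbose=0):
    # verbose only controls the screen output, so it is left out of the cache key
    if verbose > 0:
        print('Running f(x).')
    return x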
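To see the effect of ignore, the following sketch repeats the function from Listing 4 and adds a calls list that records every real execution. The list and the temporary cache directory are additions for illustration and not part of the original listing.

```python
import tempfile

from joblib import Memory

# Hypothetical cache directory for this demonstration.
memory = Memory(tempfile.mkdtemp(), verbose=0)
calls = []  # records every real execution of f

@memory.cache(ignore=['verbose'])
def f(x, verbose=0):
    calls.append(x)
    if verbose > 0:
        print('Running f(x).')
    return x

first = f(1, verbose=1)    # executed; the result is cached for x=1
second = f(1, verbose=0)   # same x, different verbose: served from the cache
third = f(2, verbose=0)    # new x: executed again
```

After these three calls, calls contains only the two values of x for which the function body actually ran.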
On Disk
Joblib also provides two functions for saving and loading Python objects: joblib.dump() and joblib.load(). The Memory class uses these functions internally, but they also work independently of it and replace the Python pickle module's mechanisms for serializing objects with what are often more efficient methods. In particular, Joblib stores large NumPy arrays quickly and in a space-saving way.
The joblib.dump() function accepts any Python object and a file name as arguments. Without other parameters, the object ends up in the specified file. Calling joblib.load() with the same file name then restores this object:
import joblib

x = ...
joblib.dump(x, 'file')
...
x = joblib.load('file')
Like Memory, dump() also supports the optional compress parameter. This parameter is a number from 0 to 9, indicating the compression level: 0 means no compression at all; 9 uses the least disk space but also takes the most time. In combination with compress, the cache_size argument also determines how much memory Joblib uses to compress data quickly before writing to disk. The specified value describes the size in megabytes, but that is merely an estimate that Joblib exceeds if needed, such as when handling very large NumPy arrays. Note that more recent Joblib releases have deprecated cache_size, where it no longer has any effect.
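The trade-off between the compression levels can be checked with a small experiment. The following sketch assumes a highly compressible sample array and made-up file names in a temporary directory.

```python
import os
import tempfile

import joblib
import numpy as np

data = np.zeros((1000, 1000))   # sample data that compresses very well
tmp = tempfile.mkdtemp()        # hypothetical scratch directory
raw_path = os.path.join(tmp, 'raw.joblib')
packed_path = os.path.join(tmp, 'packed.joblib')

joblib.dump(data, raw_path)                  # default: no compression
joblib.dump(data, packed_path, compress=9)   # strongest compression

raw_size = os.path.getsize(raw_path)
packed_size = os.path.getsize(packed_path)

# Loading works the same way for both files.
restored = joblib.load(packed_path)
```

For real data, the gain depends on how regular the values are; random numbers, for example, shrink far less than this array of zeros.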
The counterpart to dump(), load(), can also optionally use memory mapping, like Memory. The mmap_mode argument enables this with the same possible values as for Memory: 'r+', 'r', 'w+', and 'c' stand for reading and writing, read-only access, overwriting, and copy-on-write (read-only on disk, with changes kept in memory), respectively.
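As a minimal sketch of memory mapping with load(), the following example stores an uncompressed array and maps it in read-only mode. The file name and the sample data are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

data = np.arange(10, dtype=float)
path = os.path.join(tempfile.mkdtemp(), 'array.joblib')  # hypothetical file name

# Memory mapping requires an uncompressed file, which is the default for dump().
joblib.dump(data, path)

# The array is mapped from disk instead of being read into memory completely.
mapped = joblib.load(path, mmap_mode='r')

# In read-only mode, NumPy rejects any attempt to modify the data.
try:
    mapped[0] = 1.0
    write_failed = False
except ValueError:
    write_failed = True
```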
Prestigious Helper
The value of the Joblib library is hard to overstate. It solves some common tasks in a flash with an intuitive interface. The problems it addresses – simple parallelization, memoization, and saving and loading objects – are those programmers often encounter in practice. What you find here is a convenient solution that gives you more time to devote to genuine problems.
Joblib is included in most distributions and can otherwise easily be installed with the Python package management tools Easy Install and Pip, using easy_install joblib or pip install joblib. This process is quick because, besides Python itself, Joblib does not require any other packages.
Infos
- Joblib for Python: http://pythonhosted.org/joblib/
- Caching in RAM with Python: http://code.activestate.com/recipes/52201/
- Python NumPy library: http://www.numpy.org/