Lead Image © Christos Georghiou, 123RF.com

Thread processing in Python

Fork This

Article from ADMIN 57/2020
By
Pymp uses OpenMP fork techniques to perform thread processing in Python.

Ever since Python was created, users have been looking for ways to achieve multiprocessing with threads, which the Python global interpreter lock (GIL) prevents. One common approach to getting around the GIL is to run computationally intensive code outside of Python with tools such as Cython [1] and ctypes [2]. You can even use F2PY [3] with compiled C functions.

All of the previously mentioned tools bypass Python and rely on a compiled language to provide threaded multiprocessing elements with an interface to Python. What is really needed is either a way to perform threaded processing or a form of multiprocessing in Python itself. A very interesting tool for this purpose is Pymp [4], a Python-based method of providing OpenMP-like functionality.

OpenMP

OpenMP [5] employs a few principles in its programming model. The first is that everything takes place in threads. The second is the fork-join model, which comprises parallel regions in which one or more threads can be used (Figure 1).

Figure 1: Illustration of the fork-join model for OpenMP.

Only a single thread (the master thread) exists before the parallel region of the OpenMP program. When the parallel region is encountered, the master thread creates a team of parallel threads. The code in this parallel region is executed in parallel among the various team threads.

When the threads complete their code in the parallel region, they synchronize and terminate, with only the master thread remaining. Inside the parallel region, threads typically share data, and all of the threads can access this shared data at the same time.

The process of forking threads in a parallel region, joining the data back to the master thread, and terminating the other threads can be done many times in a single program, although you don't want to do it too often because of the need to create and destroy the threads.
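The fork-join pattern can be sketched in plain Python with the standard threading module. This is only an illustration of the model, not OpenMP or Pymp code: the names parallel_region and work are hypothetical, and because of the GIL the threads here interleave rather than run Python bytecode simultaneously.

```python
import threading


def parallel_region(num_threads, work):
    """Fork a team of threads, run work(rank) in each, then join.

    Hypothetical helper for illustration only.
    """
    # Shared data: every thread in the team can see this list.
    results = [None] * num_threads

    def run(rank):
        # Each thread writes only to its own slot, so no lock is needed.
        results[rank] = work(rank)

    # Fork: the master creates the other team members (ranks 1..n-1).
    team = [
        threading.Thread(target=run, args=(rank,))
        for rank in range(1, num_threads)
    ]
    for thread in team:
        thread.start()

    # The master thread takes part in the region as rank 0.
    run(0)

    # Join: wait for the team to finish; only the master continues.
    for thread in team:
        thread.join()
    return results


squares = parallel_region(4, lambda rank: rank * rank)
```

Creating and joining the team on every call is the overhead the text warns about, so in practice you would put as much work as possible inside a single region.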

Pymp

Because the goal of Pymp is to bring OpenMP-like functionality to Python, Pymp and OpenMP naturally share some concepts. A single master thread forks into multiple threads that share data, then synchronize (join) and are destroyed.

As with OpenMP applications, when Pymp Python code hits a parallel region, processes (termed child processes) are forked and are in a state that is nearly the same as the "master process." Note that these are forked processes and not threads, as is typical with OpenMP applications. As for the shared memory, according to the Pymp website, "… the memory is not copied, but referenced. Only when a process writes into a part of the memory [does] it gets its own copy of the corresponding memory region. This keeps the processing overhead low (but of course not as low as for OpenMP threads)."
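The copy-on-write behavior of forked processes can be observed with the standard os.fork call on a Unix-like system. The following is a minimal sketch that uses os.fork directly rather than Pymp; it only shows that a write made by a child to ordinary (non-shared) data is not visible to the parent.

```python
import os

# Ordinary Python data, created before the fork.
data = [0, 0, 0]

pid = os.fork()
if pid == 0:
    # Child process: this write goes to the child's private copy.
    data[0] = 99
    os._exit(0)  # leave immediately without running parent code

# Parent process: wait for the child (the "join").
os.waitpid(pid, 0)

# The parent's data is unchanged by the child's write.
parent_view = list(data)
```

This is why results computed in child processes have to be passed back through shared memory or some other explicit channel.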

As the parallel region ends (the join phase), all of the child processes exit so that only the master process continues. Any data structures from the child processes are synchronized using either shared memory or a manager process and the pickle protocol [6] via the multiprocessing module [7]. This module has an API similar to the threading module and supports spawning processes.
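To illustrate the shared-memory route, the sketch below allocates an array in shared memory with RawArray from the standard multiprocessing.sharedctypes module and lets forked children write into it. It is an assumption-laden stand-in for what a library such as Pymp does internally, not its actual implementation; NUM_WORKERS and the per-rank values are made up for the example, and it requires a Unix-like system for os.fork.

```python
import os
from multiprocessing.sharedctypes import RawArray

NUM_WORKERS = 4  # hypothetical team size

# Doubles allocated in shared memory and initially zeroed.
shared = RawArray("d", NUM_WORKERS)

pids = []
for rank in range(NUM_WORKERS):
    pid = os.fork()
    if pid == 0:
        # Child: write to its own slot of the shared array.
        shared[rank] = rank * 1.5
        os._exit(0)
    pids.append(pid)

# Join: wait for every child before reading the results.
for pid in pids:
    os.waitpid(pid, 0)

results = list(shared)
```

Because each child writes to a distinct element, no lock is needed here; concurrent updates to the same element would require one.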

As with OpenMP, Pymp numbers the child processes with the thread_num variable. The master process has thread_num 0.

With OpenMP, you define a parallel region. In Fortran (Listing 1) and C (Listing 2), the regions are defined by the directives. Pymp has no way to mark the parallel region. The Pymp website recommends you use a pymp.range or pymp.xrange statement, or even an if-else statement. Doing so achieves the same expected behavior.

Listing 1

Defining a Parallel Region in Fortran

!$omp parallel
...
!$omp end parallel

Listing 2

Defining a Parallel Region in C

#pragma omp parallel
{
  ...
}

From the website, example code for making a parallel region with Pymp might look like:

import pymp

with pymp.Parallel(4) as p:
  for sec_idx in p.xrange(4):
    if sec_idx == 0:
      p.print('Section 0')
    elif sec_idx == 1:
      p.print('Section 1')
    ...

The first statement in the code outline defines the parallel construct.

As with OpenMP code, you can control various aspects of Pymp code with environment variables (e.g., with the OpenMP variables that begin with OMP), or you can use Pymp versions that begin with PYMP. The mapping is pretty straightforward:

  • PYMP_NESTED/OMP_NESTED
  • PYMP_THREAD_LIMIT/OMP_THREAD_LIMIT
  • PYMP_NUM_THREADS/OMP_NUM_THREADS

The first variable is a binary: TRUE or FALSE. The second variable can be used to set a limit on the number of threads. The third variable is a comma-separated list of the number of threads per nesting level. If only one value is specified, it is used for all levels.
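As a rough illustration of the comma-separated format, the following sketch parses a thread-count setting into one value per nesting level. The helper names threads_per_level and threads_for_level are hypothetical and not part of Pymp. Two behaviors are assumptions made for this example: checking the PYMP_ variable before the OMP_ variable, and reusing the last value for levels deeper than the list.

```python
import os


def threads_per_level(environ=os.environ, default=1):
    """Parse the thread-count setting into a list, one entry per level.

    Assumption: PYMP_NUM_THREADS is consulted before OMP_NUM_THREADS.
    """
    raw = environ.get("PYMP_NUM_THREADS") or environ.get("OMP_NUM_THREADS")
    if not raw:
        return [default]
    return [int(part) for part in raw.split(",")]


def threads_for_level(levels, depth):
    """Return the thread count for a nesting depth (0 = outermost).

    Assumption: levels beyond the end of the list reuse the last value,
    so a single value applies to every level.
    """
    return levels[depth] if depth < len(levels) else levels[-1]


levels = threads_per_level({"PYMP_NUM_THREADS": "4,2"})
```

With the setting "4,2", the outer region would get four threads and a nested region two.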

Other aspects of Pymp are explained on the website, including:

  • Schedules
  • Variable scopes
  • Nested loops
  • Exceptions
  • Conditional parallelism
  • Reductions
  • Iterables

This article is too short to cover these topics, but if you are interested, the GitHub site [4] briefly explains them, and you can create some simple examples for further exploration.

Installing Pymp

Although I use Anaconda [8], I also use pip [9] when needed. When I examined both Anaconda and pip, the latest version they both had was Pymp v0.01, although the last version released was 0.4.2 on September 7, 2018; therefore, I installed Pymp according to the instructions on the website.

After downloading and exploding the .tar.gz or .zip source file and moving to the directory created, I ran the command,

$ python setup.py develop

which builds Pymp. Fortunately, the Pymp codebase is fairly small, so the build goes very fast.

Note that the website proposes using the latest master branch from the Git repo if you want the absolute latest version of Pymp.
