Thread processing in Python
Fork This
Simple Introductory Code
To illustrate how to code with Pymp, the sample in Listing 3, taken from the Pymp website, begins with basic Python. To keep things simple, it is a serial code that operates on a single array. Listing 4 is the Pymp version of the same code.
Listing 3
Python Code
from __future__ import print_function

import numpy as np   # needed for np.zeros (missing from the original listing)

ex_array = np.zeros((100,), dtype='uint8')
for index in range(0, 100):
    ex_array[index] = 1
    print('Yay! {} done!'.format(index))
Listing 4
Pymp Code
from __future__ import print_function

import pymp

ex_array = pymp.shared.array((100,), dtype='uint8')
with pymp.Parallel(4) as p:
    for index in p.range(0, 100):
        ex_array[index] = 1
        # The parallel print function takes care of asynchronous output.
        p.print('Yay! {} done!'.format(index))
The first change to the serial code is creating the shared array with the pymp.shared.array method. The next change is the with pymp.Parallel(4) as p: statement, which creates the team of four forked processes. Remember that these are forked processes, not threads. The final change is replacing the range function with p.range(0, 100); according to the Pymp website, this is equivalent to using the static schedule.
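The Pymp documentation also describes a dynamic schedule and per-process identification, which can be useful beyond this simple example. The short sketch below is my own illustration based on that documentation; the p.xrange, p.thread_num, and p.num_threads names come from the Pymp README rather than from the listings in this article.

import pymp

# Shared result array, one slot per loop iteration (illustrative example).
results = pymp.shared.array((100,), dtype='float64')

with pymp.Parallel(4) as p:
    # p.xrange hands out iterations on demand (like OpenMP's dynamic schedule),
    # whereas p.range uses the fixed, static decomposition described above.
    for index in p.xrange(0, 100):
        results[index] = index * 0.5
    # Each forked process can report which member of the team it is.
    p.print('Process {} of {} finished.'.format(p.thread_num, p.num_threads))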
The approach illustrated in these code samples bypasses the GIL in favor of the operating system's fork method. From the GitHub site, "Due to the copy-on-write strategy, this causes only a minimal overhead and results in the expected semantics." Note that using the system fork operation excludes Windows, because it lacks a fork mechanism.
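In practice, copy-on-write means that ordinary Python objects modified inside the parallel block stay private to each forked process; only pymp.shared objects are visible to the parent afterward. The following minimal sketch of that behavior is my own example and assumes the p.lock context manager described in the Pymp documentation.

import pymp

normal_list = []                                   # ordinary object: each fork works on its own copy
counter = pymp.shared.array((1,), dtype='int64')   # shared memory, visible to every process

with pymp.Parallel(4) as p:
    for index in p.range(0, 100):
        normal_list.append(index)   # lands in the current process's private copy
        with p.lock:                # serialize updates to the shared counter
            counter[0] += 1

# Only the parent's own appends survive in normal_list; the children's copies
# are discarded when they exit. The shared counter, however, saw all 100 updates.
print(len(normal_list), counter[0])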
Laplace Solver Example
The next example, the common Laplace solver, is a little more detailed. The code is definitely not the most efficient – it uses loops – but I hope it illustrates how to use Pymp. For the curious, timings are included in the code. Listing 5 is the Python version, and Listing 6 shows the Pymp version of the code. Changed lines are marked with arrows (--> <--).
Listing 5
Python Laplace Solver
import numpy
from time import perf_counter

nx = 1201
ny = 1201

# Solution and previous solution arrays
sol = numpy.zeros((nx,ny))
soln = sol.copy()

for j in range(0, ny-1):
    sol[0,j] = 10.0
    sol[nx-1,j] = 1.0
# end for

for i in range(0, nx-1):
    sol[i,0] = 0.0
    sol[i,ny-1] = 0.0
# end for

# Iterate
start_time = perf_counter()
for kloop in range(1, 100):
    soln = sol.copy()

    for i in range(1, nx-1):
        for j in range(1, ny-1):
            sol[i,j] = 0.25 * (soln[i,j-1] + soln[i,j+1] + soln[i-1,j] + soln[i+1,j])
        # end j for loop
    # end i for loop
# end for
end_time = perf_counter()

print(' ')
print('Elapsed wall clock time = %g seconds.' % (end_time-start_time))
print(' ')
Listing 6
Pymp Laplace Solver
--> import pymp <--
from time import perf_counter

nx = 1201
ny = 1201

# Solution and previous solution arrays
--> sol = pymp.shared.array((nx,ny)) <--
--> soln = pymp.shared.array((nx,ny)) <--

for j in range(0, ny-1):
    sol[0,j] = 10.0
    sol[nx-1,j] = 1.0
# end for

for i in range(0, nx-1):
    sol[i,0] = 0.0
    sol[i,ny-1] = 0.0
# end for

# Iterate
start_time = perf_counter()
--> with pymp.Parallel(6) as p: <--
    for kloop in range(1, 100):
        soln = sol.copy()

        for i in p.range(1, nx-1):
            for j in p.range(1, ny-1):
                sol[i,j] = 0.25 * (soln[i,j-1] + soln[i,j+1] + soln[i-1,j] + soln[i+1,j])
            # end j for loop
        # end i for loop
    # end kloop for loop
# end with
end_time = perf_counter()

print(' ')
print('Elapsed wall clock time = %g seconds.' % (end_time-start_time))
print(' ')
To show that Pymp is actually doing what it is supposed to do, Table 1 shows the timings for various numbers of cores. Notice that the total time decreases as the number of cores increases, as expected.
Table 1
Pymp Timings
Number of Cores | Total Time (sec)
Base (serial)   | 165
1               | 94
2               | 42
4               | 10.9
6               | 5
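To reproduce a scan like the one in Table 1 without editing the source for each run, the process count in Listing 6 can be turned into a parameter. The fragment below is a hypothetical sketch; the sys.argv handling and the laplace_pymp.py name are my own additions.

import sys
import pymp

# Take the number of forked processes from the command line, for example:
#   python laplace_pymp.py 6
num_procs = int(sys.argv[1]) if len(sys.argv) > 1 else 4

with pymp.Parallel(num_procs) as p:
    # ... body of the parallel region from Listing 6 goes here ...
    pass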
Summary
Ever since Python was created, people have been trying to achieve multithreaded computation with it. Several tools have been created that perform computations outside of Python and integrate the results back into it.
Over time, parallel programming approaches have become standards. OpenMP is one of the original standards and is very popular in C/C++ and Fortran programming, so a large number of developers know and use it in application development. However, OpenMP is used with compiled, not interpreted, languages.
Fortunately, the innovative Python Pymp tool was created. It is an OpenMP-like Python module that uses the fork mechanism of the operating system instead of threads to achieve parallelism. As illustrated in the examples in this article, it's not too difficult to port some applications to Pymp.
Infos
- Cython: https://cython.org/
- ctypes: https://docs.python.org/3/library/ctypes.html
- F2PY: https://docs.scipy.org/doc/numpy/f2py/
- Pymp on GitHub: https://github.com/classner/pymp
- OpenMP: https://computing.llnl.gov/tutorials/openMP/
- pickle: https://rushter.com/blog/pickle-serialization-internals/
- multiprocessing module: https://docs.python.org/2/library/multiprocessing.html
- Anaconda: https://www.anaconda.com/distribution/
- pip: https://pypi.org/project/pip/