Profiling Python code
In Profile
To improve the performance of your applications, you need to conduct some kind of dynamic analysis of the running code, also called profiling, to measure metrics of interest. A key metric for developers is time (i.e., where is the code spending most of its time?), because it allows you to focus on areas, or hotspots, that can be made to run faster.
And this might seem obvious, but if you don't profile before optimizing, you could flounder all over the code improving sections you think might be bottlenecks. I have seen people spend hours working on a particular part of their code when a simple profile showed that portion contributed very little to the overall run time. I admit that I have also done this; once I profiled the code, I found that I had wasted my time and needed to focus elsewhere.
Different kinds of profiling (e.g., event-based, statistical, instrumented, simulation) are used in different situations. In this article, I focus on two types: deterministic and statistical. Deterministic profiling captures every computation of the code and produces very accurate profiles, but it can greatly slow down performance. Although you achieve very good accuracy with the profile, run times are greatly increased, and you have to wonder whether the profiling adversely affected how the code ran. For example, did the profiling cause the computation bottlenecks to move to a different place in the code?
Statistical profiling, on the other hand, takes periodic "samples" of the code computations and uses them as representations of the profile of the code. This method usually has very little effect on code performance, so you can get a profile that is very close to the real execution of the code. You do have to wonder about the correct time interval to get an accurate profile of the application while not affecting the run time. Usually this means setting the time intervals to smaller and smaller values to capture the profile accurately. If the interval becomes too small, however, it almost becomes deterministic profiling, and run time is greatly increased.
If your code takes a long time to execute (e.g., hours or days), deterministic profiling might be impossible because the increase in run time is unacceptable. In this case, statistical profiling is appropriate because of the longer periods of time available to sample performance.
In this article, I focus on profiling Python code, primarily because Python profiling currently receives relatively little attention, but also because I think the process of profiling Python code, breaking it into functions, and then using Numba to compile those functions for CPUs or GPUs is a good way to help improve performance.
To help illustrate some tools you can use to profile Python code, I will use an example of an idealized molecular dynamics (MD) application. I'll work through some profiling tools and modify the code in a reasonable manner for better profiling. The first, and probably most used and flexible, method I want to mention is "manual" profiling.
Manual Profiling
The manual profiling approach is fairly simple but involves inserting timing points into your code. Timing points surround a section of code and collect the total elapsed time for the section, as well as how many times the section is executed. From this information, you can calculate an average elapsed time. The timing points can be spread throughout the code, so you get an idea of how much time each section takes. The elapsed times are printed at the end of execution to give you an idea of where you should focus your efforts to improve performance.
A key advantage of this approach is its generally low overhead. Additionally, you can control which portions of the code are timed (you don't have to profile the entire code). A downside is that you have to instrument your code by inserting timing points throughout. However, inserting these points is not difficult.
An easy way to accomplish this uses the Python time module. Simple code from an article on the Better Programming [1] website (example 16) is shown in Listing 1. The code simply calls the current time before and after a section of code of interest; the difference is the elapsed time, or the amount of time needed to execute that section of code.
Listing 1
Time to Execute
import time

start_time = time.time()

# Code to check follows
a, b = 1, 2
c = a + b
# Code to check ends

end_time = time.time()

time_taken = end_time - start_time
print("Time taken in seconds: {0} s".format(time_taken))
If a section of code is called repeatedly, just sum the elapsed times for the section and sum the number of times that section is used; then, you can compute the average time through the code section. If the number of calls is large enough, you can do some quick descriptive statistics and compute the mean, median, variance, min, max, and deviations.
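If the bookkeeping for many timing points gets tedious, a small helper class can collect the samples for you and report summary statistics at the end of the run. The following is only a minimal sketch of one way to do it (the SectionTimer class and the loop it times are illustrative, not part of the MD code); it uses time.perf_counter(), which offers better resolution than time.time(), and the standard statistics module.

import time
from statistics import mean, median, pvariance

class SectionTimer:
    """Accumulate elapsed times for a named section of code."""
    def __init__(self, name):
        self.name = name
        self.samples = []       # elapsed time of each pass through the section

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.samples.append(time.perf_counter() - self.start)

    def report(self):
        s = self.samples
        print("{}: {} calls, total {:.6f} s, mean {:.6f} s, median {:.6f} s, "
              "min {:.6f} s, max {:.6f} s, variance {:.3e}".format(
                  self.name, len(s), sum(s), mean(s), median(s),
                  min(s), max(s), pvariance(s)))

# Illustrative usage: time one section that is executed many times
force_timer = SectionTimer("force loop")
for step in range(50):
    with force_timer:
        total = sum(i * i for i in range(100000))   # stand-in for the real work
force_timer.report()

The same timer object can be reused around any section that is executed repeatedly, and calling report() at the end of the run prints one summary line per instrumented section.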
cProfile
cProfile is a deterministic profiler for Python and is recommended "… for most users." In general terms, it creates a set of statistics that lists the total time spent in certain parts of the code, as well as how often the portion of the code was called.
cProfile, as the name hints, is written in C as a Python extension and is included in the standard Python 3 distribution. The C implementation keeps the overhead low, so the profiler doesn't affect the run time much.
cProfile outputs a few stats about the test code:

- ncalls – number of calls to the portion of code.
- tottime – total time spent in the given function (excludes time spent in calls to subfunctions).
- percall – tottime divided by ncalls.
- cumtime – cumulative time spent in the specific function, including all subfunctions.
- percall – cumtime divided by ncalls.

cProfile also outputs the file name of the code, in case multiple files are involved, as well as the line number of the function (lineno).
Running cProfile is fairly simple:
$ python -m cProfile -s cumtime script.py
The first part of the command tells Python to use the cProfile module. The output from cProfile is sorted (-s) by cumtime (cumulative time), and the last option on the command line is the Python code of interest. cProfile also has an option (-o) to send the stats to an output file instead of stdout. Listing 2 shows a sample of the first few lines from cProfile on a variation of the MD code.
Listing 2
cProfile Output
Thu Nov  7 08:09:57 2019

         12791143 function calls (12788375 primitive calls) in 156.745 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    148/1    0.001    0.000  156.745  156.745 {built-in method builtins.exec}
        1  149.964  149.964  156.745  156.745 md_002.py:3(<module>)
 12724903    3.878    0.000    3.878    0.000 {built-in method builtins.min}
        1    2.649    2.649    2.727    2.727 md_002.py:10(init)
       50    0.168    0.003    0.168    0.003 md_002.py:127(update)
    175/2    0.001    0.000    0.084    0.042 <frozen importlib._bootstrap>:978(_find_and_load)
...
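Note that you don't have to drive cProfile from the command line. It can also be used from inside a script with the cProfile and pstats modules from the standard library, which is handy when you only want to profile part of a program. The sketch below shows the general idea; run_md() is a hypothetical stand-in for whatever code you want to profile.

import cProfile
import pstats

def run_md():
    # Hypothetical stand-in for the computation you want to profile
    return sum(i ** 0.5 for i in range(1000000))

profiler = cProfile.Profile()
profiler.enable()
run_md()
profiler.disable()

# Sort by cumulative time and print the 10 most expensive functions
stats = pstats.Stats(profiler)
stats.sort_stats("cumtime").print_stats(10)

# Or save the raw stats to a file for later inspection
stats.dump_stats("md_profile.out")

The dumped stats file can be reloaded later with pstats.Stats("md_profile.out") for further sorting and filtering.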
pprofile
To get a line-by-line profile of your code, you can use the pprofile tool, which provides granular, thread-aware, deterministic or statistical profiling in pure Python. The general form of the pprofile command is:
$ pprofile some_python_executable arg1 ...
After the tool finishes, it prints annotated code of each file involved in the execution.
By default, pprofile profiling is deterministic, which, although it slows down the code, produces a very complete profile. You can also use pprofile in a statistical manner, which takes much less time:
$ pprofile --statistic .01 code.py
With the --statistic option, you also need to specify the period of time between samples; in this example, a period of 0.01 seconds was used.
Be careful when using the --statistic option: if the sample time is too long, you can miss computations entirely, and the output will incorrectly record zero percent activity. Conversely, to get a better estimate of the time spent in certain portions of the code, you have to reduce the time between samples, to the point that the profiling becomes almost deterministic.
The deterministic pprofile sample output in Listing 3 uses the same code as the previous cProfile example. I cut out sections of the output because it is very extensive. I do want to point out that profiling increased the execution time by about a factor of 10 (i.e., the code ran 10 times slower than it did without profiling).
Listing 3
pprofile Output
Command line: md_002.py
Total duration: 1662.48s
File: md_002.py
File duration: 1661.74s (99.96%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
     1|         0|            0|            0|  0.00%|# md test code
     2|         0|            0|            0|  0.00%|
     3|         2|  3.50475e-05|  1.75238e-05|  0.00%|import platform
     4|         1|  2.19345e-05|  2.19345e-05|  0.00%|from time import clock
(call)|         1|  2.67029e-05|  2.67029e-05|  0.00%|# <frozen importlib._bootstrap>:1009 _handle_fromlist
     5|         1|  2.55108e-05|  2.55108e-05|  0.00%|import numpy as np
(call)|         1|     0.745732|     0.745732|  0.04%|# <frozen importlib._bootstrap>:978 _find_and_load
     6|         1|  2.57492e-05|  2.57492e-05|  0.00%|from sys import exit
(call)|         1|   1.7643e-05|   1.7643e-05|  0.00%|# <frozen importlib._bootstrap>:1009 _handle_fromlist
     7|         1|  7.86781e-06|  7.86781e-06|  0.00%|import time
...
   234|         0|            0|            0|  0.00%|      # Compute the potential energy and forces
   235|  12525000|      51.0831|  4.07849e-06|  3.07%|      for j in range(0, p_num):
   236|  12500000|      51.6473|  4.13179e-06|  3.11%|        if (i != j):
   237|         0|            0|            0|  0.00%|          # Compute RIJ, the displacement vector
   238|  49900000|      210.704|  4.22253e-06| 12.67%|          for k in range(0, d_num):
   239|  37425000|      177.055|  4.73093e-06| 10.65%|            rij[k] = pos[k,i] - pos[k,j]
   240|         0|            0|            0|  0.00%|          # end for
   241|         0|            0|            0|  0.00%|
   242|         0|            0|            0|  0.00%|          # Compute D and D2, a distance and a truncated distance
   243|  12475000|      50.5158|  4.04936e-06|  3.04%|          d = 0.0
   244|  49900000|      209.465|   4.1977e-06| 12.60%|          for k in range(0, d_num):
   245|  37425000|      175.823|  4.69801e-06| 10.58%|            d = d + rij[k] ** 2
   246|         0|            0|            0|  0.00%|          # end for
   247|  12475000|      78.9422|  6.32803e-06|  4.75%|          d = np.sqrt(d)
   248|  12475000|      64.7463|  5.19008e-06|  3.89%|          d2 = min(d, np.pi / 2.0)
   249|         0|            0|            0|  0.00%|
   250|         0|            0|            0|  0.00%|          # Attribute half of the total potential energy to particle J
   251|  12475000|      84.7846|  6.79636e-06|  5.10%|          potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
   252|         0|            0|            0|  0.00%|
   253|         0|            0|            0|  0.00%|          # Add particle J's contribution to the force on particle I.
   254|  49900000|       227.88|  4.56674e-06| 13.71%|          for k in range(0, d_num):
   255|  37425000|      244.374|  6.52971e-06| 14.70%|            force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
   256|         0|            0|            0|  0.00%|          # end for
   257|         0|            0|            0|  0.00%|        # end if
   258|         0|            0|            0|  0.00%|
   259|         0|            0|            0|  0.00%|      # end for
...
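Like cProfile, pprofile can also be used from inside a script, which lets you limit the line-by-line profile to a specific block of code. The sketch below follows the basic usage shown in the pprofile documentation; run_md() is again just a hypothetical placeholder for the code of interest.

import pprofile

def run_md():
    # Hypothetical placeholder for the computation you want to profile
    total = 0.0
    for i in range(100000):
        total += i ** 0.5
    return total

profiler = pprofile.Profile()      # deterministic, line-by-line profiler
with profiler:
    run_md()

profiler.print_stats()             # annotated source, as in Listing 3
profiler.dump_stats("pprofile_stats.txt")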