« Previous 1 2 3 4 Next »
Profiling Python code
In Profile
First Function Creation
The potential energy and force computations are the dominant part of the run time, so to better profile them, it is best to isolate that code in a function. Perhaps a bit counterintuitively, I created a function that initializes the algorithm and a second function for the update
loops and called the resulting code md_002.py
. (My modified code is available online.) Because the potential energy and force computations change very little, I won't be profiling this version of the code. All I did was make sure I was getting the same answers as in the previous version. However, feel free to practice profiling it.
Final Version
The final version of the code moves the section of code computing the potential energy and forces into a function for better profiling. The code, md_003.py
, has a properties
function that computes the potential energy and forces.
The cProfile results don't show anything useful, so I will skip that output. On the other hand, the pprofile
output has some useful information (Listing 9). The excerpts mostly focus on the function that computes potential energy and forces. Notice that the overall time to run the code is still about 10 times longer.
Listing 9
md_003.py Output Excerpts
Command line: md_003.py Total duration: 1459.49s File: md_003.py File duration: 1458.73s (99.95%) Line #| Hits| Time| Time per hit| %|Source code ------+----------+-------------+-------------+-------+----------- 1| 0| 0| 0| 0.00%|# md test code 2| 0| 0| 0| 0.00%| 3| 2| 3.40939e-05| 1.70469e-05| 0.00%|import platform 4| 1| 2.28882e-05| 2.28882e-05| 0.00%|from time import clock (call)| 1| 2.71797e-05| 2.71797e-05| 0.00%|# :1009 _handle_fromlist 5| 1| 2.69413e-05| 2.69413e-05| 0.00%|import numpy as np (call)| 1| 0.751632| 0.751632| 0.05%|# :978 _find_and_load 6| 1| 2.47955e-05| 2.47955e-05| 0.00%|from sys import exit (call)| 1| 1.78814e-05| 1.78814e-05| 0.00%|# :1009 _handle_fromlist 7| 1| 8.34465e-06| 8.34465e-06| 0.00%|import time ... 160| 51| 0.000295162| 5.78749e-06| 0.00%|def properties(p_num, d_num, pos, vel, mass): 161| 0| 0| 0| 0.00%| 162| 50| 0.000397682| 7.95364e-06| 0.00%| import numpy as np 163| 0| 0| 0| 0.00%| 164| 0| 0| 0| 0.00%| 165| 0| 0| 0| 0.00%| # compute force, potential, kinetic 166| 50| 0.000529528| 1.05906e-05| 0.00%| force = np.zeros([ d_num, p_num ]) 167| 50| 0.000378609| 7.57217e-06| 0.00%| rij = np.zeros(d_num) 168| 0| 0| 0| 0.00%| 169| 50| 0.000226259| 4.52518e-06| 0.00%| potential = 0.0 170| 0| 0| 0| 0.00%| 171| 25050| 0.0909703| 3.63155e-06| 0.01%| for i in range(0, p_num): 172| 0| 0| 0| 0.00%| 173| 0| 0| 0| 0.00%| # Compute the potential energy and forces 174| 12525000| 44.7758| 3.57492e-06| 3.07%| for j in range(0, p_num): 175| 12500000| 45.4399| 3.63519e-06| 3.11%| if (i != j): 176| 0| 0| 0| 0.00%| # Compute RIJ, the displacement vector 177| 49900000| 183.525| 3.67786e-06| 12.57%| for k in range(0, d_num): 178| 37425000| 155.539| 4.15603e-06| 10.66%| rij[k] = pos[k,i] - pos[k,j] 179| 0| 0| 0| 0.00%| # end for 180| 0| 0| 0| 0.00%| 181| 0| 0| 0| 0.00%| # Compute D and D2, a distance and a truncated distance 182| 12475000| 44.5996| 3.57512e-06| 3.06%| d = 0.0 183| 49900000| 184.464| 3.69668e-06| 12.64%| for k in range(0, d_num): 184| 37425000| 155.339| 4.15067e-06| 10.64%| d = d + rij[k] ** 2 185| 0| 0| 0| 0.00%| # end for 186| 12475000| 68.8519| 5.51919e-06| 4.72%| d = np.sqrt(d) 187| 12475000| 56.0835| 4.49567e-06| 3.84%| d2 = min(d, np.pi / 2.0) 188| 0| 0| 0| 0.00%| 189| 0| 0| 0| 0.00%| # Attribute half of the total potential energy to particle J 190| 12475000| 74.6307| 5.98242e-06| 5.11%| potential = potential + 0.5 * np.sin(d2) * np.sin(d2) 191| 0| 0| 0| 0.00%| 192| 0| 0| 0| 0.00%| # Add particle J's contribution to the force on particle I. 193| 49900000| 199.233| 3.99264e-06| 13.65%| for k in range(0, d_num): 194| 37425000| 212.352| 5.67406e-06| 14.55%| force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d 195| 0| 0| 0| 0.00%| # end for 196| 0| 0| 0| 0.00%| # end if 197| 0| 0| 0| 0.00%| 198| 0| 0| 0| 0.00%| # end for 199| 0| 0| 0| 0.00%| # end for 200| 0| 0| 0| 0.00%| 201| 0| 0| 0| 0.00%| # Compute the kinetic energy 202| 50| 0.000184059| 3.68118e-06| 0.00%| kinetic = 0.0 203| 200| 0.000753641| 3.76821e-06| 0.00%| for k in range(0, d_num): 204| 75150| 0.265535| 3.5334e-06| 0.02%| for j in range(0, p_num): 205| 75000| 0.298971| 3.98628e-06| 0.02%| kinetic = kinetic + vel[k,j] ** 2 206| 0| 0| 0| 0.00%| # end for 207| 0| 0| 0| 0.00%| # end for 208| 0| 0| 0| 0.00%| 209| 50| 0.000200987| 4.01974e-06| 0.00%| kinetic = 0.5 * mass * kinetic 210| 0| 0| 0| 0.00%| 211| 50| 0.000211716| 4.23431e-06| 0.00%| return force, kinetic, potential 212| 0| 0| 0| 0.00%| 213| 0| 0| 0| 0.00%|# end def ...
Again, most of the time in the code is spent in the properties
function, which computes the potential energy and forces. The first few loops take up most of the time.
Out of curiosity, I looked at the output from line_profiler
(Listing 10). Remember that all time percentages are relative to the time for that particular routine (not the entire code). The first couple of loops used a fair percentage of the run time. The last loop that computes the force array,
for k in range(0, d_num): force[k,i] = force[k,i] - rij[k] * np.sin (2.0 * d2) / d
used about 25 percent of the run time for this function.
Listing 10
md_003.py line_profiler Output
Total time: 358.778 s File: ./md_003.py Function: properties at line 159 Line # Hits Time Per Hit % Time Line Contents ============================================================== 159 @profile 160 def properties(p_num, d_num, pos, vel, mass): 161 162 50 164.0 3.3 0.0 import numpy as np 163 164 165 # compute force, potential, kinetic 166 50 351.0 7.0 0.0 force = np.zeros([ d_num, p_num ]) 167 50 153.0 3.1 0.0 rij = np.zeros(d_num) 168 169 50 30.0 0.6 0.0 potential = 0.0 170 171 25050 14447.0 0.6 0.0 for i in range(0, p_num): 172 173 # Compute the potential energy and forces 174 12525000 7036459.0 0.6 2.0 for j in range(0, p_num): 175 12500000 7730669.0 0.6 2.2 if (i != j): 176 # Compute RIJ, the displacement vector 177 49900000 33530834.0 0.7 9.3 for k in range(0, d_num): 178 37425000 39827594.0 1.1 11.1 rij[k] = pos[k,i] - pos[k,j] 179 # end for 180 181 # Compute D and D2, a distance and a truncated distance 182 12475000 7182783.0 0.6 2.0 d = 0.0 183 49900000 33037923.0 0.7 9.2 for k in range(0, d_num): 184 37425000 39131501.0 1.0 10.9 d = d + rij[k] ** 2 185 # end for 186 12475000 25236413.0 2.0 7.0 d = np.sqrt(d) 187 12475000 13375864.0 1.1 3.7 d2 = min(d, np.pi / 2.0) 188 189 # Attribute half of the total potential energy to particle J 190 12475000 31104186.0 2.5 8.7 potential = potential + 0.5 * np.sin(d2) * np.sin(d2) 191 192 # Add particle J's contribution to the force on particle I. 193 49900000 34782251.0 0.7 9.7 for k in range(0, d_num): 194 37425000 86664317.0 2.3 24.2 force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d 195 # end for 196 # end if 197 198 # end for 199 # end for 200 201 # Compute the kinetic energy 202 50 33.0 0.7 0.0 kinetic = 0.0 203 200 147.0 0.7 0.0 for k in range(0, d_num): 204 75150 44048.0 0.6 0.0 for j in range(0, p_num): 205 75000 78144.0 1.0 0.0 kinetic = kinetic + vel[k,j] ** 2 206 # end for 207 # end for 208 209 50 47.0 0.9 0.0 kinetic = 0.5 * mass * kinetic 210 211 50 37.0 0.7 0.0 return force, kinetic, potential
If you look at this routine in the md_003.py
file, I'm sure you can find some optimizations that would improve performance.
Using Numba
As part of the profiling, I moved parts of the code to functions. With Python, this allowed me to perform deterministic profiling without resulting in a long run time. Personally, I like deterministic profiling better than statistical because I don't have to find the time interval that results in a good profile.
Putting parts of the code in functions provides a good starting point for using Numba. I described the use of Numba in a previous high-performance Python article [2]. I made a few more changes to the last version of the code (md_003.py
) and, because the properties
routine took a majority of the run time, I targeted this routine with Numba and simply used jit
to compile the code.
The original code, before using Numba, was around 140 seconds on my laptop. After using Numba and running on all eight cores of my laptop (four "real" cores and four hyper-threading (HT) cores), it ran in about 3.6 seconds. I would call that a success.
« Previous 1 2 3 4 Next »
Buy this article as PDF
(incl. VAT)