Profiling Python Code

Article Info

Contributed by
2 authors

Last updated on
2019-03-20 14:08:37

Code Profiling
Python
Python for Scientific Computing
Algorithmic Complexity
Static Code Analysis
Software Engineering

Article Versions

5 2019-03-20 14:08:37
1266,451 5,1266

By arvindpdmn

Minor corrections. Updated See Also.
4 2018-05-08 09:07:55
451,449 4,451

By arvindpdmn

Added image. Added milestones. Improved answers based on info in Python Docs.
3 2018-05-08 04:48:04
449,428 3,449

By gurumoorthyP

Adding main content. Unable to find suitable milestones.
2 2018-04-24 03:07:50
428,327 2,428

By gurumoorthyP

Identified questions for this topic.
1 2018-03-26 13:10:59
1,327

By arvindpdmn

Request for content.

Chat Room

Submitting ...

You are editing an existing chat message.
2019-03-20 14:05:48
-
1

By arvindpdmn

First point can be ignored for now: fixed and will be available in next version.
Second point is valid.
2019-03-20 12:47:10
-

By learnbyexample

* second para of first discussion question has formatting issue for **Benchmarking**

* third discussion, "classify profilers are function profilers" here 'are' should be 'as' I think.. suggestion: format the three profile names in this sentence as bold/italic

Python documentation defines a profile as "a set of statistics that describes how often and for how long various parts of the program executed." In addition to measuring time, profiling can also tell us about memory usage.

The idea of profiling code is to identify bottlenecks in performance. It may be tempting to guess where the bottlenecks could be but profiling is more objective and quantitative. Profiling is a necessary step before attempting to optimize any program. Profiling can lead to restructuring code, implementing better algorithms, releasing unused memory, caching computational results, improving data handling, etc.

Discussion

What are the Python packages or modules that help with profiling?
Comparison of some Python profilers. Source: Intel 2017, slide 10.
Python's standard distribution includes profiling modules by default: cProfile and profile. profile is a pure Python module that adds significant overhead. Hence cProfile is preferred, which is implemented in C. Results that come out from this module can be formatted into reports using the pstats module.
cProfile adds a reasonable overhead. Nonetheless, it must be used only for profiling and not for benchmarking. Benchmarking compares performance between different implementations on possibly different platforms. Because of the overhead introduced by cProfile, to benchmark C code and Python code, use %time magic command in IPython instead. Alternatively, we can use the timeit module.
To trace memory leaks, tracemalloc can be used.
Other useful modules to install include line_profiler, memory_profiler and psutil. The module pprofile takes longer to profile but gives more detailed information than line_profiler. For memory profiling, heapy and meliea may also help. PyCounters can be useful in production.
Are there visualization tools to simplify the analysis of profiler output?
Call graph as seen in KCachegrind. Source: Danjou 2015.
vprof is a profiler that displays the output as a webpage. It launches a Node.js server, implying that Node.js must be installed. On the webpage, you can hover over elements to get more information interactively.
The output of cProfile can be saved to a file and then converted using pyprof2calltree. This converted file can then be opened with KCachegrind (for Linux) and QCachegrind (for MAC and Windows). Likewise, gprof2dot and RunSnakeRun are alternative graphical viewers.
PyCallGraph profiles and outputs the statistics in a format that can be opened by Graphviz, a graph visualization software.
nylas-perftools adds instrumentation around code, profile it and export the results in JSON format. This can be imported into Chrome Developer Tools to visualize the timeline of execution. This can also be used in production since the app stack is only sampled periodically.
What are the types of profilers?
Three types of profilers exist:
- Event-based: These collect data when an event occurs. Such events may be entry/exit of functions, load/unload of classes, allocation/release of memory, exceptions, etc. cProfile is an example.
- Instrumented: The application profiles itself by modifying the code or with built-in support from compiler. Using decorators to profile code is an example of this. However, instrumented code is not required in Python because the interpreter provides hooks to do profiling.
- Statistical: Data is collected periodically and therefore what the profiler sees is a sample. While less accurate, this has low overhead. Examples of this include Intel® VTune™ Amplifier, Nylas Perftools and Pyflame. On the contrary, we can also say that cProfile, line_profiler and memory_profiler are Deterministic Profilers.
In general, deterministic profiling adds significant overhead whereas statistical profiling has low overhead since profiling is done by sampling. For this reason, the latter can also be used in production. We should note that the profiling overhead in Python is mostly due to the interpreter: overhead due to deterministic profiling is not that expensive.
We can also classify profilers as function profilers, line profilers or memory profilers.
What are the IPython magic commands that help with profiling?
IPython has the following magic commands for profiling:
- %time: Shows how long a one or more lines of code take to run.
- %timeit: Like %time but gives an average from multiple runs. Option -n can be used to specify the number of runs. Depending on how long the program takes, the number of runs is limited automatically. This is unlike the timeit module.
- %prun: Shows time taken by each function.
- %lprun: Shows time taken line by line. Functions to profile can be specified with -f option.
- %mprun: Shows how much memory is used.
- %memit: Like %mprun but gives an average from multiple runs, which can be specified with -r option.
Commands %time and %timeit are available by default. Commands %lprun, %mprun and %memit are available via modules line-profiler, memory-profiler and psutil. But to use them within IPython as magic commands, mapping must be done via IPython extension files. Or use the %load_ext magic command.
When timing multiple lines of code, use %%timeit instead of %timeit. This is available for %prun as well.
How to interpret the output of cProfile?
A sample output from cProfile. Source: Danjou 2015.
The output of cProfile can be ordered using the -s option. In the accompanying image, we see that the results are ordered by cumulative time in descending order. Everything starts at the module level from where main() function is called. We can also see that most of this is from api.py:request(), sessions.py:request() and sessions.py:send(). However, we are unable to tell if these are part of the same call flow or parallel flows. A graphical viewer will be more useful.
The column "tottime" is the time spent within a function without considering time spent in functions called within. We also see that readline() function from socket.py is called many times. It takes 1.571 seconds of 4.481 seconds of cumulative time. However, the total time within this function is 0. This means that we need to optimize functions that are called by readline(). We suspect it might be the read operation of SSL Socket. The entry "4/3" indicates that sessions.py:send() is called 3 times plus 1 recursive call.
What should I look for when analyzing any profiling output?
You should identify which function (or lines of code) is taking most of the execution time. You should identify which function is getting called most often. Look for call stacks that you didn't expect. Know the difference between total time spent in function and cumulative time. The latter enables direct comparison of recursive implementations against iterative ones.
With respect to memory usage, see if there's a memory leak.
Could you share some tips for optimizing profiled code?
Keep in mind that speeding up a function 100 times is irrelevant if that function takes up only a few percent of the program's total execution. This is the essence of profiling. Optimize only what matters.
If profiling shows that I/O is a bottleneck, threading can help. If your code uses a lot of regex, try to replace these with string equivalents, which will run faster. To avoid repeat computations, earlier results can be cached. This is called memoization. Where applicable, this can help in improving performance. If there are no obvious places to optimize the code, you can also consider an alternative runtime such as PyPy or moving critical parts of the code into Cython or C and calling the same from Python. Likewise, vectorize some operations using NumPy. Consider moving stuff from inner loops. Remove logging or make them conditional. If a particular function is called many times, code could be restructured.
Whether during profiling or after optimizing your code, don't ignore your unit tests. They must continue to pass. Also, keep in mind that optimizing code can compromise on readability and maintainability.
Are there Python IDEs that come with profilers?
With IPython, Jupyter and Anaconda's Spyder, %time and %timeit can be used by default. For line profiling or memory profiling, the necessary modules need to be installed before invoking them.
PyCharm Community Edition does not support profiling but the professional version supports it. In fact, cProfile, yappi and vmprof are three profilers that are supported. vmprof is a statistical profiler.
Visual Studio comes with a built-in profiler. However, this works only for CPython and not for IronPython for which .NET profiler should be used.
What's the decorator approach to profiling Python code?
It's possible to decorate a function for profiling purpose. We can timestamp the entries and exits and thereby calculate time spent within the function. The advantage of this approach is that there's no dependence on any profiling tool. Overhead can be kept low. We can choose to profile only specific parts of our program and ignore core modules and third-party modules. We'll also have better control of how results will be exported.
What's Pyflame and why may I want to use it?
A flame graph from snapshots of Python call stack. Source: Uber GitHub 2018.
Pyflame is a profiler for Python that takes snapshots of the Python call stack. From these, it generates flame graphs. Pyflame is based on Linux's ptrace. Hence it can't be used on Windows systems. Pyflame overcomes the limitations of cProfile, which adds overhead and doesn't give a full stack trace. Pyflame also works with code not instrumented for profiling.
Because Pyflame is statistical in nature, meaning that it doesn't profile every since function call, it can also be used in production.

Milestones

Apr
1992

In CPython implementation, first code commit is made for profile module.

Jun
1994

In CPython implementation, first code commit is made for pstats module.

Feb
2006

In CPython implementation, first code commit is made for cProfile module.

Sep
2008

Initial code commit for line_profiler is done by Robert Kern. Version 2.1 of this module is released in December 2017.

Sep
2016

First version of Pyflame is released by the engineering team at Uber.

Sample Code

python

# Source: https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html
 
# Following is within IPython environment
 
# Sorting an unsorted list
# %timeit will not be a good choice since with the 
# second iteration the list is already sorted
import random
L = [random.random() for i in range(100000)]
%timeit L.sort()
 
# Use %time instead to execute only once
import random
L = [random.random() for i in range(100000)]
%time L.sort()
 
# Sorting an already sorted list takes far less time
%time L.sort()
 
# An example showing how to profile multiple lines of code using %%timeit
%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
 
# Profile a function sum_of_lists()
%prun sum_of_lists(1000000)
 
# Load line_profiler after installing it via "pip install line_profiler"
%load_ext line_profiler
 
# Profile function line by line
%lprun -f sum_of_lists sum_of_lists(5000)
 
# Profile a script using cProfile
%run -m cProfile myscript.py
 
# Profile a script using a variant of %prun
%run -p myscript.py
 
# Show total time but not the details
%run -t myscript.py
 
 
# Source: https://cyrille.rossant.net/profiling-and-optimizing-python-code/
# Profile using cProfile and output to prof
python -m cProfile -o prof myscript.py
 
# Showing how to use pstats
import pstats
pstats.Stats('prof').strip_dirs().sort_stats("cumulative").print_stats()
 
 
# Source: https://pythonfiles.wordpress.com/2017/06/01/hunting-performance-in-python-code-part-3/
# Use of vprof
vprof -c h myscript.py # heatmap
vprof -c p myscript.py # code profiling
 
# Use of vprof for PyPy
pypy -m vprof -c h myscript.py
pypy -m vprof -c p myscript.py
 
# Profile and show in GraphViz
pycallgraph graphviz -- python myscript.py

References

Article Stats

1601

Words

Authors

Edits

Chats

Likes

15K

Hits

Cite As

Devopedia. 2019. "Profiling Python Code." Version 5, March 20. Accessed 2023-11-12. https://devopedia.org/profiling-python-code

Contributed by
2 authors

Last updated on
2019-03-20 14:08:37

languages optimization performance

Code Profiling
Python
Python for Scientific Computing
Algorithmic Complexity
Static Code Analysis
Software Engineering

Profiling Python Code

Discussion

Milestones

Sample Code

References

Further Reading

Article Stats

Author-wise Stats for Article Edits

Cite As

See Also

Login