Think you python code is slow?

Learn how to measure and improve the performance

Profiling

  • Profiling is form of a dynamic analysis that measures the various aspects of the program like the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls

 

  • That means measuring the time spent in each of its functions

 

  • This will give you data about where your program is spending time, and what area might be worth optimising

Profilers in python

  • cProfile and profile (python standard library modules) are generally used for determining the time that takes every Python function when running the code

 

  • line_profiler (3rd party open source library) is a module for doing line-by-line profiling of functions

 

  • memory_profiler (3rd party open source library) is a module for monitoring memory consumption of a process as well as line-by-line analysis

cProfile

Profiling code with cProfile is really quite easy. All you need to do is import the module and call its run function

import cProfile
#cProfile is recommended for most users
#It’s a C extension with reasonable overhead
#that makes it suitable for profiling
#long-running programs

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print '=' * 80
cProfile.run('print fib_seq(25); print')
==========================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

         57355 function calls (65 primitive calls) in 0.017 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.017    0.017 <string>:1(<module>)
     21/1    0.000    0.000    0.017    0.017 fib.py:11(fib_seq)
 57291/21    0.016    0.000    0.016    0.001 fib.py:3(fib)
       21    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       20    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
The line after the output indicates that 57355 calls were monitored. Of those calls, 65 were primitive, meaning that the call was not induced via recursion

The next line: Ordered by: standard name, indicates that the text string in the far right column was used to sort the output

ncalls - the number of calls

tottime - the total time spent in the given function (and excluding time made in calls to sub-functions)

percall - is the quotient of tottime divided by ncalls
cumtime - is the cumulative time spent in this and all subfunctions (from invocation till exit)
percall - is the quotient of cumtime divided by primitive calls

filename:lineno(function) - provides the respective data of each function

When there are two numbers in the first column (for example 21/1), it means that the function recursed. The second value is the number of primitive calls and the former is the total number of calls. Note that when the function does not recurse, these two values are the same, and only the single figure is printed.

profile

Running profile is the same as cProfile, except that it adds significant overhead to profiled programs

import profile
#A pure Python module whose interface is imitated by cProfile
#but which adds significant overhead to profiled programs
#If you’re trying to extend the profiler in some way
#the task might be easier with this module

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print '=' * 80
profile.run('print fib_seq(25); print')
==========================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

         57356 function calls (66 primitive calls) in 0.187 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    0.000    0.000    0.000    0.000 :0(append)
       20    0.000    0.000    0.000    0.000 :0(extend)
        1    0.001    0.001    0.001    0.001 :0(setprofile)
        1    0.000    0.000    0.186    0.186 <string>:1(<module>)
     21/1    0.000    0.000    0.186    0.186 fib.py:11(fib_seq)
 57291/21    0.185    0.000    0.185    0.009 fib.py:3(fib)
        1    0.000    0.000    0.187    0.187 profile:0(print fib_seq(20); print)
        0    0.000             0.000          profile:0(profiler)

Same program when profiled using cProfile took 17ms where as profile took 187ms

DEMO

line_profiler

  • line_profiler will profile the time individual lines of code take to execute

 

  • The profiler is implemented in C via Cython in order to reduce the overhead of profiling

 

  • Also included is the script kernprof.py which can be used to conveniently profile Python applications and scripts either with line_profiler or with the function-level profiling tools in the python standard library
  • LineProfiler can be given functions to profile, and it will time the execution of each individual line inside those functions

 

  • It does need to be explicitly told what functions to profile. The easiest way to get started is to use the kernprof script kernprof -l script_to_profile.py

 

  • kernprof will create an instance of LineProfiler and insert it into the __builtins__ namespace with the name profile. It has been written to be used as a decorator, so in your script, you decorate the functions you want to profile with @profile.
@profile
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

@profile
def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print '=' * 90
print fib_seq(20)
kernprof -l fib.py

To view the output

kernprof -l -v fib.py
python -m line_profiler fib.py.lprof

(or)

==========================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
Wrote profile results to fib.py.lprof
Timer unit: 1e-06 s

Total time: 0.073717 s
File: fib.py
Function: fib at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def fib(n):
     3     57291        18929      0.3     25.7      if n == 0:
     4     10946         3258      0.3      4.4          return 0
     5     46345        15603      0.3     21.2      elif n == 1:
     6     17710         5398      0.3      7.3          return 1
     7                                               else:
     8     28635        30529      1.1     41.4          return fib(n-1) + fib(n-2)

Total time: 0.192911 s
File: fib.py
Function: fib_seq at line 10

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                           @profile
    11                                           def fib_seq(n):
    12        21           14      0.7      0.0      seq = [ ]
    13        21           11      0.5      0.0      if n > 0:
    14        20           60      3.0      0.0          seq.extend(fib_seq(n-1))
    15        21       192820   9181.9    100.0      seq.append(fib(n))
    16        21            6      0.3      0.0      return seq
Line #: The line number in the file

Hits: The number of times that line was executed

Time: The total amount of time spent executing the line in the timer's units. In the header information before the tables, you will see a line "Timer unit:" giving the conversion factor to seconds
Per Hit: The average amount of time spent executing the line once in the timer's units

% Time: The percentage of time spent on that line relative to the total amount of recorded time spent in the function

Line Contents: The actual source code. Note that this is always read from disk when the formatted results are viewed, not when the code was executed.

DEMO

memory_profiler

  • python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption for python programs

 

  • It is a pure python module and has the psutil module as optional (but highly recommended) dependencies

 

  • Also included is the script mprof which gives full memory usage reports as a function of time (not line-by-line) of external processes (be it python scripts or not)

==========================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
Filename: fib.py

Line #    Mem usage    Increment   Line Contents
================================================
     1   17.332 MiB    0.000 MiB   @profile
     2                             def fib(n):
     3   17.332 MiB    0.000 MiB       if n == 0:
     4   17.332 MiB    0.000 MiB           return 0
     5   17.332 MiB    0.000 MiB       elif n == 1:
     6   17.332 MiB    0.000 MiB           return 1
     7                                 else:
     8   17.332 MiB    0.000 MiB           return fib(n-1) + fib(n-2)


Filename: fib.py

Line #    Mem usage    Increment   Line Contents
================================================
    10   17.324 MiB    0.000 MiB   @profile
    11                             def fib_seq(n):
    12   17.324 MiB    0.000 MiB       seq = [ ]
    13   17.324 MiB    0.000 MiB       if n > 0:
    14   17.332 MiB    0.008 MiB           seq.extend(fib_seq(n-1))
    15   17.332 MiB    0.000 MiB       seq.append(fib(n))
    16   17.332 MiB    0.000 MiB       return seq

Generic timing with time

A quick and dirty method of timing python code - the good old unix utility time

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print '=' * 90
print fib_seq(20)
time python fib.py
==========================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
python fib.py
0.02s user
0.01s system
0.034 total
total is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked

user is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process

system is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process

Elegant timing with a timing context manager

import time

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000
        if self.verbose:
            print 'elapsed time: %f ms' % self.msecs
==========================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
Time taken for fib_seq 0.00767183303833 s
To use it, wrap blocks of code that you want to time with Python’s with keyword and this Timer context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.
from timer import Timer

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print '=' * 90
with Timer() as t:
    print fib_seq(20)
print "Time taken %s s" % t.secs

Think you python code is slow?

By ambi

Think you python code is slow?

  • 1,142