Rational Girl

Attempting to be rational while dreaming of 3.141592653589793...

profiling code

IPython is my favorite interactive environment, so let's set up profiling in IPython. You need to install three key packages (I'm assuming you have IPython installed; this example is based on IPython 0.13). For each package I show three possible ways to install:

  • using easy_install on a machine where you have sudo
  • using easy_install on a machine where you do NOT have sudo, into your own local directory
  • using pip

ipython version 0.13

install line_profiler

http://pythonhosted.org/line_profiler/

sudo easy_install line_profiler

easy_install --prefix /path/to/users/local line_profiler

pip install line_profiler

install psutil

https://code.google.com/p/psutil/

sudo easy_install psutil

easy_install --prefix /path/to/users/local psutil

pip install psutil

install memory_profiler

https://pypi.python.org/pypi/memory_profiler

sudo easy_install memory_profiler

easy_install --prefix /path/to/users/local memory_profiler

pip install memory_profiler

Create ipython profile

http://ipython.org/ipython-doc/dev/config/overview.html

To generate the default configuration files, do:

$> ipython profile create profiler

$> mkdir ~/.config/ipython/profile_profiler/extensions

The first command creates ipython_config.py in ~/.config/ipython/profile_profiler.

The location may be different for you, but when you run the command it tells you where it put the new profile_profiler directory (because that's the right thing to do).

If you don't see this, you can locate your IPython config directory, which contains your profiles:

ipython locate

In the ~/.config/ipython/profile_profiler/extensions directory, create two new files:

line_profiler_ext.py

import line_profiler

def load_ipython_extension(ip):
    ip.define_magic('lprun', line_profiler.magic_lprun)

memory_profiler_ext.py

import memory_profiler

def load_ipython_extension(ip):
    ip.define_magic('memit', memory_profiler.magic_memit)
    ip.define_magic('mprun', memory_profiler.magic_mprun)

Next update ipython_config.py to use these extensions

c.InteractiveShellApp.extensions = [
    'line_profiler_ext',
    'memory_profiler_ext'
    ]

c.TerminalIPythonApp.extensions = [
    'line_profiler_ext',
    'memory_profiler_ext'
    ]

Now you can call your profiler profile

ipython --profile=profiler

Python 2.7.3 |EPD 7.3-2 (64-bit)| (default, Apr 11 2012, 17:52:16)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

IPython profile: profiler

In [1]: %memit?
In [2]: %lprun?
In [3]: %timeit?

Let's look at a simple example profiling MaskedArray methods

Time Profile

In [10]: import numpy as np
In [11]: from numpy.ma import MaskedArray
In [12]: jnk = np.random.random(100)
In [13]: jnk = MaskedArray(jnk, jnk>.6)

In [14]: %timeit jnk.mean()
10000 loops, best of 3: 29.1 us per loop

In [15]: %timeit np.mean(jnk[jnk <= .6])
10000 loops, best of 3: 150 us per loop
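The same comparison can be sketched outside IPython with the standard library's timeit module. This is a minimal sketch, not the original session: the seed, the loop count, and the helper names masked_mean and indexed_mean are assumptions made for illustration, and the boolean index is applied to the plain array rather than to the masked one.

```python
import timeit

import numpy as np
from numpy.ma import MaskedArray

# Seeded data so the two approaches can be compared on the same values.
rng = np.random.default_rng(0)
values = rng.random(100)
masked = MaskedArray(values, values > 0.6)  # True in the mask hides a value


def masked_mean():
    # Mean over the unmasked entries only.
    return masked.mean()


def indexed_mean():
    # Same quantity via boolean indexing on the plain array.
    return np.mean(values[values <= 0.6])


n_loops = 1000  # arbitrary loop count chosen for this sketch
t_masked = min(timeit.repeat(masked_mean, number=n_loops, repeat=3)) / n_loops
t_indexed = min(timeit.repeat(indexed_mean, number=n_loops, repeat=3)) / n_loops

print("masked mean : %.2f us per loop" % (t_masked * 1e6))
print("indexed mean: %.2f us per loop" % (t_indexed * 1e6))
```

Timings depend on the machine and the NumPy version, so treat the numbers as relative rather than absolute.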

Line Profile

Usually you run this on a function to profile its code line by line: name the function to profile with -f, then give the statement that calls it

%lprun -f <function> <statement>
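If line_profiler is not available, a coarser, function-level view can be had from the standard library's cProfile. Note that this is a different technique from %lprun: it reports time per function call, not per line. The sketch below is only an illustration; the function masked_mean and its threshold argument are hypothetical names, not part of the original example.

```python
import cProfile
import io
import pstats

import numpy as np
from numpy.ma import MaskedArray


def masked_mean(values, threshold=0.6):
    # Hypothetical function to profile: mask values above the threshold,
    # then average what is left.
    masked = MaskedArray(values, values > threshold)
    return masked.mean()


rng = np.random.default_rng(0)
values = rng.random(100)

profiler = cProfile.Profile()
profiler.enable()
result = masked_mean(values)
profiler.disable()

# Collect the report in a string and keep only entries matching our function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats("masked_mean")
report = stream.getvalue()
print(report)
```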

Memory Profile

%memit -r <R> statement

This reports the maximum memory usage over R repeats (100 in the example below)

In [22]: %memit -r 100 jnk.mean()
maximum of 100: 50.273438 MB per loop

In [23]: %memit -r 100 np.mean(jnk[jnk <= .6])
maximum of 10: 50.273438 MB per loop
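For a rough check without memory_profiler, the standard library's tracemalloc can report the peak memory allocated while a statement runs. This is not what %memit measures (which is the memory of the whole process); tracemalloc only counts allocations that Python traces, so the numbers will be much smaller. The helper name peak_traced_bytes and its repeats argument are assumptions made for this sketch.

```python
import tracemalloc

import numpy as np
from numpy.ma import MaskedArray


def peak_traced_bytes(func, repeats=10):
    # Run func several times and return the largest peak of traced
    # allocations (in bytes) seen during any single run.
    peaks = []
    for _ in range(repeats):
        tracemalloc.start()
        func()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return max(peaks)


rng = np.random.default_rng(0)
values = rng.random(100)
masked = MaskedArray(values, values > 0.6)

peak_masked = peak_traced_bytes(lambda: masked.mean())
peak_indexed = peak_traced_bytes(lambda: np.mean(values[values <= 0.6]))

print("masked mean : peak %d bytes traced" % peak_masked)
print("indexed mean: peak %d bytes traced" % peak_indexed)
```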

Errors Cluster

Software Facts and Fallacies

Robert Glass

I recently did a quick slide stack looking into Fact 49: Errors Cluster

madison_fact49.pdf

Loved the book, but it was a little light on research. Luckily I was pointed to this book:

Making Software: What Really Works, and Why We Believe It by Oram and Wilson http://www.amazon.com/Making-Software-Really-Works-Believe/dp/0596808321/

masked arrays

Masked arrays can be handy

  • to isolate important data (grab an ROI from a dataset, e.g. 4D fMRI)
  • or remove invalid entries from a data set (missing or NaN)

Methods not using maskedarray

Find indices matching a logical condition (find all the values in a random matrix < 0.3)

In [1]: import numpy as np
In [2]: data = np.random.random((2, 3, 4))
In [3]: ind = np.where(data < 0.3)

In [5]: ind
Out[5]:
(array([0, 0, 1, 1, 1, 1, 1]),
 array([0, 1, 0, 0, 1, 1, 2]),
 array([0, 2, 0, 2, 0, 1, 1]))

This can give you a flat array of this data

In [10]: data[ind]
Out[10]:
array([ 0.22709003,  0.05852228,  0.09808892,  0.02375857,  0.20972922,
        0.00940121,  0.04954049])

This is another way to quickly get that flat array

In [11]: sample = data[data < .3]

In [12]: sample
Out[12]:
array([ 0.22709003,  0.05852228,  0.09808892,  0.02375857,  0.20972922,
        0.00940121,  0.04954049])

In [13]: np.allclose(sample, data[ind])
Out[13]: True
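Both approaches return a flat array, so the positions are lost unless you keep the indices around. A small sketch with hand-picked values (not the random data above) shows the two selections agreeing, and how the index tuple from np.where lets you put processed values back into an array of the original shape. The names restored and the factor of 10 are just for illustration.

```python
import numpy as np

# Small hand-picked array so the result is easy to follow.
data = np.array([[0.1, 0.5, 0.2],
                 [0.7, 0.25, 0.9]])

ind = np.where(data < 0.3)      # tuple of index arrays, one per axis
flat = data[ind]                # selected values, flattened in row-major order
also_flat = data[data < 0.3]    # same values via boolean indexing

# Map processed values back to their original positions.
restored = np.full(data.shape, np.nan)
restored[ind] = flat * 10

print(flat)
print(restored)
```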

But sometimes we want slightly different behavior; this is where masked arrays turn out to be quite handy.

However, in a masked array a mask value of True means the element is masked out (hidden), not selected, so we can get unexpected behavior.

In [14]: import numpy.ma as ma
In [16]: mdata = ma.MaskedArray(data, data < .3)

In [19]: mdata.data
Out[19]:
array([[[ 0.22709003,  0.45430854,  0.79467758,  0.93408729],
        [ 0.96287444,  0.35465963,  0.05852228,  0.78461907],
        [ 0.45160738,  0.93019333,  0.33235178,  0.55158758]],
       [[ 0.09808892,  0.63505732,  0.02375857,  0.50787565],
        [ 0.20972922,  0.00940121,  0.87705912,  0.6506519 ],
        [ 0.88436047,  0.04954049,  0.42319976,  0.42837593]]])
In [24]: mdata.data.shape
Out[24]: (2, 3, 4)

In [25]: mdata.min()
Out[25]: 0.33235177579218356

Note that we are left with the values that are > 0.3, not < 0.3. This is because the mask hides every element where the condition we pass is True, so to get the same behavior as above, we pass the negation of our logical condition.

In [40]: mdata = ma.MaskedArray(data, ~(data < .3))

In [41]: mdata.min()
Out[41]: 0.0094012111913013285

In [42]: mdata.max()
Out[42]: 0.22709002636526754
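To make the mask convention concrete, here is a minimal sketch on a small hand-picked array (not the random data above). The variable names keep and col_means are assumptions for illustration; the point is that the mask is the negation of what you want to keep, and that the array retains its shape.

```python
import numpy as np
import numpy.ma as ma

data = np.array([[0.1, 0.5, 0.2],
                 [0.7, 0.25, 0.9]])

keep = data < 0.3                         # the values we want to work with
mdata = ma.MaskedArray(data, mask=~keep)  # True in the mask hides a value

print(mdata.shape)   # the shape of the original array is preserved
print(mdata.min())   # smallest of the kept values
print(mdata.max())   # largest of the kept values

# Reductions along an axis skip the masked entries.
col_means = mdata.mean(axis=0)
print(col_means)
```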

The beautiful thing about this is that you can work on the data in the array without reshaping it or mapping a flat array back to an n-d array, which turns out to be useful.

But if you do need the unmasked values as a flat array, that is also simple

In [45]: mdata.compressed()
Out[45]:
array([ 0.22709003,  0.05852228,  0.09808892,  0.02375857,  0.20972922,
        0.00940121,  0.04954049])

Do not confuse this with the compress method, which returns a MaskedArray rather than a plain array. compress takes a flat condition, not a mask: it keeps the elements where the condition is True, regardless of whether they are masked. In the example below the condition happens to select the two unmasked elements, so the mask of the result is all False.

In [17]: simple = ma.MaskedArray([1,2,3,4], [0,0,1,1])

In [18]: simple.data
Out[18]: array([1, 2, 3, 4])

In [19]: simple.compressed()
Out[19]: array([1, 2])

In [21]: newsimple = simple.compress([1,1,0,0])

In [22]: newsimple.data
Out[22]: array([1, 2])

In [23]: newsimple.mask
Out[23]: array([False, False], dtype=bool)
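The difference between the two methods is easier to see when the condition passed to compress also picks a masked element. The sketch below is an illustration with a hypothetical condition, not part of the original session.

```python
import numpy as np
import numpy.ma as ma

simple = ma.MaskedArray([1, 2, 3, 4], mask=[False, False, True, True])

# compressed(): plain ndarray holding only the unmasked values.
unmasked = simple.compressed()

# compress(condition): MaskedArray of the elements where the condition is
# True; here the condition selects one unmasked and one masked element.
picked = simple.compress([False, True, True, False])

print(type(unmasked).__name__, unmasked)
print(type(picked).__name__, picked.data, picked.mask)
```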