Rational Girl

Attempting to be rational while dreaming of 3.141592653589793...

masked arrays

Masked arrays can be handy:

  • to isolate important data (e.g. grab a region of interest, or ROI, from a dataset such as 4D fMRI)
  • to remove invalid entries from a dataset (missing values or NaN)
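A minimal sketch of both uses. It assumes a tiny hypothetical `readings` array in place of real data with gaps, and a hypothetical 4x4 `image` with a hand-picked rectangular ROI in place of an fMRI volume. `ma.masked_invalid` is NumPy's helper for masking NaN and inf entries.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings with two missing values stored as NaN.
readings = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# masked_invalid masks the NaN (and inf) entries.
clean = ma.masked_invalid(readings)
print(clean.mean())    # 3.0, the mean of the three valid readings
print(clean.count())   # 3 unmasked entries

# Hypothetical 4x4 "image" with a rectangular region of interest
# at rows 1-2, columns 1-2 (a stand-in for a real ROI).
image = np.arange(16, dtype=float).reshape(4, 4)
outside_roi = np.ones(image.shape, dtype=bool)  # True = masked
outside_roi[1:3, 1:3] = False                   # unmask the ROI
roi = ma.MaskedArray(image, mask=outside_roi)
print(roi.mean())      # 7.5, the mean of 5, 6, 9 and 10
```

Statistics computed on the masked array simply skip the masked entries, so the NaN values and everything outside the ROI never enter the calculation.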

Methods not using masked arrays

Find the indices matching a logical condition (here, all the values in a random array that are < 0.3):

In [1]: import numpy as np
In [2]: data = np.random.random((2, 3, 4))
In [3]: ind = np.where(data < 0.3)

In [5]: ind
Out[5]:
(array([0, 0, 1, 1, 1, 1, 1]),
 array([0, 1, 0, 0, 1, 1, 2]),
 array([0, 2, 0, 2, 0, 1, 1]))
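With a single argument, `np.where` returns a tuple holding one index array per axis, so the first array above gives the first-axis index of each match, the second array the second-axis index, and so on. A small sketch on a fixed 2x3 array (a stand-in for the random data, so the output is predictable) shows how the per-axis arrays pair up into coordinates; `np.argwhere` returns the same coordinates as rows.

```python
import numpy as np

# A small fixed array instead of random data.
a = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.25, 0.7]])

rows, cols = np.where(a < 0.3)   # one index array per axis
print(rows, cols)                # [0 0 1] [0 2 1]

# Pair the per-axis arrays up to get one coordinate per match.
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)                    # [(0, 0), (0, 2), (1, 1)]

# np.argwhere returns the same coordinates as an (N, ndim) array.
print(np.argwhere(a < 0.3).tolist())   # [[0, 0], [0, 2], [1, 1]]
```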

Indexing with these indices gives you a flat array of the matching values:

In [10]: data[ind]
Out[10]:
array([ 0.22709003,  0.05852228,  0.09808892,  0.02375857,  0.20972922,
        0.00940121,  0.04954049])

Boolean indexing is a quicker way to get the same flat array:

In [11]: sample = data[data < .3]

In [12]: sample
Out[12]:
array([ 0.22709003,  0.05852228,  0.09808892,  0.02375857,  0.20972922,
        0.00940121,  0.04954049])

In [13]: np.allclose(sample, data[ind])
Out[13]: True
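Both flat results come out in NumPy's row-major (C) order, and neither remembers where its values came from. A small sketch on a fixed array (again a stand-in for the random data) shows the extra bookkeeping needed to put processed values back into the original layout: you have to index the original array again with the same condition.

```python
import numpy as np

a = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.25, 0.7]])
mask = a < 0.3

# Both forms select the same elements, in row-major order.
flat_bool = a[mask]
flat_where = a[np.where(mask)]
print(np.array_equal(flat_bool, flat_where))   # True

# The flat result has lost the 2-D layout; writing processed values
# back means indexing a copy of the original with the same mask.
out = a.copy()
out[mask] = flat_bool * 10
print(out)
```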

But sometimes we want slightly different behavior, and this is where masked arrays turn out to be quite handy.

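There is one catch: in a masked array, a True in the mask means the entry is masked, i.e. hidden from computations. If we pass our condition directly as the mask, we hide exactly the values we meant to keep, which can give unexpected results.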
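A tiny fixed example makes the convention concrete. The `values` array and the `> 2` condition are made up for illustration; the point is only that the entries where the mask is True are the ones that disappear.

```python
import numpy as np
import numpy.ma as ma

values = np.array([1, 2, 3, 4])
condition = values > 2            # [False False  True  True]

# The mask is True where an entry should be hidden, so using the
# condition itself as the mask hides exactly the values that satisfy it.
m = ma.MaskedArray(values, mask=condition)
print(m.compressed())   # [1 2]
print(m.sum())          # 3
```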

In [14]: import numpy.ma as ma
In [16]: mdata = ma.MaskedArray(data, data < .3)

In [19]: mdata.data
Out[19]:
array([[[ 0.22709003,  0.45430854,  0.79467758,  0.93408729],
        [ 0.96287444,  0.35465963,  0.05852228,  0.78461907],
        [ 0.45160738,  0.93019333,  0.33235178,  0.55158758]],
       [[ 0.09808892,  0.63505732,  0.02375857,  0.50787565],
        [ 0.20972922,  0.00940121,  0.87705912,  0.6506519 ],
        [ 0.88436047,  0.04954049,  0.42319976,  0.42837593]]])
In [24]: mdata.data.shape
Out[24]: (2, 3, 4)

In [25]: mdata.min()
Out[25]: 0.33235177579218356

Note that these are the values that are >= 0.3, not < 0.3. This is because the mask is True wherever the condition we pass is True, and True means "masked out". To get the same selection as above, we pass the negation of our condition as the mask.

In [40]: mdata = ma.MaskedArray(data, ~(data < .3))

In [41]: mdata.min()
Out[41]: 0.0094012111913013285

In [42]: mdata.max()
Out[42]: 0.22709002636526754
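Inverting the condition with `~` is one option; `numpy.ma` also has helpers that take the condition to mask directly. A sketch on a fixed array (a stand-in for the random data) comparing three ways to leave only the entries < 0.3 unmasked:

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.25, 0.7]])

# Three ways to leave only the entries < 0.3 unmasked:
m1 = ma.MaskedArray(a, mask=~(a < 0.3))   # invert the condition with ~
m2 = ma.masked_where(a >= 0.3, a)         # mask where the opposite holds
m3 = ma.masked_greater_equal(a, 0.3)      # convenience helper

for m in (m1, m2, m3):
    print(m.min(), m.max())               # 0.1 0.25
```

The three forms agree here, but they differ if the data contains NaN: `nan < 0.3` is False, so `~(a < 0.3)` masks the NaN, while `nan >= 0.3` is also False, so `masked_where(a >= 0.3, a)` and `masked_greater_equal` leave it unmasked.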

The nice thing about this is that you can work on the data in the array without reshaping it or mapping a flat array back onto an n-d array, which turns out to be useful.
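A short sketch of what "working on the data in place" looks like, using a fixed 2x3 array as a stand-in for the 3-D random data: arithmetic keeps the shape and mask, per-axis reductions skip masked entries, and `filled()` converts back to a plain ndarray when you need one. The fill value of 0.0 is an arbitrary choice for illustration.

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.25, 0.7]])
m = ma.masked_greater_equal(a, 0.3)   # keep only entries < 0.3

# Arithmetic keeps the 2-D shape and carries the mask along.
doubled = m * 2
print(doubled.shape)                   # (2, 3)

# Reductions along an axis skip the masked entries.
row_means = m.mean(axis=1)             # means of [0.1, 0.2] and [0.25]

# filled() gives back a plain ndarray, putting a chosen value in masked slots.
print(doubled.filled(0.0))
```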

If you do need just the unmasked values as a flat array, that is also simple:

In [45]: mdata.compressed()
Out[45]:
array([ 0.22709003,  0.05852228,  0.09808892,  0.02375857,  0.20972922,
        0.00940121,  0.04954049])
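`compressed()` returns a plain 1-D ndarray of the unmasked values in row-major order, the same values boolean indexing with the inverted mask gives. A sketch on a fixed array standing in for the random data:

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0.1, 0.5, 0.2],
              [0.9, 0.25, 0.7]])
m = ma.MaskedArray(a, mask=~(a < 0.3))

flat = m.compressed()
print(type(flat).__name__)   # ndarray, not MaskedArray
print(flat.shape)            # (3,)

# Same values, in the same order, as boolean indexing with the inverted mask.
print(np.array_equal(flat, a[~m.mask]))   # True
```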

Do not confuse this with the compress method, which returns a MaskedArray. compress takes a 1-D condition (applied to the flattened array when no axis is given) and selects the elements where that condition is True, whether or not they were masked before. The result keeps the old mask values for the elements it selects; in the example below the selected elements were unmasked, so the new mask is all False.

In [17]: simple = ma.MaskedArray([1, 2, 3, 4], [0, 0, 1, 1])

In [18]: simple.data
Out[18]: array([1, 2, 3, 4])

In [19]: simple.compressed()
Out[19]: array([1, 2])

In [21]: newsimple = simple.compress([1,1,0,0])

In [22]: newsimple.data
Out[22]: array([1, 2])

In [23]: newsimple.mask
Out[23]: array([False, False], dtype=bool)
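Because compress carries the old mask over for the elements it selects, selecting entries that were masked gives a result that is still masked. A small sketch that rebuilds the same `simple` array and selects its last two elements instead:

```python
import numpy.ma as ma

simple = ma.MaskedArray([1, 2, 3, 4], mask=[0, 0, 1, 1])

# Select the last two elements: they were masked in `simple`,
# and the result keeps their mask values.
tail = simple.compress([False, False, True, True])
print(tail.data)   # [3 4]
print(tail.mask)   # [ True  True]

# compressed() on the same array drops the masked entries entirely.
print(simple.compressed())   # [1 2]
```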