NumPy - Python's missing array class

For the SLCPy Meetup

 

Dylan Gregersen

astrodsg.github.io

NumPy

NumPy is a fantastic package to have in your programming toolbox

Where Python falls short on performance, NumPy steps in to provide C/Fortran speeds

What is NumPy?

NumPy provides a powerful N-dimensional array class

import numpy as np
a = np.array([1,2,3,4])

Fast operations at C and Fortran speeds

a = np.array([1,2,3])
np.sum(a)

Use NumPy when you have large collections of regular data
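
As a rough illustration of the speed claim, the sketch below sums the same million integers with Python's built-in sum and with NumPy. The names n, py_list and np_array are made up for this example, and the timings depend on the machine, so no numbers are quoted.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # explicit dtype so the total cannot overflow

# Both approaches compute the same total; only the machinery differs.
list_total = sum(py_list)
array_total = int(np_array.sum())

# Time ten repetitions of each; the NumPy version is typically much faster.
list_time = timeit.timeit(lambda: sum(py_list), number=10)
array_time = timeit.timeit(lambda: np_array.sum(), number=10)
print("list sum :", list_time, "seconds")
print("array sum:", array_time, "seconds")
```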

Talk Outline

I'm going to present features of NumPy and include many examples

 

Then discuss integrations of NumPy with packages like SciPy, Matplotlib, and Pandas

Ask questions along the way!

np.array

NumPy's array class is powerful!

>>> import numpy as np
>>> a = np.array([-3,5,1,4])
>>> a[0]
-3

>>> a[:2]  # supports slicing like lists
array([-3, 5])

>>> b = np.array([[3,2,4], [2,1,5]])  # 2x3 array
>>> b
array([[3, 2, 4],
       [2, 1, 5]])


>>> b.shape
(2, 3)

Similar to a Python list, which is an ordered collection of data

shape gives the length of the array along each dimension

np.array : <feature>

Features of numpy's array class

np.array : dtype

NumPy arrays are typed

 

Typing is part of what gives NumPy its speed

 

Any array generation function will accept dtype=? so you can specify a type

 

There is also an astype method which allows you to cast an existing array

 

>>> a = np.array([1,2,3], dtype=float)
>>> a
array([ 1.,  2.,  3.])

>>> a.dtype
dtype('float64')

>>> a.astype(str)
array(['1.0', '2.0', '3.0'], dtype='|S32')


>>> np.array([True, False, False])
array([ True, False, False], dtype=bool)

>>> np.array([{1:2}, {3:4}])
array([{1: 2}, {3: 4}], dtype=object)


>>> np.array(['s1', 50, 'longer_string'])
array(['s1', '50', 'longer_string'], dtype='|S13')

>>> np.array(['5', 50, '100'], dtype=int)
array([  5,  50, 100])

>>> np.array(['5', 50, 'fail'], dtype=int)
ValueError: invalid literal for long() with base 10: 'fail'
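
A small sketch of what casting does in practice, assuming nothing beyond NumPy itself. The variable names are illustrative only. It shows that astype returns a new array, that casting floats to int drops the fractional part, and that the dtype controls how much memory each element takes.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(float)                    # astype returns a new array; ints is unchanged
truncated = np.array([1.9, -1.9]).astype(int)  # casting to int drops the fraction

small = np.zeros(4, dtype=np.int8)             # 1 byte per element
large = np.zeros(4, dtype=np.float64)          # 8 bytes per element

print(floats.dtype)                # float64
print(truncated.tolist())          # [1, -1]
print(small.nbytes, large.nbytes)  # 4 32
```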

np.array : slicing

>>> a = np.arange(0,60).reshape((6, -1))[:,:6]
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

>>> a[0, 3:5]
array([3, 4])

>>> a[ :, 2 ]
array([ 2, 12, 22, 32, 42, 52])

>>> a[4:, 4:]
array([[44, 45],
       [54, 55]])

>>> a[ ::-1, 2 ]  #  reversed
array([52, 42, 32, 22, 12,  2])

>>> a[2::2, 2::2]
array([[22, 24],
       [42, 44]])
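
One thing worth knowing about slices: they are views onto the same data, not copies. The sketch below uses the 6x6 array from this slide; the names column and independent are made up for the example.

```python
import numpy as np

a = np.arange(0, 60).reshape((6, -1))[:, :6]

column = a[:, 2]              # a slice is a view onto the third column
column[0] = -1                # writing through the view changes a as well

independent = a[:, 2].copy()  # an explicit copy is detached from a
independent[1] = -99          # so this leaves a untouched

print(a[0, 2])                            # -1
print(a[1, 2])                            # 12
print(np.shares_memory(a, column))        # True
print(np.shares_memory(a, independent))   # False
```

Call .copy() on a slice whenever you need to modify it without touching the original.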

np.array : element-wise operations

An operation is applied to each element of the array, not to the array as a whole

>>> a = np.array([2,3,5])
>>> a*2
array([4,6,10])

>>> a+5
array([7,8,10])
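
For contrast with plain lists, here is a short sketch. The names values and b are illustrative. The same operator means repetition for a list but arithmetic for an array, and two arrays of the same shape combine element by element.

```python
import numpy as np

values = [2, 3, 5]
a = np.array(values)
b = np.array([10, 20, 30])

print(values * 2)          # [2, 3, 5, 2, 3, 5]  (list repetition)
print((a * 2).tolist())    # [4, 6, 10]          (element-wise)
print((a + b).tolist())    # [12, 23, 35]
print((a * b).tolist())    # [20, 60, 150]
print((a ** 2).tolist())   # [4, 9, 25]
```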

np.array : shape and reshaping

Each array has a shape

 

shapes can be modified to produce new arrays

>>> a = np.array([[1, 2, 3, 4], 
                  [10, 12, 13, 14], 
                  [20, 22, 23, 24]
                  ])
>>> a
array([[ 1,  2,  3,  4],
       [10, 12, 13, 14],
       [20, 22, 23, 24]])

>>> a.shape
(3, 4)

>>> a.size
12

>>> a.reshape((4,3))
array([[ 1,  2,  3],
       [ 4, 10, 12],
       [13, 14, 20],
       [22, 23, 24]])

>>> a.reshape(2, -1)
array([[ 1,  2,  3,  4, 10, 12],
       [13, 14, 20, 22, 23, 24]])

>>> a.ravel()
array([ 1,  2,  3,  4, 10, 12, 
       13, 14, 20, 22, 23, 24])
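
A minimal sketch of how the -1 placeholder behaves in reshape, using a made-up 12-element array. NumPy infers the missing dimension from the total size and raises an error when the sizes cannot match.

```python
import numpy as np

a = np.arange(12)

grid = a.reshape(3, -1)   # -1 is inferred as 4, since 3 * 4 == 12
print(grid.shape)         # (3, 4)

try:
    a.reshape(5, -1)      # 12 elements cannot be split into 5 equal rows
except ValueError as err:
    print("reshape failed:", err)

flat = grid.ravel()       # back to one dimension
print(flat.shape)         # (12,)
```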

np.array : Transpose

For a multidimensional array you can transpose the elements (e.g. (i,j) to (j,i))

>>> a = np.array([[ 0,  1,  2,  3,  4],
                  [ 5,  6,  7,  8,  9],
                  [10, 11, 12, 13, 14],
                  [15, 16, 17, 18, 19]])
>>> a.T
array([[ 0,  5, 10, 15],
       [ 1,  6, 11, 16],
       [ 2,  7, 12, 17],
       [ 3,  8, 13, 18],
       [ 4,  9, 14, 19]])
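
A quick check of what the transpose does to shapes and indices, sketched with the same 4x5 layout as above. The name t is illustrative. The transpose is a view, so it shares memory with the original array.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
t = a.T

print(a.shape, t.shape)         # (4, 5) (5, 4)
print(a[1, 3] == t[3, 1])       # True: element (i, j) moves to (j, i)
print(np.shares_memory(a, t))   # True: no data is copied
```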

Array Generation

  1. arange -- much like Python's range
  2. linspace -- breaks a range up into a given number of elements
  3. ones, zeros -- create an array of ones or zeros
  4. repeat -- repeats an element a given number of times
  5. empty -- allocates an array without initializing its values
  6. ?_like -- functions such as zeros_like which take their shape and dtype from another array

>>> np.arange(0, 6, 1.5)             # 1
 array([ 0. ,  1.5,  3. ,  4.5])

>>> np.linspace(0, 6, 4)             # 2
 array([ 0.,  2.,  4.,  6.])

>>> np.ones(5)                       # 3
 array([ 1.,  1.,  1.,  1.,  1.])

>>> np.zeros((3,3))                  # 3 with shape
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

>>> np.repeat(3.14, 3)               # 4
 array([ 3.14,  3.14,  3.14])

>>> np.empty((2,2))                  # 5, contents are arbitrary
array([[ 0.,  0.],
       [ 0.,  0.]])

>>> np.zeros_like(np.arange(5))      # 6
array([0, 0, 0, 0, 0])
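
Two details of these generators are easy to trip over, sketched here with made-up variable names: arange leaves out the stop value while linspace includes it by default, and empty gives you whatever happened to be in memory until you assign to it.

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)   # stop value is excluded
spaced = np.linspace(0, 1, 5)     # stop value is included by default
print(stepped.tolist())           # [0.0, 0.25, 0.5, 0.75]
print(spaced.tolist())            # [0.0, 0.25, 0.5, 0.75, 1.0]

scratch = np.empty((2, 2))        # contents are arbitrary until assigned
scratch[:] = 7.0                  # fill before reading

template = np.arange(5)
ones = np.ones_like(template)     # same shape and dtype as template
print(ones.tolist())              # [1, 1, 1, 1, 1]
```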

Fancy Indexing

In addition to slicing you can do fancy indexing

 

This includes using a list of indices or a boolean array

 

Unlike slices, these return copies of the selected data, but assigning through the index still modifies the original array

>>> a = np.array([2, 4, 6, 8, 10, 12, 14])
>>> a[[0,3,5]]
array([2, 8, 12])

>>> idx = [0, 3, 5]
>>> a[idx]
array([2, 8, 12])

>>> mask = np.array([True, False, False, True, True, False, True])
>>> a[mask]
array([2, 8, 10, 14])

>>> a[mask] = 0
>>> a
array([ 0,  4,  6,  0,  0, 12,  0])
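
A short sketch of the difference between reading and writing with a fancy index. The name picked is made up for the example. Reading gives an independent copy; assigning through the index changes the array in place.

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10, 12, 14])

picked = a[[0, 3, 5]]   # reading with a fancy index copies the selected elements
picked[0] = -1          # so this does not touch a
print(a[0])             # 2

a[[0, 3, 5]] = 0        # assigning through the index does modify a
print(a.tolist())       # [0, 4, 6, 0, 10, 0, 14]
```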

Masking

Comparisons on an array create boolean masks of the data

 

Masks support logic operations with & (and), | (or), ~ (not)

np.logical_xor gives exclusive or

 

>>> a = np.array([[ 1,  2,  3,  4],
                  [10, 12, 13, 14],
                  [20, 22, 23, 24]])
>>> a
array([[ 1,  2,  3,  4],
       [10, 12, 13, 14],
       [20, 22, 23, 24]])

>>> mask_1 = a > 10
>>> mask_1 
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

>>> mask_2 = a <= 18 
>>> mask_2
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [False, False, False, False]], dtype=bool)

>>> mask_1 & mask_2 
array([[False, False, False, False],
       [False,  True,  True,  True],
       [False, False, False, False]], dtype=bool)

>>> a[mask_1 & mask_2]
array([12, 13, 14])
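
The other mask operators work the same way. A minimal sketch on a made-up 1-D array follows. Note that each comparison needs its own parentheses, because & and | bind more tightly than < and >.

```python
import numpy as np

a = np.arange(10)
low = a < 3
high = a > 6

print(a[low | high].tolist())                     # [0, 1, 2, 7, 8, 9]
print(a[~low].tolist())                           # [3, 4, 5, 6, 7, 8, 9]
print(a[np.logical_xor(a > 2, a > 6)].tolist())   # [3, 4, 5, 6]
print(a[(a > 2) & (a < 7)].tolist())              # [3, 4, 5, 6]
```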

Masking

Use masking to quickly select subsets of data

>>> data = {}
>>> data['heights'] = np.array([75., 70., 63.5, 67., 79., 64., 73.]) # inches
>>> data['gender'] = np.array(['m', 'f', 'm', 'm', 'f', 'f', 'm']) 

>>> mask = data['heights'] < 72.0  

>>> data['gender'][mask]
array(['f', 'm', 'm', 'f'])

pro-tip: create all your masks first then apply them
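
One way to read the pro-tip, sketched with the heights and gender data from this slide: build each mask once, combine them, and only then index the data. The mask names are invented for the example.

```python
import numpy as np

heights = np.array([75., 70., 63.5, 67., 79., 64., 73.])  # inches
gender = np.array(['m', 'f', 'm', 'm', 'f', 'f', 'm'])

# Build every mask first...
is_short = heights < 72.0
is_female = gender == 'f'
short_and_female = is_short & is_female

# ...then apply the combined mask wherever it is needed.
print(heights[short_and_female].tolist())   # [70.0, 64.0]
print(int(short_and_female.sum()))          # 2 (True counts as 1)
```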

Mask Checks

  1. any -- are any True?
  2. all -- are all True?
  3. where -- returns the indices where the mask is True

>>> a = np.array([0, 1, 1, 0, 0, 1], dtype=bool)
>>> a
array([False,  True,  True, False, False,  True], dtype=bool)

>>> np.all(a)
False

>>> np.any(a)
True

>>> np.where(a)
(array([1, 2, 5]),)
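
np.where returns a tuple with one index array per dimension, which is why the result above is wrapped in parentheses. It also has a three-argument form that picks between two values per element. A small sketch with made-up data:

```python
import numpy as np

a = np.array([3, -1, 4, 0, -3, 20])
negative = a < 0

(indices,) = np.where(negative)      # unpack the single index array of a 1-D mask
print(indices.tolist())              # [1, 4]

clipped = np.where(negative, 0, a)   # 0 where the mask is True, a elsewhere
print(clipped.tolist())              # [3, 0, 4, 0, 0, 20]
```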

Broadcasting

Broadcasting is a set of rules which allows functions to combine arrays of different shapes

 

This is also how element-wise operations with a scalar work

(e.g. a + 1)

 

 

>>> a = np.arange(20).reshape(10, -1)
>>> a
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])
>>> b = np.array([1, 1])
>>> a+b
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14],
       [15, 16],
       [17, 18],
       [19, 20]])

>>> a.shape
(10, 2)

>>> b.shape
(2,)
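
A sketch of the rule at work, using small made-up arrays. Shapes are compared from the last dimension backwards, and each pair of dimensions must either match or have one side equal to 1.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)           # shape (3, 2)
row = np.array([10, 20])                 # shape (2,), treated as (1, 2)
col = np.array([[100], [200], [300]])    # shape (3, 1)

print((a + row).tolist())   # [[10, 21], [12, 23], [14, 25]]
print((a + col).tolist())   # [[100, 101], [202, 203], [304, 305]]

try:
    a + np.array([1, 2, 3])              # (3, 2) and (3,): last dimensions 2 and 3 clash
except ValueError as err:
    print("cannot broadcast:", err)
```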

Combining Arrays

Several utilities exist for combining arrays together

 

You will most likely need to be aware of the array shapes

 

>>> a = np.array([[1,2],[3,4]])
>>> b = np.array([[4,5],[6,7]])

>>> np.hstack((a,b)) 
array([[1, 2, 4, 5],
       [3, 4, 6, 7]])


>>> np.vstack((a, b))
array([[1, 2],
       [3, 4],
       [4, 5],
       [6, 7]])

>>> np.dstack((a, b))
array([[[1, 4],
        [2, 5]],

       [[3, 6],
        [4, 7]]])

>>> np.concatenate((a.ravel(), b.ravel()))
array([1, 2, 3, 4, 4, 5, 6, 7])

>>> np.concatenate((a.ravel(), b))
ValueError: all the input arrays must have same number of dimensions
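
The stacking helpers can also be written with concatenate and an explicit axis. A brief sketch using the same a and b as above; the names rows and cols are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 5], [6, 7]])

rows = np.concatenate((a, b), axis=0)   # stack along rows, like vstack
cols = np.concatenate((a, b), axis=1)   # stack along columns, like hstack

print(np.array_equal(rows, np.vstack((a, b))))   # True
print(np.array_equal(cols, np.hstack((a, b))))   # True
print(np.dstack((a, b)).shape)                   # (2, 2, 2)
```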

Functions : never import math again

NumPy has math functions

sin, cos, tan, sum, power, sqrt, etc.

 

The NumPy functions also work on plain numbers

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])

>>> np.sqrt(a)
array([ 0., 1., 1.41421356, 1.73205081, 2.])

>>> np.arctan(a)
array([ 0.,  0.7853,  1.10714,  1.2490,  1.3258])

>>> a[3] = 20
>>> np.max(a)
20

>>> a = a.astype(float)  # NaN can only be stored in a float array
>>> a[1] = np.nan
>>> np.max(a)
nan

>>> np.nanmax(a)
20.0

>>> np.sqrt(4)
2.0

Functions : Basic stats

NumPy supports some basic statistics, including mean and standard deviation, as well as convolution

 

It can even do polynomial fitting and interpolation

>>> a = np.array([3,-1,4,0,3,20])
>>> np.mean(a)
4.8333

>>> np.std(a)
7.0099


>>> x = np.arange(-10, 10, 0.6)
>>> y = np.random.normal(0, 3, len(x))
>>> coefficients = np.polyfit(x, y, 3)
>>> np.polyval(coefficients, np.arange(10, 15, 0.3))

>>> x2 = np.arange(-10, 10, 3)
>>> y2 = np.interp(x2, x, y)
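
To see what polyfit, polyval and interp return, it helps to use data with a known answer. The sketch below swaps the random data for a noise-free quadratic chosen just for illustration, so the fitted coefficients can be checked.

```python
import numpy as np

x = np.linspace(-5, 5, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0        # noise-free quadratic with known coefficients

coefficients = np.polyfit(x, y, 2)    # returned highest power first
print(np.allclose(coefficients, [2.0, -3.0, 1.0]))   # True

x_new = np.array([0.0, 1.0])
y_new = np.polyval(coefficients, x_new)              # evaluate the fitted polynomial
print(np.allclose(y_new, [1.0, 0.0]))                # True

# interp does piecewise-linear interpolation between known points
midpoint = np.interp(2.5, [0, 5], [0, 10])
print(midpoint)                                      # 5.0
```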

Functions : Ordering

NumPy can sort arrays

sort returns the sorted values, argsort returns the indices which would sort the array

 

The sort method sorts the array in place

>>> a = np.array([3,-1,4,0,3,20])
>>> np.argsort(a)
array([1, 3, 0, 4, 2, 5])

>>> idx = np.argsort(a)
>>> a[idx]
array([-1,  0,  3,  3,  4, 20])

>>> np.sort(a)
array([-1,  0,  3,  3,  4, 20])

>>> a.sort()
>>> a
array([-1,  0,  3,  3,  4, 20])
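
A common use of argsort is ordering one array by the values of another. A minimal sketch with invented names and scores:

```python
import numpy as np

names = np.array(['ann', 'bob', 'cy', 'dee'])
scores = np.array([72, 95, 60, 88])

order = np.argsort(scores)            # indices that sort scores in ascending order
print(names[order].tolist())          # ['cy', 'ann', 'dee', 'bob']
print(names[order[::-1]].tolist())    # ['bob', 'dee', 'ann', 'cy'] (descending)
```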

Functions : random number generation

NumPy has a module for random numbers

>>> np.random.random(5)
array([ 0.3577,  0.8275,  0.94499,  0.768,  0.2198])

>>> np.random.normal(0, 1, 10)
array([ 0.142 , -0.1082, -0.5035,  0.0059,  
        0.0165, -0.4864,  0.5475,
        0.269 , -0.9024,  0.3165])

>>> np.random.uniform(0, 5, 10)
array([ 0.5204,  4.051 ,  0.1581,  4.9441,  
        1.8538,  3.1539,  1.7153,
        0.7021,  1.6248,  2.4531])
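
The numbers above will differ on every run. If you need repeatable results, one option is to seed the generator first. A small sketch follows; the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)                      # fix the starting state of the generator
first = np.random.normal(0, 1, 5)

np.random.seed(42)                      # same seed, same sequence
second = np.random.normal(0, 1, 5)
print(np.array_equal(first, second))    # True

draws = np.random.uniform(0, 5, 1000)   # uniform draws stay inside [0, 5)
print(bool((draws >= 0).all() and (draws < 5).all()))   # True
```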

Functions : reading and writing

savetxt writes an array out as a text file

 

genfromtxt reads delimited text data, such as CSV, back into an array

 

>>> a = np.random.random(5)
>>> np.savetxt('data.txt', a)

>>> a = np.genfromtxt('data.txt')
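
For comma-separated data, both functions take a delimiter argument. A sketch of a round trip through a temporary file follows; the file name table.csv and the column names x,y are made up for the example.

```python
import os
import tempfile

import numpy as np

table = np.array([[1.0, 2.5], [3.0, 4.5], [5.0, 6.5]])
path = os.path.join(tempfile.mkdtemp(), 'table.csv')   # hypothetical file name

# savetxt writes the header as a comment line starting with '#'
np.savetxt(path, table, delimiter=',', header='x,y')

# genfromtxt skips comment lines by default, so only the numbers are read
loaded = np.genfromtxt(path, delimiter=',')
print(loaded.shape)                  # (3, 2)
print(np.allclose(loaded, table))    # True
```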

Integration

Pandas - Good if you have 2-dim data and want to index by something other than integers

SciPy - Has many more statistics and model fitting tools, all built on NumPy

Matplotlib - Good for visualizing arrays

 

Discuss 

 

Thanks!

 

gregersen.dylan@gmail.com

astrodsg.github.io
