NumPy is a fantastic package to have in your programming toolbox
Where Python falls short on performance, NumPy steps in to provide C/Fortran speeds
NumPy provides a powerful N-dimensional array class
import numpy as np
a = np.array([1,2,3,4])
Fast operations at C and Fortran speeds
a = np.array([1,2,3])
np.sum(a)
Use NumPy when you have large collections of regular data
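A rough way to see the speed difference for yourself. This is a sketch, not a benchmark: the array size n and the repeat count are arbitrary choices, and the timings depend on your machine.

```python
import timeit
import numpy as np

n = 1_000_000                              # arbitrary size for illustration
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Both approaches give the same total; only the time taken differs.
list_total = sum(py_list)
array_total = int(np_array.sum())

list_time = timeit.timeit(lambda: sum(py_list), number=5)
array_time = timeit.timeit(lambda: np_array.sum(), number=5)
print(f"list sum:  {list_time:.4f} s")
print(f"array sum: {array_time:.4f} s")
```

On most machines the array version should come out noticeably faster, but the ratio varies.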
I'm going to present features of NumPy and include many examples
Then discuss integrations of NumPy with packages like SciPy, Matplotlib, and Pandas
Ask questions along the way!
NumPy's array class is powerful!
>>> import numpy as np
>>> a = np.array([-3,5,1,4])
>>> a[0]
-3
>>> a[:2] # supports slicing like lists
array([-3, 5])
>>> b = np.array([[3,2,4], [2,1,5]]) # 2x3 array
>>> b
array([[3, 2, 4],
[2, 1, 5]])
>>> b.shape
(2, 3)
Similar to a Python list, which is an ordered collection of data
shape gives the size of the array along each dimension
Features of NumPy's array class
NumPy arrays are typed
Typing is part of what gives NumPy its speed
Any array creation function accepts a dtype= argument to specify the element type
There is also an astype method, which casts an array to another type
>>> a = np.array([1,2,3], dtype=float)
>>> a
array([ 1., 2., 3.])
>>> a.dtype
dtype('float64')
>>> a.astype(str)
array(['1.0', '2.0', '3.0'], dtype='|S32')
>>> np.array([True, False, False])
array([ True, False, False], dtype=bool)
>>> np.array([{1:2}, {3:4}])
array([{1: 2}, {3: 4}], dtype=object)
>>> np.array(['s1', 50, 'longer_string'])
array(['s1', '50', 'longer_string'], dtype='|S13')
>>> np.array(['5', 50, '100'], dtype=int)
array([ 5, 50, 100])
>>> np.array(['5', 50, 'fail'], dtype=int)
ValueError: invalid literal for long() with base 10: 'fail'
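A small sketch of what the type controls in practice: how many bytes each element takes, and what happens to values when you cast. The variable names are only for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)
floats = ints.astype(np.float64)         # astype returns a new array

print(ints.itemsize, floats.itemsize)    # bytes per element: 4 8
print(floats.nbytes)                     # 3 elements * 8 bytes = 24

# Casting float to int truncates toward zero, it does not round
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)                         # [ 1 -1]
```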
>>> a = np.arange(0, 60).reshape((6, -1))[:, :6] # 6x6 array with rows 0-5, 10-15, ..., 50-55
>>> a[0, 3:5]
array([3, 4])
>>> a[ :, 2 ]
array([2, 12, 22, 32, 42, 52])
>>> a[4:, 4:]
array([[44, 45],
[54, 55]])
>>> a[ ::-1, 2 ] # reversed
array([52, 42, 32, 22, 12, 2])
>>> a[2::2, 2::2]
array([[22, 24],
[42, 44]])
The array a in the slicing examples is np.arange(0,60).reshape((6, -1))[:,:6]
Operations are applied element-wise: to each element of the array, not to the array as a whole
>>> a = np.array([2,3,5])
>>> a*2
array([4,6,10])
>>> a+5
array([7,8,10])
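For comparison with plain Python lists, here is a minimal sketch. The names are illustrative only.

```python
import numpy as np

values = [2, 3, 5]
a = np.array(values)

doubled_list = [v * 2 for v in values]    # explicit Python loop
doubled_array = a * 2                     # same arithmetic, no loop written

# For a list, * means repetition, not multiplication
repeated = values * 2                     # [2, 3, 5, 2, 3, 5]

# Two arrays of the same shape combine element by element
summed = a + np.array([10, 20, 30])       # [12 23 35]
```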
Each array has a shape
shapes can be modified to produce new arrays
>>> a = np.array([[1, 2, 3, 4],
...               [10, 12, 13, 14],
...               [20, 22, 23, 24]])
>>> a
array([[ 1, 2, 3, 4],
[10, 12, 13, 14],
[20, 22, 23, 24]])
>>> a.shape
(3, 4)
>>> a.size
12
>>> a.reshape((4,3))
array([[ 1, 2, 3],
[ 4, 10, 12],
[13, 14, 20],
[22, 23, 24]])
>>> a.reshape(2, -1)
array([[ 1, 2, 3, 4, 10, 12],
[13, 14, 20, 22, 23, 24]])
>>> a.ravel()
array([ 1, 2, 3, 4, 10, 12,
13, 14, 20, 22, 23, 24])
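Two details about reshape that are easy to check, shown as a small sketch: -1 asks NumPy to infer one dimension, and for a contiguous array like the one here the result shares memory with the original.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, -1)             # -1 is inferred as 12 / 3 = 4
print(b.shape)                   # (3, 4)

# For a contiguous array such as this one, reshape returns a view
b[0, 0] = 99
print(a[0])                      # 99
print(np.shares_memory(a, b))    # True

# A shape whose sizes cannot multiply to a.size is rejected
try:
    a.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```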
For a multidimensional array you can transpose the elements (e.g. (i,j) to (j,i))
>>> a = np.array([[ 0, 1, 2, 3, 4],
...               [ 5, 6, 7, 8, 9],
...               [10, 11, 12, 13, 14],
...               [15, 16, 17, 18, 19]])
>>> a.T
array([[ 0, 5, 10, 15],
[ 1, 6, 11, 16],
[ 2, 7, 12, 17],
[ 3, 8, 13, 18],
[ 4, 9, 14, 19]])
>>> np.arange(0, 6, 1.5) # 1: start, stop (excluded), step
array([ 0. , 1.5, 3. , 4.5])
>>> np.linspace(0, 6, 4) # 2: start, stop (included), number of points
array([ 0., 2., 4., 6.])
>>> np.ones(5) # 3: filled with a constant
array([ 1., 1., 1., 1., 1.])
>>> np.zeros((3,3)) # 3 with a 2-D shape
array([[ 0., 0., 0.],
       [ 0., 0., 0.],
       [ 0., 0., 0.]])
>>> np.repeat(3.14, 3) # 4: a value repeated
array([ 3.14, 3.14, 3.14])
>>> np.empty((2,2)) # 5: uninitialized, contents are arbitrary
array([[ 0., 0.],
       [ 0., 0.]])
>>> np.zeros_like(np.arange(5)) # 6: same shape and dtype as the argument
array([0, 0, 0, 0, 0])
In addition to slicing, you can do fancy indexing
This includes indexing with a list of indices or a boolean array
Fancy indexing returns a copy, not a view, but assigning through it (a[mask] = 0) modifies the original array
>>> a = np.array([2, 4, 6, 8, 10, 12, 14])
>>> a[[0,3,5]]
array([2, 8, 12])
>>> idx = [0, 3, 5]
>>> a[idx]
array([2, 8, 12])
>>> mask = np.array([True, False, False, True, True, False, True])
>>> a[mask]
array([2, 8, 10, 14])
>>> a[mask] = 0
>>> a
array([0, 4, 6, 0, 0, 12, 0])
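One subtlety worth checking yourself: a slice shares memory with the original array, whereas indexing with a list or a mask gives you a new array. Assignment through the index still writes into the original. A minimal sketch:

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10, 12, 14])

sliced = a[:3]            # slice: shares memory with a
picked = a[[0, 3, 5]]     # list of indices: independent copy

sliced[0] = -1            # visible through a
picked[1] = -1            # not visible through a
print(a)                  # [-1  4  6  8 10 12 14]

a[[0, 3, 5]] = 0          # assignment through the index writes into a
print(a)                  # [ 0  4  6  0 10  0 14]
```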
You can create masks of the data from logical conditions
Masks combine with the element-wise operators & (and), | (or), ~ (not)
np.logical_xor is also available
>>> a = np.arange(0,20).reshape(5, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
>>> mask_1 = a > 10
>>> mask_1
array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)
>>> mask_2 = a <= 18
>>> mask_2
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True, False]], dtype=bool)
>>> mask_1 & mask_2
array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True, False]], dtype=bool)
>>> a[mask_1 & mask_2]
array([11, 12, 13, 14, 15, 16, 17, 18])
Use masking to quickly select subsets of data
>>> data = {}
>>> data['heights'] = np.array([75., 70., 63.5, 67., 79., 64., 73.]) # inches
>>> data['gender'] = np.array(['m', 'f', 'm', 'm', 'f', 'f', 'm'])
>>> mask = data['heights'] < 72.0
>>> data['gender'][mask]
array(['f', 'm', 'm', 'f'])
Pro tip: build all your masks first, then combine and apply them
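A sketch of that workflow on the heights data. The mask names is_short and is_male are made up for this illustration.

```python
import numpy as np

heights = np.array([75., 70., 63.5, 67., 79., 64., 73.])   # inches
gender = np.array(['m', 'f', 'm', 'm', 'f', 'f', 'm'])

# Build the masks first...
is_short = heights < 72.0
is_male = gender == 'm'

# ...then combine and apply them. If you write the comparisons inline,
# wrap each in parentheses: & and | bind more tightly than < and ==.
short_males = heights[is_short & is_male]
tall_or_female = heights[~is_short | ~is_male]
```

Named masks also keep the final selection readable when several conditions are involved.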
>>> a = np.array([0, 1, 1, 0, 0, 1], dtype=bool)
>>> a
array([False, True, True, False, False, True], dtype=bool)
>>> np.all(a)
False
>>> np.any(a)
True
>>> np.where(a)
(array([1, 2, 5]),)
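np.where also has a three-argument form that picks between two values per element. A short sketch with made-up data:

```python
import numpy as np

a = np.array([3, -1, 4, 0, -5])

indices = np.where(a < 0)[0]          # positions of the negatives: [1 4]
clipped = np.where(a < 0, 0, a)       # replace negatives with 0: [3 0 4 0 0]
count = np.count_nonzero(a < 0)       # number of True entries: 2
```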
Broadcasting is a set of rules that lets functions combine arrays of different shapes
It is also how element-wise operations with a scalar work
(e.g. a + 1)
>>> a = np.arange(20).reshape(10, -1)
>>> a
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]])
>>> b = np.array([1, 1])
>>> a+b
array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10],
[11, 12],
[13, 14],
[15, 16],
[17, 18],
[19, 20]])
>>> a.shape
(10, 2)
>>> b.shape
(2,)
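The rule behind this: shapes are compared from the last dimension backwards, and each pair of sizes must be equal or one of them must be 1 (missing leading dimensions count as 1). A sketch of a case that fails and one way to fix it; the array col is invented for the example.

```python
import numpy as np

a = np.arange(20).reshape(10, -1)    # shape (10, 2)
row = np.array([1, 1])               # shape (2,)
col = np.arange(10)                  # shape (10,)

with_row = a + row                   # (2,) is treated as (1, 2) and stretched

# (10, 2) with (10,) fails: the trailing sizes 2 and 10 do not match
try:
    a + col
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)

# Adding an axis gives col the shape (10, 1), which is compatible
shifted = a + col[:, np.newaxis]
print(shifted[3])                    # [ 9 10]
```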
Several utilities exist for combining arrays together
You will most likely need to be aware of the array shapes
>>> a = np.array([[1,2],[3,4]])
>>> b = np.array([[4,5],[6,7]])
>>> np.hstack((a,b))
array([[1, 2, 4, 5],
[3, 4, 6, 7]])
>>> np.vstack((a, b))
array([[1, 2],
[3, 4],
[4, 5],
[6, 7]])
>>> np.dstack((a, b))
array([[[1, 4],
[2, 5]],
[[3, 6],
[4, 7]]])
>>> np.concatenate((a.ravel(), b.ravel()))
array([1, 2, 3, 4, 4, 5, 6, 7])
>>> np.concatenate((a.ravel(), b))
ValueError: all the input arrays must have same number of dimensions
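The stacking helpers can also be written with an explicit axis, which may make the shape requirements easier to see. A minimal sketch using the same a and b:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 5], [6, 7]])

rows = np.concatenate((a, b), axis=0)    # same result as np.vstack((a, b))
cols = np.concatenate((a, b), axis=1)    # same result as np.hstack((a, b))
depth = np.stack((a, b), axis=2)         # same result as np.dstack((a, b)) for 2-D inputs

print(rows.shape, cols.shape, depth.shape)   # (4, 2) (2, 4) (2, 2, 2)
```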
NumPy has math functions
sin, cos, tan, sum, power, sqrt, etc.
These functions also work on plain numbers
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> np.sqrt(a)
array([ 0., 1., 1.41421356, 1.73205081, 2.])
>>> np.arctan(a)
array([ 0., 0.7853, 1.10714, 1.2490, 1.3258])
>>> a[3] = 20
>>> np.max(a)
20
>>> a = a.astype(float) # an integer array cannot hold NaN
>>> a[1] = np.nan
>>> np.max(a)
nan
>>> np.nanmax(a)
20.0
>>> np.sqrt(4)
2.0
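A short sketch of working with missing values. NaN is a floating-point value, so the array here is created as float from the start.

```python
import numpy as np

a = np.arange(5, dtype=float)     # NaN can only be stored in a float array
a[1] = np.nan

is_missing = np.isnan(a)          # [False  True False False False]
plain_mean = np.mean(a)           # nan: NaN propagates through the result
nan_mean = np.nanmean(a)          # (0 + 2 + 3 + 4) / 4 = 2.25
valid = a[~is_missing]            # [0. 2. 3. 4.]
```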
NumPy supports basic statistics and signal operations, including mean, standard deviation, and convolution
It can even do polynomial fitting and interpolation
>>> a = np.array([3,-1,4,0,3,20])
>>> np.mean(a)
4.8333
>>> np.std(a)
7.0099
>>> x = np.arange(-10, 10, 0.6)
>>> y = np.random.normal(0, 3, len(x))
>>> coefficients = np.polyfit(x, y, 3)
>>> np.polyval(coefficients, np.arange(10, 15, 0.3))
>>> x2 = np.arange(-10, 10, 3)
>>> y2 = np.interp(x2, x, y)
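To see what polyfit and interp return, it can help to try them on data where the answer is known. In this sketch the cubic coefficients are invented for the example, and the data has no noise.

```python
import numpy as np

x = np.arange(-10, 10, 0.6)
true_coefficients = [0.5, -2.0, 1.0, 3.0]       # highest power first
y = np.polyval(true_coefficients, x)            # noise-free cubic

fitted = np.polyfit(x, y, 3)
print(np.allclose(fitted, true_coefficients))   # True

# np.interp does piecewise-linear interpolation; the known x values
# must be increasing
x2 = np.array([0.0, 0.5, 1.0])
y2 = np.interp(x2, [0.0, 1.0], [10.0, 20.0])    # [10. 15. 20.]
```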
NumPy has sorting functions
argsort returns the indices that would sort the array
np.sort returns a sorted copy, while a.sort() sorts in place
>>> a = np.array([3,-1,4,0,3,20])
>>> np.argsort(a)
array([1, 3, 0, 4, 2, 5])
>>> idx = np.argsort(a)
>>> a[idx]
array([-1, 0, 3, 3, 4, 20])
>>> np.sort(a)
array([-1, 0, 3, 3, 4, 20])
>>> a.sort()
>>> a
array([-1, 0, 3, 3, 4, 20])
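argsort is most useful when a second array has to be reordered the same way. A sketch with a made-up labels array:

```python
import numpy as np

scores = np.array([3, -1, 4, 0, 3, 20])
labels = np.array(['a', 'b', 'c', 'd', 'e', 'f'])

# kind='stable' keeps tied values (the two 3s) in their original order
order = np.argsort(scores, kind='stable')

ascending = labels[order]            # ['b' 'd' 'a' 'e' 'c' 'f']
descending = labels[order[::-1]]     # ['f' 'c' 'e' 'a' 'd' 'b']
```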
NumPy has a module for random numbers
>>> np.random.random(5)
array([ 0.3577, 0.8275, 0.94499, 0.768, 0.2198])
>>> np.random.normal(0, 1, 10)
array([ 0.142 , -0.1082, -0.5035, 0.0059,
0.0165, -0.4864, 0.5475,
0.269 , -0.9024, 0.3165])
>>> np.random.uniform(0, 5, 10)
array([ 0.5204, 4.051 , 0.1581, 4.9441,
1.8538, 3.1539, 1.7153,
0.7021, 1.6248, 2.4531])
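If you need the same random numbers on every run, you can seed a generator. This sketch assumes NumPy 1.17 or newer, where np.random.default_rng is available; the seed 42 is arbitrary.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Two generators with the same seed produce the same sequence
draws_a = rng_a.normal(0, 1, 10)
draws_b = rng_b.normal(0, 1, 10)
print(np.array_equal(draws_a, draws_b))   # True

uniform = rng_a.uniform(0, 5, 10)         # 10 values in [0, 5)
```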
savetxt writes an array to a text file
genfromtxt reads delimited text such as CSV data
>>> a = np.random.random(5)
>>> np.savetxt('data.txt', a)
>>> a = np.genfromtxt('data.txt')
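A sketch of a round trip through a CSV file. The file name, the column names x and y, and the use of a temporary directory are choices made for this example.

```python
import os
import tempfile
import numpy as np

table = np.array([[1.0, 2.5], [3.0, 4.5], [5.0, 6.5]])

# Write into a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.csv')
    np.savetxt(path, table, delimiter=',', header='x,y')
    # savetxt writes the header as a '#' comment line, which genfromtxt skips
    loaded = np.genfromtxt(path, delimiter=',')

print(np.allclose(loaded, table))   # True
```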
Pandas - Good if you have 2-D data and want to index by something other than integers
SciPy - Has many more statistics and model-fitting tools, all built on NumPy
Matplotlib - Good for visualizing arrays
Discuss
gregersen.dylan@gmail.com
astrodsg.github.io