NumPy

Working with Arrays

Scientific computing

Financial analysis

Relational data

Multimedia data

Deep learning

All of these require storing and processing high-dimensional arrays efficiently.

We have already learnt lists, sets, tuples and dictionaries.

Lists can store collections of numbers as high-dimensional arrays (lists of lists), and we can operate on them by iterating.

But this is very inefficient: often 10x to 100x slower than the expected performance.


Why?

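Lists are designed to store heterogeneous data, so each element is a separate Python object whose type has to be checked whenever it is used.

There are no low-level hardware mechanisms to accelerate operations on lists.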
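A minimal sketch of the difference, assuming NumPy is installed and imported as `np` (the usual convention). The list can mix types, so adding two lists element by element needs a Python-level loop; the NumPy array holds a single type, so one vectorised `+` does the same work. The sizes and variable names are illustrative, and timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

# A list may mix types freely; each element is a separate Python object.
mixed = [1, "two", 3.0]
print([type(v).__name__ for v in mixed])   # ['int', 'str', 'float']

# Element-wise addition of two lists needs an explicit Python loop.
n = 100_000
xs = list(range(n))
ys = list(range(n))
list_sum = [x + y for x, y in zip(xs, ys)]

# The same data as NumPy arrays: one type for all elements,
# and a single vectorised operation replaces the loop.
a = np.arange(n)
b = np.arange(n)
array_sum = a + b

# Both approaches compute the same values.
print(array_sum.tolist() == list_sum)      # True

# Compare run times (the numbers vary from machine to machine).
t_list = timeit.timeit(lambda: [x + y for x, y in zip(xs, ys)], number=10)
t_array = timeit.timeit(lambda: a + b, number=10)
print(f"list loop: {t_list:.4f}s, NumPy: {t_array:.4f}s")
```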

NumPy is intended to bring performance and functionality improvements for numerical computing.

Started only in 2006!

It is now a standard package, used in many real-world applications and by many other packages.

To deliver these improvements, NumPy aims to:

Enable other packages to use numpy arrays as an efficient data interface

Efficiently store n-d arrays in contiguous, vectorised form to benefit from DRAM locality

Enable easy file save and load of n-d arrays

Efficiently process data without type-checking overhead

Efficiently broadcast operations across dimensions

Provide implementations of many functions across linear algebra, statistics, ...
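A small sketch of two of these goals, assuming a 2 x 3 x 4 array of 64-bit integers (the `int64` type is chosen explicitly here so the byte counts are predictable). It shows that every element has the same size and sits in one contiguous block of memory, and it saves the array to NumPy's `.npy` format with `np.save` and reads it back with `np.load`. The file name and the temporary directory are illustrative.

```python
import os
import tempfile

import numpy as np

# A 2 x 3 x 4 array of 64-bit integers.
A = np.arange(1, 25, dtype=np.int64).reshape(2, 3, 4)

# Every element has the same type and size (in bytes) ...
print(A.dtype, A.itemsize)        # int64 8
# ... and the elements sit in one contiguous block of memory.
print(A.flags["C_CONTIGUOUS"])    # True
# Strides: bytes to skip to move one step along each dimension.
print(A.strides)                  # (96, 32, 8)

# Save to NumPy's binary .npy format and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "A.npy")
    np.save(path, A)
    B = np.load(path)

print(np.array_equal(A, B), B.shape)  # True (2, 3, 4)
```

The strides follow directly from the layout: moving one column skips 8 bytes, one row skips 4 x 8 = 32 bytes, and one layer skips 3 x 4 x 8 = 96 bytes.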

What we will focus on

What are n-d arrays

How to load and save n-d arrays

What is broadcasting

How to use statistical functions

Figure: a 1d array, a 2d array and a 3d array.

1d array: 1 2 3 4

2d array (3 rows of 4): 1 2 3 4 / 5 6 7 8 / 9 10 11 12

3d array (two such 2d layers): 1 2 3 4 / 5 6 7 8 / 9 10 11 12 and 13 14 15 16 / 17 18 19 20 / 21 22 23 24
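The three arrays in the figure can be built as follows, as a sketch assuming NumPy is imported as `np`. The names `a1`, `a2` and `a3` are illustrative, and the 3d array is created with `np.arange(1, 25).reshape(2, 3, 4)`, which lays out the values 1 to 24 in the order shown.

```python
import numpy as np

# 1d array: a single row of numbers.
a1 = np.array([1, 2, 3, 4])

# 2d array: rows stacked on top of each other.
a2 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12]])

# 3d array: two 2d layers holding the values 1 to 24.
a3 = np.arange(1, 25).reshape(2, 3, 4)

for arr in (a1, a2, a3):
    print(arr.ndim, arr.shape)
# 1 (4,)
# 2 (3, 4)
# 3 (2, 3, 4)
```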

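So we have a 3d array of size 2 x 3 x 4, with its dimensions labelled Dim 0, Dim 1 and Dim 2.

We index dimensions backwards in the order we added them: the dimension added last (the two stacked layers) is Dim 0, of size 2; the rows are Dim 1, of size 3; and the 4 numbers within each row lie along Dim 2.

These sizes, 2 x 3 x 4, are the shape of the array.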
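A sketch of why the newest dimension comes first, assuming the two layers hold 1 to 12 and 13 to 24 (names `layer0` and `layer1` are illustrative). `np.stack` places the new stacking dimension at position 0 by default, so the two layers become Dim 0.

```python
import numpy as np

# The first 2d layer (values 1..12) and the second (values 13..24).
layer0 = np.arange(1, 13).reshape(3, 4)
layer1 = np.arange(13, 25).reshape(3, 4)

# Stacking adds the new dimension at the front, so it becomes Dim 0.
A = np.stack([layer0, layer1])

print(A.shape)   # (2, 3, 4)
print(A.ndim)    # 3
print(A.size)    # 24 = 2 * 3 * 4
```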


We can index each item of the 3d array by specifying 3 coordinates

A[0, 2, 3]

A[1, 0, 1]

A[1, 1, 3]

Remember we count starting from 0
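Assuming A holds the values 1 to 24 laid out as in the figure (built here with `np.arange(1, 25).reshape(2, 3, 4)`, an assumption about the figure's layout), the three indices above pick out single numbers:

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# Each index is (Dim 0, Dim 1, Dim 2), counted from 0.
print(A[0, 2, 3])   # 12: layer 0, row 2, position 3
print(A[1, 0, 1])   # 14: layer 1, row 0, position 1
print(A[1, 1, 3])   # 20: layer 1, row 1, position 3

# An index equal to (or beyond) a dimension's size is an error.
try:
    A[2, 0, 0]
except IndexError as err:
    print("IndexError:", err)
```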


We can refer to slices of the data with partial indices

A[0, :, :]
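A sketch of this slice, again assuming `A = np.arange(1, 25).reshape(2, 3, 4)`. Fixing Dim 0 and keeping everything along Dims 1 and 2 returns the whole first layer:

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# Fix Dim 0 to 0 and keep every position along Dims 1 and 2.
first_layer = A[0, :, :]
print(first_layer.shape)  # (3, 4)
print(first_layer)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

# Trailing full slices can be left out: A[0] is the same slice.
print(np.array_equal(A[0], first_layer))  # True
```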



A[:, 1, :]


A[:, :, 2]
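The two slices A[:, 1, :] and A[:, :, 2] can be checked the same way, assuming `A = np.arange(1, 25).reshape(2, 3, 4)`. The dimension given a single number disappears from the result; the dimensions given `:` are kept.

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# Row 1 from both layers: Dim 1 is fixed, Dims 0 and 2 are kept.
rows = A[:, 1, :]
print(rows.shape)   # (2, 4)
print(rows)
# [[ 5  6  7  8]
#  [17 18 19 20]]

# Position 2 from every row of both layers: Dim 2 is fixed.
cols = A[:, :, 2]
print(cols.shape)   # (2, 3)
print(cols)
# [[ 3  7 11]
#  [15 19 23]]
```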


A[0, 1:3, 2:4]

Remember that in a range x:y, the end y is not included.
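A sketch of the range rule, assuming `A = np.arange(1, 25).reshape(2, 3, 4)`: `1:3` selects indices 1 and 2, and `2:4` selects indices 2 and 3, exactly as Python's own `range` does.

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# Layer 0, rows 1 and 2, positions 2 and 3.
block = A[0, 1:3, 2:4]
print(block)
# [[ 7  8]
#  [11 12]]

# The same rule as Python's range: the end is excluded.
print(list(range(1, 3)))  # [1, 2]
```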



Exercise: fill in A[?, ?, ?] to select positions 1 and 2 of row 0 in both layers.


A[:, 0, 1:3]


The output of such indexing is also an array

A[:, 0, 1:3]

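In this case, the output array has a shape of 2 x 2: the single index 0 removes Dim 1 entirely. Using the range 0:1 instead, as in A[:, 0:1, 1:3], keeps that dimension and gives a shape of 2 x 1 x 2.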
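A sketch comparing the two forms, assuming `A = np.arange(1, 25).reshape(2, 3, 4)`. It also uses `np.shares_memory` to show that a slice like this is a view onto A's data rather than a copy.

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# A single integer index removes its dimension ...
out = A[:, 0, 1:3]
print(out.shape)    # (2, 2)
print(out)
# [[ 2  3]
#  [14 15]]

# ... while a length-1 range keeps it.
kept = A[:, 0:1, 1:3]
print(kept.shape)   # (2, 1, 2)

# Slices with ':' and integers are views that share A's memory.
print(np.shares_memory(out, A))  # True
```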

Broadcasting

When two arrays have the same shape, arithmetic works element by element. Adding A (2 x 3 x 4) to itself gives another 2 x 3 x 4 array in which every value is doubled:

first layer: 2 4 6 8 / 10 12 14 16 / 18 20 22 24
second layer: 26 28 30 32 / 34 36 38 40 / 42 44 46 48
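A sketch of the same-shape case, assuming `A = np.arange(1, 25).reshape(2, 3, 4)`:

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# Same shapes: the arrays are added element by element.
S = A + A
print(S.shape)        # (2, 3, 4)
print(S[0, 0])        # [2 4 6 8]
print(S[1, 2])        # [42 44 46 48]
print(np.array_equal(S, 2 * A))  # True
```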

Broadcasting lets us combine arrays of different shapes. Here a 2d array of shape 3 x 4, with rows 1 2 3 4 / 5 6 7 8 / 9 10 11 12, is added to A. The 2d array is copied along Dim 0, once for each layer of A, and the result is again 2 x 3 x 4:

first layer: 2 4 6 8 / 10 12 14 16 / 18 20 22 24
second layer: 14 16 18 20 / 22 24 26 28 / 30 32 34 36
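A sketch of this case, assuming `A = np.arange(1, 25).reshape(2, 3, 4)` and naming the 3 x 4 array `B` for illustration. NumPy matches B against the last two dimensions of A and reuses it for both layers, without actually copying the data; the explicit `np.stack([B, B])` is only there to show the equivalent result.

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)
B = np.arange(1, 13).reshape(3, 4)   # rows 1..4, 5..8, 9..12

# B (3 x 4) lines up with the last two dimensions of A (2 x 3 x 4)
# and is reused for each of the 2 layers along Dim 0.
S = A + B
print(S.shape)   # (2, 3, 4)
print(S)
# [[[ 2  4  6  8]
#   [10 12 14 16]
#   [18 20 22 24]]
#
#  [[14 16 18 20]
#   [22 24 26 28]
#   [30 32 34 36]]]

# The same result with the copies made explicitly.
print(np.array_equal(S, A + np.stack([B, B])))  # True
```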

A column with one value per row, 1 / 5 / 9 (a 3 x 1 array), can be added too. It is stretched along Dim 2 (repeated across the 4 positions of each row) and along Dim 0 (copied to both layers), so each row of A is shifted by its own value:

first layer: 2 3 4 5 / 10 11 12 13 / 18 19 20 21
second layer: 14 15 16 17 / 22 23 24 25 / 30 31 32 33
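A sketch of the column case, assuming `A = np.arange(1, 25).reshape(2, 3, 4)` and that the column is stored with shape 3 x 1 (the name `col` is illustrative):

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)
col = np.array([[1], [5], [9]])   # shape (3, 1): one value per row

# col is stretched along Dim 2 (4 positions) and Dim 0 (2 layers).
S = A + col
print(S.shape)   # (2, 3, 4)
print(S[0])
# [[ 2  3  4  5]
#  [10 11 12 13]
#  [18 19 20 21]]
print(S[1])
# [[14 15 16 17]
#  [22 23 24 25]
#  [30 31 32 33]]
```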

A single row 1 2 3 4 is copied to every row of both layers (6 rows in all):

first layer: 2 4 6 8 / 6 8 10 12 / 10 12 14 16
second layer: 14 16 18 20 / 18 20 22 24 / 22 24 26 28
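A sketch of the row case, assuming `A = np.arange(1, 25).reshape(2, 3, 4)` (the name `row` is illustrative). A 1d array of length 4 lines up with Dim 2:

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)
row = np.array([1, 2, 3, 4])      # shape (4,)

# row lines up with Dim 2 and is reused for all 2 x 3 = 6 rows of A.
S = A + row
print(S.shape)   # (2, 3, 4)
print(S[:, :, 0])
# [[ 2  6 10]
#  [14 18 22]]
```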

A single number is stretched to all 24 positions, so A + 1 adds 1 to every element:

first layer: 2 3 4 5 / 6 7 8 9 / 10 11 12 13
second layer: 14 15 16 17 / 18 19 20 21 / 22 23 24 25
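A sketch of the scalar case, assuming `A = np.arange(1, 25).reshape(2, 3, 4)`:

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)

# A scalar is broadcast to every one of the 24 positions.
S = A + 1
print(S[0, 0])   # [2 3 4 5]
print(S[1, 2])   # [22 23 24 25]

# Equivalent to adding an array of ones with A's shape.
print(np.array_equal(S, A + np.ones_like(A)))  # True
```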

Broadcasting can also stretch both operands at once. A row 1 2 3 4 5 (1 x 5) and a column 1 / 4 / 3 / 2 (4 x 1) look like incorrect arguments for addition, since their shapes differ in both dimensions. Broadcasting copies the row down 4 times and the column across 5 times, so both become 4 x 5 and can be added:

2 3 4 5 6 / 5 6 7 8 9 / 4 5 6 7 8 / 3 4 5 6 7
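A sketch of this two-sided case (names `row` and `col` are illustrative). It also uses `np.broadcast_shapes` (available in NumPy 1.20 and later) to show the general rule NumPy applies: shapes are compared from the last dimension backwards, missing leading dimensions count as 1, and each pair of sizes must be equal or contain a 1. A pair such as 4 and 5 fails with a `ValueError`.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])          # shape (5,), treated as 1 x 5
col = np.array([[1], [4], [3], [2]])     # shape (4, 1)

# Both operands are stretched: row down 4 times, col across 5 times.
S = row + col
print(S.shape)   # (4, 5)
print(S)
# [[2 3 4 5 6]
#  [5 6 7 8 9]
#  [4 5 6 7 8]
#  [3 4 5 6 7]]

# The resulting shape, computed without building any arrays.
print(np.broadcast_shapes((5,), (4, 1)))   # (4, 5)

# A 4-element row cannot be matched with a 5-element row.
try:
    np.array([1, 2, 3, 4]) + row
except ValueError as err:
    print("ValueError:", err)
```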

FDS_NumPy

By One Fourth Labs
