Pandas - Series

Working with Arrays

Scientific computing

Financial analysis

Relational data

Multimedia data

Deep learning

All of these require storing and processing high-dimensional arrays efficiently

We already learnt several ways of doing this

Lists, sets, tuples, dictionaries
NumPy arrays

For performance, NumPy arrays were significantly faster
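A rough way to check this claim is to time the same element-wise operation on a list and on a NumPy array. This is only a sketch: the array size and the squaring operation are arbitrary choices made for illustration, and the measured speed-up will vary from machine to machine.

```python
import timeit

import numpy as np

n = 200_000  # arbitrary size, chosen only for illustration
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)

# Square every element: a Python-level loop versus one vectorised NumPy operation
list_time = timeit.timeit(lambda: [v * v for v in values_list], number=5)
array_time = timeit.timeit(lambda: values_array * values_array, number=5)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy array:        {array_time:.4f} s")
```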

NumPy package

Provides efficient low-level storage and operations on multi-dim typed arrays

[Figure: a 3-D array holding the values 1 to 24, with its axes labelled Dim 0, Dim 1 and Dim 2]
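A small sketch of such an array follows. The shape (2, 3, 4) and the int32 type are assumptions made for illustration; the figure only shows that 24 values are spread over three dimensions.

```python
import numpy as np

# 24 values arranged as 2 blocks x 3 rows x 4 columns (assumed shape)
A = np.arange(1, 25, dtype=np.int32).reshape(2, 3, 4)

print(A.ndim)    # number of dimensions
print(A.shape)   # sizes along Dim 0, Dim 1 and Dim 2
print(A.dtype)   # every element shares a single type
print(A.nbytes)  # 24 elements * 4 bytes each
```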

NumPy package

Provides many efficient indexing methods

[Figure: the 3-D array of values 1 to 24 (axes Dim 0, Dim 1, Dim 2), illustrating the slice A[:, 1, :]]
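The slice A[:, 1, :] can be tried out as below. The array shape (2, 3, 4) is an assumption; with it, the slice picks index 1 along Dim 1 and keeps everything along Dim 0 and Dim 2.

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)  # assumed shape (2, 3, 4)

middle = A[:, 1, :]   # all of Dim 0, index 1 of Dim 1, all of Dim 2
print(middle.shape)   # (2, 4)
print(middle)
# [[ 5  6  7  8]
#  [17 18 19 20]]

# Basic slicing returns a view on the same memory, not a copy
print(np.shares_memory(middle, A))  # True

# Boolean indexing is another indexing method: keep values above 20
print(A[A > 20])  # [21 22 23 24]
```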

NumPy package

Provides efficient broadcasting of ops

[Figure: the 3-D array of values 1 to 24 + the values 1, 5, 9 repeated to match its shape = an array with rows 2 3 4 5, 10 11 12 13, 18 19 20 21, 14 15 16 17, 22 23 24 25, 30 31 32 33]
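The addition in the figure can be reproduced with the sketch below. The shapes are assumptions inferred from the numbers shown: the array is taken to be (2, 3, 4) and the values 1, 5, 9 are taken to be a (3, 1) column, one value per row of each block.

```python
import numpy as np

A = np.arange(1, 25).reshape(2, 3, 4)   # assumed shape (2, 3, 4)
b = np.array([1, 5, 9]).reshape(3, 1)   # one value per row of each block

# b is stretched to shape (2, 3, 4) during the addition;
# the repeated copies are never built explicitly
result = A + b
print(result)
```

Under these assumptions the result holds the same six rows as the figure, from 2 3 4 5 up to 30 31 32 33.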

NumPy package

Provides efficient implementations of functions - arithmetic, statistical, trigonometric, ...
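A few of these functions in use, as a minimal sketch; the input values are arbitrary and chosen only so that the sine results are easy to recognise.

```python
import numpy as np

x = np.array([0.0, np.pi / 6, np.pi / 2])

print(x * 2 + 1)   # arithmetic, applied element-wise
print(x.mean())    # statistics: mean of all elements
print(x.std())     # statistics: standard deviation
print(np.sin(x))   # trigonometry: sine of each element
```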

But ...

NumPy is missing features to enable data analysis on relational data like this

What are these features?

Student Id   Subject   10th Marks   12th Marks
2318081      Maths     82           84
2318081      English   77           91
2318082      Science   91           -
2318083      Maths     89           78
2318082      Hindi     82           -

Limitations of NumPy

No way to attach labels to data (e.g. Data['10th Marks'])

No way to pivot data (e.g. so that each row contains the marks of one student)

No pre-built methods to fill missing values (e.g. fill 0 in place of NaN)

No way to group data (e.g. Data_grouped_by_subject)
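For comparison, here is how the four operations listed above might look in Pandas on the marks table. This is a sketch under stated assumptions: the "-" entries are treated as missing values (NaN), and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# The marks table; "-" entries are assumed to mean "missing" (NaN)
data = pd.DataFrame({
    "Student Id": [2318081, 2318081, 2318082, 2318083, 2318082],
    "Subject": ["Maths", "English", "Science", "Maths", "Hindi"],
    "10th Marks": [82, 77, 91, 89, 82],
    "12th Marks": [84, 91, np.nan, 78, np.nan],
})

# 1. Labels: select a column by its name
tenth = data["10th Marks"]

# 2. Missing values: fill 0 in place of NaN
filled = data.fillna(0)

# 3. Grouping: mean 10th marks per subject
by_subject = data.groupby("Subject")["10th Marks"].mean()

# 4. Pivoting: one row per student, one column per subject
per_student = data.pivot(index="Student Id", columns="Subject", values="10th Marks")

print(by_subject)
print(per_student)
```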

NumPy -> Pandas

Pandas is built on top of NumPy to make data processing on relational data easier
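One way to see this relationship is sketched below, assuming the default NumPy-backed data types. The marks and subject labels are illustrative values only.

```python
import numpy as np
import pandas as pd

# A labelled one-dimensional Pandas object (illustrative values)
marks = pd.Series([82, 77, 91], index=["Maths", "English", "Science"])

values = marks.to_numpy()   # the underlying data as a NumPy array
print(type(values))         # <class 'numpy.ndarray'>

# NumPy functions accept a Series and the result keeps its labels
roots = np.sqrt(marks)
print(roots)
```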

Pandas for Data Science

For a Data Scientist working in Python, Pandas is a crucial tool

Ingesting, Storing, Pre-processing, Summarising and Visualising data can all be done effectively with Pandas

What we will focus on

Pandas objects: Series and DataFrames

Pandas for Data Wrangling

Case-study with Pandas

Lead on to the next session on Visualising Data

FDS_Series

By One Fourth Labs

Series
