Data Visualisation

Recap: What is Data Science?

collect → process → store → describe → model

Describing data is an act of compressing information so that we can focus on what matters.

One way is to compute statistics.

The other way is to visualise the data.
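Both ways can be tried on the same handful of numbers. This is a minimal sketch that assumes pandas and Matplotlib are installed. The eight values are made up for illustration: `describe()` compresses them into a few statistics, and a histogram shows the same values visually.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Eight made-up values (illustrative only, not real data)
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], name="value")

# Way 1: compress the values into a few summary statistics
summary = values.describe()  # count, mean, std, min, quartiles, max
print(summary)

# Way 2: show the same values visually, as a histogram
fig, ax = plt.subplots()
ax.hist(values, bins=4)
ax.set_xlabel("value")
ax.set_ylabel("count")
```

The statistics tell us the centre and spread. The histogram also shows where the values bunch up.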

Data visualisation - What

visual encoding of data

Some visual encodings can be recognised as encodings by us but decoded only by machines.

What we are interested in are representations that we can decode ourselves.
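As a toy illustration of an encoding we can decode by eye, the sketch below encodes each number as the length of a row of characters. The helper name `text_bars` and the counts are hypothetical.

```python
def text_bars(data: dict[str, int], symbol: str = "#") -> list[str]:
    """Hypothetical helper: encode each value as a bar whose length equals the value."""
    width = max(len(label) for label in data)  # pad labels so bars start in one column
    return [f"{label:<{width}} | {symbol * value}" for label, value in data.items()]


counts = {"apples": 3, "pears": 7, "kiwis": 5}  # made-up counts
for line in text_bars(counts):
    print(line)
```

Because all bars start at the same column, we can read off which value is largest without looking at any numbers.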

Data visualisation - What

visual elements for encoding data

We make perceptual errors in decoding these elements

Data visualisation - What

A

B

Our goal is to reduce perceptual error.

Which is longer, A or B?
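One widely used way to reduce this kind of error is to let the viewer compare lengths along a common baseline, rather than angles or areas, and to sort the bars by value. This is a minimal Matplotlib sketch that uses made-up, deliberately close values. It draws the same data both ways.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
values = [25, 23, 26, 24]  # made-up, deliberately close values

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Encoding as angles/areas: small differences are hard to judge by eye
ax_pie.pie(values, labels=labels)
ax_pie.set_title("Compare angles")

# Encoding as lengths from a shared baseline, sorted by value
order = sorted(range(len(values)), key=lambda i: values[i])
ax_bar.barh([labels[i] for i in order], [values[i] for i in order])
ax_bar.set_title("Compare lengths")

fig.tight_layout()
```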

Data visualisation - Why

discover insights from the data 


RangeIndex: 53940 entries, 0 to 53939
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   carat    53940 non-null  float64
 1   cut      53940 non-null  object
 2   color    53940 non-null  object
 3   clarity  53940 non-null  object
 4   depth    53940 non-null  float64
 5   table    53940 non-null  float64
 6   price    53940 non-null  int64  
 7   x        53940 non-null  float64
 8   y        53940 non-null  float64
 9   z        53940 non-null  float64
dtypes: float64(6), int64(1), object(3)
memory usage: 4.1+ MB

diamonds dataset
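The listing above gives column names and types. It does not show how, say, price relates to carat, but a plot can. This is a minimal sketch that assumes a DataFrame with `carat` and `price` columns, as in the listing. The helper `price_vs_carat` is hypothetical, and the four sample rows are made up so that the code runs without downloading anything.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import pandas as pd


def price_vs_carat(df: pd.DataFrame):
    """Hypothetical helper: scatter price against carat (needs 'carat' and 'price')."""
    fig, ax = plt.subplots()
    # Small, semi-transparent markers help when there are ~54,000 points
    ax.scatter(df["carat"], df["price"], s=2, alpha=0.2)
    ax.set_xlabel("carat")
    ax.set_ylabel("price")
    return fig, ax


# Stand-in frame with the same column names; values are made up
sample = pd.DataFrame({"carat": [0.23, 0.5, 1.0, 1.5], "price": [326, 1500, 4500, 9000]})
sample.info()  # prints a column/dtype listing in the same format as above
fig, ax = price_vs_carat(sample)
```

To try it on the real data, one option is seaborn's `sns.load_dataset("diamonds")`, which downloads the dataset and therefore needs internet access.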

Data visualisation - Why

discover insights from the data
communicate insights effectively

Examples:

Survey of how Americans spend their day, by NYTimes

Animated Gapminder plot by Hans Rosling

Pitfalls in Data Visualisation

Common misconceptions:

It is not that important.

It is about packing information into an image.

It is about drawing cool images.

Aesthetic features like colour don't matter.

There are only a limited number of courses on data visualisation, so it is important to figure out what to learn and how to learn it.

What to learn

Syntax of plotting libraries

Understanding of plot types and usages

Aesthetics of plots

Communicating with plots

Syntax of plotting libraries: syntax is not hard to learn, or to Google. Many small details can easily be searched for.

Understanding of plot types and usages: it is most crucial to develop a structured approach to thinking about plots. When should we use which plot type, and why?
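One way to start building such a structured approach is to write the rules down explicitly. The sketch below is a hypothetical rule of thumb that maps the kinds of variables being shown to a suggested plot type. The table and the name `suggest_plot` are illustrative only. They are not a complete or official taxonomy.

```python
# Hypothetical rule of thumb: kinds of variables shown -> suggested plot type.
# Illustrative only; real choices also depend on the question being asked.
SUGGESTED_PLOT = {
    ("numeric",): "histogram",
    ("categorical",): "bar chart of counts",
    ("numeric", "numeric"): "scatter plot",
    ("categorical", "numeric"): "box plot per category",
    ("time", "numeric"): "line plot",
}


def suggest_plot(*kinds: str) -> str:
    """Look up a suggested plot type (hypothetical helper); order of kinds matters."""
    return SUGGESTED_PLOT.get(tuple(kinds), "no rule yet: think it through")


print(suggest_plot("numeric", "numeric"))  # scatter plot
```

The fallback string is deliberate. A combination not covered by the table is a prompt to reason about the viewer's question, not to guess.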

Aesthetics of plots: clever people have been thinking about this, and good libraries come with aesthetically sensible defaults.
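Matplotlib, for instance, bundles several style sheets that change its defaults in one step. This is a minimal sketch that assumes a recent Matplotlib. It draws the same line under the `default` settings and under the bundled `ggplot` style, then compares the axes background colour each one picks. Seaborn offers a similar one-call switch, `sns.set_theme()`.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt

print(plt.style.available)  # style sheets that ship with Matplotlib

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

facecolors = {}
for style in ["default", "ggplot"]:
    # Apply the style only inside this block, then restore the previous settings
    with plt.style.context(style):
        fig, ax = plt.subplots()
        ax.plot(x, y)
        ax.set_title(f"style: {style}")
        facecolors[style] = ax.get_facecolor()  # background chosen by the style
        plt.close(fig)
```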

Communicating with plots: it is very important to use plots as an element of communication, whether in a research paper, a presentation, or a blog.
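When a plot goes into a paper, slide deck, or blog post, the file format matters. This is a minimal sketch that assumes Matplotlib. A vector PDF tends to suit papers because it scales without blurring. A PNG at a chosen resolution suits slides and web pages. The sketch writes to in-memory buffers so nothing touches the disk. In practice you would pass a filename such as `figure.pdf` (hypothetical).

```python
import io

import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3], [2, 3, 5], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Label axes before sharing a plot")

# Vector output, e.g. for a research paper
pdf_buffer = io.BytesIO()
fig.savefig(pdf_buffer, format="pdf", bbox_inches="tight")

# Raster output at a fixed resolution, e.g. for slides or a blog
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=200, bbox_inches="tight")
```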

Our Focus

We will focus on understanding plot types from first principles and then seeing examples of their usage

Pay Attention To

How to trade off readability against information

What are the different plot elements & how do they compose

How to re-orient thinking from data-centric to viewer-centric objectives
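To see how plot elements compose, here is a minimal Matplotlib sketch that assumes the object-oriented interface. A Figure holds Axes, and each Axes holds its own lines, bars, labels, title, and legend. The data are made up.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt

# A Figure is the canvas; it holds one or more Axes (individual plots)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

x = list(range(5))
# Each Axes collects its own elements: lines, axis labels, title, legend
ax_left.plot(x, [v ** 2 for v in x], label="x squared")
ax_left.plot(x, [2 * v for v in x], label="2x")
ax_left.set_xlabel("x")
ax_left.set_ylabel("y")
ax_left.set_title("Two lines, one legend")
ax_left.legend()

# A second Axes in the same Figure, with a different plot type
ax_right.bar(["a", "b", "c"], [3, 1, 2])
ax_right.set_title("A bar chart beside it")

fig.suptitle("One figure, two axes")  # a Figure-level element
fig.tight_layout()
```

Every element added here is one more thing for the viewer to read. That is where the trade-off between readability and information shows up in practice.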

Package

MATPLOTLIB
Inspired by Matlab plotting
Both Matlab-like & OOP syntax

SEABORN
Abstracts many underlying details
Usually better default aesthetics
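Matplotlib's two syntaxes can be compared by drawing the same plot in each. This is a minimal sketch that assumes Matplotlib. The Matlab-like `pyplot` functions act on an implicit "current" figure and axes. The object-oriented style calls methods on explicit `Figure` and `Axes` objects.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Matlab-like (pyplot) style: functions act on the implicit current figure/axes
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")
implicit_ax = plt.gca()  # fetch the axes pyplot has been drawing on

# Object-oriented style: call methods on explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("OO style")
```

The OO style tends to scale better once a figure has several Axes, because every call names the Axes it changes.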

Relation to Pandas

Both Matplotlib and Seaborn are deeply integrated with Pandas.

As we go through this session, we will also learn new ideas in Pandas.
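As one example of this integration, a pandas DataFrame can plot itself. `DataFrame.plot` draws with Matplotlib and returns a Matplotlib `Axes`, which can then be refined with ordinary Matplotlib calls. This is a minimal sketch with made-up monthly figures.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly figures, for illustration only
df = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"], "sales": [120, 135, 128, 150]}
).set_index("month")

# pandas draws with Matplotlib under the hood and returns a Matplotlib Axes
ax = df.plot(kind="bar", legend=False, title="Sales per month")
ax.set_ylabel("sales")  # refine the result with ordinary Matplotlib calls
```

Seaborn works from the other direction. Its plotting functions accept a DataFrame through their `data=` argument and refer to columns by name.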
