How to DataViz
Prologue
Why DataViz?
People don't read papers any more ...
... they just look at the graphs

How many 3s?
Data is hard to Understand

Slide from Heer, Stasko

Better than Tables


Even Stats Can Fail
1, 2, 3, 4, ...
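A minimal Python sketch of how summary statistics can hide structure. The two series are made-up illustration data, not taken from the talk: they share the same mean and variance, yet one is evenly spread and the other is mostly flat with two outliers.

```python
from statistics import mean, pvariance

# Made-up illustration data (an assumption, not from the slides).
evenly_spread = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
flat_with_outliers = [0, 6, 2, 4, 3, 3, 3, 3, 3, 3]


def summary(values):
    """Return (mean, population variance) for a list of numbers."""
    return mean(values), pvariance(values)


# Both calls print the same pair of numbers, although the data look
# nothing alike once plotted.
print(summary(evenly_spread))
print(summary(flat_with_outliers))
```

Plotting both series would show the difference immediately, which is the point of the slide.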

DataViz works because Vision is Powerful
Numbers -> 1 Dimension
Vision -> 5-8 Dimensions
Why This Talk?
“Perception is a fantasy that (tries to) coincide with reality”

Straight or Bent lines?
... they're straight

See the Triangle?
... it's not really there
DataViz is Hard
(Good) DataViz is Design

No Formulae
No Rules
Many Requirements
Design
Creative
Iterative

M Bostock, https://youtu.be/fThhbt23SGM
Guiding Principles
Provide Insight
"Through the graph ... we should see something that would have been harder to see otherwise."
Be Minimalistic
Be Clear
Use as little ink or as few pixels for every bit of data as possible.
Keep the ink/data ratio low.
Attention Bottleneck
"You can pay attention to only one aspect of an image at a time ... neural networks in your brain constantly compete for limited attentional resources."
Red Circle
Left | Right
Attention Bottleneck ...
Purple Circle
Left | Right



Simplified Information is Surprisingly Effective
Simple lines and edges are actually what your brain is looking for.
So vision works quite well without all of the visual detail.
Leaving that detail out also relieves the attention bottleneck.

Removing "Chart Junk"







Rethinking your Graph

Make Comparisons Accurately
Different visual qualities have different accuracies.
Use the one appropriate to your data.
Visual Qualities
Position
Length
Angle
Area
Colour / Brightness
Visual Qualities
Table ... Counts too

Sometimes a visualisation doesn't do any better than a table.
Tables are the baseline for assessing the quality of a visualisation.
Colour

A B C D E F G H I J K
... which is the FOURTH darkest?


Colour
Colour can suffer from illusions
Which bars are the same and which are different?
Area

A B C D E F G H I J K
... which is the FOURTH smallest?
Unfortunately, currently popular ... for example
Angles

Which is second smallest?
Third Smallest?
Angles

Slope Comparisons
Can be relatively accurate when optimised
Slope Comparisons
Differences easier to see when angle is ~45 degrees.
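One way to act on the 45 degree observation is to choose the plot's aspect ratio from the data, an approach often called banking to 45 degrees. The sketch below uses a simple median-slope heuristic, which is an assumption made for illustration; the function name `banked_aspect_ratio` is hypothetical.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Return height/width so the median absolute segment slope is drawn at 45 degrees."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary in both x and y")

    # Absolute slope of every line segment, in data units.
    points = list(zip(xs, ys))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0
    ]
    typical = median(slopes)
    if typical == 0:
        raise ValueError("median slope is zero; cannot bank to 45 degrees")

    # On the page a data slope s appears as s * (height / width) * (x_range / y_range).
    # Setting that to 1 (45 degrees) for the typical slope gives the ratio below.
    return y_range / (x_range * typical)


# A zig-zag with steep data slopes needs a wide, short plot.
print(banked_aspect_ratio([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))
```

The returned ratio can be passed to whatever plotting tool is in use as the figure's height divided by its width.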
Slope Comparisons


Is the slope steeper for the increase or for the decrease?
Length

Of the blue ... which is the second smallest?
Position

Grey v Blue is easy & clear
Comparing Blue with Blue is harder ...
... Distance makes position accuracy worse
Position






Accurate
Inaccurate
Impressionistic



Less accurate doesn't mean bad
Use less accurate visual forms when the differences being compared are large, or there is a large structure in the data to be shown.

Sometimes you have to order your data for the structure to be apparent.

Sometimes an inaccurate comparison is useful: when the intention is simply to show that a measurement varies substantially, not to allow direct one-to-one comparisons.
Here area encodes population for various nations of the world. The point is that populations differ greatly, and that population size does not affect the correlation.
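When a value such as population is encoded as area, the circle's radius has to grow with the square root of the value, otherwise large values are exaggerated. A minimal sketch, with a hypothetical helper name and an arbitrary maximum radius:

```python
import math


def radius_for_value(value, max_value, max_radius=20.0):
    """Return a radius whose circle AREA is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area grows with radius squared, so take the square root of the share.
    return max_radius * math.sqrt(value / max_value)


# A value four times smaller gets half the radius, hence a quarter of the area.
print(radius_for_value(25, 100), radius_for_value(100, 100))
```

Scaling the radius linearly instead would make a value four times larger look sixteen times larger.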
Show Structure with Shapes & Curves
The eye is looking for shapes and curves.
Use that.


These shapes don't exist
But your brain wants to see them


Though position is accurate, you often see and remember the structure as a line / curve / shape




Which shows the structure (insight) most clearly and easily?


Which shows the comparison most clearly and easily?


Which shows the comparison, structure and insight most clearly and easily?
Smoothing in 2D

Shape perception can be dangerous
What are the differences between these two data sets?

How predictable was the differences curve?
Often, graphing the differences is better or necessary


When curves are close and have similar slopes, we begin to see new shapes rather than the distance between them along the axis.
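Graphing the differences starts with computing them point by point. A minimal sketch with made-up data (the two curves are illustrative assumptions, not the data sets from the slides):

```python
def differences(series_a, series_b):
    """Return series_b - series_a, point by point."""
    if len(series_a) != len(series_b):
        raise ValueError("both series need the same number of points")
    return [b - a for a, b in zip(series_a, series_b)]


# Made-up illustration data: two curves that look nearly parallel when overlaid.
xs = list(range(6))
curve_a = [x * x for x in xs]
curve_b = [x * x + 5 - x for x in xs]

# The gap shrinks steadily, which is hard to judge from the overlaid curves.
gap = differences(curve_a, curve_b)
print(gap)
```

Plotting `gap` against `xs` puts the comparison on its own axis, where position can be read accurately.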
Visually Group Layers of Information
The eye can focus on and ignore visual elements depending on shared visual qualities.
Differences in some visual features Pop Out

Angle
Size
Shape (many variations)
Colour / Brightness


Using Pop Out leads to the ability to focus on different groups


Different elements stand out




Different groups stand out

Maximising Pop Out aids grouping
Using different colours helps

Using different shapes helps
The crosses are the most different and pop out the most

Using different shapes + thickness/size/darkness is better

Combining differences in colour + shape + darkness + size is best

Allowing for maximal density and shape perception is good too



Overlapping Symbols

Open circles are the best ...
Their intersections are visually different from circles.
Overlapping squares unfortunately create new squares etc.
Layered Annotations

Don't be afraid to add annotations or guiding graphics.
Use pop-out features to distinguish them from the actual dataViz.
Layered Reference Grids

Differences in thickness result in differences in apparent brightness. Grid lines and dataViz become easy to distinguish.

ggplot2 style
Traditional style
Understand Colour
Colour is beautiful but easy to use badly.
What is Colour

Colour is three things ...
Hue
Saturation
Lightness / Brightness
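The three components can be inspected with Python's standard `colorsys` module. A minimal sketch; the helper name `describe` is hypothetical, and the HLS model used here is only one of several ways to define lightness:

```python
import colorsys


def describe(r, g, b):
    """Split an RGB colour (components in 0..1) into hue, saturation and lightness."""
    # colorsys returns the components in H, L, S order.
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return {"hue_degrees": h * 360, "saturation": s, "lightness": l}


print(describe(1.0, 0.0, 0.0))  # pure red
print(describe(0.5, 0.5, 0.5))  # mid grey: no saturation, so hue is meaningless
```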
What is Colour Good For?
... CATEGORIES (nominal)


The attention bottleneck applies here too.
Use fewer than 9 colours.
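A small categorical palette can be sketched by spacing hues evenly around the colour wheel and refusing to go past eight categories. Evenly spaced hues are a simplifying assumption, not a perceptually tuned palette, and the names below are hypothetical.

```python
import colorsys

MAX_CATEGORIES = 8  # keeps the palette under 9 colours


def categorical_palette(n, lightness=0.5, saturation=0.8):
    """Return n RGB tuples (components in 0..1) with evenly spaced hues."""
    if not 1 <= n <= MAX_CATEGORIES:
        raise ValueError(f"use between 1 and {MAX_CATEGORIES} colours")
    # Same lightness and saturation for all, so only hue marks the category.
    return [colorsys.hls_to_rgb(i / n, lightness, saturation) for i in range(n)]


for colour in categorical_palette(4):
    print(colour)
```

With more categories than the limit, grouping the smallest ones into an "other" class is usually easier to read than adding colours.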
What is Colour NOT Good For?
... QUANTITATIVE



Colour for Quantitative ...
... still useful

Colour for Quantitative

Illogical & uneven
Even & linear
Diverging Colour for Quantitative


Zero
Max
Min
Diverging Colour for Quantitative

... quantitative + categorical
... make sure there is a reason or a zero point for diverging scales
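A diverging scale anchored at zero can be sketched as follows. The blue-white-red choice and the function name are assumptions for illustration; the key property is that zero maps to the neutral colour and equal magnitudes get equal intensity on both sides.

```python
def diverging_colour(value, limit):
    """Map value in [-limit, limit] to RGB: blue (negative), white (zero), red (positive)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Position on the scale, clamped to -1..1, with 0 at the zero point.
    t = max(-1.0, min(1.0, value / limit))
    fade = 1.0 - abs(t)  # 1 at zero (white), 0 at either extreme
    if t >= 0:
        return (1.0, fade, fade)
    return (fade, fade, 1.0)


for v in (-10, -5, 0, 5, 10):
    print(v, diverging_colour(v, 10))
```

Using one symmetric `limit` for both sides, rather than separate minimum and maximum values, keeps the zero point in the visual middle of the scale.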
Colour Blindness
... or Colour Confusion

Colour for Colour Blindness

Illogical & uneven
... Categorical
... Quantitative
Maximum Range & Still Pretty!

- Insight
- Clarity & Minimalism
- Comparison
- Shapes, Lines & Curves
- Grouped Elements
- Careful Colour


Parallel Coordinate Plot
Effective Application of these Guiding Principles
Position encodes values
Colour encodes categories
Curves display correlations
Angles display correlation size
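For position to encode values on every axis of a parallel coordinate plot, each variable is usually rescaled to a common range first. A minimal sketch of that step, assuming min-max scaling and a hypothetical function name:

```python
def normalise_columns(rows):
    """Rescale each column to 0..1 so every parallel axis has the same height."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column has no spread; place it in the middle of its axis.
        scaled_columns.append([(v - low) / span if span else 0.5 for v in column])
    # Transpose back so each row is again one observation (one line in the plot).
    return [list(row) for row in zip(*scaled_columns)]


# Made-up illustration data: two variables on very different scales.
rows = [[1, 100], [2, 300], [3, 200]]
print(normalise_columns(rows))
```

Each output row gives the height at which that observation's line crosses each axis.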
Thank you!
Credits
- Isabell (@Isa_Kiko)
- The elements of graphing data / William S. Cleveland.
- Information visualization : perception for design / Colin Ware.
How to DataViz
By Errol Lloyd