Before we start


  1. Why data visualization?
  2. Basic rules
  3. Theory
  4. Appropriate visualizations
  5. How to make better graphs


My opinions

1. Why do we visualize data


Clarify and focus your message

Peel, Delvenne & Lambiotte (2018). Multiscale mixing patterns in networks. PNAS.

n = 40, m = 160 (40 B-B, 40 O-O, 80 B-O)

E(x) = 54, E(y) = 48, std(x) = 8, std(y) = 27, corr(x, y) = -6%


Explore your data


2. three basic rules


Message: Women were more likely to survive

Message: Males in first class do fairly well

Example: Titanic dataset

Rule #2: Who is your audience?


General public

Rule #3: digital vs printed?

Use beautiful colors


Use colors that convert to grayscale
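A rough way to check whether a palette survives grayscale printing is to compare the gray level of each color. The sketch below is only an illustration: the helper names, the 0.15 threshold and both palettes are assumptions, and the Rec. 601 luma weights are one common approximation of grayscale conversion, not the only one.

```python
def gray_level(hex_color):
    """Approximate gray level (0 = black, 1 = white) of an sRGB hex color."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    # Rec. 601 luma weights, a common approximation of grayscale conversion
    return 0.299 * r + 0.587 * g + 0.114 * b


def grayscale_safe(colors, min_gap=0.15):
    """True if neighboring gray levels differ by at least min_gap (assumed threshold)."""
    levels = sorted(gray_level(c) for c in colors)
    return all(hi - lo >= min_gap for lo, hi in zip(levels, levels[1:]))


# Hypothetical palettes, for illustration only
red_green = ["#ff0000", "#00b000"]
dark_to_light = ["#08306b", "#6baed6", "#deebf7"]

print(grayscale_safe(red_green))      # the two colors end up as similar grays
print(grayscale_safe(dark_to_light))  # the three blues stay distinguishable
```

A palette that varies in lightness, not only in hue, is the one that keeps working once printed.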


Presentation? Careful with the contrast

Python (matplotlib + seaborn)

In any case:

Use vector graphics!!!

Make plots interactive





3. theory


not all channels are equal

because humans are biased

Step 1: Determine your point and your audience


Step 2: Map the most important variable(s) to length


Step 3: Use color only if necessary


4. summary of visualizations

Correlation plots

Differences within row

Required with log-scales

Proportions over x

amounts and proportions

...what about pie charts?

- Make it easy to compare a number to the total

- People understand them


--> Useful for reports, not so much in papers


Too many points?

Time series?


Do you actually need a map?

Color is key here (more on this later). No rainbows! No diverging scales for amounts starting at 0!

Combine with a barplot or bubbles if the sizes are highly unequal

geographical data

Different ways to visualize networks


Faskowitz et al. (2018). Weighted stochastic block models of the human connectome across the life span. Scientific Reports.

Do you actually need a network?


Face it, you don't need to show your hairball.

Sometimes you can visualize it:

     - Small networks that humans can understand (karate club, Medici)

     - Convince the reader about a point that you want to make

     - Art



5. eight ways to make better graphs

  • Enlarge the labels; they are almost always too small
  • Use vector graphics (pdf/eps) to avoid blurry and large figures
  • Eliminate unnecessary lines (remove all frames, use gray grid lines, etc.)
  • Create the figure at the right size
  • Make sure you align subplots
  • Don't use a gray background

1) keep it clean

Grid lines that run perpendicular to the key variable of interest tend to be the most useful.
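As a minimal sketch of the "keep it clean" points in matplotlib: the data, the figure size and the font sizes below are assumptions chosen for illustration, not recommended values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
groups = ["A", "B", "C", "D"]
values = [12, 30, 22, 17]

# Create the figure at its final size (in inches) instead of rescaling it later
fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(groups, values, color="0.4")

# Remove the frame around the plot
for side in ("top", "right", "left", "bottom"):
    ax.spines[side].set_visible(False)

# Light gray horizontal grid lines, perpendicular to the bars, drawn behind them
ax.yaxis.grid(True, color="0.85")
ax.set_axisbelow(True)

# White background, no tick marks, larger labels
ax.set_facecolor("white")
ax.tick_params(length=0, labelsize=12)
ax.set_ylabel("Value", fontsize=14)

fig.tight_layout()
```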

2) don't use 3D

3) use color when needed

  • Don't use defaults, they are ugly.
  • Three uses of colors: Represent values, categories or highlight
    • Represent values: Only if strictly needed
    • Represent categories: Make gray as many as you can and highlight the most relevant one
  • Three types of color palettes:
    • Sequential: Use when the minimum is important
    • Diverging: Use when the midpoint (e.g. the mean) is important and the figure will be read in color
    • Qualitative: Use for categories
  • Try to be color-blind friendly. Don't use a green to red palette!
  • Don't use rainbows.
  • Choosing a palette:
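One possible way to pick colors for each of the three palette types with matplotlib is sketched below. The mapping from palette type to colormap ("viridis", "RdBu", "tab10") and the helper `sample_colors` are assumptions for illustration; other colormaps of the same type work the same way.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumed choice of one built-in matplotlib colormap per palette type
PALETTES = {
    "sequential": "viridis",  # ordered values, low to high
    "diverging": "RdBu",      # values around a meaningful midpoint
    "qualitative": "tab10",   # unordered categories
}


def sample_colors(kind, n):
    """Return n RGBA tuples from the colormap assumed for this palette type (n >= 2)."""
    cmap = plt.get_cmap(PALETTES[kind])
    if kind == "qualitative":
        # Integer index: take the i-th listed color
        return [cmap(i) for i in range(n)]
    # Float in [0, 1]: position along the continuous colormap
    return [cmap(i / (n - 1)) for i in range(n)]


for kind in PALETTES:
    print(kind, sample_colors(kind, 3))
```

Sampling three colors from the diverging map returns a near-white color in the middle, which is why it only makes sense when the midpoint of the data is meaningful.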

...represent amounts

The real value is not important

Don't use more than 4 lines

Unless you gray most and highlight 1-2
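A minimal sketch of graying out most lines and highlighting one, assuming matplotlib. The series, their names and the highlighted choice are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical series, for illustration only
years = [2016, 2017, 2018, 2019, 2020]
series = {
    "A": [3, 4, 4, 5, 5],
    "B": [2, 2, 3, 3, 4],
    "C": [1, 3, 6, 8, 11],
    "D": [4, 4, 3, 4, 4],
    "E": [5, 5, 6, 5, 6],
}
highlight = {"C"}  # the one series the message is about

fig, ax = plt.subplots(figsize=(4, 3))
for name, ys in series.items():
    if name in highlight:
        # Highlighted line: colored, thicker, drawn on top, and the only legend entry
        ax.plot(years, ys, color="tab:blue", linewidth=2.5, zorder=3, label=name)
    else:
        # Context lines: thin and light gray
        ax.plot(years, ys, color="0.75", linewidth=1, zorder=2)

ax.set_xticks(years)
ax.legend(frameon=False)
fig.tight_layout()
```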


Real examples

3B) be consistent

Colors consistent across figures
Same types of figures throughout the paper

4) use an appropriate type

  • Use length to reflect the most important variable(s)
  • One numerical variable --> barplot or histogram
  • Two numerical variables --> scatter plot
  • Three numerical variables --> scatter plot + bubbles
  • Use color for categorical variables (ideally most of them gray + one highlighted)

Me 2013

Me 2016

Me 2020

  • Take the frame out of the legend
  • Make the axes thinner
  • Add horizontal grid lines
  • Use gray and a more muted blue
  • ...

5) eliminate redundant information

  • Bubbles and y axis measure the same thing
  • Blurry and small labels
  • WTF colors and gray background
  • It's a paper about IT, so the "J" sector should be highlighted

6) show the context

  • The axis should reflect the range of the data (doesn't have to cross zero!)
  • Don't use a double y-axis unless you actually want to show that there is an association
  • Don't break bars. Use dots and log-scale as an alternative.
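The alternative to broken bars can be sketched as a dot plot on a logarithmic axis, assuming matplotlib. The values below are hypothetical and span several orders of magnitude on purpose.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical amounts spanning several orders of magnitude
amounts = {"A": 3, "B": 40, "C": 900, "D": 12000}
names = list(amounts)
xs = list(amounts.values())

fig, ax = plt.subplots(figsize=(4, 2.5))

# Dots instead of bars: position, not bar length, carries the value,
# so the axis does not need to start at zero
ax.plot(xs, names, "o", color="0.3")
ax.set_xscale("log")

# Vertical grid lines help reading values off the log axis
ax.xaxis.grid(True, which="both", color="0.9")
ax.set_axisbelow(True)
ax.set_xlabel("Amount (log scale)")
fig.tight_layout()
```

Bars are avoided here because a bar's length has no meaningful baseline on a log axis.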

7) use tidy data from day 1
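In tidy data every variable is a column and every observation is a row. A small sketch with pandas of reshaping a wide table into that form; the table, the column names and the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one column per year
wide = pd.DataFrame({
    "country": ["NL", "ES"],
    "2019": [10, 20],
    "2020": [12, 18],
})

# Tidy (long) form: one row per (country, year) observation
tidy = wide.melt(id_vars="country", var_name="year", value_name="value")
print(tidy)
```

With the data in this shape, a plotting call can refer to columns by name (for example x="year", y="value", hue="country" in seaborn) instead of looping over columns.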








8) combine several types

  • Get ideas from the papers you read

Try to guide the audience

some random thoughts

Horizontal bar plots usually look better

Prototype and ask for feedback

Always: Use tidy data.

Step 1: Determine your point and your audience.

Step 2: Choose an appropriate visualization:

  • Map the most important variable(s) to length
  • Use color to highlight, refrain from using it to show amounts
  • Show the context and guide the audience

Step 3: Prototype different visualizations and ask for feedback.

Step 4: Remove all unnecessary components, make sure fonts are large enough and subplots are aligned.

Step 5: Save the figure as PDF (or EPS).

Step 6: Do minor edits in Inkscape or Illustrator.
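Step 5 can be sketched as follows with matplotlib. The output path, the figure size and the plotted numbers are assumptions; setting `pdf.fonttype` to 42 embeds TrueType fonts, which usually keeps text editable in Inkscape or Illustrator.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Embed TrueType fonts so labels remain text in the saved PDF
plt.rcParams["pdf.fonttype"] = 42

# Hypothetical figure, created at its final size in inches
fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.plot([0, 1, 2], [0, 1, 4], color="0.3")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Hypothetical output location; the format is taken from the .pdf extension
out_path = os.path.join(tempfile.gettempdir(), "figure.pdf")
fig.savefig(out_path, bbox_inches="tight")
```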


Spend time, it is worth it


Leão, J. C., Brandão, M. A., de Melo, P. O. V., & Laender, A. H. (2018). Who is really in my social circle? Journal of Internet Services and Applications, 9(1), 20.

Data Visualization Tutorial (WWCS2020)

By Javier GB
