Before we start
- Why data visualization?
- Basic rules
- Appropriate visualizations
- How to make better graphs
1. Why do we visualize data
Clarify and focus your message
Pirate Peel, Delvenne & Lambiotte (2018). Multiscale mixing patterns in networks. PNAS.
n = 40, m = 160 (40 B-B, 40 O-O, 80 B-O)
E(x) = 54, E(y) = 48, std(x) = 8, std(y) = 27, corr(X,Y) = -6%
Explore your data
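A minimal sketch of why summary statistics alone are not enough. It does not use the dataset behind the numbers above; as a stand-in it uses sets I and II of Anscombe's quartet (hard-coded), which share almost identical means, standard deviations and correlation but look very different once plotted. The helper name summarize is made up for this illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Anscombe's quartet, sets I and II (stand-in data, not the slide's dataset).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])


def summarize(x, y):
    """Return means, sample standard deviations and Pearson correlation."""
    return {
        "E(x)": x.mean(),
        "E(y)": y.mean(),
        "std(x)": x.std(ddof=1),
        "std(y)": y.std(ddof=1),
        "corr": np.corrcoef(x, y)[0, 1],
    }


stats1 = summarize(x, y1)
stats2 = summarize(x, y2)
for key in stats1:
    print(f"{key}: {stats1[key]:.2f} vs {stats2[key]:.2f}")

# The statistics match, but the scatter plots do not:
# one set is roughly linear, the other follows a curve.
fig, axes = plt.subplots(1, 2, figsize=(6, 3), sharex=True, sharey=True)
axes[0].scatter(x, y1, color="tab:blue")
axes[1].scatter(x, y2, color="tab:blue")
```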
2. three basic rules
Rule #1: WHAT IS THE POINT?
Message: Females survived more often
Message: Males in first class do fairly well
Example: Titanic dataset
Rule #2: WHO IS YOUR AUDIENCE?
Rule #3: DIGITAL OR PRINTED?
Use beautiful colors
Use colors that convert to grayscale
Presentation? Careful with the contrast
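One possible way to check whether a palette survives grayscale printing is to convert its colors to an approximate gray level and see whether that level changes steadily along the palette. The sketch below assumes the common Rec. 601 luma weights (0.299, 0.587, 0.114) as the grayscale conversion; the helper names gray_levels and is_monotonic are made up for this illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
from matplotlib import colormaps

# Assumption: Rec. 601 luma weights approximate how a color prints in gray.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def gray_levels(cmap_name, n=9):
    """Sample n colors from a colormap and return their approximate gray level."""
    rgb = colormaps[cmap_name](np.linspace(0, 1, n))[:, :3]  # drop alpha
    return rgb @ LUMA_WEIGHTS


def is_monotonic(cmap_name, n=9):
    """True if the gray level only increases or only decreases along the map."""
    steps = np.diff(gray_levels(cmap_name, n))
    return bool(np.all(steps > 0) or np.all(steps < 0))


for name in ("viridis", "jet"):
    print(name, "keeps its order in grayscale:", is_monotonic(name))
```

A palette whose gray level goes up and then down (as rainbow-style maps tend to do) maps different values to the same gray, so the figure becomes ambiguous in print.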
Python (matplotlib + seaborn)
In any case:
Use vector graphics!!!
Make plots interactive
not all channels are equal
because humans are biased
Step 1: Determine your point and your audience
Step 2: Map the most important variable(s) to length
Step 3: Use color only if necessary
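The three steps above can be sketched with a horizontal bar plot: the key variable is mapped to bar length, and color is used only to highlight the one group the message is about. The group names, values and the choice of highlighted group are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: one numerical variable per group.
groups = ["A", "B", "C", "D", "E"]
values = [12, 30, 18, 7, 22]
highlight = "B"  # assumption: the group our message is about

# Sort ascending so the longest bar ends up at the top of the plot.
pairs = sorted(zip(values, groups))
sorted_values = [v for v, _ in pairs]
sorted_groups = [g for _, g in pairs]

# Gray for everything except the highlighted group.
colors = ["tab:blue" if g == highlight else "lightgray" for g in sorted_groups]

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.barh(sorted_groups, sorted_values, color=colors)
ax.set_xlabel("Value (hypothetical units)")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```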
4. summary of visualizations
Differences within row
Required with log-scales
Proportions over x
amounts and proportions
...what about pie charts?
- They make it easy to compare a part to the total
- People understand them
--> Useful for reports, not so much in papers
Too many points?
Do you actually need a map?
Color is key here (more on this later). No rainbows! No diverging scales for amounts starting at 0!
Combine with a barplot or bubbles if the sizes are highly unequal
Different ways to visualize networks
Faskowitz et al. (2018). Weighted stochastic block models of the human connectome across the life span. Scientific Reports.
Do you actually need a network?
Face it, you don't need to show your hairball.
Sometimes you can visualize it:
- Small networks that humans can understand (karate club, Medici)
- Convince the reader about a point that you want to make
5. eight ways to make better graphs
- Enlarge the labels, almost always too small
- Use vector graphics (pdf/eps) to avoid blurry and large figures
- Eliminate unnecessary lines (remove frames, use gray grid lines, etc.)
- Create the figure with the right size
- Make sure you align subplots
- Don't use a gray background
1) keep it clean
Grid lines that run perpendicular to the key variable of interest tend to be the most useful.
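A minimal sketch of the "keep it clean" checklist in matplotlib, assuming a simple line plot with made-up data. The figure size and font sizes are placeholder values to adapt to the target journal or slide; only the top and right frame lines are removed here, which is one common reading of "eliminate unnecessary lines".

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data.
x = np.arange(10)
y = x ** 2

# Create the figure at its final size (inches) instead of rescaling later.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, y, color="tab:blue")

# Remove frame lines that carry no information.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Light gray grid lines, perpendicular to the y variable, drawn behind the data.
ax.yaxis.grid(True, color="lightgray", linewidth=0.8)
ax.set_axisbelow(True)

# White background and larger labels.
ax.set_facecolor("white")
ax.tick_params(labelsize=12)
ax.set_xlabel("x (hypothetical)", fontsize=14)
ax.set_ylabel("y (hypothetical)", fontsize=14)
fig.tight_layout()
```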
2) don't use 3D
3) use color when needed
- Don't use defaults, they are ugly.
Three uses of colors: Represent values, categories or highlight
- Represent values: Only if strictly needed
- Represent categories: Gray out as many as you can and highlight the most relevant one
Three types of color palettes:
- Sequential: Use when the min is important
- Diverging: Use when the midpoint (e.g. the mean) is important and the figure will be read in color
- Try to be color-blind friendly. Don't use a green to red palette!
- Don't use rainbows.
- Choosing a palette: https://matplotlib.org/tutorials/colors/colormaps.html?highlight=palette
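A short sketch of the sequential versus diverging choice in matplotlib, using made-up data. It assumes viridis as the sequential map and RdBu_r as a diverging map that avoids a green-to-red contrast; TwoSlopeNorm is used so the neutral color of the diverging map sits exactly on the midpoint (0 here).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

rng = np.random.default_rng(0)
amounts = rng.random((5, 5)) * 100      # hypothetical amounts starting at 0
changes = rng.normal(0, 1, size=(5, 5))  # hypothetical deviations around 0

fig, (ax_seq, ax_div) = plt.subplots(1, 2, figsize=(7, 3))

# Sequential palette: amounts with a meaningful minimum.
im_seq = ax_seq.imshow(amounts, cmap="viridis", vmin=0)
ax_seq.set_title("Sequential (amounts)")

# Diverging palette: symmetric limits, neutral color pinned to the midpoint.
limit = np.abs(changes).max()
norm = TwoSlopeNorm(vmin=-limit, vcenter=0.0, vmax=limit)
im_div = ax_div.imshow(changes, cmap="RdBu_r", norm=norm)
ax_div.set_title("Diverging (around 0)")

fig.colorbar(im_seq, ax=ax_seq)
fig.colorbar(im_div, ax=ax_div)
```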
The real value is not important
Don't use more than 4 lines
Unless you gray most and highlight 1-2
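A minimal sketch of the "gray most and highlight 1-2" idea: eight made-up time series are drawn in light gray and one of them is drawn on top in color. The series names and the highlighted series are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical data: eight random walks.
series = {f"group {i}": rng.normal(size=50).cumsum() for i in range(8)}
highlight = "group 3"  # assumption: the series our message is about

fig, ax = plt.subplots(figsize=(5, 3))

# Context lines: thin, gray, drawn first.
for name, values in series.items():
    if name != highlight:
        ax.plot(values, color="lightgray", linewidth=1, zorder=1)

# Highlighted line: thicker, colored, drawn on top.
ax.plot(series[highlight], color="tab:red", linewidth=2.5, zorder=2, label=highlight)

ax.legend(frameon=False)  # no frame around the legend
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```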
3B) be consistent
Colors consistent across figures
Same type of figures in the paper
4) use an appropriate type
- Use length to reflect the most important variable(s)
- One numerical variable --> barplot or histogram
- Two numerical variables --> scatter plot
- Three numerical variables --> scatter plot + bubbles
- Use color for categorical variables (ideally most of them gray + one highlighted)
- Take the frame out of the legend
- Make the axis less thick
- Add horizontal grid lines
- Use gray and a more muted blue
5) eliminate redundant information
- Bubbles and y axis measuring the same
- Blurry and small labels
- WTF colors and gray background
- It's a paper about IT, the "J" sector should be highlighted
6) show the context
- The axis should reflect the range of the data (doesn't have to cross zero!)
- Don't use a double y-axis, unless you actually want to show there is an association
- Don't break bars. Use dots and log-scale as an alternative.
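A sketch of the dots-plus-log-scale alternative to broken bars, with made-up values that span several orders of magnitude. Dots are used instead of bars because bar length has no meaningful zero on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data spanning several orders of magnitude.
labels = ["A", "B", "C", "D"]
values = [3, 40, 500, 12000]

fig, ax = plt.subplots(figsize=(4, 2.5))

# Dots mark positions only, so no bar has to be cut.
ax.plot(values, labels, "o", color="tab:blue")
ax.set_xscale("log")

# Grid lines perpendicular to the value axis help read the positions.
ax.xaxis.grid(True, color="lightgray", linewidth=0.8)
ax.set_axisbelow(True)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.set_xlabel("Value (log scale, hypothetical units)")
fig.tight_layout()
```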
7) use tidy data from day 1
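As an illustration of what tidy data means in practice, the sketch below reshapes a small made-up "wide" table (one column per year) into tidy form, with one row per observation and one column per variable. The column names are hypothetical; pandas melt is one common way to do the reshaping.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame(
    {
        "country": ["NL", "ES"],
        "2018": [1.0, 2.0],
        "2019": [1.5, 2.5],
    }
)

# Tidy form: each row is one (country, year) observation.
tidy = wide.melt(id_vars="country", var_name="year", value_name="value")
print(tidy)
```

Plotting libraries such as seaborn generally expect this long format, so variables can be mapped directly to x, y and color.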
8) combine several types
- Get ideas from the papers you read
Try to guide the audience
some random thoughts
Horizontal bar plots usually look better
Prototype and ask for feedback
Always: Use tidy data.
Step 1: Determine your point and your audience.
Step 2: Choose an appropriate visualization:
- Map the most important variable(s) to length
- Use color to highlight, refrain from using it to show amounts
- Show the context and guide the audience
Step 3: Prototype different visualizations and ask for feedback.
Step 4: Remove all unnecessary components, make sure fonts are large enough and subplots are aligned.
Step 5: Save the figure as PDF (or EPS).
Step 6: Do minor edits in Inkscape or Illustrator.
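A minimal sketch of Step 5 in matplotlib, assuming a placeholder plot and the hypothetical file name figure.pdf. Setting pdf.fonttype to 42 embeds text as TrueType fonts, which tends to keep labels editable as text in Inkscape or Illustrator for Step 6.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Embed fonts as TrueType so text stays editable in vector editors.
plt.rcParams["pdf.fonttype"] = 42

# Placeholder figure, created at its final size.
fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.plot([0, 1, 2], [0, 1, 4], color="tab:blue")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Vector output: the file extension selects the PDF format.
out_path = Path("figure.pdf")  # hypothetical file name
fig.savefig(out_path, bbox_inches="tight")
```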
Spend time, it is worth it
Leão, J. C., Brandão, M. A., de Melo, P. O. V., & Laender, A. H. (2018). Who is really in my social circle?. Journal of Internet Services and Applications, 9(1), 20.
Data Visualization Tutorial (WWCS2020)
By Javier GB