http://bit.ly/DataViz_WWCS2020
My opinions
Clarify and focus your message
Pirate Peel, Delvenne & Lambiotte (2018). Multiscale mixing patterns in networks. PNAS.
n = 40 m = 160 (40 B-B, 40 O-O, 80 B-O)
E(x) = 54, E(y) = 48, std(x) = 8, std(y) = 27, corr(x, y) = -6%
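Summary statistics like these can hide very different shapes. As a minimal sketch of that point, the snippet below uses two series from Anscombe's quartet (a classic textbook dataset, swapped in here for illustration; it is not the data behind the numbers above) and shows that their means and correlations nearly coincide even though a scatter plot of each looks different.

```python
import numpy as np

# Two of the four Anscombe series; they share the same x values.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def summarize(x, y):
    """Return mean of y, sample std of y, and the Pearson correlation with x."""
    return y.mean(), y.std(ddof=1), np.corrcoef(x, y)[0, 1]

mean1, std1, corr1 = summarize(x, y1)
mean2, std2, corr2 = summarize(x, y2)

print(f"series 1: mean={mean1:.2f} std={std1:.2f} corr={corr1:.2f}")
print(f"series 2: mean={mean2:.2f} std={std2:.2f} corr={corr2:.2f}")
```

Plotting `y1` and `y2` against `x` is what reveals the difference: one is roughly linear with noise, the other follows a curve.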
Explore your data
Message: Females survived more often
Message: Males in first class do fairly well
Example: Titanic dataset
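A rough sketch of the kind of exploration that leads to such messages: group survival by sex, then by sex and class. The rows below are made up for illustration and are not real Titanic records; the column names `sex`, `pclass` and `survived` are assumptions modelled on common versions of the dataset.

```python
import pandas as pd

# Hypothetical toy rows (NOT real Titanic data), just to show the mechanics.
df = pd.DataFrame(
    [
        ("female", 1, 1), ("female", 1, 1), ("female", 3, 1), ("female", 3, 0),
        ("male", 1, 1), ("male", 1, 0), ("male", 3, 0), ("male", 3, 0),
    ],
    columns=["sex", "pclass", "survived"],
)

# Survival rate by sex: supports a message about one variable.
by_sex = df.groupby("sex")["survived"].mean()

# Survival rate by sex and class: supports a more specific message.
by_sex_class = df.groupby(["sex", "pclass"])["survived"].mean().unstack()

print(by_sex)
print(by_sex_class)
```

Each table suggests a different plot, which is why the message should be fixed before the figure is designed.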
Academic
General public
https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368
Use beautiful colors
Use colors that convert to grayscale
Presentation? Careful with the contrast
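One way to check whether a palette survives grayscale conversion is to compare the approximate brightness of its colors. The sketch below uses the Rec. 601 luma weights applied directly to the hex values, which is only an approximation of perceived brightness; the 0.05 threshold is an arbitrary assumption, and the palette is the one encoded in the davidmathlogic.com link in the references.

```python
def hex_to_luma(hex_color):
    """Approximate brightness in [0, 1] using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

palette = ["#D81B60", "#1E88E5", "#FFC107", "#004D40"]

lumas = sorted(hex_to_luma(c) for c in palette)
# Smallest brightness difference between neighbouring colors once sorted.
min_gap = min(b - a for a, b in zip(lumas, lumas[1:]))

for color in palette:
    print(color, round(hex_to_luma(color), 3))

# Assumed rule of thumb: flag palettes whose colors get too close in gray.
if min_gap < 0.05:
    print("Some colors may be hard to tell apart in grayscale.")
```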
datawrapper.de
Python (matplotlib + seaborn)
In any case:
Use vector graphics!!!
Make plots interactive
Vector
PDF/EPS/SVG
Bitmap
Screenshots/PNG/JPEG
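In matplotlib the output format follows the file extension, so saving a vector version is a one-line change. A minimal sketch, with placeholder data and file names that are assumptions for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # placeholder data
ax.set_xlabel("x")
ax.set_ylabel("y")

out_dir = tempfile.mkdtemp()  # hypothetical output folder
pdf_path = os.path.join(out_dir, "figure.pdf")
svg_path = os.path.join(out_dir, "figure.svg")
png_path = os.path.join(out_dir, "figure.png")

# Vector formats: scale without blurring and stay editable in Inkscape/Illustrator.
fig.savefig(pdf_path, bbox_inches="tight")
fig.savefig(svg_path, bbox_inches="tight")

# Bitmap fallback: set a high dpi explicitly if a PNG is unavoidable.
fig.savefig(png_path, dpi=300, bbox_inches="tight")
```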
Step 1: Determine your point and your audience
Step 2: Map the most important variable(s) to length
Step 3: Use color only if necessary
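As a sketch of steps 2 and 3 together: a sorted horizontal bar chart encodes the key variable as bar length and uses a single neutral color. The categories and values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical amounts per category.
values = {"A": 12, "B": 30, "C": 7, "D": 21}

# Sort so that bar length (the key variable) is easy to rank by eye.
items = sorted(values.items(), key=lambda kv: kv[1])
labels = [name for name, _ in items]
lengths = [amount for _, amount in items]

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.barh(labels, lengths, color="gray")  # one color: nothing else to encode
ax.set_xlim(left=0)  # lengths are only comparable if the axis starts at zero
ax.set_xlabel("Amount")
```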
Correlation plots
Differences within row
Required with log-scales
Proportions over x
...what about pie charts?
- Make it easy to compare a part to the total
- People understand them
--> Useful for reports, not so much in papers
https://www.knowablemagazine.org/article/mind/2019/science-data-visualization
Too many points?
Time series?
Do you actually need a map?
Color is key here (more on this later). No rainbows! No diverging scales for amounts starting at 0!
Combine with a barplot or bubbles if the sizes are highly unequal
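The colormap advice can be sketched as a small helper: a sequential colormap for amounts, and a diverging one only when the data has a meaningful midpoint at zero with values on both sides. The helper name `pick_cmap_and_norm` and the choice of `viridis` and `RdBu_r` are assumptions for illustration; the arrays are made-up values standing in for regions on a map.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, TwoSlopeNorm

def pick_cmap_and_norm(values):
    """Diverging colormap only if the data spans both sides of zero."""
    vmin, vmax = float(np.min(values)), float(np.max(values))
    if vmin < 0 < vmax:
        return "RdBu_r", TwoSlopeNorm(vmin=vmin, vcenter=0.0, vmax=vmax)
    return "viridis", Normalize(vmin=vmin, vmax=vmax)

amounts = np.array([[0.0, 5.0], [10.0, 20.0]])   # amounts starting at 0
changes = np.array([[-3.0, 1.0], [2.0, 6.0]])    # changes around 0

fig, axes = plt.subplots(1, 2, figsize=(6, 3))
for ax, data in zip(axes, (amounts, changes)):
    cmap, norm = pick_cmap_and_norm(data)
    image = ax.imshow(data, cmap=cmap, norm=norm)
    fig.colorbar(image, ax=ax)
    ax.set_title(cmap)
```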
Different ways to visualize networks
Faskowitz et al. (2018). Weighted stochastic block models of the human connectome across the life span. Scientific Reports.
Do you actually need a network?
Face it, you don't need to show your hairball.
Sometimes you can visualize it:
- Small networks that humans can understand (karate club, Medici)
- Convince the reader about a point that you want to make
- Art
https://serialmentor.com/dataviz/avoid-line-drawings.html
Grid lines that run perpendicular to the key variable of interest tend to be the most useful.
...represent amounts
The real value is not important
Don't use more than 4 lines
Unless you gray most and highlight 1-2
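A minimal sketch of the gray-and-highlight approach, using random walks as placeholder data; which series gets highlighted and the colors used are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
series = rng.normal(size=(10, 50)).cumsum(axis=1)  # 10 placeholder time series

highlight = {3: "tab:red"}  # hypothetical choice: index of the series to emphasize

fig, ax = plt.subplots(figsize=(5, 3))

# Background: every other series in thin light gray.
for i, y in enumerate(series):
    if i not in highlight:
        ax.plot(y, color="lightgray", linewidth=1, zorder=1)

# Foreground: the highlighted series, thicker and drawn on top.
for i, color in highlight.items():
    ax.plot(series[i], color=color, linewidth=2.5, zorder=2, label=f"series {i}")

ax.legend(frameon=False)
```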
...defaults
Real examples
Colors consistent across figures
Same type of figures in the paper
Me 2013
Me 2016
Me 2020
sns.catplot(x="GDP_PC",data=df,kind="...")
box
violin
boxen
swarm
strip
sns.catplot(x="City",y="GDP_PC",
data=df,kind="strip",color="gray")
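For reference, a self-contained version of the call above might look like the following. The DataFrame contents are made up; only the column names `City` and `GDP_PC` come from the snippet.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import seaborn as sns

# Hypothetical tidy data: one row per observation.
df = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "GDP_PC": [21.0, 24.5, 23.1, 30.2, 28.7, 33.0, 18.4, 19.9, 17.5],
})

# Swap kind for "box", "violin", "boxen" or "swarm" to compare the alternatives.
g = sns.catplot(x="City", y="GDP_PC", data=df, kind="strip", color="gray")
g.savefig("catplot_strip.pdf")  # hypothetical file name
```

Because `catplot` takes the plot type as a parameter, switching between the listed kinds while prototyping is a one-word change.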
Try to guide the audience
Horizontal bar plots usually look better
Prototype and ask for feedback
Always: Use tidy data.
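Tidy data means one row per observation and one column per variable, which is the layout seaborn expects. A minimal sketch of converting a wide table with `pandas.melt`; the table and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame({
    "City": ["A", "B"],
    "2019": [1.0, 2.0],
    "2020": [1.5, 2.5],
})

# Tidy version: one row per (City, Year) observation.
tidy = wide.melt(id_vars="City", var_name="Year", value_name="GDP_PC")
print(tidy)
```

With the tidy layout, `Year` can be mapped to an axis, a color, or a facet without reshaping the data again.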
Step 1: Determine your point and your audience.
Step 2: Choose an appropriate visualization:
Step 3: Prototype different visualizations and ask for feedback.
Step 4: Remove all unnecessary components, make sure fonts are large enough and subplots are aligned.
Step 5: Save the figure as PDF (or EPS).
Step 6: Do minor edits in Inkscape or Illustrator.
https://serialmentor.com/dataviz/
https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368
https://www.pnas.org/content/116/6/1857
https://larremorelab.github.io/assets/pdf/workshop_DataVisualization.pdf
https://slides.com/maartenzam/databeers#/
https://www.knowablemagazine.org/article/mind/2019/science-data-visualization
https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40
Leão, J. C., Brandão, M. A., de Melo, P. O. V., & Laender, A. H. (2018). Who is really in my social circle?. Journal of Internet Services and Applications, 9(1), 20.