Take a sad chart and make it better

with ggplot2 and other tools

Rules don't always apply.

Never use 3D.

@tylermorganwall

Use < 7 colors.

and more...

@trang1618

I'm Trang Le.

I have made some (very) bad charts.

https://trang.page

What's your favorite R package?

Let's learn a bit about you!

reduce cognitive burden

  • direct labeling
  • reduce number of labels
  • highlighting
  • consistent color scheme
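A minimal sketch of direct labeling and highlighting, assuming a small made-up data set (`df`, `line_ends`, and `group_colors` are hypothetical names, not from the talk). Each line is labeled at its last point, the legend is dropped, and one group is highlighted while the others share a grey.

```r
library(ggplot2)

# Hypothetical toy data: three groups measured over five years.
df <- data.frame(
  year  = rep(2016:2020, times = 3),
  group = rep(c("A", "B", "C"), each = 5),
  value = c(1, 2, 3, 4, 5,  2, 2, 3, 3, 4,  5, 4, 3, 2, 1)
)

# Direct labeling: one label per line, placed at the last time point.
line_ends <- df[df$year == max(df$year), ]

# Highlighting with a consistent scheme: group A in color, the rest in grey.
group_colors <- c(A = "#D55E00", B = "grey60", C = "grey60")

p_direct <- ggplot(df, aes(year, value, color = group)) +
  geom_line() +
  geom_text(data = line_ends, aes(label = group), hjust = -0.5) +
  scale_color_manual(values = group_colors) +
  # Extra room on the right so the labels are not clipped.
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  theme_minimal() +
  theme(legend.position = "none")
```

Printing `p_direct` draws the chart. Packages such as ggrepel can place labels automatically, but this sketch sticks to ggplot2 alone.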

Risk of hypertension

What is the ggplot term for

facet_grid(
  cols = vars(...),
  scales = 'free', 
  space = 'free')

"strips"

facets are extremely useful to improve simplicity!
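A small sketch of what free scales and free space do in `facet_grid()`, assuming made-up data (`df` and its columns are hypothetical). With column facets, `"free_x"` lets each panel show only its own categories, and `space = "free_x"` makes panel widths proportional to the number of categories so the bars stay the same width.

```r
library(ggplot2)

# Hypothetical data: categories nested in groups of unequal size.
df <- data.frame(
  group    = c("fruit", "fruit", "fruit", "grain"),
  category = c("apple", "banana", "cherry", "rice"),
  count    = c(10, 7, 3, 12)
)

p_facet <- ggplot(df, aes(x = category, y = count)) +
  geom_col() +
  facet_grid(
    cols   = vars(group),
    scales = "free_x",  # each panel keeps only its own x categories
    space  = "free_x"   # panel width follows the number of categories
  )
```

Note that `space` is an argument of `facet_grid()`; `facet_wrap()` does not have it.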

identify key idea

select chart type

some examples

I would use histograms or stacked bar charts for each comparison.
Trade-off: node sizes not shown.

[Figure labels: Day 3 vs. Day 0, Day 7 vs. Day 0; Metabolome, Proteome, Transcriptome, Novel nodes]

[Figure labels: performance; Model B – Model A, Model B, Model A]

how much information
should I show?

Histograms heavily depend on bin width and location

but sometimes binning is helpful... 

e.g. to solve the overcrowding problem

ggcyto::ggcyto() + geom_hex(bins = 64)
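To see how much a histogram depends on bin width and location, here is a sketch on a simulated bimodal sample (the data and object names are assumptions for illustration). The same values are binned three ways: wide bins, narrow bins, and wide bins with a shifted boundary.

```r
library(ggplot2)

set.seed(1)
# Hypothetical bimodal sample: two normal clusters of 200 points each.
x <- data.frame(value = c(rnorm(200, mean = 0), rnorm(200, mean = 3)))

# Wide bins can hide the two modes.
p_wide <- ggplot(x, aes(value)) + geom_histogram(binwidth = 2)

# Narrow bins reveal the modes but add noise.
p_narrow <- ggplot(x, aes(value)) + geom_histogram(binwidth = 0.25)

# Same width as p_wide, but bin edges shifted by moving the boundary.
p_shift <- ggplot(x, aes(value)) + geom_histogram(binwidth = 2, boundary = 1)
```

Comparing `p_wide` and `p_shift` side by side shows that bin location alone can change the apparent shape.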

paper ≠ presentation

  • amount of information
  • annotation
  • highlights
  • abbreviations
  • builds
  • text sizes
  • ...

 

White scientists are overrepresented.

Asian scientists are underrepresented.

annotation

useful ggplot snippets

How do you remove the legend title?
What about the y-axis title?

labs(y = NULL, fill = NULL)

NULL, NA and element_blank()

theme(axis.title.y = element_blank(),
      legend.title = element_blank())

theme_set(
  theme_bw() +
  theme(legend.title = element_blank())
)

scale_y_continuous(
  breaks = seq(0, 1, 0.2),
  labels = scales::percent_format(accuracy = 0.1),
  expand = expansion(mult = c(0, .1)))

+ coord_fixed(ratio = 1)

ggplot(data = my_world) +
    geom_sf(aes(fill = fct_relevel(region, region_levels))) +
    coord_sf(crs = "+proj=eqearth +wktext")
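The snippets above can be combined into one small plot. This is a sketch with made-up data (`df`, `arm`, and `rate` are hypothetical): a percent-formatted y-axis that starts exactly at zero, with axis and legend titles removed through `labs()`.

```r
library(ggplot2)

# Hypothetical proportions for two groups.
df <- data.frame(
  arm  = c("control", "treated"),
  rate = c(0.42, 0.63)
)

p_pct <- ggplot(df, aes(arm, rate, fill = arm)) +
  geom_col() +
  scale_y_continuous(
    limits = c(0, 1),
    breaks = seq(0, 1, 0.2),
    labels = scales::percent_format(accuracy = 1),
    # No padding below zero, 10% padding above the upper limit.
    expand = expansion(mult = c(0, 0.1))
  ) +
  # NULL removes a title entirely instead of leaving an empty string.
  labs(x = NULL, y = NULL, fill = NULL) +
  theme_bw()
```

With `accuracy = 1` the labels read "20%"; with `accuracy = 0.1`, as in the snippet above, they read "20.0%".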

potential pitfalls

# Setting the limits with the scale discards 
# all data outside the range.
lims(x = c(20, NA), y = c(NA, 40))


# As of June 2019, this is allowed (and recommended)
coord_cartesian(xlim = c(20, NA), ylim = c(NA, 40))
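A sketch of why this matters, using the built-in `mtcars` data (the object names are made up here). Limits set on the scale drop observations before the statistics are computed, so the box plot medians change. Limits set on the coordinate system only zoom, so the medians stay the same.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()

# Scale limits: values outside 15-25 are removed before the stat runs
# (ggplot2 warns about the removed rows).
p_scale <- base + lims(y = c(15, 25))

# Coordinate limits: all data are kept, only the view is cropped.
p_coord <- base + coord_cartesian(ylim = c(15, 25))

# Medians as computed by the box plot stat in each version.
med_all   <- layer_data(base)$middle
med_scale <- layer_data(p_scale)$middle
med_coord <- layer_data(p_coord)$middle
```

Here `med_coord` matches `med_all`, while `med_scale` does not, because it was computed on the truncated data.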

know these differences

geom_col()
geom_bar()
geom_histogram()

geom_point()
geom_count()
geom_jitter()

geom_jitter() jitters points in both the x and y directions by default
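A short sketch of these differences on the built-in `mpg` data (the object names are assumptions for illustration). `geom_bar()` counts rows for you, `geom_col()` plots values you have already computed, `geom_count()` sizes overlapping points by frequency, and `geom_jitter()` can be restricted to one direction.

```r
library(ggplot2)

# geom_bar() uses stat = "count": bar height is the number of rows per class.
p_bar <- ggplot(mpg, aes(class)) + geom_bar()

# geom_col() uses the y values as given, so the counts are computed first.
class_counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(class_counts, aes(class, Freq)) + geom_col()

# geom_count() maps the number of overlapping observations to point size.
p_count <- ggplot(mpg, aes(cty, hwy)) + geom_count()

# Jitter horizontally only, so the hwy values are not distorted.
p_jitter <- ggplot(mpg, aes(class, hwy)) +
  geom_jitter(width = 0.2, height = 0)
```

`p_bar` and `p_col` draw the same bars. Setting `height = 0` matters whenever the y value is the quantity readers will read off the chart.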

Very quick on
colors and themes

  • color: sequential, qualitative, diverging
    • colorblind friendly palettes
  • themes
+ colorblindr::scale_fill_OkabeIto()
+ rcartocolor::scale_color_carto_d()
+ scale_color_viridis_c()
# fill vs. color: match the aesthetic you mapped
# _d = discrete, _c = continuous
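A sketch of colorblind-friendly scales using only ggplot2 and built-in data. The hex codes are the Okabe-Ito colors as I understand them, typed by hand here as an assumption rather than taken from colorblindr. The scale name has to match both the aesthetic (`fill` or `color`) and the data type (discrete or continuous).

```r
library(ggplot2)

# Okabe-Ito colors written out as hex codes (assumed values, see lead-in).
okabe_ito <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
               "#0072B2", "#D55E00", "#CC79A7", "#999999")

# Discrete variable mapped to fill: use a fill scale for discrete data.
p_discrete <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_manual(values = okabe_ito)

# Continuous variable mapped to fill: use the continuous viridis scale.
p_continuous <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_fill_viridis_c()
```

Using `scale_color_*` on a plot that maps `fill` has no effect on the fill colors, which is an easy mistake to miss.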

Resources

windRose(openair_data)

Great data stories

library(treeheatr)
heat_tree(penguins, target_lab = 'species')

Take a sad chart and make it better

By Trang Le

Miscellaneous collection of visualizations for academic papers.