Ryan Hafen
http://bit.ly/cidacs-vis
Context: Exploratory analysis and statistical modeling
Image source: Hadley Wickham
A series of similar plots, usually each based on a different slice of data, arranged in a grid
"For a wide range of problems in data presentation, small multiples are the best design solution."
Edward Tufte (Envisioning Information)
This idea was formalized and popularized in S/S-PLUS and subsequently R with the trellis and lattice packages
In my experience, small multiples are often much more effective (and easier to create) than more flashy things like animation, linked brushing, custom interactive vis, etc.
geofacet
geographically-oriented small multiple displays
trelliscope
scalable detailed visualization using small multiple displays
Brazil Live Births “Sistema de Informações sobre Nascidos Vivos” (SINASC) data
44.5 million birth records from 2001 to 2015
Brazil Census Data
Municipality-level data about demographics, education, income
Obtained data for 1991, 2000, and 2010 census
An R package that provides a way to flexibly visualize data for different geographical regions by providing a ggplot2 faceting function facet_geo()
ggplot(birth_year_state2, aes(birth_year, pct_chg)) +
geom_rect(data = birth_col, aes(fill = val),
xmin = -Inf, ymin = -Inf, xmax = Inf, ymax = Inf, alpha = 0.5) +
geom_point() +
geom_abline(slope = 0, intercept = 0, alpha = 0.25) +
scale_fill_gradient2("% Change\n2001 - 2015") +
scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
facet_wrap(~ m_state_code) +
theme(strip.text.x = element_text(margin = margin(0.1, 0, 0.1, 0, "cm"), size = 7)) +
labs(x = "Birth Year", y = "Percentage Change in Number of Births")
Swap "facet_wrap()" with "facet_geo()"
Example: municipality grid for Acre
library(geofacet)
ac <- geogrid::read_polygons("http://bit.ly/br-ac-geojson") ac_grid <- grid_auto(ac, seed = 123) grid_preview(ac_grid, label = "name2", label_raw = "name_NOME") grid_design(ac_grid, label = "name_NOME")
library(geofacet)
ac <- geogrid::read_polygons("http://bit.ly/br-ac-geojson") ac_grid <- grid_auto(ac, seed = 123) grid_preview(ac_grid, label = "name2", label_raw = "name_NOME") grid_design(ac_grid, label = "name_NOME")
Incorporating Geofaceting into Applications
Trelliscope is a general solution that allows small multiple displays to come alive by providing the ability to interactively sort and filter the panels based on summary statistics, cognostics, that capture attributes of interest in the data being plotted
Suppose we want to understand mortality over time for each country
Observations: 1,704 Variables: 6 $ country <fctr> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afgh... $ continent <fctr> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, As... $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 199... $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 4... $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372,... $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.113...
glimpse(gapminder)
qplot(year, lifeExp, data = gapminder, color = country, geom = "line")
Yikes! There are a lot of countries...
qplot(year, lifeExp, data = gapminder, color = continent, group = country, geom = "line")
Still too much going on...
qplot(year, lifeExp, data = gapminder, color = continent,
group = country, geom = "line") +
facet_wrap(~ continent, nrow = 1)
That helped a little...
p <- qplot(year, lifeExp, data = gapminder, color = continent, group = country, geom = "line") + facet_wrap(~ continent, nrow = 1) plotly::ggplotly(p)
This helps but there is still too much overplotting...
(and hovering for additional info is too much work and we can only see more info one at a time)
qplot(year, lifeExp, data = gapminder) + theme_bw() + facet_wrap(~ country + continent)
qplot(year, lifeExp, data = gapminder) + theme_bw() +
facet_trelliscope(~ country + continent, nrow = 2, ncol = 7, width = 300)
Note: this and future plots in this presentation are interactive - feel free to explore!
qplot(year, lifeExp, data = gapminder) + theme_bw() +
facet_trelliscope(~ country + continent,
nrow = 2, ncol = 7, width = 300, as_plotly = TRUE)
country_model <- function(df) lm(lifeExp ~ year, data = df) by_country <- gapminder %>% group_by(country, continent) %>% nest() %>% mutate( model = map(data, country_model), resid_mad = map_dbl(model, function(x) mad(resid(x)))) by_country
Example adapted from "R for Data Science"
# A tibble: 142 × 5 country continent data model resid_mad <fctr> <fctr> <list> <list> <dbl> 1 Afghanistan Asia <tibble [12 × 4]> <S3: lm> 1.4058780 2 Albania Europe <tibble [12 × 4]> <S3: lm> 2.2193278 3 Algeria Africa <tibble [12 × 4]> <S3: lm> 0.7925897 4 Angola Africa <tibble [12 × 4]> <S3: lm> 1.4903085 5 Argentina Americas <tibble [12 × 4]> <S3: lm> 0.2376178 6 Australia Oceania <tibble [12 × 4]> <S3: lm> 0.7934372 7 Austria Europe <tibble [12 × 4]> <S3: lm> 0.3928605 8 Bahrain Asia <tibble [12 × 4]> <S3: lm> 1.8201766 9 Bangladesh Asia <tibble [12 × 4]> <S3: lm> 1.1947475 10 Belgium Europe <tibble [12 × 4]> <S3: lm> 0.2353342 # ... with 132 more rows
country_plot <- function(data, model) { figure(xlim = c(1948, 2011), ylim = c(10, 95), tools = NULL) %>% ly_points(year, lifeExp, data = data, hover = data) %>% ly_abline(model) } country_plot(by_country$data[[1]], by_country$model[[1]])
We'll use the rbokeh package to make a plot function and apply it to the first row of our data
by_country <- by_country %>% mutate(plot = map2_plot(data, model, country_plot)) by_country
Example adapted from "R for Data Science"
# A tibble: 142 × 6 country continent data model resid_mad plot <fctr> <fctr> <list> <list> <dbl> <list> 1 Afghanistan Asia <tibble [12 × 4]> <S3: lm> 1.4058780 <S3: rbokeh> 2 Albania Europe <tibble [12 × 4]> <S3: lm> 2.2193278 <S3: rbokeh> 3 Algeria Africa <tibble [12 × 4]> <S3: lm> 0.7925897 <S3: rbokeh> 4 Angola Africa <tibble [12 × 4]> <S3: lm> 1.4903085 <S3: rbokeh> 5 Argentina Americas <tibble [12 × 4]> <S3: lm> 0.2376178 <S3: rbokeh> 6 Australia Oceania <tibble [12 × 4]> <S3: lm> 0.7934372 <S3: rbokeh> 7 Austria Europe <tibble [12 × 4]> <S3: lm> 0.3928605 <S3: rbokeh> 8 Bahrain Asia <tibble [12 × 4]> <S3: lm> 1.8201766 <S3: rbokeh> 9 Bangladesh Asia <tibble [12 × 4]> <S3: lm> 1.1947475 <S3: rbokeh> 10 Belgium Europe <tibble [12 × 4]> <S3: lm> 0.2353342 <S3: rbokeh> # ... with 132 more rows
A plot for each model
by_country %>%
trelliscope(name = "by_country_lm", nrow = 2, ncol = 4)
Turning a ggplot2 faceted display into a Trelliscope display is as easy as changing:
to:
facet_wrap()
or:
facet_grid()
facet_trelliscope()
With plots as columns, TrelliscopeJS provides nearly effortless detailed, flexible, interactive visualization in the Tidyverse
(offline demo)
read_csv("http://bit.ly/trs-mri") %>% mutate(img = img_panel(img)) %>% trelliscope("brain_MRI", nrow = 2, ncol = 5)
ggplot(by_muni_lbwt_time, aes(birth_year, pct_low_bwt)) + geom_point(size = 3, alpha = 0.6) + geom_line(stat = "smooth", method = rlm, color = "blue", size = 1, alpha = 0.5) + labs(y = "Percent Low Birth Weight", x = "Year") + facet_trelliscope(~ state_name + muni_name, nrow = 2, ncol = 4, width = 400, height = 400, name = "pct_low_bwt_muni", desc = "percent low birth weight yearly by municipality")
Live birth and census data were merged to investigate relationship between birth weight and income
There is an association indicating that for pre-term births, on average, higher-income municipalities tend to have lower birth-weight babies.
Changing the y-axis by converting the mean birth weight to an approximate z-score based on the expected birth at each gestational age, it is apparent that lower-income municipalities have heavier-than-expected pre-term babies