https://slides.com/joelross/info201w17-ggplot2/live
By the end of class, you should be able to
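Grammar of Data Manipulation: words (verbs) used to describe ways to manipulate data.
Grammar of Graphics: words used to describe the visual components and aspects of a graphic.
Organize plots into layers, where each layer has data, aesthetic mappings, a geometric object (geom), a statistical transformation (stat), and a position adjustment. The plot as a whole also has scales, a coordinate system, and facets.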
ggplot2 is an R package (library) that implements this Grammar of Graphics.
It provides declarative functions for specifying plots in terms of the grammar.
install.packages("ggplot2") # once per machine
library("ggplot2") # load the package
Use the ggplot() function to draw a plot, specifying plot elements via the grammar.
# plot the `mpg` data set, with engine displacement
# (in liters) on the x axis and highway mileage
# on the y axis:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
Here data = mpg is the data to plot; + adds geometry, i.e., geometric objects (points); and aes() gives the aesthetic mappings, each of the form property = column.
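A ggplot2 plot is an ordinary R value that accumulates layers with +; it is only drawn when printed. The sketch below (assuming ggplot2 3.3 or later; the variable names p and pts are illustrative) builds the same plot in two steps and uses layer_data(), which returns the data ggplot2 computed for a layer, to confirm that each row of mpg became one point.

```r
library("ggplot2")

# Start with just the data; no layers have been added yet
p <- ggplot(data = mpg)

# Add a point layer; this returns a new plot object rather than drawing anything
p <- p + geom_point(mapping = aes(x = displ, y = hwy))

# Inspect the data computed for layer 1: one row per car, x = displ, y = hwy
pts <- layer_data(p, 1)
head(pts[, c("x", "y")])

# Typing p (or print(p)) at the console is what actually draws the plot
```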
The aes() function specifies aesthetic mappings from data values to visual channels.
# color the data by car type
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
Here the x-location is based on the displ column (continuous), and the color is based on the class column (discrete).
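To see what the color mapping actually produced, a minimal sketch (assuming ggplot2 3.3 or later; names are illustrative) can look at the colour column of the layer data: ggplot2 has replaced each of the 7 vehicle classes with its own color, and the group column records which class each point belongs to.

```r
library("ggplot2")

# Map the discrete class column to the color channel
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Each point now carries a color value chosen by the default color scale
pts <- layer_data(p)

# One (group, colour) pair per vehicle class
unique(pts[, c("group", "colour")])
```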
You can also set a visual channel to a fixed value, without mapping it to data, by passing it to the geom outside of aes():
# blue points!
ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")
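A common mistake is putting the constant inside aes(). A minimal sketch (assuming ggplot2 3.3 or later; p_set and p_map are illustrative names) contrasts the two: outside aes() the value is set directly, while inside aes() "blue" is treated as a one-value data column, so the points get a default scale color (not blue) and a legend labeled "blue".

```r
library("ggplot2")

# Setting: color given outside aes() is applied as-is to every point
p_set <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# Mapping: inside aes(), the string "blue" becomes a category to be scaled,
# so the default discrete color scale picks the actual color
p_map <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

unique(layer_data(p_set)$colour)  # the literal color "blue"
unique(layer_data(p_map)$colour)  # a default hue, not "blue"
```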
ggplot2 supports many different geoms, each created with a function. Each geom requires/supports different aesthetics.
# line chart of mileage by engine power
ggplot(data = mpg) +
  geom_line(mapping = aes(x = displ, y = hwy))
# bar chart of car type
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class))
Note there is no y mapping: the y values (bar heights) are automatically aggregated by counting the rows in each class.
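The automatic aggregation can be checked against a plain count. This sketch (assuming ggplot2 3.3 or later; names are illustrative) reads the count column that geom_bar() computed and compares it with base R's table() of the same column.

```r
library("ggplot2")

p <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class))

# The computed bar data: one row per class, with the number of cars in it
bars <- layer_data(p)
bars[, c("x", "count")]

# The same counts computed directly with base R
table(mpg$class)
```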
Each plot can include multiple geoms. Geoms inherit the data and aesthetic mappings given to ggplot() unless specified otherwise, so the two plots below are equivalent.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy), se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)
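A layer can also add its own mappings on top of the inherited ones. In this sketch (assuming ggplot2 3.3 or later; names are illustrative) only the point layer maps color to class, so the points get 7 colors while the smoothing layer still fits a single line through all the cars.

```r
library("ggplot2")

# x and y are shared by both layers; color applies to the points only
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(se = FALSE)

# Layer 1 (points): one color per class
length(unique(layer_data(p, 1)$colour))

# Layer 2 (smooth): a single group, so a single fitted line
length(unique(layer_data(p, 2)$group))
```

If color = class were moved into the ggplot() call instead, geom_smooth() would inherit it and fit one line per class.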
Words used to describe the visual components and aspects of a graphic.
Many geoms have a default statistical transformation (stat) used to calculate new data to plot; for example, a bar graph counts the rows in each category.
# bar chart of car type
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class), stat = "count")
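Here stat = "count" is given explicitly; it computes the y values (the bar heights) and is already the default for geom_bar().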
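When the data already contain the bar heights, the counting step should be turned off. A minimal sketch (assuming ggplot2 3.3 or later; the counts data frame is built here for illustration) pre-counts the classes and then uses stat = "identity", which plots the y values as given; geom_col() is ggplot2's shorthand for the same thing.

```r
library("ggplot2")

# Pre-count the rows per class ourselves (columns: class, Freq)
counts <- as.data.frame(table(class = mpg$class))

# stat = "identity": use the Freq values directly as bar heights
p <- ggplot(data = counts) +
  geom_bar(mapping = aes(x = class, y = Freq), stat = "identity")

# geom_col() is equivalent to geom_bar(stat = "identity")
p_col <- ggplot(data = counts) +
  geom_col(mapping = aes(x = class, y = Freq))
```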
Each geom has a default stat, and each stat_ function has a default geom, so the two can often be used interchangeably.
# these two charts are identical
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class))
ggplot(data = mpg) +
  stat_count(mapping = aes(x = class))
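Because the stat and the geom are separate pieces, a stat can also be drawn with a different geom. This sketch (assuming ggplot2 3.3 or later; names are illustrative) keeps stat_count() but swaps its default bar geom for points, giving one dot per class at the same height as the bars.

```r
library("ggplot2")

# Default pairing: stat_count() drawn as bars
p_bar <- ggplot(data = mpg) +
  stat_count(mapping = aes(x = class))

# Same statistical transformation, drawn with a different geometry
p_pts <- ggplot(data = mpg) +
  stat_count(mapping = aes(x = class), geom = "point")

# Both layers computed the same counts
layer_data(p_bar)$count
layer_data(p_pts)$count
```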
Many geoms also have a default position adjustment, used to lay out overlapping objects independently of the aesthetic mappings (e.g., geom_bar() stacks bars by default).
# stacked bar chart of mileage, colored by car type
# (geom_bar() uses position = "stack" by default)
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = hwy, fill = class))
# proportion of each car type at each mileage:
# position = "fill" stretches every stack to the same height (1)
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = hwy, fill = class), position = "fill")
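The position adjustment changes only the layout, not the counts. This sketch (assuming ggplot2 3.3 or later; names are illustrative) builds the same layer with the "stack", "fill", and "dodge" adjustments, then checks that stacked bars reach the total count at each mileage while filled bars all top out at 1.

```r
library("ggplot2")

# Same layer, three position adjustments
p_stack <- ggplot(mpg) + geom_bar(aes(x = hwy, fill = class))                     # default: stacked
p_fill  <- ggplot(mpg) + geom_bar(aes(x = hwy, fill = class), position = "fill")  # stacks scaled to height 1
p_dodge <- ggplot(mpg) + geom_bar(aes(x = hwy, fill = class), position = "dodge") # bars side by side

# Height of the top of each stack, per hwy value
d_stack <- layer_data(p_stack)
d_fill  <- layer_data(p_fill)
stack_tops <- tapply(d_stack$ymax, d_stack$x, max)
fill_tops  <- tapply(d_fill$ymax, d_fill$x, max)
```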
Add scales to a plot to control how data values are mapped to aesthetic values (replacing the default scales).
# city/highway mileage relationship
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy, color = class)) +
  scale_x_reverse() +                # reverse x axis
  scale_color_hue(l = 70, c = 30)    # custom color scale
Scale functions are named scale_<aesthetic>_<scale>: the aesthetic to scale, followed by the scale to use (e.g., scale_color_hue).
[Figure: a scale maps data values (1, 2, 3, 4, 5) to aesthetic values ("red", "yellow", "blue", "green", "purple").]
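A scale is essentially a lookup from data values to aesthetic values, and scale_color_manual() lets you write that lookup out yourself. In this sketch (assuming ggplot2 3.3 or later) the specific colors assigned to each cylinder count are hypothetical choices for illustration.

```r
library("ggplot2")

# Hypothetical color choices: one color per distinct cylinder count in mpg
cyl_colors <- c("4" = "red", "5" = "yellow", "6" = "blue", "8" = "green")

# factor(cyl) makes cylinders discrete, so each value can get its own color
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = factor(cyl))) +
  scale_color_manual(values = cyl_colors)

# The colors actually used by the points
unique(layer_data(p)$colour)
```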
Use palettes from colorbrewer.org (via scale_color_brewer()) to specify color schemes; many of these palettes are colorblind-safe.
# efficiency by engine size, colored nicely
ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, color = class), size = 4) +
  scale_color_brewer(palette = "Set3")
You can also add a specific coordinate system to a plot.
# horizontal bar chart of mileage, colored by car type
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = hwy, fill = class)) +
  coord_flip()
# A pie chart = stacked bar chart + polar coordinates
ggplot(mpg, aes(x = factor(1), fill = factor(cyl))) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
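Here factor(1) puts all the data into a single bar, factor(cyl) makes the numeric cyl vector into a factor (discrete categories), and theta = "y" bases the angle on the (aggregated) y value.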
Break a plot into parts with facets (similar to group_by() in dplyr). Each facet acts like a "level" in a factor, with a plot for each level.
# a plot with facets based on vehicle type.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class)
~class is a formula; read the ~ as "as a function of".
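Faceting can also use two variables. This sketch (assuming ggplot2 3.3 or later; names are illustrative) compares facet_wrap(~class), which makes one panel per class, with facet_grid(drv ~ cyl), which lays out rows by drive train and columns by number of cylinders. The PANEL column of the layer data records which panel each point landed in.

```r
library("ggplot2")

# One panel per vehicle class, wrapped into a grid
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class)

# Rows "as a function of" drv, columns "as a function of" cyl
p_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

# Number of points drawn in each panel of the wrapped plot
table(layer_data(p_wrap)$PANEL)
```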
[Figure: the same 24 scores in wide format (6 rows x 4 cols) and in long format (24 rows x 1 col).]
We can convert between wide and long data formats using the tidyr package.
# Alternatively, install "tidyverse"
install.packages("tidyr") # once per machine
library("tidyr")
# Make a data.frame (example)
students <- data.frame(
  name = c('Mason', 'Tabi', 'Bryce', 'Ada', 'Bob', 'Filipe'),
  section = c('a', 'a', 'a', 'b', 'b', 'b'),
  math_exam1 = c(91, 82, 93, 100, 78, 91),
  math_exam2 = c(88, 79, 77, 99, 88, 93),
  spanish_exam1 = c(79, 88, 92, 83, 87, 77),
  spanish_exam2 = c(99, 92, 92, 82, 85, 95)
)
# gather the four exam columns into (exam, score) pairs
students.long <- gather(students,
  key = exam,
  value = score,
  math_exam1, math_exam2,
  spanish_exam1, spanish_exam2
)
Convert from wide to long using gather(). The key is a new column containing the gathered column names, and value is a new column containing their values.
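gather() also accepts a column range instead of listing every column. Since tidyr 1.0.0, gather() is superseded by pivot_longer(), where key and value become names_to and values_to (given as strings). The sketch below (assuming tidyr 1.0.0 or later; long_gather and long_pivot are illustrative names) reshapes the same students data both ways.

```r
library("tidyr")

students <- data.frame(
  name = c('Mason', 'Tabi', 'Bryce', 'Ada', 'Bob', 'Filipe'),
  section = c('a', 'a', 'a', 'b', 'b', 'b'),
  math_exam1 = c(91, 82, 93, 100, 78, 91),
  math_exam2 = c(88, 79, 77, 99, 88, 93),
  spanish_exam1 = c(79, 88, 92, 83, 87, 77),
  spanish_exam2 = c(99, 92, 92, 82, 85, 95),
  stringsAsFactors = FALSE
)

# gather() with a column range: math_exam1 through spanish_exam2
long_gather <- gather(students, key = exam, value = score,
                      math_exam1:spanish_exam2)

# The newer equivalent
long_pivot <- pivot_longer(students, cols = math_exam1:spanish_exam2,
                           names_to = "exam", values_to = "score")

dim(long_gather)  # 6 students x 4 exams = 24 rows; columns name, section, exam, score
```

The two results hold the same rows, though in a different order: gather() stacks one exam column after another, while pivot_longer() keeps each student's exams together.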
# spread by column "exam"
stu.wide <- spread(students.long, key = exam, value = score)
# spread by column "name"
stu.wide.name <-
  spread(students.long, key = name, value = score)
Convert from long to wide using spread(). The key is the column to take the names for the new columns from, and value is the column with the data to populate them with.
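Spreading by "exam" should undo the earlier gather. This sketch (assuming tidyr 1.0.0 or later for pivot_wider(), the newer equivalent of spread(); names are illustrative) makes the round trip and checks the scores against the original data. Rows are matched by name because the reshaped result may not keep the original row order.

```r
library("tidyr")

students <- data.frame(
  name = c('Mason', 'Tabi', 'Bryce', 'Ada', 'Bob', 'Filipe'),
  section = c('a', 'a', 'a', 'b', 'b', 'b'),
  math_exam1 = c(91, 82, 93, 100, 78, 91),
  math_exam2 = c(88, 79, 77, 99, 88, 93),
  spanish_exam1 = c(79, 88, 92, 83, 87, 77),
  spanish_exam2 = c(99, 92, 92, 82, 85, 95),
  stringsAsFactors = FALSE
)
students.long <- gather(students, key = exam, value = score,
                        math_exam1:spanish_exam2)

# Long back to wide: exam names become columns again
back_wide <- spread(students.long, key = exam, value = score)

# pivot_wider(): key/value become names_from/values_from
back_pivot <- pivot_wider(students.long, names_from = exam, values_from = score)

# Line up rows by student name before comparing with the original
idx <- match(students$name, back_wide$name)
back_wide[idx, ]
```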
Be comfortable with module 13
Assignment 5 due Thursday before class
(Assignment 6 online soon)
Thursday: What makes a good visualization?
Also maps.