Introducing the tidyverse

d.navarro@unsw.edu.au

Danielle Navarro

https://slides.com/djnavarro/introducing-the-tidyverse

Where are we up to?  (https://psyr.org)
 

What skills have you learned so far?

What skills do we need to learn next?

# install.packages("usethis")
library(usethis)
use_course("https://psyr.org/part2_tidyverse.zip")

Do it yourself: install today's project folder. Check your learning: what does each part of this code do?

Why be "Little Miss Tidyverse"?

What is tidyverse?

tidyverse.org

https://gist.github.com/emitanaka/ef1d1b6ade5e57acc4734bc4bfbcc0cd

https://github.com/djnavarro/skyliner/blame/master/skyliner.R

1200 raw data files

merge them all in a few seconds with one small script

Okay, let's do this!

www.bigblocksingsong.com

Open the project!

Starting the tutorial

Get tidyverse

Load & inspect data

Simple visualisation

The pipe

Group & summarise

Arrange, filter, select

Mutate data

More visualisation

Gather & spread

R Markdown

Other topics?

 

Revision: Core concepts

script1.R


# A quick refresher to check your knowledge
# from part 1. Let's start out by creating
# some variables:

# Create a vector with two numbers
YOUR-CODE-HERE

# Create a vector with two pieces of text
YOUR-CODE-HERE

Revision: Packages


install.packages("tidyverse")

script2.R

library(tidyverse)


# script2.R installs & loads tidyverse.
# Just to check... it draws a picture!


script2.R

# If you see this, it is working!


Revision exercise: Packages

  • Test your knowledge: there is a package called here that we're going to use
  • Install the here package
  • Load the here package 

New exercise: Draw this plot!

script3.R


# The here package is built around one
# main function, here(). It is very
# useful for building the path to a file

library(here) # load package
here()   # where is the .Rproj file?

The here package

[1] "C:/Users/Dan/Desktop/part2_tidyverse"


# Try this code yourself:

here("datasets", "afl.csv") 
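
A minimal sketch of what here() is doing, assuming the here package is installed and that you run it from inside the project folder. The names afl_path and manual_path are made up for illustration.

```r
library(here)

# Build a path that starts at the project root and ends at the data file.
# "datasets" and "afl.csv" are the folder and file used in this tutorial.
afl_path <- here("datasets", "afl.csv")

# Roughly the same idea by hand: glue the project root and the parts together
manual_path <- file.path(here(), "datasets", "afl.csv")

print(afl_path)
print(manual_path)
```

Because the path is built from the project root, the same script should keep working when the project folder is moved to another computer.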

The here package


my_data %>%
   do_this_thing() %>%
   now_do_that_thing() %>%
   and_one_more()

Key concept: %>%

This is referred to as "piping" the data through a series of steps
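
To make the idea concrete, here is a small sketch comparing nested function calls with the piped version. The vector scores is invented for illustration; the pipe comes along when dplyr (or the whole tidyverse) is loaded.

```r
library(dplyr)  # loading dplyr also makes %>% available

# Hypothetical data: four numbers
scores <- c(4, 9, 16, 25)

# Nested version: you have to read it from the inside out
nested <- round(mean(sqrt(scores)), 1)

# Piped version: read it from top to bottom, one step per line
piped <- scores %>%
  sqrt() %>%
  mean() %>%
  round(1)

print(nested)
print(piped)
```

Both versions do the same three steps in the same order; the pipe only changes how the code reads.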



the_result <- my_data %>%
   do_this_thing() %>%
   now_do_that_thing() %>%
   and_one_more()

Key concept: %>%



the_summary <- my_data %>%
   group_by(GROUP) %>%
   summarise( 
    mean_var = mean(OUTCOME)
   )

print(the_summary)
 

Group & summarise
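
Here is the template filled in with a tiny made-up data set, as a sketch only. The columns condition and rt are assumptions standing in for GROUP and OUTCOME.

```r
library(dplyr)

# Hypothetical data: reaction times (ms) from two conditions
my_data <- data.frame(
  condition = c("easy", "easy", "hard", "hard"),
  rt        = c(400, 500, 700, 900)
)

# One row per condition, with the mean reaction time in each
the_summary <- my_data %>%
  group_by(condition) %>%
  summarise(mean_rt = mean(rt))

print(the_summary)
```

The summary has one row for each value of the grouping variable and one column for each thing you asked summarise() to compute.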


script4.R

script5.R


Additional columns?


the_summary <- my_data %>%
   group_by(GROUP) %>%
   summarise( 
    median = median(VARIABLE),
    mean = mean(VARIABLE),
    std_dev = sd(VARIABLE)
   )

print(the_summary)

Group & summarise

(just FYI)

the_summary <- my_data %>%
   group_by(GROUP) %>%
   summarise( 
    median = median(VARIABLE, na.rm = TRUE),
    mean = mean(VARIABLE, na.rm = TRUE),
    std_dev = sd(VARIABLE, na.rm = TRUE)
   )

print(the_summary)

Group & summarise

Handling missing data?
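
A small sketch of why na.rm matters, using invented data with one missing value. The column names group and score are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data: one score in group "b" is missing
my_data <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 5, NA)
)

# Without na.rm, any group containing an NA gets an NA summary
without_na_rm <- my_data %>%
  group_by(group) %>%
  summarise(mean_score = mean(score))

# With na.rm = TRUE, missing values are dropped before averaging
with_na_rm <- my_data %>%
  group_by(group) %>%
  summarise(mean_score = mean(score, na.rm = TRUE))

print(without_na_rm)
print(with_na_rm)
```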


(just FYI)

# counts for one grouping variable
my_data %>% 
  group_by(variable) %>%
  summarise(count = n())

# counts for every combination of two variables
my_data %>% 
  group_by(variable1, variable2) %>%
  summarise(count = n())

Frequency tables?

Group & summarise
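
A sketch of a frequency table on made-up data. The column gender is an assumption for illustration; count() is a dplyr shortcut that should give the same counts as the group_by() and summarise() version.

```r
library(dplyr)

# Hypothetical data: one row per participant
my_data <- data.frame(
  gender = c("female", "female", "male", "female")
)

# The long way, as in the template above
freq <- my_data %>%
  group_by(gender) %>%
  summarise(count = n())

# The shortcut: count() groups and tallies in one step
freq_short <- my_data %>%
  count(gender, name = "count")

print(freq)
print(freq_short)
```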


(just FYI)

Explore your data!

# (1) load the tidyverse
library(tidyverse)

# (2) read the data
data <- read_csv(PATH_TO_FILE)

# (3) summarise it
result <- data %>%
   group_by(GROUP) %>%
   summarise(mean_var = mean(OUTCOME))

# (4) draw a picture
picture <- data %>%
  ggplot(aes(x = XVAR, y = YVAR)) + 
  geom_point() + 
  geom_smooth()


Arrange

# for ascending order
my_data %>% arrange(VARIABLE)


# for descending order:
my_data %>% arrange(-VARIABLE)
my_data %>% arrange(desc(VARIABLE))

# to sort by one variable then another
my_data %>% arrange(VARIABLE1, VARIABLE2)

# keep only the rows where a logical expression is TRUE
my_data %>% 
    filter(a_logical_expression)

# data from the women
my_data %>% 
    filter(gender == "female")


Filter

# data from women aged 50 or over
my_data %>% 
    filter(gender == "female" & age >= 50)

# keep all rows, but retain only the
# columns for VARIABLE1 and VARIABLE2

my_data %>% 
    select(VARIABLE1, VARIABLE2)
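
Putting the three verbs together on a small invented data set, as a sketch. The columns id, gender and age are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data: four participants
my_data <- data.frame(
  id     = 1:4,
  gender = c("female", "male", "female", "female"),
  age    = c(62, 55, 34, 50)
)

older_women <- my_data %>%
  filter(gender == "female" & age >= 50) %>%  # keep some rows
  select(id, age) %>%                         # keep some columns
  arrange(desc(age))                          # sort, oldest first

print(older_women)
```

Each verb does one small job: filter() works on rows, select() works on columns, and arrange() only changes the order.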


Select

# reminder: arrange, filter, select DON'T 
# change the original data set (my_data).
# instead they return a new one, which you
# can assign to a variable

new_data <- old_data %>%
    filter( SOME ROWS ) %>% 
    select( SOME VARIABLES ) %>% 
    arrange( BY VARIABLE )


Arrange, filter, select

# here's how you alter the original:

my_data <- my_data %>%
    filter( SOME ROWS ) %>% 
    select( SOME VARIABLES ) %>% 
    arrange( BY VARIABLE )
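
A quick way to convince yourself of this, sketched with a made-up one-column data set (the column x is an assumption):

```r
library(dplyr)

# Hypothetical data with three rows
my_data <- data.frame(x = c(3, 1, 2))

# Filtering without assigning back: the result goes to `kept`,
# and my_data itself is left alone
kept <- my_data %>% filter(x > 1)
rows_before <- nrow(my_data)

# Assigning the result back to the same name replaces the original
my_data <- my_data %>% filter(x > 1)
rows_after <- nrow(my_data)

print(rows_before)
print(rows_after)
```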

script6.R

script7.R


Arrange, filter, select

# typical use case: create new variables 
# within the ORIGINAL data set:

my_data <- my_data %>%
    mutate(variable = expression)
# e.g.: create a log-transformed variable
my_data <- my_data %>% 
    mutate(log_RT = log(rt))
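
The same idea on a tiny invented data set, as a sketch. The values of rt are made up; the new column is added alongside the old one.

```r
library(dplyr)

# Hypothetical data: three reaction times (ms)
my_data <- data.frame(rt = c(250, 500, 1000))

# Add a log-transformed copy of rt as a new column
my_data <- my_data %>%
  mutate(log_RT = log(rt))

print(my_data)
```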


Mutate

script8.R


Make it pretty!


Grammar of Graphics

pic <- my_data %>%
    ggplot(aes(AESTHETICS)) + 
    geom_something() 

plot(pic)


  • Aesthetics: things that can be mapped to variables (x-position, y-position, colour, shape, fill, etc)
  • Geometries: tools that can visually render the data (points, lines, histogram, violin, boxplots, contours, etc)
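
A sketch of how aesthetics and geometries fit together, using invented data. The columns hours, score and group are assumptions for illustration: three aesthetics are mapped inside aes(), and two geometries draw the same data in different ways.

```r
library(dplyr)    # for %>%
library(ggplot2)

# Hypothetical data: six observations from two groups
my_data <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6),
  score = c(52, 55, 61, 60, 68, 71),
  group = c("a", "b", "a", "b", "a", "b")
)

pic <- my_data %>%
  ggplot(aes(x = hours, y = score, colour = group)) +  # aesthetics
  geom_point() +                                       # geometry 1: points
  geom_line()                                          # geometry 2: lines

plot(pic)
```

Swapping geom_point() for another geometry changes how the data are drawn without touching the aesthetic mapping.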


Grammar of Graphics

data %>%  
  ggplot(aes(  # set up the plot & aesthetics
     x = named_month, 
     y = total_users)) + 
  geom_col() +                    # add bars
  xlab("Month (2018)") +          # x-axis label
  ylab("Total Number of Users") + # y-axis label
  theme_bw()   # black & white theme


script9.R

Points and lines

tidy_users %>% 
    ggplot(aes(x = DayNum, y = Users)) +
    geom_point() +
    geom_line()


# see script 9 for extra customisation options...


tidy_users %>%
    ggplot(aes(x = Users)) + 
    geom_histogram()

Histogram


# see script 9 for extra customisation options...

Box plot

tidy_users %>%
  ggplot(aes(
    x = MonthName,
    y = Users,
    colour = MonthName
  )) +
  geom_boxplot()

script9.R


Violin plot

tidy_users %>%
  ggplot(aes(
    x = MonthName,
    y = Users,
    colour = MonthName
  )) +
  geom_violin()


script9.R

Error bar plots


users_summary <- tidy_users %>%
  group_by(MonthName) %>%
  summarise(
    mean_user = mean(Users),
    sem = sd(Users) / sqrt(n())
  )

pic <- users_summary %>%
  ggplot(aes(
    x = MonthName,
    y = mean_user,
    ymin = mean_user - sem,
    ymax = mean_user + sem 
  )) +
  geom_col() + 
  geom_errorbar(width = .5)
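
To see what the summary step produces, here is a sketch with a small made-up version of tidy_users. The numbers are invented; only the column names MonthName and Users follow the code above. The standard error is the standard deviation divided by the square root of the number of observations in each group.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: three daily user counts in each of two months
tidy_users <- data.frame(
  MonthName = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Users     = c(10, 12, 14, 20, 20, 26)
)

# Mean and standard error of the mean for each month
users_summary <- tidy_users %>%
  group_by(MonthName) %>%
  summarise(
    mean_user = mean(Users),
    sem = sd(Users) / sqrt(n())
  )

print(users_summary)

# Bars show the means; error bars span one SEM either side
pic <- users_summary %>%
  ggplot(aes(
    x = MonthName,
    y = mean_user,
    ymin = mean_user - sem,
    ymax = mean_user + sem
  )) +
  geom_col() +
  geom_errorbar(width = .5)

plot(pic)
```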


Facets


tidy_users %>%
  ggplot(aes(x = Users)) +
  geom_histogram(bins = 10) + 
  facet_wrap(~named_month)

script9.R

Overall workflow


# read...
afl <- read_csv(here("datasets", "afl.csv"))

# summarise...
attendance <- afl %>%
  group_by(year, game_type) %>%
  summarise(attendance = mean(attendance))

# visualise...
attendance %>%
  ggplot(aes(x = year, y = attendance)) +
  geom_point() + 
  facet_wrap(~ game_type) + 
  geom_smooth()

script10.R


https://psyr.org/visualising-data.html


www.nailsmag.com/page/70218/market-research

tidy_nails.csv


These are two different "kinds" of ways to "count" prevalence

script11.R


Gather

long_nails <- tidy_nails %>%
  gather(key = "kind", value = "count", 
         technicians, salons)
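
A sketch of what gather() does, using an invented two-row stand-in for tidy_nails. The year column and all the numbers are assumptions; only technicians and salons come from the code above. Newer versions of tidyr (1.0.0 and later) also offer pivot_longer(), which is shown for comparison.

```r
library(tidyr)  # also makes %>% available

# Hypothetical wide data: one column per kind of count
tidy_nails <- data.frame(
  year        = c(2017, 2018),
  technicians = c(100, 120),
  salons      = c(30, 35)
)

# Stack the two count columns into a "kind" column and a "count" column
long_nails <- tidy_nails %>%
  gather(key = "kind", value = "count",
         technicians, salons)

# The same reshaping with the newer tidyr function
longer_nails <- tidy_nails %>%
  pivot_longer(cols = c(technicians, salons),
               names_to = "kind", values_to = "count")

print(long_nails)
print(longer_nails)
```

Two rows and two count columns become four rows: one for each combination of year and kind.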

script12.R


What about going the other way??

script12.R


Spread

flat_nails <- pointy_nails %>%
  spread(key = "kind_year", value = "count")
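
A sketch of spread() going the other way, with an invented stand-in for pointy_nails. The region column and all the values are assumptions; only kind_year and count come from the code above. In tidyr 1.0.0 and later, pivot_wider() does the same job.

```r
library(tidyr)  # also makes %>% available

# Hypothetical long data: one row per region and kind_year
pointy_nails <- data.frame(
  region    = c("north", "north", "south", "south"),
  kind_year = c("salons_2017", "salons_2018", "salons_2017", "salons_2018"),
  count     = c(30, 35, 20, 22)
)

# Each distinct value of kind_year becomes its own column
flat_nails <- pointy_nails %>%
  spread(key = "kind_year", value = "count")

# The same reshaping with the newer tidyr function
wider_nails <- pointy_nails %>%
  pivot_wider(names_from = kind_year, values_from = count)

print(flat_nails)
print(wider_nails)
```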

script13.R


Nail art...
