Tidyr; R Markdown

Joel Ross
Winter 2021

INFO 201

Course Feedback

Today's Objectives

By the end of class, you should be able to

  • Use tidyr to organize data into the proper shape
     
  • Reflect on Data Feminism Ch 3
     
  • Generate dynamic reports with R Markdown

Q&A Poll (20min max):

Assignment A2: COVID

A chance to work with and analyze COVID data directly from the New York Times. Practice asking questions of data!

New York Times

All data sources have bias; no data is "objective"

Consider a data set...

What does each row represent?

Consider a data set...

Now what does each row represent?

Data Shape

4 rows x 4 cols
= 16 prices

16 rows x 1 col
= 16 prices

tidyr

We can convert between wide and long data using the tidyr package.

Note: The older reshaping functions gather() and spread() have been superseded by the simpler pivot_longer() and pivot_wider() functions, respectively.

# load the tidyr library
library("tidyr")

pivot_wider()

Convert from long to wide (spread) using pivot_wider(). This creates new columns from existing rows.

state_cases_df %>% pivot_wider(
  # What column(s) will be the unique identifier?
  id_cols = c(date), # can be a vector!  

  # Which column do the new column-names come from?
  names_from = state,

  # Which column do the new column-values come from?
  values_from = cases
)
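
To make the reshaping concrete, here is a minimal sketch of the call above on a tiny data frame. The values in `state_cases_df` are made up for illustration (they are not from the New York Times data), and the name `wide_df` is an assumption introduced here.

```r
library("tidyr")  # also provides the %>% pipe

# Hypothetical long-format data: one row per (date, state) pair.
# The numbers are invented purely for illustration.
state_cases_df <- data.frame(
  date = c("2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"),
  state = c("WA", "OR", "WA", "OR"),
  cases = c(10, 5, 12, 7)
)

wide_df <- state_cases_df %>% pivot_wider(
  id_cols = date,       # one output row per date
  names_from = state,   # each state becomes its own column
  values_from = cases   # cells are filled with the case counts
)

print(wide_df)  # 2 rows (dates) x 3 columns (date, WA, OR)
```

The four long rows collapse into two wide rows: nothing is lost, the same four counts are just arranged differently.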

pivot_longer()

Convert from wide to long (gather) using pivot_longer(). This creates more rows from existing columns.

state_df %>% pivot_longer(
  # What column(s) will new values come from?
  cols = c(cases, deaths), # can be a vector!

  # What should you name the new (label) column?
  names_to = "metric"
)
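
As a sketch of the reverse direction, the example below runs the same kind of call on a small, made-up `state_df`. The data values, the `long_df` name, and the `values_to = "count"` argument are assumptions added for illustration; without `values_to`, tidyr names the value column `value`.

```r
library("tidyr")  # also provides the %>% pipe

# Hypothetical wide-format data: one row per state, one column per metric.
# The numbers are invented purely for illustration.
state_df <- data.frame(
  state = c("WA", "OR"),
  cases = c(10, 5),
  deaths = c(1, 0)
)

long_df <- state_df %>% pivot_longer(
  cols = c(cases, deaths),  # columns to stack into rows
  names_to = "metric",      # new column holding the old column names
  values_to = "count"       # new column holding the old cell values
)

print(long_df)  # 4 rows: one per (state, metric) pair
```

Each original row turns into one row per gathered column, so 2 states x 2 metrics gives 4 rows.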

Data Presentation

Elevating Emotion

A statement of profound truth [is] revealed to us through our own emotion

Elevating Emotion

What is the purpose of each of these data presentations?

All design has ideology

Haraway calls [the god trick] a trick because it makes the viewer believe that they can see everything, all at once, from an imaginary and impossible standpoint. But it’s also a trick because what appears to be everything, and what appears to be neutral, is always what she terms a partial perspective. And in most cases of seemingly “neutral” visualizations, this perspective is the one of the dominant, default group. Think back to the presumption of whiteness as default that we discussed in the introduction, or—for an example of an actual visualization—to the redlining map discussed in chapter 2. This is a good example of the god trick at work.

Own your own subjectivity

How many numbers are in this summary?

Data Presentation

Data Reports

Data reports have hundreds (thousands!) of variables and dozens of representations (tables or graphics).

We need to update our report when the data or analysis changes

Copy and Paste?

R Markdown

An R package (framework) for dynamically generating documents from code. Formatted text, executed code, and displayed graphics are seamlessly integrated.

Markdown

Markdown is a simple syntax for specifying how plain text should be formatted.

Make this executable!

Rmd Files

R Markdown document source code is written in .Rmd files. These can be created through RStudio.

Markdown and Code

This is the code we will look at in class. This
is just plain old Markdown that lets you render
text in **bold** or _italics_. However, you can
put in a block of R code, and the document will
show the code and the results!



```{r example}
numbers <- runif(100)  # make 100 random numbers

hist(numbers)  # show histogram of the numbers
```

We write Markdown as normal in the document, but add {r} after the opening backticks of a code block (a "chunk") so that it is executed!

chunk label

Knitting

R Markdown files are converted into readable documents (e.g., HTML) using the knitr package, which executes the code and produces the output.
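
Knitting is normally done with the Knit button in RStudio, but the same step can be sketched from the console. The example below is an illustration only: it passes a tiny document to `knitr::knit()` through its `text` argument and prints the Markdown that comes back. The document contents and the names `fence`, `rmd_lines`, and `md_output` are assumptions made up for this sketch.

```r
library("knitr")

# Build a tiny R Markdown document as a character vector, one element per line.
# The fence is assembled with strrep() only so that this example can itself
# be shown inside a code block.
fence <- strrep("`", 3)
rmd_lines <- c(
  "The numbers below are computed when the document is knit.",
  "",
  paste0(fence, "{r knit_demo}"),
  "values <- c(2, 4, 9)",
  "mean(values)",
  fence
)

# knit() executes the chunk and returns the resulting Markdown as one string
md_output <- knit(text = rmd_lines, quiet = TRUE)
cat(md_output)
```

The returned Markdown contains the original text, the chunk's source code, and the computed result, which is what then gets converted to HTML or another format.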

knitr Options

This is the code we will look at in class. This
is just plain old Markdown that lets you render
text in **bold** or _italics_. However, you can
put in a block of R code, and the document will
show the code and the results!



```{r example, echo = FALSE}
numbers <- runif(100)  # make 100 random numbers

hist(numbers)  # show histogram of the numbers
```

Add options after a comma inside the {r} to control what content is rendered.

Do not echo (show) the code
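
To see the effect of `echo = FALSE`, the sketch below knits the same one-line chunk twice, once with default options and once with `echo = FALSE`, and compares the output. The helper `knit_with_header()` and the chunk labels are hypothetical names introduced only for this illustration.

```r
library("knitr")

fence <- strrep("`", 3)  # built with strrep() so it can sit inside a code block

# Hypothetical helper for this sketch: knit a one-chunk document
# whose chunk header is the given string.
knit_with_header <- function(header) {
  rmd_lines <- c(
    paste0(fence, "{", header, "}"),
    "sum(1:10)",
    fence
  )
  knit(text = rmd_lines, quiet = TRUE)
}

shown <- knit_with_header("r demo_shown")                 # default: echo = TRUE
hidden <- knit_with_header("r demo_hidden, echo = FALSE") # hide the source code

cat(shown)   # contains both the code and its result
cat(hidden)  # contains only the result
```

In both cases the code is executed; the option only controls whether the source appears in the rendered document.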

Inline Code

This is the code we will look at in class. This
is just plain old Markdown that lets you render
text in **bold** or _italics_. However, you can
put in a block of R code, and the document will
show the code and the results!



```{r example, echo = FALSE}
numbers <- runif(100)  # make 100 random numbers

hist(numbers)  # show histogram of the numbers

numbers_mean <- mean(numbers)  # save the mean
```

The mean of the above histogram 
is **`r numbers_mean`**

Include expressions (e.g., variables) in inline code by starting the inline code span with r followed by a space.
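
Inline code can hold any R expression, not just a variable name, which is handy for rounding or unit conversion. The following is a small sketch with made-up values; the document text and the names `rmd_lines` and `md_output` are assumptions for illustration.

```r
library("knitr")

fence <- strrep("`", 3)  # built with strrep() so it can sit inside a code block

rmd_lines <- c(
  paste0(fence, "{r inline_demo, echo = FALSE}"),
  "numbers_mean <- mean(c(0.25, 0.5, 0.75))",
  fence,
  "",
  # Two inline expressions: a plain variable and a computed value
  "The mean is **`r numbers_mean`**, or **`r round(numbers_mean * 100)`** percent."
)

md_output <- knit(text = rmd_lines, quiet = TRUE)
cat(md_output)
```

After knitting, each inline expression is replaced by its value, so the sentence reads as ordinary formatted text.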

Rendering Strings

Don't use print() to output strings of text from a code chunk! Instead, save them in a variable and use inline R to render them.

```{r do_not_do_this, echo = FALSE}
# Don't do this
print("Hello world")
```

```{r do_this, echo = FALSE}
msg <- "**Hello world**"  # contains Markdown!
```

Below is the message to see:

`r msg`

Rendering Lists

Strings rendered with inline R are processed as Markdown, so you can use them to create Markdown lists.

```{r list_example, echo=FALSE}
markdown_list <- "
- Lions
- Tigers
- Bears
- Oh mys
"
```

`r markdown_list`

```{r pasted_list_example, echo=FALSE}
animals <- c("Lions", "Tigers", "Bears", "Oh mys")

# Paste `-` in front of each and join the items together 
# with newlines between
markdown_list <- paste("-", animals, collapse = "\n")
```

`r markdown_list`
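
To check what `paste()` actually builds, the string can be inspected in the console before it is placed in a document. This sketch reuses the `animals` vector from above; the numbered variant and the name `numbered_list` are additions for illustration.

```r
animals <- c("Lions", "Tigers", "Bears", "Oh mys")

# paste() is vectorized: "-" is paired with every element (sep = " " by default),
# then collapse joins the four pieces into a single string.
markdown_list <- paste("-", animals, collapse = "\n")
cat(markdown_list, "\n")

# The same idea gives a numbered list: seq_along() supplies 1, 2, 3, 4
numbered_list <- paste0(seq_along(animals), ". ", animals, collapse = "\n")
cat(numbered_list, "\n")
```

Because `collapse` produces a single string, one inline expression is enough to render the whole list.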

Rendering Tables

Use the knitr::kable() function to render a data frame as a formatted table.

```{r kable_example, echo=FALSE}
library(knitr)  # load the package (once per document)

# make a data frame
letters <- c("a", "b", "c", "d")
numbers <- 1:4
df <- data.frame(letters = letters, numbers = numbers)

# "return" the table to render it
kable(df)
```
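
Outside of a knitted document, printing the result of `kable()` shows the Markdown table text it generates, which can help when debugging a report. The sketch below assumes a knitr version that supports `format = "pipe"`; the data frame and the `col.names` labels are made up for illustration.

```r
library("knitr")

# A small hypothetical data frame
df <- data.frame(letter = c("a", "b"), number = 1:2)

# col.names replaces the data frame's column names in the rendered header
tbl <- kable(df, format = "pipe", col.names = c("Letter", "Number"))

print(tbl)  # shows the table as Markdown pipe-table text
```

Setting `col.names` lets the report use reader-friendly headers without renaming the columns in the analysis code.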

Analysis vs. Presentation

Best practice: do your analysis in a separate .R file and then use source() to load that file and call its functions from your .Rmd.

# In `analysis.R`

library("dplyr")  # provides %>%, filter(), and select()

data <- read.csv("my_file.csv", stringsAsFactors = FALSE)

# produce data frame to show
# (`criteria` and `my_cols` are placeholders for your own conditions and columns)
results_table <- data %>%
  filter(criteria) %>%
  select(my_cols)

make_scatterplot <- function() {
  plot(data)  # draw a scatterplot of the data
}

# In `report.Rmd`

```{r setup, include=FALSE}
library(knitr)
source("analysis.R")  # load analysis file
```

```{r table, echo=FALSE}
kable(results_table)  # render the table
```

```{r plot, echo=FALSE}
make_scatterplot()  # call function to draw the plot
```
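
The `source()` step can be tried on its own, without a report. The sketch below writes a stand-in analysis script to a temporary file and then sources it; the script contents and the names `analysis_path`, `scores`, `average_score`, and `describe_scores()` are all hypothetical.

```r
# Write a stand-in analysis script to a temporary file
analysis_path <- file.path(tempdir(), "analysis.R")
writeLines(c(
  "scores <- c(90, 85, 77)",
  "average_score <- mean(scores)",
  "describe_scores <- function() {",
  "  paste('Average score:', average_score)",
  "}"
), analysis_path)

# source() runs the file, so its variables and functions become available here
source(analysis_path)

print(average_score)
print(describe_scores())
```

Keeping the analysis in its own file means it can be run and tested in the console, while the .Rmd stays focused on presentation.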

Action Items!

  • A2: COVID due next week

  • Read: Programming Skills Ch 20 (required)

  • Read: Data Feminism: Chapter 4


Next: multi-player git
