Lists and Dataframes
Outline
Review
Lists
Dataframes
Factors
{vector review}
Vectors
Making vectors
How do we get mile high cities?
# Create vectors
elevations <- c(100, 5280, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')
# Convert to miles
feet_to_miles <- function(feet) {
  return(feet / 5280)
}
# Vectorized operation
elevation_in_miles <- feet_to_miles(elevations)
# Get Boolean indices
mile_high <- elevation_in_miles >= 1
mile_high_cities <- cities[mile_high]
# Or....
cities[elevation_in_miles >= 1]
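To see why the indexing above works, it can help to print the intermediate logical vector. A minimal sketch reusing the slide's values (the division is inlined here instead of calling feet_to_miles):

```r
# Same data as the slide above
elevations <- c(100, 5280, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')

# Vectorized division: one result per element
elevation_in_miles <- elevations / 5280

# The comparison is also vectorized and yields a logical vector
mile_high <- elevation_in_miles >= 1
print(mile_high)

# Indexing with a logical vector keeps only the TRUE positions
mile_high_cities <- cities[mile_high]
print(mile_high_cities)
```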
{lists}
Lists
Sequence of elements of different types
Elements can be added or removed
Best practice to name elements
# Create a list
person1 <- list(name="Miriam", salary=40000)
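The points above mention adding and removing elements. A short sketch of both, assuming a hypothetical `age` element that is not part of the original example:

```r
# Create a list mixing a character value and a number
person1 <- list(name = "Miriam", salary = 40000)

# Inspect the element names
print(names(person1))

# Add an element (the `age` field is hypothetical, for illustration only)
person1$age <- 30

# Remove an element by assigning NULL to it
person1$salary <- NULL
print(names(person1))
```

Assigning NULL is one common way to drop a list element; the remaining elements keep their names and order.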
Lists
Referencing values
Use $ or double brackets to get the value itself; single brackets return a list
# Create list
person1 <- list(name="Miriam", salary=40000)
# $ syntax
salary <- person1$salary
# Double brackets [[
salary <- person1[['salary']]
# List index (not advised)
salary <- person1[[2]]
# These both return another list (likely not what you want)
salary_list <- person1['salary']
salary_list <- person1[2]
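One way to check the difference between the two bracket styles is to look at the class of each result. A small sketch (the variable names `salary_value`, `raise` and `same_value` are illustrative):

```r
person1 <- list(name = "Miriam", salary = 40000)

# Double brackets extract the element itself
salary_value <- person1[['salary']]
print(class(salary_value))

# Single brackets return a list of length one
salary_list <- person1['salary']
print(class(salary_list))

# Arithmetic works on the extracted value...
raise <- salary_value * 1.1

# ...but the one-element list must be unwrapped first
same_value <- salary_list[['salary']]
```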
Lists
Creating new values
# Create person
person1 <- list(name="Miriam", salary=40000)
# $ syntax
person1$height <- 63
# Double brackets [[
person1[['favorite_book']] <- "Infinite Jest"
# List index (not advised)
person1[[5]] <- 'what is this...'
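A quick look at why assigning by position is discouraged: the new element ends up without a name. This sketch simply replays the assignments above and inspects the result:

```r
person1 <- list(name = "Miriam", salary = 40000)
person1$height <- 63
person1[['favorite_book']] <- "Infinite Jest"

# Assigning by position adds an element with no name
person1[[5]] <- 'what is this...'

# The fifth name is an empty string
print(names(person1))
print(length(person1))
```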
{exercise 1}
Dataframes
Two-dimensional (row, column) data structure
Actually a list of vectors
Each column holds values of a single type
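The "list of vectors" idea can be checked directly. A minimal sketch, assuming the same city data used later in these slides and passing stringsAsFactors = FALSE so the text column stays character:

```r
city_df <- data.frame(
  cities = c('Seattle', 'Denver', 'New Orleans'),
  elevations = c(100, 5000, 2),
  stringsAsFactors = FALSE
)

# A data frame is a list under the hood
print(is.list(city_df))

# Each column is a vector with its own type
column_types <- sapply(city_df, class)
print(column_types)

# The length of the list is the number of columns
print(length(city_df))
```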
Dataframes
Creating dataframes
Put vectors into a dataframe
# Create vectors
elevations <- c(100, 5000, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')
city_df <- data.frame(cities, elevations)
print(city_df)
       cities elevations
1     Seattle        100
2      Denver       5000
3 New Orleans          2
Describing dataframes
Number of rows/columns
Seeing first/last sets of data
View it in RStudio
Retrieve row/column names
nrow(df)
ncol(df)
dim(df)
head(df)
tail(df)
View(df)
rownames(df)
colnames(df)
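As a sketch of what these functions return, here they are applied to the city data from the earlier slide (View is left out because it opens an interactive RStudio window). The result variable names are illustrative:

```r
city_df <- data.frame(
  cities = c('Seattle', 'Denver', 'New Orleans'),
  elevations = c(100, 5000, 2),
  stringsAsFactors = FALSE
)

n_rows <- nrow(city_df)      # number of rows
n_cols <- ncol(city_df)      # number of columns
dimensions <- dim(city_df)   # rows, then columns

first_two <- head(city_df, 2)  # first 2 rows
last_one <- tail(city_df, 1)   # last row

row_names <- rownames(city_df)     # default row names are "1", "2", ...
column_names <- colnames(city_df)

print(dimensions)
print(column_names)
```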
Accessing dataframes
Just like a list!
# $ syntax
cities <- city_df$cities
# Double brackets [[
elevations <- city_df[['elevations']]
# Index (not recommended)
elevations <- city_df[[2]]
# Add a column
city_df$lived_in <- c(TRUE, FALSE, TRUE)
print(city_df)
       cities elevations lived_in
1     Seattle        100     TRUE
2      Denver       5000    FALSE
3 New Orleans          2     TRUE
Accessing Dataframes
Using row/column position
Using row/column names
All columns for a given row
All rows for a given column
# Using row/column position in square brackets
lived_in_seattle <- city_df[1,3]
# Using row/column names
lived_in_denver <- city_df[2, 'lived_in']
# Get all columns for first row
seattle <- city_df[1,]
# Get all rows for the elevations column
all_elevations <- city_df[,'elevations']
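It may be worth noting what kind of object each form returns. A sketch under the same example data; the logical row filter at the end is an extra illustration that reuses the Boolean indexing from the vector review:

```r
city_df <- data.frame(
  cities = c('Seattle', 'Denver', 'New Orleans'),
  elevations = c(100, 5000, 2),
  lived_in = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# A whole row comes back as a one-row data frame
seattle <- city_df[1, ]
print(class(seattle))

# A single column comes back as a plain vector by default
all_elevations <- city_df[, 'elevations']
print(class(all_elevations))

# drop = FALSE keeps the result as a data frame
elevations_df <- city_df[, 'elevations', drop = FALSE]

# A logical vector in the row position keeps the matching rows
lived_in_cities <- city_df[city_df$lived_in, ]
print(lived_in_cities$cities)
```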
{exercise 2}
{factors}
Factors
Appear similar to vectors
Different in underlying structure
Cannot assign values that are not existing levels
# Create a factor variable
x <- factor(c('Jane', 'Ella', 'Mario'))
print(x)
[1] Jane Ella Mario
Levels: Ella Jane Mario
# Look at structure with str function
str(x)
Factor w/ 3 levels "Ella","Jane",..: 2 1 3
x[1] <- 'Mario'
str(x)
Factor w/ 3 levels "Ella","Jane",..: 3 1 3
x[1] <- 'Mike'
Warning message:
In `[<-.factor`(`*tmp*`, 1, value = "Mike") :
invalid factor level, NA generated
Very important for statistical analysis!
(but annoying...)
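If a new value really is needed, one option is to extend the levels before assigning; another is to work with a character vector instead. A hedged sketch of both, reusing the names from the slide:

```r
x <- factor(c('Jane', 'Ella', 'Mario'))

# Option 1: add the new level first, then assign
levels(x) <- c(levels(x), 'Mike')
x[1] <- 'Mike'
print(x)

# Option 2: convert to character, where any value can be assigned
y <- as.character(factor(c('Jane', 'Ella', 'Mario')))
y[1] <- 'Mike'
print(y)
```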
Avoiding factors
When making dataframes
Upon retrieval
# Create vectors
elevations <- c(100, 5000, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')
city_df <- data.frame(elevations, cities, stringsAsFactors=FALSE)
city_df$cities
[1] "Seattle" "Denver" "New Orleans"
# Or convert to character upon retrieval
city_df <- data.frame(cities, elevations)
cities <- as.character(city_df$cities)
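Whether data.frame turns strings into factors depends on the R version: since R 4.0.0 the default is stringsAsFactors = FALSE, while older versions default to TRUE. A small sketch that sets the argument explicitly so the behaviour is the same everywhere (the names `df_factor`, `df_character` and `converted` are illustrative):

```r
elevations <- c(100, 5000, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')

# Explicitly request factors
df_factor <- data.frame(cities, elevations, stringsAsFactors = TRUE)
print(is.factor(df_factor$cities))

# Explicitly request character columns
df_character <- data.frame(cities, elevations, stringsAsFactors = FALSE)
print(is.character(df_character$cities))

# Converting a factor column after the fact
converted <- as.character(df_factor$cities)
print(converted)
```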
{exercise 3}
Assignments
Assignment-3: Using Data (due Wed. 1/27)
lists-and-dataframes
By Michael Freeman