Lists and Dataframes

Outline

Review

Lists

Dataframes

Factors

{vector review}

Vectors

Making vectors

How do we get mile high cities?

# Create vectors
elevations <- c(100, 5280, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')

# Convert to miles
feet_to_miles <- function(feet) {
    return(feet/5280)
}

# Vectorized operation
elevation_in_miles <- feet_to_miles(elevations)
# Get Boolean indices
mile_high <- elevation_in_miles >= 1
mile_high_cities <- cities[mile_high]

# Or....
cities[elevation_in_miles >= 1]

{lists}

Lists

Sequence of elements of different types

Elements can be added or removed

Best practice to name elements

# Create a list 
person1 <- list(name="Miriam", salary=40000)
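Adding and removing elements can be sketched as follows. This is a minimal illustration that reuses the slide's person1 values; the height element is only an example.

```r
# Start from the list on the slide
person1 <- list(name = "Miriam", salary = 40000)

# Add an element by assigning to a new name
person1$height <- 63

# Remove an element by assigning NULL to it
person1$salary <- NULL

names(person1)  # "name" "height"
```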

Lists

Referencing values

Different from single brackets, which return a list

# Create list
person1 <- list(name="Miriam", salary=40000)

# $ syntax
salary <- person1$salary

# Double brackets [[
salary <- person1[['salary']]

# List index (not advised)
salary <- person1[[2]]
# These both return another list (likely not what you want)
salary_list <- person1['salary']
salary_list <- person1[2]
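A quick way to see the difference between the two bracket styles is to check what each one returns. The sketch below reuses person1 from the slide; the raise variable is a made-up example.

```r
person1 <- list(name = "Miriam", salary = 40000)

# Double brackets extract the element itself
class(person1[["salary"]])  # "numeric"

# Single brackets return a list of length one
class(person1["salary"])    # "list"

# Arithmetic works on the extracted value, not on the one-element list
raise <- person1[["salary"]] * 1.1
```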

Lists

Creating new values

# Create person
person1 <- list(name="Miriam", salary=40000)

# $ syntax
person1$height <- 63

# Double brackets [[
person1[['favorite_book']] <- "Infinite Jest"

# List index (not advised)
person1[[5]] <- 'what is this...'

{exercise 1}

Dataframes

Two-dimensional (row, column) data structure

Actually a list of vectors

Each column holds values of a single type
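The "list of vectors" idea can be checked directly. The sketch below assumes a small dataframe with the city data used elsewhere in these slides.

```r
city_df <- data.frame(
  cities = c("Seattle", "Denver", "New Orleans"),
  elevations = c(100, 5000, 2)
)

is.list(city_df)           # TRUE: a dataframe is a list underneath
length(city_df)            # 2: one list element per column
class(city_df$elevations)  # "numeric": the column is an ordinary vector
```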

Dataframes

Creating dataframes

Put vectors into a dataframe

# Create vectors
elevations <- c(100, 5000, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')

city_df <- data.frame(cities, elevations)

print(city_df)

       cities elevations
1     Seattle        100
2      Denver       5000
3 New Orleans          2

Describing dataframes

Number of rows/columns

Seeing first/last sets of data

View it in RStudio

Retrieve row/column names

nrow(df)
ncol(df)
dim(df)
head(df)
tail(df)
View(df)
rownames(df)
colnames(df)
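As a concrete illustration, here is what a few of these functions return for a small dataframe. The data is assumed to match the city example used in these slides.

```r
city_df <- data.frame(
  cities = c("Seattle", "Denver", "New Orleans"),
  elevations = c(100, 5000, 2)
)

nrow(city_df)      # 3
ncol(city_df)      # 2
dim(city_df)       # 3 2 (rows, then columns)
colnames(city_df)  # "cities" "elevations"
rownames(city_df)  # "1" "2" "3" (default row names)
```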

Accessing dataframes

Just like a list!

# $ syntax
cities <- city_df$cities

# Double brackets [[
elevations <- city_df[['elevations']]

# Index (not recommended)
elevations <- city_df[[2]] 

# Add a column
city_df$lived_in <- c(TRUE, FALSE, TRUE)

print(city_df)
       cities elevations lived_in
1     Seattle        100     TRUE
2      Denver       5000    FALSE
3 New Orleans          2     TRUE

Accessing Dataframes

Using row/column position

Using row/column names

All columns for a given row

All rows for a given column

# Using row/column position in square brackets
lived_in_seattle <- city_df[1,3]
# Using row/column names
lived_in_denver <- city_df[2, 'lived_in']
# Get all columns for first row
seattle <- city_df[1,]
# Get all rows for the elevations column
all_elevations <- city_df[,'elevations']
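The Boolean indexing from the vector review also works in the row position. This is a minimal sketch; the 1000-foot cutoff and the variable names is_high and high_cities are made up for illustration.

```r
city_df <- data.frame(
  cities = c("Seattle", "Denver", "New Orleans"),
  elevations = c(100, 5000, 2),
  stringsAsFactors = FALSE
)

# A logical vector in the row position keeps the rows where it is TRUE
is_high <- city_df$elevations > 1000
high_cities <- city_df[is_high, "cities"]

print(high_cities)  # "Denver"
```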

{exercise 2}

{factors}

Factors

Appear similar to vectors

Different in underlying structure

Cannot take values outside their existing levels

# Create a factor variable
x <- factor(c('Jane', 'Ella', 'Mario'))
print(x)
[1] Jane  Ella  Mario
Levels: Ella Jane Mario
# Look at structure with str function
str(x)
Factor w/ 3 levels "Ella","Jane",..: 2 1 3
x[1] <- 'Mario'
str(x)
 Factor w/ 3 levels "Ella","Jane",..: 3 1 3

x[1] <- 'Mike'
Warning message:
In `[<-.factor`(`*tmp*`, 1, value = "Mike") :
  invalid factor level, NA generated

Very important for statistical analysis!

(but annoying...)
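One way to look at the underlying structure, and to get around the "invalid factor level" warning, is sketched below. It reuses the factor x from the slide and adds 'Mike' to the levels before assigning it.

```r
x <- factor(c("Jane", "Ella", "Mario"))

# Underlying structure: integer codes that point into the levels
as.integer(x)  # 2 1 3
levels(x)      # "Ella" "Jane" "Mario"

# Extend the set of levels first, then the assignment succeeds
levels(x) <- c(levels(x), "Mike")
x[1] <- "Mike"

as.character(x)  # "Mike" "Ella" "Mario"
```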

Avoiding factors

When making dataframes

Upon retrieval

# Create vectors
elevations <- c(100, 5000, 2)
cities <- c('Seattle', 'Denver', 'New Orleans')

city_df <- data.frame(elevations, cities, stringsAsFactors=FALSE)

city_df$cities
[1] "Seattle"     "Denver"      "New Orleans"
# Or convert to character upon retrieval
# (before R 4.0, data.frame() turned strings into factors by default)
city_df <- data.frame(cities, elevations)
cities <- as.character(city_df$cities)

{exercise 3}

Assignments

Assignment-3: Using Data (due Wed. 1/27)
