Simulation

What?

Why?

How?

{what is a simulation?}

Simulation

The imitation of the operation of a real-world process or system over time (wiki)

A way to model random events, such that simulated outcomes closely match real-world outcomes (source)

Using random numbers to generate data with known characteristics and which follow hypothesized processes (source)

{why?}

Why simulate

You don't have any data (...yet...)

It's easier than computation (...sometimes....)

Make predictions based on statisitical models

Understand uncertainties in statistical models

{an example}

Predicting health burden...

1/10 people get the flu

It's not so bad to have the flu (.05, on a scale of 0 - 1)

How much sickness is experienced by a pop. of 1000 people?

Number Flu = Pop. * Flu Probability = 100

Number Flu = Pop. * Flu Probability = 100

Flu Burden = Number Flu * Flu Disability = 5

Flu Burden = Number Flu * Flu Disability = 5

Predicting health burden...

1/10 people get the flu, 2/10 get pneumonia

Flu is .05 disabling, pneumonia is .1 disabling

Disability = 1000 * (Flu Prob * Flu Disability + Pneu. Prob. * Pneu. Disability)

Disability = 1000 * (Flu Prob * Flu Disability + Pneu. Prob. * Pneu. Disability)

How much sickness is experienced by a pop. of 1000 people?

Disability = 25

Disability = 25

Flu

Pneu.

What about people that have both?

The disability of having the flu + pneu. is not the sum of their disabilities

Flu

Pneu.

DW(combined) = 1 - (1 - flu) * (1 - pneu)

DW(combined) = 1 - (1 - flu) * (1 - pneu)

DW(combined) = .145

DW(combined) = .145

Flu

Pneu.

We need to estimate disease overlap...

.... for 1,000+ diseases....

.... 1,000 times ...

.... in 22,000+ demographic groups (country/age/sex/year)

Simulation!

{how?}

Random number generators

Randomly generate binomial outcomes (0,1)

Randomly generate continuous outcomes

Randomly sample out of a vector

# Given a 10% probability, simulate 1000 people getting the flu
people <- rbinom(n = 1000, size = 1, prob = .2)

# Generate a normal distribution of grades for 20 students
grades <- rnorm(20, mean = 80, sd = 10)

# Get the eye color for 100 people: blue, green, or hazel
colors <- c('blue', 'green', 'hazel')
eye_colors <- sample(colors, 100, replace = TRUE)

There is a very long, straight highway with some number of cars (N) placed somewhere along it, randomly. The highway is only one lane, so the cars can’t pass each other. Each car is going in the same direction, and each driver has a distinct positive speed at which she prefers to travel. Each preferred speed is chosen at random. Each driver travels at her preferred speed unless she gets stuck behind a slower car, in which case she remains stuck behind the slower car. On average, how many groups of cars will eventually form? (A group is one or more cars traveling at the same speed.)

For example, if the car in the very front happens to be slowest, there will be exactly one group — everybody will eventually pile up behind the slowpoke. If the cars happen to end up in order, fastest to slowest, there will be N groups — no car ever gets stuck behind a slower car.

Back to our riddle...

Initial Steps

Simulate car speeds

Write a function to test if a car creates a new group

# Simulate 100 cars w/mean speed 50
cars <- rnorm(n = 100, mean = 50, sd = 5)

# A function to determine if a car is slower than all of the cars
# in front of it (which createa a new group of cars **behind** it)
slower_than <- function(index) {
  return(cars[index] < min(cars[1:index - 1]))
}

Apply the function to the list of indicies

# Apply the slower_than function to all of the cars
new_group <- lapply(2:length(cars), slower_than)

Initial Steps

# Simulate 100 cars w/mean speed 50
cars <- rnorm(n = 1000, mean = 50, sd = 5)

# A function to determine if a car is slower than all of the cars
# in front of it (which createa a new group of cars **behind** it)
slower_than <- function(index) {
  return(cars[index] < min(cars[1:index - 1]))
}

# Apply the slower_than function to all of the cars
new_group <- lapply(2:length(cars), slower_than)

# Determine number of groups created
groups <- length(new_group[new_group == TRUE]) + 1

Better yet....

simulate_groups <- function() {
    # Simulate 100 cars w/mean speed 50
    cars <- rnorm(n = 1000, mean = 50, sd = 5)
    
    # A function to determine if a car is slower than all of the cars
    # in front of it (which createa a new group of cars **behind** it)
    slower_than <- function(index) {
      return(cars[index] < min(cars[1:index - 1]))
    }
    
    # Apply the slower_than function to all of the cars
    new_group <- lapply(2:length(cars), slower_than)
    
    # Determine number of groups created
    groups <- length(new_group[new_group == TRUE]) + 1
    return(groups)
}

Loops

A foundational programming structure

Execute the same block of code multiple times

Pass a different element into the block of code in each iteration

items <- 1:10

# Loop through each element in your vector `items`
for(i in items) {
    print(2*i) # i takes on the identity of each element in the vector `items`
}

Repeating your simulation

repeat_simulation <- function(num_sims) {
  # Create an empty vector to store your results
  results <- vector()
  
  # Run your simulation 100 times, and track your results
  for(i in 1:num_sims) {  
    results <- c(results, simulate_groups())
  }
  # Work with your results
  hist(results)
  return(mean(results))
}

Parameterizing your simulation

simulate_groups <- function(mean, sd, num_cars) {
    # Simulate 100 cars w/mean speed 50
    cars <- rnorm(n = num_cars, mean = mean, sd = sd)
    
    # A function to determine if a car is slower than all of the cars
    # in front of it (which createa a new group of cars **behind** it)
    slower_than <- function(index) {
      return(cars[index] < min(cars[1:index - 1]))
    }
    
    # Apply the slower_than function to all of the cars
    new_group <- lapply(2:length(cars), slower_than)
    
    # Determine number of groups created
    groups <- length(new_group[new_group == TRUE]) + 1
    return(groups)
}

Let's make it more interesting...

Create an interface: ui.R

# Create UI
shinyUI(fluidPage(
  # UI for the traffic simulation
  titlePanel('Traffic Simulation'),
  
  # Controls
  sidebarLayout(
    sidebarPanel(  
      sliderInput("num_cars", "Number of Cars:", 
                  min = 10, max = 1000, value = 100, step = 10),
      sliderInput("num_sims", "Iterations of Simulation", 
                  min = 10, max = 1000, value = 100, step= 10),
      sliderInput("speed", "Average Speed", 
                  min = 10, max = 150, value = 40, step= 5),
      sliderInput("deviation", "Speed Deviation", 
                  min = 1, max = 20, value = 5, step= 1)
    ),
    # Render plot
    mainPanel(
      plotOutput("histogram")
    )
  )
))

Run your simulation: server.R

# Run the traffic simulation
source('traffic_sim.R')
shinyServer(function(input, output) {
  output$histogram <- renderPlot({
    repeat_simulation(
        num_sims = input$num_sims, 
        mean = input$speed, 
        sd = input$deviation, 
        num_cars = input$num_cars
    )
  })
})

Simulation

What?

Why?

How?

{what is a simulation?}

Simulation

The imitation of the operation of a real-world process or system over time (wiki)

A way to model random events, such that simulated outcomes closely match real-world outcomes (source)

Using random numbers to generate data with known characteristics and which follow hypothesized processes (source)

{why?}

Why simulate

You don't have any data (...yet...)

It's easier than computation (...sometimes....)

Make predictions based on statisitical models

Understand uncertainties in statistical models

{an example}

Predicting health burden...

1/10 people get the flu

It's not so bad to have the flu (.05, on a scale of 0 - 1)

How much sickness is experienced by a pop. of 1000 people?

Predicting health burden...

1/10 people get the flu, 2/10 get pneumonia

Flu is .05 disabling, pneumonia is .1 disabling

How much sickness is experienced by a pop. of 1000 people?

Flu

Pneu.

What about people that have both?

The disability of having the flu + pneu. is not the sum of their disabilities

Flu

Pneu.

Flu

Pneu.

We need to estimate disease overlap...

.... for 1,000+ diseases....

.... 1,000 times ...

.... in 22,000+ demographic groups (country/age/sex/year)

Simulation!

{how?}

Random number generators

Randomly generate binomial outcomes (0,1)

Randomly generate continuous outcomes

Randomly sample out of a vector

Back to our riddle...

Initial Steps

Simulate car speeds

Write a function to test if a car creates a new group

Apply the function to the list of indicies

Initial Steps

Better yet....

Loops

A foundational programming structure

Execute the same block of code multiple times

Pass a different element into the block of code in each iteration

Repeating your simulation

Parameterizing your simulation

Let's make it more interesting...

Create an interface: ui.R

Run your simulation: server.R

that's it!

more involved (shinyapp, code)

Assignments

Assignment-8: building applications (due Wed. 3/2)

Keep working on your final projects!

simulation

More from Michael Freeman