Natural Disasters:
Prediction Models for Severity of Fatalities

Ty Mulholland

The Problem

 

Over 16,000 natural disasters have been recorded since 1900.

How do countries prepare for them?

Are some countries at a disadvantage in disaster preparedness?

 

Background

  • EM-DAT: The International Disaster Database

  • Records natural, technological, and complex disasters (e.g., famine)

Strategy

Two Ensembles

1) Fatality Occurrence (Binomial: Yes/No)

  • Using caret (kNN, Rpart, SVM, treebag)

2) Fatality Levels (Multinomial: Low, High, Extreme)

  • Using h2o (Naive Bayes, randomForest, GBM, Neural Net)




Data

50 Variables (28 character, 21 integer, 1 numeric)

 

16,623 observations

 

Time Period 1900-2023

 

Only Natural Disasters

 

 

 


Challenges

  • Data is very sparse

  • Much of the data is U.S.-specific; the cost and damage columns are reported in thousands of US$:
    "Reconstruction.Costs...000.US.."
    "Reconstruction.Costs..Adjusted...000.US.."
    "Insured.Damages...000.US.."
    "Insured.Damages..Adjusted...000.US.."
    "Total.Damages...000.US.."
    "Total.Damages..Adjusted...000.US.."

  • Almost all relevant variables were character columns
    • Location: Country, Region
    • Disaster categorization: Group, Subgroup, Type, Subtype, Subsubtype
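Distance-based learners such as kNN and SVM need numeric inputs, so character columns like these usually have to be encoded first. The deck does not say which encoding was used; one common choice is one-hot (dummy) encoding, which caret offers through `dummyVars`. Below is a minimal Python sketch of the idea. The function name `one_hot` and the sample rows are hypothetical.

```python
def one_hot(rows, cols):
    """Expand each character column in `cols` into 0/1 indicator columns, one per observed level."""
    # Collect the observed levels per column (None, standing in for NA, is not a level).
    levels = {c: sorted({r[c] for r in rows if r[c] is not None}) for c in cols}
    encoded = []
    for r in rows:
        # Keep non-categorical fields unchanged.
        out = {k: v for k, v in r.items() if k not in cols}
        for c in cols:
            for level in levels[c]:
                # A missing value ends up as all zeros for that column.
                out[f"{c}={level}"] = 1 if r[c] == level else 0
        encoded.append(out)
    return encoded

# Hypothetical sample rows.
rows = [
    {"Region": "Eastern Asia", "Disaster.Type": "Earthquake", "Deaths": 3},
    {"Region": "South America", "Disaster.Type": "Flood", "Deaths": 0},
]
encoded = one_hot(rows, ["Region", "Disaster.Type"])
```

With Country alone contributing one indicator per country, this encoding widens the data considerably, which is one more reason the feature narrowing below matters.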

Approach

Feature Selection

  • Many variables were highly sparse, so narrowing the feature set was logical
     
  • Much of the data repeated a higher-level categorization (e.g., Continent -> Country)
     
  • Some variables were highly correlated, such as No.Affected and Total.Affected
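A minimal Python sketch of two of these filters: drop mostly-missing columns, then drop one column from any highly correlated numeric pair. The 50% missingness and 0.9 correlation cut-offs, the use of None for NA, the function names, and the sample rows are all assumptions for illustration, not values from the analysis. The Continent -> Country redundancy is left out because it takes domain judgment rather than a statistic.

```python
import math

def missing_fraction(rows, col):
    """Share of rows where `col` is None (standing in for R's NA)."""
    return sum(r.get(col) is None for r in rows) / len(rows)

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(rows, cols, max_missing=0.5, max_corr=0.9):
    # Step 1: drop columns that are mostly missing.
    kept = [c for c in cols if missing_fraction(rows, c) <= max_missing]
    # Step 2: keep a column only if it is not highly correlated with one already selected.
    selected = []
    for c in kept:
        redundant = False
        for s in selected:
            pairs = [(r[c], r[s]) for r in rows
                     if isinstance(r.get(c), (int, float)) and isinstance(r.get(s), (int, float))]
            if len(pairs) > 2:
                xs, ys = zip(*pairs)
                if abs(pearson(xs, ys)) > max_corr:
                    redundant = True
                    break
        if not redundant:
            selected.append(c)
    return selected

rows = [  # hypothetical sample rows; None stands in for NA
    {"No.Affected": 10, "Total.Affected": 11, "Deaths": 0, "Reconstruction.Costs...000.US..": None},
    {"No.Affected": 20, "Total.Affected": 21, "Deaths": 5, "Reconstruction.Costs...000.US..": None},
    {"No.Affected": 30, "Total.Affected": 31, "Deaths": 1, "Reconstruction.Costs...000.US..": None},
    {"No.Affected": 40, "Total.Affected": 42, "Deaths": 3, "Reconstruction.Costs...000.US..": 7},
]
selected = select_features(rows, list(rows[0]))
```

Here the cost column is dropped as too sparse and Total.Affected is dropped as redundant with No.Affected, leaving `["No.Affected", "Deaths"]`. Which member of a correlated pair survives depends on column order, a simplification a real pipeline might replace with domain preference.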
     

Imputation

  • Mode within groups, where applicable
     
  • 0 where NA meant nothing was recorded (a true zero rather than missing data)
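A minimal Python sketch of both imputation rules, assuming rows are dicts with None for NA. The choice of Country as the grouping column for Region, the column names, and the function names are hypothetical; the deck does not list which columns each rule was applied to.

```python
from collections import Counter

def impute_mode_by_group(rows, col, group_col):
    """Fill missing `col` values with the most common observed value in the same group."""
    modes = {}
    for g in {r[group_col] for r in rows}:
        observed = [r[col] for r in rows if r[group_col] == g and r[col] is not None]
        if observed:  # groups with no observed value are left untouched
            modes[g] = Counter(observed).most_common(1)[0][0]
    for r in rows:
        if r[col] is None and r[group_col] in modes:
            r[col] = modes[r[group_col]]
    return rows

def impute_zero(rows, col):
    """Treat a missing count as 'nothing recorded', i.e. zero."""
    for r in rows:
        if r[col] is None:
            r[col] = 0
    return rows

rows = [  # hypothetical sample rows; None stands in for NA
    {"Country": "Chile", "Region": "South America", "No.Injured": 4},
    {"Country": "Chile", "Region": None, "No.Injured": None},
    {"Country": "Japan", "Region": "Eastern Asia", "No.Injured": None},
]
impute_mode_by_group(rows, col="Region", group_col="Country")
impute_zero(rows, col="No.Injured")
```

Both functions modify the rows in place and also return them, so they can be chained.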

 

Variable Creation

  • "Duration" field computed from the event start and end dates
     
  • Death Level: "Low", "High", "Extreme"
     
  • Death Occurrence: "1" (yes), "0" (no)
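A minimal Python sketch of the three derived variables, assuming start and end dates are already assembled into `datetime.date` values. The death-count cut-offs for "Low", "High", and "Extreme" are hypothetical placeholders, since the deck does not state them, and so are the function names.

```python
from datetime import date

# Hypothetical cut-offs: the deck does not state where the levels split.
LOW_MAX_DEATHS = 100
HIGH_MAX_DEATHS = 1000

def duration_days(start, end):
    """Event duration in days, counting both start and end dates (a one-day event is 1)."""
    return (end - start).days + 1

def death_level(deaths):
    """Bin a death count into the three levels used as Model 2's target."""
    if deaths <= LOW_MAX_DEATHS:
        return "Low"
    if deaths <= HIGH_MAX_DEATHS:
        return "High"
    return "Extreme"

def death_occurrence(deaths):
    """Model 1's target: "1" if the event caused any deaths, else "0"."""
    return "1" if deaths > 0 else "0"

span = duration_days(date(2023, 1, 30), date(2023, 2, 2))
```

Counting inclusively is a design choice; without the `+ 1`, single-day events would get a duration of 0.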

Model 1: Fatality Occurrence

Package: caret

Methods: kNN, Rpart, SVM, treebag

Ensemble: Stacked
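The deck does not say how the stack was built. As an illustration only, stacking fits a meta-learner on out-of-fold predictions from the base learners, then refits the base learners on all rows. This Python sketch uses two toy base learners as stand-ins for kNN and Rpart (a one-split tree) and a logistic-regression meta-learner. All function names, the fold count, the learning rate, and the sample data are hypothetical.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_knn(X, y, k=3):
    """Toy kNN stand-in: P(fatality) = share of positives among the k nearest rows."""
    data = list(zip(X, y))

    def predict(x):
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[:k]
        return sum(label for _, label in nearest) / len(nearest)

    return predict

def fit_stump(X, y):
    """Toy Rpart stand-in: a one-split tree choosing the split with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            err = min(sum(left), len(left) - sum(left)) + min(sum(right), len(right) - sum(right))
            if best is None or err < best[0]:
                best = (err, j, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # every feature is constant: fall back to the base rate
        rate = sum(y) / len(y)
        return lambda x: rate
    _, j, t, p_left, p_right = best
    return lambda x: p_left if x[j] <= t else p_right

def sigmoid(w, x):
    """Logistic function of w.x + bias, with the bias stored as w[-1]."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def fit_stacked(X, y, learners, k=5):
    """Stacking: train the meta-learner on out-of-fold base-model predictions."""
    n = len(X)
    meta_X = [[0.0] * len(learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for m, fit in enumerate(learners):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta_X[i][m] = model(X[i])
    meta_w = fit_logistic(meta_X, y)
    base_models = [fit(X, y) for fit in learners]  # refit each base learner on all rows
    return lambda x: sigmoid(meta_w, [model(x) for model in base_models])

# Hypothetical one-feature data: low values never fatal, high values fatal.
X = [[i / 100] for i in range(20)] + [[0.8 + i / 100] for i in range(20)]
y = [0] * 20 + [1] * 20
predict = fit_stacked(X, y, [fit_knn, fit_stump])
```

Using out-of-fold predictions keeps the meta-learner from simply trusting whichever base model memorized the training rows best.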

Accuracy: 75.9%, 77.01%, 76.55%, 78.13%

Model                     Accuracy
kNN                       77.12%
Rpart                     76.5%
SVM                       75.11%
Bagging Tree (Treebag)    76%
Stacked Ensemble          78.96%

Model 2: Fatality Level

Package: h2o

Methods: Naive Bayes, randomForest, GBM, Neural Net

Ensemble: Stacked

Model                Log-loss
Naive Bayes          0.4118196
Random Forest        0.009557498
Gradient Boosting    0.001596376
Neural Net           0.0766861
Stacked Ensemble     0.0009741652
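Log-loss, the metric reported for Model 2, is the mean negative log of the probability a model assigns to each row's true fatality level: LogLoss = -(1/N) * sum over i of log p_i(y_i). Lower is better. A score of 0 means every true level got probability 1, and guessing uniformly over the three levels scores ln 3, about 1.0986. Here is a minimal Python sketch; the dict-based input format, the clipping constant `eps`, and the function name are assumptions for illustration.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to each row's true class.

    y_true: list of class labels, e.g. "Low", "High", "Extreme"
    probs:  list of dicts mapping each class label to its predicted probability
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip so a zero probability on the true class gives a large but finite penalty.
        total -= math.log(max(p[label], eps))
    return total / len(y_true)

uniform = {"Low": 1 / 3, "High": 1 / 3, "Extreme": 1 / 3}
baseline = multiclass_log_loss(["Low", "Extreme"], [uniform, uniform])
```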



Key Findings

  • The Fatality Occurrence ensemble needs more tuning
     
  • Both models could benefit from more data, such as population density, GDP, and geographic features of event locations
     
  • Modeling trends over time would be a logical next step

Ty Mulholland

ty@tymulholland.com

@tymulholland

THANK YOU!
