Predicting Surgical Site Infection (SSI)
Where are we, and how did we get here?
Rebecca Barter
A roadmap
Making the raw data useable
Understanding the data
Obtaining more data
Modeling approaches
Making the raw data useable
We finally got data!!




Data was split across
- one file per year
- multiple sheets within each Excel file
for multiple types of data:
- Labs
- Medications
- Previous diagnoses
- Problem list
- Vitals
- Denominator (patient info, surgery info)
Total: 26 Excel files, each with multiple sheets
Defining complete datasets
Wrote an R script (01_combine_year_separated_data.R) that automatically combined the data across years

Several things made this tricky...
The sheets had slightly different column names, so on the first attempt everything beyond the first sheet came through as missing
Filenames differed slightly by year:
- 2014: Prob_List.xlsx
- 2015: Prob_list.xlsx
- 2016: Problem_list.xlsx
- 2017: Problem_List.xlsx
In 2017 there were two vitals datasets: *Vitals.xlsx and *Vitals_2
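The filename and column-name problems above can be handled by mapping every variant onto one canonical name before combining. The original script was written in R; the sketch below uses Python purely for illustration. The patterns, the `canonical_type` and `normalize_column` helpers, and the row format (lists of dicts) are all assumptions, not the contents of 01_combine_year_separated_data.R.

```python
import re

# Hypothetical patterns: they cover the spelling variants listed above.
# Any other data types would need their own entries.
FILE_PATTERNS = {
    "problem_list": re.compile(r"prob(lem)?_list", re.IGNORECASE),
    "vitals": re.compile(r"vitals(_\d+)?", re.IGNORECASE),
}


def canonical_type(filename):
    """Return the canonical data type for a filename, or None if unrecognized."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension if there is one
    for data_type, pattern in FILE_PATTERNS.items():
        if pattern.search(stem):
            return data_type
    return None


def normalize_column(name):
    """Lowercase a column name and collapse spaces/punctuation to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def combine_sheets(sheets):
    """Stack rows from several sheets after harmonizing their column names.

    `sheets` is a list of sheets; each sheet is a list of dict rows.
    """
    combined = []
    for rows in sheets:
        for row in rows:
            combined.append({normalize_column(k): v for k, v in row.items()})
    return combined
```

Harmonizing names first means a stray capital letter or trailing space no longer turns a whole sheet into missing values. Column names that differ by more than case or punctuation would still need an explicit rename table.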
Making sense of the data
Identifying data oddities
- There were four patients reported with SSI whose surgeries did not appear in the main data
- Surgery times were missing for 13% of patients
  - Discussion and exploration revealed that UCD switched to an EPIC EHR system in July 2014, so no surgery times are available before July 2014
- Diagnosis codes in the same column were a mixture of ICD9 and ICD10
- Mistakes in definitions in the codebook
- Lab values that are >10000 but should be in [0, 20]
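Two of these oddities lend themselves to automated checks. The sketch below flags lab values outside a plausible range and makes a rough guess at whether a diagnosis code is ICD-9 or ICD-10. Both functions are hypothetical helpers, not part of the project code. The ICD heuristic assumes codes keep their decimal point and is deliberately conservative: V codes exist in both systems with overlapping formats, so they are reported as ambiguous.

```python
import re


def flag_out_of_range(values, low, high):
    """Return the positions of non-missing values outside [low, high]."""
    return [
        i for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]


def guess_icd_version(code):
    """Heuristic guess at the coding system of a dotted diagnosis code."""
    code = code.strip().upper()
    # ICD-9: three digits, optionally followed by one or two decimals.
    if re.fullmatch(r"\d{3}(\.\d{1,2})?", code):
        return "ICD9"
    # ICD-9 external-cause codes: E plus three digits before the dot.
    if re.fullmatch(r"E\d{3}(\.\d)?", code):
        return "ICD9"
    # V plus two digits occurs in both systems, so do not guess.
    if re.fullmatch(r"V\d{2}(\.[0-9A-Z]{1,4})?", code):
        return "ambiguous"
    # ICD-10: letter, digit, alphanumeric, then up to four more characters.
    if re.fullmatch(r"[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?", code):
        return "ICD10"
    return "unknown"
```

For example, `flag_out_of_range(lab_values, 0, 20)` returns the rows to inspect by hand; whether those values are unit errors or data-entry mistakes is a question for the data provider, not something the code can decide.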
Understanding ID variables
PATNUM: patient
ADMISSION_ENCNUM: hospitalization + surgery
ENCNUM: hospitalization
PROCID: procedure undertaken
So what is the unit of interest?
ADMISSION_ENCNUM
[Figure: ID hierarchy diagram. Nodes shown: PATNUM, ADMISSION_ENCNUM, three ENCNUMs (each with labs, meds, diagnoses, vitals), and two PROCIDs (with surgery info).]
What ID should we use to join data?
PATNUM
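A minimal sketch of what joining on PATNUM looks like when the unit of interest is the ADMISSION_ENCNUM: every surgical admission picks up all lab records for the same patient, whichever ENCNUM they were recorded under. The record layout (dicts keyed by the ID names above) and the `attach_to_surgeries` helper are assumptions for illustration.

```python
from collections import defaultdict


def index_by_patient(records):
    """Group records into lists keyed by their PATNUM."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["PATNUM"]].append(rec)
    return by_patient


def attach_to_surgeries(surgeries, labs):
    """Build one entry per ADMISSION_ENCNUM holding the patient's lab records.

    `surgeries` has one dict per surgical admission; `labs` has one dict per
    lab result. Both carry a PATNUM field.
    """
    labs_by_patient = index_by_patient(labs)
    joined = {}
    for surgery in surgeries:
        patnum = surgery["PATNUM"]
        joined[surgery["ADMISSION_ENCNUM"]] = {
            "PATNUM": patnum,
            # .get avoids creating empty entries for patients without labs
            "labs": labs_by_patient.get(patnum, []),
        }
    return joined
```

A patient with several surgical admissions would receive the same labs on each admission in this sketch, so a later step still has to filter by time relative to each surgery.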
Obtaining supplementary information
Collecting more data
Surgeon characteristics (age & experience)
Surgical category
Elixhauser categories of diagnoses
Medical definitions of normal lab ranges by gender
Medication categories
(Hopefully) medication dispensing time
Deciding on the modeling approach
Traditional modeling approaches
| Surgical encounter ID | Procedure | Average serum creatinine 1 week pre-surgery | Minimum white blood cell count 1 week pre-surgery | Maximum temperature 2 days pre-surgery | Antibiotics prescribed 1 day pre-surgery | SSI |
|---|---|---|---|---|---|---|
| 1 | Cardiac surgery | 0.4 | 7.9 | 98.2 | TRUE | FALSE |
| 2 | Colon surgery | 3.3 | 12.1 | 99.1 | FALSE | FALSE |
| 3 | Spinal fusion | 2.1 | 6.8 | 102.3 | FALSE | TRUE |
| 4 | Cesarean section | 1.4 | 8.3 | 97.4 | FALSE | FALSE |
| 5 | Abdominal surgery | 2.0 | 8.1 | 98.9 | TRUE | FALSE |
| 6 | Rectal surgery | 2.9 | 10.0 | 100.2 | FALSE | FALSE |
| 7 | Colon surgery | 1.5 | 12.5 | 101.1 | FALSE | TRUE |
Logistic regression, random forest, etc.
Need to have well-defined predictor variables
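Predictors such as "average serum creatinine 1 week pre-surgery" have to be computed from timestamped measurements relative to each surgery time. The sketch below shows one way to do that. The `(timestamp, value)` input format and both helper names are assumptions, and it needs a surgery time, which the slides note is unavailable before July 2014.

```python
from datetime import datetime, timedelta


def window_values(measurements, surgery_time, days_before):
    """Return values measured in the `days_before` days leading up to surgery.

    `measurements` is a list of (timestamp, value) pairs. Measurements taken
    at or after the surgery time are excluded.
    """
    start = surgery_time - timedelta(days=days_before)
    return [
        value for when, value in measurements
        if start <= when < surgery_time
    ]


def summarize(values, how):
    """Reduce a list of values to one predictor; None if nothing was measured."""
    if not values:
        return None
    if how == "mean":
        return sum(values) / len(values)
    if how == "min":
        return min(values)
    if how == "max":
        return max(values)
    raise ValueError(f"unknown summary: {how}")
```

Usage would look like `summarize(window_values(creatinine, surgery_time, 7), "mean")`. Returning None for an empty window keeps "never measured" distinct from any real value, which matters because a missing lab can itself be informative.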
Deep Learning modeling approaches

train/ssi/33098396_19385283.txt

Nguyen et al. (2016), Deepr
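Deepr treats a patient record as a "sentence" of discrete codes, with special tokens marking the time gap between visits, so that no predictor variables need to be defined by hand. The sketch below illustrates that idea only. The gap bins, the token names and the `record_to_sentence` helper are assumptions, and nothing here is known about the actual contents of the text files used in this project.

```python
from datetime import date


def gap_token(days):
    """Map a gap in days to a coarse time-gap token (assumed bins, in months)."""
    months = days / 30.0
    if months <= 1:
        return "0-1m"
    if months <= 3:
        return "1-3m"
    if months <= 6:
        return "3-6m"
    if months <= 12:
        return "6-12m"
    return "12+m"


def record_to_sentence(visits):
    """Turn a list of (date, [codes]) visits into one space-separated string.

    Visits are sorted by date, and a gap token is inserted between
    consecutive visits.
    """
    tokens = []
    previous = None
    for when, codes in sorted(visits, key=lambda visit: visit[0]):
        if previous is not None:
            tokens.append(gap_token((when - previous).days))
        tokens.extend(codes)
        previous = when
    return " ".join(tokens)
```

Each patient or surgical encounter would then become one such sentence, which a sequence model can consume directly.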
"Normal profile" idea

A journey towards predicting infection
By Rebecca Barter