Patient readmission prediction

Tohei Yokogawa

Data Scientist at Metis

Patient Readmission

Readmission rate is one of the key indicators hospitals use to monitor their quality of care.

 

In 2014, Medicare fined a record 2,610 hospitals for having too many patients return within a month for additional treatment.

(http://kaiserhealthnews.org/news/medicare-readmissions-penalties-2015/)

Predicting readmission within 30 days matters not only to hospitals. If patients can see their readmission risk, they and their families can prepare and help prevent unwanted readmissions.

Predicting readmission

  • Readmitted within 30 days of discharge
  • Readmitted after 30 days of discharge
  • No readmission record (between 1999-2008)

Data Source

  • 101,766 patient hospitalization records.
  • Representing 10 years (1999–2008) of clinical care at 130 hospitals.
Beata Strack, Jonathan DeShazo, et al.
BioMed Research International, Volume 2014 (2014)
http://www.hindawi.com/journals/bmri/2014/781670/

'race', 'gender', 'ages', ... etc

101,766 patients, 24 features

Readmission (<30day, >30day, No)

Predict

Classification

'race', 'gender', 'ages', 'admission', 'discharge', 'admsource', 'time_in_hospital', 'payer_code', 'num_lab_procedures', 'num_procedures', 'num_medications', 'number_outpatient', 'number_emergency', 'number_inpatient', 'diag1', 'diag2', 'number_diagnoses', 'max_glu_serum', 'A1Cresult', 'insulin', 'change', 'diabetesMed'

24 features

Cleaning data

The main focus was how well this data set can predict readmission to the hospital, and which features matter most for the prediction. Medical diagnoses were grouped into 40 categories based on ICD-9 codes.
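As a rough illustration of grouping diagnoses by ICD-9 code, the sketch below maps a code string to a coarse group using standard ICD-9 chapter ranges. The function name `icd9_group` and the particular groups are assumptions for illustration; the 40 categories actually used in this project are not listed in the slides.

```python
def icd9_group(code):
    """Map an ICD-9 code string to a coarse diagnosis group.

    Illustrative only: these ranges follow standard ICD-9 chapters and
    are NOT the 40 categories used in the slides.
    """
    code = code.strip().upper()
    if code.startswith("E"):
        return "External causes"
    if code.startswith("V"):
        return "Supplementary"
    try:
        number = int(float(code))  # "250.83" -> 250
    except (ValueError, OverflowError):
        return "Unknown"           # missing or malformed codes such as "?"
    if number == 250:
        return "Diabetes"
    chapters = [
        (140, 239, "Neoplasms"),
        (290, 319, "Mental"),
        (390, 459, "Circulatory"),
        (460, 519, "Respiratory"),
        (520, 579, "Digestive"),
        (580, 629, "Genitourinary"),
        (710, 739, "Musculoskeletal"),
        (800, 999, "Injury"),
    ]
    for low, high, name in chapters:
        if low <= number <= high:
            return name
    return "Other"


if __name__ == "__main__":
    for example in ["250.83", "428", "486", "E909", "V57", "?"]:
        print(example, "->", icd9_group(example))
```

Grouping codes this way keeps the number of one-hot columns small enough for tree models and regression to handle.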

Random split 4:1:5

Training set (4), validation set (1), and testing set (5)
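A minimal sketch of a 4:1:5 random split, using only the standard library. The helper name `split_indices` and the seed are assumptions; the slides do not say how the split was implemented.

```python
import random


def split_indices(n_rows, ratios=(4, 1, 5), seed=0):
    """Shuffle row indices and cut them into train/validation/test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = n_rows * ratios[0] // total
    n_valid = n_rows * ratios[1] // total
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]    # remainder goes to the test set
    return train, valid, test


# 101,766 is the record count quoted for the data source.
train_idx, valid_idx, test_idx = split_indices(101_766)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The validation part is used to tune the models, while the test part is held back for the final comparison.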


Machine Learning

Calibrate model

Random Forest

Gradient Boost

Choosing model

F1 score from testing data set

Predicting who will be readmitted within 30 days

98% Precision for within 30 days readmission

Confusion Matrix (rows: actual classes, columns: predicted classes)

Actual \ Predicted    No readm    >30 days    <30 days
No readm                  8440         865           2
>30 days                  3160        2093           1
<30 days                  1154         408         168

Precision is 0.98 for the model, but recall (sensitivity) is low: only 168 of the 1,730 actual within-30-day readmissions were detected.

recall (sensitivity) = TP/(TP+FN) = 168/1730 = 0.0971

specificity = TN/(FP+TN) = (8440+865+3160+2093)/(3+8440+865+3160+2093) = 0.9998
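The metrics above can be recomputed from the confusion matrix by treating "<30 days" as the positive class and the other two classes as negative. This is a sketch in plain Python; the helper name `one_vs_rest` is an assumption, and the F1 line simply applies the standard formula to the same counts.

```python
# Confusion matrix from the slide: rows are actual classes, columns are predicted classes.
labels = ["No readm", ">30 days", "<30 days"]
matrix = [
    [8440, 865, 2],
    [3160, 2093, 1],
    [1154, 408, 168],
]


def one_vs_rest(matrix, k):
    """Return (TP, FP, FN, TN) for class k against all other classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted something else
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


tp, fp, fn, tn = one_vs_rest(matrix, labels.index("<30 days"))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"specificity={specificity:.4f} f1={f1:.4f}")
```

High precision with low recall means the model rarely raises a false alarm for early readmission but misses most of the patients who do return within 30 days.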

Top 18 important features

1. num_lab_procedures: 0.0463
2. num_medications: 0.0454
3. number_inpatient: 0.0442
4. time_in_hospital: 0.0400
5. ages: 0.0391
6. number_diagnoses: 0.0365
7. num_procedures: 0.0325
8. gender_Male: 0.0239
9. number_outpatient: 0.0203
10. number_emergency: 0.0186
11. insulin_Steady: 0.0161
12. payer_code_MC: 0.0150
13. Race_Caucasian: 0.0150
14. diag2_Circulatory: 0.0142
15. medication change: 0.0137
16. admission_Urgent: 0.0118
17. diag3_Neoplasms: 0.0104
18. diag2_Diabetes: 0.0103

num_lab_procedures: Number of lab tests performed during the encounter
num_procedures: Number of procedures (other than lab tests) performed during the encounter
num_medications: Number of distinct generic names administered during the encounter
number_outpatient: Number of outpatient visits of the patient in the year preceding the encounter
number_emergency: Number of emergency visits of the patient in the year preceding the encounter
number_inpatient: Number of inpatient visits of the patient in the year preceding the encounter
number_diagnoses: Number of diagnoses entered to the system
Change of medication: Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change”

 

Length of stay in hospital prediction

(multivariate regression)

 

This linear regression model does not predict length of stay in hospital well (R^2 = 0.341).
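For readers unfamiliar with R^2, it is the share of variance in the outcome that the model explains, so a value of 0.341 leaves about two thirds of the variation in length of stay unexplained. A minimal sketch of the calculation follows; the function name `r_squared` and the toy numbers are made up for illustration and are not taken from the data set.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Toy values (hypothetical lengths of stay in days), not from the slides.
observed = [1, 2, 3, 4, 5]
predicted = [2, 2, 3, 4, 4]
print(r_squared(observed, predicted))
```

A model that always predicts the mean length of stay scores 0 on this measure, and a perfect model scores 1.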

HOWEVER

Several features were identified that have a significant effect on the length of stay.

After eliminating several features, the final model kept five predictors (primary diagnosis, admission source, number of lab procedures, number of procedures, number of medications):

time_in_hospital ~ diag1  + admsource + num_lab_procedures + num_procedures  + num_medications

TOP 13 factors for length of stay in hospital


1. admsource[Court/Law Enforcement]: 3.6739
2. diag1[BIPOLAR I DISORDER, SINGLE MANIC EPISODE, UNSPECIFIED]: 2.3783
3. diag1[External causes]: 2.1953
4. admsource[Transfer from critical access hospital]: 1.9351
5. diag1[Mental]: 1.8092
6. admsource[Transfer from a Skilled Nursing Facility (SNF)]: 1.6880
7. diag1[DIABETES PERIPHERAL CIRCULATORY DISORDERS, TYPE II OR UNSPECIFIED TYPE]: 1.3647
8. admsource[Transfer from a hospital]: 0.8428
9. admsource[Transfer from another health care facility]: 0.5842
10. diag1[Obesity BODY MASS INDEX 38.0-38.9]: 0.2040
11. num_procedures: 0.1817
12. num_medications: 0.1307
13. diag1[PNEUMONIA ORGANISM UNSPECIFIED]: 0.1130

 

Coefficients of noninteraction terms estimated from the final linear regression model

Connection between primary diagnosis and secondary diagnosis

Primary and secondary diagnosis data for the 101,766 patient records were visualized.
