Tohei Yokogawa
Data Scientist at Metis
Readmission rate is one of the key indicators of hospital quality.
In 2014, Medicare fined a record number of 2,610 hospitals for having too many patients return within a month for additional treatments.
(http://kaiserhealthnews.org/news/medicare-readmissions-penalties-2015/)
Predicting readmission within 30 days matters not only to hospitals. If patients can see their readmission risk, they and their families can prepare for and help prevent unwanted readmission.
Beata Strack, Jonathan DeShazo, et al.
BioMed Research International, Volume 2014 (2014)
http://www.hindawi.com/journals/bmri/2014/781670/
'race', 'gender', 'ages', ... etc
Readmission (<30day, >30day, No)
Predict
Classification
'race', 'gender', 'ages', 'admission', 'discharge', 'admsource', 'time_in_hospital', 'payer_code', 'num_lab_procedures', 'num_procedures', 'num_medications', 'number_outpatient', 'number_emergency', 'number_inpatient', 'diag1', 'diag2', 'number_diagnoses', 'max_glu_serum', 'A1Cresult', 'insulin', 'change', 'diabetesMed'
The main focus was how well this data set can predict the occurrence of readmission to the hospital, and which features are most important for the prediction. Medical diagnoses were categorized into 40 categories based on ICD-9 codes.
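The grouping of ICD-9 codes into categories can be sketched as a lookup on the numeric part of the code. This is a minimal illustration, not the 40-category scheme used here (which is not listed): the function name `icd9_chapter` is hypothetical, the ranges are the standard ICD-9 chapter ranges, and splitting out 250.xx as "Diabetes" is an assumption based on the category names that appear in the results.

```python
def icd9_chapter(code: str) -> str:
    """Map a raw ICD-9 code string to a coarse category label (illustrative only)."""
    code = code.strip().upper()
    if not code or code == "?":
        return "Missing"
    # E and V codes are supplementary classifications, not numeric chapters.
    if code.startswith("E"):
        return "External causes"
    if code.startswith("V"):
        return "Supplementary"
    try:
        number = float(code)
    except ValueError:
        return "Unknown"
    # Assumption: diabetes (250.xx) is kept as its own category.
    if 250 <= number < 251:
        return "Diabetes"
    chapters = [
        (1, 139, "Infectious"),
        (140, 239, "Neoplasms"),
        (240, 279, "Endocrine"),
        (280, 289, "Blood"),
        (290, 319, "Mental"),
        (320, 389, "Nervous"),
        (390, 459, "Circulatory"),
        (460, 519, "Respiratory"),
        (520, 579, "Digestive"),
        (580, 629, "Genitourinary"),
        (630, 679, "Pregnancy"),
        (680, 709, "Skin"),
        (710, 739, "Musculoskeletal"),
        (740, 759, "Congenital"),
        (760, 779, "Perinatal"),
        (780, 799, "Symptoms"),
        (800, 999, "Injury"),
    ]
    for low, high, label in chapters:
        if low <= int(number) <= high:
            return label
    return "Unknown"


for raw in ["250.83", "428", "486", "V57", "E885", "?"]:
    print(raw, "->", icd9_chapter(raw))
```

A finer scheme would simply use more, narrower ranges in the same table.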
Training set (4), Validation set (1) and Testing set (5)
Random Forest
Gradient Boost
F1 score from testing data set
98% Precision for within 30 days readmission
Confusion matrix (rows: actual classes, columns: predicted classes)

Actual \ Predicted    No readm    >30 days    <30 days
No readm                  8440         865           2
>30 days                  3160        2093           1
<30 days                  1154         408         168
Precision for readmission within 30 days is 0.98, but recall (sensitivity) is low: only 168 of the 1,730 actual readmission cases are detected.
precision = TP / (TP + FP) = 168 / (168 + 3) = 0.9825
recall (sensitivity) = TP / (TP + FN) = 168 / 1730 = 0.0971
specificity = TN / (FP + TN) = (8440 + 865 + 3160 + 2093) / (3 + 8440 + 865 + 3160 + 2093) = 14558 / 14561 = 0.9998
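The one-vs-rest arithmetic behind these scores can be checked with a short sketch. It assumes the confusion matrix is read with rows as actual classes and columns as predicted classes, which is the reading consistent with 168 of 1,730 actual cases; the helper name `one_vs_rest` is illustrative.

```python
LABELS = ["No readm", ">30 days", "<30 days"]

# Rows: actual classes, columns: predicted classes (counts from the test set).
CONFUSION = [
    [8440, 865, 2],
    [3160, 2093, 1],
    [1154, 408, 168],
]


def one_vs_rest(matrix, k):
    """Collapse a multi-class confusion matrix to TP, FP, FN, TN for class k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted as something else
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


tp, fp, fn, tn = one_vs_rest(CONFUSION, LABELS.index("<30 days"))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision   = {precision:.4f}")
print(f"recall      = {recall:.4f}")
print(f"specificity = {specificity:.4f}")
print(f"F1 (<30 d)  = {f1:.4f}")
```

The high precision and specificity come from the model almost never predicting the "<30 days" class, which is also why recall is so low.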
1. num_lab_procedures: 0.0463
2. num_medications: 0.0454
3. number_inpatient: 0.0442
4. time_in_hospital: 0.0400
5. ages: 0.0391
6. number_diagnoses: 0.0365
7. num_procedures: 0.0325
8. gender_Male: 0.0239
9. number_outpatient: 0.0203
10. number_emergency: 0.0186
11. insulin_Steady: 0.0161
12. payer_code_MC: 0.0150
13. Race_Caucasian: 0.0150
14. diag2_Circulatory: 0.0142
15. medication change: 0.0137
16. admission_Urgent: 0.0118
17. diag3_Neoplasms: 0.0104
18. diag2_Diabetes: 0.0103
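A ranking like the one above can be produced from a fitted tree ensemble. The sketch below is an assumption about the workflow, not the original code: it uses scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute on randomly generated stand-in data, with `pandas.get_dummies` producing one-hot column names such as `gender_Male` and `insulin_Steady`. The importances it prints are meaningless because the labels are random.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): the real encounter records are not reproduced here.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "num_lab_procedures": rng.integers(1, 100, n),
    "num_medications": rng.integers(1, 40, n),
    "time_in_hospital": rng.integers(1, 15, n),
    "gender": rng.choice(["Male", "Female"], n),
    "insulin": rng.choice(["No", "Steady", "Up", "Down"], n),
})
y = rng.choice(["NO", ">30", "<30"], n)  # three readmission classes

# One-hot encode the categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(df, columns=["gender", "insulin"])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least important.
ranking = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
for rank, (name, score) in enumerate(ranking.items(), start=1):
    print(f"{rank}. {name}: {score:.4f}")
```

Impurity-based importances sum to 1 across all columns, so with many one-hot columns each individual score tends to be small, as in the list above.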
num_lab_procedures: Number of lab tests performed during the encounter
num_procedures: Number of procedures (other than lab tests) performed during the encounter
num_medications: Number of distinct generic names administered during the encounter
number_outpatient: Number of outpatient visits of the patient in the year preceding the encounter
number_emergency: Number of emergency visits of the patient in the year preceding the encounter
number_inpatient: Number of inpatient visits of the patient in the year preceding the encounter
number_diagnoses: Number of diagnoses entered to the system
change (change of medication): Nominal. Indicates whether there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change”
This linear regression model is not good enough to predict length of stay in hospital (R^2 = 0.341).
Several features with a significant effect on the length of stay were identified.
After eliminating several features, the best 4 features were chosen (primary diagnosis, number of lab procedures, number of medications).
time_in_hospital ~ diag1 + admsource + num_lab_procedures + num_procedures + num_medications
1. admsource[Court/Law Enforcement] 3.6739
2. diag1[BIPOLAR I DISORDER, SINGLE MANIC EPISODE, UNSPECIFIED] 2.3783
3. diag1[External causes] 2.1953
4. admsource[Transfer from critical access hospital] 1.9351
5. diag1[Mental] 1.8092
6. admsource[Transfer from a Skilled Nursing Facility (SNF)] 1.6880
7. diag1[DIABETES PERIPHERAL CIRCULATORY DISORDERS, TYPE II OR UNSPECIFIED TYPE] 1.3647
8. admsource[Transfer from a hospital] 0.8428
9. admsource[Transfer from another health care facility] 0.5842
10. diag1[Obesity BODY MASS INDEX 38.0-38.9] 0.2040
11. num_procedures 0.1817
12. num_medications 0.1307
13. diag1[PNEUMONIA ORGANISM UNSPECIFIED] 0.1130
Coefficients of noninteraction terms estimated from the final linear regression model
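A formula of this form can be fit directly with an ordinary least squares routine that accepts R-style formulas. The sketch below assumes `statsmodels` (`statsmodels.formula.api.ols`) and uses synthetic stand-in data with made-up category levels and coefficients, so its numbers do not reproduce the coefficients or the R^2 reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (assumption): category levels and effects are invented.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "diag1": rng.choice(["Circulatory", "Diabetes", "Mental"], n),
    "admsource": rng.choice(["Emergency Room", "Physician Referral"], n),
    "num_lab_procedures": rng.integers(1, 100, n),
    "num_procedures": rng.integers(0, 7, n),
    "num_medications": rng.integers(1, 40, n),
})
df["time_in_hospital"] = (
    1.0
    + 0.02 * df["num_lab_procedures"]
    + 0.18 * df["num_procedures"]
    + 0.13 * df["num_medications"]
    + rng.normal(0, 2, n)  # noise
)

# Categorical columns (diag1, admsource) are dummy-coded automatically by the formula.
formula = (
    "time_in_hospital ~ diag1 + admsource + num_lab_procedures "
    "+ num_procedures + num_medications"
)
fit = smf.ols(formula, data=df).fit()

print(f"R^2 = {fit.rsquared:.3f}")
print(fit.params.sort_values(ascending=False))
```

Each categorical level gets its own coefficient relative to a reference level, which is why the table above lists entries such as `admsource[...]` and `diag1[...]` alongside the numeric terms.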
The primary and secondary diagnosis data of 101,766 patients were visualized.