Patients readmission prediction
Tohei Yokogawa
Data Scientist at Metis
Patients Readmission
The readmission rate is one of the key indicators of hospital quality.
In 2014, Medicare fined a record number of 2,610 hospitals for having too many patients return within a month for additional treatments.
(http://kaiserhealthnews.org/news/medicare-readmissions-penalties-2015/)
Predicting readmission within 30 days matters not only to hospitals. If patients can see their readmission risk, they and their families can prepare and help prevent unwanted readmissions.
Predicting readmission
- Readmitted within 30 days of discharge
- Readmitted after 30 days of discharge
- No readmission record (1999–2008)
Data Source
- 101,766 patient hospitalization records.
- Representing 10 years (1999–2008) of clinical care at 130 hospitals.
Beata Strack, Jonathan DeShazo, et al.
BioMed Research International, Volume 2014 (2014)
http://www.hindawi.com/journals/bmri/2014/781670/
'race', 'gender', 'ages', ... etc
101,766 patients, 24 features
Readmission (<30day, >30day, No)
Predict
Classification
'race', 'gender', 'ages', 'admission', 'discharge', 'admsource', 'time_in_hospital', 'payer_code', 'num_lab_procedures', 'num_procedures', 'num_medications', 'number_outpatient', 'number_emergency', 'number_inpatient', 'diag1', 'diag2', 'number_diagnoses', 'max_glu_serum', 'A1Cresult', 'insulin', 'change', 'diabetesMed'
24 features
Cleaning data
The main focus was how well this data set can predict the occurrence of readmission to the hospital, and which features are most important for the prediction. Medical diagnoses were grouped into 40 categories based on ICD-9 codes.
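The grouping of ICD-9 codes can be sketched as below. This is only an illustration: the function name `icd9_category`, the handful of chapter-level groups, and the use of "?" as a missing-value marker are assumptions, not the 40 categories actually used in this project. The numeric ranges follow the standard ICD-9 chapters.

```python
def icd9_category(code: str) -> str:
    """Map a raw ICD-9 code string to a coarse, chapter-level category.

    Hypothetical helper: the slides do not list the 40 categories used,
    so only a few standard ICD-9 chapters are shown here.
    """
    code = code.strip().upper()
    if not code or code == "?":  # assumed missing-value marker
        return "Missing"
    if code.startswith("E"):     # E codes: external causes of injury
        return "External causes"
    if code.startswith("V"):     # V codes: supplementary classification
        return "Supplementary"

    number = float(code)
    if 250 <= number < 251:      # 250.xx: diabetes mellitus
        return "Diabetes"

    # (low inclusive, high exclusive, category name)
    ranges = [
        (140, 240, "Neoplasms"),
        (290, 320, "Mental"),
        (390, 460, "Circulatory"),
        (460, 520, "Respiratory"),
        (520, 580, "Digestive"),
        (580, 630, "Genitourinary"),
        (710, 740, "Musculoskeletal"),
        (800, 1000, "Injury"),
    ]
    for low, high, name in ranges:
        if low <= number < high:
            return name
    return "Other"


for raw in ["250.83", "428", "E909", "?"]:
    print(raw, "->", icd9_category(raw))
```

Collapsing hundreds of distinct codes into a few dozen groups keeps the number of one-hot columns manageable for the tree-based models used later.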
Random split 4:1:5
Training set (4), Validation set(1) and Testing set (5)
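A minimal sketch of a random 4:1:5 split, using only the Python standard library. The function name `split_4_1_5`, the fixed seed, and the 1,000 dummy records are assumptions for illustration; the slides do not say which tool was used for the split.

```python
import random


def split_4_1_5(records, seed=0):
    """Shuffle records and split them 4:1:5 into train, validation, test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n = len(shuffled)
    n_train = n * 4 // 10
    n_valid = n * 1 // 10

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # remainder, about 5/10
    return train, valid, test


# Dummy record ids stand in for the real hospitalization records.
train, valid, test = split_4_1_5(range(1000))
print(len(train), len(valid), len(test))
```

The validation set is the piece used to calibrate and choose between models, so the testing set stays untouched until the final scores are computed.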
Machine Learning
Calibrate model
Random Forest
Gradient Boost
Choosing model
F1 score from testing data set
Predicting who will be readmitted within 30 days
98% precision for readmission within 30 days
Confusion Matrix (rows: actual classes, columns: predicted classes)
Actual \ Predicted   No readm   >30 days   <30 days
No readm                 8440        865          2
>30 days                 3160       2093          1
<30 days                 1154        408        168
Precision for the <30 days class is 0.98 (168 of 171 predicted cases), but recall (sensitivity) is low: the model finds only 168 of the 1730 actual <30 days readmission cases.
recall (sensitivity) = TP/(TP+FN) = 168/1730 = 0.0971
specificity = TN/(FP+TN) = (8440+865+3160+2093)/(3+8440+865+3160+2093) = 0.9997
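The scores above can be recomputed from the confusion matrix by treating "<30 days" as the positive class and the other two classes as negative. The sketch below is an illustration only; the helper name `one_vs_rest_metrics` is an assumption, and the matrix is copied from the slide with rows as actual classes and columns as predicted classes.

```python
# Rows are actual classes, columns are predicted classes,
# both in the order: No readm, >30 days, <30 days.
confusion = [
    [8440, 865, 2],
    [3160, 2093, 1],
    [1154, 408, 168],
]


def one_vs_rest_metrics(matrix, k):
    """Return precision, recall, specificity, F1 for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted other
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actual other
    tn = total - tp - fn - fp

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f1


precision, recall, specificity, f1 = one_vs_rest_metrics(confusion, 2)
print(f"precision={precision:.4f} recall={recall:.4f} "
      f"specificity={specificity:.4f} f1={f1:.4f}")
```

Because F1 is the harmonic mean of precision and recall, the low recall pulls the F1 score for this class down even though precision is high.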
Top 18 important features
1. num_lab_procedures: 0.0463
2. num_medications: 0.0454
3. number_inpatient: 0.0442
4. time_in_hospital: 0.0400
5. ages: 0.0391
6. number_diagnoses: 0.0365
7. num_procedures: 0.0325
8. gender_Male: 0.0239
9. number_outpatient: 0.0203
10. number_emergency: 0.0186
11. insulin_Steady: 0.0161
12. payer_code_MC: 0.0150
13. Race_Caucasian: 0.0150
14. diag2_Circulatory: 0.0142
15. medication change: 0.0137
16. admission_Urgent: 0.0118
17. diag3_Neoplasms: 0.0104
18. diag2_Diabetes: 0.0103
num_lab_procedures: Number of lab tests performed during the encounter
num_procedures: Number of procedures (other than lab tests) performed during the encounter
num_medications: Number of distinct generic names administered during the encounter
number_outpatient: Number of outpatient visits of the patient in the year preceding the encounter
number_emergency: Number of emergency visits of the patient in the year preceding the encounter
number_inpatient: Number of inpatient visits of the patient in the year preceding the encounter
number_diagnoses: Number of diagnoses entered to the system
Change of medication: Indicates whether there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change”
Length of stay in hospital prediction
(multivariate regression)
This linear regression model does not predict length of stay in hospital well (R^2 = 0.341).
However, it identified several features that have a significant effect on the length of stay.
After eliminating several features, the best ones were kept, including primary diagnosis, number of lab procedures, and number of medications.
time_in_hospital ~ diag1 + admsource + num_lab_procedures + num_procedures + num_medications
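For reference, R^2 is the share of the variance in length of stay that the model explains, so 0.341 means roughly a third. A minimal sketch of the calculation follows; the function name `r_squared` and the five sample values are made up for illustration and are not taken from the data set.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up lengths of stay (days) and model predictions, illustration only.
actual_days = [1, 3, 4, 7, 10]
predicted_days = [3, 4, 4, 5, 6]
score = r_squared(actual_days, predicted_days)
print(round(score, 3))
```

A model that always predicts the mean length of stay scores 0, and a perfect model scores 1.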
TOP 13 factors for length of stay in hospital
1. admsource[Court/Law Enforcement] 3.6739
2. diag1[BIPOLAR I DISORDER, SINGLE MANIC EPISODE, UNSPECIFIED] 2.3783
3. diag1[External causes] 2.1953
4. admsource[Transfer from critical access hospital] 1.9351
5. diag1[Mental] 1.8092
6. admsource[Transfer from a Skilled Nursing Facility (SNF)] 1.6880
7. diag1[DIABETES PERIPHERAL CIRCULATORY DISORDERS, TYPE II OR UNSPECIFIED TYPE] 1.3647
8. admsource[Transfer from a hospital] 0.8428
9. admsource[Transfer from another health care facility] 0.5842
10. diag1[Obesity BODY MASS INDEX 38.0-38.9] 0.2040
11. num_procedures 0.1817
12. num_medications 0.1307
13. diag1[PNEUMONIA ORGANISM UNSPECIFIED] 0.1130
Coefficients of noninteraction terms estimated from the final linear regression model
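To read these coefficients, each categorical term adds its value once when it applies, and each numeric term adds its value per unit. The sketch below sums a few of the listed coefficients for one hypothetical encounter. The intercept, the num_lab_procedures coefficient, and the reference levels of diag1 and admsource are not given in the slides, so the result is only the partial contribution relative to an unknown baseline, not a predicted length of stay. The helper name `extra_days` and the example encounter are assumptions.

```python
# Coefficients copied from the list above
# (days per category, or days per unit for numeric features).
coefficients = {
    "admsource[Transfer from a hospital]": 0.8428,
    "diag1[Mental]": 1.8092,
    "num_procedures": 0.1817,
    "num_medications": 0.1307,
}


def extra_days(categories, counts):
    """Sum the listed coefficients for one hypothetical encounter.

    categories: names of categorical terms that apply to the encounter.
    counts: mapping from numeric feature name to its value.
    """
    total = sum(coefficients[name] for name in categories)
    total += sum(coefficients[name] * value for name, value in counts.items())
    return total


# Hypothetical encounter: transferred from a hospital, mental primary
# diagnosis, 2 procedures and 10 medications.
example = extra_days(
    categories=["admsource[Transfer from a hospital]", "diag1[Mental]"],
    counts={"num_procedures": 2, "num_medications": 10},
)
print(round(example, 4))
```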
Connection between primary diagnosis and secondary diagnosis
Primary and secondary diagnosis data from the 101,766 hospitalization records were visualized