Preprocessing
Assumption
Values do not change much during a week.
Give more importance to days just before the OP.
Week's Missing Values
Tests' results of a week
[[ nan, 14. , nan],
 [ 6.8, 9.2, nan],
 [ nan, 138. , nan],
 [ 143. , 165. , nan],
 [ nan, 3.5, nan],
 [ nan, nan, nan],
 [ nan, 53. , nan],
 [ nan, 216. , nan],
 [ nan, 393. , nan],
 [ 21. , 34. , nan],
 [ nan, 240. , nan],
 [ nan, 124. , nan],
 [ nan, 312. , nan],
 [ 489. , 1024. , nan],
 [ nan, nan, nan]]
+
week average
[[ 14. ],
 [ 8. ],
 [138. ],
 [154. ],
 [ 3.5],
 [ nan],
 [ 53. ],
 [216. ],
 [393. ],
 [ 27.5],
 [240. ],
 [124. ],
 [312. ],
 [756.5],
 [ nan]]
=
[[ 14. , 14. , nan],
 [ 6.8, 9.2, nan],
 [ 138. , 138. , nan],
 [ 143. , 165. , nan],
 [ 3.5, 3.5, nan],
 [ nan, nan, nan],
 [ 53. , 53. , nan],
 [ 216. , 216. , nan],
 [ 393. , 393. , nan],
 [ 21. , 34. , nan],
 [ 240. , 240. , nan],
 [ 124. , 124. , nan],
 [ 312. , 312. , nan],
 [ 489. , 1024. , nan],
 [ nan, nan, nan]]
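The week-average fill can be sketched roughly as follows. This is a minimal illustration, not the original pipeline: it uses plain Python lists instead of the arrays shown on the slide, and the helper names `week_average` and `fill_first_column` are hypothetical. It reproduces only what the example shows, namely that a missing first-column value is replaced by the mean of the row's observed values while the other columns are left untouched; whether other columns are ever filled is not stated.

```python
import math

def week_average(row):
    """Mean of the observed (non-NaN) values of one test, or NaN if none exist."""
    observed = [v for v in row if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else math.nan

def fill_first_column(week):
    """Return a copy of `week` whose missing first-column values are replaced
    by the row's week average (assumption: only this column is filled)."""
    filled = []
    for row in week:
        new_row = list(row)
        if math.isnan(new_row[0]):
            new_row[0] = week_average(row)
        filled.append(new_row)
    return filled

nan = math.nan
week = [
    [nan, 14.0, nan],     # one observed value: average is that value
    [6.8, 9.2, nan],      # first column present: row is left unchanged
    [143.0, 165.0, nan],
    [nan, nan, nan],      # no observation at all: stays missing
]
averages = [week_average(row) for row in week]
filled = fill_first_column(week)
```

A row with no observation at all keeps its missing values, which matches the two all-`nan` rows in the example.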
Week's Missing Values: Technicalities
If the week is empty
[[nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan]]
Mark the first day of the week
[[-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan]]
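A possible way to express the empty-week rule is sketched below. The function name `mark_empty_week` and the constant `EMPTY_WEEK_MARKER` are hypothetical; the only facts taken from the example are the value -1 and that it is written into the first column of every row when the whole week has no observation.

```python
import math

EMPTY_WEEK_MARKER = -1.0  # marker value shown in the example above

def mark_empty_week(week):
    """If every entry of the week is missing, put the marker in the first
    column of each row; otherwise return an unchanged copy."""
    week_is_empty = all(math.isnan(v) for row in week for v in row)
    if week_is_empty:
        return [[EMPTY_WEEK_MARKER] + list(row[1:]) for row in week]
    return [list(row) for row in week]

nan = math.nan
empty_week = [[nan, nan, nan], [nan, nan, nan]]
partial_week = [[nan, 14.0, nan], [nan, nan, nan]]

marked = mark_empty_week(empty_week)
untouched = mark_empty_week(partial_week)
```

The check is made on the whole week rather than per row, since the earlier example keeps individual all-`nan` rows as they are when other tests of that week have values.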
Data Augmentation
Keep exactly one test per week.
Create instances out of all the possible combinations.
[Figure: timeline of tests (×) by days before OP; axis ticks at 0, 3, 7, 14, 28, 35, 42]
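The combination step can be illustrated with a small sketch. It assumes, hypothetically, that a test is a `(day_before_op, value)` pair and that weeks are plain 7-day bins (`day // 7`); the actual week boundaries used in the study are not given here. The helper names `group_by_week` and `augment` are illustrative only.

```python
from collections import defaultdict
from itertools import product

def group_by_week(tests):
    """Group (day_before_op, value) pairs into 7-day bins (assumed binning)."""
    weeks = defaultdict(list)
    for day, value in tests:
        weeks[day // 7].append((day, value))
    return dict(sorted(weeks.items()))

def augment(tests):
    """Build one instance per way of keeping exactly one test in each week."""
    weeks = group_by_week(tests)
    return [dict(zip(weeks.keys(), choice)) for choice in product(*weeks.values())]

# Two tests in week 0, one in week 1, two in week 2.
tests = [(1, 5.0), (3, 6.0), (9, 7.0), (15, 8.0), (20, 9.0)]
instances = augment(tests)
```

With two, one, and two tests in the three weeks, this yields 2 × 1 × 2 = 4 instances, which shows how quickly the number of instances grows for frequently tested patients.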
Problem
SSI patients are tested much more often!
Hence, they generate many more new instances!
⇒ The augmented data is very imbalanced!
Workaround
Subsample the SSI instances.
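One way to realise this workaround is random subsampling without replacement, sketched below. The function name `subsample_ssi` and its parameters are hypothetical, and the sketch assumes that the SSI group is the larger one, as described above.

```python
import random

def subsample_ssi(ssi_instances, other_instances, seed=0):
    """Draw, without replacement, as many SSI instances as there are
    non-SSI instances so that both groups end up the same size."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    n = min(len(ssi_instances), len(other_instances))
    return rng.sample(ssi_instances, n), list(other_instances)

# Toy data: 100 SSI instances against 10 non-SSI instances.
ssi = [("ssi", i) for i in range(100)]
others = [("no_ssi", i) for i in range(10)]
sampled_ssi, kept_others = subsample_ssi(ssi, others, seed=42)
```

Because each draw discards most SSI instances, a single draw may not be representative, which is a reason to repeat the draw with different seeds.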
Preprocessing: Female
Preprocessing: Male
Binary classification
Experimental Setting
SSI instances are sampled to have a balanced dataset
⇒ Experiments are repeated 10 times.
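The repetition scheme might look like the sketch below. Everything here is illustrative: `repeated_experiment` is a hypothetical helper, `evaluate` stands for whatever training and scoring routine is used, and summarising the runs as mean and sample standard deviation is an assumption about how the results are aggregated.

```python
import random
import statistics

def repeated_experiment(ssi, others, evaluate, repeats=10, base_seed=0):
    """Redraw a balanced SSI subsample `repeats` times, score each draw with
    `evaluate(sampled_ssi, others)`, and return (mean, standard deviation)."""
    scores = []
    for i in range(repeats):
        rng = random.Random(base_seed + i)  # a different draw per repetition
        sampled = rng.sample(ssi, min(len(ssi), len(others)))
        scores.append(evaluate(sampled, others))
    return statistics.mean(scores), statistics.stdev(scores)

def toy_evaluate(sampled_ssi, others):
    """Stand-in for a real classifier: returns the share of even SSI ids."""
    return sum(1 for x in sampled_ssi if x % 2 == 0) / len(sampled_ssi)

mean_score, std_score = repeated_experiment(
    list(range(100)), list(range(10)), toy_evaluate
)
```

In a real run, `toy_evaluate` would be replaced by a function that fits a model on the balanced sample and returns its score.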
Binary classification
Group    Setting    SVM              Forest
Male     SSI = 0    0.998 (0.001)    0.999 (0.000)
Male     SSI = 1    0.999 (0.001)    0.999 (0.001)
Male     SSI = 2    0.999 (0.001)    1.000 (0.000)
Female   SSI = 0    0.992 (0.002)    0.994 (0.002)
Female   SSI = 1    0.994 (0.002)    0.995 (0.002)
Female   SSI = 2    0.999 (0.001)    0.999 (0.001)
All      SSI = 0    0.998 (0.001)    0.999 (0.001)
All      SSI = 1    0.998 (0.000)    0.999 (0.000)
All      SSI = 2    0.999 (0.000)    1.000 (0.000)
Lasso
Group    Lasso Lars IC    Lasso Lars CV
Female   0.978 (0.005)    0.986 (0.003)
Male     0.931 (0.006)    0.955 (0.005)
All      0.909 (0.006)    0.931 (0.003)
Lasso