Eurostars


Preprocessing
Assumption
Values do not change much during a week.
Give more importance to days just before the OP.

Week's Missing Values
Tests' results of a week
[[ nan, 14. , nan],
 [ 6.8, 9.2, nan],
 [ nan, 138. , nan],
 [ 143. , 165. , nan],
 [ nan, 3.5, nan],
 [ nan, nan, nan],
 [ nan, 53. , nan],
 [ nan, 216. , nan],
 [ nan, 393. , nan],
 [ 21. , 34. , nan],
 [ nan, 240. , nan],
 [ nan, 124. , nan],
 [ nan, 312. , nan],
 [ 489. , 1024. , nan],
 [ nan, nan, nan]]
+
week average
[[ 14. ],
 [ 8. ],
 [138. ],
 [154. ],
 [ 3.5],
 [ nan],
 [ 53. ],
 [216. ],
 [393. ],
 [ 27.5],
 [240. ],
 [124. ],
 [312. ],
 [756.5],
 [ nan]]
=
[[ 14. , 14. , nan],
 [ 6.8, 9.2, nan],
 [ 138. , 138. , nan],
 [ 143. , 165. , nan],
 [ 3.5, 3.5, nan],
 [ nan, nan, nan],
 [ 53. , 53. , nan],
 [ 216. , 216. , nan],
 [ 393. , 393. , nan],
 [ 21. , 34. , nan],
 [ 240. , 240. , nan],
 [ 124. , 124. , nan],
 [ 312. , 312. , nan],
 [ 489. , 1024. , nan],
 [ nan, nan, nan]]
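The week-average fill can be sketched in NumPy as follows. This is a minimal sketch, not the deck's actual code: it assumes rows are tests and columns are the days of one week, and it assumes that a day with no test at all (an all-NaN column) is left untouched, which is one reading consistent with the result shown on the slide. The name `fill_with_week_average` is hypothetical.

```python
import warnings

import numpy as np


def fill_with_week_average(week):
    """Fill missing test values with the test's average over the week.

    Assumption: rows are tests, columns are days of the same week.
    Assumption: days on which no test was taken (all-NaN columns) stay NaN.
    """
    week = np.asarray(week, dtype=float)
    with warnings.catch_warnings():
        # np.nanmean warns on all-NaN rows; the resulting NaN is what we want.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        avg = np.nanmean(week, axis=1, keepdims=True)
    day_has_test = ~np.isnan(week).all(axis=0)
    fill_mask = np.isnan(week) & day_has_test
    return np.where(fill_mask, avg, week), avg


nan = float("nan")
week = [[nan, 14.0, nan],
        [6.8, 9.2, nan],
        [nan, nan, nan]]
filled, avg = fill_with_week_average(week)
print(avg)
print(filled)
```

A test that was never taken during the week has no average, so its row stays entirely NaN.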
Week's Missing Values: Technicalities
If the week is empty
[[nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan],
 [nan, nan, nan]]
Mark the first day of the week
[[-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan],
 [-1., nan, nan]]
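A minimal sketch of the empty-week rule, assuming the same layout (rows are tests, columns are days) and that -1 is the marker value, as in the array above. The function name `mark_empty_week` is hypothetical.

```python
import numpy as np


def mark_empty_week(week, marker=-1.0):
    """Return a copy of `week`; if every value is NaN, set the first day to `marker`."""
    week = np.array(week, dtype=float)  # np.array copies, so the input is not modified
    if np.isnan(week).all():
        week[:, 0] = marker
    return week


empty = np.full((15, 3), np.nan)
print(mark_empty_week(empty)[:2])
```

Weeks that contain at least one value are returned unchanged.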
Data Augmentation
Keep exactly one test per week.
Create instances out of all the possible combinations.
[Figure: timeline of tests (×) by days before OP; axis ticks at 0, 3, 7, 14, 28, 35, 42]
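The augmentation rule (keep exactly one test per week, then build every possible combination) amounts to a Cartesian product over the weeks. The sketch below is an illustration under assumptions: the input layout (one list of candidate tests per week) and the name `augment` are hypothetical, and a week without any test is represented by a single `None` placeholder.

```python
from itertools import product


def augment(tests_by_week):
    """Build one instance per way of keeping exactly one test in each week.

    tests_by_week: list with one entry per week, each a list of candidate tests.
    Assumption: a week with no test contributes a single None placeholder.
    """
    choices = [week if week else [None] for week in tests_by_week]
    return [list(combo) for combo in product(*choices)]


patient = [["mon", "thu"], ["tue"], []]
for instance in augment(patient):
    print(instance)
```

The number of instances is the product of the per-week test counts, so a patient tested twice in each of three weeks already yields 8 instances.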
Problem
SSI people are tested much more often!
Hence, they yield many more new instances!
⇒ The new data is very imbalanced!
Workaround
Sample SSI instances
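One way to sketch the workaround is to randomly subsample the over-represented class until both classes have the same size. This is an assumption about how the sampling is done: the deck only says that SSI instances are sampled. The name `balance_by_sampling` and the convention that label 1 means SSI are hypothetical.

```python
import random


def balance_by_sampling(instances, labels, sampled_label=1, seed=0):
    """Randomly keep as many `sampled_label` instances as there are other instances."""
    rng = random.Random(seed)
    sampled = [i for i, lab in enumerate(labels) if lab == sampled_label]
    others = [i for i, lab in enumerate(labels) if lab != sampled_label]
    n_keep = min(len(sampled), len(others))
    keep = sorted(rng.sample(sampled, n_keep) + others)
    return [instances[i] for i in keep], [labels[i] for i in keep]


labels = [1] * 8 + [0] * 2          # assumed convention: 1 = SSI
instances = list(range(len(labels)))
X_bal, y_bal = balance_by_sampling(instances, labels)
print(y_bal.count(1), y_bal.count(0))
```

Different seeds keep different SSI instances, so results obtained on one sample depend on the draw.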
Preprocessing: Female

Preprocessing: Male

Processing
Binary classification
Experimental Setting
SSI instances are sampled to obtain a balanced dataset.
⇒ Because the sampling is random, experiments are repeated 10 times.
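The repetition scheme can be sketched as a loop over seeds that aggregates the scores. This is only a sketch: `run_once` is a hypothetical callable standing in for "sample, train, evaluate", `dummy_run` returns a made-up score so that the example executes, and reporting the mean with the sample standard deviation in parentheses is an assumption about how the results are summarized.

```python
import random
import statistics


def repeat_experiment(run_once, n_repeats=10):
    """Run `run_once(seed)` for n_repeats seeds; return (mean, sample std) of the scores."""
    scores = [run_once(seed) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)


def dummy_run(seed):
    # Placeholder only: a real run would resample the SSI instances with this
    # seed, fit a classifier, and return its evaluation score.
    return 0.5 + random.Random(seed).uniform(-0.05, 0.05)


mean, std = repeat_experiment(dummy_run)
print(f"{mean:.3f} ({std:.3f})")
```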
                  SVM            Forest
Male    SSI = 0   0.998 (0.001)  0.999 (0.000)
Male    SSI = 1   0.999 (0.001)  0.999 (0.001)
Male    SSI = 2   0.999 (0.001)  1.000 (0.000)
Female  SSI = 0   0.992 (0.002)  0.994 (0.002)
Female  SSI = 1   0.994 (0.002)  0.995 (0.002)
Female  SSI = 2   0.999 (0.001)  0.999 (0.001)
All     SSI = 0   0.998 (0.001)  0.999 (0.001)
All     SSI = 1   0.998 (0.000)  0.999 (0.000)
All     SSI = 2   0.999 (0.000)  1.000 (0.000)
Processing
Lasso
        Lasso Lars IC  Lasso Lars CV
Female  0.978 (0.005)  0.986 (0.003)
Male    0.931 (0.006)  0.955 (0.005)
All     0.909 (0.006)  0.931 (0.003)
By ahcene