Eurostars

Preprocessing

Assumption

Values do not change much during a week.

Give more importance to days just before the OP.

Week's Missing Values

Tests' results of a week

[[   nan,   14. ,    nan],
 [   6.8,    9.2,    nan],
 [   nan,  138. ,    nan],
 [ 143. ,  165. ,    nan],
 [   nan,    3.5,    nan],
 [   nan,    nan,    nan],
 [   nan,   53. ,    nan],
 [   nan,  216. ,    nan],
 [   nan,  393. ,    nan],
 [  21. ,   34. ,    nan],
 [   nan,  240. ,    nan],
 [   nan,  124. ,    nan],
 [   nan,  312. ,    nan],
 [ 489. , 1024. ,    nan],
 [   nan,    nan,    nan]]

+

week average

[[ 14. ],
 [  8. ],
 [138. ],
 [154. ],
 [  3.5],
 [  nan],
 [ 53. ],
 [216. ],
 [393. ],
 [ 27.5],
 [240. ],
 [124. ],
 [312. ],
 [756.5],
 [  nan]]

=

[[  14. ,   14. ,    nan],
 [   6.8,    9.2,    nan],
 [ 138. ,  138. ,    nan],
 [ 143. ,  165. ,    nan],
 [   3.5,    3.5,    nan],
 [   nan,    nan,    nan],
 [  53. ,   53. ,    nan],
 [ 216. ,  216. ,    nan],
 [ 393. ,  393. ,    nan],
 [  21. ,   34. ,    nan],
 [ 240. ,  240. ,    nan],
 [ 124. ,  124. ,    nan],
 [ 312. ,  312. ,    nan],
 [ 489. , 1024. ,    nan],
 [   nan,    nan,    nan]]
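
The filling step shown above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each row is one test and each column one day of the week, and it reproduces only what the example shows, namely that a missing first-day value is replaced by the row's week average while the other columns are left untouched. The names `nan_mean` and `fill_first_day` are hypothetical, and plain lists are used to keep the sketch dependency-free.

```python
import math

def nan_mean(values):
    """Mean of the non-NaN entries, or NaN if every entry is missing."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan

def fill_first_day(week):
    """Replace a missing first-day value of each row by that row's week average.

    Assumption: only the first column is filled, as in the example above.
    Rows with no value at all stay NaN.
    """
    filled = []
    for row in week:
        avg = nan_mean(row)
        first = avg if math.isnan(row[0]) else row[0]
        filled.append([first] + list(row[1:]))
    return filled

nan = math.nan
week = [[nan, 14.0, nan],   # first day missing: filled with the average
        [6.8, 9.2, nan],    # first day present: left unchanged
        [nan, nan, nan]]    # no value at all: stays NaN
filled = fill_first_day(week)
```

With NumPy arrays, `numpy.nanmean(week, axis=1)` would give the same per-row averages.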

Week's Missing Values: Technicalities

[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]]

If the week is empty, mark the first day of the week with -1:

[[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan],
[-1., nan, nan]]
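
A possible sketch of this marking step is given below. It assumes that "empty" means the week contains no test result at all (every entry is NaN), as in the example, and that the marker is written into the first column of every row. The names `EMPTY_WEEK_MARKER` and `mark_empty_week` are hypothetical.

```python
import math

EMPTY_WEEK_MARKER = -1.0  # marker value taken from the example above

def mark_empty_week(week):
    """Write the marker into the first day of every row if the whole week is empty.

    Assumption: a week counts as empty only when all of its entries are NaN.
    A week with at least one value is returned unchanged (as a copy).
    """
    is_empty = all(math.isnan(v) for row in week for v in row)
    if not is_empty:
        return [list(row) for row in week]
    return [[EMPTY_WEEK_MARKER] + list(row[1:]) for row in week]

nan = math.nan
empty_week = [[nan, nan, nan], [nan, nan, nan]]
partial_week = [[nan, 14.0, nan], [nan, nan, nan]]
marked = mark_empty_week(empty_week)
untouched = mark_empty_week(partial_week)
```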

Data Augmentation

Keep exactly one test per week.

Create instances out of all the possible combinations.
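
The combination step can be illustrated with a Cartesian product over the tests available in each week. This is only a sketch under assumptions: the input layout (one list of tests per week), the `None` placeholder for weeks without any test, and the name `augment` are all hypothetical.

```python
from itertools import product

def augment(tests_by_week):
    """Build one instance per combination that keeps exactly one test per week.

    tests_by_week: list of lists, one inner list per week.
    Assumption: a week without any test contributes a single None placeholder,
    so it does not wipe out the whole product.
    """
    choices = [week if week else [None] for week in tests_by_week]
    return [list(combo) for combo in product(*choices)]

# Two tests in week 1, none in week 2, three in week 3: 2 * 1 * 3 = 6 instances.
tests_by_week = [["a", "b"], [], ["c", "d", "e"]]
instances = augment(tests_by_week)
```

The number of instances is the product of the per-week test counts, so heavily tested patients produce many more instances than lightly tested ones.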

[Figure: timeline of tests (×) by days before OP; axis ticks at 0, 3, 7, 14, 28, 35 and 42.]

Problem

SSI patients are tested much more often.

Hence, they generate many more new instances.

The augmented data is very imbalanced.

Workaround

Sample SSI instances

     Preprocessing: Female

     Preprocessing: Male

Processing

Binary classification

Experimental Setting

SSI instances are sampled to have a balanced dataset

⇒ Experiments are repeated 10 times.
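
The setting above might look like the following sketch. It assumes that the SSI instances are the over-represented group and that they are randomly subsampled down to the size of the other group, with a different random seed for each of the 10 repetitions. The function names and the use of the seed are hypothetical, and the classifier training itself is left out.

```python
import random

N_REPETITIONS = 10  # from the experimental setting above

def balanced_sample(ssi, non_ssi, seed):
    """Subsample the SSI instances so that both groups have the same size.

    Assumption: there are at least as many SSI as non-SSI instances;
    min() only guards against sampling more items than exist.
    """
    rng = random.Random(seed)
    k = min(len(ssi), len(non_ssi))
    return rng.sample(ssi, k), list(non_ssi)

def repeated_balanced_samples(ssi, non_ssi, n_repetitions=N_REPETITIONS):
    """Return one balanced (ssi_subset, non_ssi) pair per repetition."""
    return [balanced_sample(ssi, non_ssi, seed) for seed in range(n_repetitions)]

ssi_instances = list(range(100))           # toy stand-ins for SSI instances
non_ssi_instances = list(range(100, 120))  # toy stand-ins for the other group
runs = repeated_balanced_samples(ssi_instances, non_ssi_instances)
```

Each element of `runs` would then be used to train and evaluate a model once, and the scores aggregated over the repetitions.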

Group    SSI       SVM             Forest
Male     SSI = 0   0.998 (0.001)   0.999 (0.000)
Male     SSI = 1   0.999 (0.001)   0.999 (0.001)
Male     SSI = 2   0.999 (0.001)   1.000 (0.000)
Female   SSI = 0   0.992 (0.002)   0.994 (0.002)
Female   SSI = 1   0.994 (0.002)   0.995 (0.002)
Female   SSI = 2   0.999 (0.001)   0.999 (0.001)

Group    SSI       SVM             Forest
All      SSI = 0   0.998 (0.001)   0.999 (0.001)
All      SSI = 1   0.998 (0.000)   0.999 (0.000)
All      SSI = 2   0.999 (0.000)   1.000 (0.000)

Processing

Lasso

Group    Lasso Lars IC   Lasso Lars CV
Female   0.978 (0.005)   0.986 (0.003)
Male     0.931 (0.006)   0.955 (0.005)
All      0.909 (0.006)   0.931 (0.003)
