Team: SDML_SpioradBaseline
b05901082 楊晟甫
b05902127 劉俊緯
b06902080 吳士綸
0.697 - trained from scratch (12 layers) + BCE loss
In fact, a model that always outputs (1,1,1,0) already scores 0.560.
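The BCE objective above can be sketched with PyTorch's numerically stable `BCEWithLogitsLoss`; the logits and labels below are made-up illustrative values, not ones from our runs:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # fuses sigmoid + binary cross-entropy, numerically stable

logits = torch.tensor([[2.0, 1.5, 0.5, -1.0]])  # raw multi-label model outputs (example)
labels = torch.tensor([[1.0, 1.0, 1.0, 0.0]])   # binary targets, one column per label

loss = bce(logits, labels)  # scalar mean loss over all label positions
```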
The differentiable (soft) F1 loss drives XLNet into a local minimum:
TP = torch.sum(labels * logits, 1)        # soft true positives (logits assumed in [0, 1])
FP = torch.sum((1 - labels) * logits, 1)  # soft false positives
FN = torch.sum(labels * (1 - logits), 1)  # soft false negatives
f1_loss = 1 - 2 * TP / (2 * TP + FP + FN)
(A small eps in the denominator avoids division by zero when all terms are 0.)
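A minimal runnable sketch of the soft F1 loss above, assuming raw logits are squashed through a sigmoid first (the function name and eps default are our own choices):

```python
import torch

def soft_f1_loss(logits: torch.Tensor, labels: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable (soft) F1 loss for multi-label classification.

    logits: raw model outputs, shape (batch, num_labels)
    labels: binary targets, same shape
    """
    probs = torch.sigmoid(logits)               # soft predictions in (0, 1)
    tp = torch.sum(labels * probs, dim=1)       # soft true positives
    fp = torch.sum((1 - labels) * probs, dim=1) # soft false positives
    fn = torch.sum(labels * (1 - probs), dim=1) # soft false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)      # eps guards against 0/0
    return (1 - f1).mean()

# Near-perfect predictions should give a loss near 0
labels = torch.tensor([[1., 1., 1., 0.]])
logits = torch.tensor([[10., 10., 10., -10.]])
loss = soft_f1_loss(logits, labels)
```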
0.70 - pre-train (on train & test data only) + fine-tune
0.68 - fine-tune only
We freeze parameters up to self-attention layer 7.
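Freezing the lower transformer layers can be sketched as below; the `layer.{i}.` name pattern is an assumption for a HuggingFace-style encoder, and `ToyEncoder` is a made-up stand-in just to show the effect:

```python
import torch.nn as nn

def freeze_lower_layers(model: nn.Module, n_frozen: int = 7) -> None:
    """Freeze the first `n_frozen` transformer layers (name patterns are assumptions)."""
    frozen_prefixes = [f"layer.{i}." for i in range(n_frozen)]
    for name, param in model.named_parameters():
        # match both top-level ("layer.0.weight") and nested ("encoder.layer.0.weight") names
        if any(name.startswith(p) or f".{p}" in name for p in frozen_prefixes):
            param.requires_grad = False

# Toy stand-in for a 12-layer encoder, just to demonstrate the freezing
class ToyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.ModuleList(nn.Linear(4, 4) for _ in range(12))

model = ToyEncoder()
freeze_lower_layers(model, n_frozen=7)
# layers 0-6 are now frozen; layers 7-11 still receive gradients
```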
BERT
0.694422 (XLNet w/o pretrain + BCE loss) -> 0.6976744
It does not work with a mixed loss.