b05901082 楊晟甫

b05902127 劉俊緯

b06902080 吳士綸

# Result

0.697 - trained from scratch (12 layers) + BCE loss

# F1 loss & BCE loss

## F1 loss

In practice, the model collapses to always outputting (1, 1, 1, 0), which scores 0.560.

The differentiable F1 loss leads XLNet into this local minimum.
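A toy illustration of why this degenerate solution is attractive: a constant prediction can still reach a decent micro-F1 when some classes are frequent, so the F1 loss has an easy-to-reach minimum. The labels below are hypothetical, not the actual dataset:

```python
# Hypothetical multi-label targets; a single constant prediction such as
# (1, 1, 1, 0) scores well whenever the first three classes are common.
labels = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]
pred = [1, 1, 1, 0]  # the same output for every example

tp = sum(p & y for row in labels for p, y in zip(pred, row))
fp = sum(p & (1 - y) for row in labels for p, y in zip(pred, row))
fn = sum((1 - p) & y for row in labels for p, y in zip(pred, row))
micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(round(micro_f1, 3))  # 0.818 on this toy data
```

A gradient step away from this constant output barely changes the score, which is what makes it a local minimum for the F1 objective.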

## Differentiable F1 loss

$$\frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = \frac{2TP}{2TP+FP+FN}$$
```python
import torch

# logits: sigmoid probabilities in [0, 1]; labels: {0, 1} targets.
# Both have shape (batch, num_labels); sums are taken per example.
TP = torch.sum(labels * logits, dim=1)
FP = torch.sum((1 - labels) * logits, dim=1)
FN = torch.sum(labels * (1 - logits), dim=1)
f1_loss = 1 - 2 * TP / (2 * TP + FP + FN)
```


Computing the loss from this single fraction avoids the separate precision and recall divisions, so no `eps` smoothing term is needed: the denominator $2TP+FP+FN$ equals the per-example sum of labels and predicted probabilities, which is nonzero whenever the example has at least one positive label.
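A NumPy sketch of the same per-example soft-F1 loss (names mirror the torch snippet above; `probs` stands in for the sigmoid outputs), checking the two boundary behaviours:

```python
import numpy as np

def soft_f1_loss(labels, probs):
    # Per-example soft counts, then the differentiable F1 loss.
    tp = np.sum(labels * probs, axis=1)
    fp = np.sum((1 - labels) * probs, axis=1)
    fn = np.sum(labels * (1 - probs), axis=1)
    return 1 - 2 * tp / (2 * tp + fp + fn)

labels = np.array([[1, 0, 1, 0], [0, 1, 1, 1]], dtype=float)
perfect = labels.copy()              # exact predictions: loss is 0, no eps needed
hedged = np.full_like(labels, 0.5)   # maximally uncertain outputs

print(soft_f1_loss(labels, perfect))  # [0. 0.]
print(soft_f1_loss(labels, hedged))
```

With perfect predictions the loss is exactly zero, and the denominator stays positive as long as each example has at least one positive label.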

## Mixed Loss - BCE + F1

$$\alpha \times \text{BCE}_{loss} + (1-\alpha) \times \text{F1}_{loss}, \qquad \alpha \in [0,1]$$
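A minimal sketch of the mixed objective in NumPy; `alpha` is the mixing hyperparameter from the formula above (its value here is an arbitrary placeholder, not one the report states):

```python
import numpy as np

def bce_loss(labels, probs, eps=1e-7):
    # Standard binary cross-entropy; clipping guards the logs.
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

def soft_f1_loss(labels, probs):
    tp = np.sum(labels * probs, axis=1)
    fp = np.sum((1 - labels) * probs, axis=1)
    fn = np.sum(labels * (1 - probs), axis=1)
    return np.mean(1 - 2 * tp / (2 * tp + fp + fn))

def mixed_loss(labels, probs, alpha=0.5):
    # alpha = 1 recovers pure BCE, alpha = 0 pure soft-F1.
    return alpha * bce_loss(labels, probs) + (1 - alpha) * soft_f1_loss(labels, probs)

labels = np.array([[1, 0, 1, 0]], dtype=float)
probs = np.array([[0.9, 0.2, 0.8, 0.1]])
print(mixed_loss(labels, probs, alpha=0.5))
```

The two endpoints of $\alpha$ reduce to the individual losses, which makes it easy to sanity-check the implementation.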

# Result

0.70 - pre-training (on the train & test text only) + fine-tuning

0.68 - fine-tuning only

## Warm-up & Fine-tune

We freeze the layers up to `self_attn` layer 7 during warm-up.
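A minimal sketch of the freezing rule, using hypothetical parameter names in the style of a 12-layer transformer (the real XLNet/BERT names differ; in PyTorch one would set `requires_grad = False` on the matching parameters):

```python
# Hypothetical 12-layer parameter names; only the layer-index rule matters.
param_names = [f"layer.{i}.self_attn.weight" for i in range(12)]

FREEZE_UP_TO = 7  # layers 0..7 stay frozen during warm-up (assumed inclusive)

def is_frozen(name):
    layer = int(name.split(".")[1])
    return layer <= FREEZE_UP_TO

frozen = [n for n in param_names if is_frozen(n)]
trainable = [n for n in param_names if not is_frozen(n)]
print(len(frozen), len(trainable))  # 8 4
```

Only the top layers receive gradients during warm-up; whether layer 7 itself is frozen is an assumption here, since the slide does not say.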


# Result

0.694422 (XLNet w/o pre-training + BCE loss) -> 0.6976744

# However...

This warm-up scheme does not work with the mixed loss.


# Timeline

- XLNet - BCE / F1 / MSE loss, with softmax or sigmoid output
- XLNet - F1 + BCE + threshold
- BERT - F1 + BCE
- BERT - F1 + BCE & pre-training
- BERT - F1 + BCE & pre-training & fine-tuning