Predicting Attention Sparsity in Transformers

May 27, 2022

Marcos Treviso

António Góis

Patrick Fernandes

Erick Fonseca

André F. T. Martins

DEEPSPIN
