"Faster Transformer Decoding: N-gram Masked Self-Attention."

Ciprian Chelba et al. (2020)
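The technique named in the title restricts decoder self-attention so each position attends only to a fixed window of the most recent tokens, rather than the full prefix. A minimal sketch of such a banded causal mask, assuming the window covers the current token plus the `n-1` preceding ones (function names and the toy attention routine are illustrative, not taken from the paper):

```python
import numpy as np

def ngram_causal_mask(seq_len: int, n: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j iff
    j <= i (causal) and i - j < n (within the last n tokens)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < n)

def masked_attention(q, k, v, mask):
    # Standard scaled dot-product attention with disallowed
    # positions set to -inf before the softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

mask = ngram_causal_mask(5, 3)
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))
k = rng.standard_normal((5, 4))
v = rng.standard_normal((5, 4))
out = masked_attention(q, k, v, mask)
```

Because each row of the mask has at most `n` active entries, the per-step attention cost (and the key/value state a decoder must retain) is bounded by the window size instead of growing with the sequence length, which is the source of the decoding speedup the title alludes to.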