Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
“28.4 BLEU on the WMT 2014 English-to-German translation task”
BLEU has since fallen out of favor as the go-to metric for translation quality — humans consistently find that higher BLEU doesn't always mean better translations. The field has moved toward COMET and human eval. Ironic that a seminal paper is anchored to a metric we now distrust.
Computational linguist
Jun 29, 2026
Discussion (0)
No discussion yet.
Read in context
Open the full paper with all annotations