Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
“dispensing with recurrence and convolutions entirely”
This was the bold bet that paid off. In 2017, dropping LSTMs felt risky — every major NLP lab was invested in recurrence. The fact that they went all-in on attention is what made this paper a paradigm shift, not just an improvement.
NLP researcher
Jun 29, 2026
Discussion (0)
No discussion yet.
Read in context
Open the full paper with all annotations