The Annotated Transformer Revisited

In this article we take an illustrated, annotated look at the Transformer, introduced in 2017 in “Attention Is All You Need” by Vaswani, Shazeer, Parmar, et al. The architecture was groundbreaking: it achieved 28.4 BLEU on the WMT 2014 English-to-German translation task at a fraction of the training cost of the previous best models. Even though it has since been eclipsed by “Reformer: The Efficient Transformer”, published by Nikita Kitaev, Łukasz Kaiser and Anselm Levskaya in 2020, it is still worth looking at the fundamental idea behind this comparatively “simple network architecture […] based solely on attention mechanisms”. ...

February 22, 2020 · 982 words