The Annotated Transformer Revisited

In this article we take an illustrated, annotated look at the Transformer, introduced in “Attention Is All You Need” (2017) by Vaswani, Shazeer, Parmar, et al. The Transformer architecture was groundbreaking: it achieved 28.4 BLEU on the WMT 2014 English-to-German translation task with comparatively little training. Even though it has since been eclipsed by the Reformer (“Reformer: The Efficient Transformer”, published by Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya in 2020), it is still worth examining the fundamental idea behind this comparatively “simple network architecture […] based solely on attention mechanisms”. ...

February 22, 2020 · 980 words