(Trans|Lin|Long|...)former: Self-Attention Mechanisms
SS-CL, SS-TAC, SS-FAL
Proseminar / Hauptseminar
Time and Place
INF 329 / SR 26
Prerequisite for Participation
- Statistical methods
- Mathematical foundations
2 presentations (subject to availability) XOR 1 presentation and (term paper|project)
The "self-attention" mechanism is the workhorse behind most systems in use today in NLP (the "Transformer" architecture), from machine translation ("Attention Is All You Need") to self-supervised transfer learning ("BERT") and many other applications.
The standard self-attention mechanism, however, has some downsides: for example, its time and memory costs grow quadratically with sequence length, which makes it difficult to apply to very long texts. To mitigate this and other weaknesses, several variants and generalizations of the "classic" self-attention mechanism have been proposed recently (under names such as "Longformer", "Linformer", and "BigBird"). The main goal of this seminar is to obtain a good understanding of the motivation, math, and empirical performance of the different self-attention mechanisms.
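To make the quadratic cost concrete, here is a minimal sketch of "classic" scaled dot-product self-attention in plain Python. It is an illustration only, under simplifying assumptions: a single head, no masking, and the learned query/key/value projections are omitted, so the input vectors serve as queries, keys, and values at once. The function name `self_attention` and the toy sizes are made up for this example.

```python
import math
import random


def self_attention(x):
    """Single-head scaled dot-product self-attention over n token vectors.

    Simplifying assumption: no learned projections, so x is used as
    queries, keys, and values. Returns (outputs, weights), where
    weights is an n x n matrix whose rows each sum to 1.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:  # n query positions ...
        # ... each scored against all n key positions: n * n scores in total
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        # Numerically stable softmax over the scores of this query
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(row, x)) for j in range(d)])
    return outputs, weights


rng = random.Random(0)
n, d = 6, 4  # toy sequence length and vector dimension
x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
out, w = self_attention(x)
print(len(w), len(w[0]))  # 6 6: one weight per (query, key) pair
```

The weight matrix has n × n entries, so doubling the sequence length quadruples both the number of scores to compute and the memory needed to hold them. This is the bottleneck that the variants covered in the seminar aim to reduce.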
Will be discussed in the first session