Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


(Trans|Lin|Long|...)former: Self-Attention Mechanisms

Module Description

Course              Module Abbreviation   Credit Points
BA-2010[100%|75%]   CS-CL                 6 LP
BA-2010[50%]        BS-CL                 6 LP
BA-2010             AS-CL, AS-FL          8 LP
Lecturer: Juri Opitz
Module Type: Proseminar / Hauptseminar
Language: English
First Session: 28.10.2021
Time and Place: Thursday, 14:15-15:45, INF 329 / SR 26
Commitment Period: tba

Prerequisite for Participation

  • Statistical methods
  • Mathematical foundations


2 presentations (subject to availability) XOR 1 presentation and (term paper|project)


The "self-attention" mechanism is the workhorse behind the "Transformer", the architecture underlying most systems used in NLP today, ranging from machine translation ("Attention Is All You Need") to self-supervised transfer learning ("BERT") and many other applications. The standard self-attention mechanism, however, has some downsides: for example, its time and memory costs grow quadratically with sequence length, which makes it difficult to apply to very long texts. To mitigate this and other weaknesses, several variants and generalizations of the "classic" self-attention mechanism have been proposed recently (they go by names such as "Longformer", "Linformer", and "BigBird"). The main goal of this seminar is to obtain a good understanding of the motivation, math, and empirical performance of the different self-attention mechanisms.
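To make the quadratic cost concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the formulation used in any particular paper: the query, key, and value projections are taken to be the identity (Q = K = V = X), and the names `softmax`, `self_attention`, and `X` are chosen for this example only. The point to notice is that the score matrix holds one entry per pair of positions, so it has n x n entries for a sequence of length n.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Single-head self-attention with identity projections (Q = K = V = X).

    X is a list of n token vectors, each of dimension d.
    Returns the n output vectors and the n x n attention weight matrix.
    Assumption for illustration: no learned projection matrices.
    """
    n, d = len(X), len(X[0])
    scale = math.sqrt(d)
    # n x n score matrix: one scaled dot product per pair of positions.
    scores = [
        [sum(a * b for a, b in zip(X[i], X[j])) / scale for j in range(n)]
        for i in range(n)
    ]
    # Row-wise softmax turns scores into attention weights.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all n input vectors.
    out = [
        [sum(weights[i][j] * X[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return out, weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence: n = 3, d = 2
out, weights = self_attention(X)
print(len(weights), len(weights[0]))  # prints "3 3": the weight matrix is n x n
```

Doubling the sequence length quadruples the number of entries in `scores`, which is the bottleneck that the variants named above try to avoid.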


Will be discussed in the first session
