Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


(Trans|Lin|Long|...)former: Self-Attention Mechanisms

Module Description

Course | Module Abbreviation | Credit Points
BA-2010[100%|75%] | CS-CL | 6 LP
BA-2010[50%] | BS-CL | 6 LP
BA-2010 | AS-CL, AS-FL | 8 LP
Lecturer Juri Opitz
Module Type Proseminar / Hauptseminar
Language English
First Session 28.10.2021
Time and Place Thursday, 14:15-15:45
INF 329 / SR 26, online, CIP Pool INF 325, INF 306 SR 19 (except on 2.12., 9.12., 13.1., and 20.1.: on these dates we will have online sessions, since we do not have access to the room)
Commitment Period tba

Prerequisite for Participation

  • Statistical methods
  • Mathematical foundations


Either 2 presentations (subject to availability) or 1 presentation plus a term paper or project


The "self-attention" mechanism is the workhorse behind most systems used in NLP today (the "Transformer"), ranging from machine translation ("Attention Is All You Need") to self-supervised transfer learning ("BERT") and many other applications. The standard self-attention mechanism, however, has some downsides: for example, its time and memory cost grows quadratically with the sequence length, because every token attends to every other token, which makes it difficult to apply to very long texts. To mitigate this and other weaknesses, several variants and generalizations of the "classic" self-attention mechanism have recently been proposed (they go by names such as "Longformer", "Linformer", and "BigBird"). The main goal of this seminar is to obtain a good understanding of the motivation, the math, and the empirical performance of the different self-attention mechanisms.
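To make the quadratic cost concrete, here is a minimal pure-Python sketch of scaled dot-product self-attention (single head, no learned projections, which is an assumption made for brevity). With `window=None` every query scores all n keys (n * n scores). With an integer `window`, each query only scores its neighbours within that distance. This is a simple sliding-window pattern in the spirit of Longformer's local attention, but it omits Longformer's global attention and is not its actual implementation. The function name `self_attention` and its parameters are hypothetical illustrations.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(q, k, v, window=None):
    """Scaled dot-product self-attention over n token vectors (hypothetical helper).

    q, k, v: lists of n vectors (lists of floats); q and k share dimension d.
    window=None: full attention, every query scores all n keys (n * n scores).
    window=w: query i only scores keys j with |i - j| <= w (local pattern).
    Returns (outputs, number_of_scores_computed).
    """
    n = len(q)
    d = len(q[0])
    outputs = []
    num_scores = 0
    for i in range(n):
        if window is None:
            keys = range(n)
        else:
            keys = range(max(0, i - window), min(n, i + window + 1))
        # Attention scores of query i against the allowed keys, scaled by sqrt(d).
        scores = [dot(q[i], k[j]) / math.sqrt(d) for j in keys]
        num_scores += len(scores)
        weights = softmax(scores)
        # Output i is the weighted sum of the value vectors of the attended positions.
        out = [0.0] * len(v[0])
        for w_ij, j in zip(weights, keys):
            for c in range(len(out)):
                out[c] += w_ij * v[j][c]
        outputs.append(out)
    return outputs, num_scores


# Toy input: 8 tokens with d = 2, used as queries, keys and values at once.
tokens = [[float(i % 3), 1.0] for i in range(8)]
_, full = self_attention(tokens, tokens, tokens)
_, local = self_attention(tokens, tokens, tokens, window=1)
print(full, local)  # 64 22
```

Doubling the sequence to 16 tokens raises the full-attention count to 256 (four times as many). The window-1 count only grows to 46, roughly linear in n. Bridging this gap is the motivation shared by the variants discussed in this seminar.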

Module Overview


Date Session Materials
28.10. Intro slides
04.11. Paper: Attention Is All You Need; Speaker(s): Benjamin and Max manuscript
11.11. no session (conference) na
18.11. Paper: Longformer; Speaker(s): Feisal slides
25.11. Paper: BigBird; Speaker(s): na na
2.12. Paper: Reformer; Speaker(s): Ines slides
9.12. Paper: Transformers are RNNs; Speaker(s): Marinco and Phan na
16.12. Paper: Linformer; Speaker(s): Dang and Laura na
13.1. Paper: Performer; Speaker(s): na na
20.1. Paper: Survey: Efficient transformers; Speaker(s): Laura na
27.1. Paper: Benchmark: long range arena; Speaker(s): Frederick and Hanna na
3.2. Paper: Mixing tokens with Fourier transform; Speaker(s): Nadia and Pablo na
10.2. Paper: MLP-Mixer; Speaker(s): Frederick na
17.2. Wrap-up and discussion na


Will be discussed in the first session
