(Trans|Lin|Long|...)former: Self-Attention Mechanisms

Module Description

Course	Module Abbreviation	Credit Points
BA-2010[100%|75%]	CS-CL	6 LP
BA-2010[50%]	BS-CL	6 LP
BA-2010	AS-CL, AS-FL	8 LP
Master	SS-CL, SS-TAC, SS-FAL	8 LP

Lecturer	Juri Opitz
Module Type	Proseminar / Hauptseminar
Language	English
First Session	28.10.2021
Time and Place	Thursday, 14:15-15:45, ~~INF 329 / SR 26~~ ~~online~~ ~~CIP Pool INF 325~~ INF 306 SR 19 (exceptions: on 2.12., 9.12., 13.1., 20.1., and 27.1. the session takes place online because the room is not available)
Commitment Period	tba

Prerequisite for Participation

Statistical methods
Mathematical foundations

Assessment

2 presentations (subject to availability) XOR 1 presentation and (term paper|project)

Description

The "self-attention" mechanism is the workhorse behind most systems used in NLP today (the "Transformer"), ranging from machine translation ("Attention is all you need") to self-supervised transfer learning ("BERT") and many other applications. The standard self-attention mechanism, however, has some downsides: for example, its time and memory cost grows quadratically with the sequence length, which makes it difficult to apply to very long texts. To mitigate these and other weaknesses, several variants and generalizations of the "classic" self-attention mechanism have been proposed recently (under names such as "Longformer", "Linformer", and "BigBird"). The main goal of this seminar is to obtain a good understanding of the motivation, math, and empirical performance of the different self-attention mechanisms.
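
To make the quadratic cost concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the implementation from any of the papers: queries, keys, and values are all taken to be the input itself (no learned projection matrices), and the names `softmax`, `self_attention`, and `X` are chosen for this example only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head scaled dot-product self-attention.

    Assumption for brevity: queries, keys, and values are all X itself
    (no learned weight matrices). X is a list of n vectors of dimension d.
    """
    n, d = len(X), len(X[0])
    # Score matrix: one entry per (query, key) pair, so n * n entries.
    scores = [
        [sum(q * k for q, k in zip(X[i], X[j])) / math.sqrt(d) for j in range(n)]
        for i in range(n)
    ]
    # Each row is normalized into attention weights that sum to 1.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all value vectors.
    out = [
        [sum(weights[i][j] * X[j][c] for j in range(n)) for c in range(d)]
        for i in range(n)
    ]
    return out, weights

# Toy input: a "sequence" of n = 3 tokens with d = 2 features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(X)
num_scores = sum(len(row) for row in weights)  # n * n entries
```

Because the score matrix holds one entry for every pair of positions, doubling the sequence length quadruples the number of scores to compute and store. The variants covered in the seminar aim to avoid materializing this full matrix, for example by sparsifying or approximating it.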

Module Overview

Agenda

Date	Session	Materials
28.10.	Intro	slides
04.11.	Paper: Attention is all you need; Speaker(s): Benjamin and Max	manuscript
11.11.	no session (conference)	na
18.11.	Paper: Longformer; Speaker(s): Feisal	slides
25.11.	Paper: Big Bird; Speaker(s): na	na
2.12.	Paper: Reformer; Speaker(s): Ines	slides
9.12.	Paper: Transformers are RNNs; Speaker(s): Marinco and Phan	na
16.12.	Paper: Linformer; Speaker(s): Dang and Laura	na
13.1.	Paper: Performer; Speaker(s): na	na
20.1.	Paper: Survey: Efficient transformers; Speaker(s): Laura	na
27.1.	Paper: Benchmark: Long Range Arena; Speaker(s): Frederick and Hanna	na
3.2.	Paper: Mixing tokens with Fourier transforms; Speaker(s): Nadia and Pablo	na
10.2.	Paper: MLP-Mixer; Speaker(s): Frederick	na
17.2.	Wrap-up and discussion	na

Literature

Will be discussed in the first session