Is Attention All You Need? The Search for a New Architecture
Module Description
| Course | Module Abbreviation | Credit Points |
|---|---|---|
| BA-2010[100%\|75%] | CS-CL | 6 LP |
| BA-2010[50%] | BS-CL | 6 LP |
| BA-2010 | AS-CL | 8 LP |
| Master | SS-CL-TAC | 8 LP |

| Lecturer | Michael Staniek |
|---|---|
| Module Type | Proseminar / Hauptseminar |
| Language | English |
| First Session | 13.10.2025 |
| Time and Place | Mon, 13:15 - 14:45, SR 26 / INF 329 |
| Commitment Period | tbd. |
Participants
All advanced CL Bachelor students and all CL Master students. Students from the MSc Data and Computer Science or the MSc Scientific Computing with Field of Application Computational Linguistics are welcome after obtaining permission from the lecturer. MSc Scientific Computing students can only take the course as a Hauptseminar (HS) for 8 LP. If the seminar is oversubscribed, CL students will have priority.
Prerequisites for Participation
- Introduction to Neural Networks
Assessment
- Presentation
- Second Presentation OR Project
Content
The Transformer architecture drastically improved neural machine translation results and was quickly adopted by the natural language processing community. Thanks to its inherent parallelism, training Transformer models is very efficient, and networks can be made very deep; this yielded improvements in other tasks such as language modeling and led to Transformers completely replacing RNNs. However, inference with Transformer models is not very efficient: because there is no hidden state summarizing all information up to the current position, each decoding step must attend over all previous tokens. Researchers therefore actively search for other architectures.
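To make this asymmetry concrete, here is a minimal sketch (illustrative only, not part of the course materials; all names and dimensions are made up) contrasting an RNN's constant-size state with the key-value cache a Transformer decoder must grow at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # model dimension (illustrative)

# RNN decoding: a fixed-size hidden state summarizes the whole prefix,
# so every step costs the same O(d^2) regardless of sequence length.
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def rnn_step(h, x):
    return np.tanh(W @ h + U @ x)

# Transformer decoding: each step attends over a key-value cache that
# grows with the prefix, so step t costs O(t * d) time and memory.
def attention_step(q, keys, values):
    scores = keys @ q / np.sqrt(d)        # one score per cached token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values               # weighted sum over the prefix

h = np.zeros(d)
keys = np.empty((0, d))
values = np.empty((0, d))
for t in range(5):
    x = rng.standard_normal(d)
    h = rnn_step(h, x)                    # state size stays d
    keys = np.vstack([keys, x])           # cache grows with every token
    values = np.vstack([values, x])
    y = attention_step(x, keys, values)
```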
This course focuses on work that tries to find new ways of doing things, whether by investigating improvements to the Transformer architecture, by investigating improvements or alternatives to RNNs, or by exploring other clever ideas for better results (e.g., attention modifications such as rotary position embeddings, RoPE).
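As a taste of the kind of attention modification meant here, below is a minimal sketch of rotary position embeddings (RoPE; Su et al., 2021) in the common split-half formulation; the dimensions and the final check are illustrative assumptions, not course material:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate pairs of dimensions of a query/key vector by a
    position-dependent angle; attention scores then depend only on
    the relative distance between positions."""
    half = x.shape[-1] // 2                    # head dim must be even
    freqs = base ** (-np.arange(half) / half)  # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Relative-position property: shifting query and key positions by the
# same offset leaves the attention score unchanged.
rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose(rope(q, 3) @ rope(k, 7), rope(q, 8) @ rope(k, 12))
```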
Agenda
| Date | Session | Materials |
|---|---|---|