
Mechanistically Interpreting Multilingual Language Models
Module Description
Course | Module Abbreviation | Credit Points |
---|---|---|
BA-2010[100%\|75%] | CS-CL | 6 LP |
BA-2010[50%] | BS-CL | 6 LP |
BA-2010[25%] | BS-AC | 4 LP |
BA-2010 | AS-CL | 8 LP |
Master | SS-CL-TAC | 8 LP |
Lecturer | Frederick Riemenschneider |
Module Type | Proseminar / Hauptseminar |
Language | English |
First Session | 17.10.2025 |
Time and Place | Friday, 10:15 - 11:45, INF 329 / SR 26 |
Commitment Period | tbd. |
Participants
All advanced CL Bachelor students and all CL Master students. Students from the MSc Data and Computer Science or the MSc Scientific Computing with Field of Application Computational Linguistics are welcome after obtaining permission from the lecturer. MSc Scientific Computing students can only take the course as a Hauptseminar (HS) for 8 LP. If the seminar is oversubscribed, CL students will have priority.
Prerequisites for Participation
- Completion of Programming I and Introduction to Computational Linguistics, or similar introductory courses
- Programming II, Mathematical Foundations of Computational Linguistics, and Statistics are strongly recommended
Assessment
- Active participation, including exercises
- Presentation
- Implementation project
Content
Multilingual Language Models (MLLMs) can process and connect dozens of languages, but the internal mechanisms that enable this are not well understood. Do they develop a universal "interlingua," or a complex patchwork of language-specific skills? This seminar will address these questions by applying the principles of Mechanistic Interpretability to reverse-engineer the computations within these models.
To address these questions, we will work at the circuit and neuron level, probing whether MLLMs reuse the same components across languages. We will compare these circuits to those found in separate monolingual models, exploring ideas such as the "Platonic representation hypothesis." Our analysis will also examine the limits of multilinguality by looking at cases where knowledge fails to transfer across languages, considering both the pre-training process and the final model.
Beyond the discussion of foundational papers, this seminar includes a practical component: we will apply core interpretability methods, such as activation patching and the logit lens, directly to models. This hands-on work is intended to build a deeper understanding of the techniques themselves and to let us explore the models in search of our own findings.
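To give a concrete sense of the hands-on component, here is a minimal logit-lens sketch. It assumes a Hugging Face `transformers` installation and uses GPT-2 as a stand-in model; the models, prompts, and tooling used in the seminar may differ. The logit lens reads off "intermediate predictions" by pushing each layer's hidden states through the model's final layer norm and unembedding matrix:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")

with torch.no_grad():
    # output_hidden_states=True returns the embedding output
    # plus one hidden-state tensor per transformer layer
    outputs = model(**inputs, output_hidden_states=True)

for layer, hidden in enumerate(outputs.hidden_states):
    # The logit lens: apply the final layer norm and the unembedding
    # matrix to an intermediate representation, as if the model stopped here
    logits = model.lm_head(model.transformer.ln_f(hidden))
    top = logits[0, -1].argmax().item()
    print(f"layer {layer:2d}: {tokenizer.decode([top])!r}")
```

Activation patching is interventional rather than observational: an activation cached from a run on one prompt is substituted into a run on another, and we measure how the output changes. A similarly hedged sketch, with illustrative prompts and an arbitrarily chosen layer:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tokenizer("The Eiffel Tower is in", return_tensors="pt")
corrupt = tokenizer("The Colosseum is in", return_tensors="pt")
paris = tokenizer(" Paris")["input_ids"][0]

LAYER = 6  # an arbitrary middle layer, chosen for illustration

# Step 1: run the clean prompt and cache the block's output hidden states
cache = {}
def save_hook(module, args, output):
    cache["h"] = output[0]  # GPT2Block returns a tuple; [0] is hidden states

handle = model.transformer.h[LAYER].register_forward_hook(save_hook)
with torch.no_grad():
    model(**clean)
handle.remove()

# Step 2: run the corrupted prompt, patching the clean activation into
# the final token position only (the two prompts tokenize to different
# lengths, so the patch is restricted to the last position)
def patch_hook(module, args, output):
    hidden = output[0].clone()
    hidden[:, -1] = cache["h"][:, -1]
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    patched = model(**corrupt)
handle.remove()

with torch.no_grad():
    baseline = model(**corrupt)

# If the patched position at this layer carries "Eiffel Tower" information,
# the logit for " Paris" should rise relative to the unpatched baseline
print("Paris logit, corrupted run:", baseline.logits[0, -1, paris].item())
print("Paris logit, patched run:  ", patched.logits[0, -1, paris].item())
```

In the seminar we will run this kind of experiment across languages, asking, for example, whether a patch cached from a prompt in one language steers the model's predictions in another.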