Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik

Multilingual Language Models

Course Description

Degree program       Module code               Credits (LP)
BA-2010              AS-CL, AS-FL              8 LP
BA-2010[100%|75%]    CS-CL                     6 LP
BA-2010[50%]         BS-CL                     6 LP
BA-2010[25%]         BS-AC, BS-FL              4 LP
Master               SS-CL, SS-TAC, SS-FAL     8 LP
Instructor: Frederick Riemenschneider
Course type: Proseminar/Hauptseminar
Language: English
First session: 19 October 2023
Time and place: Thursdays, 13:15-14:45, INF 325 / SR 24
Commitment deadline: tbd.

Prerequisites

  • Completion of Programming I and Introduction to Computational Linguistics or similar introductory courses
  • Programming II, Mathematical Foundations of Computational Linguistics, and Statistics are strongly recommended

Course Requirements

  • Active participation
  • Presentation
  • Implementation project

Content

Transformer-based language models have by now been pre-trained on a wide variety of languages, both high-resource and lower-resource. Despite this growing inclusivity, monolingual pre-training on low-resource languages fails to yield significant benefits because it requires an enormous text corpus, which leaves these languages behind in natural language processing advances. To address this problem, numerous multilingual language models have been proposed, such as mBERT, XLM, and XLM-R. These models are trained in the hope that they will acquire generalizable knowledge from high-resource languages that can then be transferred to lower-resource languages. However, multilingual pre-training introduces a new complication known as the "curse of multilinguality": the model's capacity per language diminishes as the number of languages increases. This seminar takes an in-depth look at various multilingual models and their pre-training objectives. We will also discuss the challenges posed by the "curse of multilinguality" and examine analyses of it and potential solutions to lift the curse.

Literature

Will be announced at the beginning of the semester.
