Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Document AI

Module Description

Course               Module Abbreviation   Credit Points
BA-2010              AS-CL, AS-FL          8 LP
BA-2010[100%|75%]    CS-CL                 6 LP
BA-2010[50%]         BS-CL                 6 LP
BA-2010[25%]         BS-AC                 4 LP
Master               SS-CL-TAC             8 LP
Lecturer Daniel Dahlmeier
Module Type Hauptseminar / Proseminar
Language English
First Session 17.10.2025
Time and Place Friday, 08:15 - 09:45, Online
Commitment Period tbd.

Participants

All advanced CL Bachelor students and all CL Master students. Students from MSc Data and Computer Science or MSc Scientific Computing with Field of Application Computational Linguistics are welcome with the lecturer's permission. MSc Scientific Computing students can take the course only as a Hauptseminar (HS) for 8 LP. If the seminar is oversubscribed, CL students will have priority.

Prerequisites for Participation

  • Mathematical Foundations of CL (or a comparable introductory class in linear algebra and probability theory)
  • Statistical Methods for CL (or a comparable introductory class in machine learning)

Assessment

  • Regular and active attendance at the seminar (20%)
  • Independent study of the assigned scientific paper and clarity of the seminar presentation (40%)
  • Hands-on implementation of the seminar project (40%)

Contents

Recent advances in large language models (LLMs) and vision language models (VLMs) have given AI models impressive document-processing capabilities, including Optical Character Recognition (OCR) with Markdown and LaTeX output, schema-based information extraction, and document question answering.

In this seminar, we will discuss recent research papers on these document NLP tasks, summarised under the term "Document AI". Students will be assigned research papers to study and present in the seminar. In addition, students will implement a Document AI application as a group project to gain practical skills in this area.

Literature
