Integrating Vision and Language: Achievements and Challenges in Multimodal Machine Learning

Module Description

Course	Module Abbreviation	Credit Points
BA-2010	AS-FL	8 LP
BA-2010	AS-CL	8 LP
BA-2010[100%|75%]	CS-CL	6 LP
BA-2010[50%]	BS-CL	6 LP
BA-2010[25%]	BS-AC, BS-FL	4 LP
Master	SS-CL, SS-TAC, SS-FAL	8 LP

Lecturer	Letitia Parcalabescu
Module Type	Proseminar / Hauptseminar
Language	English
First Session	23.10.2019
Time and Place	Wednesday, 16:15-17:45, INF 326 / SR 27 (2nd upper floor)
End of Commitment Period	21.01.2020

Prerequisite for Participation

good knowledge of statistical methods, including neural networks
advanced BA students or MA students
basic understanding of computer vision
interest in the interdisciplinary field of NLP and Computer Vision

Assessment

regular, active participation
presentation
project, seminar paper, or an equivalent contribution to the seminar

Content

Progress in artificial intelligence requires more than understanding text in isolation and processing other signals, such as images or sound, separately. Multimodal machine learning aims to handle combinations of different signal types and to relate information across modalities. In this seminar, we will study recent machine learning techniques that tackle the multimodal applications and datasets that have emerged in the last few years. We will discuss the performance of state-of-the-art models and assess the shortcomings and open challenges of current research. Topics include:

Visual Question Answering (VQA)
Visual Dialogue
Phrase Grounding
Visual-Textual Entailment
Scene Graph Generation
Multimodal Machine Translation

Module Overview

Agenda

For the agenda and the accompanying materials, please check the protected materials webpage.

Literature

Literature will be provided by the beginning of the term. Some surveys:

Baltrušaitis, T., Ahuja, C. and Morency, L.-P., 2018. Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), pp.423-443. https://arxiv.org/pdf/1705.09406.pdf
Kafle, K., Shrestha, R. and Kanan, C., 2019. Challenges and Prospects in Vision and Language Research. arXiv preprint arXiv:1904.09317. https://arxiv.org/pdf/1904.09317.pdf
Schlangen, D., 2019. Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics. arXiv preprint arXiv:1904.07318. https://arxiv.org/pdf/1904.07318.pdf