Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Vision and Language Navigation

Module Description

Course               Module Abbreviation   Credit Points
BA-2010[100%|75%]    CS-CL                 6 LP
BA-2010[50%]         BS-CL                 6 LP
BA-2010[25%]         BS-AC                 4 LP
BA-2010              AS-CL                 8 LP
Master               SS-CL, SS-TAC         8 LP

Lecturer             Raphael Schumann
Module Type          Proseminar / Hauptseminar
Language             English
First Session        26.04.2022
Time and Place       Tuesday, 17:15–18:45, online
Commitment Period    tbd.

heiconf Link

Prerequisite for Participation

• Statistical Methods for CL
• Introduction to Neural Networks and Sequence-To-Sequence Learning (or equivalent)


• Paper presentation
• Implementation project
• Prepare questions for each session's paper


Vision and language navigation (VLN) is a challenging task that requires an agent model to process natural language instructions and ground them in a visual environment. The agent is embodied in the environment and receives navigation instructions. Based on the instructions, the observed surroundings, and the current trajectory, the agent decides on its next action. Executing this action changes the position and/or heading of the agent within the environment. Repeating this step, the agent follows the described route and stops at the desired goal location. While early work on VLN was confined to gridworld scenarios, recent work has studied VLN in outdoor environments consisting of real-world urban street layouts and in indoor scenarios with rich visual surroundings. Agent models treat the task as a sequence-to-sequence problem in which the instruction text and the image representations are the input and a sequence of actions is the output.
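The decision loop described above can be illustrated with a minimal sketch in a toy gridworld. Everything in it is an assumption made for illustration and is not taken from the seminar materials: the names AgentState, step, toy_policy and run_episode are hypothetical, the action set is reduced to forward/left/right/stop, and the "policy" simply reads one instruction word per step instead of being a learned sequence-to-sequence model over text and image features.

```python
from dataclasses import dataclass

# Headings as (dx, dy) unit vectors in clockwise order: north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]
ACTIONS = ("forward", "left", "right", "stop")


@dataclass(frozen=True)
class AgentState:
    x: int
    y: int
    heading: int  # index into HEADINGS


def step(state, action):
    """Apply one action: turning changes the heading, moving changes the position."""
    if action == "forward":
        dx, dy = HEADINGS[state.heading]
        return AgentState(state.x + dx, state.y + dy, state.heading)
    if action == "left":
        return AgentState(state.x, state.y, (state.heading - 1) % 4)
    if action == "right":
        return AgentState(state.x, state.y, (state.heading + 1) % 4)
    if action == "stop":
        return state
    raise ValueError(f"unknown action: {action}")


def toy_policy(instruction_tokens, observation, trajectory):
    """Hypothetical stand-in for a learned model: one instruction token per step."""
    t = len(trajectory)
    if t < len(instruction_tokens) and instruction_tokens[t] in ACTIONS[:3]:
        return instruction_tokens[t]
    return "stop"


def run_episode(instruction, start, policy=toy_policy, max_steps=50):
    """Choose actions from instruction, observation and trajectory until 'stop'."""
    tokens = instruction.lower().split()
    state, trajectory = start, []
    for _ in range(max_steps):
        observation = state  # a real agent would receive image features here
        action = policy(tokens, observation, trajectory)
        trajectory.append(action)
        if action == "stop":
            break
        state = step(state, action)
    return state, trajectory


final_state, actions = run_episode(
    "forward forward right forward", AgentState(0, 0, 0)
)
print(final_state, actions)
```

In an actual VLN model, toy_policy would be replaced by a network that encodes the full instruction text and the visual observation and predicts the next action; the surrounding loop stays the same.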

Module Overview


Date Session Materials


• Walk the talk: Connecting language, knowledge, and action in route instructions; MacMahon et al., 2006
• Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments; Anderson et al., 2018
• Touchdown: Natural language navigation and spatial reasoning in visual street environments; Chen et al., 2019

