Vision and Language Navigation
Module Description
Course | Module Abbreviation | Credit Points |
---|---|---|
BA-2010[100%\|75%] | CS-CL | 6 LP |
BA-2010[50%] | BS-CL | 6 LP |
BA-2010[25%] | BS-AC | 4 LP |
BA-2010 | AS-CL | 8 LP |
Master | SS-CL, SS-TAC | 8 LP |
Lecturer | Raphael Schumann |
Module Type | |
Language | English |
First Session | 26.04.2022 |
Time and Place | Tuesday, 17:15–18:45, online |
Commitment Period | tbd. |
heiconf Link
https://heiconf.uni-heidelberg.de/q6wr-3trd-awyd-ckxy
Prerequisite for Participation
• Statistical Methods for CL
• Introduction to Neural Networks and Sequence-To-Sequence Learning (or equivalent)
Assessment
• Paper presentation
• Implementation project
• Prepare questions for each session's paper
Content
Vision and language navigation (VLN) is a challenging task that requires an agent model to process natural language instructions and ground them in a visual environment. The agent is embodied in the environment and receives navigation instructions. Based on the instructions, the observed surroundings, and the current trajectory, the agent decides its next action. Executing this action changes the position and/or heading of the agent within the environment; by repeating this process, the agent follows the described route and stops at the desired goal location. While early work on VLN was confined to gridworld scenarios, recent work has studied VLN in outdoor environments consisting of real-world urban street layouts or in indoor scenarios with rich visual surroundings. Agent models treat the task as a sequence-to-sequence problem where the instruction text and image representations are the input and the output is a sequence of actions.
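The agent loop described above can be sketched in a few lines of Python. This is a deliberately toy illustration, not any of the models discussed in the course: the "encoder" is simple tokenization standing in for a learned instruction encoding, the `policy` function is a hypothetical hand-written rule standing in for a learned decoder, and the visual observation is omitted entirely. It shows only the sequence-to-sequence framing: instruction in, sequence of actions out, terminated by an explicit STOP action.

```python
# Toy sketch of the VLN decision loop (assumed names; not a real VLN model).
ACTIONS = ["FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP"]

def encode_instruction(text: str) -> list[str]:
    # Stand-in for an RNN/transformer instruction encoder:
    # just lowercase tokens.
    return text.lower().split()

def policy(tokens: list[str], step: int, max_steps: int) -> str:
    # Stand-in for a learned policy: consume one token per step,
    # emit STOP once the instruction is exhausted or the budget is hit.
    if step >= len(tokens) or step >= max_steps:
        return "STOP"
    if tokens[step] == "left":
        return "TURN_LEFT"
    if tokens[step] == "right":
        return "TURN_RIGHT"
    return "FORWARD"

def navigate(instruction: str, max_steps: int = 10) -> list[str]:
    # The agent loop: decide an action at each step until STOP,
    # producing the output action sequence.
    tokens = encode_instruction(instruction)
    trajectory = []
    for step in range(max_steps + 1):
        action = policy(tokens, step, max_steps)
        trajectory.append(action)
        if action == "STOP":
            break
    return trajectory

print(navigate("go left then right"))
# ['FORWARD', 'TURN_LEFT', 'FORWARD', 'TURN_RIGHT', 'STOP']
```

In a real VLN agent the policy would additionally condition on the current visual observation and the trajectory so far, as the paragraph above describes.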
Module Overview
Agenda
Date | Session | Materials |
---|---|---|
Literature
• Walk the talk: Connecting language, knowledge, and action in route instructions; MacMahon et al., 2006
• Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments; Anderson et al., 2018
• Touchdown: Natural language navigation and spatial reasoning in visual street environments; Chen et al., 2019