Vision and Language Navigation
Module Description
Course | Module Abbreviation | Credit Points |
---|---|---|
BA-2010[100%\|75%] | CS-CL | 6 LP |
BA-2010[50%] | BS-CL | 6 LP |
BA-2010[25%] | BS-AC | 4 LP |
BA-2010 | AS-CL | 8 LP |
Master | SS-CL, SS-TAC | 8 LP |
Lecturer | Raphael Schumann |
Module Type | |
Language | English |
First Session | 26.04.2022 |
Time and Place | Tuesday, 17:15–18:45, online |
Commitment Period | tbd. |
heiconf Link
https://heiconf.uni-heidelberg.de/q6wr-3trd-awyd-ckxy
Prerequisite for Participation
• Statistical Methods for CL
• Introduction to Neural Networks and Sequence-To-Sequence Learning (or equivalent)
Assessment
• Paper presentation
• Implementation project
• Prepare questions for each session's paper
Content
Vision and language navigation (VLN) is a challenging task that requires an agent model to process natural language instructions and ground them in a visual environment. The agent is embodied in the environment and receives navigation instructions. Based on the instructions, the observed surroundings, and the current trajectory, the agent decides its next action. Executing this action changes the position and/or heading of the agent within the environment; by repeating this process, the agent follows the described route and stops at the desired goal location. While early work on VLN was confined to gridworld scenarios, recent work has studied VLN in outdoor environments consisting of real-world urban street layouts or in indoor scenarios with rich visual surroundings. Agent models treat the task as a sequence-to-sequence problem where the instruction text and image representations are the input and the output is a sequence of actions.
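The agent loop described above can be sketched in a few lines of Python. This is a deliberately toy illustration, not any of the models discussed in the course: the "encoder" is simple tokenization standing in for a learned instruction encoding, the `policy` function is a hypothetical hand-written rule standing in for a learned decoder, and the visual observation is omitted entirely. It shows only the sequence-to-sequence framing: instruction in, sequence of actions out, terminated by an explicit STOP action.

```python
# Toy sketch of the VLN decision loop (assumed names; not a real VLN model).
ACTIONS = ["FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP"]

def encode_instruction(text: str) -> list[str]:
    # Stand-in for an RNN/transformer instruction encoder:
    # just lowercase tokens.
    return text.lower().split()

def policy(tokens: list[str], step: int, max_steps: int) -> str:
    # Stand-in for a learned policy: consume one token per step,
    # emit STOP once the instruction is exhausted or the budget is hit.
    if step >= len(tokens) or step >= max_steps:
        return "STOP"
    if tokens[step] == "left":
        return "TURN_LEFT"
    if tokens[step] == "right":
        return "TURN_RIGHT"
    return "FORWARD"

def navigate(instruction: str, max_steps: int = 10) -> list[str]:
    # The agent loop: decide an action at each step until STOP,
    # producing the output action sequence.
    tokens = encode_instruction(instruction)
    trajectory = []
    for step in range(max_steps + 1):
        action = policy(tokens, step, max_steps)
        trajectory.append(action)
        if action == "STOP":
            break
    return trajectory

print(navigate("go left then right"))
# ['FORWARD', 'TURN_LEFT', 'FORWARD', 'TURN_RIGHT', 'STOP']
```

In a real VLN agent the policy would additionally condition on the current visual observation and the trajectory so far, as the paragraph above describes.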
Module Overview
Agenda
Date | Session | Materials |
---|---|---|
Literature
• Walk the talk: Connecting language, knowledge, and action in route instructions; MacMahon et al., 2006
• Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments; Anderson et al., 2018
• Touchdown: Natural language navigation and spatial reasoning in visual street environments; Chen et al., 2019