Ruprecht-Karls-Universität Heidelberg

SiGHTSee - Simultaneous Generation in Heidelberg: Text and Scene Images

Project Aims and Development Strategy

Our long-term research goal is to model the mutual specification of two modalities, language and vision, in a natural language generation system. This places our research within the interdisciplinary working group on Cognition in Language, Art and Music at the University of Heidelberg.

We have two hypotheses: (1) language and vision mutually disambiguate each other, allowing communicative goals to be conveyed more efficiently; (2) different languages tend to verbalise visual scenes in different ways. Reconciling these two aspects within a coherent architecture is difficult, yet it is crucial for designing natural-sounding multilingual systems. As a basis for (1), we employ a large-scale knowledge resource of spatial information that serves as input for synchronous generation in both modalities. To account for (2), we formalise language-specific differences in verbalising visually conveyed information using linguistic parameterisations. To overcome the restrictions of hand-crafted, rule-based systems, we apply data-driven statistical learning methods, which allow flexible domain changes without manual work.

First Results


Generation Prototype

We implemented a research prototype that generates route instructions combined with a dynamic 2D route visualisation built on the Google Maps API. Routes computed by Google Maps are enriched with landmarks retrieved from the following web resources: Google Local Search, Wikipedia, OpenStreetMap and Wikimapia. Natural language route instructions are then generated step by step using statistical knowledge derived from an annotated corpus of route directions.
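The description above does not say how landmarks are attached to route steps or how the corpus statistics select a wording. The following is only a minimal sketch under assumed simplifications, not the prototype's actual method: each decision point receives the nearest landmark within a hypothetical 50 m radius (great-circle distance), and the wording is chosen as the most frequent template observed for that maneuver in a toy annotated corpus (a simple relative-frequency choice). All names (`RouteStep`, `Landmark`, `train_templates`, `generate`), the maneuver labels, the landmark and the coordinates are hypothetical; no Google Maps or other web API is called.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Landmark:
    name: str   # hypothetical landmark label, e.g. from a web resource
    lat: float
    lon: float


@dataclass
class RouteStep:
    maneuver: str  # hypothetical labels such as "turn_left", "straight"
    lat: float     # position of the decision point
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_landmark(step: RouteStep, landmarks: List[Landmark],
                     max_dist_m: float = 50.0) -> Optional[Landmark]:
    """Return the closest landmark within max_dist_m of the step, or None."""
    best, best_d = None, max_dist_m
    for lm in landmarks:
        d = haversine_m(step.lat, step.lon, lm.lat, lm.lon)
        if d <= best_d:
            best, best_d = lm, d
    return best


def train_templates(corpus: Iterable[Tuple[str, bool, str]]) -> Dict[Tuple[str, bool], Counter]:
    """Count templates per (maneuver, uses_landmark) from annotated examples."""
    counts: Dict[Tuple[str, bool], Counter] = {}
    for maneuver, has_landmark, template in corpus:
        counts.setdefault((maneuver, has_landmark), Counter())[template] += 1
    return counts


def generate(route: List[RouteStep], landmarks: List[Landmark],
             templates: Dict[Tuple[str, bool], Counter]) -> List[str]:
    """Produce one instruction per route step using the most frequent template."""
    sentences = []
    for step in route:
        lm = nearest_landmark(step, landmarks)
        counter = templates.get((step.maneuver, lm is not None))
        if not counter and lm is not None:
            # No landmark template seen for this maneuver: drop the landmark.
            lm = None
            counter = templates.get((step.maneuver, False))
        if not counter:
            # Unseen maneuver: fall back to a generic phrasing of the label.
            sentences.append(step.maneuver.replace("_", " ").capitalize() + ".")
            continue
        template = counter.most_common(1)[0][0]
        sentences.append(template.format(landmark=lm.name) if lm else template)
    return sentences
```

A usage example with a made-up corpus and made-up coordinates:

```python
corpus = [
    ("turn_left", False, "Turn left."),
    ("turn_left", False, "Turn left."),
    ("turn_left", False, "Go left."),
    ("turn_left", True, "Turn left at {landmark}."),
    ("straight", False, "Continue straight ahead."),
]
templates = train_templates(corpus)
landmarks = [Landmark("the cafeteria", 49.4165, 8.6700)]
route = [RouteStep("straight", 49.4150, 8.6690), RouteStep("turn_left", 49.4166, 8.6701)]
print(generate(route, landmarks, templates))
# ['Continue straight ahead.', 'Turn left at the cafeteria.']
```

Separating landmark attachment from template choice mirrors the two stages described above (enrichment, then verbalisation), so either stage could be swapped for a richer model without touching the other.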


Navigation Study

In March 2009, we conducted a study to collect route directions. Native speakers of German, English, and Spanish were asked to follow a predefined route and then to write directions for a person unfamiliar with the area, explaining how to reach the destination along the same route. In the first part, subjects walked given routes on the campus (both inside and outside the buildings of the Theoretikum) together with an instructor and gave their directions at the destination of each route. The second part was computer-based: subjects were shown routes and possible landmarks in an interactive Google Maps application and were asked to use only information derivable from the map when giving directions, even if they knew of other possible landmarks.

3D Models

For visualising routes and landmarks, we use 3D models at different levels of detail. Through a collaboration with the Department of Geography at the University of Heidelberg, we received outdoor models of the old town of Heidelberg and of the Im Neuenheimer Feld campus. These models contain three-dimensional views of buildings with realistic facade textures. Indoor models of the Theoretikum buildings were created by the NGG Group at the Interdisciplinary Center for Scientific Computing using the free modelling and rendering software Blender, and surface textures were added afterwards. The degree to which furniture will be integrated into the models will be decided once the navigation study has been analysed.

The models serve multiple purposes. On the one hand, they will be used in the final system to visualise critical decision points and routes. On the other hand, they form an important part of the virtual environment that will be employed in further web-based studies and in the evaluation of collected and generated route directions.

Example Scenario
