Project Proposal: DemoParser

1 PROJECT IDENTIFICATION

1.1 Project title: Interactive Didactic Tool for Natural Language Parsing
1.2 Acronym: DemoParser
1.3 Keywords:
1.4 Duration: 9 months
1.5 Intended Starting Date: February 2001
1.6 Participant list: Yuliana Zigangirova
1.7 Project summary:

The goal of the DemoParser project is to develop an interactive tool that demonstrates the functioning of basic natural language parsers. It should serve as a didactic tool for anyone who wishes to understand how a particular parser works. Users will be able to interact with the system and to follow the operation of the parser step by step. The target users of the programme are computational linguistics students.

This product is developed within the framework of the larger Linguistic Workbench project. All the subprojects within this larger project are developed as didactic tools for use in teaching. Each of them offers students an interactive tool with a user-friendly interface through which they can learn the material. Several modules of the subprojects have similar or identical tasks, so coordination between the subprojects may be useful during their development. The actual overlap between the subprojects may become clearer as they are developed, and this may lead to the use of common modules or to the exchange of solutions to common problems.

2 PROJECT DESCRIPTION

2.1 Objectives of the project

The DemoParser is designed as an interactive tool that should demonstrate the functioning of basic natural language parsers. It should serve as a didactic tool for anyone who wishes to understand how a particular parser works. The interactive demonstration with a user-friendly interface allows one to follow the operation of the parser step by step.

The programme should allow the user to observe how various basic parsers process a given sentence. To increase the didactic value of the programme, the user should be allowed to change the grammar rules and the current lexicon and then to follow the parsing process in a different language environment.

In this project the programme will be developed for one particular basic parser. However, this will involve the development of several components that should make it straightforward to create demonstration tools for other parsers. These components of general value include: (i) the interactive interface, which will be both designed conceptually and implemented; (ii) the grammar rules component, which stores the rules, allows them to be changed interactively, and gives other components access to the rules database; (iii) a similar database component that allows the lexicon to be written, stored and read out. Apart from that, different parser algorithms may use similar tools to process a sentence. All the above-mentioned components will be designed so that they can serve as universal building blocks for various parser demonstration tools. Each new parser, on the other hand, requires its particular algorithm to be implemented anew. Since the main goal is the development of an educational tool, we do not plan to use this tool to process complex sentences or large vocabularies. Rather, the system should be designed to use a limited lexicon and several user-defined grammar rules.

2.2 Technical Baseline

While a variety of natural language parsers have been designed, only a limited number of them are accessible, especially via the Internet, to linguistics students and to scientists working in the field. Moreover, even those parsers that are available on the WWW provide only the results of their analysis. A tool that shows the details of the parsing mechanism would therefore be of great value. First, it would give an excellent learning opportunity to the steadily growing number of computational linguistics students (and other philology students), who would be able to grasp the ideas behind parsing algorithms more easily. Second, as an educational tool it can also be appreciated by university teaching staff.

From a more technical point of view, our product should combine recent achievements in parsing techniques with the representation tools developed for the Internet. Starting from the particular parser chosen for this project, we should single out the notions and ingredients that are common to many parsing algorithms. Once techniques for representing these elements in a graphical, easy-to-grasp form have been developed, new parsers can easily be integrated into our scheme. This, in turn, requires the development of an interactive interface. Here the technical tools of the Internet, first and foremost the Java programming language, can be used together with the experience in presenting material that the WWW community has acquired.

2.3 Implementation

This project is dedicated to the implementation of a demonstration tool for one particular parser. Ideally, however, the components should be designed so that the tool can be extended straightforwardly to demonstrate other parsing algorithms. To this end, we plan to analyse the parsers from the parsing tutorial "Natural Language Parsers" by P. Hellwig (see http://www.gs.uni-heidelberg.de/~hellwig/tutorial.html) and to organise the modules within the programme so that they can be used for the other parsers of the tutorial. The project will start with this analysis, since its results should define the actual structure of the modules.

System architecture

The product will be composed of modules controlled by a top level module. Different modules will carry out tasks related to different stages of the parsing process and of the demonstration. One of the modules will control the lexicon, a component needed by all parsing algorithms. It should store the current lexicon, provide the parsing module with the necessary information from the lexicon, and allow the user to change the lexicon interactively. Another module will store the grammar rules, provide access to them, and allow the user to change them. A further module should control the dialogue with the user, i.e., it should provide an interface for the user to type a sentence into the input table and to control the parsing process (e.g., start the process, perform a step, etc.). The module that implements a particular parsing algorithm is parser-dependent. Its access to the lexicon and the grammar rules will be provided by the corresponding modules.
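The module boundaries described above could be expressed as Java interfaces. The following is a minimal sketch, written for a current Java version rather than the one available in 2001; the names `LexiconModule`, `RulesModule`, `ParserModule`, `MapLexicon` and `ListRules` are hypothetical. It only illustrates how a parser-specific module could receive the shared lexicon and rules modules instead of owning its own data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Parser-independent lexicon module (hypothetical interface).
interface LexiconModule {
    Optional<String> categoryOf(String word); // part of speech, if the word is known
    void put(String word, String category);  // used by the interactive lexicon dialogue
}

// Parser-independent grammar rules module (hypothetical interface).
interface RulesModule {
    // Right-hand sides of all rules with this left-hand side, in order of appearance.
    List<List<String>> expansionsOf(String lhs);
    void addRule(String lhs, List<String> rhs);
}

// The only parser-dependent part: one implementation per parsing algorithm,
// which receives the shared modules instead of owning its own data.
interface ParserModule {
    void start(List<String> sentence, LexiconModule lexicon, RulesModule rules);
    boolean step(); // performs one step; false once no alternatives remain
}

// Minimal in-memory lexicon: one category per word, case-insensitive.
class MapLexicon implements LexiconModule {
    private final Map<String, String> entries = new HashMap<>();

    public Optional<String> categoryOf(String word) {
        return Optional.ofNullable(entries.get(word.toLowerCase()));
    }

    public void put(String word, String category) {
        entries.put(word.toLowerCase(), category);
    }
}

// Minimal in-memory rule store that preserves the order in which rules were added.
class ListRules implements RulesModule {
    private final List<String> lhsList = new ArrayList<>();
    private final List<List<String>> rhsList = new ArrayList<>();

    public List<List<String>> expansionsOf(String lhs) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < lhsList.size(); i++) {
            if (lhsList.get(i).equals(lhs)) result.add(rhsList.get(i));
        }
        return result;
    }

    public void addRule(String lhs, List<String> rhs) {
        lhsList.add(lhs);
        rhsList.add(List.copyOf(rhs));
    }
}
```

Because a new parser would only have to implement `ParserModule`, the lexicon and rules modules could be reused unchanged, which is the kind of reuse this architecture aims for.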

Functionality of the system

The top level module should mediate the interaction between the user and the parsing process. It will receive the user's commands via the interface module and control the parsing module accordingly. These commands may include orders to start the parsing process, to perform a single step, to perform a larger stage of the process, and to reverse the process. The exact list may depend on the parsing algorithm and will be specified at a later stage.
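One possible way for the top level module to support a "reverse" command is to keep every previous parser state on a history stack. The sketch below assumes that the parser's state can be captured as an immutable snapshot and that one parser step is a function from snapshot to snapshot; `TopLevel`, `Command` and the `count` parameter are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.UnaryOperator;

// Commands the interface module may pass on (hypothetical list).
enum Command { START, STEP, STAGE, REVERSE }

// Hypothetical top level controller. The parser state S is treated as an
// immutable snapshot; every snapshot replaced by a step is pushed on a
// history stack, so REVERSE simply restores the previous one.
class TopLevel<S> {
    private final S initial;
    private final UnaryOperator<S> parserStep; // one step of the parsing module
    private final Deque<S> history = new ArrayDeque<>();
    private S current;

    TopLevel(S initial, UnaryOperator<S> parserStep) {
        this.initial = initial;
        this.parserStep = parserStep;
        this.current = initial;
    }

    // Executes one user command and returns the state to display.
    // 'count' is the number of steps in a STAGE and is ignored otherwise.
    S execute(Command command, int count) {
        switch (command) {
            case START:
                history.clear();
                current = initial;
                break;
            case STEP:
                advance();
                break;
            case STAGE:
                for (int i = 0; i < count; i++) advance();
                break;
            case REVERSE:
                if (!history.isEmpty()) current = history.pop();
                break;
        }
        return current;
    }

    private void advance() {
        history.push(current);
        current = parserStep.apply(current);
    }
}
```

Storing full snapshots costs memory, but with the small lexicons and sentences planned for a didactic tool this should be acceptable; an alternative would be to record only the changes made by each step.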

The user will be able to change the lexicon and the grammar rules (or other information needed by the parser) by clicking on the appropriate button to open a specific dialogue. The options in these dialogues will also include the possibility to choose one of the sample lexicons or sets of grammar rules. This feature will be useful for inexperienced users in the first stages of the learning process.

The user will be able to print out the result of the parsing. Depending on the user's preference, either only the final result or more complete information, which can include the lexicon used, the grammar rules or the working table, will be printed. Instead of printing, the user will also have the option to save this information to a file.

As already mentioned, some sample lexicons, sets of grammar rules and input sentences will be provided. These will be stored in files in a suitable format (to be developed). In the final stage of the project we plan to implement a module that allows the user to extend or update this 'database' of samples. This will allow the user, for instance, to work repeatedly with one of the previously introduced sets of grammar rules or lexicons. In particular, this feature will be useful for teachers who wish to provide their own samples for their students.

To take advantage of the possibilities for interacting with the user and presenting information that contemporary Internet technologies offer, the product will be implemented using HTML tools and the Java language.

3 WORKPLAN

The parser that will be implemented first in our didactic tool is the top-down parser with backtracking (PT-1 from P. Hellwig's tutorial). It is an interpreting parser that demonstrates several important features of the parsing technique. The grammar is separated from the procedure and is specified as a set of context-free production rules. The input sentence is processed from left to right according to the lexicon, on a one-pass (depth-first) basis. The recognition strategy is based on category expansion (top-down processing), and alternatives are handled through backtracking. The linguistic structure assigned at the end of the parsing process is a constituency tree. All these features are planned to be implemented in our demonstration tool through a series of separate workpackages. Some of the workpackages may be developed in cooperation with the other participants of the "Linguistic Workbench" project if other parts of the project require solving similar tasks (such as the construction of the constituency tree). This will be defined in detail later.
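To make the strategy concrete, here is a compact recogniser sketch in Java with the same characteristics: context-free rules kept separate from the procedure, left-to-right processing, depth-first expansion and exhaustive backtracking. It is not PT-1 itself: the Java call stack stands in for an explicit backtracking store, the class name `TopDownRecognizer` is hypothetical, and it assumes that each word has exactly one category and that a symbol without rules is a lexical category. Like any naive top-down parser, it would recurse forever on left-recursive rules such as NP -> NP PP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a top-down, depth-first recogniser with backtracking.
// Assumptions: a symbol that has rules is expanded; a symbol without rules is
// a lexical category that must match the category of the next input word.
// Left-recursive rules (e.g. NP -> NP PP) would make it recurse forever.
class TopDownRecognizer {
    private final Map<String, List<List<String>>> rules; // lhs -> alternatives
    private final Map<String, String> lexicon;           // word -> category
    private final List<String> words;

    TopDownRecognizer(Map<String, List<List<String>>> rules,
                      Map<String, String> lexicon, List<String> words) {
        this.rules = rules;
        this.lexicon = lexicon;
        this.words = words;
    }

    // Number of parses of the whole sentence as the start symbol.
    int countParses(String start) {
        return derive(List.of(start), 0);
    }

    // goals: symbols still to be found, left to right; pos: next input word.
    private int derive(List<String> goals, int pos) {
        if (goals.isEmpty()) {
            return pos == words.size() ? 1 : 0; // success only if all words are used
        }
        String first = goals.get(0);
        List<String> rest = goals.subList(1, goals.size());
        List<List<String>> alternatives = rules.get(first);
        if (alternatives != null) {
            // Expansion: try every alternative in order. Returning from the
            // recursive call and trying the next one is the backtracking step.
            int total = 0;
            for (List<String> rhs : alternatives) {
                List<String> next = new ArrayList<>(rhs);
                next.addAll(rest);
                total += derive(next, pos);
            }
            return total;
        }
        // Recognition: the goal is a lexical category.
        if (pos < words.size() && first.equals(lexicon.get(words.get(pos)))) {
            return derive(rest, pos + 1);
        }
        return 0; // dead end
    }
}
```

With the rules S -> NP VP, NP -> DET N | DET N PP, VP -> V NP | V NP PP and PP -> P NP, the sentence "the man saw the dog with the telescope" receives two parses (the PP attached to the object NP or to the VP), and backtracking finds both.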

3.1 Workpackage list

Workpackage 1: Lexicon and Input Table

Objective: implementation of the Lexicon and Input Table modules

This workpackage covers the development of the modules that handle the exchange of information with the lexicon and with the input table. More precisely, a way to store the needed information will be developed, and the functions that give access to this information will be implemented. Finally, the interface that governs the interaction with these modules will be developed. This will make it possible to demonstrate the functioning of these modules by the end of the first workpackage. The tasks within this workpackage are specified in more detail below. The possibilities offered to the user by these modules will include: choosing one of several sample lexicons, building a new lexicon or updating an existing one, and, similarly for the input table, choosing a sample sentence or entering a new one to be processed by the parser. The modules will also provide information about the content of the lexicon and the input table, respectively. The development of the lexicon and the input table modules is combined in one workpackage because one of the goals of the input table module is to identify the part of speech of the words in the sentence, which requires close interaction with the lexicon module.

Task 1.1 Lexicon

Objective: development of the Lexicon module

The task involves the development of: a way to store the lexicon information during the parsing process; a file format for lexicons, used to store sample lexicons and user-defined lexicons between parsing sessions; tools to create a new lexicon or to update an existing one by adding or deleting entries; and tools to access the lexicon information, for example to check whether a particular word is in the lexicon or to find the part-of-speech attribute of this word.
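A sketch of such a lexicon module follows. The plain-text file format (one "word category" pair per line, '#' for comments), the class name `Lexicon` and its method names are all assumptions made for illustration; the actual format is still to be developed. Words are stored in lower case and kept sorted, so that saved files are stable and easy to compare.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Lexicon module sketch. Hypothetical file format: one entry per line,
// "word category" separated by whitespace; blank lines and lines starting
// with '#' are ignored.
class Lexicon {
    // word -> part of speech; a TreeMap keeps saved files sorted and stable
    private final Map<String, String> entries = new TreeMap<>();

    void add(String word, String category) { entries.put(word.toLowerCase(), category); }

    void remove(String word) { entries.remove(word.toLowerCase()); }

    boolean contains(String word) { return entries.containsKey(word.toLowerCase()); }

    // Part-of-speech attribute of the word, or null if it is not in the lexicon.
    String partOfSpeech(String word) { return entries.get(word.toLowerCase()); }

    int size() { return entries.size(); }

    // Reads a lexicon file (sample or user-defined) in the format above.
    static Lexicon load(Reader source) throws IOException {
        Lexicon lexicon = new Lexicon();
        BufferedReader in = new BufferedReader(source);
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            lineNo++;
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] fields = line.split("\\s+");
            if (fields.length != 2) {
                throw new IOException("line " + lineNo + ": expected 'word category'");
            }
            lexicon.add(fields[0], fields[1]);
        }
        return lexicon;
    }

    // Writes the lexicon back in the same format, one entry per line.
    void save(Writer target) throws IOException {
        for (Map.Entry<String, String> e : entries.entrySet()) {
            target.write(e.getKey() + " " + e.getValue() + "\n");
        }
        target.flush();
    }
}
```

Reading from a `Reader` and writing to a `Writer` keeps the module independent of where samples live, whether in local files or, for an applet, in resources fetched over the network.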

Deliverable: a piece of software that allows the user to experiment with lexicons.

Task 1.2 Input

Objective: development of the input table module

The module will provide a dialogue-type interface for the user to enter a new sentence to be processed by the parser or to choose one from a set of sample sentences. After the sentence is entered or chosen, the module, in cooperation with the lexicon module, determines the part-of-speech attribute of each word in the sentence and enters this information into the input table.
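The input table step could look roughly as follows. This sketch assumes a naive tokenisation (lower-casing and splitting on whitespace after removing everything except letters), uses a plain `Map` as a stand-in for the lexicon module, and marks unknown words with a hypothetical "?" category so that the interface can ask the user to add them to the lexicon. The class name `InputTable` is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Input table sketch: one row per word with its part of speech.
// A plain map stands in for the lexicon module; unknown words receive the
// hypothetical marker "?" so that the user can be asked to add them.
class InputTable {
    static final String UNKNOWN = "?";

    private final List<String> words = new ArrayList<>();
    private final List<String> categories = new ArrayList<>();

    InputTable(String sentence, Map<String, String> lexicon) {
        // Naive tokenisation: lower-case, replace everything except letters
        // and whitespace by spaces, then split on whitespace.
        String cleaned = sentence.toLowerCase().replaceAll("[^\\p{L}\\s]", " ").trim();
        if (cleaned.isEmpty()) return;
        for (String word : cleaned.split("\\s+")) {
            words.add(word);
            categories.add(lexicon.getOrDefault(word, UNKNOWN));
        }
    }

    int size() { return words.size(); }
    String word(int i) { return words.get(i); }
    String category(int i) { return categories.get(i); }

    // Words that have to be added to the lexicon before parsing can start.
    List<String> unknownWords() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (UNKNOWN.equals(categories.get(i))) result.add(words.get(i));
        }
        return result;
    }

    // Tab-separated rows "position word category", positions counted from 1.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            sb.append(i + 1).append('\t').append(words.get(i))
              .append('\t').append(categories.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

For example, "The dog barks loudly." with a lexicon containing only "the", "dog" and "barks" yields four rows, the last one marked "?", and `unknownWords()` returns the single word "loudly".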

Task 1.3 Input-Lexicon interface

Objective: development of the interface which allows the user to control the lexicon and input table modules in an interactive fashion.

This task should provide an interface that allows the user, by clicking on buttons on the screen or by choosing entries from lists on the screen, to select lexicons or sentences, to enter new ones, to edit existing ones, and to determine the part of speech of the words in the chosen sentence.

Deliverable for this task and the whole workpackage: the lexicon and input table software with its interactive interface

Workpackage 2: Rules module

Objective: development of the Rules module

Within this workpackage, a module will be developed that serves to create sets of grammar rules and provides tools for their use in other modules. In addition, the interface for the dialogue with the user will be developed.

Task 2.1 Grammar

Objective: development of the grammar rules module

As with the lexicon module, we plan to create several sample sets of grammar rules and allow the user either to choose one of them or to create a new one. In addition, the user will be able to edit existing sets by adding new rules or deleting existing ones.

To facilitate access to the rules database, we plan to create a tool that provides the rules requested by other modules (e.g., all rules with a given left-hand side) and returns the corresponding expansions. Different rules with the same left-hand side will be provided one by one, e.g., in the order of their appearance in the rule set.
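The one-by-one access to alternatives can be supported by returning rule indices: a parser that remembers the index of the rule it has just used can later ask for the next rule with the same left-hand side, which is exactly the information a backtracking store needs. In the sketch below, the names `Rule`, `Grammar` and `nextRule` are hypothetical, and the textual rule notation "LHS -> SYM1 SYM2 ..." is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A context-free production rule such as NP -> DET N.
final class Rule {
    final String lhs;
    final List<String> rhs;

    Rule(String lhs, List<String> rhs) {
        this.lhs = lhs;
        this.rhs = List.copyOf(rhs);
    }

    // Parses the assumed text notation "LHS -> SYM1 SYM2 ...".
    static Rule parse(String text) {
        String[] sides = text.split("->");
        if (sides.length != 2 || sides[0].isBlank() || sides[1].isBlank()) {
            throw new IllegalArgumentException("not a rule: " + text);
        }
        return new Rule(sides[0].trim(), Arrays.asList(sides[1].trim().split("\\s+")));
    }

    @Override
    public String toString() { return lhs + " -> " + String.join(" ", rhs); }
}

// Grammar rules module sketch: rules are kept in the order they were entered.
class Grammar {
    private final List<Rule> rules = new ArrayList<>();

    void add(String text) { rules.add(Rule.parse(text)); }

    // Deletes every copy of the given rule; returns true if one was found.
    boolean remove(String text) {
        String normalised = Rule.parse(text).toString();
        return rules.removeIf(r -> r.toString().equals(normalised));
    }

    // Index of the first rule with this left-hand side at or after 'from',
    // or -1 if there is none. A parser that stores the index it used can
    // later ask for the next alternative, which is what backtracking needs.
    int nextRule(String lhs, int from) {
        for (int i = Math.max(from, 0); i < rules.size(); i++) {
            if (rules.get(i).lhs.equals(lhs)) return i;
        }
        return -1;
    }

    Rule get(int index) { return rules.get(index); }
}
```

A caller iterates over the alternatives of NP with `nextRule("NP", 0)`, then `nextRule("NP", i + 1)` for the previously returned index `i`, until -1 signals that no alternatives are left. Note that deleting a rule shifts the indices of later rules, so edits should not happen while a parse is in progress.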

Task 2.2 Rules interface

Objective: interface to the rules module

This interface will provide a user-friendly environment for the creation and revision of the grammar rules used by the parser. In addition, the interface will make it possible to test the tool that accesses the rule set. At a later stage this interface will be incorporated into the interface of the whole project.

Deliverable: the interface for testing the grammar rules module

Workpackage 3: Parser

This workpackage is dedicated to the development of the major module, the parsing module. This module should handle the working table and the backtracking store and, using the modules developed earlier (lexicon, rules, input table), govern the parsing process step by step. Its work will be directed by the central Top Level module according to the user's commands.

The Parser module will be able to perform a single step of the parsing process (expansion, recognition, backtracking) using the other modules and the tables and to update these tables (and other needed variables) accordingly.

Task 3.1 Parser

Objective: implementation of the parser algorithm

In order to implement the algorithm, one needs to implement the procedures used in the parsing process, i.e., expansion, recognition and backtracking. At the beginning, the parser should create the initial working table and the backtracking store. Then, at each stage, it should analyse the content of the working table and the backtracking store, perform the required step and update the tables. It should also inform the calling software when a parse has been found and when all alternatives have been processed and no further parses can be found.
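The following Java sketch shows one way to organise the parser as a step function, so that each call performs exactly one expansion, recognition or backtracking step. It is a simplification and not Hellwig's PT-1 table layout: the working table is reduced to the list of pending goals, the input position and the rules used so far; the backtracking store is a stack holding a complete copy of the table for every untried alternative; and all class, enum and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Step-wise top-down parser sketch. The "working table" is reduced to the
// pending goals, the input position and the rules used so far; the
// backtracking store is a stack of complete tables for untried alternatives.
class StepParser {
    enum Outcome { EXPANDED, RECOGNIZED, FAILED, BACKTRACKED, PARSE_FOUND, FINISHED }

    // Snapshot of the working table; never modified after creation.
    static final class Table {
        final List<String> goals;
        final int pos;
        final List<String> rulesUsed;

        Table(List<String> goals, int pos, List<String> rulesUsed) {
            this.goals = goals;
            this.pos = pos;
            this.rulesUsed = rulesUsed;
        }
    }

    private final Map<String, List<List<String>>> rules; // lhs -> alternatives
    private final Map<String, String> lexicon;           // word -> category
    private final List<String> words;
    private final Deque<Table> backtrackStore = new ArrayDeque<>();
    private final List<List<String>> parses = new ArrayList<>();
    private Table current; // null means the next step must backtrack

    StepParser(Map<String, List<List<String>>> rules, Map<String, String> lexicon,
               List<String> words, String start) {
        this.rules = rules;
        this.lexicon = lexicon;
        this.words = words;
        this.current = new Table(List.of(start), 0, List.of());
    }

    List<List<String>> parses() { return parses; }

    // Performs exactly one step and reports what it did.
    Outcome step() {
        if (current == null) { // backtracking
            if (backtrackStore.isEmpty()) return Outcome.FINISHED;
            current = backtrackStore.pop();
            return Outcome.BACKTRACKED;
        }
        if (current.goals.isEmpty()) {
            boolean complete = current.pos == words.size();
            if (complete) parses.add(current.rulesUsed);
            current = null; // continue with the remaining alternatives
            return complete ? Outcome.PARSE_FOUND : Outcome.FAILED;
        }
        String goal = current.goals.get(0);
        List<String> rest = current.goals.subList(1, current.goals.size());
        List<List<String>> alternatives = rules.get(goal);
        if (alternatives != null && !alternatives.isEmpty()) { // expansion
            // Push the untried alternatives in reverse order so that the
            // next one in rule order ends up on top of the store.
            for (int i = alternatives.size() - 1; i >= 1; i--) {
                backtrackStore.push(expand(goal, alternatives.get(i), rest));
            }
            current = expand(goal, alternatives.get(0), rest);
            return Outcome.EXPANDED;
        }
        if (current.pos < words.size()
                && goal.equals(lexicon.get(words.get(current.pos)))) { // recognition
            current = new Table(List.copyOf(rest), current.pos + 1, current.rulesUsed);
            return Outcome.RECOGNIZED;
        }
        current = null; // dead end: backtrack on the next step
        return Outcome.FAILED;
    }

    // New table in which 'goal' has been replaced by the right-hand side 'rhs'.
    private Table expand(String goal, List<String> rhs, List<String> rest) {
        List<String> goals = new ArrayList<>(rhs);
        goals.addAll(rest);
        List<String> used = new ArrayList<>(current.rulesUsed);
        used.add(goal + " -> " + String.join(" ", rhs));
        return new Table(goals, current.pos, used);
    }
}
```

Calling `step()` until it returns `FINISHED` enumerates all parses; each entry of `parses()` lists the rules in the order they were applied, which is also what the tree-drawing module of Task 3.3 needs. Copying whole tables keeps each step easy to display and undo, at the cost of memory that is negligible for short didactic sentences.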

Deliverable: the parser which can be tested in a non-interactive manner

Task 3.2 Parser interface

Objective: development and implementation of the user interface

At this stage we plan to design and implement the interface for the dialogue with the user, which allows the user to control the parsing process. By choosing different options on the screen, the user will be able to force a single step of the parsing process, followed by an update of the working table and the backtracking store on the screen; to force a stage of several parsing steps; or to force the parsing process to run until a parse is found or until parsing is finished. The user's options will include, e.g., the possibility to run the parsing process until the first parse is found or until all parses are found, and the possibility to choose the number of steps to be performed before the screen (the working table) is updated.
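The run options listed above reduce to a loop around the parser's single-step operation. In this sketch the parser is hidden behind a hypothetical `Stepper` interface whose results (`WORKING`, `PARSE_FOUND`, `FINISHED`) are a simplified assumption, and the hypothetical `RunMode` names the three options: a fixed number of steps, running until the next parse, and running until all parses have been found. The returned step count tells the interface when to refresh the working table.

```java
// Result of one parser step, as assumed here (a simplified view of the
// outcomes reported by the parsing module).
enum StepResult { WORKING, PARSE_FOUND, FINISHED }

// Anything that can perform one parsing step (hypothetical interface).
interface Stepper {
    StepResult step();
}

// Run options offered by the parser interface (hypothetical names).
enum RunMode { STEPS, UNTIL_PARSE, UNTIL_FINISHED }

class RunController {
    private final Stepper parser;
    private boolean finished;

    RunController(Stepper parser) { this.parser = parser; }

    // Runs the parser in the chosen mode and returns the number of steps
    // performed, after which the interface would refresh the working table.
    // 'n' is the number of steps for STEPS and is ignored otherwise.
    int run(RunMode mode, int n) {
        int performed = 0;
        while (!finished) {
            if (mode == RunMode.STEPS && performed >= n) break;
            StepResult result = parser.step();
            performed++;
            if (result == StepResult.FINISHED) finished = true;
            else if (result == StepResult.PARSE_FOUND && mode == RunMode.UNTIL_PARSE) break;
        }
        return performed;
    }

    boolean isFinished() { return finished; }
}
```

Calling `run(RunMode.UNTIL_PARSE, 0)` repeatedly would let the user walk from one parse to the next, while `run(RunMode.STEPS, k)` corresponds to the option of updating the screen only every k steps.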

Deliverable: the interface for testing the parser

Task 3.3 Constituency tree

Objective: implementation of the module which draws the tree

The user will be able to visualise each parse found in the form of a constituency tree. This task is dedicated to the implementation of the corresponding module. Based on the final working table, this module will single out the rules used in the parsing process and draw the corresponding tree in an appropriate part of the screen.
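Since a top-down, left-to-right parser applies its rules in leftmost-derivation order, the list of rules used is enough to rebuild the tree: each nonterminal consumes the next rule, and each lexical category consumes the next word. The sketch below assumes rules written as "LHS -> A B ...", no empty right-hand sides, and that lexical categories never occur as left-hand sides; it prints the tree in bracketed notation rather than drawing it graphically, and `TreeBuilder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Rebuilds a constituency tree from the rules used in one parse, listed in the
// order a top-down, left-to-right parser applied them (a leftmost derivation).
// Assumes rules written "LHS -> A B ...", no empty right-hand sides, and that
// lexical categories never appear as a left-hand side.
class TreeBuilder {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        // Bracketed notation, e.g. [S [NP [N dogs]] [VP [V bark]]].
        @Override
        public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("[").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(']').toString();
        }
    }

    private final List<String[]> rules = new ArrayList<>(); // {lhs, rhs1, rhs2, ...}
    private final List<String> words;
    private int nextRule = 0;
    private int nextWord = 0;

    private TreeBuilder(List<String> rulesUsed, List<String> words) {
        for (String text : rulesUsed) {
            String[] sides = text.split("->");
            List<String> parts = new ArrayList<>();
            parts.add(sides[0].trim());
            parts.addAll(Arrays.asList(sides[1].trim().split("\\s+")));
            rules.add(parts.toArray(new String[0]));
        }
        this.words = words;
    }

    static Node build(List<String> rulesUsed, List<String> words) {
        TreeBuilder builder = new TreeBuilder(rulesUsed, words);
        if (builder.rules.isEmpty()) throw new IllegalArgumentException("no rules");
        Node root = builder.node(builder.rules.get(0)[0]);
        if (builder.nextRule != builder.rules.size() || builder.nextWord != words.size()) {
            throw new IllegalArgumentException("rules and words do not match");
        }
        return root;
    }

    // A nonterminal consumes the next rule; a lexical category the next word.
    private Node node(String symbol) {
        Node n = new Node(symbol);
        if (nextRule < rules.size() && rules.get(nextRule)[0].equals(symbol)) {
            String[] rule = rules.get(nextRule++);
            for (int i = 1; i < rule.length; i++) n.children.add(node(rule[i]));
        } else {
            if (nextWord >= words.size()) {
                throw new IllegalArgumentException("not enough words for the rules");
            }
            n.children.add(new Node(words.get(nextWord++)));
        }
        return n;
    }
}
```

For the rules S -> NP VP, NP -> DET N, VP -> V and the words "the dog barks", `build(...)` yields `[S [NP [DET the] [N dog]] [VP [V barks]]]`; a graphical version could traverse the same `Node` structure to compute node positions on the screen.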

Deliverable: the module which draws the tree

Workpackage 4: Top Level module

Objective: to combine all the modules and to implement the Top Level module

After all major modules are implemented, we will combine them to form the final software product. The Top Level module will coordinate the functioning of the other modules. It will start the dialogue with the user and allow the user to choose a lexicon, a set of grammar rules and a sentence to be processed by calling the appropriate modules. It will then give the user the possibility to start and control the parsing process. The corresponding interface will allow the user to print or save the content of any table shown on the screen (the lexicon, the working table etc.), to start a new parsing process, etc. When the Top Level module is finished, the product can be tested as a whole.

Deliverable: the parsing demo tool

Workpackage 5: Evaluation, user study and corrections

Objective: to test the product, to let some prospective users test it, and to improve its features and interface based on their feedback

Once the product is finished in its basic form, we plan to increase the number of useful options available for the user to control the process. The finished product can then be offered to several linguistics students for testing. The students' suggestions can be used to improve the product by adding useful features, eliminating possible bugs and improving the interface.

Deliverable: the final parsing demo tool

3.2 List of deliverables

The implementation of the whole project will take eight months. It will start on February 1, 2001 and end on October 1, 2001.

The deliverables are scheduled as follows:

Workpackage 1	Lexicon and Input Table	February 1 - April 1
Workpackage 2	Rules module	April 1 - May 1
Workpackage 3	Parser	May 1 - August 1
Workpackage 4	Top Level module	August 1 - September 1
Workpackage 5	User study and corrections	September 1 - October 1