1.1 Project title: Interactive Didactic Tool for Natural Language Parsing
1.2 Acronym: DemoParser
1.3 Keywords:
1.4 Duration: 8 months
1.5 Intended Starting Date: February 2001
1.6 Participant list:
Yuliana Zigangirova
1.7 Project summary:
The goal of the DemoParser project is to develop an interactive tool that
demonstrates the functioning of basic natural language parsers. It should serve
as a didactic tool for anyone who wishes to understand how a particular parser
works. Users will be able to interact with the system and to follow the
operation of the parser step by step. The target users of the programme are
students of computational linguistics.
This product is being developed within the framework of the larger Linguistic
Workbench project. All subprojects within this larger project are developed as
didactic tools to be used in the educational process. They allow a student to
learn the material through an interactive tool with a user-friendly interface.
The subprojects have several modules with similar or identical tasks, so
coordination between them may be useful during development. The actual overlap
between the subprojects may become clearer as they are developed, which can
lead to the use of a common module or to the exchange of solutions to common
problems.
DemoParser is designed as an interactive tool that should demonstrate the
functioning of basic natural language parsers. It should serve as a didactic
tool for anyone who wishes to understand how a particular parser works. The
interactive demonstration with a user-friendly interface allows one to follow
the operation of the parser step by step.
The programme should allow the user to observe how various basic parsers
process a given sentence. To increase the didactic value of the programme, the
user should be allowed to change the grammar rules and the current lexicon and
then to follow the parsing process in a different language environment.
In this project the programme will be developed for one particular basic
parser. However, this will involve the development of several components which
should make the creation of demonstration tools for other parsers
straightforward. These components of general value include: (i) the interactive
interface, which will be both designed conceptually and implemented; (ii) the
grammar rules component, which stores the rules, allows the user to change them
interactively, and gives other components access to the rules database; (iii) a
similar database component which allows the lexicon to be written, stored and
read. Apart from that, different parser algorithms may use similar tools to
process a sentence. All the above-mentioned components will be designed so that
they can be used as universal building blocks of various parser demonstration
tools. On the other hand, any new parser has its own particular algorithm,
which has to be implemented anew. Since the main goal is the development of an
educational tool, we do not plan to use this tool to process complex sentences
or to use a large vocabulary. Rather, the system should be designed to use a
limited lexicon and several user-defined grammar rules.
While a variety of natural language parsers have been designed, only a limited
number of them can be accessed, especially via the Internet, by linguistics
students and scientists who work in the field. Moreover, even those parsers
which are available on the WWW provide only the results of their analysis. A
tool which shows the details of the parsing mechanism would be of great value.
First, it should give a good learning opportunity to the steadily growing
number of computational linguistics students (and other philology students),
who would be able to grasp the ideas behind parsing algorithms more easily. As
an educational tool it can also be of use to the teaching staff of
universities.
From a more technical point of view, our product should combine recent
achievements in parsing techniques with the presentation tools developed for
the Internet. We should single out (starting from a particular parser within
this project) the notions and ingredients which are common to many parsing
algorithms. The techniques used to represent these elements in a graphical,
easy-to-grasp form, once developed, will make it easy to integrate new parsers
into our scheme. This, further, assumes the development of an interactive
interface. Here the technical tools used on the Internet, first of all the Java
programming language, can be utilised together with the experience in the
presentation of material acquired by the WWW community.
This project is dedicated to the implementation of a demonstration tool for one
particular parser. However, ideally the components should be designed in such a
way as to allow a straightforward extension to the demonstration of other
parsing algorithms. In this respect, we plan to analyse the parsers from the
parsing tutorial "Natural Language Parsers" by P. Hellwig (see
http://www.gs.uni-heidelberg.de/~hellwig/tutorial.html) and to organise the
modules within the programme such that they can be used for the other parsers
from the tutorial. The project will start with this analysis since its results
should define the actual structure of the modules.
System architecture
The product will be composed of modules which will be controlled by a top level
module. Different modules will carry out tasks related to different stages of
the parsing process and the demonstration. One of the modules will control the
lexicon, a component needed for all parsing algorithms. It should store the
current lexicon, provide the necessary information from the lexicon to the
parsing module, and allow the user to change the lexicon interactively. Another
module will store the grammar rules, provide access to them, and allow the user
to change them. One more module should control the dialogue with the user,
i.e., should provide an interface for the user to type the sentence into a word
table and to control the parsing process (e.g., start the process, perform a
step etc.). The module which implements a particular parsing algorithm is
parser-dependent. Its access to the lexicon and the grammar rules will be
provided by the corresponding modules.
Functionality of the system
The top level module should mediate the interaction between the user and the
parsing process. It will receive the user's commands via the interface module
and control the functioning of the parsing module accordingly. These commands
may include orders to start the parsing process, to perform a step, to perform
a larger stage of the process, and to reverse the process. The exact list may
depend on the parsing algorithm and will be specified at a later stage.
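The commands listed above can be illustrated by a minimal sketch in Java. The class name StepController and its methods are hypothetical and are not part of the planned design. The sketch assumes that a parser state is an immutable, non-null value and that one parsing step is a function from a state to the next state; reversing the process is then done by keeping earlier states on a stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.UnaryOperator;

/** Hypothetical controller: applies parser steps and remembers earlier states. */
class StepController<S> {
    private final Deque<S> history = new ArrayDeque<>(); // states before each step
    private final UnaryOperator<S> stepFunction;         // one parsing step (assumed pure)
    private S state;

    StepController(S initialState, UnaryOperator<S> stepFunction) {
        this.state = initialState;
        this.stepFunction = stepFunction;
    }

    S state() {
        return state;
    }

    /** Command "perform a step". */
    void step() {
        history.push(state);
        state = stepFunction.apply(state);
    }

    /** Command "perform a larger stage": several steps in a row. */
    void run(int steps) {
        for (int i = 0; i < steps; i++) {
            step();
        }
    }

    /** Command "reverse the process": undo one step; returns false at the start. */
    boolean reverse() {
        if (history.isEmpty()) {
            return false;
        }
        state = history.pop();
        return true;
    }
}
```

Storing whole states keeps the reverse command trivial. For the small lexicons and grammars foreseen here the memory cost should be negligible, although this is an assumption rather than a measured result.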
The user will be able to change the lexicon and the grammar rules (or other
information needed by the parser) by clicking on the appropriate button to
activate a specific dialogue. The options offered in these dialogues will also
include the possibility to choose one of the sample lexicons or sets of grammar
rules. This feature will be useful for an inexperienced user at the first
stages of the learning process.
The user will be able to print out the result of the parsing. Depending on the
user's choice, either only the final result will be printed or more complete
information, which can include the lexicon used, the grammar rules or the work
table. Instead of printing the information, the user will have the option to
save it in a file.
As already mentioned, some sample lexicons, grammar rules and input sentences
will be provided. These will be stored in files in a suitable format (to be
developed). At a final stage of the project we plan to implement a module which
would allow the user to extend or update this "database" of samples. This will
allow the user, for instance, to work repeatedly with one of the previously
entered sets of grammar rules or lexicons. In particular, this feature will be
useful for teachers, who may provide their own samples for their students.
In order to exploit the possibilities for user interaction and for the
presentation of information offered by contemporary Internet technologies, the
product will be implemented using HTML tools and the Java language.
The parser that will be implemented first in our didactic tool is the top-down
parser with backtracking (PT-1 from P. Hellwig's script). It is an interpreting
parser that demonstrates several important features of the parsing technique.
The grammar is separated from the procedure and is specified as a set of
context-free production rules. The input sentence is processed from left to
right, according to the lexicon, on a one-pass (depth-first) basis. The
recognition strategy is based on category expansion (top-down processing), and
alternatives are treated through backtracking. The linguistic structure
assigned at the end of the parsing process is a constituency tree. All these
features are planned to be implemented in our demonstration tool through a
series of separate workpackages. Some of the workpackages may be developed in
cooperation with the other participants of the "Linguistic Workbench" project,
in case other parts of that project require the solution of similar tasks (such
as the construction of the constituency tree). This has to be defined in detail
later.
Workpackage 1: Lexicon and Input Table
Objective: implementation of the Lexicon and Input Table modules
Within this workpackage we plan to develop the modules which handle the
exchange of information with the lexicon and with the input table. More
precisely, the way to store the needed information should be developed and the
functions which give access to this information should be implemented. Finally,
the interface which governs the interaction with these modules will be
developed. The latter shall make it possible to demonstrate the functioning of
these modules already by the end of the first workpackage. The tasks within
this workpackage are specified more precisely below. The possibilities given to
the user by these modules will include: the option to choose one of several
sample lexicons, to build a new lexicon or to update an existing one, and
similarly, for the input table, the option to choose a sample sentence or to
enter a new sentence to be processed by the parser. The modules will also
provide information about the content of the lexicon and the input table,
respectively. The development of the lexicon and the input table modules is
combined in one workpackage since one of the goals of the input table module
involves the identification of the part of speech of the words in the sentence
and requires close interaction with the lexicon module.
Task 1.1 Lexicon
Objective: development of the Lexicon module
The task involves the development of: the way to store the lexicon information
during the parsing process; the file format for a lexicon, to be used to store
sample lexicons and user-defined lexicons between parsing sessions; the tools
to create a new lexicon or to update an existing one by the addition or
deletion of entries; the tools to access the lexicon information, e.g., to
check whether a particular word is in the lexicon or to find the part of speech
attribute of this word.
Deliverable: a piece of software which allows the user to experiment with the
lexicons.
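The lexicon operations named in this task can be sketched as follows. This is only an illustration under stated assumptions: the class name Lexicon, the one-tag-per-word restriction and the tab-separated file format are hypothetical, since the actual storage format is still to be developed.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory lexicon: maps a word form to one part-of-speech tag. */
class Lexicon {
    private final Map<String, String> entries = new LinkedHashMap<>();

    private static String key(String word) {
        return word.toLowerCase(Locale.ROOT); // lookups ignore case
    }

    /** Adds an entry, replacing any existing tag for the same word. */
    void add(String word, String partOfSpeech) {
        entries.put(key(word), partOfSpeech);
    }

    /** Deletes an entry; returns false if the word was not in the lexicon. */
    boolean remove(String word) {
        return entries.remove(key(word)) != null;
    }

    boolean contains(String word) {
        return entries.containsKey(key(word));
    }

    Optional<String> partOfSpeech(String word) {
        return Optional.ofNullable(entries.get(key(word)));
    }

    /** Writes one entry per line as "word<TAB>tag" (assumed format). */
    String toFileFormat() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Reads the format written by toFileFormat; malformed lines are skipped. */
    static Lexicon fromFileFormat(String text) {
        Lexicon lexicon = new Lexicon();
        for (String line : text.split("\n")) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2 && !parts[0].trim().isEmpty()) {
                lexicon.add(parts[0].trim(), parts[1].trim());
            }
        }
        return lexicon;
    }
}
```

A word with several possible parts of speech would need a list of tags per entry; the sketch leaves this out for brevity.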
Task 1.2 Input
Objective: development of the input table module
The module will provide a dialogue-type interface for the user to enter a new
sentence to be processed by the parser or to choose one from a set of sample
sentences. After the sentence is entered or chosen, the module, in cooperation
with the lexicon module, determines the part of speech attribute for each word
in the sentence and enters this information into the input table.
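As a rough illustration of this task, the following sketch splits a sentence into words and looks up each word in a lexicon. The names InputTable and Row are hypothetical, the lexicon is represented simply as a map from lower-case word forms to tags, and the handling of punctuation is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical input table builder. */
class InputTable {
    /** One row of the table: position, word form, and tag (null if unknown). */
    static final class Row {
        final int position;
        final String word;
        final String tag;

        Row(int position, String word, String tag) {
            this.position = position;
            this.word = word;
            this.tag = tag;
        }
    }

    /** Splits the sentence on whitespace and tags each word using the lexicon. */
    static List<Row> build(String sentence, Map<String, String> lexicon) {
        List<Row> rows = new ArrayList<>();
        int position = 1;
        for (String token : sentence.trim().split("\\s+")) {
            // Strip trailing punctuation such as the final full stop (assumption).
            String word = token.replaceAll("[.,!?;:]+$", "");
            if (word.isEmpty()) {
                continue;
            }
            String tag = lexicon.get(word.toLowerCase(Locale.ROOT));
            rows.add(new Row(position++, word, tag));
        }
        return rows;
    }
}
```

A row with a null tag marks a word missing from the lexicon, which the interface could use to prompt the user for a new lexicon entry.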
Task 1.3 Input-Lexicon interface
Objective: development of the interface which allows the user to control the
lexicon and input table modules in an interactive fashion.
This should provide an interface which allows the user, by clicking on buttons
on the screen or by choosing entries from a list on the screen, to select
lexicons or sentences, to enter new ones, to edit the existing ones, and also
to determine the part of speech of the words in the chosen sentence.
Deliverable for this task and the whole workpackage: the software with the
interface
Workpackage 2: Rules module
Objective: the development of the Rules module
Within this workpackage a module will be developed which will serve to create
sets of grammar rules and provide tools for their use in other modules. In
addition, the interface for the dialogue with the user will be developed.
Task 2.1 Grammar
Objective: development of the module
As with the modules above, we plan to create several sample sets of grammar
rules and to allow the user either to select one of them or to create a new
one. Apart from that, the user will be able to edit the existing sets by adding
new rules or deleting existing ones.
To facilitate access to the rules database, we plan to create a tool which will
look up the requested rules (e.g., those with a given left hand side) and
return the corresponding expansions. Different rules with the same left hand
side will be provided one by one, e.g., in order of their appearance in the
rules set.
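The access tool described above might look like the following sketch. The class name RuleSet and its method names are hypothetical. The sketch assumes that a rule is a left hand side plus an ordered list of right hand side categories, and that alternatives are addressed by their index in order of appearance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical store for context-free production rules. */
class RuleSet {
    static final class Rule {
        final String lhs;
        final List<String> rhs;

        Rule(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(rhs)));
        }

        @Override
        public String toString() {
            return lhs + " -> " + String.join(" ", rhs);
        }
    }

    private final List<Rule> rules = new ArrayList<>(); // kept in order of appearance

    void add(String lhs, String... rhs) {
        rules.add(new Rule(lhs, rhs));
    }

    /** Deletes every rule with exactly this left and right hand side. */
    boolean remove(String lhs, String... rhs) {
        List<String> target = Arrays.asList(rhs);
        return rules.removeIf(r -> r.lhs.equals(lhs) && r.rhs.equals(target));
    }

    /** All expansions of the given left hand side, in order of appearance. */
    List<List<String>> expansions(String lhs) {
        List<List<String>> result = new ArrayList<>();
        for (Rule r : rules) {
            if (r.lhs.equals(lhs)) {
                result.add(r.rhs);
            }
        }
        return result;
    }

    /** The k-th (0-based) alternative, or null when the alternatives are exhausted. */
    List<String> expansion(String lhs, int k) {
        List<List<String>> all = expansions(lhs);
        return (k >= 0 && k < all.size()) ? all.get(k) : null;
    }
}
```

A parser that needs the alternatives one by one can call expansion(lhs, 0), expansion(lhs, 1) and so on until null is returned.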
Task 2.2 Rules interface
Objective: interface to the rules module
This interface will provide a user-friendly environment for the creation and
revision of the grammar rules used by the parser. In addition, the interface
will make it possible to test the tool for access to the rules set. At a later
stage this interface will be incorporated into the interface for the whole
project.
Deliverable: the interface which allows the user to test the grammar rules
module
Workpackage 3: Parser
This workpackage is dedicated to the development of the major module, the
parsing module. This module should handle the working table and the
backtracking store and, using the aforementioned modules (lexicon, rules, input
table), it should govern the parsing process in a step-by-step manner. Its work
will be directed by the central Top Level module based on the user's commands.
The Parser module will be able to perform a single step of the parsing process
(expansion, recognition, backtracking) using the other modules and the tables
and to update these tables (and other needed variables) accordingly.
Task 3.1 Parser
Objective: implementation of the parser algorithm
In order to implement the algorithm one needs to implement several procedures
used in the parsing process, i.e., the expansion, the recognition and the
backtracking. At the beginning the parser should create the initial working
table and the backtracking store. Then at each stage it should analyse the
content of the working table and the backtracking store, perform the needed
step and update the tables. It should also inform the calling software when a
parse has been found and when all alternatives have been processed and no more
parses can be found.
Deliverable: the parser which can be tested in a non-interactive manner
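The three procedures can be illustrated by the following minimal sketch of a top-down recogniser with backtracking. It is not the PT-1 algorithm as specified in the tutorial: the class TopDownParser, the State and Status types and the representation of the working table as a single current state are assumptions made for the example. The input is given as the list of part of speech tags of the words, and the grammar must not be left-recursive, otherwise the expansion never terminates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical step-by-step top-down recogniser with backtracking. */
class TopDownParser {
    enum Status { RUNNING, PARSE_FOUND, FINISHED }

    /** Categories still to be recognised, and the position in the input. */
    static final class State {
        final List<String> categories;
        final int position;

        State(List<String> categories, int position) {
            this.categories = categories;
            this.position = position;
        }
    }

    private final Map<String, List<List<String>>> rules; // lhs -> ordered expansions
    private final List<String> tags;                      // part of speech of each word
    private final Deque<State> backtrackStore = new ArrayDeque<>();
    private State current;                                // null after a dead end or a parse

    TopDownParser(Map<String, List<List<String>>> rules, List<String> tags, String start) {
        this.rules = rules;
        this.tags = tags;
        this.current = new State(Collections.singletonList(start), 0);
    }

    /** Performs exactly one step: backtracking, expansion or recognition. */
    Status step() {
        if (current == null) {                            // backtracking
            if (backtrackStore.isEmpty()) return Status.FINISHED;
            current = backtrackStore.pop();
            return Status.RUNNING;
        }
        if (current.categories.isEmpty()) {               // nothing left to recognise
            boolean success = current.position == tags.size();
            current = null;
            return success ? Status.PARSE_FOUND : Status.RUNNING;
        }
        String first = current.categories.get(0);
        List<String> rest = current.categories.subList(1, current.categories.size());
        List<List<String>> alternatives = rules.get(first);
        if (alternatives != null && !alternatives.isEmpty()) {        // expansion
            // Store later alternatives so that the earliest one is retried first.
            for (int i = alternatives.size() - 1; i >= 1; i--) {
                backtrackStore.push(new State(concat(alternatives.get(i), rest), current.position));
            }
            current = new State(concat(alternatives.get(0), rest), current.position);
        } else if (current.position < tags.size()
                && first.equals(tags.get(current.position))) {        // recognition
            current = new State(new ArrayList<>(rest), current.position + 1);
        } else {
            current = null;                                            // dead end
        }
        return Status.RUNNING;
    }

    /** Non-interactive run: steps until all alternatives are exhausted. */
    int countParses() {
        int parses = 0;
        for (Status s = step(); s != Status.FINISHED; s = step()) {
            if (s == Status.PARSE_FOUND) parses++;
        }
        return parses;
    }

    private static List<String> concat(List<String> head, List<String> tail) {
        List<String> result = new ArrayList<>(head);
        result.addAll(tail);
        return result;
    }
}
```

The sketch only reports whether and how often the sentence can be parsed. To draw a tree, each state would additionally have to record the rules applied so far.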
Task 3.2 Parser interface
Objective: development and implementation of the user interface
At this stage we plan to design and implement the interface for the dialogue
with the user which allows the user to control the parsing process. By choosing
different options on the screen the user will be able to: force a single step
of the parsing process, followed by an update of the working table and the
backtracking store on the screen; force a stage of several parsing steps; force
the parsing process to run until a parse is found or until the parsing is
finished. The user's options will include, e.g., the possibility to run the
parsing process until the first parse is found or until all parses are found,
the possibility to choose the number of steps to be performed before the screen
(the working table) is updated etc.
Deliverable: the interface which allows the user to test the parser
Task 3.3 Constituency tree
Objective: implementation of the module which draws the tree
For each parse found, the user will be able to visualise it in the form of a
constituency tree. This task is dedicated to the implementation of this module.
Based on the final working table, this module will single out the rules used in
the parsing process and draw the corresponding tree in a suitable part of the
screen.
Deliverable: the module which draws the tree
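How a tree can be rebuilt from the rules used in a parse is sketched below. The sketch makes several assumptions which are not fixed by this proposal: the rules are available in the order of a leftmost derivation, each rule is an array holding the left hand side followed by the right hand side categories, a category counts as non-terminal if it occurs as a left hand side in the derivation, and the tree is rendered as labelled bracketing rather than drawn graphically. The class name ConstituencyTree is hypothetical.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical tree builder producing a labelled bracketing of a parse. */
class ConstituencyTree {
    private final Iterator<String[]> rules;   // rules in leftmost-derivation order
    private final Iterator<String> words;     // words of the sentence, left to right
    private final Set<String> nonTerminals = new HashSet<>();

    private ConstituencyTree(List<String[]> derivation, List<String> words) {
        for (String[] rule : derivation) {
            nonTerminals.add(rule[0]);
        }
        this.rules = derivation.iterator();
        this.words = words.iterator();
    }

    /** Returns e.g. "[S [NP [N dogs]] [VP [V bark]]]" for the given derivation. */
    static String bracket(String start, List<String[]> derivation, List<String> words) {
        return new ConstituencyTree(derivation, words).build(start);
    }

    private String build(String category) {
        if (!nonTerminals.contains(category)) {
            // A lexical category: attach the next word of the sentence.
            return "[" + category + " " + words.next() + "]";
        }
        String[] rule = rules.next();
        if (!rule[0].equals(category)) {
            throw new IllegalArgumentException("derivation does not match category " + category);
        }
        StringBuilder sb = new StringBuilder("[").append(category);
        for (int i = 1; i < rule.length; i++) {
            sb.append(' ').append(build(rule[i]));
        }
        return sb.append(']').toString();
    }
}
```

The same recursive traversal could place nodes on a drawing surface instead of appending text; the bracketed form is used here only because it is easy to check.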
Workpackage 4: Top Level module
Objective: to combine all the modules and to implement the Top Level module
After all major modules are implemented, we will combine them to form the final
software product. The Top Level module will coordinate the functioning of the
other modules. It will start the dialogue with the user and, by calling the
appropriate modules, allow the user to choose a lexicon, a set of grammar rules
and a sentence to be processed. Then it will give the user the possibility to
start and control the parsing process. The corresponding interface will allow
the user to print or save the content of any table shown on the screen (the
lexicon, the working table etc.), to start a new parsing process etc. When the
Top Level module is finished, the product can be tested as a whole.
Deliverable: the parsing demo tool
Workpackage 5: Evaluation, user study and corrections
Objective: to test the product, to allow some prospective users to test the
product and to improve its features and the interface based on their feedback
After the product is finished in its simple form, we plan to increase the
number of useful options available to the user for controlling the process.
When the product is finished, it can be offered for testing to several
linguistics students. The suggestions of the students can be used to improve
the product by adding new useful features, eliminating possible bugs and
improving the interface.
Deliverable: the final parsing demo tool
The implementation of the whole project will take eight months. It will start
on February 1, 2001 and end on October 1, 2001.
The schedule of the deliverables is as follows:
| Workpackage 1 | Lexicon and Input Table | February 1 - April 1 |
| Workpackage 2 | Rules module | April 1 - May 1 |
| Workpackage 3 | Parser | May 1 - August 1 |
| Workpackage 4 | Top Level module | August 1 - September 1 |
| Workpackage 5 | User study and corrections | September 1 - October 1 |