NATURAL LANGUAGE PARSERS

A "Course in Cooking"

Peter Hellwig
Department of Computational Linguistics
University of Heidelberg
Karlstrasse 2
D-69117 Heidelberg, Germany
hellwig@cl.uni-heidelberg.de

Purpose

This tutorial offers orientation in the field of NL parsing from a practical point of view. Its focus lies on the implementation of parsers rather than on their mathematical foundation. Nevertheless the tutorial aims at granting general insights. There are well-known parsing algorithms which are used in NLP (e.g. Early, Cocke, Tomita). There exist numerous technical reports about particular parsers in applications. The purpose of the tutorial is to go beyond these particular parsers, to take apart the known algorithms, to give a systematic overview of the issues that arise in any parser development, to survey the alternative solutions that exist for each identified task.

Skills to be acquired in the tutorial include: The competence of choosing methods and components deliberately when putting together a new parser; the ability to evaluate existing parsers according to a checklist of features. Similar to a course in cooking, the student will come to know the ingredients of parsers and learn how they are combined in prototypical implementations. At the end, he or she should be able to recombine the ingredients in order to invent a new parser.

A comprehensive list of parsing issues serves as the essential orientation for the tutorial. Each issue is associated with a range of solutions, for example:

What kind of structure is the parser to assign to the input ?
constituent trees
dependency trees
What form has the grammar specification?
production rules
networks
frames
What constellation is established between grammar and parser?
interpreting parser
procedural parser
compiled parser
Which recognition technique is applied?
category expansion (top-down)
category reduction (bottom-up)
state transition
slot filling
What is the control mechanism in case of grammatical alternatives?
backtracking
parallel processing
looking ahead
well-formed substring table (chart)

Another topic of the tutorial are feature structures and the unification mechanism. Finally, criteria for the evaluation of parsers will be identified and discussed, such as suitability to solve specific NLP problems like discontinuous constituents, ellipsis and coordination; efficiency, avoidance of overgeneration; ability to handle ill-formed input; linguistic perspicuity, ease of debugging and updating the grammatical data without side effects.

Presentation

The tutorial takes six hours and will be divided into two sessions. The presentation adheres to the principle of learning by example. Rather than starting with a theoretical survey, the general principles will be derived from the demonstration of a series of prototypical parsers. The prototypes represent some of the most important parsers used in NLP. In addition, each prototype illustrates a particular combination of tasks and solutions. Each prototype will be demonstrated step by step so that one can see exactly how it functions. Subsequently, the lesson that can be learned from it will be stated. The following prototypes will be shown:

PT-1. Top-down parser with backtracking

Illustrates: interpreting parser; grammar separate from procedure; derivation-oriented analysis; top-down strategy; expectation-driven; depth-first strategy; schematic backtracking in case of alternatives; left-to-right one-pass processing of the input; left-associative; only actual result accessible; same work done several times; suitable for context free grammars only.

PT-2. Top-down parser with parallel-processing

Illustrates: parallel processing of alternatives; depth-first strategy; left-associative; context free grammars only.

PT-3. Table-controlled shift-reduce parser

Illustrates: derivation-oriented analysis transformed into transition oriented processing; bottom-up strategy; data driven; reduction of alternatives by looking ahead (implicit in the action table); left-to-right one-pass processing; left-associative; organization of the work area: no work done twice; context free grammars only.

PT-4. Bottom-up parser with a well-formed substring table (Algorithm according to Cocke, Kasami and Younger)

Illustrates: bottom-up strategy; data-driven, no expectations; left-to-right one-pass processing, but not left-associative (causing obsolete intermediate results); organization of the work area as a chart: no work done twice; context free grammars only; correspondence of binary phrase structure and dependency structure.

PT-5. Top-down parser with divided productions (Earley's algorithm, active chart parser)

Illustrates: top-down strategy with bottom-up phases; expectation-driven with data-driven phases; one-pass left-to-right processing; left associative; chart: no work done twice but a lot of superfluous rule applications; context free grammars only.

PT-6. Parsing with recursive transition networks (RTN)

Illustrates: transition-oriented analysis; interpreting parser, networks derived from phrase structure rules; top-down depth-first strategy with schematic backtracking; on-pass left-to-right processing; context free grammars only.

PT-7. Augmented transition networks (ATN)

Illustrates: procedural parser, no separation of grammar and parser; Turing machine power, i.e. ability to recognize virtually any syntactic phenomenon of natural languages (e.g. discontinuous constituents by left dislocation) and to produce any output (e.g. dependency structures), but the generated language is indecidable; backtracking can be avoided to some extent by inspecting registers.

PT-8. Chart Parsing according to the slot-and-filler principle

Illustrates: interpreting parser; completion-oriented analysis; dependency structures; lexicalized grammar, basic elements carry all the syntactic information; bottom-up strategy; extremely data-driven, also expectation-driven; chart: no work done twice; one-pass left-to right processing; no left associativity, therefore obsolete intermediate result; different types of attributes and discontinuous slots extend the scope of the parser beyond context free grammar in a controlled way.

Requirements

The student of this tutorial should have some experience in programming and be acquainted with the basics of syntactic descriptions. Knowledge of parsers is not absolutely required. On the other hand, students with parsing experience might attend the tutorial as well in order to study similarities and differences of parsers and estimate their advantages and disadvantages.

Course materials

Revised version of the course script (2002)

See also the parsing courses at the Department of Computational Lingusitics of the University of Heidelberg