NATURAL LANGUAGE PARSERS

A "Course in Cooking"

Peter Hellwig
Department of Computational Linguistics
University of Heidelberg
Karlstrasse 2
D-69117 Heidelberg, Germany
hellwig@cl.uni-heidelberg.de




Purpose

This tutorial offers orientation in the field of NL parsing from a practical point of view. Its focus lies on the implementation of parsers rather than on their mathematical foundation. Nevertheless the tutorial aims at granting general insights. There are well-known parsing algorithms which are used in NLP (e.g. Early, Cocke, Tomita). There exist numerous technical reports about particular parsers in applications. The purpose of the tutorial is to go beyond these particular parsers, to take apart the known algorithms, to give a systematic overview of the issues that arise in any parser development, to survey the alternative solutions that exist for each identified task.

Skills to be acquired in the tutorial include: The competence of choosing methods and components deliberately when putting together a new parser; the ability to evaluate existing parsers according to a checklist of features. Similar to a course in cooking, the student will come to know the ingredients of parsers and learn how they are combined in prototypical implementations. At the end, he or she should be able to recombine the ingredients in order to invent a new parser.

Contents

A comprehensive list of parsing issues serves as the essential orientation for the tutorial. Each issue is associated with a range of solutions, for example:

Another topic of the tutorial are feature structures and the unification mechanism. Finally, criteria for the evaluation of parsers will be identified and discussed, such as suitability to solve specific NLP problems like discontinuous constituents, ellipsis and coordination; efficiency, avoidance of overgeneration; ability to handle ill-formed input; linguistic perspicuity, ease of debugging and updating the grammatical data without side effects.

Presentation

The tutorial takes six hours and will be divided into two sessions. The presentation adheres to the principle of learning by example. Rather than starting with a theoretical survey, the general principles will be derived from the demonstration of a series of prototypical parsers. The prototypes represent some of the most important parsers used in NLP. In addition, each prototype illustrates a particular combination of tasks and solutions. Each prototype will be demonstrated step by step so that one can see exactly how it functions. Subsequently, the lesson that can be learned from it will be stated. The following prototypes will be shown:

PT-1. Top-down parser with backtracking

PT-2. Top-down parser with parallel-processing

PT-3. Table-controlled shift-reduce parser

PT-4. Bottom-up parser with a well-formed substring table (Algorithm according to Cocke, Kasami and Younger)

PT-5. Top-down parser with divided productions (Earley's algorithm, active chart parser)

PT-6. Parsing with recursive transition networks (RTN)

PT-7. Augmented transition networks (ATN)

PT-8. Chart Parsing according to the slot-and-filler principle

Requirements

The student of this tutorial should have some experience in programming and be acquainted with the basics of syntactic descriptions. Knowledge of parsers is not absolutely required. On the other hand, students with parsing experience might attend the tutorial as well in order to study similarities and differences of parsers and estimate their advantages and disadvantages.

Course materials

Revised version of the course script (2002)

See also the parsing courses at the Department of Computational Lingusitics of the University of Heidelberg