Adaptor Grammars: A framework for Bayesian non-parametric grammatical inference
Mark Johnson
Macquarie University
Each human language contains an unbounded number of different sentences. How
can something so large and complex possibly be learnt? Over the past decade
and a half we've figured out how to define probability distributions over
grammars and the linguistic structures they generate, opening up the
possibility of Bayesian models of language acquisition. Bayesian approaches
are particularly attractive because they can exploit "prior" (e.g., innate)
knowledge as well as statistical generalisations from the input. Standard
machine-learning methods are parametric, i.e., they optimise the values of a
fixed, pre-specified set of parameters. Recently, non-parametric Bayesian methods
have been developed that aim to identify the relevant parameters as well as
their values. This talk describes Adaptor Grammars (AGs), a non-parametric
extension of probabilistic context-free grammars in which the model learns to
reuse entire subtrees drawn from the potentially infinite set of subtrees a
CFG defines. We explain
how AGs can be applied to morphology induction, unsupervised word segmentation
and topic modelling.
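
The "adaptation" at the heart of AGs can be illustrated with a minimal sketch. In an AG, adapted nonterminals cache whole subtrees via a Pitman-Yor process; the sketch below shows only the simpler Chinese Restaurant Process special case, with strings standing in for subtrees. All names (`base_word`, `Adaptor`) and the toy base distribution are illustrative assumptions, not part of the talk or of Johnson's implementation:

```python
import random

rng = random.Random(42)

def base_word():
    """Toy base distribution: random letter strings with geometric length.
    In a real AG this role is played by the CFG's distribution over subtrees."""
    word = rng.choice("ab")
    while rng.random() < 0.5:
        word += rng.choice("ab")
    return word

class Adaptor:
    """Chinese Restaurant Process cache over a base generator:
    reuse a previously generated item with probability count/(n + alpha),
    or draw a fresh one from the base with probability alpha/(n + alpha)."""
    def __init__(self, base, alpha=1.0):
        self.base, self.alpha = base, alpha
        self.counts, self.n = {}, 0

    def sample(self):
        if rng.random() * (self.n + self.alpha) < self.alpha:
            item = self.base()            # fresh draw from the base distribution
        else:
            r = rng.randrange(self.n)     # reuse, proportional to past counts
            for item, c in self.counts.items():
                r -= c
                if r < 0:
                    break
        self.counts[item] = self.counts.get(item, 0) + 1
        self.n += 1
        return item

adaptor = Adaptor(base_word, alpha=1.0)
samples = [adaptor.sample() for _ in range(1000)]
# Rich-get-richer: a small set of cached items accounts for most of the mass,
# which is how an AG comes to treat frequent subtrees as reusable units.
print(len(adaptor.counts), max(adaptor.counts.values()))
```

The rich-get-richer behaviour of the cache is what lets an AG promote frequently rederived subtrees (e.g., recurring morphs or words) to the status of stored units, while the base CFG remains available to generate novel structures.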