Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


Validity, Reliability, and Confirmation: Elementary Empirical Methods for NLP

Module Description

Course     Module Abbreviation   Credit Points
BA-2010    AS-CL                 8 LP
Master     SS-CL, SS-TAC         8 LP

Lecturer: Stefan Riezler
Module Type: Hauptseminar (advanced seminar)
Language: English
First Session: 28.04.2020
Time and Place: Tuesday, 11:15-12:45, INF 306 / SR 13
Commitment Period: TBD

Prerequisite for Participation

Good knowledge of statistical machine learning (e.g., by successful completion of courses "Statistical Methods for Computational Linguistics" and/or "Neural Networks: Architectures and Applications for NLP") and experience in experimental work (e.g., software project or seminar implementation project)

Assessment


* Regular and active participation
* Oral presentation
* Implementation project or written term paper


Most natural language processing (NLP) tasks can be formalized as machine learning problems in which a function predicting structured outputs is learned and evaluated on pairs of inputs and gold standard outputs. Usually, the representation of the input data and the association of inputs with gold standard outputs are not questioned, which assumes an ideal machine learning scenario. In real-world NLP problems, machine learning is preceded by a step of establishing representations of the input data and annotating inputs with gold standard labels, and it is followed by a comparative evaluation of the model's performance on a held-out set of annotated data. Correct methodology in these phases is essential for the overall success of empirical NLP; however, it is underrepresented in theory and often neglected in practice. In this seminar we will explicitly discuss questions and methods of empirical science that concern the phases preceding and following machine learning, centered on the problems of validity, reliability, and confirmation.

The problem of VALIDITY concerns the phase preceding machine learning, and includes a discussion of the following questions:
* Does a machine learning method predict what it purports to predict, and how can its validity be analyzed?
* What are possible pitfalls that compromise validity, and how can they be avoided?

Another set of questions regarding the phase preceding machine learning concerns RELIABILITY, including a discussion of the following methods:
* Empirical methods to assess intra-annotator and inter-annotator reliability
* Agreement-based methods and their shortcomings
* Variance component models of reliability
* Probabilistic models of annotation
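
To make the agreement-based approach concrete, here is a minimal sketch of Cohen's kappa, one widely used chance-corrected agreement coefficient for two annotators. The function name `cohen_kappa` and the toy annotations are assumptions made for illustration; they are not taken from the seminar materials.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both labels match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy annotations, invented for illustration only.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # observed 0.75, chance 0.50 -> kappa = 0.50
```

In this toy case the annotators agree on 6 of 8 items, yet half of that agreement is expected by chance, so kappa is only 0.5. Because the chance term depends on the label distribution, the same raw agreement can yield very different kappa values, which is one of the known shortcomings of agreement-based methods.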

A last set of problems, in the phase following machine learning, concerns CONFIRMATION, i.e., the question of how theory-critical hypotheses can be tested and confirmed as accurately as possible. Our discussion will be based on linear mixed effects models (LMEMs) and will include the following items:
* LMEMs to systematically study variations in experimental observations
* LMEMs to generalize from static test data to larger populations
* Statistical significance tests based on repeated measurements from larger populations
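
As a rough illustration of the kind of model meant here, one common form of a linear mixed effects model for comparing two systems is sketched below. The specification is an assumption chosen for illustration, not necessarily the one used in the seminar: $y_{ij}$ is the evaluation score of system $j$ on test input $i$, $x_j$ is an indicator (0 for the baseline, 1 for the new system), $\beta_0$ and $\beta_1$ are fixed effects, and $u_i$ is a random intercept for the test input.

```latex
y_{ij} = \beta_0 + \beta_1 x_j + u_i + \varepsilon_{ij}, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this sketch, $\beta_1$ is the average score difference between the systems, while $\sigma_u^2$ captures how much scores vary across test inputs regardless of system. A significance test of the system difference then amounts to testing $H_0\colon \beta_1 = 0$, for example with a likelihood ratio test against the model without $\beta_1$.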

