# Empirical Methods for NLP and Data Science

## Module Description

| Course | Module Abbreviation | Credit Points |
|---|---|---|
| BA-2010 | AS-CL | 8 LP |
| Master | SS-CL, SS-TAC | 8 LP |
| Informatik Seminar | IS | 4 LP |

| | |
|---|---|
| Lecturer | Stefan Riezler |
| Module Type | |
| Language | English |
| First Session | 22.04.2021 |
| Time and Place | Thursday, 11:15-12:45, tba |
| Commitment Period | tbd. |

### Prerequisite for Participation

Good knowledge of statistical machine learning (e.g., by successful completion of the courses "Statistical Methods for Computational Linguistics" and/or "Neural Networks: Architectures and Applications for NLP") and experience in experimental work (e.g., a software project or a seminar implementation project).

### Assessment

- Regular and active participation
- Oral presentation
- Implementation project (CL) or written term paper (Informatics)

### Content

Most natural language processing (NLP) or data science tasks can be formalized as machine learning problems in which a prediction function is learned and evaluated on data pairs of inputs and gold standard outputs. Usually, the representation of the input data and the association of inputs with gold standard outputs are not questioned, assuming an ideal machine learning scenario. In real-world NLP problems, however, machine learning is preceded by a step of establishing representations of input data and of annotating inputs with gold standard labels, and succeeded by a comparative evaluation of the performance of the machine learning model on a held-out set of annotated data.

Correct methodology in these phases is essential for the overall success of empirical NLP; nonetheless, it is underrepresented in theory and often neglected in practice. In this seminar we will explicitly discuss questions and methods of empirical science that concern the phases preceding and succeeding machine learning, centered around the problems of validity, reliability, and significance.
The problem of VALIDITY includes the following questions:

- How can the concept of validity (does a machine learning method predict what it purports to predict?) be formalized?
- Possible pitfalls that compromise validity, and how to avoid them
- Statistical tests of validity of predictions, based on generalized additive models
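As a toy illustration of the last point, the following sketch compares two nested additive models to check whether a circular feature explains a model's predictions. It uses a simple polynomial basis expansion in plain NumPy as a crude stand-in for the generalized additive models discussed in the seminar; all data and feature names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical setup: a model's scores should depend on a meaningful
# feature x1, not on a circular feature x2 (e.g., a label artifact
# leaking into the predictions).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
pred = 2.0 * x1 + 0.8 * x2 + rng.normal(scale=0.5, size=n)

def additive_basis(x, degree=3):
    """Polynomial basis expansion, a crude stand-in for GAM smoothers."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def rss(features, y):
    """Residual sum of squares of a least-squares fit of y on features."""
    design = np.column_stack([np.ones(len(y)), features])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2))

rss_reduced = rss(additive_basis(x1), pred)  # intended feature only
rss_full = rss(np.hstack([additive_basis(x1), additive_basis(x2)]), pred)

# If adding the circular feature x2 substantially reduces the residual
# error, the validity of the predictions is in doubt.
print(rss_reduced, rss_full)
```

In a real analysis, the nested-model comparison would be carried out with a proper statistical test on fitted generalized additive models rather than by eyeballing residual sums of squares.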

Another set of questions regards the RELIABILITY of human data annotation and of machine learning predictions, including a discussion of the following methods:

- How to disentangle the various problems of agreement, reproducibility, replicability, etc.
- Problems of descriptive statistics for reliability assessment
- Reliability assessment by variance component analysis, based on linear mixed-effects models
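A minimal sketch of the variance-component idea, assuming simulated data: ratings of several annotators on the same items are decomposed into a between-item and a within-item (rater noise) component via a one-way random-effects ANOVA, from which an intraclass correlation coefficient is computed. A full analysis would use linear mixed-effects models as listed above; this is only the simplest special case.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_raters = 40, 3

# Simulated ratings (hypothetical data): each item has a true score,
# and each rater adds independent noise.
true_scores = rng.normal(size=(n_items, 1))
ratings = true_scores + 0.5 * rng.normal(size=(n_items, n_raters))

# One-way random-effects ANOVA: between-item and within-item mean squares.
grand_mean = ratings.mean()
item_means = ratings.mean(axis=1)
ms_between = n_raters * np.sum((item_means - grand_mean) ** 2) / (n_items - 1)
ms_within = np.sum((ratings - item_means[:, None]) ** 2) / (n_items * (n_raters - 1))

# Intraclass correlation ICC(1): the share of variance attributable to
# true item differences; values near 1 indicate reliable annotation.
icc = (ms_between - ms_within) / (ms_between + (n_raters - 1) * ms_within)
print(round(icc, 3))
```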

A final set of problems concerns SIGNIFICANCE, i.e., the question of how theory-critical hypotheses can be tested and confirmed as accurately as possible. Our discussion will include the following items:

- Central concepts of statistical hypothesis testing: error types, power of tests, multiplicity problem
- Problematic assumptions about the distribution of test statistics in standard significance tests
- Powerful and general statistical significance tests: approximate randomization and generalized likelihood ratio test

### Literature

The seminar will be based on a pre-print of the textbook "Validity, Reliability, and Significance: Model-Based Empirical Methods for NLP" by Stefan Riezler and Michael Hagmann (in progress).

A list of further literature will be given in the first session of the seminar.

### Enrollment

Please enroll on the CL enrollment page by April 11, 2021.