Evaluating the interplay of information status and information content in a multilingual parallel corpus
Abstract
The uniform information density (UID) hypothesis postulates that linguistic units are distributed in a text such that the variance around an average information density is minimized. While UID has been studied, predominantly for English (e.g., Meister et al. 2021; Cuskley et al. 2021), the relationship between information density and information status remains underexplored. In this ongoing work, we project information status annotations from the English section of the CIEP+ corpus (Verkerk & Talamo 2024) onto the parallel sections in other languages. We then use the projected annotations to evaluate the relationship between information status and information content in a typologically diverse sample of languages. Our preliminary findings indicate that information status has an effect on information density, and that the direction of this effect depends on language and part of speech.
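As a rough illustration of the kind of pipeline described above, the sketch below assumes that token-level word alignments (as (source index, target index) pairs) and per-token surprisal values from some language model are already available. It is not the authors' implementation: the function names, the "given"/"new" label set, and the numbers in the usage example are hypothetical toy values, not data or results from this study. Projection simply copies a source label across alignment links, and UID is operationalized as the variance of surprisal around its mean, which is one operationalization discussed in the UID literature.

```python
from collections import defaultdict
from statistics import mean, pvariance


def project_labels(src_labels, alignment):
    """Copy information-status labels from source tokens to aligned target tokens.

    src_labels: dict mapping source token index -> label (hypothetical labels, e.g. 'given', 'new')
    alignment:  iterable of (source index, target index) pairs, assumed precomputed
    Returns a dict mapping target token index -> label. Target tokens aligned to no
    labelled source token are omitted; if a target token is aligned to several
    labelled source tokens, the first link encountered wins.
    """
    tgt_labels = {}
    for s, t in alignment:
        if s in src_labels:
            tgt_labels.setdefault(t, src_labels[s])
    return tgt_labels


def uid_variance(surprisals):
    """Population variance of per-token surprisal; lower means a more uniform profile."""
    return pvariance(surprisals)


def mean_surprisal_by_status(surprisals, labels):
    """Average surprisal of the target tokens carrying each projected label."""
    groups = defaultdict(list)
    for i, label in labels.items():
        groups[label].append(surprisals[i])
    return {label: mean(values) for label, values in groups.items()}


# Toy usage with invented values (surprisal in bits for a 3-token target sentence).
labels = project_labels({0: "new", 2: "given"}, [(0, 1), (2, 0), (1, 2)])
surprisals = [2.0, 6.0, 4.0]
print(labels)                                        # {1: 'new', 0: 'given'}
print(mean_surprisal_by_status(surprisals, labels))  # {'given': 2.0, 'new': 6.0}
print(uid_variance(surprisals))
```

Copying labels one-to-one across alignment links is the simplest projection strategy; real alignments are noisy and many-to-many, so a fuller setup would likely need rules for conflicting or unaligned tokens and for multi-token referring expressions.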

