GSCL BEST THESIS AWARD

September 15th, 2025
At the recent KONVENS 2025 Conference in Hildesheim, Frederick Riemenschneider presented his Master’s thesis, "Investigating Language Models for Classical Philology. Aspects of Morphology, Syntax, and Knowledge from a Multilingual Perspective", which was selected as the 2025 winner of the GSCL Bi-annual Best Thesis Award.

Image: Greek tetradrachm

In his work, Frederick builds a comprehensive language model zoo for Ancient Greek and Latin.
To do so, he curates and creates large-scale pre-training data, which includes
recovering a high-quality Ancient Greek corpus from mis-OCRed Internet Archive scans via a novel identification-and-reprocessing pipeline.
He then pre-trains nine new models across three architectures (encoder-only, decoder-only, encoder–decoder) and three language configurations (Greek, Latin, and a multilingual Greek–Latin–English setup),
thereby expanding the computational toolkit for Classical Philology and
introducing the first generative models tailored to the field.

Thorough benchmarking on PoS tagging, lemmatization, and dependency parsing shows these models set a new state of the art, surpassing prior Ancient Greek BERT variants and even outperforming the winning system of EvaLatin 2022 under constrained settings.

Next, he designs two targeted probes that minimize task-specific training:
a synonym/antonym MLM probe instantiated in Greek, Latin, and English,
and a zero-shot cloze task on mythological family relations
to test cross-lingual knowledge access in T5-style models.
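
As a rough illustration of how a zero-shot cloze probe can be scored, here is a minimal sketch in Python. It is not the evaluation code from the thesis: the prompt wording, the `CLOZE_ITEMS` list, and the `toy_predict` stand-in are all hypothetical, and a real run would replace `toy_predict` with a call to a T5-style model that fills the blank.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical cloze items: a prompt with a blank plus the accepted answers.
CLOZE_ITEMS = [
    ("The father of Achilles is ___.", {"Peleus"}),
    ("The mother of Achilles is ___.", {"Thetis"}),
    ("The father of Telemachus is ___.", {"Odysseus"}),
]


def cloze_accuracy(
    predict: Callable[[str], str],
    items: Iterable[Tuple[str, Set[str]]],
) -> float:
    """Return the share of prompts whose prediction matches an accepted answer.

    Matching is case-insensitive exact match after stripping whitespace.
    """
    items = list(items)
    if not items:
        raise ValueError("need at least one cloze item")
    hits = 0
    for prompt, gold in items:
        answer = predict(prompt).strip().lower()
        if answer in {g.lower() for g in gold}:
            hits += 1
    return hits / len(items)


def toy_predict(prompt: str) -> str:
    """Stand-in for a model: knows two of the three relations (assumption)."""
    lookup = {
        "father of Achilles": "Peleus",
        "mother of Achilles": "Thetis",
    }
    for key, value in lookup.items():
        if key in prompt:
            return value
    return ""


if __name__ == "__main__":
    print(cloze_accuracy(toy_predict, CLOZE_ITEMS))
```

Because no task-specific training is involved, the only moving parts are the prompt set and the matching rule, so those choices would need to be reported alongside any accuracy figure.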

To look inside the models, he analyzes layer-wise hidden states of the multilingual encoder-only model (PhilBERTa) with t-SNE on parallel Bible sentences,
tracking nouns, verbs, and adpositions to see where language-agnostic structure emerges and where language identity reappears, using Greek's different script as a stress test.

Figure: t-SNE plot of Greek/Latin/English token representations

Finally, he turns to free generation with the decoder-only models, prompting them with Classical passages and examining continuations for part-of-speech distribution drift, subject–verb and verb–object order, and pseudo-perplexity,
asking whether multilingual training leaves a detectable "accent" in Greek or Latin text.
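
One way to quantify part-of-speech distribution drift is to compare the tag distribution of generated continuations with that of reference text. The sketch below uses Jensen-Shannon divergence for this. That metric is an assumption chosen for illustration; the announcement does not say which measure the thesis uses, and the function names and tag sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def pos_distribution(tags: Iterable[str]) -> Dict[str, float]:
    """Turn a sequence of PoS tags into a relative-frequency distribution."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one tag")
    return {tag: count / total for tag, count in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of tags.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a: Dict[str, float]) -> float:
        # KL(A || M); terms with zero probability contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


if __name__ == "__main__":
    # Hypothetical tag sequences for a reference passage and a continuation.
    reference = ["NOUN", "VERB", "ADP", "NOUN", "VERB"]
    generated = ["NOUN", "NOUN", "ADP", "NOUN", "VERB"]
    drift = js_divergence(pos_distribution(reference), pos_distribution(generated))
    print(f"PoS drift (JS divergence, bits): {drift:.4f}")
```

Comparing this value for continuations from a monolingual and a multilingual model on the same prompts would give one simple signal of whether multilingual training shifts the tag distribution.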

Parts of the thesis were presented in the paper "Exploring Large Language Models for Classical Philology" at ACL 2023.
The accompanying models are now widely used in the community, supporting shared-task systems and enabling new research.

Warm congratulations to Frederick for his thesis
and for winning the 2025 GSCL Best Thesis Award!

Where to read more: