
In his thesis, Frederick builds a comprehensive language model zoo for Ancient Greek and Latin
by curating and creating large-scale pre-training data; among other things,
he rescues a high-quality Ancient Greek corpus from mis-OCRed Internet Archive scans via a novel identification-and-reprocessing pipeline.
He then pre-trains nine new models across three architectures (encoder-only, decoder-only, encoder–decoder) and three language configurations (Greek, Latin, and a multilingual Greek–Latin–English setup),
thereby expanding the computational toolkit for Classical Philology and
introducing the first generative models tailored to the field.
Thorough benchmarking on PoS tagging, lemmatization, and dependency parsing shows that these models set a new state of the art: they surpass prior Ancient Greek BERT variants and even outperform the winning system of EvaLatin 2022 under constrained settings.
Next, he designs two targeted probes that require minimal task-specific training:
a synonym/antonym MLM probe instantiated in Greek, Latin, and English,
and a zero-shot cloze task on mythological family relations
to test cross-lingual knowledge access in T5-style models.
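The core scoring step of such a masked-LM probe is to compare how much probability mass the model assigns to a synonym versus an antonym at the masked position. A minimal sketch of that step (illustrative only, not the thesis code; the toy logits and token ids are made up):

```python
import numpy as np

def synonym_antonym_probe(mask_logits, synonym_id, antonym_id):
    """Given the MLM's logits at a masked position, check whether the
    model assigns higher probability to the synonym than the antonym."""
    probs = np.exp(mask_logits - mask_logits.max())  # stable softmax
    probs /= probs.sum()
    return probs[synonym_id] > probs[antonym_id], probs[synonym_id], probs[antonym_id]

# toy 5-token vocabulary; pretend id 2 is the synonym, id 4 the antonym
logits = np.array([0.1, 0.0, 2.0, -1.0, 0.5])
correct, p_syn, p_ant = synonym_antonym_probe(logits, synonym_id=2, antonym_id=4)
print(correct)  # True: the synonym outranks the antonym
```

Averaging this binary outcome over many synonym/antonym pairs yields a probe accuracy that needs no fine-tuning of the model itself.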
To look inside the models, he analyzes layer-wise hidden states of the multilingual (PhilBERTa) encoder-only model with t-SNE on parallel Bible sentences,
tracking nouns, verbs, and adpositions to see where language-agnostic structure emerges and where language identity reappears, with Greek's distinct script serving as a stress test.
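As a rough illustration of this kind of analysis (not the thesis code), hidden states from two languages can be projected into 2-D with scikit-learn's t-SNE and then inspected for language clustering; the random vectors below merely stand in for real layer activations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# toy stand-ins for one layer's 768-d hidden states of 50 parallel
# Greek and 50 parallel English tokens
greek = rng.normal(loc=0.0, size=(50, 768))
english = rng.normal(loc=0.5, size=(50, 768))
states = np.vstack([greek, english])

# project to 2-D; plotting the two halves in different colors would
# show whether the layer separates or mixes the languages
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(states)
print(emb.shape)  # (100, 2)
```

Repeating this per layer gives the layer-wise picture described above: early layers tend to cluster by language, while some middle layers mix the languages for matching content words.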

Finally, he turns to free generation with the decoder-only models, prompting them with Classical passages and examining continuations for part-of-speech distribution drift, subject–verb and verb–object order, and pseudo-perplexity,
asking whether multilingual training leaves a detectable "accent" in Greek or Latin text.
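Pseudo-perplexity is commonly computed by scoring each token in turn given the rest of the sentence and exponentiating the mean negative log-probability. A minimal sketch of that formula, with a toy scorer standing in for a real model (function names are illustrative):

```python
import numpy as np

def pseudo_perplexity(token_ids, score_logprob):
    """Score each token conditioned on the rest of the sequence and
    exponentiate the mean negative log-probability.
    `score_logprob(token_ids, i)` returns log P(token_i | context)."""
    nll = -np.mean([score_logprob(token_ids, i) for i in range(len(token_ids))])
    return float(np.exp(nll))

# toy scorer that assigns every token probability 0.5
toy = lambda ids, i: np.log(0.5)
print(pseudo_perplexity([1, 2, 3, 4], toy))  # 2.0
```

A continuation whose pseudo-perplexity under a monolingual reference model is unusually high is one signal of the "accent" the thesis looks for.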
Parts of the thesis have been presented in the paper "Exploring Large Language Models for Classical Philology" at ACL 2023.
The accompanying models are now widely used in the community, supporting shared-task systems and enabling new research.
Warm congratulations to Frederick for his thesis
and for winning the 2025 GSCL Best Thesis Award!
Where to read more:
- MA Thesis (access to come) & ACL 2023 paper
- Homepage: Frederick Riemenschneider
- GSCL Bi-Annual Best Thesis Award
- Toy Applications for Morphological Analysis and Machine Translation