Section Classification in German Discharge Letters using Few-Shot Learning and Prompting
To ease the burden of clinical routine, medical information extraction (IE) research aims to make use of unstructured information in clinical documents such as discharge letters. Recent developments in NLP have shown promising results in text classification tasks using pre-trained language models (PLMs) and prompts. In this work, we evaluate pattern-exploiting training (PET; Schick et al., 2021), a semi-supervised few-shot learning method that uses prompts. Prompting exploits the ability of PLMs to infer knowledge from context and has achieved state-of-the-art results on various text classification tasks. We therefore explore this method in a German clinical IE text classification task, in which sentences from German discharge letters are classified into eleven section categories using the section annotations of a publicly available German clinical corpus. We evaluate PET with eight differently pre-trained language models and compare the results against traditional sequence classification models as baselines. Across various few-shot training sizes, our best-performing PET model is based on a domain- and task-adapted large BERT model. It achieves strong improvements over the baseline and over publicly available PLMs for small shot sizes; these improvements decrease steadily as shot size grows. By studying interpretability methods, we were able to optimize the few-shot training sets and to tune a base BERT model so that, for selected section classes, it achieves classification results comparable to those of a large BERT model. Overall, our results imply that PET can significantly improve classification results in a clinical setup on low-resource languages like German, where annotated data and computational resources are limited.
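The prompting idea behind PET can be illustrated with a minimal sketch: a pattern turns a sentence into a cloze question with a masked slot, a verbalizer maps each label to one token, and the label distribution is a softmax over the masked-token scores of the verbalizer tokens only. Everything specific here is an assumption made for illustration and is not taken from this work: the pattern wording, the three section labels and their verbalizer tokens, and the `toy_mask_logits` function, which is a hand-written stand-in for the scores a real masked language model would produce.

```python
import math

# Hypothetical verbalizer: maps each section label to a single vocabulary token.
# Labels and tokens are illustrative, not the eleven categories used in the paper.
VERBALIZER = {
    "Anamnese": "Anamnese",
    "Diagnosen": "Diagnose",
    "Medikation": "Medikation",
}


def pattern(sentence: str, mask_token: str = "[MASK]") -> str:
    """Hypothetical pattern: wrap the sentence into a cloze question."""
    return f"{sentence} Abschnitt: {mask_token}."


def toy_mask_logits(cloze: str) -> dict:
    """Stand-in for a masked LM: returns token -> logit for the masked slot.

    A real setup would query a PLM here; this toy rule only looks for a dosage unit.
    """
    logits = {"Anamnese": 0.0, "Diagnose": 0.0, "Medikation": 0.0, "Patient": 1.5}
    if "mg" in cloze.lower():
        logits["Medikation"] = 2.0
    return logits


def label_probabilities(mask_logits: dict) -> dict:
    """Softmax restricted to the verbalizer tokens; other tokens are ignored."""
    scores = {label: mask_logits[token] for label, token in VERBALIZER.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


sentence = "Ramipril 5 mg 1-0-0"
probs = label_probabilities(toy_mask_logits(pattern(sentence)))
predicted = max(probs, key=probs.get)
print(pattern(sentence))
print(predicted)
```

In full PET, several such pattern-verbalizer pairs are fine-tuned on the few labeled examples, and their soft predictions on unlabeled data are used to train a final classifier; the sketch covers only the scoring step for one pair.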