The Underlying Logic of Language Models

Abstract

To combat unreliable LLM generations in knowledge-intensive tasks, models are increasingly grounded in evidence from external sources. As a result, the model context can contain conflicting pieces of information. Thus far, the role that the source of retrieved information plays in resolving such conflicts has gone largely unexamined. In this talk, motivated by interdisciplinary research on credibility, we present a controlled setup that isolates LLM source preferences and their effect on conflict resolution. We find a consistent source credibility hierarchy, identify relevant source features, and examine whether these preferences can be elicited via direct prompting. By disentangling the effect of pure repetition of information from that of corroboration by multiple sources, we show that models are vulnerable to simple adversarial attacks based on repetition. Finally, we propose a fine-tuning paradigm that mitigates this vulnerability without diminishing models’ source preferences.