
Beyond Scores: Interpretable Essay Scoring with Internalized Linguistic Structures
Abstract
Automated essay scoring (AES) often functions as a black box, limiting its educational value. This paper explores enhancing LLM-based AES by enabling models to internalize and apply deep linguistic structures. We propose a multi-stage fine-tuning approach: a Llama model is sequentially trained on Argumentation Structure (AS) and Rhetorical Structure Theory (RST) corpora, and then on an essay scoring dataset. Critically, no explicit AS/RST structural information is provided during essay scoring, allowing us to assess whether this linguistic knowledge has been internalized. Our key contribution is demonstrating that the fine-tuned LLM can then generate linguistically informed explanations for its scores, directly referencing AS and RST concepts, despite never being trained on feedback data. We compare these explanations with those from models without AS/RST exposure, showing that structural pre-training yields explanations with greater analytical depth and interpretability. This work advances AES beyond mere score prediction, paving the way for more transparent and pedagogically valuable automated writing feedback.
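
To make the staged training pipeline concrete, the sketch below illustrates one plausible realization of sequential fine-tuning with Hugging Face Transformers. The base checkpoint name, the toy example texts, the serialization of AS/RST parses, and all hyperparameters are illustrative assumptions for exposition, not the configuration used in this work.

# Minimal sketch (Python, Hugging Face Transformers) of the three-stage
# fine-tuning pipeline described in the abstract. Checkpoint, data, and
# hyperparameters are placeholders, not the authors' actual setup.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # placeholder Llama checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def to_lm_dataset(texts):
    """Tokenize raw text examples into a causal-LM training dataset."""
    ds = Dataset.from_dict({"text": texts})
    return ds.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                       max_length=1024),
                  remove_columns=["text"])

def fine_tune(model, dataset, output_dir):
    """Run one supervised fine-tuning stage and return the updated model."""
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    return trainer.model

# Toy stand-ins for the three corpora (input and target serialized as text).
as_data    = to_lm_dataset(["Essay: ...\nAS parse: [Claim] ... [Premise] ..."])
rst_data   = to_lm_dataset(["Essay: ...\nRST parse: Elaboration(N, S) ..."])
score_data = to_lm_dataset(["Essay: ...\nScore: 4"])  # no AS/RST given here

# Stage 1: argumentation structure; Stage 2: RST; Stage 3: scoring only,
# so any structural knowledge used at scoring time must be internalized.
model = fine_tune(model, as_data,    "ckpt_as")
model = fine_tune(model, rst_data,   "ckpt_as_rst")
model = fine_tune(model, score_data, "ckpt_as_rst_score")

In this reading, each stage resumes from the weights produced by the previous stage, and the final scoring stage sees only essays and scores, which is what allows the internalization of AS/RST knowledge to be tested.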