Using LMEMs to Analyse Model Performance Beyond Mean Measures

author:

date: 30 April 2021

download: https://github.com/StatNLP/empirical_methods


Marking or Post-Edit? Which Feedback Method Works Better?

read data
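The notebook's data file is not reproduced here. This sketch therefore assumes a hypothetical long-format layout with one row per sentence, system and seed, using the columns `sentence`, `system`, `seed` and `score`. The system labels `marking` and `postedit` are placeholders, and the scores are toy numbers, not results from the study.

```python
import csv
import io

# Toy data in a hypothetical long format: one row per (sentence, system, seed).
RAW = """sentence,system,seed,score
s1,marking,1,0.35
s1,marking,2,0.37
s1,postedit,1,0.40
s1,postedit,2,0.42
s2,marking,1,0.52
s2,marking,2,0.54
s2,postedit,1,0.55
s2,postedit,2,0.59
"""


def read_scores(text):
    """Parse CSV text into a list of dicts with typed fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "sentence": rec["sentence"],
            "system": rec["system"],
            "seed": int(rec["seed"]),
            "score": float(rec["score"]),
        })
    return rows


data = read_scores(RAW)
print(len(data), data[0])
```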

A traditional way of analyzing system performance

average over replicates (random seeds)
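Averaging over replicates collapses the scores from runs with different training seeds into one score per sentence and system. Here is a minimal sketch, assuming a hypothetical tuple layout `(sentence, system, seed, score)` and toy values:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (sentence, system, seed, score); toy values only.
rows = [
    ("s1", "marking", 1, 0.35), ("s1", "marking", 2, 0.37),
    ("s1", "postedit", 1, 0.40), ("s1", "postedit", 2, 0.42),
    ("s2", "marking", 1, 0.52), ("s2", "marking", 2, 0.54),
    ("s2", "postedit", 1, 0.55), ("s2", "postedit", 2, 0.59),
]


def average_over_seeds(rows):
    """Collapse replicates: one mean score per (sentence, system)."""
    buckets = defaultdict(list)
    for sentence, system, _seed, score in rows:
        buckets[(sentence, system)].append(score)
    return {key: mean(vals) for key, vals in buckets.items()}


avg = average_over_seeds(rows)
print(avg)
```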

conduct a model-based ANOVA
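A model-based ANOVA with a single factor fits `score ~ system` and tests the factor. It yields the same F statistic as a classical one-way ANOVA. The sketch below computes that statistic and its degrees of freedom from between-group and within-group sums of squares. The function name `one_way_anova` is hypothetical. Turning F into a p-value requires an F-distribution CDF, for example `scipy.stats.f.sf`, which the standard library lacks. Note that this analysis treats the scores of different systems as independent, even when they come from the same sentences.

```python
from statistics import mean


def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-factor ANOVA.

    `groups` maps each system label to its list of per-sentence scores.
    """
    all_scores = [x for xs in groups.values() for x in xs]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group variation: how far system means lie from the grand mean.
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in groups.values())
    # Within-group variation: spread of scores around their own system mean.
    ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova({"marking": [1, 2, 3], "postedit": [4, 5, 6]})
print(f_stat, df1, df2)  # 13.5 1 4
```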

present the model matrix to show the dummy coding
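With R-style treatment (dummy) coding, the model matrix has an intercept column plus one 0/1 indicator per non-reference level. Each coefficient is therefore a difference from the reference level. The helper below is a hypothetical sketch that builds such a matrix. It takes the first level in sorted order as the reference and names columns in the `system<level>` style R uses. The label `baseline` is a placeholder.

```python
def treatment_model_matrix(labels):
    """Intercept plus treatment (dummy) columns, R-style.

    The first level in sorted order is the reference and gets no column.
    """
    levels = sorted(set(labels))
    reference, others = levels[0], levels[1:]
    header = ["(Intercept)"] + [f"system{lvl}" for lvl in others]
    rows = [[1] + [int(lab == lvl) for lvl in others] for lab in labels]
    return reference, header, rows


ref, header, X = treatment_model_matrix(["baseline", "marking", "postedit", "marking"])
print("reference:", ref)
print(header)
for row in X:
    print(row)
```

Every row starts with 1 for the intercept. A row has at most one other 1, which marks its non-reference system.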

conduct a significance test
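The Python standard library has no F or t distribution, so this sketch swaps in a different technique: a two-sided permutation test on the difference between two system means. The test is distribution-free. Like the one-way ANOVA, however, it ignores that both systems were scored on the same sentences. The function name and toy scores are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


print(permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15], n_perm=2000))
```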

A smarter way to model and analyze the evaluation data

model the data with an LMEM (linear mixed effects model) to account for repeated measurements of each sentence
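A random intercept per sentence captures the fact that every system is scored on the same sentences. In lme4 notation the model would read `score ~ system + (1 | sentence)`. This sketch assumes a balanced design with one seed-averaged score per sentence and system. Under that assumption, the estimated system means equal the marginal means. The REML variance components also coincide with the ANOVA (method-of-moments) estimates computed below, as long as those estimates are nonnegative. This is not a general LMEM fitter, and the helper name and toy table are hypothetical.

```python
from statistics import mean


def random_intercept_estimates(table):
    """ANOVA estimates for score ~ system + (1 | sentence), balanced data only.

    `table[sentence][system]` holds one score per cell.
    """
    sentences = list(table)
    systems = list(table[sentences[0]])
    n, k = len(sentences), len(systems)
    grand = mean(table[s][m] for s in sentences for m in systems)
    # Fixed part: with balanced data the estimated system means are the marginal means.
    sys_mean = {m: mean(table[s][m] for s in sentences) for m in systems}
    sent_mean = {s: mean(table[s][m] for m in systems) for s in sentences}
    ms_sentence = k * sum((sent_mean[s] - grand) ** 2 for s in sentences) / (n - 1)
    ms_resid = sum(
        (table[s][m] - sys_mean[m] - sent_mean[s] + grand) ** 2
        for s in sentences for m in systems
    ) / ((n - 1) * (k - 1))
    return {
        "system_means": sys_mean,
        # Negative moment estimates are clipped to 0; a REML fit would then also
        # re-estimate the residual variance, which this sketch does not do.
        "var_sentence": max(0.0, (ms_sentence - ms_resid) / k),
        "var_residual": ms_resid,
    }


toy = {"s1": {"marking": 1, "postedit": 2},
       "s2": {"marking": 3, "postedit": 5},
       "s3": {"marking": 5, "postedit": 8}}
print(random_intercept_estimates(toy))
```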

LMEM omnibus test
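In the same balanced setting, one way to carry out the omnibus test for the system effect is the randomized-block F test. Here F = MS_system / MS_residual, with k - 1 and (n - 1)(k - 1) degrees of freedom, where n is the number of sentences and k the number of systems. Unlike the unpaired ANOVA, this test removes the between-sentence variance from the error term. The sketch below assumes balanced data. The function name and toy table are hypothetical, and a p-value would again need an F-distribution CDF.

```python
from statistics import mean


def block_f_test(table):
    """Randomized-block F test for the system effect (balanced data only).

    `table[sentence][system]` holds one (seed-averaged) score per cell.
    """
    sentences = list(table)
    systems = list(table[sentences[0]])
    n, k = len(sentences), len(systems)
    grand = mean(table[s][m] for s in sentences for m in systems)
    sys_mean = {m: mean(table[s][m] for s in sentences) for m in systems}
    sent_mean = {s: mean(table[s][m] for m in systems) for s in sentences}
    ss_system = n * sum((sys_mean[m] - grand) ** 2 for m in systems)
    # Residual after removing both the system and the sentence (block) effects.
    ss_resid = sum(
        (table[s][m] - sys_mean[m] - sent_mean[s] + grand) ** 2
        for s in sentences for m in systems
    )
    df_system, df_resid = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_system / df_system) / (ss_resid / df_resid)
    return f_stat, df_system, df_resid


toy = {"s1": {"marking": 1, "postedit": 2},
       "s2": {"marking": 3, "postedit": 5},
       "s3": {"marking": 5, "postedit": 8}}
print(block_f_test(toy))  # (12.0, 1, 2)
```

On this toy table, an unpaired one-way ANOVA gives F of about 0.92, because the large spread between sentences stays in its error term. The blocked test gives F = 12.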

reconstruct group means and their differences, then test the differences
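Under treatment coding, a group mean is the intercept plus that group's coefficient. The difference between two non-reference groups is the difference of their coefficients, with variance Var(b1) + Var(b2) - 2 Cov(b1, b2). The sketch takes hypothetical coefficients keyed by level name, plus hypothetical (co)variances. It applies a Wald z test with a normal approximation, which is an assumption; the notebook may use a different test.

```python
import math
from statistics import NormalDist


def group_means(coefs, reference):
    """Recover per-group means from treatment-coded coefficients."""
    intercept = coefs["(Intercept)"]
    means = {reference: intercept}
    for level, beta in coefs.items():
        if level != "(Intercept)":
            means[level] = intercept + beta  # reference mean + offset
    return means


def contrast_se(var_a, var_b, cov_ab):
    """Standard error of beta_a - beta_b."""
    return math.sqrt(var_a + var_b - 2 * cov_ab)


def wald_test(estimate, se):
    """Two-sided Wald z test using a normal approximation."""
    z = estimate / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical coefficients and (co)variances, not results from the notebook.
coefs = {"(Intercept)": 0.30, "marking": 0.02, "postedit": 0.05}
print(group_means(coefs, reference="baseline"))
diff = coefs["postedit"] - coefs["marking"]
print(wald_test(diff, contrast_se(0.0004, 0.0004, 0.0002)))
```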

Is the Performance Gain from Feedback Uniform for All Sentences?

classify sentences by length
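A simple way to classify sentences by length is to bin their whitespace token counts. The cut-offs (15 and 30 tokens) and the labels below are hypothetical choices for illustration. Quantile-based bins are a common alternative.

```python
def length_class(sentence, short_max=15, long_min=30):
    """Bin a sentence by whitespace token count (hypothetical cut-offs)."""
    n_tokens = len(sentence.split())
    if n_tokens <= short_max:
        return "short"
    if n_tokens < long_min:
        return "medium"
    return "long"


for text in ["a short one", " ".join(["w"] * 20), " ".join(["w"] * 40)]:
    print(len(text.split()), length_class(text))
```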

LMEM with the system × input-length interaction as a fixed effect
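Adding the interaction turns the fixed-effect part into `system * length`, which in lme4 notation reads `score ~ system * length + (1 | sentence)`. The model matrix then gains one product column for each pair of non-reference system and length levels. This sketch builds the column names and one treatment-coded row. The level names are hypothetical, and the first entry of each list is taken as the reference level.

```python
from itertools import product


def interaction_columns(systems, lengths):
    """Column names for system * length under treatment coding."""
    sys_cols = [f"system{s}" for s in systems[1:]]
    len_cols = [f"length{lv}" for lv in lengths[1:]]
    inter = [f"{a}:{b}" for a, b in product(sys_cols, len_cols)]
    return ["(Intercept)"] + sys_cols + len_cols + inter


def interaction_row(system, length, systems, lengths):
    """One model-matrix row: intercept, main-effect dummies, product dummies."""
    sys_d = [int(system == s) for s in systems[1:]]
    len_d = [int(length == lv) for lv in lengths[1:]]
    inter = [a * b for a, b in product(sys_d, len_d)]
    return [1] + sys_d + len_d + inter


systems = ["marking", "postedit"]       # first entry = reference level
lengths = ["short", "medium", "long"]   # first entry = reference level
print(interaction_columns(systems, lengths))
print(interaction_row("postedit", "long", systems, lengths))
```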

omnibus test for the interaction model
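One standard omnibus test for the interaction is a likelihood ratio test between the models with and without the interaction term. Because the two models differ in their fixed effects, both should be fit by maximum likelihood rather than REML; lme4's `anova()` refits REML models with ML by default. The statistic 2(ll_full - ll_reduced) is compared to a chi-square distribution with (k_system - 1)(k_length - 1) degrees of freedom. The sketch implements the chi-square upper tail with the standard library. The log-likelihood values in the usage line are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1     # closed form for df = 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def lr_test(loglik_reduced, loglik_full, df_diff):
    """Likelihood ratio statistic and p-value for nested ML fits."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical: 2 systems x 3 length classes -> (2 - 1) * (3 - 1) = 2 extra parameters.
print(lr_test(-100.0, -96.0, df_diff=2))
```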

calculate group means to investigate the interaction
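Cell means per system and length class make the interaction visible. If one system's gain over another differs across length classes, the effect of feedback is not uniform. Here is a sketch with toy records in a hypothetical `(system, length class, score)` layout:

```python
from collections import defaultdict
from statistics import mean


def cell_means(records):
    """Mean score for every (system, length class) cell."""
    cells = defaultdict(list)
    for system, length, score in records:
        cells[(system, length)].append(score)
    return {key: mean(vals) for key, vals in cells.items()}


def gain_by_length(means, treated, reference):
    """Difference treated - reference within each length class."""
    lengths = sorted({length for _, length in means})
    return {lv: means[(treated, lv)] - means[(reference, lv)] for lv in lengths}


# Toy records, not results from the notebook.
records = [
    ("marking", "short", 0.40), ("marking", "short", 0.42),
    ("postedit", "short", 0.43), ("postedit", "short", 0.45),
    ("marking", "long", 0.20), ("postedit", "long", 0.30),
]
means = cell_means(records)
print(gain_by_length(means, treated="postedit", reference="marking"))
```

A gain that changes across classes (here roughly 0.03 for short and 0.10 for long sentences) is what a significant interaction term would indicate.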

conduct post hoc test
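Post hoc tests usually involve several pairwise comparisons, for example one system contrast per length class, so their p-values need a multiple-comparison correction. The sketch implements Holm's step-down adjustment, which R's `p.adjust(..., method = "holm")` also provides. Which correction the notebook itself uses is not stated here, and the input p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p by (m - rank), cap at 1,
        # and enforce monotonicity with the previous adjusted value.
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
```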