title: How meaningful are the recent improvements in coreference resolution? Besides reliable evaluation metrics, we also need reliable validation sets to ensure that reported progress in coreference resolution is valid. A validation set is reliable if a considerable improvement on it indicates a better solution to the coreference problem rather than a better exploitation of the dataset itself. We show that the current split of the CoNLL dataset rewards overfitted coreference resolvers. It therefore undermines the correct assessment of meaningful improvements in coreference resolution.