Improving Neural Machine Translation via Human Reinforcement

We present methods to improve neural machine translation (NMT) from human reinforcement, i.e., from bandit-type feedback in the form of human quality ratings for proposed machine translations rather than from gold-standard human reference translations. We investigate explicit and implicit bandit feedback collected from human users of an e-commerce platform, and present different algorithms to optimize NMT parameters from logged feedback. We show that explicit user judgments of translation quality, e.g., five-star ratings, are not reliable and do not yield BLEU improvements in bandit learning. In contrast, implicit task-based feedback collected in a cross-lingual search task can be used successfully to improve task-specific metrics and to optimize BLEU with different learning algorithms.
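To make the setting concrete, the following is a minimal, hypothetical sketch (not the paper's actual system) of offline bandit learning from logged feedback: a softmax policy over a small set of candidate outputs is updated from a log of (context, chosen action, reward, logging propensity) tuples using an inverse-propensity-scored policy-gradient step. All names, dimensions, and the simulated reward signal are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3               # context feature dimension, number of candidate outputs
theta = np.zeros((D, K))  # parameters of a linear softmax policy

def probs(theta, x):
    """Softmax policy pi_theta(. | x) over K candidates."""
    z = x @ theta
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def make_context():
    # First feature is a constant bias term; the rest are random features.
    return np.concatenate(([1.0], rng.normal(size=D - 1)))

# Simulated log: actions were chosen uniformly at random (propensity 1/K),
# and implicit feedback rewards candidate 0 (e.g., the user's search succeeded).
log = []
for _ in range(500):
    x = make_context()
    a = int(rng.integers(K))
    r = 1.0 if a == 0 else 0.0
    log.append((x, a, r, 1.0 / K))

# Offline update: maximize the inverse-propensity-scored (IPS) estimate of
# expected reward, mean_t (r_t / p_t) * pi_theta(a_t | x_t), by stochastic
# gradient ascent; its gradient is (r/p) * pi(a|x) * grad log pi(a|x).
lr = 0.1
for x, a, r, p in log:
    pi = probs(theta, x)
    onehot = np.zeros(K)
    onehot[a] = 1.0
    grad_log_pi = np.outer(x, onehot - pi)  # softmax score function
    theta += lr * (r * pi[a] / p) * grad_log_pi

# After training, the policy should prefer the rewarded candidate.
avg_p0 = float(np.mean([probs(theta, make_context())[0] for _ in range(200)]))
```

The same IPS-weighted update pattern carries over to sequence-level NMT training, where the "action" is a sampled translation and the reward is a logged user rating or task-success signal; the abstract's contrast between explicit and implicit feedback concerns where that reward comes from, not the update rule itself.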