Interactive NMT via Phrase-Level Bandit Feedback 

Online learning is a natural application of bandit feedback in neural machine translation (NMT). In online learning, however, each input is typically observed only once, so training amounts to a single epoch. Can a model learn more efficiently within that single epoch? We introduce a new online training algorithm that lets users give phrase-level feedback at the positions where the system is less confident, while a full translation is still being constructed. Our model uses the advantage actor-critic architecture proposed by Nguyen et al. (2017). We train our system on Europarl and evaluate it on News Commentary. Compared with sentence-level training, our method improves both Character-F and corpus-level BLEU.
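The abstract does not say how confidence is measured or how phrase-level rewards enter the update. The sketch below is one way to read it, and it is not the authors' implementation. It assumes that confidence is the model's per-token probability and that a fixed threshold (hypothetical, here 0.5) marks uncertain tokens. It also assumes that consecutive uncertain tokens form one phrase, and that each phrase receives a scalar bandit reward. That reward is compared against a critic's value estimate, as in a standard advantage actor-critic objective. All function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token range


def low_confidence_spans(token_probs: List[float], threshold: float = 0.5) -> List[Span]:
    """Group consecutive tokens with probability below `threshold` into spans.

    Assumption: these spans stand in for the phrases a user is asked to rate.
    """
    spans: List[Span] = []
    start = None
    for i, p in enumerate(token_probs):
        if p < threshold and start is None:
            start = i  # an uncertain phrase begins
        elif p >= threshold and start is not None:
            spans.append((start, i))  # the uncertain phrase ends before token i
            start = None
    if start is not None:
        spans.append((start, len(token_probs)))  # phrase runs to the end
    return spans


def phrase_advantages(rewards: List[float], critic_values: List[float]) -> List[float]:
    """Advantage per phrase: observed bandit reward minus the critic's baseline."""
    return [r - v for r, v in zip(rewards, critic_values)]


def actor_loss(token_log_probs: List[float], spans: List[Span], advantages: List[float]) -> float:
    """Policy-gradient loss: each token's log-probability is weighted by its phrase's advantage.

    The advantage is treated as a constant (no gradient flows through the critic here).
    """
    loss = 0.0
    for (start, end), adv in zip(spans, advantages):
        loss -= adv * sum(token_log_probs[start:end])
    return loss


def critic_loss(rewards: List[float], critic_values: List[float]) -> float:
    """Mean squared error between the critic's estimates and the observed rewards."""
    errors = [(r - v) ** 2 for r, v in zip(rewards, critic_values)]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    probs = [0.9, 0.3, 0.2, 0.8, 0.4]           # hypothetical per-token probabilities
    spans = low_confidence_spans(probs)         # phrases to request feedback on
    rewards = [0.7, 0.2]                        # hypothetical user ratings, one per span
    values = [0.5, 0.4]                         # hypothetical critic estimates
    adv = phrase_advantages(rewards, values)
    log_probs = [-0.1, -1.0, -2.0, -0.2, -0.5]  # hypothetical token log-probabilities
    print(spans, adv, actor_loss(log_probs, spans, adv), critic_loss(rewards, values))
```

With these inputs, tokens 1 to 2 and token 4 fall below the threshold and become the two phrases shown to the user. In this sketch, a phrase rated better than the critic expected gets a positive advantage, and its tokens are reinforced. A phrase rated worse than expected is discouraged, and only the uncertain regions of the translation receive a learning signal.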