Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation

Abstract: We present an application of counterfactual learning from logged bandit feedback to statistical machine translation (SMT). The goal is to optimize a target SMT system given a log of user feedback on translations predicted by a historic SMT system. The challenge lies in the fact that production SMT systems use a deterministic, steady logging policy, which conflicts with the theoretical requirement of random exploration during logging. We first show that standard off-policy evaluation applies to stochastic bandit logs, and then turn to counterfactual learning from deterministic bandit logs. We demonstrate that additive and multiplicative control variates can prevent degenerate behavior in empirical risk minimization by reducing the variance of updates while at the same time smoothing out deterministic components in learning. Our simulation experiments show improvements of up to 2 BLEU points from counterfactual learning on deterministic bandit feedback.
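As an illustrative sketch only (not the paper's implementation), the off-policy evaluation setting the abstract refers to can be summarized by inverse propensity scoring (IPS) and its self-normalized variant, which is one standard form of a multiplicative control variate. All data and variable names below are hypothetical: we assume a log of per-translation rewards, the logging system's propensities, and the target system's probabilities for the logged translations.

```python
import numpy as np

# Hypothetical logged bandit data: for each input sentence, the logging
# system showed one translation y and recorded a scalar user reward.
rng = np.random.default_rng(0)
n = 1000
log_propensity = rng.uniform(0.2, 1.0, size=n)  # pi_0(y|x), logger's probability
reward = rng.uniform(0.0, 1.0, size=n)          # observed bandit feedback
target_prob = rng.uniform(0.0, 1.0, size=n)     # pi_w(y|x), target system's probability

# Importance weights correct for the mismatch between logger and target.
weights = target_prob / log_propensity

# Vanilla IPS estimate of the target system's expected reward.
ips = np.mean(reward * weights)

# Self-normalized IPS: dividing by the mean importance weight acts as a
# multiplicative control variate and typically reduces variance.
snips = np.sum(reward * weights) / np.sum(weights)

print(f"IPS estimate:   {ips:.4f}")
print(f"SNIPS estimate: {snips:.4f}")
```

Note that under a fully deterministic logging policy all propensities would equal 1 for the logged translations, which is precisely the regime where such variance-reduction and smoothing techniques become essential, as the abstract indicates.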