Auto-Encoding Variational Neural Machine Translation

Abstract: Translation data are typically a byproduct of various data-generating processes, yet translation models learn a single conditional distribution that potentially conflates these factors of variation. I will present an alternative in which sentence pairs are jointly modelled as draws from the marginal of a deep generative model. I will point out statistical caveats of conditional modelling and show that our joint model is better equipped to learn from mixed-domain and noisy data.

Note: This model is an instance of a variational auto-encoder, a class of generative models parameterised by neural networks, which I will revisit in the talk, including efficient parameter estimation for this class via amortised variational inference.
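As a minimal sketch of the joint formulation (my notation; I assume a standard Gaussian prior over a latent code z and a factorisation in which the source sentence x is generated from z and the target y from both z and x, a typical choice for this class of model):

```latex
% Joint model: a sentence pair (x, y) is a draw from the marginal of a
% deep generative model with latent code z (assumed prior: standard Gaussian).
p_\theta(x, y) = \int p(z)\, p_\theta(x \mid z)\, p_\theta(y \mid x, z)\, \mathrm{d}z

% Amortised variational inference: a neural inference network q_\phi(z \mid x, y)
% yields a lower bound (ELBO) on the log-marginal, maximised jointly in (\theta, \phi).
\log p_\theta(x, y) \;\geq\;
  \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[ \log p_\theta(x \mid z) + \log p_\theta(y \mid x, z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x, y) \,\|\, p(z) \right)
```

Under this (assumed) factorisation, the usual conditional translation model appears as the factor p_\theta(y | x, z), while the latent code can absorb variation such as domain and noise rather than forcing a single conditional distribution to conflate it.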