Using images to ground machine translation

Multi-modal Machine Translation (MMT) is a relatively new research topic, only recently addressed by the Machine Translation (MT) research community in the form of a shared task. The practical goal of MMT is to build MT models that use image information to better translate image descriptions, i.e. to improve the translation of ambiguous terms that could in principle be disambiguated by an image (e.g., an image of a jaguar could help determine whether a mention of "jaguar" refers to the car brand or to the animal species). There are many conceivable ways to extract visual information from images, as well as different MT architectures into which one can incorporate that visual information. In this talk, I will discuss how to incorporate both global and local image features, obtained with publicly available pre-trained Convolutional Neural Networks, into the attention-based Neural Machine Translation architecture.
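As a rough illustration of the kind of integration the talk concerns (not the speaker's exact model), the sketch below shows one common way to use a *global* image feature: the pooled activation of a pre-trained CNN is projected into the decoder's hidden space and used to initialize an attention-based NMT decoder. Local features (e.g., a spatial grid of convolutional activations) could analogously serve as an extra attention memory; that path, and the source-side attention itself, are omitted here for brevity. All dimensions and names are illustrative assumptions.

```python
# Minimal sketch, assuming a 2048-dim global CNN feature (e.g., a pooled
# ResNet activation) and a GRU decoder; this is NOT the talk's specific model.
import torch
import torch.nn as nn


class ImageInitDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Project the global image feature into the decoder's hidden space.
        self.img_to_h0 = nn.Linear(img_dim, hid_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, trg_tokens, global_img_feat):
        # trg_tokens: (batch, trg_len); global_img_feat: (batch, img_dim)
        h0 = torch.tanh(self.img_to_h0(global_img_feat)).unsqueeze(0)
        emb = self.embed(trg_tokens)
        rnn_out, _ = self.rnn(emb, h0)   # decoding is conditioned on the image
        return self.out(rnn_out)         # per-step target-vocabulary logits


# Toy usage with random inputs.
decoder = ImageInitDecoder(vocab_size=1000)
logits = decoder(torch.randint(0, 1000, (2, 7)), torch.randn(2, 2048))
print(logits.shape)  # torch.Size([2, 7, 1000])
```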