Source Language Representations for Speech Translation

End-to-end models for speech translation couple speech recognition (ASR) and machine translation (MT) more tightly than a traditional cascade of separate ASR and MT models, offering simpler architectures and the potential for reduced error propagation. However, end-to-end models do not yet consistently perform as well as cascaded models, particularly in low-resource settings. I will discuss some of the challenges in building end-to-end speech translation models (and why we might still want to draw inspiration from cascades), along with alternative source representations that may address these challenges, some of which re-introduce linguistic features from cascades.
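For readers less familiar with the distinction between the two architectures, the sketch below contrasts them schematically; all function names (asr_recognize, mt_translate, end_to_end_translate) are hypothetical placeholders rather than references to any particular toolkit, and the stub return values only stand in for real model outputs.

```python
# Schematic contrast between a cascaded and an end-to-end speech translation
# system. Every "model" here is a hypothetical placeholder standing in for a
# trained ASR, MT, or end-to-end ST component.

from typing import List


def asr_recognize(audio_features: List[float]) -> str:
    """Placeholder ASR model: audio features -> source-language transcript."""
    return "hypothetical source transcript"


def mt_translate(source_text: str) -> str:
    """Placeholder MT model: source-language text -> target-language text."""
    return f"translation of: {source_text}"


def cascade_translate(audio_features: List[float]) -> str:
    # Cascade: the MT model only sees the 1-best ASR transcript, so any
    # recognition error is passed downstream as if it were correct --
    # the usual source of error propagation.
    transcript = asr_recognize(audio_features)
    return mt_translate(transcript)


def end_to_end_translate(audio_features: List[float]) -> str:
    """Placeholder end-to-end ST model: audio features -> target-language
    text, with no intermediate transcript committed to at inference time."""
    return "hypothetical direct translation"


if __name__ == "__main__":
    audio = [0.0] * 16000  # stand-in for one second of audio features
    print(cascade_translate(audio))
    print(end_to_end_translate(audio))
```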