Domain Adaptation with End-to-End Data for Pipelined Spoken Language Translation

Spoken language translation (SLT) brings together automatic speech recognition (ASR) and machine translation (MT). While recent advances in sequence-to-sequence models have shown promising results for end-to-end SLT, the classical pipelined approach of combining separate ASR and MT systems retains advantages, for example in low-resource setups. Our approach combines out-of-domain ASR and MT systems and fine-tunes them for domain adaptation on end-to-end data, i.e. English audio paired with German translations of TED talks. In each fine-tuning step, we generate new training data and apply a self-training strategy that avoids the need for gold transcriptions.
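The self-training loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `transcribe` and `fine_tune` interfaces, and the mock model classes, are hypothetical stand-ins. The key point is that only (audio, translation) pairs are needed; the ASR system's own output serves as the source side for MT fine-tuning, so no gold transcriptions enter the loop.

```python
class MockASR:
    """Hypothetical stand-in for an out-of-domain ASR system."""
    def transcribe(self, audio):
        # A real system would decode the audio; here we return a tag.
        return f"transcript-of-{audio}"


class MockMT:
    """Hypothetical stand-in for an out-of-domain MT system."""
    def __init__(self):
        self.training_pairs = []

    def fine_tune(self, pairs):
        # A real system would update its parameters on these pairs;
        # here we just record what it would be trained on.
        self.training_pairs.extend(pairs)


def self_train(asr, mt, end_to_end_data, rounds=1):
    """Adapt a pipelined ASR->MT system from (audio, translation) pairs.

    end_to_end_data: list of (english_audio, german_translation) pairs;
    gold English transcriptions are never required.
    """
    for _ in range(rounds):
        # 1. Generate pseudo-transcriptions with the current ASR system.
        pseudo = [(asr.transcribe(audio), ref)
                  for audio, ref in end_to_end_data]
        # 2. Fine-tune MT on (pseudo-transcript, gold translation) pairs,
        #    which also exposes MT to realistic ASR errors.
        mt.fine_tune(pseudo)
    return asr, mt
```

In each round the pipeline produces fresh pseudo-parallel data, so the MT system is adapted both to the target domain and to the error patterns of the ASR front end.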