OPENSUBTITLES data:
-------------------

This corpus contains 315 German-English movie text pairs downloaded from the
official website of the OpenSubtitles2011 corpus:
http://opus.lingfil.uu.se/OpenSubtitles_v2.php

NOTE that the original OpenSubtitles corpus changes over time.

The corpus contains:
- 309 training file pairs (from the years 2009-2010)
- 6 test file pairs (from the year 2011)

In both the training and the test set we provide the following folder structure:
* de/: all German texts, with file name -.txt
* en/: all English texts, with file name -.txt
* align/: sentence-wise alignments from the German to the English texts,
  produced by Jörg Tiedemann's time-stamp-based alignment tool (see paper
  below), with file name -.txt. For the test set, we additionally provide
  manual alignments, with file name -.goldAlign

Jörg Tiedemann, 2009, News from OPUS - A Collection of Multilingual Parallel
Corpora with Tools and Interfaces. In N. Nicolov and K. Bontcheva and
G. Angelova and R. Mitkov (eds.) Recent Advances in Natural Language
Processing (vol V), pages 237-248, John Benjamins, Amsterdam/Philadelphia

The paper is available at http://stp.lingfil.uu.se/~joerg/published/ranlp-V.pdf
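
As an illustration of the folder structure described above, the following is a minimal Python sketch that collects matching German, English, and alignment files from one split. It rests on an assumption not stated in this README: that a German text, its English counterpart, and their automatic alignment share the same file name across de/, en/, and align/. The function name pair_files and the directory name passed to it are hypothetical. The sketch does not parse the alignment files, since their internal format is not described here.

```python
from pathlib import Path


def pair_files(split_dir):
    """Return (de, en, align) path triples for names present in all three folders."""
    root = Path(split_dir)
    # Assumption: the three folders use identical file names for one movie.
    de_files = {p.name: p for p in (root / "de").glob("*.txt")}
    en_files = {p.name: p for p in (root / "en").glob("*.txt")}
    align_files = {p.name: p for p in (root / "align").glob("*.txt")}
    # Keep only names that occur in every folder, in a stable order.
    shared = sorted(de_files.keys() & en_files.keys() & align_files.keys())
    return [(de_files[n], en_files[n], align_files[n]) for n in shared]
```

Calling pair_files on the training directory (whatever it is named in your copy) should yield one triple per training file pair if the naming assumption holds. If it returns fewer triples than expected, the file names probably differ between folders and the matching rule needs adjusting.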