German-English parallel patent corpus released

We are happy to announce the release of a parallel corpus of patent text for the German-English language pair. The corpus has been constructed from EPO, WIPO and USPTO patent documents extracted from the MAREC collection and contains 23 million sentence pairs from all patent text sections.

All sentences are labeled with metadata: patent document ID, patent family, patent classification, and publication date. The corpus is distributed under a Creative Commons license. For more information and to download the corpus, please see the PatTR website.