Joey NMT - A Minimalist NMT Toolkit for Novices

Meme: proposing a simple and understandable NMT implementation. — Proposing Joey NMT.

Another neural machine translation (NMT) toolkit like all the others? No, this one is for you - students, novices, beginners, newbies, and for the lovers of quick prototyping and minimalism. Joey NMT matches the quality of standard toolkits such as Sockeye and OpenNMT with only one fifth of the code!

NMT toolkits have been popping up constantly over the last five years, just as deep learning frameworks keep evolving. As a newcomer, it is difficult to find the best path through the NMT toolkit jungle. Common selection criteria are 1) popularity, 2) the deep learning framework that the toolkit builds on, 3) machine translation quality, 4) documentation, 5) speed, and 6) community support. Our goal is to make the start easier with a clean code base, solid documentation and a focus on the important implementation details.

Please find the code on GitHub: joeynmt.

Why Joey NMT?

If you’re working on an NMT thesis or an internship project, or you want to implement a research idea quickly, you don’t want to spend frustrating days reading through huge code bases, following inheritance hierarchies, filling the gaps in (outdated?) documentation, and updating your fork every day to keep up with the latest changes. So let’s look at what Joey NMT has to offer. Here are five reasons to give Joey NMT a try.

Joey NMT builds on PyTorch, a beginner-friendly deep learning library in Python with lots of open-source tutorials and examples online.
It matches the benchmark performance of large-scale, industry-led projects like Sockeye for both RNN-based and Transformer models. That means you can rely on good baselines and quickly evaluate your new ideas. Find the detailed results here or in the paper.

WMT17 benchmark results.
Its readability was empirically evaluated in a user study with expert and novice NMT users. Novices were able to understand the code base quickly without a teacher, only a little slower than the experts (see “User Study” in our EMNLP paper). Pylint checks help keep the code base clean. We keep a flat class hierarchy with at most one level of inheritance, slightly preferring sequential over hierarchical code solutions.

Example question from the quiz that participants of the study used to explore Joey NMT code.
It has extensive documentation: docstrings, in-line comments (including tensor shapes!), FAQs and a tutorial, ranging from simple use cases to instructions on how to extend the model, tune it, and visualize training progress. In fact, its comment-to-code ratio is almost twice as high as in other frameworks, so you’ll actually be able to read natural language, not just code.
Its purpose is to be stable and minimalist rather than to implement the latest, hottest features. No surprise API changes overnight.
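To make the flat-hierarchy and shape-annotated comment style a bit more concrete, here is a small, hypothetical sketch. It is not Joey NMT’s actual code, which is written in PyTorch. It uses a base attention class with two subclasses exactly one inheritance level deep, implementing dot-product and bilinear scoring over plain Python lists. The names `AttentionMechanism`, `DotAttention` and `BilinearAttention` are illustrative assumptions, and MLP and multi-head attention are left out for brevity.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    # scores: [src_len] -> weights: [src_len], summing to 1
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionMechanism:
    """Hypothetical base class: turns scores into weights and a context vector."""

    def score(self, query: Vector, key: Vector) -> float:
        raise NotImplementedError

    def __call__(self, query: Vector, keys: Matrix,
                 values: Matrix) -> Tuple[Vector, Vector]:
        # query: [hidden], keys: [src_len, hidden], values: [src_len, value_dim]
        scores = [self.score(query, k) for k in keys]  # [src_len]
        weights = softmax(scores)  # [src_len]
        # context: [value_dim], weighted sum over source positions
        context = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        return context, weights


class DotAttention(AttentionMechanism):
    # score(q, k) = q . k
    def score(self, query: Vector, key: Vector) -> float:
        return sum(q * k for q, k in zip(query, key))


class BilinearAttention(AttentionMechanism):
    # score(q, k) = q^T W k, with W: [hidden, hidden]
    def __init__(self, weight: Matrix):
        self.weight = weight

    def score(self, query: Vector, key: Vector) -> float:
        # wk: [hidden], the matrix-vector product W k
        wk = [sum(w_ij * k_j for w_ij, k_j in zip(row, key)) for row in self.weight]
        return sum(q * x for q, x in zip(query, wk))


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]  # [src_len=2, hidden=2]
    context, weights = DotAttention()([2.0, 0.0], keys, keys)
    print(context, weights)
```

With an identity weight matrix, bilinear attention reduces to dot-product attention, which makes a quick sanity check when extending such a class.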

And if that’s not enough, here are two more bonus points:

We released pre-trained benchmark models for large-scale tasks (WMT17 en-de/lv) and for low-resource South African languages (Autshumato corpus as prepared in the Uxhumana project, en-af/nso/tn/ts/zu). There is no need to re-train these models yourself: you can use them off the shelf for translation, distillation, and fine-tuning.
There’s a growing community of people (and accordingly GitHub forks) who use Joey NMT and extend it in different directions, e.g. for learning with various levels of feedback or for hieroglyph translation. That means you can take inspiration from other people’s integration solutions. Most prominently, Joey NMT is used in the Masakhane project to train NMT models for African languages, with the goal of putting Africa on the NMT map.

What’s in it?

When developing Joey NMT, we set ourselves the minimalist goal of achieving at least 80% of state-of-the-art (SOTA) quality with 20% of the code. As a result, Joey NMT provides the following features (aka the bare necessities of NMT):

Recurrent Encoder-Decoder with GRUs or LSTMs
Transformer Encoder-Decoder
Attention Types: MLP, Dot, Multi-Head, Bilinear
Word-, BPE- and character-based input handling
BLEU, ChrF evaluation
Beam search with length penalty and greedy decoding
Customizable initialization
Attention visualization
Learning curve plotting
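The beam search item above mentions a length penalty. This post doesn’t give its exact form, so the sketch below assumes the GNMT-style penalty from Wu et al. (2016), ((5 + |Y|) / 6)^α, a common choice. It divides each finished hypothesis’s summed log-probability by that penalty before ranking. The function names `length_penalty` and `rescore` are hypothetical, and the sketch glosses over whether the end-of-sequence token counts towards |Y|.

```python
from typing import List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, summed log-probability)


def length_penalty(length: int, alpha: float) -> float:
    # Assumed GNMT-style penalty: ((5 + |Y|) / 6) ** alpha.
    # alpha = 0.0 yields 1.0, i.e. plain log-probability ranking.
    return ((5.0 + length) / 6.0) ** alpha


def rescore(hyps: List[Hypothesis], alpha: float) -> List[Hypothesis]:
    """Rank finished hypotheses by log-probability divided by the length penalty."""
    scored = [(tokens, logp / length_penalty(len(tokens), alpha))
              for tokens, logp in hyps]
    # Log-probabilities are negative, so a larger penalty moves a score towards 0.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    shorter = (["das", "ist", "gut"], -2.0)
    longer = (["das", "ist", "ein", "sehr", "gutes", "kleines", "Beispiel", "."], -2.4)
    for alpha in (0.0, 1.0):
        best_tokens, best_score = rescore([shorter, longer], alpha)[0]
        print(alpha, " ".join(best_tokens), round(best_score, 3))
```

With α = 0, the shorter hypothesis wins because its raw log-probability is higher. With α = 1, the penalty compensates the longer hypothesis for accumulating more negative log-probability terms, and it moves to the top. Greedy decoding corresponds to keeping only the single best token at each step, so no such rescoring is needed there.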

The EMNLP paper (Kreutzer et al., 2019) describes the details of the RNN and Transformer implementations, and also provides a comparison of features across toolkits (very last page of the Appendix).

What’s next?

How to get started?

Check out the tutorial (YouTube screencast) for a quick walk-through on synthetic data, or the Masakhane notebook, which describes every step from data preprocessing to model evaluation.
Missing something?

Talk to us on Gitter or raise an issue on GitHub.
How to get involved in development?

If you’d like to contribute, make a pull request for your Joey NMT extensions or look at open issues to see where your help would be welcome.

Acknowledgment: Thanks to all students and colleagues from ICL Heidelberg and the Masakhane project who helped to improve the code quality. And thanks to Stefan Riezler, Mayumi Ohta and Jasmijn Bastings for their feedback on this post.

Disclaimer: This blogpost reflects solely the opinion of the author, not that of any of her affiliated organizations, and makes no claims or warranties as to completeness, accuracy, and up-to-dateness.

Comments, ideas and critical views are very welcome. We appreciate your feedback! If you want to cite this blogpost, cite the Joey NMT paper instead:

Julia Kreutzer, Jasmijn Bastings and Stefan Riezler

Joey NMT: A Minimalist NMT Toolkit for Novices

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, Hong Kong, China, 2019


@inproceedings{joey2019,
  author = {Kreutzer, Julia and Bastings, Jasmijn and Riezler, Stefan},
  title = {Joey {NMT}: A Minimalist {NMT} Toolkit for Novices},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing {(EMNLP-IJCNLP)}: System Demonstrations},
  year = {2019},
  address = {Hong Kong, China},
  url = {https://www.aclweb.org/anthology/D19-3019}
}