Neural Machine Translation Speed
Machine translation can be computationally expensive; training costs are sometimes quoted in units as large as the "TPU core century." To encourage computational efficiency, the Workshop on Neural Generation and Translation has a recurring efficiency shared task. I took over the 2020 task, and it is now open for rolling submissions.
Participants built machine translation systems from English to German using the WMT 2019 news data condition. Then I measured their performance translating 1 million sentences.
The original evaluation had three participants: OpenNMT, NiuTrans, and the University of Edinburgh (also the organizer). The graphs on this page include both the original submissions and those made since the evaluation.
The task focuses on the quality and cost of deploying translation systems:
- How good are the translations?
- Approximated by sacrebleu: specifically, the average sacrebleu score on WMT11 and WMT13–WMT19, which I call WMT1*. Averaging over several test sets was meant to add a bit of surprise for participants and avoid overfitting to a single test set. BLEU is not as good as human evaluation, so we also submitted two fast Czech systems for human evaluation in WMT20.
- How fast?
- Speed on an Intel Xeon Platinum 8270 CPU and NVIDIA T4 GPU.
- How big?
- The size of the model on disk and how much RAM it consumes while running. Docker image size is also reported, but it mostly reflects how much of Ubuntu each team threw into its image.
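The WMT1* quality score above is just the plain average of per-test-set sacrebleu scores. A minimal sketch, using made-up scores (the test-set names follow the text; the numbers are hypothetical, not real submissions):

```python
# Hypothetical per-test-set sacrebleu scores; WMT12 is absent, matching
# the WMT11 and WMT13-WMT19 set described in the task.
scores = {
    "wmt11": 21.9, "wmt13": 26.5, "wmt14": 27.1, "wmt15": 29.5,
    "wmt16": 33.2, "wmt17": 27.6, "wmt18": 40.1, "wmt19": 39.8,
}

# WMT1* is the unweighted mean over the eight test sets.
wmt1star = sum(scores.values()) / len(scores)
print(round(wmt1star, 2))  # → 30.71
```

In practice each per-set score would come from running sacrebleu on that test set's references; only the final averaging step is shown here.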
Results
There is no single "best" system but rather a range of trade-offs between quality and efficiency. Hence we highlight the submissions that have the best quality for a given cost (or, equivalently, the best cost for a given quality). These are the systems that appear on the Pareto frontier: the black staircase shown on the plots. Anything below the Pareto frontier is worse than another submission according to the metrics on the plot (but may have been optimized for something else). The happy face 😊 shows where an ideal system would be.
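The frontier computation itself is simple: a submission is on the frontier unless some other submission is at least as good on both axes and strictly better on one. A small sketch with hypothetical submissions (names, speeds, and BLEU scores are illustrative, not actual task results):

```python
def pareto_frontier(systems):
    """Return submissions not dominated by any other.

    A system is dominated if another system is at least as fast AND at
    least as accurate, and strictly better on one of the two metrics.
    """
    frontier = []
    for s in systems:
        dominated = any(
            o["speed"] >= s["speed"] and o["bleu"] >= s["bleu"]
            and (o["speed"] > s["speed"] or o["bleu"] > s["bleu"])
            for o in systems if o is not s
        )
        if not dominated:
            frontier.append(s)
    return sorted(frontier, key=lambda s: s["speed"])

# Hypothetical submissions: (words/second, WMT1* BLEU).
submissions = [
    {"name": "A", "speed": 1000,  "bleu": 42.0},
    {"name": "B", "speed": 5000,  "bleu": 40.0},
    {"name": "C", "speed": 3000,  "bleu": 39.0},  # dominated by B
    {"name": "D", "speed": 20000, "bleu": 35.0},
]
print([s["name"] for s in pareto_frontier(submissions)])  # → ['A', 'B', 'D']
```

Plotting the surviving points sorted by speed and connecting them with horizontal and vertical segments gives the black staircase on the plots.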
Speed
Speed is measured in words per second while translating 1 million sentences from English to German.
Some of the slower Edinburgh submissions used a buggy version with a memory leak; the fixed versions also appear.
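The speed measurement amounts to dividing a word count by wall-clock time over the whole batch. A minimal sketch, where `translate` is a hypothetical stand-in for a real MT system and words are counted on the source side (an assumption; the exact counting convention is defined by the task):

```python
import time

def words_per_second(translate, sentences):
    """Time translating a batch of sentences; return source words/second.

    `translate` is a placeholder for a real system's per-sentence call;
    the shared task times end-to-end translation of 1 million sentences.
    """
    start = time.perf_counter()
    for sentence in sentences:
        translate(sentence)
    elapsed = time.perf_counter() - start
    total_words = sum(len(s.split()) for s in sentences)  # source-side words
    return total_words / elapsed

# Usage with a trivial stand-in "system":
rate = words_per_second(str.upper, ["this is a small test"] * 1000)
print(rate > 0)
```

Real measurements also load the model and read/write files inside the timed region, so the per-sentence loop here understates what the task actually clocks.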