KenLM in Moses

Quick Start

Edit moses.ini and change the first number on the language model line to 8. For example, put this in moses.ini:


[lmodel-file]
8 0 5 foo.arpa

But I recommend that you build a binary file with


bin/build_binary foo.arpa foo.binary

and pass the binary file instead:

[lmodel-file]
8 0 5 foo.binary

The first number (8) tells Moses to use KenLM. The second number (0) is the factor the language model applies to. The third number (5) is the order, but KenLM ignores it and loads whatever order is in the file you give. The last field is the path to the model file.
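To make the four fields concrete, here is a minimal Python sketch that parses and re-serializes one `[lmodel-file]` line. It is purely illustrative and is not how Moses reads moses.ini: `LMEntry` and `parse_lm_entry` are hypothetical names, and the sketch assumes exactly four whitespace-separated fields with no spaces in the path, as in the examples above.

```python
from dataclasses import dataclass, replace


# Hypothetical helper (not part of Moses): one line of the [lmodel-file] section.
@dataclass(frozen=True)
class LMEntry:
    implementation: int  # 8 = KenLM with full loading, 9 = KenLM with lazy mmap loading
    factor: int          # factor the language model applies to, e.g. 0
    order: int           # n-gram order; KenLM ignores it and uses the file's own order
    path: str            # ARPA file or KenLM binary file

    def uses_kenlm(self) -> bool:
        return self.implementation in (8, 9)

    def __str__(self) -> str:
        # Serialize back to the moses.ini line format
        return f"{self.implementation} {self.factor} {self.order} {self.path}"


def parse_lm_entry(line: str) -> LMEntry:
    """Parse a line such as '8 0 5 foo.arpa' (assumes exactly four fields)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    implementation, factor, order = (int(f) for f in fields[:3])
    return LMEntry(implementation, factor, order, fields[3])


if __name__ == "__main__":
    entry = parse_lm_entry("8 0 5 foo.arpa")
    # Point the same entry at the file produced by bin/build_binary
    print(replace(entry, path="foo.binary"))  # 8 0 5 foo.binary
```

Switching from the ARPA file to the binary only changes the path field; the type, factor, and order stay the same.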

Compilation

KenLM is distributed with Moses and compiled by default. I assume you are using a recent revision of Moses. KenLM is fully thread-safe, so it can be used with multithreaded Moses.

Full or lazy loading

KenLM supports lazy loading via mmap. This lets you reduce memory usage further, especially with the trie data structure, which has good memory locality. In Moses, the loading method is controlled by the language model number in moses.ini. Language model number 8 loads the full model into memory (using MAP_POPULATE on Linux and read() on other operating systems). Language model number 9 loads the model lazily using mmap.

I recommend loading the full model if you have enough RAM for it. Loading the full model and then decoding actually takes less total time than lazy loading, because the disk does not have to seek during decoding. Lazy loading works best with a local disk and is not recommended for networked filesystems.
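The difference between the two loading modes comes down to when the operating system reads the file. The sketch below illustrates that mechanism with Python's standard `mmap` module; it is not KenLM's loader (KenLM is C++), and `load_model` is a hypothetical name. It assumes `mmap.MAP_POPULATE` is available only on Linux with Python 3.10 or later, and falls back to a plain `read()` elsewhere, mirroring the behaviour described for number 8.

```python
import mmap
import os
from typing import Union


def load_model(path: str, lazy: bool) -> Union[mmap.mmap, bytes]:
    """Hypothetical helper contrasting lazy and full loading of a model file.

    lazy=True  -> like LM number 9: map the file; pages are read on first access.
    lazy=False -> like LM number 8: MAP_POPULATE on Linux, plain read() elsewhere.
    Note: mmap cannot map an empty file, so this assumes a non-empty model.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if lazy:
            # Nothing is read yet; the OS faults pages in as they are touched.
            return mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
        populate = getattr(mmap, "MAP_POPULATE", None)  # Linux, Python 3.10+
        if populate is not None:
            # Ask the kernel to read the whole file into memory at map time.
            return mmap.mmap(f.fileno(), size,
                             flags=mmap.MAP_SHARED | populate,
                             prot=mmap.PROT_READ)
        # No MAP_POPULATE: copy the whole file into memory with read().
        return f.read()
```

In the lazy case, each first access to an unread page triggers a disk read during decoding, which is why a slow or networked filesystem hurts lazy loading more than full loading.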