This is the multi-engine machine translation system from Carnegie Mellon.
Contact kheafiel+memt at cs.cmu.edu

The latest release is available from http://kheafield.com/code/ or by asking the author to make a release.

This document shows how to compile and run the system. For technical documentation, see http://kheafield.com/professional/.

Paths are relative to the MEMT directory.

REQUIREMENTS

We assume the following are installed:
  java (for METEOR and ZMERT)
  python (for METEOR's installation)
  bash

Scripts are provided in ../install for the following (see ../install/README):
  icu >= 4.2
  boost >= 1.41.0
  boost jam >= 1.41.0
  ruby

You will also need a tokenizer and an ARPA format or Suffix Array language model.

COMPILATION

In this directory, run
  # bjam release [-jPARALLELISM]
  # Alignment/compile.sh
The Alignment/compile.sh command will also download and set up evaluation metrics if they haven't been already. Downloading the paraphrase corpus takes a while.

TUNING

MEMT uses weights tuned to the specific systems being combined. This section shows how to find those weights using MERT. Running MERT requires three files in a working directory: dev.matched, dev.reference, and decoder_config_base. Below are instructions for creating each of them.

For each system, create a file containing _tokenized_ 1-best output, one sentence per line. A tokenizer is not provided. Run
  # Alignment/match.sh system0.txt system1.txt ... systemn.txt >dev.matched
This runs the METEOR matcher on the system outputs.

The dev.reference file contains references in plain text. If there is more than one reference per sentence, place the references for a single sentence consecutively, like so:
  reference 0 for sentence 0
  reference 1 for sentence 0
  reference 0 for sentence 1
  reference 1 for sentence 1
This is the format used by METEOR's text files and by ZMERT. It should be normal text; there is no need to tokenize or lowercase it.
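With k references per sentence, this consecutive layout puts reference j for sentence i on line i*k + j (counting from 0). The Python sketch below illustrates that ordering and how to undo it. It assumes every reference set covers the same sentences in the same order; interlace and split_interlaced are hypothetical helper names for illustration, not part of MEMT.

```python
from typing import List


def interlace(references: List[List[str]]) -> List[str]:
    """Order references the way dev.reference expects (hypothetical helper).

    references[j][i] is reference j for sentence i. The result lists all
    references for sentence 0, then all references for sentence 1, and so
    on, so reference j of sentence i ends up at index i * k + j.
    """
    if not references:
        return []
    num_sentences = len(references[0])
    if any(len(ref_set) != num_sentences for ref_set in references):
        raise ValueError("every reference set must cover the same sentences")
    # Outer loop over sentences, inner loop over reference sets.
    return [ref_set[i] for i in range(num_sentences) for ref_set in references]


def split_interlaced(lines: List[str], num_refs: int) -> List[List[str]]:
    """Group interlaced lines back into one list per sentence (hypothetical helper)."""
    if num_refs <= 0 or len(lines) % num_refs != 0:
        raise ValueError("line count must be a multiple of the reference count")
    return [lines[i:i + num_refs] for i in range(0, len(lines), num_refs)]
```

For example, interlace([refs0, refs1]) with two reference sets of two sentences each yields the four example lines above in the same order, and split_interlaced(lines, 2) recovers the per-sentence pairs. Writing "\n".join(lines) + "\n" to a file gives one reference per line.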
If you have separate files for each reference, use
  ../Utilities/scoring/interlace.rb ref0.txt ref1.txt >dev.reference

decoder_config_base contains the decoder configuration without weights. Here's an example that works reasonably well:
  beam_size = 500
  output.nbest = 300
  horizon.stay_threshold = 0.8
  score.verbatim0.individual = 2
  score.verbatim0.collective = 4
  horizon.method = length
  horizon.radius = 5
For documentation of the various options, run
  scripts/server.sh --help

Launch the decoding server. Tell it where to find the language model (using --lm.file foo.arpa) and which port to run on (e.g. --port 2000):
  scripts/server.sh --lm.file foo.arpa --port 2000
It will print "Accepting Connections on port 2000" when ready. Background it or go to another terminal.

Run MERT:
  scripts/zmert/run.sh working/directory 2000
You can also specify host:port to find the server. Multiple MERT runs can use the same server in parallel. The end product of the MERT run is working/directory/decoder_config.

DECODING

This requires a running decoding server, decoder_config (including tuned weights), and a matched input file. Run
  scripts/simple_decode.rb 2000 decoder_config matched

SCORING

The ../Utilities/scoring directory contains a scoring script, score.rb. Run score.rb without arguments to see its options and documentation. Typically you can run
  score.rb --hyp-tok output.1best --refs-laced reference.txt
which produces output.1best.scores.
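--refs-laced expects references in the consecutive layout described under TUNING. A cheap check before scoring is that the reference file has a whole number of lines per hypothesis. The Python sketch below assumes UTF-8 files with one sentence per line. refs_per_sentence and check_files are hypothetical names for illustration, not functions or options of score.rb.

```python
from typing import Sequence


def refs_per_sentence(hyp_lines: Sequence[str], ref_lines: Sequence[str]) -> int:
    """Infer how many laced references each hypothesis has (hypothetical helper).

    Raises ValueError when the reference lines cannot be split evenly among
    the hypotheses, which usually means the files are mismatched.
    """
    if not hyp_lines:
        raise ValueError("no hypotheses")
    if not ref_lines:
        raise ValueError("no references")
    if len(ref_lines) % len(hyp_lines) != 0:
        raise ValueError(
            f"{len(ref_lines)} reference lines do not divide evenly "
            f"among {len(hyp_lines)} hypotheses")
    return len(ref_lines) // len(hyp_lines)


def check_files(hyp_path: str, ref_path: str) -> int:
    """Read both files, one sentence per line, and apply refs_per_sentence."""
    with open(hyp_path, encoding="utf-8") as hyp_file:
        hyp_lines = hyp_file.read().splitlines()
    with open(ref_path, encoding="utf-8") as ref_file:
        ref_lines = ref_file.read().splitlines()
    return refs_per_sentence(hyp_lines, ref_lines)
```

If check_files("output.1best", "reference.txt") returns the number of references you expect, the files at least agree by line count. An even split does not prove the sentences are aligned: the wrong reference file could still pass if its length happens to be a multiple of the hypothesis count.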