NSF Activities Summary

August 2008-May 2009

Please summarize your activities within the last year. Please write for the non-technical reader and for possible use in publicizing the Graduate Research Fellowship Program.

Services like Google Translate give users some idea of what a sentence means, but automatically translated output can be misleading or confusing. For example, typing the French sentence "Le fait est que les dividendes ont été amputé," into Google Translate yields the translation "The fact is that dividends have been amputated." The translation is understandable but too literal: "amputate" is the wrong word to use in English. One natural coping strategy is to use multiple translation services, hoping that their errors are complementary. Another system, from the University of Le Mans and Systran, translates the same sentence as "The fact is that the dividends were cut off." While reading two translations of one sentence aids understanding, it becomes tiresome to read ten translations of the same book. This is where my research enters: how can multiple translations be automatically combined into a single improved translation? This past year, I took ownership of and enhanced software that does this combination. Currently, it combines these two translations into "The fact is that the dividends have been cut off." This combined translation is close to a professional human translation: "As a matter of fact, the dividend has been omitted."

The previous example sentence comes from the 4th Workshop on Statistical Machine Translation, where I published a paper [1] on combination methodology, presented a related poster, and ranked best or tied for best in five tracks. The workshop starts by soliciting translations from various machine translation teams. These translations are released for system combination, providing each system combination team with the same data. Finally, the independent translations and combined translations are evaluated by human judges. This past year, I participated in system combination for translations into English, with plans to participate in other tracks next year. For example, I combined English translations from Czech and submitted the result to the Czech to English track. I did the same for English translations from each of French, German, Spanish, and Hungarian. Five other teams ran their system combination approaches and submitted to most of the same tracks. Human judges found that my submission was the best, or tied with the best, system for translations from each individual language into English.

I also ran system combination as part of the Rosetta team in the DARPA GALE program. Led by IBM, Rosetta consists of several universities and companies, with most contributing a translation system. This year, I worked on the team's Chinese to English system, with plans to work on Arabic to English next year. Each contributing translation system produces an independent translation of each Chinese sentence. System combination produces a single improved translation of each Chinese sentence, which the team submits to DARPA. Since each team has several translation systems but may submit only one combined translation, GALE has spurred system combination development in recent years. In fact, the Rosetta team alone has three approaches to combination. One approach attempts to pick the best candidate translation, preserving an entire sentence at a time. Another, from Johns Hopkins, preserves the word ordering from one of the candidate translations but can substitute words based on a voting scheme. The system I maintain chops candidate translations into pieces, then reassembles some of the pieces into a combined sentence. Each of these approaches defines a different set of possibilities and different criteria for selecting a best combined sentence. I am working on making these criteria compatible across approaches, leading to an improved combination system.

Primarily, I investigate and improve one approach to system combination. Given several independent translations, the first step is to identify which parts mean the same thing. When words match exactly, differ slightly in conjugation, or are synonyms, the process is fairly straightforward. Indeed, many words match in these two machine translations from French: "Allow me to address a few priority elements," and "Allow me to mention a few points." However, the words "elements" and "points" do not appear as synonyms in WordNet, a popular electronic database of English words that we use for matching. While WordNet is precise, matching quality would improve if approximate synonyms were detected. For that reason, I did a class project to automatically find approximate synonyms in large bodies of text, in this case Wikipedia. Close synonyms "mention" and "address" scored highly, while looser synonyms "elements" and "points" had a lower score, still above that of unrelated words.
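The matching step described above can be illustrated with a minimal sketch. It is not the actual system: the synonym table is a tiny hand-written stand-in for WordNet, the suffix stripper is a crude substitute for real handling of conjugation, and all function names are hypothetical.

```python
# Toy stand-in for a synonym database such as WordNet (hypothetical entries).
SYNONYMS = {frozenset(("mention", "address"))}


def crude_stem(word):
    """Strip a few common English suffixes; a real system would use a proper stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_type(a, b):
    """Return how two words match: 'exact', 'stem', 'synonym', or None."""
    a, b = a.lower(), b.lower()
    if a == b:
        return "exact"
    if crude_stem(a) == crude_stem(b):
        return "stem"
    if frozenset((a, b)) in SYNONYMS:
        return "synonym"
    return None


def tokenize(sentence):
    """Split on whitespace and drop surrounding punctuation."""
    return [word.strip('.,"') for word in sentence.split()]


def find_matches(first, second):
    """List every pair of positions whose words match in some way."""
    matches = []
    for i, a in enumerate(tokenize(first)):
        for j, b in enumerate(tokenize(second)):
            kind = match_type(a, b)
            if kind is not None:
                matches.append((i, j, a, b, kind))
    return matches


matches = find_matches(
    "Allow me to address a few priority elements,",
    "Allow me to mention a few points.",
)
for match in matches:
    print(match)
```

With this toy table, "address" and "mention" are linked as synonyms while "elements" and "points" remain unmatched, which is the gap that approximate synonym detection is meant to close.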

Once matching words are identified in the translations, the next step is to assemble a combined sentence. Internally, many candidate combined sentences are considered, with the single best chosen at the end. These candidates are assembled one word at a time, starting with the first word of the candidate. This first word can be any of the first words in the translations being combined. As more words are added, candidates can continue following the same translation or switch to a different one, avoiding bad translations much as a driver weaves between lanes to avoid congestion. Like traffic codes, a number of constraints and incentives are designed to ensure the combined sentence neither omits information nor repeats it. Much of my research effort goes into the design and evaluation of these rules. For example, if two translation systems agree on a particular phrasing, it is more likely to be correct. However, this should also take into account how trustworthy the underlying systems are. I have found that the quality of the combined sentence depends heavily on the amount of trust placed in each contributing system. For this reason, I devised and am implementing a method to automatically learn the appropriate amounts of trust by comparing systems with human translations. It is also important that the combined sentence be fluent, especially when candidates can switch between translations. As is standard in machine translation, we measure fluency by how well a sentence agrees with natural usage. Agreement with natural usage is measured by gathering large amounts of text and looking for similar phrases. A combined sentence using more frequent phrases is more likely to be fluent. By balancing the notions of trust and fluency, the system selects a single best combined sentence from the candidates.
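The final balancing of trust and fluency can be sketched as follows. This is a simplified illustration only, covering just the selection among already-assembled candidates, not the word-by-word assembly. The trust values, the phrase counts, and the scoring formula are all invented for the example and are not taken from the actual system.

```python
import math


def support_score(candidate, translations, trust):
    """Add up, for each candidate word, the trust of every system that also produced it."""
    total = 0.0
    for word in candidate:
        for system, words in translations.items():
            if word in words:
                total += trust[system]
    return total


def fluency_score(candidate, bigram_counts):
    """Reward adjacent word pairs that are frequent in a (hypothetical) text collection."""
    pairs = zip(candidate, candidate[1:])
    return sum(math.log1p(bigram_counts.get(pair, 0)) for pair in pairs)


def best_candidate(candidates, translations, trust, bigram_counts, fluency_weight=1.0):
    """Pick the candidate with the highest combined trust and fluency score."""
    def score(candidate):
        return (support_score(candidate, translations, trust)
                + fluency_weight * fluency_score(candidate, bigram_counts))
    return max(candidates, key=score)


# The two machine translations from the example at the start of this summary.
translations = {
    "A": "the fact is that dividends have been amputated".split(),
    "B": "the fact is that the dividends were cut off".split(),
}
trust = {"A": 0.4, "B": 0.6}  # assumed weights, not learned values
bigram_counts = {  # made-up phrase frequencies
    ("have", "been"): 50,
    ("been", "cut"): 20,
    ("cut", "off"): 30,
    ("were", "cut"): 10,
    ("been", "amputated"): 1,
}
candidates = [
    translations["A"],
    translations["B"],
    "the fact is that the dividends have been cut off".split(),
]
best = best_candidate(candidates, translations, trust, bigram_counts)
print(" ".join(best))
```

Under these invented numbers the mixed candidate wins because it keeps words both systems agree on and uses the more frequent phrases. Note that this simple sum favors longer sentences; a real system needs additional constraints to keep length in check.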

Along with algorithm improvements, I have made code and speed improvements, which enable me to create and run experiments faster. A rewrite of the code, performed during my first semester, made the system easier to understand and 2.27x faster. In my second semester, I added the ability to process sentences in parallel. In practice, this parallelism makes experiments another 3.9x faster. Since experiments run more quickly, I can explore more parameters to both gain intuition and optimize for a particular task. For example, my submission to the Workshop on Statistical Machine Translation consisted of 19 primary and alternate configurations, selected from 405 distinct configurations, each of which was tuned for about 5 iterations on 2 samples of 402 sentences. In total, more than 1,628,100 combined sentences were produced. Speed is crucial given the often short time period allowed by various evaluations.
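The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes exactly 5 tuning iterations per configuration (the text says "about 5") and assumes the two speedups multiply, as "another 3.9x" suggests.

```python
configurations = 405       # distinct configurations explored
iterations = 5             # tuning iterations each; "about 5" in the text
samples = 2                # samples per configuration
sentences_per_sample = 402

# Each configuration produces one combined sentence per input sentence, per iteration, per sample.
total_sentences = configurations * iterations * samples * sentences_per_sample
print(f"{total_sentences:,} combined sentences")

rewrite_speedup = 2.27     # from the first-semester rewrite
parallel_speedup = 3.9     # from processing sentences in parallel
overall_speedup = rewrite_speedup * parallel_speedup
print(f"about {overall_speedup:.1f}x faster overall")
```

The product matches the 1,628,100 figure quoted above, and the two speedups together make experiments roughly nine times faster than before the rewrite.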

As this was my first year in the program, I took four courses covering machine translation, machine learning, natural language processing algorithms, and language modeling. While machine translation is directly relevant, techniques from all of these courses are used in my research. Beyond my class project on synonym detection, machine learning is used to tune translation and combination systems. Standard natural language processing algorithms are used inside translation systems to parse input sentences. Language modeling measures the fluency of candidate combinations to aid selection of the final combined sentence.

In summary, I took ownership of a system combination method, rewrote most of the code, participated in two evaluations, published a paper, and took four classes. Currently, no other grant funds these activities. The Graduate Research Fellowship lets me focus on the quality of my work rather than on meeting the immediate needs of a sponsor. Nonetheless, system combination is increasingly important to team efforts in machine translation, where my work is already producing results.

[1] Kenneth Heafield, Greg Hanneman, and Alon Lavie. 2009. Machine translation system combination with flexible word ordering. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece, March. Association for Computational Linguistics.
This material is based upon work supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.