Kenneth Heafield

alt at kheafield dot com
kheafield.com
353 Serra Mall
Stanford, CA 94305

Statistical machine translation, algorithms for big data, language modeling, and natural language processing
PhD, Carnegie Mellon
August 2008–September 2013
Awarded by the Language Technologies Institute in the School of Computer Science.
Advisor: Professor Alon Lavie
Dissertation: “Efficient Language Modeling Algorithms with Applications to Statistical Machine Translation”
Bachelor of Science, Caltech
September 2003–March 2007
Double major in Mathematics and Computer Science, with honors.
Postdoctoral Scholar, Stanford
October 2013–
Supervisor: Professor Christopher Manning
Responsible for machine translation efforts at Stanford, including supervising two PhD students and a Master’s student. Current research includes web-scale text processing, algorithms for machine translation, and applications of neural networks.
Research Associate, University of Edinburgh
Aug.–Dec. 2011; Aug. 2012–Sept. 2013
Supervisor: Professor Philipp Koehn
Developed an efficient search algorithm for syntactic machine translation and an efficient method for language model estimation, contributed to the Moses machine translation system, and informally advised PhD students.
Software Engineer, Google
March 2007–August 2008
Applied machine learning to library records as part of the Google Books team and lectured at MIT about Hadoop.
Intern, Infosys Technologies
July–September 2006
Traveled to Bangalore, India for an internship with the Software Engineering Technology Lab. Applied latent Dirichlet allocation to automatically organize source code.
Undergraduate Researcher, Netlab at Caltech
June 2005–June 2006
Developed an error model for kernel principal component analysis and applied it to automatically analyze computer network traffic and flag possible attacks.
Undergraduate Researcher, Galaxy Evolution Explorer
June 2004–March 2007
Created a database with 193 million rows and mined it for variable stars.
Open-Source Software
KenLM
An efficient library for estimating and querying language models. Compared with SRILM, querying is 2.4 times as fast and uses 57% of the memory. It has been adopted by all major open-source machine translation systems.
Hypergraph Search
Implements my new search algorithm for syntactic machine translation, which makes translation 1.6–6.0 times as fast as cube pruning.
System Combination (MEMT)
Combines the outputs of multiple machine translation systems into a single sentence with better quality.
Community Evaluation Results

First Place in Three Language Pairs, Workshop on Machine Translation
Using a language model estimated on 126 billion tokens, my three submissions ranked first in their respective language pairs, each of which had 11–13 participants.
Best System Combinations, Workshop on Machine Translation
Submitted to all ten system combination tracks, each of which had 2–8 participants. Eight of my submissions ranked first in their respective tracks.
National Science Foundation Graduate Research Fellowship
$121,500 in stipend and tuition over three years
Google Peer Bonus and Site Award
For lecturing at MIT on Hadoop while a Software Engineer at Google
International Collegiate Programming Contest Regional
Ranked third of fifty teams while competing as a team of two instead of the usual three
Carnation Scholarship
Year of full Caltech tuition based on academic merit; 38 awarded per year
Richard and Dena Krown Summer Undergraduate Research Fellowship
$5,000 for ten weeks of summer research in networking
Summer Undergraduate Research Fellowship
$5,000 for ten weeks of summer research in astronomy data mining
Invited Talks

Faster and Better Machine Translation
Language Model Algorithms
Language Model Algorithms
Numen Digital
Faster Decoding for Machine Translation and Lattices
Xerox Research Centre Europe
Faster Decoding for Machine Translation and Lattices
Qatar Computing Research Institute and Carnegie Mellon-Qatar
Faster Search for Machine Translation
Hong Kong University of Science and Technology
Language Model Rest Costs and Space-Efficient Storage
September 2013 Tutorial: Language Model Implementation, MT Marathon
March 2013 Tutorial: Language Modeling with KenLM, QCRI
March 2013 Guest Course Lecture: Machine Translation, Carnegie Mellon
October 2012 Guest Course Lecture: Advanced NLP, University of Edinburgh
September 2012 Tutorial: Chart Based Decoding, MT Marathon
Spring 2012 Teaching Assistant: Language and Statistics, Carnegie Mellon
September 2011 Tutorial: Language Modeling, MT Marathon
Fall 2010 Teaching Assistant: Algorithms for NLP, Carnegie Mellon
Program Committees
2012–2013 Empirical Methods in Natural Language Processing (EMNLP)
2013 North American Association for Computational Linguistics (NAACL)
2011–2013 Workshop on Statistical Machine Translation (WMT)
2012 International Conference on Computational Linguistics (COLING)
2012 European Association for Computational Linguistics (EACL)
2011 Transactions on Asian Language Information Processing (TALIP)
2011 MT Journal