Kenneth Heafield

alt at kheafield dot com
kheafield.com

University of Edinburgh
10 Crichton Street
Edinburgh EH8 9AB
United Kingdom

Interests
Machine translation, language modeling, distributed systems, theoretical computer science
Current Positions
Research Associate, University of Edinburgh
August–December 2011; August 2012–
PhD Student, Carnegie Mellon
August 2008–August 2013
I am working on my PhD thesis as a staff member at the University of Edinburgh with Philipp Koehn and as a PhD student at Carnegie Mellon, advised by Alon Lavie. My thesis focuses on a new hypergraph search algorithm for efficient syntactic machine translation, building on my efficient language model storage library.
Projects
Hypergraph Search
Syntactic machine translation decoding consists of two steps: parse the input sentence into a hypergraph and search the hypergraph for good translations. Search is commonly done with cube pruning. As part of my thesis, I designed a new search algorithm. My implementation is currently 1.5–3.5 times as fast as cube pruning. It is available standalone or with command-line options in Moses and cdec.
KenLM
An efficient language model library. Compared with the widely used SRILM, the default is 2.4 times as fast while using 57% of the memory. Additional options save more memory. It is used in many machine translation systems (including Moses, cdec, Joshua, Phrasal, and Ncode) and in speech recognition.
System Combination
My system combination software won the Workshop on Machine Translation (WMT) 2011 system combination task in eight of ten language pairs. In WMT 2010, it won six of eight language pairs.

My code is open source (LGPL).

Software Familiarity
Contributed to the Moses, cdec, and Joshua translation systems.
Extensive C++ with Boost, C, Ruby, SQL, Bash, and LaTeX; some Java.
Taught Hadoop; administered Linux and PostgreSQL; used MySQL, Octave, and PBS.
Awards
National Science Foundation Graduate Research Fellowship
2008–11
$121,500 in stipend and tuition over three years
Google Peer Bonus and Site Award
2008
For lecturing at MIT on Hadoop while a Software Engineer at Google
International Collegiate Programming Contest Regional
2006–07
Ranked third of fifty teams, competing as a team of two instead of three
Carnation Scholarship
2005–06
Full Caltech tuition academic merit scholarship, 38 awarded per year
Richard and Dena Krown Summer Undergraduate Research Fellowship
2005
$5,000 for ten weeks of summer research in networking
Summer Undergraduate Research Fellowship
2004
$5,000 for ten weeks of summer research in astronomy
Background
Bachelor of Science, Caltech
September 2003–March 2007
Double major in Mathematics and Computer Science; 3.8/4.0 GPA, with honors. Courses focused on formal language theory, distributed systems, information theory, and combinatorics. I did three internships: two with Caltech research labs and one with Infosys in Bangalore. The IT department hired me as a dormitory technician and security tester. Student government appointed me to the university-wide Computing Advisory Committee. Lastly, I finished a quarter early and went to work for Google.
Google
March 2007–August 2008
As a Software Engineer with Google Book Search, I worked on a team that uses machine learning to compile card catalogs from multiple sources into a single coherent catalog of books. Previously, I created the scoring system behind a search function in Picasa Web Albums. To share Google’s approach to distributed systems, I lectured at MIT on the Hadoop MapReduce framework.
Infosys Technologies
July–September 2006
I traveled to Bangalore, India to intern with the research division of Infosys, India’s second-largest software outsourcing company. We investigated automatic reorganization of legacy source code. Specifically, I applied and customized Latent Dirichlet Allocation to derive topics from names of functions and local variables. For example, it found SSL and logging topics in Apache source code while correctly tagging files belonging to both topics.
Netlab
June 2005–June 2006
As a Richard and Dena Krown Summer Undergraduate Research Fellow, I developed an error model for kernel Principal Component Analysis (kPCA). Professor Low hired me to continue with implementation during the school year. I applied it to identify possible attacks in network traffic, which appear as points with unusually high distance from the manifold learned by kPCA.
Fastsoft
January–April 2006
Netlab spun off a startup and I worked for them as a contractor. Using FAST TCP, the Netlab algorithm responsible for breaking Internet speed records, their Aria product accelerates connections passing through it. This allows senders to use high performance networks more efficiently without custom operating systems. I set up experiments and worked on the performance monitoring and configuration interface.
Galaxy Evolution Explorer
June 2004–March 2007
I started working for the Galaxy Evolution Explorer (GALEX) project as a Summer Undergraduate Research Fellow. My goal was finding variable stars and asteroids in observations made by their satellite. To do so, I created a database of all 193 million source measurements and used it to find and analyze over ninety variable objects. The findings were reported in two posters and one journal article. After the summer, they hired me to continue working on the database and to help scientists find interesting data.
Publications
Teaching
March 2013 Tutorial: Language Modeling with KenLM, Qatar Computing Research Institute
March 2013 Guest Course Lecture: Machine Translation, Carnegie Mellon
October 2012 Guest Course Lecture: Advanced NLP, University of Edinburgh
September 2012 Tutorial: Chart Based Decoding, MT Marathon
Spring 2012 Teaching Assistant: Language and Statistics, Carnegie Mellon
September 2011 Tutorial: Language Modeling, MT Marathon
Fall 2010 Teaching Assistant: Algorithms for NLP, Carnegie Mellon
Program Committees
2013 NAACL
2012 Coling, EMNLP, EACL
2011–12 Workshop on Machine Translation
2011 Transactions on Asian Language Information Processing, MT Journal