Kenneth Heafield

<r at kheafield.com>
http://kheafield.com
Language Technologies Institute
Carnegie Mellon University
5000 Forbes Ave GHC 5407
Pittsburgh, PA 15213

Interests
Machine translation, machine learning, distributed systems, theoretical computer science
Education
PhD program, Carnegie Mellon
August 2008–
Language Technologies Institute in the School of Computer Science; 3.95/4.0 GPA.
With my adviser Alon Lavie, I work on machine translation system combination. I have performed system combination in the NIST Open MT, DARPA GALE, and Workshop on Machine Translation evaluations. Many sites contribute translations to these evaluations; the translations are of competitive quality but nonetheless differ significantly. My research focuses on combining them to produce an improved translation using techniques adapted from statistical machine translation. Evaluations have recently added tracks specifically for system combination, in which I have shown improvements of 1 to 5 BLEU points over the best individual system. Human evaluation from the Workshop on Machine Translation found my output to be the best, within the margin of error, for translations into English from each of the five other languages.
Bachelor of Science, Caltech
September 2003–March 2007
Double major in Mathematics and Computer Science; 3.8/4.0 GPA, with honors.
Beyond the required courses, I focused on formal language theory, distributed systems, information theory, and combinatorics. As a student, I:
  • Worked on two research projects, Netlab and Galaxy Evolution Explorer, yielding two conference posters and a journal article.
  • Did a summer internship in Bangalore at Infosys, yielding a conference paper and patent application.
  • Worked for the IT department as a dormitory tech and security tester.
  • Represented undergraduates on the Computing Advisory Committee.
  • Finished a quarter early.

Skills
Languages
 Extensive C++, C, Ruby, SQL, Bash, LaTeX, and HTML; Some Java and CSS
Software
 Taught Hadoop; Administered Linux, PostgreSQL, and Apache; Used MySQL, Octave, Gnuplot, and GTK
Awards
National Science Foundation Graduate Research Fellowship
2008–
$121,500 in stipend and tuition over three years
Google Peer Bonus and Site Award
2008
For lecturing at MIT on Hadoop while a Software Engineer at Google.
International Collegiate Programming Contest Regional
2006–07
Ranked third of fifty teams, competing as a team of two instead of the usual three
Carnation Scholarship
2005–06
Full Caltech tuition academic merit scholarship, 38 awarded per year
Richard and Dena Krown Summer Undergraduate Research Fellowship
2005
$5,000 for ten weeks of summer research
Summer Undergraduate Research Fellowship
2004
$5,000 for ten weeks of summer research
Employment Experience
Google
March 2007–August 2008
As a Software Engineer with Google Book Search, I worked on a team that uses machine learning to compile card catalogs from multiple sources into a single coherent catalog of books. Previously, I created the scoring system behind a search function in Picasa Web Albums. To share Google’s approach to distributed systems, I lectured at MIT on the Hadoop MapReduce framework.
Infosys Technologies
July–September 2006
I traveled to Bangalore, India to intern with the research division of Infosys, India’s second-largest software outsourcing company. We investigated automatic reorganization of legacy source code. Specifically, I applied and customized Latent Dirichlet Allocation to derive topics from the names of functions and local variables. For example, it found SSL and logging topics in Apache source code while correctly tagging files belonging to both topics.
Netlab
June 2005–June 2006
As a Richard and Dena Krown Summer Undergraduate Research Fellow, I developed an error model for kernel Principal Component Analysis (kPCA). Professor Low hired me to continue with implementation during the school year. I applied it to identify possible attacks in network traffic, which appear as points with unusually high distance from the manifold learned by kPCA.
Fastsoft
January–April 2006
Netlab spun off a startup, and I worked for them as a contractor. Using FAST TCP, the Netlab algorithm responsible for breaking Internet speed records, their Aria product accelerates connections passing through it. This allows senders to use high-performance networks more efficiently without custom operating systems. I set up experiments and worked on the performance monitoring and configuration interface.
Galaxy Evolution Explorer
June 2004–March 2007
I started working for the Galaxy Evolution Explorer (GALEX) project as a Summer Undergraduate Research Fellow. My goal was finding variable stars and asteroids in observations made by their satellite. To do so, I created a database of all 193 million source measurements and used it to find and analyze over ninety variable objects. The findings were reported in two posters and one journal article. After the summer, they hired me to continue working on the database and to help scientists find interesting data.
Conferences
Presentation
Heafield. CMU-StatXfer Group System Combination. Proc. NIST Open MT Workshop 2009, Ottawa, Canada (August 31–September 1, 2009).
Paper and Poster
Heafield, Hanneman, Lavie. Machine Translation System Combination with Flexible Word Ordering. Proc. EACL 2009 Fourth Workshop on Statistical Machine Translation, Athens, Greece (March 30–31, 2009), 56–60.
Paper
Rama, Sarkar, Heafield. Mining Business Topics in Source Code using Latent Dirichlet Allocation. Proc. 1st India Software Engineering Conference, Hyderabad, India (Feb 19–22, 2008), 113–120.
Poster
Browne, Wheatley, Welsh, Seibert, Heafield, Rich, and the GALEX Science Team. RR Lyrae Stars in the Far Ultraviolet: GALEX Observations Compared with Theoretical Predictions. Bulletin American Astronomical Society Poster Sessions 37 (2006).
Poster
Welsh, Wheatley, Heafield, Seibert, Browne, and the GALEX Science Team. The Flaring UV Sky. Bulletin American Astronomical Society Poster Sessions 36 (2005).
Journals
Article
Welsh, Wheatley, Heafield, Seibert, et al. The GALEX Ultraviolet Variability Catalog. The Astronomical Journal 130 (2005), 825–831.
Patents
Application
Rama, Heafield, and Sarkar. A Method For Extracting Business Topic From A Source Code. US and Indian applications filed (2008).

Publications and unofficial transcript are available at http://kheafield.com/professional/.