email website at this domain name
Language Technologies Institute
5000 Forbes Ave GHC 5407
Pittsburgh, PA 15213
At the edge of the water on an island near Santorini

I am a second-year PhD student in the Language Technologies Institute at Carnegie Mellon. Broad interests are machine translation, machine learning, distributed systems, and theoretical computer science. Narrow interests lie in the intersections of the broad ones.

With my adviser Alon Lavie, I work on machine translation system combination. This means I build on top of other translation systems (e.g. Babelfish and Google Translate) by using several at once and combining their output into an improved sentence. For more on my recent work, see a technical paper or non-technical activities report.

Before Carnegie Mellon, I worked at Google as a Software Engineer, at Caltech in two research labs and the IT department while earning a bachelor's degree in Mathematics and Computer Science a quarter early, and at Infosys in Bangalore as a research intern. My Curriculum Vitæ is available in html, pdf, ps, and dvi.

Publications

Paper, Presentation, and Code
Heafield and Lavie. Combining Machine Translation Output with Open Source: The Carnegie Mellon Multi-Engine Machine Translation Scheme. The Prague Bulletin of Mathematical Linguistics No. 93, 2010, pp. 27–36. ISBN 978-80-904175-4-0. doi: 10.2478/v10108-010-0008-4.
Description, Presentation, and Evaluation
Heafield, 2009. CMU-StatXfer Group System Combination. Proc. NIST Open MT Workshop 2009, Ottawa, Canada, August 31–September 1. I also did Arabic and formal system combination; the system descriptions for these are similar.¹
Paper, Poster, and Evaluation
Heafield, Hanneman, and Lavie, 2009. Machine Translation System Combination with Flexible Word Ordering. Proc. EACL 2009 Fourth Workshop on Statistical Machine Translation, Athens, Greece, March 30–31.
Paper and Patent Application
Rama, Sarkar, and Heafield, 2008. Mining Business Topics in Source Code using Latent Dirichlet Allocation. Proc. 1st India Software Engineering Conference, pages 113–120, Hyderabad, India, Feb 19–22.²
Poster
Browne, Wheatley, Welsh, Seibert, Heafield, Rich, and the GALEX Science Team, 2006. RR Lyrae Stars in the Far Ultraviolet: GALEX Observations Compared with Theoretical Predictions. Bulletin of the American Astronomical Society, Poster Sessions 37, January.
Journal Paper
Welsh, Wheatley, Heafield, Seibert, et al., 2005. The GALEX Ultraviolet Variability Catalog. The Astronomical Journal 130, pages 825–831.
Poster
Welsh, Wheatley, Heafield, Seibert, Browne, and the GALEX Science Team, 2005. The Flaring UV Sky. Bulletin of the American Astronomical Society, Poster Sessions 36, January.

Reports

National Science Foundation Graduate Research Fellowship

I have been a National Science Foundation Graduate Research Fellow since August 2008.³
2009 summary
May 2009 summary of activities since starting August 2008
Past Research
Application essay about my past research
Desire
Application essay about wanting to be a graduate student
Plan
A viable research plan in natural language processing

Google

From March 2007 to August 2008, I worked at Google as a Software Engineer on Picasa Web Albums and Google Book Search. To share Google's approach to distributed systems, I lectured on the Hadoop MapReduce framework as part of a 3-day class at MIT. I wrote and delivered the introduction, basic join, and entropy lectures.⁴ For running this class, our peers at Google gave us each a Peer Bonus while management gave us a Site Award.
Intro
Intended to follow a lecture on MapReduce theory, this introduces basic Hadoop programming
Diff
A few slides to explain reducers as joining data from separate sources
k-Means
Run-through of the Hadoop API, followed by k-means clustering
Entropy
Introduces an entropy-based word weighting scheme and uses it to motivate performance strategies

Netlab

In 2005, I worked for Netlab at Caltech as a Richard and Dena Krown Summer Undergraduate Research Fellow. Professor Low hired me after the summer and I continued until my Infosys internship in June 2006. These reports were prepared for the fellowship.
Paper and Presentation
Heafield, 2005. Detecting Network Anomalies With Kernel Principal Component Analysis.
Proposal
Heafield and Low, 2005. Locality Preservation in Manifolds to Reduce Dimensionality. Accepted for Summer Undergraduate Research Fellowship 2005.

Galaxy Evolution Explorer

Galaxy Evolution Explorer (GALEX) is a NASA satellite observatory. Starting in 2004 as a Summer Undergraduate Research Fellow, I found about 90 variable stars and asteroids in its 193 million measurements. The GALEX team hired me to continue working with the data until I graduated in March 2007. Results are published and therefore listed under Publications, above.
Presentation
Heafield and Seibert, 2004. Transiting and Variable Objects: A Search Through Galaxy Evolution Explorer Observations.

Information Management Systems and Services

I worked for Caltech's IT department as a student representative and later as a security tester. They hired me as a security tester after I sent them this video:
Exploit
As part of a class project to make a course registration system, I found a simple hole in Caltech's production system. The video shows how to use my roommate's login to read my grades. The hole has since been patched.