Me plotting to throw a snowball
I am a new PhD student in the Language Technologies Institute at Carnegie Mellon. My broad interests are natural language processing, machine learning, theoretical computer science, and distributed systems; my narrower interests lie where these intersect.
Kenneth Heafield
<papers at kheafield.com>
Newell Simon Hall A502

Language Technologies Institute
Carnegie Mellon University
5000 Forbes Ave NSH 4502
Pittsburgh, PA 15213
vita pdf ps dvi html
cmu
aug 2008-
New PhD student, Language Technologies Institute, School of Computer Science, Carnegie Mellon
nsf
aug 2008-
Winning application essays for the National Science Foundation Graduate Research Fellowship Program. Though the application was about Princeton, I am using the grant at Carnegie Mellon.
desire pdf ps dvi Why I want to be a graduate student
past pdf ps dvi Past research experience
plan pdf ps dvi A viable research plan in natural language processing
google
mar 2007-
aug 2008
I worked as a Software Engineer on Google Book Search and Picasa Web Albums. To share Google's approach to distributed systems, I lectured on the Hadoop MapReduce framework as part of a three-day class at MIT. I delivered the introduction, basic join, and entropy lectures, all written from scratch.1
intro pdf Introduces basic Hadoop programming; intended to follow a lecture on MapReduce theory.
basic join pdf A few slides explaining how a reducer can join data from separate sources.
k-means pdf A run-through of the Hadoop API, followed by k-means clustering.
entropy pdf Introduces an entropy-based word weighting scheme and uses it to motivate performance strategies.
caltech
sep 2003-
mar 2007
BS, double major in math and computer science, 3.8/4.0 GPA, with honors. I lived in Dabney House.
grades pdf ps dvi Unofficial transcript without course descriptions
explained pdf ps dvi Unofficial transcript with course descriptions
infosys
july 2006-
sep 2006
SETLabs, the research arm of Infosys, develops software engineering technology. I used Latent Dirichlet Allocation to automatically organize source code files into a meaningful directory structure.
paper pdf G. Rama, S. Sarkar, K. Heafield. Mining Business Topics in Source Code using Latent Dirichlet Allocation. Proceedings of the 1st India Software Engineering Conference, Hyderabad, India (Feb 19-22 2008), 113-120.2
netlab
june 2005-
june 2006
Netlab's FAST protocol broke many Internet speed records. I developed an error model for kernel Principal Component Analysis (kPCA) and used it to find anomalous packets.
paper pdf ps dvi K. Heafield. Detecting Network Anomalies With Kernel Principal Component Analysis. Based on work done for Summer Undergraduate Research Fellowship 2005 and two quarters of CS 90: Undergraduate Research.
talk pdf swf sxi K. Heafield, S. Low. Manifold Learning to Detect Changes in Networks. For Summer Undergraduate Research Fellowship 2005.
proposal pdf ps dvi K. Heafield, S. Low. Locality Preservation in Manifolds to Reduce Dimensionality. Accepted for Summer Undergraduate Research Fellowship 2005.
galex
june 2004-
mar 2007
Galaxy Evolution Explorer (GALEX) is a NASA satellite observatory. My goal was to find variable stars and asteroids among the 193 million measurements made by the satellite. I found about 90.
paper pdf B. Welsh, J. Wheatley, K. Heafield, M. Seibert, et al. The GALEX Ultraviolet Variability Catalog. The Astronomical Journal 130 (2005), 825-831.
poster pdf S. Browne, J. Wheatley, B. Welsh, M. Seibert, K. Heafield, R. Rich, and the GALEX Science Team. RR Lyrae Stars in the Far Ultraviolet: GALEX Observations Compared with Theoretical Predictions. Bulletin of the American Astronomical Society, Poster Sessions 37 (2006).
poster pdf B. Welsh, J. Wheatley, K. Heafield, M. Seibert, S. Browne, and the GALEX Science Team. The Flaring UV Sky. Bulletin of the American Astronomical Society, Poster Sessions 36 (2005), meeting 205, January 2005.
talk pdf swf sxi K. Heafield, M. Seibert. Transiting and Variable Objects: A Search Through Galaxy Evolution Explorer Observations. For Summer Undergraduate Research Fellowship 2004.
Note: the presentation uses animations, and only the OpenOffice (sxi) format includes them.
imss
feb 2005-
mar 2007
Information Management Systems and Services (IMSS) is Caltech's IT department. During a class project to build a new course registration system, I found a security hole in the existing one.
video mpeg I found a URL in Caltech's course registration system that returned the transcript for any student ID number supplied to it. This video was part of the report sent to the Director of Information Security, who subsequently hired me as a security tester. The hole has since been patched.