I am a new PhD student in the Language Technologies Institute at Carnegie Mellon. My broad interests are natural language processing, machine learning, theoretical computer science, and distributed systems. My narrower interests lie at the intersections of these areas. | Kenneth Heafield, <papers at kheafield.com>, Newell Simon Hall A502, Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Ave NSH 4502, Pittsburgh, PA 15213 |
| vita | ps | dvi | html | |||
|---|---|---|---|---|---|---|
| cmu aug 2008- | New PhD student, Language Technologies Institute, School of Computer Science, Carnegie Mellon | |||||
| nsf aug 2008- | Winning application essays for the National Science Foundation Graduate Research Fellowship Program. Although the application proposed graduate study at Princeton, I am using the fellowship at Carnegie Mellon. | |||||
| desire | ps | dvi | Why I want to be a graduate student | |||
| past | ps | dvi | Past research experience | |||
| plan | ps | dvi | A viable research plan in natural language processing | |||
| google mar 2007- aug 2008 | I worked as a Software Engineer on Google Book Search and Picasa Web Albums. To share Google's approach to distributed systems, I lectured on the Hadoop MapReduce framework as part of a three-day class at MIT. I delivered the introduction, basic join, and entropy lectures, all of which I wrote from scratch.[1] | |||||
| intro | Intended to follow a lecture on MapReduce theory, this introduces basic Hadoop programming. | |||||
| basic join | A few slides explaining how reducers join data from separate sources | |||||
| k-means | Run-through of the Hadoop API, followed by k-means clustering | |||||
| entropy | Introduces an entropy-based word weighting scheme and uses it to motivate performance strategies | |||||
| caltech sep 2003- mar 2007 | BS with honors, double major in math and computer science, 3.8/4.0 GPA. I lived in Dabney House. | |||||
| grades | ps | dvi | Unofficial transcript without course descriptions | |||
| explained | ps | dvi | Unofficial transcript with course descriptions | |||
| infosys july 2006- sep 2006 | SETLabs, the research arm of Infosys, develops software engineering technology. I used Latent Dirichlet Allocation to automatically organize source code files into a meaningful directory structure. | |||||
| paper | G. Rama, S. Sarkar, K. Heafield. Mining Business Topics in Source Code using Latent Dirichlet Allocation. Proceedings of the 1st India Software Engineering Conference, Hyderabad, India (Feb 19-22, 2008), 113-120.[2] | |||||
| netlab june 2005- june 2006 | Netlab's FAST protocol broke many Internet speed records. I developed an error model for kernel Principal Component Analysis (kPCA) and used it to find anomalous packets. | |||||
| paper | ps | dvi | K. Heafield. Detecting Network Anomalies With Kernel Principal Component Analysis. Based on work done for Summer Undergraduate Research Fellowship 2005 and two quarters of CS 90: Undergraduate Research. | |||
| talk | swf | sxi | K. Heafield, S. Low. Manifold Learning to Detect Changes in Networks. For Summer Undergraduate Research Fellowship 2005. | |||
| proposal | ps | dvi | K. Heafield, S. Low. Locality Preservation in Manifolds to Reduce Dimensionality. Accepted for Summer Undergraduate Research Fellowship 2005. | |||
| galex june 2004- mar 2007 | The Galaxy Evolution Explorer (GALEX) is a NASA satellite observatory. My goal was to find variable stars and asteroids among the 193 million measurements made by the satellite; I found about 90. | |||||
| paper | B. Welsh, J. Wheatley, K. Heafield, M. Seibert, et al. The GALEX Ultraviolet Variability Catalog. The Astronomical Journal 130 (2005), 825-831. | |||||
| poster | S. Browne, J. Wheatley, B. Welsh, M. Seibert, K. Heafield, R. Rich, and the GALEX Science Team. RR Lyrae Stars in the Far Ultraviolet: GALEX Observations Compared with Theoretical Predictions. Bulletin of the American Astronomical Society, Poster Sessions 37 (2006). | |||||
| poster | B. Welsh, J. Wheatley, K. Heafield, M. Seibert, S. Browne, and the GALEX Science Team. The Flaring UV Sky. Bulletin of the American Astronomical Society, Poster Sessions 36 (2005), 205, January 2005. | |||||
| talk | swf | sxi | K. Heafield, M. Seibert. Transiting and Variable Objects: A Search Through Galaxy Evolution Explorer Observations. For Summer Undergraduate Research Fellowship 2004. Note that the presentation uses animations, and only the OpenOffice (sxi) version includes them. | |||
| imss feb 2005- mar 2007 | Information Management Systems and Services (IMSS) is Caltech's IT department. As part of a class project to build a new course registration system, I found a security hole in the existing one. | |||||
| video | mpeg | I found a URL in Caltech's course registration system that returned the transcript for any student ID number. This video was part of the report sent to the Director of Information Security, who subsequently hired me as a security tester. The hole has since been patched. | ||||
