In Iceland after a geothermal swim

I am a research associate with Philipp Koehn at the University of Edinburgh. Currently, I am working on hypergraph search as part of my Carnegie Mellon PhD thesis advised by Alon Lavie. My interests are machine translation, language models, machine learning, distributed systems, and theoretical computer science.

Before Carnegie Mellon, I worked at Google on Book Search and Picasa, at Caltech in Netlab and GALEX while earning a BSc in Mathematics and Computer Science, and in Bangalore at Infosys as a research intern. My Curriculum Vitæ is available in HTML and PDF.

Recent Projects

Each project has accompanying open source (LGPL) code in C++.
Fast and accurate hypergraph search in the presence of language models. I have focused on applying it to syntactic machine translation, while others have found it useful for phrase-based translation, dependency-to-string translation, and spell checking.
Language model estimation and querying (KenLM) that is simultaneously faster, smaller, and at least as accurate as other packages in common cases.
System combination (MEMT) builds on top of other machine translation systems to produce one improved translation. Several research groups submitted system combinations to the 2011 Workshop on Machine Translation; my submission ranked best in 8 of 10 scenarios.

Publications

All papers in BibTeX format

Thesis Proposal

Estimating Language Models

Decoding with Language Models

Querying Language Models

System Combination

Topic Modeling for Source Code

Image Recommendation

Variable Stars

Reports

National Science Foundation Graduate Research Fellowship

In 2008, I was awarded a National Science Foundation Graduate Research Fellowship. The application required three essays: a summary of past work, motivation, and a potential research plan.

Google

From March 2007 to August 2008, I worked at Google as a Software Engineer on Picasa Web Albums and Google Book Search. To share Google's approach to distributed systems, I lectured on the Hadoop MapReduce framework as part of a 3-day class at MIT. I wrote and delivered the introduction, basic join, and entropy lectures. Involved employees received a Site Award and a Peer Bonus.
Intro
Intended to follow a lecture on MapReduce theory, this introduces basic Hadoop programming
Diff
A few slides to explain reducers as joining data from separate sources
k-Means
Run-through of the Hadoop API followed by k-means clustering
Entropy
Introduces an entropy-based word weighting scheme and uses it to motivate performance strategies

Netlab

In 2005, I worked for Netlab at Caltech as a Richard and Dena Krown Summer Undergraduate Research Fellow. Professor Low hired me after the summer and I continued until my Infosys internship in June 2006. These reports were prepared for the fellowship.

Galaxy Evolution Explorer

Galaxy Evolution Explorer (GALEX) is a NASA satellite observatory with science operations at Caltech. Starting in 2004 as a Summer Undergraduate Research Fellow, I found about 90 variable stars and asteroids in their 193 million measurements. They hired me to continue working with their data until I graduated in March 2007. Results are published and therefore listed under Publications, above.

Information Management Systems and Services

I worked for Caltech's IT department as a student representative and later as a security tester. They hired me as a security tester after I sent them this video of an exploit in their production course registration system. The video shows how to use my roommate's login to read my grades. The vulnerability has since been patched.