
I am a postdoctoral scholar at Stanford with Christopher Manning. I recently completed my PhD at Carnegie Mellon, advised by Alon Lavie. I also worked for Philipp Koehn as a staff research associate at the University of Edinburgh. Further information can be found in my Curriculum Vitæ.

My interests are machine translation, language models, efficient algorithms, and distributed systems. My thesis focuses on efficient language modeling algorithms with applications to machine translation, leading to first-place performance in the Workshop on Statistical Machine Translation and a net 3.2 to 10.0x speedup for syntactic machine translation.

Open-Source Projects

Hypergraph search
Finds high-scoring hypotheses in hypergraphs or lattices. It is 1.15 to 6.60 times as fast as cube pruning.
KenLM
An efficient language modeling toolkit based on streaming algorithms and custom data structures. Applying it to estimate a large language model led to first-place performance in the 2013 Workshop on Statistical Machine Translation.
MEMT (System combination)
Combines the output of several machine translation systems into a single improved output. In the 2011 Workshop on Statistical Machine Translation, it won 8 of 10 system combination tracks.

Papers

2014

2013

2012

2011

2010

2009

2008

2006

2005

Reports

National Science Foundation Graduate Research Fellowship

In 2008, I was awarded a National Science Foundation Graduate Research Fellowship. The application required three essays: a summary of past work, motivation, and a potential research plan.

Google

From March 2007 to August 2008, I worked at Google as a Software Engineer on Picasa Web Albums and Google Book Search. To share Google's approach to distributed systems, I lectured on the Hadoop MapReduce framework as part of a 3-day class at MIT. I wrote and delivered the introduction, basic join, and entropy lectures. Involved employees received a Site Award and a Peer Bonus.
Intro
Intended to follow a lecture on MapReduce theory, this introduces basic Hadoop programming
Diff
A few slides to explain reducers as joining data from separate sources
k-Means
Run-through of the Hadoop API followed by k-means clustering
Entropy
Introduces an entropy-based word weighting scheme and uses it to motivate performance strategies

Netlab
Fastsoft

In 2005, I worked for Netlab at Caltech as a Richard and Dena Krown Summer Undergraduate Research Fellow. Professor Low hired me after the summer and I continued until my Infosys internship in June 2006. These reports were prepared for the fellowship.

Galaxy Evolution Explorer

Galaxy Evolution Explorer (GALEX) is a NASA satellite observatory with science operations at Caltech. Starting in 2004 as a Summer Undergraduate Research Fellow, I found about 90 variable stars and asteroids in its 193 million measurements. The team then hired me to continue working with the data until I graduated in March 2007. The results are published and therefore listed under Papers, above.

Information Management Systems and Services

I worked for Caltech's IT department as a student representative and later as a security tester. They hired me as a security tester after I sent them this video of an exploit in their production course registration system. The video shows how to use my roommate's login to read my grades. The vulnerability has since been patched.