email website at this domain name
Language Technologies Institute
5000 Forbes Ave GHC 5407
Pittsburgh, PA 15213

Open Source Code

Released code, mostly in chronological order. The language model filter is a self-contained code sample.

Machine Translation

All of these are documented in my MT Marathon paper. Each tarball contains a README with compilation instructions. Code I wrote is under the LGPL and all dependencies are open source.
system combination.tar.gz
My research: the multi-engine machine translation system. See the README.
language model filter.tar.gz
Fast filtering of language models to multiple vocabularies. Yields a 92% reduction in model size for system combination and 36% for translation systems. Updated June 9, 2010.
scoring.tar.gz
Script that makes it easy to score machine translation output using NIST's implementations of BLEU and the NIST metric, along with TER and METEOR. Takes plaintext with one segment per line instead of three different formats. Puts all the scores on a single line, ready for inclusion in a table. Updated June 10, 2010.
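
The idea behind the language model filter listed above can be illustrated with a minimal sketch. This is not the released code: it assumes the model is just a list of space-separated n-gram strings and that an n-gram is worth keeping when every one of its words appears together in at least one of the target vocabularies (for example, one vocabulary per sentence to be translated). The function name and data layout are hypothetical.

```python
def filter_ngrams(ngrams, vocabularies):
    """Keep each n-gram that is fully covered by at least one vocabulary.

    ngrams: iterable of space-separated n-gram strings (assumed layout).
    vocabularies: iterable of sets of words (assumed: one set per target).
    """
    kept = []
    for ngram in ngrams:
        words = ngram.split()
        # An n-gram survives only if a single vocabulary contains all its words.
        if any(all(word in vocab for word in words) for vocab in vocabularies):
            kept.append(ngram)
    return kept
```

With the vocabularies `{"the", "cat"}` and `{"the", "dog"}`, the n-grams "the cat" and "the dog" are kept, while "cat dog" is dropped because no single vocabulary contains both words. Requiring coverage by one vocabulary rather than by their union is one plausible reason filtering to multiple vocabularies removes more of the model.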

Fun with C++

In addition to this code, I have a quiz on C++ corner cases.
producer consumer.h
Exception-safe producer-consumer class supporting multiple readers and writers. Uses Boost.
underhanded.c
For the 2008 underhanded C competition. The goal is to appear to properly redact a PPM file while leaking part of it.
prime time.c
Looking for a prime time? Call 5373737.
text twist cheat.tar.gz
Lists words for use in text twist.
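
For readers unfamiliar with the producer consumer pattern mentioned above, here is a minimal sketch of a bounded queue shared by several producers and consumers. The released header is C++ with Boost; this sketch instead uses Python's standard `threading` module, and the class name, method names, and `capacity` parameter are assumptions made for illustration. The `with` blocks release the lock even if an exception is raised, which is the Python analogue of the exception safety a C++ version would get from scoped locks.

```python
import threading
from collections import deque


class ProducerConsumer:
    """Bounded queue usable from any number of producer and consumer threads."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        # Both conditions share one lock, so they guard the same queue state.
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def produce(self, item):
        with self._not_full:
            # Loop, not "if": another producer may have filled the slot first.
            while len(self._items) >= self._capacity:
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def consume(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item
```

Each producer thread calls `produce` and each consumer thread calls `consume`; a full queue blocks producers and an empty queue blocks consumers until the other side makes progress.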

UNIX Tools

website generate.rb
Ruby program that generates the titles, menus, and XHTML incantations for this website.
subject untag.procmailrc
Procmail recipe that removes a mailing list's subject tag. Useful if the list is already filtered to a folder, making the tag redundant. Without this, ACL conference announcements had only the country visible in my mail client.
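
The effect of removing a subject tag can be sketched in a few lines. This is not the procmail recipe itself: it is a Python illustration that assumes the tag appears in square brackets somewhere in the subject, and the tag name `example-list` used below is hypothetical.

```python
import re


def untag_subject(subject, tag):
    """Remove the first "[tag]" (plus trailing whitespace) from a subject line."""
    # re.escape keeps characters in the tag from being read as regex syntax.
    pattern = re.compile(r"\[" + re.escape(tag) + r"\]\s*")
    return pattern.sub("", subject, count=1)
```

Given the hypothetical tag `example-list`, the subject `[example-list] Call for papers` becomes `Call for papers`, and a reply such as `Re: [example-list] Call for papers` becomes `Re: Call for papers`. Subjects without the tag pass through unchanged.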