whym.org
About
This website hosts a number of software packages for text processing, machine learning, typography, etc., written partially or entirely by me (whym).
Projects
- Stream-based InputFormat for processing TB-scale XML dumps of Wikipedia with Hadoop.
- N-gram-based indexing and retrieval of Wikipedia's 390 million revisions.
- A wanna-be machine learning library powered by SWIG.
- A web service that converts text by composing and decomposing ligatures.
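What "composing and decomposing ligatures" means can be illustrated with a small sketch. It is not the service's actual implementation. It assumes that decomposition uses Unicode NFKC compatibility normalization, applied only to characters whose Unicode name contains "LIGATURE". The composition table `LIGATURES` and the functions `compose` and `decompose` are hypothetical names, and the table covers only the f-ligatures U+FB00 to U+FB04.

```python
import unicodedata

# Hypothetical composition table (assumption): letter sequences mapped to
# Latin ligatures from the Alphabetic Presentation Forms block.
LIGATURES = {
    "ffi": "\ufb03",  # LATIN SMALL LIGATURE FFI
    "ffl": "\ufb04",  # LATIN SMALL LIGATURE FFL
    "ff": "\ufb00",   # LATIN SMALL LIGATURE FF
    "fi": "\ufb01",   # LATIN SMALL LIGATURE FI
    "fl": "\ufb02",   # LATIN SMALL LIGATURE FL
}

# Try longer sequences first so "ffi" wins over "ff".
_KEYS = sorted(LIGATURES, key=len, reverse=True)


def compose(text: str) -> str:
    """Replace letter sequences with ligatures, greedily from left to right."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(LIGATURES[key])
                i += len(key)
                break
        else:
            # No ligature starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def decompose(text: str) -> str:
    """Split ligatures into component letters via NFKC normalization.

    Only characters whose Unicode name contains "LIGATURE" are normalized,
    so other compatibility characters (e.g. full-width forms) are kept.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if "LIGATURE" in unicodedata.name(ch, "") else ch
        for ch in text
    )
```

For example, `compose("office")` yields `"o\ufb03ce"`, and `decompose` turns that back into `"office"`. Restricting normalization to ligature characters avoids the side effects of running NFKC over the whole text.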