Phrase analysis and expansion with Ruby

The idea is to take a phrase and analyze it for use in information retrieval (IR). We need to tokenize it into words, possibly transform some of the tokens, and possibly expand some tokens into subphrases. This class lets you register lambdas to perform transformations, substitutions, and expansions. Expansions can take a numerical value representing the cost of the operation; this is intended for raising or lowering the scores of matches in a hypothetical IR application.

Given the phrase “joe’s sushi & bait-shop shack”, assume I want to tokenize on whitespace, replace the ampersand with the word “and”, and create word variants for the hyphenated words and the words containing apostrophes. See the last spec for an example of the Ruby data structure this class generates.
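To make the idea concrete, here is a minimal sketch of what registering lambdas for substitution and expansion could look like. The class name `PhraseAnalyzer`, its method names, the shape of the result, and the cost values are all assumptions made for illustration; they are not the original class's interface. The example also uses a straight apostrophe in place of the curly one.

```ruby
# Hypothetical sketch: names and result shape are illustrative assumptions,
# not the API of the class described in the post.
class PhraseAnalyzer
  # A variant subphrase (list of tokens) plus the cost of producing it.
  Expansion = Struct.new(:tokens, :cost)

  def initialize(tokenizer: ->(phrase) { phrase.split(/\s+/) })
    @tokenizer = tokenizer
    @substitutions = []
    @expanders = []
  end

  # Register a block that returns a replacement token, or nil to leave it alone.
  def substitute(&block)
    @substitutions << block
    self
  end

  # Register a block that returns a list of variant subphrases (or nil),
  # all tagged with the given cost.
  def expand(cost: 1.0, &block)
    @expanders << [cost, block]
    self
  end

  def analyze(phrase)
    @tokenizer.call(phrase).map do |token|
      token = @substitutions.reduce(token) { |t, sub| sub.call(t) || t }
      expansions = @expanders.flat_map do |cost, expander|
        Array(expander.call(token)).map { |tokens| Expansion.new(tokens, cost) }
      end
      { token: token, expansions: expansions }
    end
  end
end

analyzer = PhraseAnalyzer.new
analyzer.substitute { |t| "and" if t == "&" }
# Hyphenated word: offer the split form and the joined form.
analyzer.expand(cost: 0.5) { |t| [t.split("-"), [t.delete("-")]] if t.include?("-") }
# Word with an apostrophe: offer it without the apostrophe and without the suffix.
analyzer.expand(cost: 0.8) { |t| [[t.delete("'")], [t.split("'").first]] if t.include?("'") }

result = analyzer.analyze("joe's sushi & bait-shop shack")
result.each { |entry| p entry }
```

Keeping the cost on each expansion, rather than on the token, would let a scoring layer penalize a match on a variant without touching matches on the original token.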

Deletes, Transposes, Replaces, Inserts

The rudiments of a very simple spell checker in Ruby, based on Norvig’s article.
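The four operations in the title can be sketched as follows. This is a minimal illustration that assumes a lowercase a-z alphabet and borrows the name `edits1` from Norvig's article; it is not the code from the post itself.

```ruby
# Assumption: candidate letters are limited to lowercase ASCII a-z.
ALPHABET = ("a".."z").to_a.freeze

# Returns every string one edit away from +word+: a single delete,
# transpose, replace, or insert.
def edits1(word)
  # Every way to cut the word into a left and a right part.
  splits = (0..word.length).map { |i| [word[0...i], word[i..-1]] }
  non_empty = splits.reject { |_, right| right.empty? }

  deletes    = non_empty.map { |left, right| left + right[1..-1] }
  transposes = splits.select { |_, right| right.length > 1 }
                     .map { |left, right| left + right[1] + right[0] + right[2..-1] }
  replaces   = non_empty.flat_map { |left, right| ALPHABET.map { |c| left + c + right[1..-1] } }
  inserts    = splits.flat_map { |left, right| ALPHABET.map { |c| left + c + right } }

  (deletes + transposes + replaces + inserts).uniq
end

candidates = edits1("cat")
puts candidates.include?("act")  # a transpose of "cat"
```

A corrector would then keep only the candidates found in a dictionary of known words and pick the most frequent one.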

Instant.rake: Compile and run individual Java classes using Rake

Sometimes, when forced to work with Java, you just want to copy and paste some code and fiddle with it. A real project build system is overkill. Try Instant.rake:
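What follows is not the original Instant.rake but a minimal sketch of the same idea, to show how little Rake code it takes. The `run` task name, the `SRC` environment variable, the `build` output directory, and the helper method names are all assumptions, and the sketch only handles classes in the default package.

```ruby
# Rakefile sketch (hypothetical; not the original Instant.rake).
require "rake"

BUILD_DIR = "build" # assumed output directory for compiled .class files

# Derive the class name from a source path, e.g. "src/Hello.java" -> "Hello".
# Assumes the class lives in the default package.
def class_name(source)
  File.basename(source, ".java")
end

# Command that compiles one source file into BUILD_DIR.
def javac_command(source)
  ["javac", "-d", BUILD_DIR, source]
end

# Command that runs the compiled class from BUILD_DIR.
def java_command(source)
  ["java", "-cp", BUILD_DIR, class_name(source)]
end

# Creates BUILD_DIR on demand.
directory BUILD_DIR

desc "Compile and run one Java file: rake run SRC=Hello.java"
task :run => BUILD_DIR do
  source = ENV.fetch("SRC") { abort "usage: rake run SRC=Hello.java" }
  sh(*javac_command(source))
  sh(*java_command(source))
end
```

With a file named `Hello.java` in the current directory, `rake run SRC=Hello.java` would compile it into `build/` and then run the `Hello` class.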

Improved object wrapper for JRuby Embed

New in JRuby 1.4 is JRuby Embed, which lets you eval Ruby from Java classes. It works and appears to be well written, but it needs some sugar. Here’s a class that limits your options in a helpful way.

King’s Third Rule of Software Development

Any software project not written in Java will clearly state on its homepage the implementation language.