Solr sends JSON as text/plain

Yet another reason not to use Solr. The discussion in this Jira issue is interesting.

The reason for this, as I understand it, is to let you view the JSON response as text in the browser.

Is there perhaps a more general feature we could turn this into? An expert level ability or parameter to set a custom content-type?

The problem right now is in the current class hierarchy of the response writers.

The NamedList is a weird data structure for those who are not so used to Solr. You don’t know what is included in it unless you do an instanceof. Most users are happy just to write out the documents.

To handle this problem I would use ‘wt=json&wt.mime-type=application/json’.

Phrase analysis and expansion with Ruby

The idea is to take a phrase and analyze it for use in Information Retrieval. We need to tokenize it into words, possibly transmute some of the tokens, possibly expand some tokens into subphrases. This class lets you register lambdas to perform transformations, substitutions, and expansions. Expansions can take a numerical value representing the cost of the operation; this is intended for raising or lowering the scores of matches in the theoretical IR application.

Given the phrase “joe’s sushi & bait-shop shack”, assume I want to tokenize on whitespace, replace the ampersand with the word “and”, and create word variants for the hyphenized and apostrophized words. See the last spec for an example of the Ruby data structure this class generates.

class Analyzer
  def initialize
    @expansions = []
    @transformations = []
    @substitutions = {}
    @tokenizer = lambda { |string| string.split }
  end

  def tokenizer(&proc)
    @tokenizer = proc
  end

  def expansion(cost=0.0, &proc)
    @expansions << [cost, proc]
  end

  def substitution(input, output)
    @substitutions[input] = output
  end
  alias_method :sub, :substitution

  def transformation(&proc)
    @transformations << proc
  end

  def tokenize(string)
    @tokenizer.call(string)
  end

  def process_token(token)
    @transformations.each do |proc|
      token = proc.call(token)
    end
    if out = @substitutions[token]
      token = out
    end
    variants = {}
    @expansions.each do |cost, proc|
      if variant = proc.call(token)
        variants[variant] = cost
      end
    end
    variants.size > 0 ? [token, variants] : token
  end

  def analyze(string)
    tokenize(string).map { |token| process_token(token) }
  end
end

describe "An Analyzer" do
  before do
    @analyzer = Analyzer.new
  end

  it "can take a custom tokenizer" do
    @analyzer.tokenizer { |string| string.split(/\s+/) }
    @analyzer.tokenize("three blind mice").should == %w{three blind mice}
    @analyzer.tokenizer { |string| string.scan(/[\w']+/) }
    @analyzer.tokenize("joe's bait-shop").should == %w{joe's bait shop}
  end

  it "can perform weighted term expansions" do
    @analyzer.expansion(0.5) { |word| word.gsub("'", "") if word =~ /'/ }
    @analyzer.expansion(0.5) { |word| word.chomp("'s") if word =~ /'s$/ }
    @analyzer.process_token("joe's").should == ["joe's", {"joe" => 0.5, "joes" => 0.5}]
    @analyzer.process_token("boring").should == "boring"
  end

  it "can transform terms" do
    @analyzer.transformation { |word| word.reverse }
    @analyzer.process_token("123").should == "321"
  end

  it "can substitute terms" do
    @analyzer.substitution("&", "and")
    @analyzer.process_token("&").should == "and"
  end

  it "expands terms after substitutions" do
    @analyzer.expansion { |word| "ampersand" if word == "and" }
    @analyzer.substitution("&", "and")
    @analyzer.process_token("&").should == ["and", {"ampersand" => 0.0}]
  end

  it "substitutes after transformations" do
    @analyzer.substitution("joe", "joseph")
    @analyzer.transformation { |word| word.gsub('m', 'j') }
    @analyzer.process_token("moe").should == "joseph"
  end

  it "does phrases, if you know how to Enumerable#map" do
    @analyzer.sub("&", "and")
    @analyzer.expansion(0.5) { |word| word.gsub("'", "") if word =~ /'/ }
    @analyzer.expansion(0.5) { |word| word.chomp("'s") if word =~ /'s$/ }
    @analyzer.expansion(3.0) { |word| word.split('-') if word =~ /-/ }
    @analyzer.expansion(0.1) { |word| word.gsub('-', '') if word =~ /-/ }
    orig = "joe's sushi & bait-shop shack"
    analyzed = [
      ["joe's", {"joe" => 0.5, "joes" => 0.5}],
      "sushi",
      "and",
      ["bait-shop", {"baitshop" => 0.1, ["bait", "shop"] => 3.0}],
      "shack"
    ]
    @analyzer.analyze(orig).should == analyzed
  end
end

Deletes, Transposes, Replaces, Inserts

Very simplistic rudiments of a spell checker in Ruby. Based on Norvig’s article.
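
As a rough sketch of what the title refers to, and not the original code from this post, here is one way the four edit operations could be generated for a single word. The method name edits1 follows Norvig's Python version; ALPHABET is a label chosen here, and the sketch assumes lowercase ASCII words.

```ruby
# Hypothetical sketch: every string one edit away from `word`.
ALPHABET = ('a'..'z').to_a.freeze

def edits1(word)
  # Every way to cut the word into a left and a right part.
  splits = (0..word.length).map { |i| [word[0...i], word[i..-1]] }

  # Drop the first character of the right part.
  deletes = splits.reject { |_, b| b.empty? }.map { |a, b| a + b[1..-1] }

  # Swap the first two characters of the right part.
  transposes = splits.select { |_, b| b.length > 1 }
                     .map { |a, b| a + b[1] + b[0] + b[2..-1] }

  # Replace the first character of the right part with each letter.
  replaces = splits.reject { |_, b| b.empty? }
                   .flat_map { |a, b| ALPHABET.map { |c| a + c + b[1..-1] } }

  # Insert each letter between the left and right parts.
  inserts = splits.flat_map { |a, b| ALPHABET.map { |c| a + c + b } }

  (deletes + transposes + replaces + inserts).uniq
end

puts edits1("cat").include?("act") # → true (a transpose)
```

As in Norvig's version, the replaces step also yields the word itself, since each letter is "replaced" by itself once. A full corrector would intersect these candidates with a dictionary and rank them by word frequency.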

Instant.rake: Compile and run individual Java classes using Rake

Sometimes, when forced to work with Java, you just want to copy and paste some code and fiddle with it. A real project build system is overkill. Try Instant.rake:

Improved object wrapper for JRuby Embed

New in JRuby 1.4 is JRuby Embed, which lets you eval Ruby from Java classes. It works, appears to be well-written, and needs some sugar. Here’s a class that limits your options in a helpful way.

King’s Third Rule of Software Development

Any software project not written in Java will clearly state on its homepage the implementation language.

The Need of a Study of Anatomy (also Swans)

“In our initial sketches for compositions, when memory has to take the place of the living model, we rely to a great extent on our anatomical knowledge for the suggestion of action and form generally. And again it adds materially to our faculties for self-criticism, which, like a sense of humour, is often, nearly always, our salvation.”
Solomon J. Solomon, The practice of oil painting and of drawing as associated with it

Knowledge of your tools is necessary, but not sufficient. The choices you make when planning the structure of software depend on your knowledge of the problem domain. A project is limited (sometimes crippled) by your comprehension of the form and motion and constraints of the body before you.

“It looks like it was made with, you know… longing. Made by a person really longed to see a swan”
Kaylee, Firefly

Self-criticism and a sense of humor are ineluctably linked, I find. Those who have not the capacity to criticize their own efforts lack most of the capacity to laugh at their own failings. If you don’t think it’s funny when you spend two hours failing to find a mindless bug in a simple depth-first traversal function, then you’re not me.

The Beginning of the End for Rubyforge

Jamis Buck is abandoning development of SQLite/Ruby, SQLite3/Ruby, Net::SSH and Capistrano. I do not say this derogatorily; Jamis owes us Capistrano like George R. R. Martin owes us A Dance with Dragons.

In the comments to that post, Dr Nic asked,

… were there ever “core contributors” who could be all added to the rubyforge project’s admin so they can start releasing new versions? Or did you ask all of them and no one said they’d take over the project?

Jamis replied:

“[T]here are no other core contributors. I tried once to create something like that, but no one else seemed to have the “passion” or “vision”. Lots of people submitting patches (many of them quite good!), but no one demonstrating a real, general desire to dig into the internals. That’s kind of why I left it like I did—there really wasn’t any heir-apparent that the keys could be left to.

“That said, if someone steps forward and seems to be getting community support (for any of the projects) behind them, I’ll be happy to give them admin access to the appropriate rubyforge pages.”

Rubyforge served a purpose for several years, and served it well. But Rubyforge is a bottleneck in the distribution of code, and this is exacerbated by the Ruby community’s reliance not only on RubyGems, but on the idea of the canonical, official version of a project. The increased popularity of distributed version control releases some of the pressure. GitHub has substantially reduced the friction involved in collaboration. Even so, the idea still holds that once a line of work is ready, you release it on Rubyforge, so that it’s official.

Good coders, even those not afflicted with a love of novelty, will eventually grow bored with their projects. The distribution model represented by Rubyforge cannot, or at least should not, long survive this human tendency.

Ruby FFI example using #ffi_lib

Ruby FFI is a cross-VM library for calling foreign functions (i.e. C or C++).  It isn’t obvious from the introductory blog posts how you specify which library to use, but the answer isn’t hard to find in the source.
Examples speak louder than words:
require 'rubygems'
require 'ffi'

class MDB
  extend FFI::Library
  # The lib name gets spackled with platform-specific 
  # prefix and suffix. On Mac OS X, e.g., the ffi_lib
  # name turns into 'libmdb.dylib'
  ffi_lib 'mdb'
  # Who needs enum, anyway?
  # Constant values are assumed from mdbtools.h; check them against your headers.
  NOFLAGS = 0
  MDB_TABLE = 1
  attach_function :mdb_init, [], :void
  attach_function :mdb_exit, [], :void
  # In the libmdb headers, you'll find that this function
  # actually returns a pointer to an MDBHandle struct.
  # FFI::Struct would likely help out here, but just
  # calling the return result a :pointer works for now.
  attach_function :mdb_open, [:string, :int], :pointer
  attach_function :mdb_close, [:pointer], :void
  # Block-style helper; the name `open` is an assumption.
  def self.open(path)
    db = MDB.mdb_open(path, MDB::NOFLAGS)
    yield db
  end
  attach_function :mdb_dump_catalog, [:pointer, :int], :pointer
end

MDB.open('mdb_files/sample.mdb') do |db|
  MDB.mdb_dump_catalog(db, MDB::MDB_TABLE)
end

Set a size limit on a Ruby/LDAP query

The RDocs available online for Ruby/LDAP are not much help here.  If you download the source, though, you’ll find an FAQ file with the goods.

    require 'ldap'

    conn = LDAP::Conn.new( 'localhost', 389 )
    # Limit the results set to a maximum of 10.
    conn.set_option( LDAP::LDAP_OPT_SIZELIMIT, 10 )

Use Array#pack and String#unpack instead of Base64

Array#pack("m*") is your friend. So is String#unpack("m*"). You can use them instead of the Base64 methods encode64 and decode64.
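
A minimal round trip, as a sketch: the variable names are only for illustration. Note that pack is called on an Array and returns a String, while unpack is called on a String and returns an Array.

```ruby
raw = "hello"

# Array#pack with the "m" directive Base64-encodes the string.
# Like encode64, it appends a trailing newline.
encoded = [raw].pack("m*")

# String#unpack with "m" decodes; the result is a one-element Array.
decoded = encoded.unpack("m*").first

puts encoded.inspect # → "aGVsbG8=\n"
puts decoded == raw  # → true
```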

Hat tip to Rack’s authentication example.

ActiveMDB is on GitHub

ActiveMDB development, such as it is, will now take place on GitHub.