Amazon SimpleDB and CouchDB compared

Terminology mapping

  • What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
  • CouchDB documents and SimpleDB items will be referred to in this post as records.
  • The JSON name:value pairs used in CouchDB documents and the attribute-value pairs in SimpleDB items will be called simply attributes.

A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:

{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }

By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.

What SimpleDB and CouchDB have in common

  • Not relational databases
  • Schemaless
  • CouchDB is built with Erlang. SimpleDB may be, as well.
  • Support for data replication (this is a very sloppy generalization)
  • Accessed via HTTP

How SimpleDB and CouchDB Differ

SimpleDB:
  1. provides SOAP and (what passes at Amazon for) REST interfaces to the API
  2. REST requests all use HTTP GET, specifying the API method with a query param
  3. requests specify the database, record, attributes, and modifiers with query params
  4. record creation, updating, and deletion are atomic at the level of individual attributes
  5. all data is considered to be UTF-8 strings
  6. automatically indexes data, details unknown
  7. queries
    • limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
    • defined with HTTP query parameters
    • composed of Boolean and set operations with some obvious comparison operators (=, !=, >, >=, etc.)
  8. as all values are UTF-8 strings, there are no sorting options.
  9. responses are XML
CouchDB:
  1. all REST, all the time
  2. requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
  3. requests specify the database and record in the URL, with query params used for modifiers
  4. record creation, updating, and deletion are atomic
  5. supports all JSON data types (string, number, object, array, true, false, null)
  6. indexing is under user control, by means of “views”
    • defined with arbitrary Javascript functions
    • can be stored as documents
    • can be run ad hoc, as “temporary views”
  7. queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters (see the request sketch below)
  8. sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
  9. responses are JSON
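
To make the difference in request style concrete, here is a rough Ruby sketch. The domain, database, and view names are invented, the SimpleDB URL leaves out the required authentication parameters, and the CouchDB URL shape and parameter names are to the best of my current knowledge.

  require 'net/http'
  require 'cgi'

  # SimpleDB: the operation itself rides in a query param (Action=Query),
  # along with everything else. Required auth params (AWSAccessKeyId,
  # Signature, Timestamp, ...) are omitted here for brevity.
  simpledb_url = "https://sdb.amazonaws.com/?Action=Query" +
                 "&DomainName=books" +
                 "&QueryExpression=" + CGI.escape("['author' = 'Borges']")

  # CouchDB: the database and view live in the URL path; query params
  # are only modifiers.
  couchdb_url = "http://localhost:5984/books/_view/authors/by_name?" +
                "key=" + CGI.escape('"Borges"') + "&count=10"

  puts simpledb_url
  puts Net::HTTP.get(URI.parse(couchdb_url))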

IRB: What was that method that greps again?

Giles Bowkett continually improves his .irbrc file, and I’ve borrowed a few of those tricks. His latest is grep_methods, a helper to search for the methods available on an object. This is a very useful construct, but, as so often happens, there’s a better way baked in.


"my_arbitrary_string".methods.grep /ch/
=> ["each_byte", "match", "chomp!", "chop", "each_with_index", "chomp", "each_line", "each", "chop!"]
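
And it works on any object, not just strings (your method order may differ):

[1, 2, 3].methods.grep(/sort/)
=> ["sort", "sort!", "sort_by"]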

Default fonts too small in gitk on OS X when installed with MacPorts

This is an easy one, addressed obliquely by the blog post that is the number one result in Google. The blogger also sets his font to Arial, which no one should emulate.

Gitk works pretty much straight out of MacPorts. Bravo, except for the 9-point fonts. Really.

Launch X11 and cd to your git repository in xterm. Run gitk and marvel at the unreadable text.

Edit the top three lines of ~/.gitk to change the fonts and/or sizes. Mine looks like this:


set mainfont {Helvetica 12}
set textfont {Courier 12}
set uifont {Helvetica 12 bold}
set tabstop 8
...

All this assumes that your X11 profile has a usable $PATH.

Paging in OpenLDAP, or "What, no LIMIT or OFFSET?"

Disclaimer: I’m not an LDAP expert, but I’ve done a whole mess of reading about OpenLDAP lately. Let the knowledgeable correct me where I err.

Paging in LDAP is somewhat of a pain, and by “somewhat” I mean “asymptotically approaching totally”. In the ldapsearch tool, for example, you have to use a “search extension” argument, as paging is not part of the search filter syntax. Contrast this with SQL, where you can simply tack a LIMIT and OFFSET onto the query. LDAP clients must instead implement the pagedResults search control (and the LDAP directory server must support it).

It gets worse. Check out the way the paging is implemented when following RFC 2696 (http://www.faqs.org/rfcs/rfc2696.html). You can only specify the size of the result set, not the offset or a page number. The LDAP server returns a cookie with the search results. The client uses the cookie in the next pagedResults query, and the server uses the cookie to figure out where to start the next set of results. LDAP clients must treat the cookie as opaque, i.e. they shouldn’t know how to do anything other than send the cookie back to the server.
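
To illustrate the loop that RFC 2696 forces on clients, here is a minimal Ruby sketch. The paged_search method is hypothetical, standing in for whatever your LDAP library exposes; the point is the shape of the cookie round trip.

  # Hypothetical: paged_search(filter, :size => n, :cookie => c) asks the
  # server for one page and returns [entries, next_cookie]. An empty cookie
  # means the server has nothing more to give.
  def each_page(connection, filter, page_size)
    cookie = ""
    loop do
      entries, cookie = connection.paged_search(filter,
                                                :size   => page_size,
                                                :cookie => cookie)
      yield entries
      break if cookie.nil? || cookie.empty?
    end
  end

  # Note what's missing: there is no way to ask for page 3 directly. To get
  # there, you must walk pages 1 and 2 first, carrying the opaque cookie.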

Thus the only way to paginate results on the server side appears to be by looping through all results. The client must retain a cookie from each query for use in the next. Hrmmm. Can you guess who wrote RFC 2696?

At the time of this writing, there are two Ruby libraries for LDAP access, and ActiveLdap can use either as its adapter. To the extent that Net::LDAP supports the pagedResults control, it is only to prevent Active Directory from choking when a query returns more than 1000 results. See ./lib/net/ldap.rb:1158 for the code that handles the pagedResults control.

Ruby/LDAP does support pagedResults, which I should have figured out from the line in the TODO file that started the discussion on the mailing list that started my research: “Add result pagination via LDAP::Controls”. So I think adding support for the control to the Ruby/LDAP adapter for ActiveLdap should be practical.

It might be possible to roll your own pagination, in a very ugly way, by calling the ActiveLdap::Base#search method with a block that throws away results before and after the desired page set. Net::LDAP yields each entry *after* adding it to the result_set array, so you would need to set the entry to nil and compact the result.

Alternatively, perhaps you could override the Net::LDAP search method to yield the entry to the block first, then add it to the result_set only if not nil.
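
Either way, the roll-your-own approach boils down to slicing a full result set on the client. A minimal sketch (the ActiveLdap call in the comment is only a guess at the shape of such a query):

  # Client-side pagination of last resort: fetch everything, keep one page.
  # All the work still happens on the server and over the wire.
  def paginate(entries, page, per_page)
    offset = (page - 1) * per_page
    entries[offset, per_page] || []
  end

  # e.g. (hypothetical model and filter):
  #   people = Person.find(:all, :filter => "(objectClass=person)")
  #   page_3 = paginate(people, 3, 25)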

It’s ugly every way you look.

Here’s the link that started my digging:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/195249

Lisp tutorials in Practical Common Lisp

They’re excellent. Peter Seibel’s book is available free online, as well as in print. I read enough of the free stuff to realize that I needed to stop and buy the book when I’m ready to do some projects in CL.

You can read it free here:
Common Lisp tutorial

launchd plist to run a reverse ssh tunnel

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version='1.0'>
<dict>
<key>Label</key><string>com.automatthew.ssh_tunnel</string>
<key>UserName</key><string>matthew</string>
<key>ProgramArguments</key>
<array>
        <string>/usr/bin/ssh</string>
        <!-- -n: stdin from /dev/null; -N: no remote command; -T: no pty -->
        <string>-nNT</string>
        <!-- forward remote port 1389 on the server back to local port 389 (LDAP) -->
        <string>-R 1389:127.0.0.1:389</string>
        <string>matthew@slice1.automatthew.com</string>
</array>
<key>Debug</key><false/>
<key>Disabled</key><false/>
<key>OnDemand</key><false/>
<key>RunAtLoad</key><false/>
</dict>
</plist>

nginx 0.6.7 purports to fix the install problems

Igor announced 0.6.7 on the mailing list.

Changes with nginx 0.6.7
*) Change: now the paths specified in the "include",
"auth_basic_user_file", "perl_modules", "ssl_certificate",
"ssl_certificate_key", and "ssl_client_certificate" directives are
relative to directory of nginx configuration file nginx.conf, but not
to nginx prefix directory.

nginx 0.6.6 – make install fails

I’m trying to deploy a Rails app to a new server with Deprec and Capistrano Server Extensions (capserverext). The Capistrano task fails when compiling nginx, during the `make install` bit. The make errors look something like this:

cp: cannot create regular file `/usr/local/nginx/conf/mime.types.default': No such file or directory

I tried compiling nginx directly, to eliminate deprec and capserverext, and the problem persisted.

After much head beating, and with disbelief, I concluded that the problem was a bug in nginx. Hubristic, I know. But searching the nginx mailing list immediately turned up a message with a patch from the developer. The ‘@@’s in this patch are munged on the web, so I pastied it for your consumption.

The problem stems from the addition of a new configure option, --sysconfdir. This new option means that capserverext is going to need a change to the compile_nginx task.

Once patched, you can run configure with --sysconfdir=/usr/local/nginx/conf to meet capserverext’s assumptions. But having to patch the source breaks the whole install_nginx task anyway.
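
Something like the following is what I imagine capserverext’s compile_nginx task will need to become. The task name, paths, and structure here are guesses, not the actual recipe:

  # Capistrano task sketch (assumed names and paths, not capserverext's own)
  task :compile_nginx, :roles => :web do
    src = "/usr/local/src/nginx-0.6.6"
    run "cd #{src} && sudo ./configure" +
        " --sbin-path=/usr/local/sbin" +
        " --pid-path=/var/run/nginx.pid" +
        " --error-log-path=/var/log/nginx/error.log" +
        " --http-log-path=/var/log/nginx/access.log" +
        " --with-http_ssl_module" +
        " --sysconfdir=/usr/local/nginx/conf"
    run "cd #{src} && sudo make && sudo make install"
  end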

What you do, though, is bravely pretend that prepare_host is going to work. When it fails:

  1. ssh into the server and cd to /usr/local/src/nginx-0.6.6/
  2. wget http://pastie.caboo.se/84215.txt
  3. patch -p0 < 84215.txt
  4. run the configure script with the arguments from capserverext’s nginx recipe plus --sysconfdir=/usr/local/nginx/conf
    sudo ./configure --sbin-path=/usr/local/sbin \
      --pid-path=/var/run/nginx.pid \
      --error-log-path=/var/log/nginx/error.log \
      --http-log-path=/var/log/nginx/access.log \
      --with-http_ssl_module \
      --sysconfdir=/usr/local/nginx/conf
  5. sudo make
  6. sudo make install
  7. rm /usr/local/nginx/conf/nginx.conf

Now nginx 0.6.6 should be installed on your server. Back on your dev machine run the following tasks to get back on track:

  • cap install_nginx_start_script
  • cap nginx_postgres_rails_setup (or cap nginx_mysql_rails_setup, if you’re using MySQL)

This gets you past the prepare_host task.

Here’s hoping this post becomes obsolete very soon.

FinderColor: A Ruby interface to Finder labels in Mac OS X

I just posted to Rubyforge the first public version of FinderColor, a very small interface to the Finder label colors in Mac OS X. FinderColor sends Apple Events using rb-appscript, bypassing AppleScript entirely. This counts as a good thing.

Install: sudo gem install findercolor

There are only five methods in FinderColor:

 
  FinderColor.get_index(full_path_to_file)
  FinderColor.set_index(full_path_to_file, index)
  FinderColor.get_color(full_path_to_file)
  FinderColor.set_color(full_path_to_file, symbol)
  FinderColor.batch_set(hash)
 

The index argument must be between 0 and 7. The hash argument to batch_set expects the keys to be integers or symbols for color names. FinderColor::Labels gives you an array of the color symbols in their index order:

 
  FinderColor::Labels #=> [:none, :orange, :red, :yellow, :blue, :purple, :green, :gray ]
 

Rdocs here.
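
A quick usage sketch (the paths are made up, and the shape of the batch_set hash is my guess from the description above):

  require 'rubygems'
  require 'findercolor'

  path = "/Users/matthew/Desktop/report.pdf"    # hypothetical file

  FinderColor.set_color(path, :red)
  FinderColor.get_index(path)     #=> 2, per the Labels ordering above
  FinderColor.set_index(path, 0)  # back to no label

  # batch_set: keys are color symbols or indices; values assumed to be paths
  FinderColor.batch_set(:blue => "/Users/matthew/Desktop/notes.txt")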

ActiveLdap wants to find your subschemaSubentry

Today was the second time I had to research the same problem with OpenLDAP and ActiveLdap. I have no idea what happened to the solution that I found and employed, but it’s gone. Can’t find it. No love from grep.

The problem is this error in ActiveLdap:

 
 undefined method `[]' for nil:NilClass - (NoMethodError)
 ../active_ldap/adapter/base.rb:99:in `schema'
 

The solution is to add two ACL lines to my slapd.conf or one of its includes:

 
 access to dn.base="" by * read
 access to dn.base="cn=Subschema" by * read
 

The reason is that ActiveLdap apparently queries anonymously for the schema, and my ACLs are too mean and stingy. If you start your development with strict ACLs, you hit the problem early. If you wait until near deployment time to tighten up security, you will be surprised when stuff just stops working.
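
For context, here is a minimal sketch of the sort of setup that trips over it. The host, base, and model are invented, and the option names are my best recollection of the ActiveLdap API of this vintage:

  require 'active_ldap'

  # ActiveLdap fetches the schema as soon as it needs to map object classes,
  # so even this much can raise the NoMethodError above if the ACLs forbid
  # reading cn=Subschema.
  ActiveLdap::Base.establish_connection(
    :host     => 'ldap.example.com',
    :base     => 'dc=example,dc=com',
    :bind_dn  => 'cn=admin,dc=example,dc=com',
    :password => 'secret'
  )

  class Person < ActiveLdap::Base
    ldap_mapping :dn_attribute => 'uid', :prefix => 'ou=People'
  end

  Person.find(:first)   # the schema lookup happens around here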

You can see whether your ACLs are preventing access to the schema by running the following ldapsearch command:

ldapsearch -x -h www.example.com -b '' -s base subschemaSubentry

If the result doesn’t look something like the example below, then you can try adding the two ACL lines above. The important section is the second, where you see that the value of subschemaSubentry is ‘cn=Subschema’.

 
 # extended LDIF
 #
 # LDAPv3
 # base <> with scope base
 # filter: (objectclass=*)
 # requesting: subschemaSubentry 
 #

 #
 dn:
 subschemaSubentry: cn=Subschema

 # search result
 search: 2
 result: 0 Success

 # numResponses: 2
 # numEntries: 1