Amazon SimpleDB and CouchDB compared

Terminology mapping

  • What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
  • CouchDb documents and SimpleDB items will be referred to in this post as records.
  • The JSON name:value pairs used in CouchDb documents and the attribute-value pairs in SimpleDB items will be called simply attributes.

A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:

{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }

By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.

What SimpleDB and CouchDB have in common

  • Not relational databases
  • Schemaless
  • CouchDB is built with Erlang. SimpleDB may be, as well.
  • support for data replication (this is a very sloppy generalization)
  • accessed via HTTP

How SimpleDB and CouchDB Differ

SimpleDB:
  1. provides SOAP and (what passes at Amazon for) REST interfaces to the API
  2. REST requests all use HTTP GET, specifying the API method with a query param
  3. requests specify the database, record, attributes, and modifiers with query params
  4. record creation, updating, and deletion is tomic, at the level of individual attributes
  5. all data is considered to be UTF-8 strings
  6. automatically indexes data, details unknown
  7. queries
    • limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
    • defined with HTTP query parameters
    • composed of Boolean and set operations with some obvious comparison operators(=, !=, >, >=, etc.)
  8. as all values are UTF-8 strings, there are no sorting options.
  9. responses are XML
CouchDB:
  1. all REST, all the time
  2. requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
  3. requests specify the database and record in the URL, with query params used for modifiers
  4. record creation, updating, and deletion is atomic
  5. supports all JSON data types (string, number, object, array, true, false, null)
  6. indexing is under user control, by means of “views”
    • defined with arbitrary Javascript functions
    • can be stored as documents
    • can be run ad hoc, as “temporary views”
  7. queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters
  8. sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
  9. responses are JSON
Advertisements