Amazon SimpleDB and CouchDB compared

Terminology mapping

  • What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
  • CouchDB documents and SimpleDB items will be referred to in this post as records.
  • The JSON name:value pairs used in CouchDB documents and the attribute-value pairs in SimpleDB items will be called simply attributes.

A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:

{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }

By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.
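
To make the two readings concrete, here is a minimal Python sketch of the example item. The variable names are purely illustrative and nothing here is part of the SimpleDB API; it only shows that the same three pairs can be viewed either as three attributes or as two names, one of which has two values.

```python
from collections import defaultdict

# The example item from the SimpleDB docs, written as (name, value) pairs.
pairs = [("a", "1"), ("b", "2"), ("b", "3")]

# Reading used in this post: an item is a set of attributes, each one
# identified by its name/value combination.
as_attributes = set(pairs)

# Amazon's reading: each attribute name maps to a set of values.
as_multivalued = defaultdict(set)
for name, value in pairs:
    as_multivalued[name].add(value)

# Adding an identical pair leaves the set unchanged, because the
# name/value combination is the identity of the attribute.
as_attributes.add(("b", "3"))

print(len(as_attributes))                              # three attributes
print({k: sorted(v) for k, v in as_multivalued.items()})  # two names
```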

What SimpleDB and CouchDB have in common

  • Not relational databases
  • Schemaless
  • CouchDB is built with Erlang. SimpleDB may be, as well.
  • support for data replication (this is a very sloppy generalization)
  • accessed via HTTP

How SimpleDB and CouchDB Differ

SimpleDB:
  1. provides SOAP and (what passes at Amazon for) REST interfaces to the API
  2. REST requests all use HTTP GET, specifying the API method with a query param
  3. requests specify the database, record, attributes, and modifiers with query params
  4. record creation, updating, and deletion are atomic, at the level of individual attributes
  5. all data is considered to be UTF-8 strings
  6. automatically indexes data, details unknown
  7. queries
    • limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
    • defined with HTTP query parameters
    • composed of Boolean and set operations with some obvious comparison operators (=, !=, >, >=, etc.)
  8. as all values are UTF-8 strings, there are no sorting options.
  9. responses are XML
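
One consequence of points 5 and 7 is worth a quick sketch: if every value is a string, then operators such as > and >= compare text rather than numbers. The Python below assumes the comparison is plain character-by-character ordering, which is how Python compares strings; the `pad` helper and its width of 5 are hypothetical choices for illustration, not anything SimpleDB provides.

```python
# Numeric data stored as strings, as it would have to be in SimpleDB.
prices = ["9", "10", "100", "25"]

# Character-by-character ordering puts "10" and "100" before "9".
lexicographic = sorted(prices)


def pad(value: str, width: int = 5) -> str:
    """Zero-pad a non-negative integer string to a fixed width (hypothetical helper)."""
    return value.zfill(width)


# With fixed-width padding, string order and numeric order agree.
padded = sorted(pad(p) for p in prices)

print(lexicographic)
print(padded)
```

Zero-padding only covers non-negative integers of a known maximum width; negative numbers, decimals, and dates each need their own encoding decided up front.
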
CouchDB:
  1. all REST, all the time
  2. requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
  3. requests specify the database and record in the URL, with query params used for modifiers
  4. record creation, updating, and deletion are atomic
  5. supports all JSON data types (string, number, object, array, true, false, null)
  6. indexing is under user control, by means of “views”
    • defined with arbitrary Javascript functions
    • can be stored as documents
    • can be run ad hoc, as “temporary views”
  7. queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters
  8. sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
  9. responses are JSON
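
The difference in request style (points 2 and 3 in each list) can be sketched by building the requests without sending them. Everything below is an illustration under assumptions: the host names are placeholders, the SimpleDB parameter names are modeled on the API and should be checked against the current docs, and the authentication and signature parameters that a real SimpleDB call requires are left out entirely.

```python
from urllib.parse import quote, urlencode

# Placeholder endpoints, not real service addresses.
SIMPLEDB_ENDPOINT = "https://simpledb.example.com/"
COUCHDB_BASE = "http://couchdb.example.com:5984"


def simpledb_request(action, domain, item, attributes):
    """Everything goes into the query string of a GET, including the API method."""
    params = [("Action", action), ("DomainName", domain), ("ItemName", item)]
    for i, (name, value) in enumerate(attributes, start=1):
        params.append((f"Attribute.{i}.Name", name))
        params.append((f"Attribute.{i}.Value", value))
    return ("GET", SIMPLEDB_ENDPOINT + "?" + urlencode(params))


def couchdb_request(method, database, record=None, **modifiers):
    """The HTTP method carries the verb, the path names the database and record."""
    url = f"{COUCHDB_BASE}/{quote(database)}"
    if record is not None:
        url += f"/{quote(record)}"
    if modifiers:
        url += "?" + urlencode(modifiers)
    return (method, url)


sdb = simpledb_request(
    "PutAttributes", "posts", "post-1",
    [("title", "Hello"), ("tag", "a"), ("tag", "b")],
)
couch = couchdb_request("PUT", "posts", "post-1")
listing = couchdb_request("GET", "posts", "_all_docs", descending="true")

for method, url in (sdb, couch, listing):
    print(method, url)
```

In the CouchDB case the record body would travel as JSON in the PUT request, whereas the SimpleDB sketch has to flatten every attribute into numbered query parameters.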

14 comments so far

  1. codebrulee on

    Another interesting “feature” of SimpleDB is Eventual Consistency: http://www.satine.org/archives/2007/12/13/amazon-simpledb/

  2. Matthew King on

    CouchDB’s equivalent to that seems to be adding a query param of update=false.

    “The update option can be used for higher performance at the cost of possibly not seeing the all latest data. If you set the update option to “false”, CouchDB will not perform any refreshing on the view that may be necessary.”

    from http://www.couchdbwiki.com/index.php?title=HTTP_View_API

  3. Anonymous on

    Good article

    You should put the differences side by side in a table, so that it’s more clear.

  4. Roberto Saccon on

    The main difference is that CouchDB you have to administer yourself, while SimpleDB is a utility computing service. And isn’t CouchDB built on top of DETS, with its known size limitations? But SimpleDB also has size limitations per user and per “domain” during the beta phase.

  5. Bark Madly on

    what about dabbledb? This seems like an interesting db on par with these other two options

  6. Anonymous on

    Ummm.. perhaps there are more things that differ:

    1. Amazon has a widely used, web-scale storage & computing architecture
    2. Amazon gets rid of the need for custom hardware and maintenance. It turns the hardware + software scalability problem into just software.
    3. Amazon has some massive customers using the system, so you know it’s going to do the job
    4. Couch has to be installed on your own hardware
    5. Like “storage” systems before S3, simpledb will take the front spot in no time flat.

    Long live amazon, king of the web

  7. JanL on

    Heya Matthew,
    thanks for the nice writeup!

    Cheers,
    Jan

  8. Matthew King on

    anonymous 1: tabular format would be better. I started out doing that, but the blogspot interface seemed to be ignoring the table, and I didn’t feel like troubleshooting.

    roberto saccon: I don’t think it’s fair to say the main difference is that you have to manage CouchDB, whereas SimpleDB is run by Amazon. That may be an important difference for the problem domains where either CouchDB or SimpleDB are good solutions. In such cases, the process cost for getting started with SimpleDB is close to nil.

    bark madly: DabbleDB is hot, but it belongs to a different phylum of datastore. I had remembered it as being an extra-nifty interface to a relational database. That is incorrect, according to http://news.squeak.org/2006/10/31/ocean-waves-the-applications-built-on-seaside/

    The magic of DabbleDB is largely the user interface, and I mean that as a compliment, not a slur. SimpleDB and CouchDB are fundamentally concerned with simplifying and streamlining data storage and retrieval.

    anonymous 2: I completely neglected to cover any of the differences you mention. That was partly the Purloined Letter effect; they’re so obvious that it didn’t occur to me to write about it. The neglect may also be attributed to my motive for writing the post, which was that I wanted to highlight the different ways that SimpleDB and CouchDB attempt to solve similar problems. This post focused more on the API differences than on anything else, to the complete exclusion of the deployment differences.

  9. Randy Bias on

    anonymous said:

    “3. Amazon has some massive customers using the system, so you know it’s going to do the job”

    Can you name one? Just curious, because I think your idea of ‘massive’ probably differs from mine.

    I know of some folks using as many as 500 EC2 instances and hundreds of TB of storage. Smugmug said 300TB in April, maybe they are at 500TB now. They are probably the top S3 customer. Assuming a power law distribution, there probably aren’t many other customers using >100TB on S3.

    For me ‘massive’ is >5,000 machines, >500 TB. By that count Amazon has one ‘massive’ customer.

  10. thejeshgn on

    What about size of data? In Amazon it is 1024 bytes. In CouchDB?

  11. Matthew King on

    For CouchDB documents, “Elements can be of varying types (text, number, date, time), and there is no set limit to text size or element count.”

    CouchDB also accepts attachments, so you can store files along with the documents (a.k.a. records). When using SimpleDB, you would store files in S3 and refer to them via URL in a SimpleDB item attribute.

  12. randal on

    Does anyone know how one would do “joins” using simpledb?

    Here’s how you do it using couchdb:

    http://www.cmlenz.net/blog/2007/10/couchdb-joins.html

  13. Matthew King on

    That CouchDB pattern is not so much a join as it is an effective use of the arbitrary JSON keys. The view prepares a list of records sorted so that all of the comments follow the blog post they belong to. Then the “slicing” parameters, start_key and end_key, narrow down the results to one specific post plus its comments.

    The result is effectively the same as a join across posts and comments tables, but the path taken to get there is quite different.

    I don’t see any features in the published docs for SimpleDB that would allow for similar patterns of use. It is clearly intended for simple uses, and that does not include relational queries such as joins.
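
The CouchDB pattern described in this comment can be imitated in plain Python, as a rough sketch rather than actual CouchDB code. The key layout `[post_id, kind]` is a hypothetical choice loosely modeled on the linked article, and Python's list comparison only approximates the way CouchDB orders JSON keys.

```python
# Rows as a view might emit them: (key, value), with key = [post_id, kind].
# kind 0 marks the post itself, kind 1 marks one of its comments.
rows = [
    (["post-2", 1], "comment on post 2"),
    (["post-1", 1], "first comment"),
    (["post-1", 0], "Post one"),
    (["post-2", 0], "Post two"),
    (["post-1", 1], "second comment"),
]

# A view keeps its rows sorted by key; sorting the list stands in for that.
view = sorted(rows, key=lambda row: row[0])


def slice_view(view, start_key, end_key):
    """Keep the rows whose key lies between start_key and end_key, inclusive."""
    return [row for row in view if start_key <= row[0] <= end_key]


# One post followed by its comments, with no join in sight.
post_with_comments = slice_view(view, ["post-1", 0], ["post-1", 1])

for key, value in post_with_comments:
    print(key, value)
```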

  14. Matthew King on

    Roberto Saccon:

    Damien Katz addressed the DETS issue on the mailing list today:

    CouchDB doesn’t use DETS, it has its own storage engine and that
    storage engine has no storage limit.

    The original issue was that the file IO driver in Erlang didn’t
    support large files, though either it was fixed or I may have been
    mistaken.

