Amazon SimpleDB and CouchDB compared

Terminology mapping

  • What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
  • CouchDB documents and SimpleDB items will be referred to in this post as records.
  • The JSON name:value pairs used in CouchDB documents and the attribute-value pairs in SimpleDB items will be called simply attributes.

A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:

{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }

By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.
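
The two readings can be sketched in a few lines of Python. This is only an illustration of the data model, not SimpleDB client code, and the variable names are made up for the example.

```python
from collections import defaultdict

# The item from the SimpleDB docs, written as a set of (name, value) pairs.
# Each pair is unique within the item, which is the reading used in this
# post: three attributes, two of which share the key 'b'.
item = {("a", "1"), ("b", "2"), ("b", "3")}

# Amazon's reading: group the pairs by name, so 'b' is a single attribute
# that happens to have two values.
by_name = defaultdict(set)
for name, value in item:
    by_name[name].add(value)

print(len(item))      # number of name/value pairs in the item
print(dict(by_name))  # attribute names mapped to their sets of values
```

Either way the stored data is the same; the difference is only in what gets counted as "an attribute".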

What SimpleDB and CouchDB have in common

  • Not relational databases
  • Schemaless
  • CouchDB is built with Erlang. SimpleDB may be, as well.
  • support for data replication (this is a very sloppy generalization)
  • accessed via HTTP

How SimpleDB and CouchDB Differ

SimpleDB:
  1. provides SOAP and (what passes at Amazon for) REST interfaces to the API
  2. REST requests all use HTTP GET, specifying the API method with a query param
  3. requests specify the database, record, attributes, and modifiers with query params
  4. record creation, updating, and deletion is atomic, at the level of individual attributes
  5. all data is considered to be UTF-8 strings
  6. automatically indexes data, details unknown
  7. queries
    • limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
    • defined with HTTP query parameters
    • composed of Boolean and set operations with some obvious comparison operators (=, !=, >, >=, etc.)
  8. as all values are UTF-8 strings, there are no sorting options.
  9. responses are XML
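
As a rough sketch of points 2 and 3, here is how such a request URL might be assembled in Python. The endpoint, the `Attribute.N.Name` / `Attribute.N.Value` numbering, and the domain and item names are assumptions for illustration and should be checked against the SimpleDB developer guide. The authentication parameters are left out entirely, so the real service would reject this URL.

```python
from urllib.parse import urlencode

# Assumed endpoint; authentication parameters (access key, signature,
# timestamp, API version) are deliberately omitted from this sketch.
ENDPOINT = "https://sdb.amazonaws.com/"

def put_attributes_url(domain, item_name, attributes):
    """Build an unsigned PutAttributes URL from (name, value) pairs."""
    params = [
        ("Action", "PutAttributes"),  # the API method is a query param
        ("DomainName", domain),       # the "database"
        ("ItemName", item_name),      # the "record"
    ]
    # Assumed numbering scheme: one Name/Value parameter pair per attribute.
    for i, (name, value) in enumerate(attributes, start=1):
        params.append((f"Attribute.{i}.Name", name))
        params.append((f"Attribute.{i}.Value", str(value)))
    return ENDPOINT + "?" + urlencode(params)

url = put_attributes_url("posts", "post-1", [("a", "1"), ("b", "2"), ("b", "3")])
print(url)
```

Everything, including a write, travels in the query string of a GET, which is the reason for the "what passes at Amazon for REST" remark above.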
CouchDB:
  1. all REST, all the time
  2. requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
  3. requests specify the database and record in the URL, with query params used for modifiers
  4. record creation, updating, and deletion is atomic
  5. supports all JSON data types (string, number, object, array, true, false, null)
  6. indexing is under user control, by means of “views”
    • defined with arbitrary Javascript functions
    • can be stored as documents
    • can be run ad hoc, as “temporary views”
  7. queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters
  8. sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
  9. responses are JSON
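
A comparable sketch for CouchDB, covering points 2, 3, and 5: the HTTP verb carries the operation, the URL names the database and record, and the body is ordinary JSON. The host, port, database, and document names are hypothetical, and nothing is sent over the network; the request objects are only built so they can be inspected.

```python
import json
from urllib.request import Request

# Hypothetical server address; adjust for a real installation.
BASE = "http://localhost:5984"

def doc_request(method, db, doc_id, body=None):
    """Build (but do not send) a request for one document."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(f"{BASE}/{db}/{doc_id}", data=data, headers=headers, method=method)

# The body uses several JSON types: string, array, and boolean.
create = doc_request("PUT", "blog", "post-1",
                     {"title": "Hello", "tags": ["a", "b"], "draft": False})
read = doc_request("GET", "blog", "post-1")
delete = doc_request("DELETE", "blog", "post-1")

for req in (create, read, delete):
    print(req.get_method(), req.full_url)
```

A real update or delete would also need to identify the document revision; that detail is omitted here.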

37 Comments

  1. codebrulee

     /  December 14, 2007

    Another interesting “feature” of SimpleDB is Eventual Consistency: http://www.satine.org/archives/2007/12/13/amazon-simpledb/

  3. Matthew King

     /  December 14, 2007

    CouchDB’s equivalent to that seems to be adding a query param of update=false.

    “The update option can be used for higher performance at the cost of possibly not seeing the all latest data. If you set the update option to “false”, CouchDB will not perform any refreshing on the view that may be necessary.”

    from http://www.couchdbwiki.com/index.php?title=HTTP_View_API

  5. Anonymous

     /  December 15, 2007

    Good article

    You should put the differences side by side in a table, so that it’s more clear.

  7. Roberto Saccon

     /  December 15, 2007

    The main difference is that you have to administer CouchDB yourself, whereas SimpleDB is a utility computing service. And isn’t CouchDB built on top of DETS, with its known size limitations? But SimpleDB also has size limitations per user and per “domain” during the beta phase.

  9. Bark Madly

     /  December 15, 2007

    what about dabbledb? This seems like an interesting db on par with these other two options

  11. Anonymous

     /  December 15, 2007

    Ummm.. perhaps there are more things that differ:

    1. Amazon has a widely used, web-scale storage & computing architecture
    2. Amazon gets rid of the need for custom hardware and maintenance. It turns the hardware + software scalability problem into just software.
    3. Amazon has some massive customers using the system, so you know it’s going to do the job
    4. Couch has to be installed on your own hardware
    5. Like “storage” systems before S3, simpledb will take the front spot in no time flat.

    Long live amazon, king of the web

  13. JanL

     /  December 15, 2007

    Heya Matthew,
    thanks for the nice writeup!

    Cheers,
    Jan

  15. Matthew King

     /  December 15, 2007

    anonymous 1: tabular format would be better. I started out doing that, but the blogspot interface seemed to be ignoring the table, and I didn’t feel like troubleshooting.

    roberto saccon: I don’t think it’s fair to say the main difference is that you have to manage CouchDB, whereas SimpleDB is run by Amazon. That may be an important difference for the problem domains where either CouchDB or SimpleDB are good solutions. In such cases, the process cost for getting started with SimpleDB is close to nil.

    bark madly: DabbleDB is hot, but it belongs to a different phylum of datastore. I had remembered it as being an extra-nifty interface to a relational database. That is incorrect, according to http://news.squeak.org/2006/10/31/ocean-waves-the-applications-built-on-seaside/

    The magic of DabbleDB is largely the user interface, and I mean that as a compliment, not a slur. SimpleDB and CouchDB are fundamentally concerned with simplifying and streamlining data storage and retrieval.

    anonymous 2: I completely neglected to cover any of the differences you mention. That was partly the Purloined Letter effect; they’re so obvious that it didn’t occur to me to write about it. The neglect may also be attributed to my motive for writing the post, which was that I wanted to highlight the different ways that SimpleDB and CouchDB attempt to solve similar problems. This post focused more on the API differences than on anything else, to the complete exclusion of the deployment differences.

  17. Randy Bias

     /  December 15, 2007

    anonymous said:

    “3. Amazon has some massive customers using the system, so you know its going to do the job”

    Can you name one? Just curious, because I think your idea of ‘massive’ probably differs from mine.

    I know of some folks using as many as 500 EC2 instances and hundreds of TB of storage. Smugmug said 300TB in April, maybe they are at 500TB now. They are probably the top S3 customer. Assuming a power law distribution, there probably aren’t many other customers using >100TB on S3.

    For me ‘massive’ is >5,000 machines, >500 TB. By that count Amazon has one ‘massive’ customer.

  19. thejeshgn

     /  December 17, 2007

    What about the size of data? In Amazon it is 1024 bytes. In CouchDB?

  21. Matthew King

     /  December 17, 2007

    For CouchDB documents, “Elements can be of varying types (text, number, date, time), and there is no set limit to text size or element count.”

    CouchDB also accepts attachments, so you can store files along with the documents (a.k.a. records). When using SimpleDB, you would store files in S3 and refer to them via URL in a SimpleDB item attribute.

  23. randal

     /  December 17, 2007

    Does anyone know how one would do “joins” using simpledb?

    Here’s how you do it using couchdb:

    http://www.cmlenz.net/blog/2007/10/couchdb-joins.html

  25. Matthew King

     /  December 17, 2007

    That CouchDB pattern is not so much a join as it is an effective use of the arbitrary JSON keys. The view prepares a list of records sorted so that all of the comments follow the blog post they belong to. Then the “slicing” parameters, start_key and end_key, narrow down the results to one specific post plus its comments.

    The result is effectively the same as a join across posts and comments tables, but the path taken to get there is quite different.

    I don’t see any features in the published docs for SimpleDB that would allow for similar patterns of use. It is clearly intended for simple uses, and that does not include relational queries such as joins.
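
The sort-then-slice pattern described in this comment can be simulated in plain Python. This is a sketch of the idea only: it does not talk to CouchDB, the document fields (`type`, `post_id`) are made up, and Python's list ordering stands in for CouchDB's own key collation.

```python
# Documents in arbitrary order, as they might sit in the database.
docs = [
    {"_id": "c2", "type": "comment", "post_id": "p1", "text": "Second!"},
    {"_id": "p2", "type": "post", "title": "Other post"},
    {"_id": "c1", "type": "comment", "post_id": "p1", "text": "First!"},
    {"_id": "p1", "type": "post", "title": "Hello"},
    {"_id": "c3", "type": "comment", "post_id": "p2", "text": "Nice"},
]

def map_doc(doc):
    """Emit a compound [post id, rank] key: 0 for the post, 1 for comments."""
    if doc["type"] == "post":
        yield [doc["_id"], 0], doc
    elif doc["type"] == "comment":
        yield [doc["post_id"], 1], doc

# The "view": every emitted row, sorted by key (ties broken by document id).
rows = sorted(
    ((key, doc) for d in docs for key, doc in map_doc(d)),
    key=lambda row: (row[0], row[1]["_id"]),
)

def slice_view(rows, start_key, end_key):
    """Keep rows whose key lies in the inclusive range, like start_key/end_key."""
    return [(k, d) for k, d in rows if start_key <= k <= end_key]

result = slice_view(rows, ["p1", 0], ["p1", 1])
print([d["_id"] for _, d in result])  # → ['p1', 'c1', 'c2']
```

No join is performed anywhere: the post and its comments end up adjacent purely because of how the keys sort.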

  27. Matthew King

     /  December 17, 2007

    Roberto Saccon:

    Damien Katz addressed the DETS issue on the mailing list today:

    CouchDB doesn’t use DETS, it has it’s own storage engine and that
    storage engine has no storage limit.

    The original issue was that the file IO driver in Erlang didn’t
    support large files, though either it was fixed or I may have been
    mistaken.

  29. Karol

     /  October 21, 2008

    Is it true that CouchDB doesn’t support passing parameters to views, effectively making it extremely difficult to use in the real world? (Imagine having to define a view for each value of the parameter you pass to WHERE some_column = some_value, because you can’t pass it as a view param.) :)

  30. Matthew King

     /  October 21, 2008

    Karol, have you stopped beating your wife yet?

    Also, was the baptism of John from heaven or from man?

  31. Anonymous

     /  December 22, 2008

    Check out sdb manager at http://www.sdbmanager.com it’s a neat way to manage your simpledb databases.

  32. Shane Brauner

     /  December 26, 2008

    It seems that relational databases are really taking a hit with the increasing popularity of cloud computing. Another non-relational database to take a look at is MongoDB from 10gen. It is open source, and integrated into 10gen’s cloud platform. Matthew, I invite you to check it out (SDK is at http://www.10gen.com). I would really like to hear your thoughts and impressions about what’s right and wrong with it.

  33. Matthew King

     /  December 26, 2008

    Shane Brauner, thanks for the info and link. Happy to see Mongo is on GitHub.

  34. rtweed

     /  April 9, 2009

    M/DB (http://www.mgateway.com/mdb.html), a SimpleDB clone, provides an interesting local or other-cloud instance of SimpleDB

  35. Anonymous

     /  May 9, 2009

    SimpleDB *is* written in Erlang.

    Something I have been wondering about these new distributed, schema-less databases which have begun to surface all over the place: is there a generic API yet?

    It would suck to write your whole application to one, and then find out you need to switch to another one, and have to rewrite a whole bunch of your code.

    This happens all the time with SQL databases, but because we have JDBC, ODBC, DBI and so forth, it isn’t such a big issue.

  36. Sergei Tulentsev

     /  June 24, 2009

    Karol, you can pass a key parameter to the view (or a range of keys, or a range of corresponding document ids). See http://wiki.apache.org/couchdb/HTTP_view_API

    Also, I lost 10 minutes or so trying to apply code from this post to query my current CouchDB v0.10. Specifically, the parameters are called "startkey"/"endkey" and not "start_key"/"end_key".
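
A minimal sketch of what passing a key to a view looks like at the URL level. The host, database, design document, and view names are hypothetical. The path layout and the `startkey`/`endkey` spelling are assumed to match the later CouchDB releases mentioned in this comment; earlier releases differ.

```python
import json
from urllib.parse import urlencode

# Hypothetical server address; adjust for a real installation.
BASE = "http://localhost:5984"

def view_url(db, design, view, **params):
    """Build a view query URL; each parameter value is JSON-encoded."""
    encoded = {name: json.dumps(value) for name, value in params.items()}
    return f"{BASE}/{db}/_design/{design}/_view/{view}?{urlencode(encoded)}"

# One stored view answers "WHERE author = ?" for any value of the key.
exact = view_url("blog", "posts", "by_author", key="karol")
# A key range, the equivalent of a BETWEEN-style query.
ranged = view_url("blog", "posts", "by_author", startkey="a", endkey="k")

print(exact)
print(ranged)
```

The view is defined once; the value being matched arrives as a query parameter on each request.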

  37. Ashley Tate

     /  January 13, 2010

    FYI – SimpleDB now supports sorting/ordering.
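
Since SimpleDB treats every value as a UTF-8 string, any ordering or comparison of those values is lexicographic rather than numeric. A common workaround is to zero-pad numbers before storing them. The sketch below uses ordinary Python string sorting as a stand-in; the field width of 6 is an arbitrary assumption.

```python
# Illustration only: plain Python string comparison standing in for the
# lexicographic comparison a string-only store applies.
prices = [5, 40, 300]

as_strings = sorted(str(p) for p in prices)
print(as_strings)  # → ['300', '40', '5'] (lexicographic, not numeric)

def pad(n, width=6):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(n).zfill(width)

padded = sorted(pad(p) for p in prices)
print(padded)                    # → ['000005', '000040', '000300']
print([int(s) for s in padded])  # → [5, 40, 300]
```

Negative numbers and decimals need a slightly more involved encoding, which is beyond this sketch.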
