The planned security model for CouchDB

While CouchDB currently has no authentication or authorization facilities, a general outline of the plans exists here:

http://incubator.apache.org/couchdb/docs/overview.html

It seems databases will have admin accounts. Full stop. No baked-in facility for user accounts. Administrators have two powers:

  1. manage design documents
  2. create other admin accounts

The Technical Overview doesn’t say, but I assume admins can also manage all documents.

Regarding non-administrative users, as I understand the plan, CouchDB will provide a rudimentary authentication delegator that accepts a login and password. “The user credentials are input to a javascript function, and the function returns a list of [reader-names] for the user, or an error if the user credentials are wrong.” I read this as meaning that the authenticating/authorizing javascript function is determined by the developer.

Write access, called “Update Validation” in the Technical Overview, is also handled dynamically by javascript: “Both the user’s credentials and the updated document are given as inputs to the validation formula, and can be used to implement custom security models…” Developers are free to implement custom authorization models, such as “author only”, where users must be listed in an author field in the document.

Read access is granted by a simplified ACL system. A CouchDB document may have a “reader list”, which specifies a list of “reader-names”. These reader-names need not correspond to user login names, so they are effectively the equivalent of groups (i.e. you can give one or more users the same reader-name). A user’s possession of a reader-name is determined by the javascript function that authenticates user logins.

Documents without a reader list are world-readable. Documents protected by reader lists are viewable only by users who possess one of the reader-names on the list. Read protection on documents applies to view results as well: “Documents that are not allowed to be read by the user are dynamically filtered out of views, keeping the document row and extracted information invisible to non-readers.”

The Technical Overview does not address this directly, but the _all_docs view must follow the same rules. That is, all unprotected documents show up in the results for all users, but protected docs only show up for users possessing the correct reader-names. The javascript functions that bind it all together will probably need administrator privileges.

Some interesting points:

  • The plans for write access allow developers to write their own authorization schemes, but the planned read access will be hard-coded to use document-based ACLs.
  • By default, document names (i.e. DocIds) are 128 bit UUIDs generated by the server. Depending on the algorithm used to generate them, these could be unguessable.
  • If the _all_docs view was not available, users would not be able to find any documents without knowledge of the (unguessable) DocIds.
  • The authentication javascript, upon success, is supposed to return a list of reader-names, but I don’t see that anything would prevent it from returning a list of DocIds.

I think we have the germ of a capability system here, even with the inflexible reader access.

Comments? Scathing refutations?

Advertisements

Amazon SimpleDB and CouchDB compared

Terminology mapping

  • What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
  • CouchDb documents and SimpleDB items will be referred to in this post as records.
  • The JSON name:value pairs used in CouchDb documents and the attribute-value pairs in SimpleDB items will be called simply attributes.

A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:

{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }

By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.

What SimpleDB and CouchDB have in common

  • Not relational databases
  • Schemaless
  • CouchDB is built with Erlang. SimpleDB may be, as well.
  • support for data replication (this is a very sloppy generalization)
  • accessed via HTTP

How SimpleDB and CouchDB Differ

SimpleDB:
  1. provides SOAP and (what passes at Amazon for) REST interfaces to the API
  2. REST requests all use HTTP GET, specifying the API method with a query param
  3. requests specify the database, record, attributes, and modifiers with query params
  4. record creation, updating, and deletion is tomic, at the level of individual attributes
  5. all data is considered to be UTF-8 strings
  6. automatically indexes data, details unknown
  7. queries
    • limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
    • defined with HTTP query parameters
    • composed of Boolean and set operations with some obvious comparison operators(=, !=, >, >=, etc.)
  8. as all values are UTF-8 strings, there are no sorting options.
  9. responses are XML
CouchDB:
  1. all REST, all the time
  2. requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
  3. requests specify the database and record in the URL, with query params used for modifiers
  4. record creation, updating, and deletion is atomic
  5. supports all JSON data types (string, number, object, array, true, false, null)
  6. indexing is under user control, by means of “views”
    • defined with arbitrary Javascript functions
    • can be stored as documents
    • can be run ad hoc, as “temporary views”
  7. queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters
  8. sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
  9. responses are JSON