The planned security model for CouchDB

While CouchDB currently has no authentication or authorization facilities, a general outline of the plans exists here:

http://www.couchdbwiki.com/index.php?title=Technical_Overview#Security_and_Validation

It seems databases will have admin accounts. Full stop. No baked-in facility for user accounts. Administrators have two powers:

  1. manage design documents
  2. create other admin accounts

The Technical Overview doesn’t say, but I assume admins can also manage all documents.

Regarding non-administrative users, as I understand the plan, CouchDB will provide a rudimentary authentication delegator that accepts a login and password. “The user credentials are input to a javascript function, and the function returns a list of [reader-names] for the user, or an error if the user credentials are wrong.” I read this as meaning that the authenticating/authorizing javascript function is determined by the developer.

Write access, called “Update Validation” in the Technical Overview, is also handled dynamically by javascript: “Both the user’s credentials and the updated document are given as inputs to the validation formula, and can be used to implement custom security models…” Developers are free to implement custom authorization models, such as “author only”, where users must be listed in an author field in the document.
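To make the “author only” idea concrete, here is a minimal sketch. It is written in Ruby rather than the JavaScript CouchDB plans to use, and the function name, the shape of the credentials hash, and the "authors" field are all my own assumptions; the Technical Overview specifies none of them.

```ruby
# Hypothetical model of an "author only" validation formula.
# user_ctx and the "authors" field are invented for illustration.
def author_only_valid?(user_ctx, updated_doc)
  authors = updated_doc.fetch("authors", [])
  authors.include?(user_ctx["name"])
end

doc = { "_id" => "post-1", "authors" => ["alice"], "body" => "draft" }

puts author_only_valid?({ "name" => "alice" }, doc)   # alice is listed as an author
puts author_only_valid?({ "name" => "mallory" }, doc) # mallory is not
```

Note that a check against only the updated document would let a user add themselves to the author field. A real formula would presumably need the stored version too, which the overview doesn't discuss.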

Read access is granted by a simplified ACL system. A CouchDB document may have a “reader list”, which specifies a list of “reader-names”. These reader-names need not correspond to user login names, so they are effectively the equivalent of groups (i.e. you can give one or more users the same reader-name). A user’s possession of a reader-name is determined by the javascript function that authenticates user logins.

Documents without a reader list are world-readable. Documents protected by reader lists are viewable only by users who possess one of the reader-names on the list. Read protection on documents applies to view results as well: “Documents that are not allowed to be read by the user are dynamically filtered out of views, keeping the document row and extracted information invisible to non-readers.”
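Here is how I picture the read rules, as a small Ruby sketch. The "readers" field name and both function names are assumptions for illustration, not anything from the Technical Overview, and I treat only a missing list (not an empty one) as world-readable.

```ruby
# Hypothetical model of reader-list checks; the "readers" field name is assumed.
def readable?(doc, reader_names)
  return true unless doc.key?("readers")   # no reader list: world-readable
  !(doc["readers"] & reader_names).empty?  # user holds at least one listed name
end

# Views drop the rows a user may not read.
def filter_view(rows, reader_names)
  rows.select { |doc| readable?(doc, reader_names) }
end

docs = [
  { "_id" => "a" },
  { "_id" => "b", "readers" => ["staff"] },
  { "_id" => "c", "readers" => ["staff", "board"] }
]

p filter_view(docs, ["board"]).map { |d| d["_id"] }
```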

The Technical Overview does not address this directly, but the _all_docs view must follow the same rules. That is, all unprotected documents show up in the results for all users, but protected docs only show up for users possessing the correct reader-names. The javascript functions that bind it all together will probably need administrator privileges.

Some interesting points:

  • The plans for write access allow developers to write their own authorization schemes, but the planned read access will be hard-coded to use document-based ACLs.
  • By default, document names (i.e. DocIds) are 128-bit UUIDs generated by the server. Depending on the algorithm used to generate them, these could be unguessable.
  • If the _all_docs view were not available, users would not be able to find any documents without knowledge of the (unguessable) DocIds.
  • The authentication javascript, upon success, is supposed to return a list of reader-names, but I don’t see that anything would prevent it from returning a list of DocIds.

I think we have the germ of a capability system here, even with the inflexible reader access.

Comments? Scathing refutations?

Amazon SimpleDB and CouchDB compared

Terminology mapping

  • What you and I (and CouchDB) would call a database, Amazon SimpleDB calls a domain.
  • CouchDB documents and SimpleDB items will be referred to in this post as records.
  • The JSON name:value pairs used in CouchDB documents and the attribute-value pairs in SimpleDB items will be called simply attributes.

A brief explanation: The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination. In the same paragraph, the docs give this as an example of an item’s attributes:

{ 'a', '1' }, { 'b', '2'}, { 'b', '3' }

By Amazon terminology, the ‘b’ attribute has two values. I think it clearer to regard this item as having three attributes, two of which have ‘b’ as their key.
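The two readings of that item can be shown side by side. This is a plain Ruby sketch of the data shapes only; the variable names are mine and nothing here touches the SimpleDB API.

```ruby
# The item from Amazon's example, kept as three name/value pairs (my reading).
pairs = [["a", "1"], ["b", "2"], ["b", "3"]]

# Amazon's reading: two attributes, one of which has two values.
by_name = pairs.group_by(&:first).transform_values { |ps| ps.map(&:last) }
p by_name

# Attributes are identified by name/value combination, so adding
# the same pair a second time leaves three distinct attributes.
with_repeat = (pairs + [["b", "3"]]).uniq
p with_repeat.length
```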

What SimpleDB and CouchDB have in common

  • Not relational databases
  • Schemaless
  • CouchDB is built with Erlang. SimpleDB may be, as well.
  • support for data replication (this is a very sloppy generalization)
  • accessed via HTTP

How SimpleDB and CouchDB Differ

SimpleDB:
  1. provides SOAP and (what passes at Amazon for) REST interfaces to the API
  2. REST requests all use HTTP GET, specifying the API method with a query param
  3. requests specify the database, record, attributes, and modifiers with query params
  4. record creation, updating, and deletion is atomic, at the level of individual attributes
  5. all data is considered to be UTF-8 strings
  6. automatically indexes data, details unknown
  7. queries
    • limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
    • defined with HTTP query parameters
    • composed of Boolean and set operations with some obvious comparison operators (=, !=, >, >=, etc.)
  8. as all values are UTF-8 strings, there are no sorting options.
  9. responses are XML
CouchDB:
  1. all REST, all the time
  2. requests use HTTP GET, PUT, POST, and DELETE with their usual RESTful semantics
  3. requests specify the database and record in the URL, with query params used for modifiers
  4. record creation, updating, and deletion is atomic
  5. supports all JSON data types (string, number, object, array, true, false, null)
  6. indexing is under user control, by means of “views”
    • defined with arbitrary Javascript functions
    • can be stored as documents
    • can be run ad hoc, as “temporary views”
  7. queries are basically views, with the addition of modifiers (start_key, end_key, count, descending) supplied as HTTP query parameters
  8. sorting is flexible and arbitrarily complex, as it is based on the JSON keys defined in the views. See here for more information
  9. responses are JSON

IRB: What was that method that greps again?

Giles Bowkett continually improves his .irbrc file, and I’ve borrowed a few of those tricks. His latest is grep_methods, a helper to search for the methods available on an object. This is a very useful construct, but, as so often happens, there’s a better way baked in.


"my_arbitrary_string".methods.grep /ch/
=> ["each_byte", "match", "chomp!", "chop", "each_with_index", "chomp", "each_line", "each", "chop!"]
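If you still want a helper in your .irbrc, something like the following would do. The name interesting_methods is made up for this sketch (it is not Giles's grep_methods); it just wraps the built-in grep, drops the method names every Object responds to, and sorts what is left.

```ruby
# Hypothetical .irbrc helper: method names on obj matching pattern,
# minus the names that every Object already responds to, sorted.
def interesting_methods(obj, pattern)
  (obj.methods - Object.instance_methods).grep(pattern).sort
end

p interesting_methods("my_arbitrary_string", /ch/)
```

On newer Rubies the names come back as symbols rather than strings, but grep with a regex works the same way on both.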

Default fonts too small in gitk on OS X when installed with MacPorts

This is an easy one, addressed obliquely by the blog post that is the number one result in Google. The blogger also sets his font to Arial, which no one should emulate.

Gitk works pretty much straight out of MacPorts. Bravo, except for the 9 point fonts. Really.

Launch X11 and cd to your git repository in xterm. Run gitk and marvel at the unreadable text.

Edit the top three lines of ~/.gitk to change the fonts and/or sizes. Mine looks like this:


set mainfont {Helvetica 12}
set textfont {Courier 12}
set uifont {Helvetica 12 bold}
set tabstop 8
...

All this assumes that your X11 profile has a usable $PATH.

Paging in OpenLDAP, or "What, no LIMIT or OFFSET?"

Disclaimer: I’m not an LDAP expert, but I’ve done a whole mess of reading about OpenLDAP lately. Let the knowledgeable correct me where I err.

Paging in LDAP is somewhat of a pain, and by “somewhat” I mean “asymptotically approaching totally”. In the ldapsearch tool, for example, you have to use a “search extension” argument, as paging is not part of the search filter syntax. Contrast this with SQL, where you can tack LIMIT and OFFSET clauses onto the query itself. Thus LDAP clients must implement the pagedResults search control (and the LDAP directory server must support it).

It gets worse. Check out the way the paging is implemented when following RFC 2696 (http://www.faqs.org/rfcs/rfc2696.html). You can only specify the size of the result set, not the offset or a page number. The LDAP server returns a cookie with the search results. The client uses the cookie in the next pagedResults query, and the server uses the cookie to figure out where to start the next set of results. LDAP clients must treat the cookie as opaque, i.e. they shouldn’t know how to do anything other than send the cookie back to the server.

Thus the only way to paginate results on the server side appears to be by looping through all results. The client must retain a cookie from each query for use in the next. Hrmmm. Can you guess who wrote RFC 2696?
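To show why reaching page N means walking pages 1 through N-1 first, here is a toy model of cookie-based paging. FakeDirectory is an in-memory stand-in I invented for this sketch, not a real LDAP client or server, and it signals “no more results” with a nil cookie for simplicity.

```ruby
# Toy in-memory stand-in for a server that pages with an opaque cookie.
# Not a real LDAP client; every name here is hypothetical.
class FakeDirectory
  def initialize(entries)
    @entries = entries
    @cursors = {}   # cookie => index of the next entry to return
    @next_id = 0
  end

  # Returns [entries, cookie]; cookie is nil when nothing is left.
  def search(page_size, cookie = nil)
    start = cookie ? @cursors.delete(cookie) : 0
    raise ArgumentError, "unknown cookie" if start.nil?
    page = @entries[start, page_size] || []
    next_start = start + page.length
    return [page, nil] if next_start >= @entries.length
    new_cookie = "cookie-#{@next_id += 1}"
    @cursors[new_cookie] = next_start
    [page, new_cookie]
  end
end

# Fetch a 1-based page by requesting every earlier page in order,
# because only the server can turn a cookie into a position.
def fetch_page(directory, page_size, page_number)
  cookie = nil
  page = []
  page_number.times do |i|
    page, cookie = directory.search(page_size, cookie)
    return [] if cookie.nil? && i < page_number - 1 # ran out early
  end
  page
end

directory = FakeDirectory.new((1..10).to_a)
p fetch_page(directory, 3, 3)
```

The client never inspects the cookie; it only hands it back, which is exactly why there is no way to jump straight to an offset.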

At the time of this writing, there are two Ruby libraries for LDAP access, Net::LDAP and Ruby/LDAP, and ActiveLdap can use either as its adapter. To the extent that Net::LDAP supports the pagedResults control, it is only to prevent Active Directory from choking when a query returns more than 1000 results. See ./lib/net/ldap.rb:1158 for the code that handles the pagedResults control.

Ruby/LDAP does support pagedResults, which I should have figured out from the line in the TODO file that started the discussion on the mailing list that started my research: “Add result pagination via LDAP::Controls”. So I think adding support for the control to the Ruby/LDAP adapter for ActiveLdap should be practical.

It might be possible to roll your own pagination, in a very ugly way, by calling the ActiveLdap::Base#search method with a block that throws away results before and after the desired page set. Net::LDAP yields each entry *after* adding it to the result_set array, so you would need to set the entry to nil and compact the result.

Alternatively, perhaps you could override the Net::LDAP search method to yield the entry to the block first, then add it to the result_set only if not nil.
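The throw-away approach boils down to something like this sketch. It uses a plain Ruby enumerable in place of a real search call, and page_from_stream is a name I made up; it is not part of ActiveLdap or Net::LDAP.

```ruby
# Hypothetical helper: walks every entry the source yields,
# but keeps only the ones that fall inside the requested page.
def page_from_stream(source, page_size, page_number)
  first = (page_number - 1) * page_size
  last  = first + page_size - 1
  kept = []
  source.each_with_index do |entry, index|
    kept << entry if index >= first && index <= last
  end
  kept
end

p page_from_stream("a".."j", 3, 2)
```

Every entry still crosses the wire and gets visited, so this saves memory on the client at best.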

It’s ugly every way you look.

Here’s the link that started my digging:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/195249

Lisp tutorials in Practical Common Lisp

They’re excellent. Peter Seibel’s book is available free online, as well as in print. I read enough of the free stuff to realize that I needed to stop and buy the book when I’m ready to do some projects in CL.

You can read it free here:
Common Lisp tutorial

launchd plist to run a reverse ssh tunnel

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version='1.0'>
<dict>
<key>Label</key><string>com.automatthew.ssh_tunnel</string>
<key>UserName</key><string>matthew</string>
<key>ProgramArguments</key>
<array>
        <string>/usr/bin/ssh</string>
        <string>-nNT</string>
        <string>-R 1389:127.0.0.1:389</string>
        <string>matthew@slice1.automatthew.com</string>
</array>
<key>Debug</key><false/>
<key>Disabled</key><false/>
<key>OnDemand</key><false/>
<key>RunAtLoad</key><false/>
</dict>
</plist>

nginx 0.6.7 purports to fix the install problems

Igor announced 0.6.7 on the mailing list.

Changes with nginx 0.6.7
*) Change: now the paths specified in the "include",
"auth_basic_user_file", "perl_modules", "ssl_certificate",
"ssl_certificate_key", and "ssl_client_certificate" directives are
relative to directory of nginx configuration file nginx.conf, but not
to nginx prefix directory.
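As I read it, the change only affects how relative paths in those directives get resolved. Here is a sketch of the difference in Ruby, assuming a prefix of /usr/local/nginx and a config file in /usr/local/nginx/conf; both locations and the resolve_include helper are assumptions for illustration, not anything nginx provides.

```ruby
# Assumed locations, for illustration only.
PREFIX_DIR = "/usr/local/nginx"
CONF_DIR   = "/usr/local/nginx/conf"  # directory holding nginx.conf

# Hypothetical helper: resolve a directive's path against a base directory.
# Absolute paths come back unchanged.
def resolve_include(path, base_dir)
  File.expand_path(path, base_dir)
end

puts resolve_include("sites/example.conf", PREFIX_DIR) # before 0.6.7, as I understand it
puts resolve_include("sites/example.conf", CONF_DIR)   # 0.6.7 and later
```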

nginx 0.6.6 – make install fails

I’m trying to deploy a Rails app to a new server with Deprec and Capistrano Server Extensions (capserverext). The capistrano task fails when compiling nginx, during the `make install` bit. The make errors are something like this:

cp: cannot create regular file `/usr/local/nginx/conf/mime.types.default'
No such file or directory

I tried compiling nginx directly, to eliminate deprec and capserverext, and the problem persisted.

After much head beating, and with disbelief, I concluded that the problem was a bug in nginx. Hubristic, I know. But searching the nginx mailing list immediately turned up a message with a patch from the developer. The ‘@@’s in this patch are munged on the web, so I pastied it for your consumption.

The problem stems from the addition of a new configure option, --sysconfdir. This new option means that capserverext is going to need a change to the compile_nginx task.

Once patched, you can run configure with --sysconfdir=/usr/local/nginx/conf to meet capserverext’s assumptions. But having to patch the source breaks the whole install_nginx task anyway.

What you do, though, is bravely pretend that prepare_host is going to work. When it fails:

  1. ssh into the server and cd to /usr/local/src/nginx-0.6.6/
  2. wget http://pastie.caboo.se/84215.txt
  3. patch -p0 < 84215.txt
  4. run the configure script with the arguments from capserverext’s nginx recipe plus --sysconfdir=/usr/local/nginx/conf
    sudo ./configure --sbin-path=/usr/local/sbin \
      --pid-path=/var/run/nginx.pid \
      --error-log-path=/var/log/nginx/error.log \
      --http-log-path=/var/log/nginx/access.log \
      --with-http_ssl_module \
      --sysconfdir=/usr/local/nginx/conf
  5. sudo make
  6. sudo make install
  7. rm /usr/local/nginx/conf/nginx.conf

Now nginx 0.6.6 should be installed on your server. Back on your dev machine run the following tasks to get back on track:

  • cap install_nginx_start_script
  • cap nginx_postgres_rails_setup ( or cap nginx_mysql_rails_setup, if you’re using mysql)

This gets you past the prepare_host task.

Here’s hoping this post becomes obsolete very soon.
