
CouchDB Views in Python

I’ve been interested in CouchDB lately, and since I’m primarily working in Python, I naturally want to use the two together. There’s a pretty nice module called couchdb-python that makes it easy to connect to a server and create, edit, and delete documents, but the paucity of information on how to write CouchDB views in Python is laughable.

Missing Documentation

There are literally three lines of code and one sentence explaining how to write views in Python:

def fun(doc):
    if doc['date']:
        yield doc['date'], doc

Note that the map function uses the Python yield keyword to emit values, where JavaScript views use an emit() function.
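To make the yield/emit difference concrete, here is a rough in-memory stand-in for what CouchDB does with a Python map function. It calls the function once per document and turns every (key, value) pair the function yields into one row, sorted by key. `build_index` is a hypothetical helper written for illustration. Real CouchDB sorts keys with its own collation rules, which a plain Python sort only approximates.

```python
def fun(doc):
    # The documentation example, using doc.get() so that documents
    # without a 'date' field are skipped instead of raising KeyError.
    if doc.get('date'):
        yield doc['date'], doc


def build_index(map_fun, docs):
    """Hypothetical helper: run map_fun over docs and return view rows.

    Each yielded (key, value) pair becomes one row, the same role that
    emit(key, value) plays in a JavaScript view.
    """
    rows = []
    for doc in docs:
        for key, value in map_fun(doc):
            rows.append({'id': doc['_id'], 'key': key, 'value': value})
    # CouchDB orders rows by key, then by document ID; Python's sort
    # only approximates CouchDB's collation rules.
    rows.sort(key=lambda row: (row['key'], row['id']))
    return rows


docs = [
    {'_id': 'b', 'date': '2010-03-01'},
    {'_id': 'a', 'date': '2010-01-15'},
    {'_id': 'c'},  # no date, so nothing is emitted
]
print([row['key'] for row in build_index(fun, docs)])
# ['2010-01-15', '2010-03-01']
```

A map function may also yield more than once for a single document. Each yield becomes its own row.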

Can you imagine a MySQL manual that never talked about SQL? That’s what the couchdb-python documentation is like when it comes to views.

Anyway, such is life when you’re using open source and leading-edge technology. Instead of complaining about bad documentation, you need to roll up your sleeves and look at the code. A module called design.py looked promising, so I started reading. In the docstring for a class called ViewDefinition, I found some sample code that was close to what I was looking for:

from couchdb import Server
from couchdb.design import ViewDefinition  # needed to run the sample as a script

server = Server()  # a local CouchDB server by default
db = server.create('python-tests')

view = ViewDefinition('tests', 'all', '''function(doc) {
    emit(doc._id, null);
}''')
view.get_doc(db)  # returns None at this point
# The view is not yet stored in the database, in fact, design doc doesn't
# even exist yet. That can be fixed using the `sync` method:
view.sync(db)
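After sync(), the view lives in a design document with the ID `_design/tests`, and its functions are stored as source strings under `views`. The sketch below builds that document as a plain dict, following CouchDB’s design-document layout. `design_doc` is a hypothetical helper, and the exact fields couchdb-python writes may differ slightly.

```python
import json


def design_doc(design, views, language='javascript'):
    """Hypothetical helper: build a CouchDB design document as a dict.

    `views` maps each view name to a (map_source, reduce_source) pair;
    reduce_source may be None for map-only views.
    """
    doc = {'_id': '_design/%s' % design, 'language': language, 'views': {}}
    for name, (map_src, reduce_src) in views.items():
        funcs = {'map': map_src}
        if reduce_src:
            funcs['reduce'] = reduce_src
        doc['views'][name] = funcs
    return doc


doc = design_doc('tests', {'all': ('function(doc) {\n    emit(doc._id, null);\n}', None)})
print(json.dumps(doc, indent=2, sort_keys=True))
```

Whatever language a view is written in, CouchDB ultimately stores its map and reduce functions as strings inside a JSON document.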

Now we’re getting somewhere! Well… kind of. The view function here is embedded inside a string literal. This is a bad idea for production code, for several reasons:

  • The example shown here is a map function in a map/reduce pair, but you can’t tell that from the code.
  • You won’t get syntax highlighting for code that’s inside a string literal.
  • It’s not easy to write unit tests for code that’s inside of a string.

You could conceivably fix some of these problems by enforcing weird code conventions and using eval, but that’s way too hacky for my tastes. In an ideal world, I’d like to be able to write named map and reduce functions, have them highlighted properly, write unit tests against them, and then easily synchronize my views with my CouchDB instance.

Nifty Hack

Reading further, I found exactly what I was looking for inside the constructor for ViewDefinition:

if isinstance(map_fun, FunctionType):
    map_fun = _strip_decorators(getsource(map_fun).rstrip())
self.map_fun = dedent(map_fun.lstrip('\n'))

if isinstance(reduce_fun, FunctionType):
    reduce_fun = _strip_decorators(getsource(reduce_fun).rstrip())
if reduce_fun:
    reduce_fun = dedent(reduce_fun.lstrip('\n'))
self.reduce_fun = reduce_fun

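This is a surprising little piece of code. I hadn’t realized it before, but Python’s standard inspect module has a function called getsource that returns a string containing the source code of the function, method, class, or module you pass into it! (Python is a neat language.) The only catch is that the source has to be available in a file, because getsource reads it from there.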
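To see getsource at work without CouchDB, the sketch below pulls the source of two standard-library methods, drops any decorator lines, and dedents the result. This is roughly what the constructor excerpt does before storing map_fun. `source_of` is a hypothetical helper. Its decorator stripping is a simplified stand-in for couchdb-python’s private `_strip_decorators` and only handles single-line decorators.

```python
import fractions
import inspect
import textwrap


def source_of(func):
    """Hypothetical helper: return the dedented source of a function.

    Mirrors the getsource()/dedent() steps in the ViewDefinition excerpt;
    the decorator stripping is a simplified stand-in, not couchdb-python's
    own _strip_decorators.
    """
    lines = inspect.getsource(func).splitlines()
    # Drop leading single-line decorators such as "@staticmethod".
    while lines and lines[0].lstrip().startswith('@'):
        lines.pop(0)
    # Methods are indented inside their class; dedent moves them to column 0.
    return textwrap.dedent('\n'.join(lines))


# A plain method from the standard library, so its source file is available.
print(source_of(textwrap.TextWrapper.wrap).splitlines()[0])
# def wrap(self, text):

# A decorated classmethod: the "@classmethod" line is stripped.
print(source_of(fractions.Fraction.from_float).splitlines()[0])
```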

Solution

Anyway, at this point, I can see a solution coming together. I’d like to write a class that encapsulates the idea of having a pair of map and (optional) reduce functions, and glue that together with the couchdb-python module in an easy-to-use way. Here’s what I came up with.

from couchdb.design import ViewDefinition
import inflection


class CouchView(ViewDefinition):
    """
    A base class for couch views that handles the magic of instantiation.

    Subclasses define a map() static method and, optionally, a reduce()
    static method.
    """

    def __init__(self):
        """
        Maps the subclass implementation into the arguments expected by
        ViewDefinition.
        """
        cls = self.__class__

        # The design document is named after the last component of the
        # module name, e.g. "myapp.statistics" -> "statistics".
        design_name = cls.__module__.split('.')[-1]

        map_fun = getattr(cls, 'map', None)
        if map_fun is None:
            raise NotImplementedError("Couch views require a map() method.")

        # reduce() is optional; ViewDefinition accepts None.
        reduce_fun = getattr(cls, 'reduce', None)

        super(CouchView, self).__init__(
            design_name,
            inflection.underscore(cls.__name__),  # CountTypes -> count_types
            map_fun,
            reduce_fun,
            language='python',
        )

It’s a little ugly. I could have used composition instead of inheritance, and I could have used an abstract base class instead of introspection. But that’s beside the point. The goal here is to write this short bit of ugly code once so that the numerous views I write later on will all be neat and tidy.
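The base class relies on one Python detail. When you look up a @staticmethod through the class, you get back the plain underlying function, which is exactly what the FunctionType check in the ViewDefinition excerpt needs before it calls getsource. The decorator also keeps `self` out of the signature, so the source sent to CouchDB is simply `def map(doc):`. The `Example` class below is a throwaway used only for this demonstration.

```python
from types import FunctionType


class Example(object):
    @staticmethod
    def map(doc):
        yield doc.get('_id'), None


# Looked up on the class, the staticmethod unwraps to a plain function...
print(isinstance(Example.map, FunctionType))              # True
# ...while the class dictionary still holds the staticmethod wrapper.
print(isinstance(Example.__dict__['map'], staticmethod))  # True
# Calling it needs only the document, with no instance involved.
print(list(Example.map({'_id': 'abc'})))                  # [('abc', None)]
```

On Python 2, a regular method looked up on the class is an unbound method rather than a function, so it would fail the isinstance check entirely. On Python 3, it would pass the check, but its source would still start with `def map(self, doc):`, and it would fail when called with a single document.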

Speaking of neat and tidy views, here’s an example of a view subclass:

from couchview import CouchView

class CountTypes(CouchView):
    """ Count the number of documents available, per type. """

    @staticmethod
    def map(doc):
        """ Emit the document type for each document. """
        if 'doc_type' in doc:
            yield (doc['doc_type'], 1)

    @staticmethod
    def reduce(keys, values, rereduce):
        """ Sum the values for each type. """
        return sum(values)

This particular view counts up how many of each document type I have. It’s easy to read, it’s short and concise, it will highlight properly in a text editor, and I can easily write unit tests against it. (Unit testing CouchDB views will be the subject of another blog post.)
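Because map and reduce are ordinary static methods, they can be exercised without a CouchDB server. Below is a minimal sketch of what such tests might look like, written as pytest-style functions with plain asserts. The class is a stand-in that copies the two methods of CountTypes but skips the CouchView base class, so it runs without couchdb-python or inflection. `run_grouped` is a hypothetical helper that loosely emulates a grouped query in memory.

One caveat applies: only each function’s own source is sent to CouchDB, so a map or reduce function that depends on a module-level import or global can pass these tests and still fail inside CouchDB.

```python
from collections import defaultdict


class CountTypes(object):
    """Stand-in with the same map/reduce as the CountTypes view."""

    @staticmethod
    def map(doc):
        """ Emit the document type for each document. """
        if 'doc_type' in doc:
            yield (doc['doc_type'], 1)

    @staticmethod
    def reduce(keys, values, rereduce):
        """ Sum the values for each type. """
        return sum(values)


def run_grouped(view, docs):
    """Hypothetical helper: map every doc, then reduce each key's values."""
    groups = defaultdict(lambda: ([], []))
    for doc in docs:
        for key, value in view.map(doc):
            keys, values = groups[key]
            keys.append([key, doc['_id']])  # CouchDB passes [key, doc id] pairs
            values.append(value)
    return {key: view.reduce(keys, values, False)
            for key, (keys, values) in groups.items()}


def test_map_skips_untyped_docs():
    assert list(CountTypes.map({'_id': 'x'})) == []


def test_map_emits_type_and_one():
    assert list(CountTypes.map({'_id': 'x', 'doc_type': 'post'})) == [('post', 1)]


def test_rereduce_sums_partial_counts():
    # On rereduce, keys are None and values are earlier reduce results.
    assert CountTypes.reduce(None, [2, 3], True) == 5


def test_grouped_counts():
    docs = [
        {'_id': '1', 'doc_type': 'post'},
        {'_id': '2', 'doc_type': 'post'},
        {'_id': '3', 'doc_type': 'user'},
        {'_id': '4'},
    ]
    assert run_grouped(CountTypes, docs) == {'post': 2, 'user': 1}
```

Summing works on both passes here because the partial results of a count are themselves counts.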

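This view can easily be loaded into CouchDB using the ViewDefinition.sync_many() class method. Its first argument is a Database object, and remove_missing=True deletes views from the affected design documents when they are no longer defined in code. Because these views are written in Python, the CouchDB server also needs couchdb-python’s view server (the couchpy script) registered as its python query server.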

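import couchdb
from couchdb.design import ViewDefinition

from myapp.statistics import CountTypes  # hypothetical module path

db = couchdb.Server()['mydb']  # 'mydb' is an example database name

couch_views = [
    CountTypes(),
    # Put other view instances here
]

ViewDefinition.sync_many(db, couch_views, remove_missing=True)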

The base class automatically takes care of naming the design documents and views. The design document is named after the module that contains the class, and the view is named after the class itself, converted from CamelCase to snake_case by inflection.underscore(). For example, if I put the CountTypes class above into statistics.py, CouchDB will refer to it as _design/statistics/_view/count_types.
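The naming convention can be checked without CouchDB or the inflection package. In the sketch below, `underscore` is a stand-in written to mirror what inflection.underscore does for names like these, and `view_path` is a hypothetical helper that combines it with the module-name rule from CouchView.

```python
import re


def underscore(word):
    """Stand-in for inflection.underscore: CamelCase -> snake_case."""
    # Split an acronym from a following capitalized word: "HTTPRequest" -> "HTTP_Request".
    word = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", word)
    # Split a lowercase letter or digit from a following capital: "CountTypes" -> "Count_Types".
    word = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", word)
    return word.replace("-", "_").lower()


def view_path(module_name, class_name):
    """Hypothetical helper: the view path the CouchView convention yields."""
    design = module_name.split(".")[-1]  # last component of the module name
    return "_design/%s/_view/%s" % (design, underscore(class_name))


print(view_path("myapp.statistics", "CountTypes"))
# _design/statistics/_view/count_types
```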

What do you think? If you’ve worked with Python and CouchDB together, I’d love to hear your feedback.