CouchDB Views in Python
I’ve been interested in CouchDB lately, and since I’m primarily working in Python, I naturally want to use the two together. There’s a pretty nice module called couchdb-python that makes it easy to get connected, create, edit, and delete documents, but the paucity of information on how to write CouchDB views in Python is laughable.
Missing Documentation
There are literally three lines of code and one sentence explaining how to write views in Python:
    def fun(doc):
        if doc['date']:
            yield doc['date'], doc
Note that the map function uses the Python yield keyword to emit values, where JavaScript views use an emit() function.
Can you imagine a manual for MySQL where they didn’t even talk about SQL? That’s the equivalent of the CouchDB API not talking about views.
Anyway, such is life when you’re using open source and leading-edge technology. Instead of complaining about bad documentation, you need to roll up your sleeves and look at the code. There is a module in there called design.py that looked promising, so I started reading. In the PyDoc for a class called ViewDefinition, I saw some sample code that was close to what I was looking for:
    from couchdb import Server
    from couchdb.design import ViewDefinition
    server = Server()
    db = server.create('python-tests')
    view = ViewDefinition('tests', 'all', '''function(doc) {
        emit(doc._id, null);
    }''')
    view.get_doc(db)
    # The view is not yet stored in the database, in fact, design doc doesn't
    # even exist yet. That can be fixed using the `sync` method:
    view.sync(db)
Now we’re getting somewhere! Well… kind of. The view function here is embedded inside a string literal. This is a bad idea for production code, for several reasons:
- The example shown here is a map function in a map/reduce pair, but you can’t tell that from the code.
- You won’t get syntax highlighting for code that’s inside a string literal.
- It’s not easy to write unit tests for code that’s inside of a string.
You could conceivably fix some of these problems by enforcing weird code conventions and using eval, but that’s way too hacky for my tastes. In an ideal world, I’d like to be able to write named map and reduce functions, have them highlighted properly, write unit tests against them, and then easily synchronize my views with my CouchDB instance.
Nifty Hack
Reading on, I found exactly what I was looking for. Inside the constructor for ViewDefinition, I found the following:
    if isinstance(map_fun, FunctionType):
        map_fun = _strip_decorators(getsource(map_fun).rstrip())
    self.map_fun = dedent(map_fun.lstrip('\n'))
    if isinstance(reduce_fun, FunctionType):
        reduce_fun = _strip_decorators(getsource(reduce_fun).rstrip())
    if reduce_fun:
        reduce_fun = dedent(reduce_fun.lstrip('\n'))
    self.reduce_fun = reduce_fun
This is a surprising little piece of code. I had not realized it before, but Python’s standard inspect module has a magical function called getsource that will actually return a string containing the source code for an object that you pass into it! (Python is a neat language.)
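To see the trick in isolation, here is a minimal sketch that uses only the standard library. One caveat: getsource reads the file a function was defined in, so the sketch first writes a tiny module to a temporary file. The module name example_views, the function by_date, and the helper load_module are all made up for illustration; none of them come from couchdb-python.

```python
import importlib.util
import inspect
import os
import tempfile
from textwrap import dedent

# Hypothetical view module. It is written to disk because getsource()
# needs a real source file to read from.
VIEW_MODULE = dedent('''\
    def by_date(doc):
        if doc.get('date'):
            yield doc['date'], doc
''')


def load_module(source, name="example_views"):
    """Write `source` to a temporary .py file and import it as a module."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as handle:
        handle.write(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


views = load_module(VIEW_MODULE)

# The function is ordinary Python, so it can be called directly...
rows = list(views.by_date({'date': '2013-01-01'}))

# ...and getsource() hands back its text as a string, which is the form
# a design document needs.
map_source = inspect.getsource(views.by_date)
print(map_source)
```

This is presumably why the approach needs functions that live in a real module: if Python cannot locate the source file, getsource raises an error instead of returning a string.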
Solution
Anyway, at this point, I can see a solution coming together. I’d like to write a class that encapsulates the idea of having a pair of map and (optional) reduce functions, and glue that together with the couchdb-python module in an easy-to-use way. Here’s what I came up with.
    from couchdb.design import ViewDefinition
    import inflection
    import sys

    class CouchView(ViewDefinition):
        """
        A base class for couch views that handles the magic of instantiation.
        """

        def __init__(self):
            """
            Does some magic to map the subclass implementation into the format
            expected by ViewDefinition.
            """
            module = sys.modules[self.__module__]
            design_name = module.__name__.split('.')[-1]

            if hasattr(self.__class__, "map"):
                map_fun = self.__class__.map
            else:
                raise NotImplementedError("Couch views require a map() method.")

            if hasattr(self.__class__, "reduce"):
                reduce_fun = self.__class__.reduce
            else:
                reduce_fun = None

            super_args = (design_name,
                          inflection.underscore(self.__class__.__name__),
                          map_fun,
                          reduce_fun,
                          'python')
            super(CouchView, self).__init__(*super_args)
It’s a little ugly. I could have used composition instead of inheritance, and I could have used an abstract base class instead of introspection. But that’s beside the point. The goal here is to write this short bit of ugly code once so that the numerous views I write later on will all be neat and tidy.
Speaking of neat and tidy views, here’s an example of a view subclass:
    from couchview import CouchView

    class CountTypes(CouchView):
        """ Count the number of documents available, per type. """

        @staticmethod
        def map(doc):
            """ Emit the document type for each document. """
            if 'doc_type' in doc:
                yield (doc['doc_type'], 1)

        @staticmethod
        def reduce(keys, values, rereduce):
            """ Sum the values for each type. """
            return sum(values)
This particular view counts up how many of each document type I have. It’s easy to read, it’s short and concise, it will highlight properly in a text editor, and I can easily write unit tests against it. (Unit testing CouchDB views will be the subject of another blog post.)
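As a rough preview of what such a test could look like, here is a sketch that runs without a CouchDB server. It assumes a stand-in copy of the view with the CouchView base class removed, and the run_grouped helper is hypothetical: it only imitates a grouped query in plain Python and is not part of couchdb-python.

```python
class CountTypes(object):
    """Stand-in for the view above, minus the CouchView base class."""

    @staticmethod
    def map(doc):
        if 'doc_type' in doc:
            yield (doc['doc_type'], 1)

    @staticmethod
    def reduce(keys, values, rereduce):
        return sum(values)


def run_grouped(view, docs):
    """Hypothetical helper: imitate a grouped map/reduce query in memory."""
    grouped = {}
    for doc in docs:
        for key, value in view.map(doc):
            grouped.setdefault(key, []).append(value)
    # The keys argument is unused by this reduce, so None is passed here.
    return {key: view.reduce(None, values, False)
            for key, values in grouped.items()}


docs = [
    {'_id': '1', 'doc_type': 'post'},
    {'_id': '2', 'doc_type': 'comment'},
    {'_id': '3', 'doc_type': 'post'},
    {'_id': '4'},  # no doc_type, so map() emits nothing
]

counts = run_grouped(CountTypes, docs)
assert counts == {'post': 2, 'comment': 1}

# Summing partial counts is also the right answer on a rereduce pass.
assert CountTypes.reduce(None, [2, 1], True) == 3
```

Because map and reduce are static methods, they can be called straight off the class with hand-built documents, which is what makes this style easy to test.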
This view can easily be loaded into CouchDB using the sync_many(…) function.
    from couchdb.design import ViewDefinition

    couch_views = [
        CountTypes(),
        # Put other view classes here
    ]
    # db is a couchdb Database object, e.g. server['python-tests']
    ViewDefinition.sync_many(db, couch_views, remove_missing=True)
The superclass will automatically take care of naming the design documents and views. The design document will be named for the module that the class is in, and the view will be named for the class using automatic inflection. For example, if I put that code above into statistics.py, then CouchDB will refer to it as _design/statistics/_view/count_types.
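The naming rule can be sketched on its own. The underscore function below is a simplified stand-in for inflection.underscore, included so the example runs without third-party packages, and view_path is a hypothetical helper that exists only for this illustration.

```python
import re


def underscore(word):
    """Simplified stand-in for inflection.underscore (CamelCase to snake_case)."""
    word = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", word)
    word = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", word)
    return word.replace("-", "_").lower()


def view_path(module_name, class_name):
    """Hypothetical helper: the path CouchDB would use for a CouchView subclass."""
    # Only the last component of a dotted module name is used.
    design_name = module_name.split('.')[-1]
    return "_design/%s/_view/%s" % (design_name, underscore(class_name))


print(view_path("myapp.views.statistics", "CountTypes"))
# prints "_design/statistics/_view/count_types"
```

One consequence of taking only the last module component: two modules with the same file name in different packages would map to the same design document.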
What do you think? If you’ve worked with Python and CouchDB together, I’d love to hear your feedback.