Unit Tests for Python CouchDB Views

2012-07-01 in Coding

I recently wrote about how to write CouchDB views in Python, because I couldn’t find any documentation online explaining a good way to do it. Today I’d like to tackle a similarly neglected topic: writing unit tests for your Python CouchDB views.

Unit Tests

If you read my previous post, then you’ve already got CouchDB using real Python code for your views (and you’re not using Python code stuffed inside string literals). CouchDB doesn’t let you call any outside functions from your views, so your views necessarily won’t have any dependencies to worry about. This makes it easy to get your views into a test harness.

The only difficult thing about testing your views is that CouchDB’s view API is highly contractual: it calls your map and reduce functions multiple times in a specific order. Ideally, our tests should emulate this behavior so that they are helpful, easy to read, and thorough. This emulation can be implemented as a superclass for unit tests:

from collections import defaultdict
import unittest

class MapReduceTest(unittest.TestCase):
    def simulate_map(self, class_, documents):

        map_results = list()

        for document in documents:
            for map_result in class_.map(document):
                map_results.append(map_result)

        return map_results

    def simulate_reduce(self, class_, map_results, group=True):

        # Sort a copy so that the caller's list is not mutated:
        map_results = sorted(map_results)
        map_dict = defaultdict(list)
        reduce_results = dict()

        if group:
            # Group the map results by key:
            for map_result in map_results:
                key = map_result[0]
                value = map_result[1]
                map_dict[key].append(value)

            # Now call reduce for each key:
            # (items() works on both Python 2 and Python 3; iteritems() is Python 2 only.)
            for key, values in map_dict.items():
                reduce_results[key] = class_.reduce(keys=None, values=values, rereduce=False)
        else:
            # Call reduce once for all values:
            values = [map_result[1] for map_result in map_results]
            reduce_results[None] = class_.reduce(keys=None, values=values, rereduce=False)

        return reduce_results

The emulation of map is pretty easy. Rather than calling map once and checking a single return value, we iterate over everything each call produces, because a single call to map can emit multiple results (or none at all). We also loop over a list of documents, so that we can collect the output from map and feed it into reduce.
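To illustrate why the inner loop matters, here is a minimal sketch using a hypothetical `TagView` (it is not part of the code in this post) whose map emits one row per tag. It assumes, as the harness does, that `map` can be called on the class and yields `(key, value)` tuples; the standalone `simulate_map` function just repeats the flattening logic so the snippet runs on its own.

```python
class TagView(object):
    """Hypothetical view: emits one (tag, 1) row per tag in a document."""

    @staticmethod
    def map(document):
        for tag in document.get("tags", []):
            yield (tag, 1)


def simulate_map(class_, documents):
    # Same flattening logic as MapReduceTest.simulate_map, as a plain function.
    map_results = []
    for document in documents:
        for map_result in class_.map(document):
            map_results.append(map_result)
    return map_results


documents = [
    {"tags": ["python", "couchdb"]},  # emits two rows
    {"tags": []},                     # emits no rows
    {"tags": ["python"]},             # emits one row
]

rows = simulate_map(TagView, documents)
print(rows)
```

The number of rows depends on what each document emits, not on the number of documents, which is why the expected map results in a test have to be written out row by row.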

The reduce function is a little more interesting. CouchDB actually has a pretty complicated contract for reduce:

It can give you input records in one big group or many small groups.
It might group inputs by key or it might not. (It can also group by various parts of a compound key.)
It may pass in partially reduced results along with the map results. This operation is known as rereduce.
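The rereduce case can be approximated in a test with a rough sketch like the one below. The helper `simulate_rereduce`, its `chunk_size` parameter, and the two example views are all hypothetical names introduced for illustration; CouchDB chooses its own batching, so the fixed chunk size here is only a stand-in. As in the harness above, `keys` is passed as `None`.

```python
def simulate_rereduce(class_, values, chunk_size=2):
    """Reduce `values` in chunks, then rereduce the partial results.

    `chunk_size` is an arbitrary choice for illustration only.
    """
    partials = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        partials.append(class_.reduce(keys=None, values=chunk, rereduce=False))
    # Second pass: the inputs are now earlier reduce outputs, not map values.
    return class_.reduce(keys=None, values=partials, rereduce=True)


class SumReduce(object):
    """Hypothetical view whose reduce gives the same answer under rereduce."""

    @staticmethod
    def reduce(keys, values, rereduce):
        return sum(values)


class NaiveCount(object):
    """Hypothetical view with a bug: it ignores the rereduce flag."""

    @staticmethod
    def reduce(keys, values, rereduce):
        return len(values)


values = [1, 1, 1, 1, 1]
sum_result = simulate_rereduce(SumReduce, values)
naive_result = simulate_rereduce(NaiveCount, values)
print(sum_result, naive_result)
```

With five input values, `SumReduce` still returns 5, while `NaiveCount` returns 3 because it counts the three partial results instead of adding them up. A single-pass simulation would not catch that kind of bug.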

In my implementation, I have picked the low-hanging fruit. My simulated reduce passes all of the values for a group to reduce in a single call, never performs a rereduce, and has limited options for grouping: either by full key or not at all. While I’d like to implement the map/reduce contracts more fully, it should be possible to start writing tests against this abstraction now and improve it later on, without unnecessarily breaking the tests written in the interim.
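As one possible direction for richer grouping, here is a small sketch of grouping by a prefix of a compound key. The function `group_by_level` is a hypothetical helper, not part of the harness above, and it assumes every key is a list or tuple.

```python
from collections import defaultdict


def group_by_level(map_results, group_level):
    """Group (key, value) rows by the first `group_level` key elements.

    Assumes every key is a list or tuple (a compound key).
    """
    grouped = defaultdict(list)
    for key, value in map_results:
        # Truncate the key to the requested prefix and use it as the group.
        grouped[tuple(key[:group_level])].append(value)
    return dict(grouped)


rows = [
    (("2012", "07", "01"), 1),
    (("2012", "07", "02"), 1),
    (("2012", "08", "15"), 1),
]

by_month = group_by_level(rows, 2)
by_year = group_by_level(rows, 1)
print(by_month)
print(by_year)
```

Each list of grouped values could then be handed to `reduce` in the same way that `simulate_reduce` does for full keys.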

Example

Here’s an example test:

from couchview import MapReduceTest
from couchview.stats import CountTypes

class TestStats(MapReduceTest):
    def test_count_types(self):

        documents = [
            {"doc_type": "foo"},
            {"doc_type": "foo"},
            {"doc_type": "foo"},
            {"doc_type": "bar"},
            {"doc_type": "bar"},
        ]

        expected_map_results = [
            ("foo", 1),
            ("foo", 1),
            ("foo", 1),
            ("bar", 1),
            ("bar", 1)
        ]

        actual_map_results = self.simulate_map(CountTypes, documents)
        self.assertListEqual(expected_map_results, actual_map_results)

        expected_reduce_results = {
            'foo': 3,
            'bar': 2
        }

        actual_reduce_results = self.simulate_reduce(CountTypes, actual_map_results)
        self.assertEqual(expected_reduce_results, actual_reduce_results)

The test is a bit verbose because of the data structures used for inputs and expected outputs, but it is also easy to read, in my opinion. I’ve written a half dozen view tests so far, and I find that they are fairly easy to write and genuinely helpful for catching and fixing errors.
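For reference, the test assumes a `CountTypes` view whose code is not shown in this post. A minimal sketch of what such a class might look like is below; its shape is inferred from how the harness calls `map` and `reduce`, so treat it as an assumption rather than the actual implementation.

```python
class CountTypes(object):
    """Assumed shape of the view under test: counts documents per doc_type."""

    @staticmethod
    def map(document):
        # Emit one row per document that has a doc_type.
        if "doc_type" in document:
            yield (document["doc_type"], 1)

    @staticmethod
    def reduce(keys, values, rereduce):
        # Summing works for map output (all 1s) and for partial counts alike.
        return sum(values)


print(list(CountTypes.map({"doc_type": "foo"})))
print(CountTypes.reduce(keys=None, values=[1, 1, 1], rereduce=False))
```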

There is still a lot of work that could be done here, but even this basic implementation has been useful to me in my work. If you have spent any time writing Python unit tests for views, please leave a note in the comments!