Tuesday, May 26, 2009

TDD and the SkipList (part 1)

I'm not a purist when it comes to test-driven development, although I do believe that tests validate design as well as code. My mantra is “if it's hard to test, it will be hard to use” (which is disturbingly close to “if you can dodge a wrench, you can dodge a ball”). But the purist approach, as exemplified by “Uncle Bob” Martin, is that writing your tests first will actually lead to a better design — the term they use is “emergent design.”

The Bowling Game is Uncle Bob's canonical example. While Big Design Up Front (BDUF) looks at the real world and posits the need for objects such as Frame, TDD approaches from the simple goal of scoring the game, and ends up with a single Scorer class. A neat demonstration, but the immediate response is that it's trivial, and you're not likely to be writing a scorekeeping application for your bowling league. What about real applications?

Last year I attended one of Uncle Bob's presentations on clean code. As an example of code needing cleanup, he used excerpts from his own Fitnesse tool. And while I have a great deal of respect for people who point out the flaws in their own work, I had to ask: “didn't you develop this using TDD?” His answer was yes, but that TDD doesn't guarantee clean code. Not an answer to give a warm fuzzy feeling, particularly since most of the unclean bits simply needed better factoring — which is supposed to be TDD's sweet spot.

But my real break with TDD orthodoxy came when I tried to implement a SkipList using pure TDD. For those who didn't follow the link, a SkipList is a data structure that holds its elements according to their natural ordering, and provides O(logN) access. It does this with multiple tiers of linked lists: every element is linked into the bottom tier, and each higher tier has roughly half the elements of the tier beneath. I say “roughly,” because elements are randomly assigned to tiers (following a log2 distribution). To find an element, you start at the topmost tier and work your way right and down until you get to the bottom tier and run into an element whose value is larger than the one you seek.

A SkipList doesn't implement the java.util.List API, because it doesn't preserve the insertion order of its elements. It does, however, implement the java.util.Collection API, and so I approached it through this public API, doing the simplest thing that could possibly work to make the tests pass. However, this wasn't going to get me to a SkipList: I could make all the tests pass using an ArrayList with linear search and insert. While I could actually implement a SkipList back end, I would break the rule of writing only enough code to make a single test pass. Clearly, testing the external interface wasn't enough; I had to look inside the implementation.

I posted this quandary in a comment to one of Uncle Bob's blog entries several years ago, and received the astounding (to me) reply that yes, I should open up the class and look inside it. My reply was perhaps a bit more sarcastic than needed, but I felt my point was valid: isn't encapsulation one of the reasons that we use object-oriented languages? Perhaps that explained some of the ugly code in Fitnesse.

One way to preserve encapsulation while still enabling white box testing is to program to interfaces: SkipList becomes a public interface, implemented by SkipListImpl. Mainline code uses the interface, test code uses the implementation. Of course, this implies the existence of a factory, because you need some way to create instances. Is this really what we want for a utility class?

An interesting side-effect of programming to interfaces is that we now need two tiers of tests: one that works with interfaces, one that works with the implementation classes. To me, this seems to be a move away from test- driven development, into the realm of testing for validation.

Back to the SkipList: if I wanted to develop the core linked-list code via TDD, I was going to have to extract that code and test it separately. This is one of the claimed benefits of TDD: it leads you to small, well-factored classes. But again, is this really what we want for a utility class? Or, to rephrase, do we want a library that exposes a lot of public classes used by only one consumer? My inner database developer is wary of 1:1 relationships.

I never saved my first attempt, so this weekend I started fresh; my next post will look at what TDD gave me. I decided to do this, not because I particularly need a SkipList, but because today I start a three-day class on TDD taught by Uncle Bob. I doubt there will be time in the class to do an actual implementation, but I plan to throw it out as a question. I'd like to see if I'm missing something or if the TDD purists are.

No comments: