Tuesday, June 23, 2009

Teams as Reinforcement

One of the four values of the Agile Manifesto is “individuals and interactions over processes and tools.” It's one of the first things that I think gets missed when people adopt agile practices, but it's probably the most important.

A few years ago, I worked for a company that had a “quote page” on the development wiki. If one of your coworkers said something memorable, you'd be honor-bound to record it for posterity. One of my favorites:

I'd check this in, but James would kill me.

OK, perhaps to you this seems like a case study in hostile work environments. To me, knowing the people involved, it's a case study in high-quality software development. The person who said this realized that he had taken a shortcut, and wasn't willing to be called out for it. So he went back and did a better job.

As I've written recently, I don't think you can adopt individual practices such as test-driven development and expect your team to succeed. It's too easy to take shortcuts. Instead, you need reinforcing practices: TDD mixed with pair programming or code review, for example. And the best reinforcement is to have a team that is unwilling to disappoint each other.

I've been lucky enough to work on several high-performing teams, and to lead one of them. I've also been unlucky enough to work on several dysfunctional teams: teams that sometimes produced something that worked, but the quality wasn't high and the developers weren't happy producing it. And the common thread of the high-performing teams was that the individuals on those teams were driven by maintaining each other's respect: not by money, and not even by the end result.

The desire not to disappoint is a powerful emotion. In its extreme form, guilt, it's a staple of one-liner comedians and mediocre managers. It's also incredibly difficult to maintain. Unless you focus on people over process.

Thursday, June 11, 2009

Reinforcement

My postings on SkipList finish with observations and concerns, but no conclusions — no resolution to my concerns. Actually, when I originally posted the article, there was a concluding paragraph, but I decided that it deserved its own posting.

So, the concern: test-first development, or even test-driven design, is a great thing for experienced programmers. It provides a sandbox in which to explore ideas and design, with the safety — fearlessness, in the words of TDD proponents — to change code without breaking it. In the hands of an inexperienced programmer, however, I believe that it can turn into a blind alley, through a process in which small rational decisions lead to an irrational result. I can easily imagine the dead mother counter arising from TDD.

So what's the solution? Well, one simple approach is to have an experienced programmer look at the tests. Code reviews are an accepted part of many development processes, but they usually focus on the mainline code. While that's certainly important, it seems less so when you have high test coverage (as you will in a pure-TDD process). The tests, on the other hand, not only tell you whether the requirements were met, but give insight into the mind of the programmer. If you see a series of tests that don't make sense, it's a flag to look at the mainline code: there's probably something in it that doesn't make sense either.

Another benefit of code review for tests is that it can happen early and often. Mainline code reviews generally happen after the code is done, because that's when you can see the logic flow. Test reviews, however, can happen at any time: even if your mainline method/class is half-written, you should still have a bunch of tests for it.

Taken to the extreme (pun intended), you'll have a person sitting by your side and reviewing the tests as you write them: pair programming. I imagine that most readers are cringing after that last sentence. Programmers, because who wants someone looking over their shoulder; managers, because pair programming looks like a great way to take two people and get the output of one.

But is it really? If your shop has code reviews as part of its process, have you ever considered how many person-hours go into one? As a case study from my own career, one shop scheduled 2 to 3 reviews a week, with the presenter and 3 other developers. A review took about an hour of preparation and an hour of meeting time per participant: two hours apiece for four people, or 8 person-hours per review. And those reviews covered only the “important” code: perhaps 10% of the codebase.

So, 2-3 developer-days per week spent on code reviews, with more time spent by the reviewed developer to clean up any issues. Of course, you can do less rigorous reviews, but where's the value in that? It's too easy to end up with a review “process” that's nothing more than a formality: a rubber stamp, as overworked developers take at best a cursory look at the code under review. And if you're in that situation, the time is well and truly wasted.

Pair programming is a practice espoused by many Agile processes, in particular Extreme Programming (XP). Now, to be fair, I've never worked in a shop that adopted XP, so you might reasonably ignore what follows. However, I have followed many of the XP practices (including limited pair programming) at other shops, so am not completely talking through my hat.

And looking at the various practices — and also at shops that claim to practice Agile but never seem to reap any benefits — I've come to the conclusion that the real benefit of XP is that the various practices reinforce each other. OK, this isn't a great insight: it's right there on the front page of extremeprogramming.org: “The rules and practices must support each other.”

But that message seems to get lost. Indeed, if you click the “How do I start” link on that page, you'll see a suggestion that you can add a few practices at a time. This is not what I remember when I was first exposed to XP: at the time, the prevailing opinion was that you should start with all dozen or so practices, and eliminate practices only once your team had enough experience to do so with reason.

Reinforcement is important. It's too easy, when you see a deadline looming, to say “there's no need to write tests for this code, it's simple.” It's surprising how often that “simple” code ends up harboring a bug that doesn't turn up until you hit production. It's a lot harder to skip that test when the person sitting next to you raises his/her eyebrow and asks why.

Which brings me to what should be the strongest reinforcement of all: not disappointing your coworkers. But that's another posting.

Tuesday, June 9, 2009

TDD and the SkipList (part 3)

So here I am, with a SkipList implementation developed test-first. I have 100% test coverage (actually, more than 100% coverage), and didn't have to break encapsulation or pollute the class' interface to get it. Along the way, I wrote down some observations.

First: I remain convinced that test-first is the way to develop code, if only to ensure that you're doing what you think you are. In the case of the SkipList, this point was driven home early, when my tests caught a bug in the tier assignment method. It was a subtle bug: I knew that the size of the list determined the maximum tier for an element, so I wrote the tier-assignment code to pass the current size into Random.nextInt(). My test, however, kept failing. When I looked at the actual counts I discovered they were skewed: the bottom tier had about half the elements it should have had, with the excess distributed to higher tiers. Without the tests, I never would have discovered the problem, nor would I have been able to experiment with a solution (which was to pass a power of two).
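For concreteness, here's a minimal sketch of one way such a tier-assignment method could be written; this is illustrative rather than the actual code from the project:

    import java.util.Random;

    public class TierAssignment {
        /**
         * Pick a tier so that roughly half of all elements land on tier 0,
         * a quarter on tier 1, and so on. The bound passed to Random.nextInt()
         * must be a power of two; passing the raw list size skews the counts.
         */
        public static int assignTier(Random rnd, int listSize) {
            int bound = Integer.highestOneBit(Math.max(listSize, 2)) * 2;  // next power of two above listSize
            int draw = rnd.nextInt(bound) + 1;                             // uniform in 1 .. bound
            return Integer.numberOfTrailingZeros(draw);                    // tier = number of trailing zero bits
        }
    }

With a power-of-two bound, the trailing-zero counts fall off by halves, which is the distribution the test was checking for.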

And that leads to my second observation: it's really hard to avoid preconceived design. Although I wrote the test first, I already knew (or thought I knew) how I was going to implement the method. A pure TDD approach would have been to write separate tests for a variety of list sizes, and perhaps that would have led me directly to the “needs to be a power of two” solution.

Third: TDD is best accomplished in a single session, at least at the scale of a single method or group of methods. This one surprised me, because I thought that the tests would read like a story, leading me back to the point I had stopped. In reality, I put the half-finished code aside while attending the TDD training, and when I picked it up again I puzzled over some of the decisions that I'd made along the way.

On reflection, the storybook metaphor is actually the right one: you can't expect to put a novel down for a week and jump right back into it (at least, I can't). I think this may be one of the reasons for the XP practice of deleting uncommitted code before going home for the night. It's faster to start from scratch.

However, this observation concerns me, because it has direct bearing on the “code as specification” benefit of TDD. I find that the purist approach of small, limited test cases actually hinders comprehension, because the knowledge of how an object works is spread over many tests. I tend to write larger tests, such as testElementCreation(), that explore multiple facets of a single behavior. Regardless of test size, this observation highlights the need for test clarity.

Fourth observation: implementation tends to be a quantum behavior. There are points at which you make a dramatic step in functionality that's impossible to break into smaller pieces. Either it works as expected, or it doesn't work at all. The SkipListManager.insert() method is an example: although I have a plethora of test scenarios, the code was pretty much complete (if slightly buggy) after the first. The only way to avoid this is to test something other than a SkipList.

And this brings me to a point that Uncle Bob made during the training: “TDD won't give you quicksort.” That statement seems to invalidate the entire approach: if a minimal failing test won't get you closer to your goal, why do it? The answer, I think, is that dogma inculcates discipline, and the TDD purists recognize that the real route to better code is disciplined programmers. Test-driven development mandates discipline: you have to write the tests, and this will (hopefully) get you thinking about the ways your code is used.

But this leads to my last observation: there were too many cases where the tests told me that I was fine, but my intuition said otherwise. I've developed that intuition over more than 30 years of writing code; many TDD proponents have been at it longer. An inexperienced programmer won't have that intuition, and could easily find him/herself at the end of a road with nowhere to turn.

Monday, June 8, 2009

TDD and the SkipList (part 2)

Several years have passed since my first attempt at implementing SkipList via TDD. I continue to do test-first development — for example, most of the Practical XML library was written in this manner. However, while I use tests to validate my design, and put me in the shoes of a person using the library, they don't drive that design.

For the SkipList, I tried very hard to take that next step, and let the tests show me where to go. I'm not sure that I was successful, and in some cases I didn't like the results. I won't blame you for skipping this post: it bears a close resemblance to a fishing show (“yes, Billy-Bob, he's definitely got something on the hook, so let's watch him reel it in”). Not for everyone, but it does lay the groundwork for my next post.

In returning to the SkipList, I recognized that I would need a way to test the implementation separate from the public API. Java gave me a solution to the packaging issue: rather than using multiple top-level classes, or interface and implementation, I would use nested static classes. Everything lives in one source file. The choice of nested classes, rather than inner, was intentional: if TDD was going to drive me to independent collaborators, I wanted them to be truly independent.

The first question was where to draw the line between interface and implementation. I took what appeared to be the simplest approach: create a SkipListElement that knew only how to talk to its neighbors. The SkipList itself would know about the first element. Conceptually, this is how any linked list can be implemented, and it's very easy to test. In practice, of course, implementing a normal linked list with self-aware elements is a good way to blow up your stack from the recursive search operation (unless your language optimizes tail recursion). For SkipList, with its log2 access paths, it seemed like a workable idea.
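To make that shape concrete, here's a minimal sketch of an element-centric design along those lines; the class names follow the post, but the members and logic are illustrative, not the project's actual code:

    public class SkipList {
        static class SkipListElement {
            final int value;
            final SkipListElement[] next;       // one forward link per tier

            SkipListElement(int value, int tiers) {
                this.value = value;
                this.next = new SkipListElement[tiers];
            }

            // search from this element: move right along the current tier while
            // the neighbor's value doesn't overshoot, otherwise drop down a tier;
            // both calls recurse, which is where the stack-depth concern comes from
            SkipListElement find(int target, int tier) {
                if (value == target)
                    return this;
                if (next[tier] != null && next[tier].value <= target)
                    return next[tier].find(target, tier);
                return (tier > 0) ? find(target, tier - 1) : null;
            }
        }

        private SkipListElement head;           // the list knows only its first element
        // the public API would delegate to head.find(target, topTier)
    }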

As it turned out, while the tests remained simple, the implementation quickly became very complex. The find() method was particularly so: it had to work its way up through the tiers of links and then back down. After a lot of massaging, it turned into a half-dozen lines of code that still appeared to have O(log2 n) behavior. This exercise, however, was more test-first than test-driven: I made a lot of changes to the code, but only after pondering why the test was still failing; the test itself remained relatively unchanged.

Moreover, at this point my intuition was screaming “warning, Will Robinson!” but the tests remained silent. Looking at the tests, I realized that I wasn't actually validating that my search properly traversed the list, merely that it found the expected element. So I wrote a new test, one that recorded each invocation of the find() method, and discovered that I wasn't properly maintaining high-tier links. Even though I'd written a nice recursive method, I was doing a linear search.

Would a “pure” approach to TDD have made me write this test? On the one hand, I was validating core behavior; on the other, I already had a green bar, 100% coverage, and lots of tests that showed me what I expected to see.

As I looked at the code and thought about what would be required to make it work, I came to the conclusion that I had walked down a cul de sac. So, in good XP fashion, I deleted everything and started over.

In my next attempt I introduced another nested class, SkipListManager, and went with having SkipListElement be a dumb data container. Well, almost a dumb data container: I made the decision that it should know about its neighbors.
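A sketch of that revised split, again with illustrative members rather than the project's actual code:

    public class SkipList {
        // (nearly) dumb data holder: a value plus links to its neighbors
        static class SkipListElement {
            final int value;
            final SkipListElement[] next;

            SkipListElement(int value, int tiers) {
                this.value = value;
                this.next = new SkipListElement[tiers];
            }
        }

        // all traversal and mutation logic lives here; since the manager is
        // package-visible, the tests can drive it directly without adding
        // anything to SkipList's public API
        static class SkipListManager {
            final SkipListElement head = new SkipListElement(Integer.MIN_VALUE, 32);
            // find(), insert(), remove() go here
        }

        private final SkipListManager manager = new SkipListManager();
        // the public SkipList methods delegate to the manager
    }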

That decision led to an interesting situation: to write my tests, I wanted to be able to connect two elements together at multiple tiers, so I wrote an append() method to do so. As I moved forward with the implementation, I found that method leading me into a corner: because it existed, I wrote my manager to use it, and ended up with a findAndRemember() operation that was almost, but not quite, the same as the find() operation. I couldn't find a good way out without a third object, a “find iterator.”

At this point, I debated whether or not I really needed the manager object. Its sole reason for existence was to provide a back door for testing; giving the list protected methods would do the same. This was a rehash of my original concerns about TDD: protected methods are part of the interface of a class, and can never be removed. I decided that I didn't want to pollute SkipList just to support testing.

As I was thinking about the manager-element interaction, I decided that the best way back was to delete a large chunk of code (unlike the former attempt, I wasn't willing to throw it all away). My tests certainly helped with this: I deleted the chunks of code that I didn't like, and my IDE told me what tests depended on them. The code that remained was still covered by tests, still worked, and still represented a design that I liked.

I dove back in, and again ran into trouble inserting elements. Once I had appended an element in a tier, it prevented me from properly traversing the lower tiers. Score one for the tests: they turned up this problem right away. I still needed to run in a debugger to figure out what was happening, but the tests gave me a controlled environment in which to walk the code.

The test for the manager's insert() function came in at nearly 100 lines of code, as I validated how different element values would work their way through the chains. Although I wrote this test one section at a time, I suspect a TDD purist would look at the length of the method and cringe (it was actually quite a bit longer, until I refactored the assertions). The TDD approach would be to break each of these scenarios into separate methods, but that seems like a lot of needless duplication to me: I would have to repeat code to configure the list for each test. By combining these tests I additionally validate that I'm not damaging other portions of the list. I will say, though, that my variable naming conventions left something to be desired.

The final hurdle came from remove(). After getting the manager to properly handle insertions, I started working through the public API in alphabetical order. When I hit removal, I realized belatedly that I just wasn't going to get away with a singly-linked list. Should I have seen this coming? Probably yes, given that my element-only approach ran into a similar dead end. But it points out something important: with TDD, the order of tests matters. On the positive side, I was able to insert the backlinks by breaking one test at a time, and I was still testing the forward links.

Are you ready for “the codez”? You'll find the list class here, and the test class here. The name “SkipList1” should give you a hint that I'm still not happy with the implementation, but it's time to ship (i.e., get this post up on the blog). Over the next few weeks I'll be tinkering with the implementation (I have an idea that will take me back to a forward-linked list), and once I'm happy with it (if ever), I'll post a link to the final code.

Next up: my thoughts about this experiment, and TDD in general.