Showing posts with label test-driven-development.

Tuesday, April 29, 2014

Mock Objects are a Testing Smell

I like mock objects, at least in principle. I think that well-designed object-oriented programs consist of collaborating objects, and mocks do a good job of capturing that collaboration: the general idea is “push on A and verify something happens at B.”

But I feel uncomfortable when I see tests that use multiple mocks, or set many expectations. These tests have moved beyond testing behavior to testing implementation, and are tightly coupled to the mainline code. These tests are brittle, breaking at the slightest change to the mainline code. And on the flip side, the mainline code becomes much harder to refactor, because changes will break the tests.

This is a particular problem for service-level objects. To see what I mean, consider a simple service to manage bank accounts:

public interface BankService {
    BigDecimal getBalance(String accountId) throws BankException;
    List<Transaction> getTransactions(String accountId) throws BankException;

    void deposit(String accountId, BigDecimal amount) throws BankException;
    void withdraw(String accountId, BigDecimal amount) throws BankException;
}

To support this service, we need a data-access object, which can be equally simple:

public interface AccountRepository {
    Account find(String accountId);

    void update(Account account);
}

This seems to be a good use for mock objects: to test deposit(), you create a mock for AccountRepository and set expectations on find() and update(). Actually making this work, however, is not quite as easy as describing it. Here's an implementation using EasyMock:

public class TestAccountOperations {
    private static class AccountBalanceMatcher
    implements IArgumentMatcher {
        BigDecimal expected;

        public AccountBalanceMatcher(BigDecimal expected) {
            this.expected = expected;
        }

        @Override
        public boolean matches(Object argument) {
            Account account = (Account)argument;
            return account.getBalance().equals(expected);
        }

        @Override
        public void appendTo(StringBuffer sb) {
            sb.append(expected);
        }
    }

    public static Account eqBalance(BigDecimal expected) {
        EasyMock.reportMatcher(new AccountBalanceMatcher(expected));
        return null;
    }

    private final String accountId = "test-account-1234";
    
    private Account account = new Account(accountId);
    private AccountRepository mockAccountRepo = createMock(AccountRepository.class);
    private BankService service = new BankServiceImpl(mockAccountRepo);

    @Test
    public void testDepositHappyPath() throws Exception {
        final BigDecimal initialBalance = new BigDecimal("300.00");
        final BigDecimal depositAmount = new BigDecimal("200.00");
        final BigDecimal expectedBalance = initialBalance.add(depositAmount);
        
        account.setBalance(initialBalance);

        expect(mockAccountRepo.find(accountId)).andReturn(account);
        mockAccountRepo.update(eqBalance(expectedBalance));
        replay(mockAccountRepo);

        service.deposit(accountId, depositAmount);
        
        verify(mockAccountRepo);
    }
}

That's a lot of code. More than half of it is infrastructure, needed so that EasyMock can validate the account balance in the call to update(). It is, however, relatively readable, and the mock object allows us to easily test “sad path” cases such as an invalid account number simply by replacing andReturn() with andThrow().

However, this test is missing something important: it doesn't validate that we update the transaction log. How do you discover this? You can't rely on a coverage report, because you're not actually exercising mainline code. You just have to know.

Fortunately, it's easy to add another mock (and it's also time to refactor the EasyMock extensions into their own class):

public class TestAccountOperations {
    private final String accountId = "test-account-1234";
    
    private Account account = new Account(accountId);
    private AccountRepository mockAccountRepo = createMock(AccountRepository.class);
    private TransactionRepository mockTxRepo = createMock(TransactionRepository.class);
    private BankService service = new BankServiceImpl(mockAccountRepo, mockTxRepo);

    @Test
    public void testDepositHappyPath() throws Exception {
        final BigDecimal initialBalance = new BigDecimal("300.00");
        final BigDecimal depositAmount = new BigDecimal("200.00");
        final BigDecimal expectedBalance = initialBalance.add(depositAmount);
        final Transaction expectedTx = new Transaction(accountId, TransactionType.DEPOSIT, depositAmount);
        
        account.setBalance(initialBalance);

        expect(mockAccountRepo.find(accountId)).andReturn(account);
        mockAccountRepo.update(EasyMockSupport.eqBalance(expectedBalance));
        mockTxRepo.store(EasyMockSupport.eqTransaction(expectedTx));
        replay(mockAccountRepo, mockTxRepo);

        service.deposit(accountId, depositAmount);
        
        verify(mockAccountRepo, mockTxRepo);
    }
}

We're done, although to my eye it's cluttered: you don't immediately see what is being tested, because you're distracted by how it's being tested.

Time marches on, and another department introduces a fraud-prevention service. We need to integrate with it, and verify that we reject transactions when the fraud service throws an exception. This is a case where mock-object testing is at its best: rather than understand the conditions that trigger a fraud exception, we just need to tell the mock to create one:

@Test(expected=BankException.class)
public void testWithdrawalFraudFailure() throws Exception {
    final BigDecimal initialBalance = new BigDecimal("300.00");
    final BigDecimal withdrawalAmount = new BigDecimal("200.00");
    
    account.setBalance(initialBalance);
    
    expect(mockAccountRepo.find(accountId)).andReturn(account);
    mockFraudService.check(accountId, FraudService.TransactionType.WITHDRAWAL, withdrawalAmount);
    expectLastCall().andThrow(new FraudException(accountId, "test of fraudulent transaction"));
    replay(mockAccountRepo, mockTxRepo, mockFraudService);

    service.withdraw(accountId, withdrawalAmount);
    
    verify(mockAccountRepo, mockTxRepo, mockFraudService);
}

As I said, this is one of the true benefits of mocks, but it comes at a cost: we need to update all of our tests to include an expectation on the service. When version 2.0 of the fraud service gets released, with an all-new API that doesn't use exceptions, the update to our mainline code is simple; our test library, now consisting of dozens of tests, not so much.

So here you are, spending hours updating your tests, while your manager is wondering why the team “wastes so much effort” on something that “clearly adds little value.” Perhaps you start to feel the same way, especially after you've spent a few hours trying to understand a test that hides its purpose within a mass of expectations.

But what's the alternative? Do you throw away the tests and hope that everything just works?

In my opinion, the only valid test for a service-layer object is an integration test, backed by a real (albeit preferably in-memory) database. Once you get rid of the mocks, not only do your tests become less brittle, but they clearly demonstrate the behavior that you're trying to test.

@Test
public void testDepositHappyPath() throws Exception {
    final BigDecimal initialBalance = new BigDecimal("300.00");
    final BigDecimal depositAmount = new BigDecimal("200.00");

    final BigDecimal expectedBalance = initialBalance.add(depositAmount);
    final Transaction expectedTx = new Transaction(accountId, TransactionType.DEPOSIT, depositAmount);
    
    // create account as part of @Before
    List<Transaction> initialTxs = service.getTransactions(accountId);
    service.deposit(accountId, depositAmount);
    List<Transaction> finalTxs = service.getTransactions(accountId);
     
    assertEquals("balance reflects deposit", expectedBalance, service.getBalance(accountId));
    assertEquals("transaction added by deposit", initialTxs.size() + 1, finalTxs.size());
    assertEquals("correct transaction", expectedTx, finalTxs.get(finalTxs.size() - 1));
}

That's not to say that you need to wire together the entire system for every test. As I said above, the fraud service is a good use for a stand-in. But that stand-in doesn't need to be an expectation-setting mock, it can be a behavior-delivering stub.
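For illustration, here's roughly what such a stub could look like. This is a sketch, not code from the project above; it assumes that check() has the signature used in the earlier test and declares FraudException, and the reject-everything flag is my own invention:

public class StubFraudService implements FraudService {
    // tests that want the sad path flip this flag; everything else gets a permissive default
    private boolean rejectEverything;

    public void setRejectEverything(boolean rejectEverything) {
        this.rejectEverything = rejectEverything;
    }

    @Override
    public void check(String accountId, FraudService.TransactionType type, BigDecimal amount)
    throws FraudException {
        if (rejectEverything) {
            throw new FraudException(accountId, "stubbed rejection");
        }
        // otherwise do nothing: the transaction is allowed
    }
}

Tests that don't care about fraud checking never mention it; the one test that does flips the flag. And when the fraud service's API changes, only the stub has to change.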

Monday, August 13, 2012

How Annotations are Stored in the Classfile ... WTF?!?

This weekend I made some changes to BCELX, my library of enhancements for Apache BCEL. These changes were prompted not by a need to access different types of annotations, but because I'm currently working on a tool to find hidden and unnecessary dependency references in Maven projects. Why does this have anything to do with annotation processing? Read on.

Prior to JDK 1.5, the Java class file was a rather simple beast. Every referenced class had a CONSTANT_Class_info entry in the constant pool. This structure actually references another entry in the constant pool, which holds the actual class name, but BCEL provides the ConstantClass object so you don't have to chase this reference. It's very easy to find all the external classes that your program references: walk the constant pool and pull out the ConstantClass values.
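The walk itself is only a few lines. This is a sketch written against a recent BCEL release, so names may differ slightly from the 5.x version discussed in this post:

import java.util.Set;
import java.util.TreeSet;

import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.Constant;
import org.apache.bcel.classfile.ConstantClass;
import org.apache.bcel.classfile.ConstantPool;
import org.apache.bcel.classfile.JavaClass;

public class ReferencedClassLister {
    public static Set<String> listReferencedClasses(String classfilePath) throws Exception {
        JavaClass parsed = new ClassParser(classfilePath).parse();
        ConstantPool cp = parsed.getConstantPool();

        Set<String> classNames = new TreeSet<String>();
        for (Constant constant : cp.getConstantPool()) {
            // entry 0 (and the slot following a long/double entry) is null; instanceof skips them
            if (constant instanceof ConstantClass) {
                // the class entry points at a UTF-8 entry holding the internal name
                String internalName = ((ConstantClass)constant).getBytes(cp);
                classNames.add(internalName.replace('/', '.'));
            }
        }
        return classNames;
    }
}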

That functionality is exactly what I needed to cross-check project dependencies. But when I wrote a testcase to check my dependency-extraction method, it failed. I had used the test class itself as my target, and just by chance I picked the @Test annotation as one of my assertions. As far as my dependency-extraction code was concerned, I didn't have a reference to the annotation class.

I figured that there must be some flag in the CONSTANT_Class_info structure that was confusing BCEL — its released version hasn't been updated for JDK 1.5. So I turned to the JDK 1.5 classfile doc, and it slowly dawned on me: annotation classes aren't referenced in the constant pool. Instead, you have to walk through all of the annotation attributes, and get their names out of the constant pool. OK, I should have realized this sooner; after all, it wasn't so long ago that I'd written the annotation parsing code in BCELX.
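For reference, here is roughly what that extra walk looks like with an API that exposes annotation entries directly (a sketch against the BCEL 6.x classes rather than the 5.x-era code discussed here; BCELX filled that gap at the time):

import org.apache.bcel.classfile.AnnotationEntry;
import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;

public class AnnotationLister {
    public static void printAnnotationClasses(String classfilePath) throws Exception {
        JavaClass parsed = new ClassParser(classfilePath).parse();

        // class-level annotations: the type comes back as a descriptor, e.g. "Lorg/junit/Test;"
        for (AnnotationEntry entry : parsed.getAnnotationEntries()) {
            System.out.println(descriptorToClassName(entry.getAnnotationType()));
        }

        // method-level annotations require walking every method as well
        for (Method method : parsed.getMethods()) {
            for (AnnotationEntry entry : method.getAnnotationEntries()) {
                System.out.println(descriptorToClassName(entry.getAnnotationType()));
            }
        }
    }

    private static String descriptorToClassName(String descriptor) {
        // strip the leading "L" and trailing ";" and convert to a source-style name
        return descriptor.substring(1, descriptor.length() - 1).replace('/', '.');
    }
}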

Of course, this meant that I now had to add support for parameter and field-level annotations to BCELX (I was going to have to handle parameters anyway, to support another project). While doing this, I discovered something else interesting: the API docs say that you can apply annotations to packages and local variables, but the classfile docs give no indication that this is actually supported.

There are a couple of things that I take from this experience. The first is that it's another piece of evidence that JDK 1.5 represented a changing of the guard at Sun. Annotations have often had a “tacked on” feel to me — right down to the @interface keyword (they broke backwards compatibility for enum, would it have been so bad to add annotation?). I'm sure there was a reason for not treating an annotation reference as just another class, but quite frankly I can't see it.

The other thing I learned is to beware testcases built around spot checks. If I had written my testcase to look for org.junit.Assert rather than org.junit.Test, I never would have found the issue — until it turned up when using the utility. But there are lots of cases where exhaustive checks aren't cost-effective. Including this one: should I write a test that verifies every possible annotation reference? I'll need to, if I want 100% coverage, but really: it's a tiny part of the overall project.

One that could have been far easier if the JVM team had cleanly integrated their changes, and followed the existing model. I suppose that's the real take-away: if you're evolving a design, don't simply tack on the changes.

Saturday, October 1, 2011

The Role of Automated Tests

Automated testing is moving into the mainstream, adopted as a “best practice” by more companies each year. But why? Here are my reasons, originally intended as bullet points in a presentation on how to write tests.

Tests verify that the program behaves as expected

Let's get one thing out of the way up front: tests can find bugs, but they can't prove that no bugs exist. Or, as my friend Drew puts it: “tests can only show that your incorrect assumptions are internally consistent.”

However, as you increase test coverage, using well-designed tests, you gain confidence that the program will do what you want it to. In other words, that there aren't any obvious bugs. And unless you're writing code for the space shuttle, that's probably good enough.

Tests verify that the program continues to behave as expected when changed

The major portion of a program's development happens after it's released (80% is commonly quoted, but I couldn't find an authoritative reference). The bugs that got through testing will be found by end-users. Requirements will change, ranging from a simple UI facelift, through the addition of new business rules, to the deep structural changes needed to support increased load.

And when you change code, you risk breaking it. Usually in a place that you didn't think would be affected. Even in well-written code, there may be hidden side-effects. A test suite can protect you from the unintended consequences of change, provided again that it has complete coverage and well-designed tests. In my opinion, this is how automated tests provide the most value to the organization.

Of course, a test suite can also become part of a change. If your business rules change, then your tests have to change as well. This should be less of an issue at the level of “unit” tests, but it still happens. Unfortunately, many organizations consider such changes as an undesired cost. Instead, they should view them as a warning that the code may contain hidden dependencies on the old behavior, and budget extra time for release.

Tests serve as documentation

The idea of test-as-specification has long been part of Agile orthodoxy. Although, in practice, it can take a lot of work to make that happen with mainstream testing tools. I know that I've written more than my share of test methods with names like testOperation(). But if you have the discipline, a method named testFailureWhenArgumentsWouldCauseIntegerOverflow() is far more useful.

Tests give you a chance to think about your design

To me, this has always been the main benefit of testing: “if it's hard to test, it will be hard to use.” Of course, you can take this to an extreme: I have actually been asked by a traditional QA developer to store an application's internal data in comma-delimited format so that they could validate it (in that case, the binary format already took over 1GB, and was heavily optimized for access speed). While actively harming your design in the name of testability is foolish, it's not the common case.

More realistic is some code that I recently refactored: a single class that created a listener for external data, applied some business logic to the messages received, and sent messages based on that logic. As written, this code was impossible to test without instantiating the entire messaging framework. After refactoring, the business logic was in its own class, with separate listener and sender objects that could be mocked for testing. And that core business logic could now be tested in the form of a specification, with test names like testIgnoresThirdAndSubsequentDuplicateMessages().

Friday, June 24, 2011

Testing Boundaries

In Programming Pearls, Jon Bentley uses binary search as a case study of how difficult it can be to implement a seemingly simple algorithm (emphasis added).

I've assigned this problem in courses for professional programmers. The students had a couple of hours to convert the description above into a program in the language of their choice […] We would then [validate with] test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs

Bentley then goes on to develop an implementation using formal verification techniques: identifying an “invariant” condition, and proving that that invariant in fact does not vary during execution of the algorithm. Unfortunately, his code has a bug, one that found its way into the JDK.

While one can point to this bug as a refutation of formal correctness proofs,* I'm more interested in whether testing is a valid alternative. Bentley clearly didn't think so: “I wasn't always convinced of the correctness of the code in which no bugs were found.” His concern is reasonable: tests can only verify that a specific input leads to a specific output. It may be sheer coincidence that the code under test produces this output, and it may give incorrect results for different input.

Which means that a simple test, one that creates an array and searches for values in that array, isn't enough. It could be that your code works for an array that has an even number of elements, but fails for one that has an odd number. Or perhaps it works fine for all arrays with more than one element, but throws an exception if passed an empty array. And how do you know that you're really doing a binary search?** One obviously infeasible approach is to do exhaustive testing of all possible arrays.

An alternative is to adopt the “spirit of testivus,” let a tool generate random testcases for you, and assume that a sufficiently high number of tests means everything works. But from the perspective of “this is correct,” random tests are the same thing as arbitrary tests: they may find a bug, they may not; you'll never know if others exist.

A better approach is to adopt the core of Bentley's argument, which is to understand the inflection points of the algorithm: those places where the behavior potentially changes. For binary search, I started with the following:

  • 0: every method that involves indexing should have a test for a zero-length object (and in the case of methods using Strings, a test that passes null).
  • 1: This is the end-point of a binary search loop: you have a single element in your array, and either it's the correct value or it isn't.
  • 2, 3: These are the points at which a binary search divides the array and recurses (or loops).
  • 4: This is a superfluous test. I wrote it because I wanted to verify two passes through the divide-and-conquer logic, but then realized there was no point. Still, every test is sacred, so I checked it in.

If you remember high school algebra, you'll realize this is proof by induction. If you can demonstrate that you have exhaustively tested every array size through all inflection points, then you can reasonably say that it will return the expected value for larger sizes. Add some hook points that record each pass through the loop (this could be a Comparator that counts its invocations), and you can verify that it's an O(logN) search.
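In JUnit terms, those inflection-point tests look something like this — a sketch against a hypothetical binarySearch(int[], int) under development, using the convention that “not found” returns -1:

@Test
public void testZeroLengthArray() throws Exception {
    assertEquals(-1, binarySearch(new int[0], 17));
}

@Test
public void testSingleElementArray() throws Exception {
    assertEquals(0,  binarySearch(new int[] { 17 }, 17));
    assertEquals(-1, binarySearch(new int[] { 17 }, 23));
}

@Test
public void testTwoElementArray() throws Exception {
    int[] arr = new int[] { 17, 23 };
    assertEquals(0,  binarySearch(arr, 17));
    assertEquals(1,  binarySearch(arr, 23));
    assertEquals(-1, binarySearch(arr, 19));   // falls between the two elements
}

@Test
public void testThreeElementArray() throws Exception {
    int[] arr = new int[] { 17, 23, 31 };
    assertEquals(0,  binarySearch(arr, 17));
    assertEquals(1,  binarySearch(arr, 23));
    assertEquals(2,  binarySearch(arr, 31));
    assertEquals(-1, binarySearch(arr, 29));   // exercises both halves of the divide
}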

Except … my code doesn't actually test all inflection points. There's another inflection point at Integer.MAX_VALUE / 2 + 2: the possibility of integer overflow. And unless you have a 64-bit JVM and over 4 GB of physical memory, you have no way to test this case.
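You can, however, make that overflow impossible by construction. The fix that eventually went into the JDK replaces the textbook midpoint calculation with an unsigned shift (low + (high - low) / 2 works equally well); sketched here as a helper:

private static int midpoint(int low, int high) {
    // (low + high) / 2 goes negative once low + high exceeds Integer.MAX_VALUE;
    // the unsigned shift treats the sum as a 32-bit unsigned value, so the
    // result stays correct for any pair of valid array indexes
    return (low + high) >>> 1;
}

Which is, of course, an argument from reasoning about the code, not from a test that exercises it.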

I don't have the answer. Tests can develop a high level of confidence, but they can't make guarantees. And to develop that level of confidence, you essentially create a formal proof. But compared to the situation with no tests, that level of confidence may be enough.


* Something that I am all too willing to do: I have a degree in History rather than Computer Science because at nineteen years old I realized that there was a big difference between proof on paper and bytes in memory. But that's a topic for another posting.

** I believe that binary search, like quicksort, is an algorithm that cannot be implemented following the strict rules of test-driven development. Making your failing tests pass by doing “the simplest thing that could possibly work” will give you a linear search.

See www.youtube.com (NSFW if you don't have headphones, potentially offensive if you do).

Actually, you could test a binary search that works on arbitrary indexable objects rather than Java arrays: create an implementation that stores its data on disk. Although when Java's search methods were being developed (circa 1999), a typical workstation disk wouldn't have had the needed capacity.

Monday, June 20, 2011

Testing the Complete API

In my article on code coverage I was careful to stress that a coverage tool can only tell you what you haven't tested. It can't tell you if you're missing mainline code. Which made a recent bugfix to one of my open-source libraries even more embarrassing: the class had 100% coverage but was missing 7 characters of mainline code. And those seven characters were the difference between correct behavior and a bug that would manifest as non-deterministic behavior in consuming code.

When I have a bug, my first task is to write a unit test that exposes it (and demonstrates that I fixed it). Then I look in all similar code for the same bug (and in this case I found it). Finally, I think about how the bug came into existence, and more importantly, how come I didn't uncover it when I was first implementing the class and writing tests.

In this case, the answer was simple: the bug was in the single-byte read method of an InputStream implementation, and since I rarely use that method I got sloppy when writing tests for it. I wrote “happy path” tests that validated correct behavior, but did not try to trick the class into exhibiting incorrect behavior.

This experience has pushed me to think, again, about the role of unit tests. And it reminded me that there's a big difference between the way that a software developer thinks and the way that a good QA tester thinks. The former wants the code to work, the latter wants it to fail. Both mindsets are needed, but I'm not sure that any one person can hold both at the same time. And that's what's needed for unit tests to truly exercise code.

And I also wonder how such tests fit into the mindset of test-driven design. In my experience, TDD is very much focused on the happy path. Each test describes a small piece of the overall functionality, a “specification in code.” Can detailed edge-case tests contribute to that specification without cluttering it? And if yes, how does a developer switch mindsets to come up with them?

Friday, January 7, 2011

Mock Objects and Statistics Gathering

Unit testing tends to focus on the API. The unspoken assumption is that, if the API behaves as expected, then the internals are correct. However, that's not guaranteed; even Uncle Bob concedes that a pure test-driven approach will lead you to BubbleSort rather than QuickSort (warning: it's a bit of a rant). Clearly, the “black box” approach of testing just the API is not sufficient; some tests need a “white box” approach, where you look into the implementation. But how to do that?

One way is to add test hooks: public methods that give access to internal object state. I can see how this might work: you write all your mainline code in terms of interfaces, with factories to give you instances of those interfaces. And as mainstream software development embraces dependency injection frameworks, this might be the right answer. But it still leaves me with a queasy feeling. All it takes is one undisciplined programmer writing a cast to the implementation class so that s/he can use those methods, and you've got brittle code waiting to break.

I recently tried a different approach: using a mock object to gather statistics about the internals of the object under test. Ironically, it was a sort routine.

Mock objects have been a part of test-centric development for a while. There are many libraries that help you create mock objects, and it's easy to create your own using proxies. But most of the articles that I've read on mock-centric tests use them to track simple interactions: you want to verify that your class takes a particular action, so you inject a mock that asserts it occurred.

When testing my sort routine, I wanted to verify that the sort actually performed in NlogN time. It was a static method, so there was no place to add a test hook even if I believed in doing so. But there was one hook that was a natural part of the API:

public static class CountingIntComparator
implements Heapsort.IntComparator
{
    public int count;
    public int expectedCount;

    public CountingIntComparator(int size)
    {
        // our implementation of heapsort should perform at most 3 compares per element
        expectedCount = 3 * size * (int)Math.ceil(Math.log(size) / Math.log(2));
    }

    public int compare(int i1, int i2)
    {
        count++;
        return (i1 < i2) ? -1
             : (i1 > i2) ? 1
             : 0;
    }

    public void assertCompareCount()
    {
        assertTrue("expected < " + expectedCount + ", was " + count,
                   count < expectedCount);
    }
}

This works because “O(NlogN)” refers to the number of comparisons made by the sort. And it made for very simple test code (note that I needed a relatively large test array; this is a probabilistic test, and I didn't want the test to fail because the random number generator returned a string of “bad” values):

public void testIntSortManyElements() throws Exception
{
    final int size = 10000;

    int[] src = createRandomArray(size);
    int[] exp = createSortedCopy(src);

    CountingIntComparator cmp = new CountingIntComparator(size);
    Heapsort.sort(src, cmp);
    assertArrayEquals(exp, src);

    cmp.assertCompareCount();
}

The take-away from this, if there is one, is that mock objects can do more than simple interaction tests. They can gather statistics that give you insight into the long-term behavior of your objects, without the need to expose object state to the entire world.

The key seems to be whether or not the API that you're building is amenable to injecting such a mock without adding a test-specific variant. Perhaps it's a technique that only works for utility classes. But I'll be looking for similar opportunities in more complex code.

Friday, October 30, 2009

Tests are the Story

I have a recurring discussion with my friend James, regarding test-first development. He believes that writing tests first is like writing an outline for an academic paper or novel. And sometimes you don't know where the story is going to end up when you start writing. In the past, I've always countered that with the liberal-arts-major argument that you should know what you're going to say before you say it.

However, he makes a good point. Even when you have a goal in mind, sometimes the path to that goal takes unexpected twists and turns. If you write your tests presuming a specific order of steps, they'll become obsolete when the real story doesn't follow the outline.

So here's a different response: the tests are the story. The production code is merely a footnote.

The ultimate goal is satisfied users, and the tests are development stand-ins for those users (even if the “user” is another piece of code). As you write your tests, you should be thinking not of the needs of your mainline code, but the needs of those users. As your understanding of those users changes, your tests will change. Some will be deleted.

You could call this wasted effort, but that's misleading. It was necessary effort, because you didn't have a perfect understanding of the users' needs. The real question is: was it better to explore the plot using tests, or to simply write some mainline code and hope that it did what it should?

Monday, October 26, 2009

Unease

If you really don't care you aren't going to know it's wrong. The thought'll never occur to you. The act of pronouncing it wrong's a form of caring.

Zen and the Art of Motorcycle Maintenance is a book that you'll either love or hate. To some, it is a bombastic restatement of ideas that everyone already knows. To others, a lightweight gateway to heavyweight philosophy. And then there are people who believe it changed their life. I can't say that I'm one of the latter, but the book resonates with me on many levels, and I've reread it every few years since I was 15.

I recently paraphrased the above quote in a presentation on testing and coverage. The main point of my presentation was that 100% coverage does not guarantee sufficient testing; an audience member asked the obvious question “then how do you know that you've written enough tests?” My answer can be paraphrased as “you've written enough tests when you feel comfortable that you've written enough.”

Not a terribly good answer, to be honest. It did, however, set up my closing slide, which pointed out that you need committed, test-infected developers to get good tests. If you don't have that, you might as well buy a tool that throws random data at your code. But how does one arrive at the level of caring needed to write good tests? And is it inborn, or something that comes from experience?

I've been thinking about these questions this morning, because I'm refactoring some code and am left with what I consider an ugly method call: if called in one situation I want it to throw an exception, if called in another I want it to fail silently and return null. I've been puzzling over whether I really need to break the method into two parts, and also whether I should think more about another quote from Pirsig:

What's more common is that you feel unpeaceful even if it's right

Sometimes, perhaps, good enough is good enough.

Wednesday, September 2, 2009

Building a Wishlist Service: Testing

There are a lot of ways to test a web application: in-container testing using tools such as Cactus, full client-server testing using tools such as HttpUnit, out-of-container unit tests, and even manual tests.

I'm not a fan of any test approach that requires starting the application, because you won't want to do that on every build. So most of the tests for my service are unit tests, meant to run outside of the container. And although I knew that this would happen, I'm still amazed at how such tests push me toward decoupled code, even when I write the tests after the code.

The servlet's dispatch mechanism is a prime example. Going in, I knew that I wanted a RequestHandler class for each operation. My first pass used an interface, with a getRequestHandler() method inside the servlet class. However, the only way to test that method was to make it public or protected, and I'm not willing to violate encapsulation for testing. So RequestHandler became an abstract class, and getRequestHandler() became a static factory method. At the same time, I decided to instantiate a handler object per request, rather than reuse a table of singleton objects: the latter was easier to manage, but the former was easier to test.
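In outline, the shape is something like the sketch below. The operation name, the factory's parameter, and the handler body are invented for illustration; the wishlist service's actual handlers aren't shown in this post:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public abstract class RequestHandler {
    // each concrete handler implements exactly one operation
    public abstract void handle(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException;

    // the static factory can be tested without instantiating the servlet;
    // a new handler is created per request rather than reused
    public static RequestHandler getRequestHandler(String operation) {
        if ("getList".equals(operation)) {
            return new GetListHandler();
        }
        throw new IllegalArgumentException("unknown operation: " + operation);
    }

    // a placeholder operation, purely for illustration
    private static class GetListHandler extends RequestHandler {
        @Override
        public void handle(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
            response.setContentType("application/json");
            response.getWriter().write("[]");
        }
    }
}

Because the factory is just a static method taking ordinary arguments, it can be exercised with plain JUnit, with no servlet container in sight.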

Unit-testing servlet components means a lot of stubbing and mocking, and I decided to create a reflection proxy for the servlet request and response objects. I realize that there are existing mock implementations for these objects, but I figured that I wasn't using many of their methods and wanted to limit the use of third-party libraries (that's the topic of another post).
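A reflection proxy of that sort can be remarkably small. The sketch below is mine, not the article's code; it backs getParameter() with a map and fails loudly on anything else, which is part of what drives the observation in the next paragraph:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;

import javax.servlet.http.HttpServletRequest;

public class MockRequestFactory {
    public static HttpServletRequest create(final Map<String,String> parameters) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getName().equals("getParameter")) {
                    return parameters.get(args[0]);
                }
                // any other call is a signal that the code under test touches
                // more of the request than the test author expected
                throw new UnsupportedOperationException(method.getName());
            }
        };

        return (HttpServletRequest)Proxy.newProxyInstance(
                HttpServletRequest.class.getClassLoader(),
                new Class<?>[] { HttpServletRequest.class },
                handler);
    }
}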

And that led to another observation: writing your own mock objects tends to reduce the number of places you touch those objects. If I have to mock both getParameter() and getParameterValues(), I'm going to think about why I call both methods, and probably use just one of them. This should translate to a reduced chance of errors, in this case because I'll be cognizant of cases where there may be more than one parameter with the same name.

There's another effect that I'm seeing from writing for testability: I tend to use the Template Method pattern. A lot. Or perhaps it's simply a natural pattern for web services. I'll look at that more closely in the next post.

Thursday, June 11, 2009

Reinforcement

My postings on SkipList finish with observations and concerns, but no conclusions — no resolution to my concerns. Actually, when I originally posted the article, there was a concluding paragraph, but I decided that it deserved its own posting.

So, the concern: test-first-development, or even test-driven-design, is a great thing for experienced programmers. It provides a sandbox in which to explore ideas and design, with the safety — fearlessness, in the words of TDD proponents — to change code without breaking it. In the hands of an inexperienced programmer, however, I believe that it can turn into a blind alley, through a process in which small rational decisions lead to an irrational result. I can easily imagine the dead mother counter arising from TDD.

So what's the solution? Well, one simple approach is to have an experienced programmer look at the tests. Code reviews are an accepted part of many development processes, but they usually focus on the mainline code. While that's certainly important, it seems less so when you have high test coverage (as you will in a pure-TDD process). The tests, on the other hand, not only tell you whether the requirements were met, but give insight into the mind of the programmer. If you see a series of tests that don't make sense, it's a flag to look at the mainline code: there's probably something in it that doesn't make sense either.

Another benefit of code review for tests is that it can happen early and often. Mainline code reviews generally happen after the code is done, because that's when you can see the logic flow. Test reviews, however, can happen at any time: even if your mainline method/class is half-written, you should still have a bunch of tests for it.

Taken to the extreme (pun intended), you'll have a person sitting by your side and reviewing the tests as you write them: pair programming. I imagine that most readers are cringing after that last sentence. Programmers, because who wants someone looking over your shoulder, and managers, because pair programming is a great way to take two people and get the output of one.

But is it really? If your shop has code reviews as part of its process, have you ever considered how many person-hours go into one? As a case study from my own career, one shop scheduled 2 to 3 reviews a week, each with the presenter and 3 other developers. A review took about an hour of preparation and an hour of meeting time per participant, giving a total of 8 person-hours. And those reviews covered only the “important” code: perhaps 10% of the codebase.

So, 2-3 developer-days per week spent on code reviews, with more time spent by the reviewed developer to clean up any issues. Of course, you can do less rigorous reviews, but where's the value in that? It's too easy to end up with a review “process” that's nothing more than a formality: a rubber stamp, as overworked developers take at best a cursory look at the code under review. And if you're in that situation, the time is well and truly wasted.

Pair programming is a practice espoused by many Agile processes, in particular Extreme Programming (XP). Now, to be fair, I've never worked in a shop that adopted XP, so you might reasonably ignore what follows. However, I have followed many of the XP practices (including limited pair programming) at other shops, so am not completely talking through my hat.

And looking at the various practices — and also at shops that claim to practice Agile but never seem to reap any benefits — I've come to the conclusion that the real benefit of XP is that the various practices reinforce each other. OK, this isn't a great insight: it's right there on the front page of extremeprogramming.org: “The rules and practices must support each other.”

But that message seems to get lost. Indeed, if you click the “How do I start” link on that page, you'll see a suggestion that you can add a few practices at a time. This is not what I remember when I was first exposed to XP: at the time, the prevailing opinion was that you should start with all dozen or so practices, and eliminate practices only once your team had enough experience to do so with reason.

Reinforcement is important. It's too easy, when you see a deadline looming, to say “there's no need to write tests for this code, it's simple.” It's surprising how often that “simple” code ends up harboring a bug that doesn't turn up until you hit production. It's a lot harder to skip that test when the person sitting next to you raises his/her eyebrow and asks why.

Which brings me to what should be the strongest reinforcement of all: not disappointing your coworkers. But that's another posting.

Tuesday, June 9, 2009

TDD and the SkipList (part 3)

So here I am, with a SkipList implementation developed test-first. I have 100% test coverage (actually, more than 100% coverage), and didn't have to break encapsulation or pollute the class' interface to get it. Along the way, I wrote down some observations.

First: I remain convinced that test-first is the way to develop code, if only to ensure that you're doing what you think you are. In the case of the SkipList, this point was driven home early, when my tests caught a bug in the tier assignment method. It was a subtle bug: I knew that the size of the list determined the maximum tier for an element, so wrote the tier-assignment code to pass the current size into Random.nextInt(). My test, however, kept failing. When I looked at the actual counts I discovered they were skewed — the bottom tier had about half the elements it should, and they were distributed to higher tiers. Without the tests, I never would have discovered the problem, nor been able to experiment with a solution (which was to pass a power of two).

And that leads to my second observation: it's really hard to avoid preconceived design. Although I wrote the test first, I already knew (or thought I knew) how I was going to implement the method. A pure TDD approach would have been to write separate tests for a variety of list sizes, and perhaps that would have led me directly to the “needs to be a power of two” solution.

Third: TDD is best accomplished in a single session, at least at the scale of a single method or group of methods. This one surprised me, because I thought that the tests would read like a story, leading me back to the point I had stopped. In reality, I put the half-finished code aside while attending the TDD training, and when I picked it up again I puzzled over some of the decisions that I'd made along the way.

On reflection, the storybook metaphor is actually the right one: you can't expect to put a novel down for a week and jump right back into it (at least, I can't). I think this may be one of the reasons for the XP practice of deleting uncommitted code before going home for the night. It's faster to start from scratch.

However, this observation concerns me, because it has direct bearing on the “code as specification” benefit of TDD. I find that the purist approach of small, limited test cases actually hinders comprehension, because the knowledge of how an object works is spread over many tests. I tend to write larger tests, such as testElementCreation() that explore multiple facets of a single behavior. Regardless of test size, this observation highlights the need for test clarity.

Fourth observation: implementation tends to be a quantum behavior. There are points at which you make a dramatic step in functionality that's impossible to break into smaller pieces. Either it works as expected, or it doesn't work at all. The SkipListManager.insert() method is an example: although I have a plethora of test scenarios, the code was pretty much complete (if slightly buggy) after the first. The only way to avoid this is to test something other than a SkipList.

And this brings me to a point that Uncle Bob made during the training: “TDD won't give you quicksort.” That statement seems to invalidate the entire approach: if a minimal failing test won't get you closer to your goal, why do it? The answer, I think, is that dogma inculcates discipline, and the TDD purists recognize that the real route to better code is disciplined programmers. Test-driven development mandates discipline: you have to write the tests, and this will (hopefully) get you to thinking about the ways your code is used.

But this leads to my last observation: there were too many cases where the tests told me that I was fine, but my intuition said otherwise. I've developed that intuition over more than 30 years of writing code; many TDD proponents have been at it longer. An inexperienced programmer won't have that intuition, and could easily find him/herself at the end of a road with nowhere to turn.

Monday, June 8, 2009

TDD and the SkipList (part 2)

Several years have passed since my first attempt at implementing SkipList via TDD. I continue to do test-first development — for example, most of the Practical XML library was written in this manner. However, while I use tests to validate my design, and put me in the shoes of a person using the library, they don't drive that design.

For the SkipList, I tried very hard to take that next step, and let the tests show me where to go. I'm not sure that I was successful, and in some cases I didn't like the results. I won't blame you for skipping this post: it has a close resemblance to a fishing show (“yes, Billy-Bob, he's definitely got something on the hook, so let's watch him reel it in”). Not for everyone, but it does lay the groundwork for my next post.

In returning to the SkipList, I recognized that I would need a way to test the implementation separate from the public API. Java gave me a solution to the packaging issue: rather than using multiple top-level classes, or interface and implementation, I would use nested static classes. Everything lives in one source file. The choice of nested classes, rather than inner, was intentional: if TDD was going to drive me to independent collaborators, I wanted them to be truly independent.

The first question was where to draw the line between interface and implementation? I took what appeared to be the simplest: create a SkipListElement that knew only how to talk to its neighbors. The SkipList itself would know about the first element. Conceptually, this is how any linked list can be implemented, and it's very easy to test. In practice, of course, implementing a normal linked list with self-aware elements is a good way to blow up your stack from the recursive search operation (unless your language optimizes tail recursion). For SkipList, with its log2 access paths, it seemed like a workable idea.

As it turned out, while the tests remained simple, the implementation quickly became very complex. The find() method was particularly so: it had to work its way up through the tiers of links and then back down. After a lot of massaging, it turned into a half-dozen lines of code that still appeared to have O(logN) behavior. This exercise, however, was more test-first than test-driven: I made a lot of changes to the code, but only after pondering why the test was still failing; the test remained relatively unchanged.

Moreover, at this point my intuition was screaming “warning, Will Robinson!” but the tests remained silent. Looking at the tests, I realized that I wasn't actually validating that my search properly traversed the list, merely that it found the expected element. So I wrote a new test, one that recorded each invocation of the find() method, and discovered that I wasn't properly maintaining high-tier links. Even though I'd written a nice recursive method, I was doing a linear search.

Would a “pure” approach to TDD have made me write this test? On the one hand, I was validating core behavior; on the other, I already had a green bar, 100% coverage, and lots of tests that showed me what I expected to see.

As I looked at the code and thought about what would be required to make it work, I came to the conclusion that I had walked down a cul de sac. So, in good XP fashion, I deleted everything and started over.

In my next attempt I introduced another nested class, SkipListManager, and went with having SkipListElement be a dumb data container. Well, almost a dumb data container: I made the decision that it should know about its neighbors.

This led to an interesting situation: to write my tests, I wanted to be able to connect two elements together at multiple tiers, so wrote an append() method to do so. As I moved forward with the implementation, I found that method leading me into a corner: because it existed, I wrote my manager to use it, and ended up with a findAndRemember() operation that was almost, but not quite the same as the find() operation. I couldn't find a good way out without a third object, a “find iterator.”

At this point, I debated whether or not I really needed the manager object. Its sole reason for existence was to provide a back door for testing; giving the list protected methods would do the same. This was a rehash of my original concerns about TDD: protected methods are part of the interface of a class, and can never be removed. I decided that I didn't want to pollute SkipList just to support testing.

As I was thinking about the manager-element interaction, I decided that the best way back was to delete a large chunk of code (unlike the former attempt, I wasn't willing to throw it all away). My tests certainly helped with this: I deleted the chunks of code that I didn't like, and my IDE told me what tests depended on them. The code that remained was still covered by tests, still worked, and still represented a design that I liked.

I dove back in, and again ran into trouble inserting elements. Once I had appended an element in a tier, it prevented me from properly traversing the lower tiers. Score one for the tests: they turned up this problem right away. I still needed to run in a debugger to figure out what was happening, but the tests gave me a controlled environment in which to walk the code.

The test for the manager's insert() function came in at nearly 100 lines of code, as I validated how different element values would work their way through the chains. Although I wrote this test one section at a time, I suspect a TDD purist would look at the length of the method and cringe (it was actually quite a bit longer, until I refactored the assertions). The TDD approach would be to break each of these scenarios into separate methods, but that seems like a lot of needless duplication to me: I would have to repeat code to configure the list for each test. By combining these tests I additionally validate that I'm not damaging other portions of the list. I will say, though, that my variable naming conventions left something to be desired.

The final hurdle came from remove(). After getting the manager to properly handle insertions, I started working through the public API in alphabetical order. When I hit removal, I realized belatedly that I just wasn't going to get away with a singly-linked list. Should I have seen this coming? Probably yes, given that my element-only approach ran into a similar dead end. But it raises a very good point: with TDD, the order of tests matters. On the positive side, I was able to insert the backlinks by breaking one test at a time, and I was still testing the forward links.

Are you ready for “the codez”? You'll find the list class here, and the test class here. The name “SkipList1” should give you a hint that I'm still not happy with the implementation, but it's time to ship (ie, get this post up on the blog). Over the next few weeks I'll be tinkering with the implementation (I have an idea that will take me back to a forward-linked list), and once I'm happy with it (if ever), will post a link to the final code.

Next up: my thoughts about this experiment, and TDD in general.

Friday, May 29, 2009

TDD Training Debrief

I'm going to interrupt the postings on SkipList, because the TDD class is over. Did I learn anything?

I came into class thinking that I had a fairly good understanding of TDD, albeit as a skeptic that it could directly drive design. I left thinking the same thing. I'm sure there were some pieces of information that I picked up without realizing — it's impossible for that not to happen if you have a good instructor. And I was exposed to Fitnesse, a tool that will probably be more useful in a future career stop. But I came in test-infected, left test-infected, and didn't have a revelation in the middle. My thoughts may not have changed, but the class pushed me to think deeply about the subject again, and test those thoughts.

Where I thought the class was tremendously valuable was in interacting with Uncle Bob. He's a lot more pragmatic in person, admitting that there are places where test-driven design falls down (eg, TDD will give you bubble-sort, not quicksort). I can understand this difference in personality: written words have to stand on their own for all time, presentations are marketing (of ideas, if not of self), while the classroom is a place for discussion.

The biggest revelation, however, came from observing my fellow students. At any given time, about half of them were working on other things: responding to email, reviewing defects, writing code for their current projects, and generally ignoring what was happening around them. And to be honest, during the last example of the first day I joined them: a minor crisis had appeared in my inbox over lunch, and I decided that it was more important to respond to that crisis than to work through the example — after all, I already knew this stuff, right?

Uncle Bob finished the class with a talk on what he saw as the future of TDD. He used an interesting example: in the 1970s there was a lot of argument about the value of structured programming, yet today every mainstream language uses structured constructs — “goto” is no longer considered harmful, because it no longer exists. His opinion is that 30 years from now we won't be discussing TDD either, because it will be standard practice.

I think there's a flaw in this reasoning: computer languages are created by language designers, made whole by compiler writers, and are in effect handed down from on high for the rest of us. Even Fortran and Cobol have evolved: while there may be dusty decks filled with gotos, code written yesterday (and there is some) uses structured constructs. TDD won't follow that path, unless Eiffel becomes a mainstream language, because of the commitment that it requires.

Anything that requires commitment also has costs: what do I have to invest to travel this path (direct cost), and what do I have to give up (opportunity cost)? In an economics classroom, choices are easy: you add the direct and opportunity costs, and take the low-cost option. In real life, it's often hard to identify those costs — most people consider staying in the same place to have zero cost. And given that, there's no reason to change.

I can't remember when I became test-infected: I know that I was writing test programs to isolate defects some 20 years ago. As I read the testing literature that started to appear around 1999, a lightbulb went on in my head, and I tried writing my mainline code with tests. And learned that no, it didn't take me any longer, and yes, I found problems very early. In other words, that TDD was in fact the low-cost option.

Tuesday, May 26, 2009

TDD and the SkipList (part 1)

I'm not a purist when it comes to test-driven development, although I do believe that tests validate design as well as code. My mantra is “if it's hard to test, it will be hard to use” (which is disturbingly close to “if you can dodge a wrench, you can dodge a ball”). But the purist approach, as exemplified by “Uncle Bob” Martin, is that writing your tests first will actually lead to a better design — the term they use is “emergent design.”

The Bowling Game is Uncle Bob's canonical example. While Big Design Up Front (BDUF) looks at the real world and posits the need for objects such as Frame, TDD approaches from the simple goal of scoring the game, and ends up with a single Scorer class. A neat demonstration, but the immediate response is that it's trivial, and you're not likely to be writing a scorekeeping application for your bowling league. What about real applications?

Last year I attended one of Uncle Bob's presentations on clean code. As an example of code needing cleanup, he used excerpts from his own Fitnesse tool. And while I have a great deal of respect for people who point out the flaws in their own work, I had to ask: “didn't you develop this using TDD?” His answer was yes, but that TDD doesn't guarantee clean code. Not an answer to give a warm fuzzy feeling, particularly since most of the unclean bits simply needed better factoring — which is supposed to be TDD's sweet spot.

But my real break with TDD orthodoxy came when I tried to implement a SkipList using pure TDD. For those who didn't follow the link, a SkipList is a data structure that holds its elements according to their natural ordering, and provides O(logN) access. It does this with multiple tiers of linked lists: every element is linked into the bottom tier, and each higher tier has roughly half the elements of the tier beneath. I say “roughly,” because elements are randomly assigned to tiers (following a log2 distribution). To find an element, you start at the topmost tier and work your way right and down until you get to the bottom tier and run into an element whose value is larger than the one you seek.
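To make that concrete, here is a minimal sketch of the node layout and the search. It is illustrative only, and not the implementation (with its nested element and manager classes) developed in these posts:

public class SkipListSketch {
    // a node knows its value and its successor in each tier it participates in
    static class Node<T extends Comparable<T>> {
        T value;
        Node<T>[] next;    // next[i] is this node's successor in tier i

        @SuppressWarnings("unchecked")
        Node(T value, int tiers) {
            this.value = value;
            this.next = new Node[tiers];
        }
    }

    // search starts from a sentinel head node that spans every tier and holds no value
    static <T extends Comparable<T>> boolean contains(Node<T> head, T value) {
        Node<T> current = head;
        for (int tier = head.next.length - 1; tier >= 0; tier--) {
            // move right while the next element in this tier is still smaller ...
            while (current.next[tier] != null && current.next[tier].value.compareTo(value) < 0) {
                current = current.next[tier];
            }
            // ... then drop down a tier and repeat
        }
        Node<T> candidate = current.next[0];
        return candidate != null && candidate.value.compareTo(value) == 0;
    }
}

The search visits only a few nodes per tier, and there are roughly log2(N) tiers, which is where the O(logN) behavior comes from.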

A SkipList doesn't implement the java.util.List API, because it doesn't preserve the insertion order of its elements. It does, however, implement the java.util.Collection API, and so I approached it through this public API, doing the simplest thing that could possibly work to make the tests pass. However, this wasn't going to get me to a SkipList: I could make all the tests pass using an ArrayList with linear search and insert. While I could actually implement a SkipList back end, I would break the rule of writing only enough code to make a single test pass. Clearly, testing the external interface wasn't enough; I had to look inside the implementation.

I posted this quandary in a comment to one of Uncle Bob's blog entries several years ago, and received the astounding (to me) reply that yes, I should open up the class and look inside it. My reply was perhaps a bit more sarcastic than needed, but I felt my point was valid: isn't encapsulation one of the reasons that we use object-oriented languages? Perhaps that explained some of the ugly code in Fitnesse.

One way to preserve encapsulation while still enabling white box testing is to program to interfaces: SkipList becomes a public interface, implemented by SkipListImpl. Mainline code uses the interface, test code uses the implementation. Of course, this implies the existence of a factory, because you need some way to create instances. Is this really what we want for a utility class?

An interesting side-effect of programming to interfaces is that we now need two tiers of tests: one that works with interfaces, one that works with the implementation classes. To me, this seems to be a move away from test-driven development, into the realm of testing for validation.

Back to the SkipList: if I wanted to develop the core linked-list code via TDD, I was going to have to extract that code and test it separately. This is one of the claimed benefits of TDD: it leads you to small, well-factored classes. But again, is this really what we want for a utility class? Or, to rephrase, do we want a library that exposes a lot of public classes used by only one consumer? My inner database developer is wary of 1:1 relationships.

I never saved my first attempt, so this weekend I started fresh; my next post will look at what TDD gave me. I decided to do this, not because I particularly need a SkipList, but because today I start a three-day class on TDD taught by Uncle Bob. I doubt there will be time in the class to do an actual implementation, but I plan to throw it out as a question. I'd like to see if I'm missing something or if the TDD purists are.

Sunday, February 1, 2009

TDD Isn't Just QA

I enjoy reading Joel On Software, and have started listening to the podcasts that he does with Jeff Atwood. It's a great distraction from my commute, particularly when Joel says something that's way over Jeff's head and Jeff goes off on a completely different topic. In general I agree with Joel's positions, but in the past few weeks he's been slamming test-driven development and programming to interfaces.

In last week's podcast, about 40 minutes in, Joel gives an example of what he would consider excessive testing: a feature that will change the compression level used by the Fog Creek Copilot application. And his argument was that testing that feature would require actually comparing images under the different compression schemes, which was an enormous amount of work to simply verify that you can click a button.

At this point I realized that one of us doesn't understand test-driven development. And I don't think that it's me.

I will give Joel credit for one thing: he gave a good example of how not to write a unit test.

First, because what he's describing is an integration test. And I'll agree, there's usually little return from writing a test to demonstrate that clicking a single button changes the overall flow of a program in a predetermined way. Particularly if your approach to such a test involves validating the output of a library that you do not control.

But why does he need to do an end-to-end test to verify this one button?

I suspect it's because he doesn't partition the program's logic across a series of interfaces. Instead, clicking that button sets off a chain of explicit function calls that ultimately reaches to the other end of a TCP connection. And this is where the discipline of real unit tests comes in.

One of the underlying themes of test-driven development: if it's hard to test, it will be hard to use and maintain. Maybe not today, maybe not in the next revision, but at some point all of those strands of untestable code will wrap themselves into a tangled mess. And one way to make code easy to test is to create boundary layers in your application: the two sides of a boundary are easier to test than the whole.

For Joel's example, my first thought is that such a boundary could be created by introducing an object to hold configuration data. On one side of this boundary, the GUI sets values in the configuration object; on the other, the internals use those values to configure the actual libraries. Introducing that single object will improve testability, even without unit tests.
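A sketch of that boundary object, with invented names (the actual Copilot code obviously isn't available here):

public class ConnectionConfig {
    private int compressionLevel = 5;

    public int getCompressionLevel() {
        return compressionLevel;
    }

    public void setCompressionLevel(int compressionLevel) {
        // a log line (or a registered listener) is all the GUI-side test needs
        // to see that clicking the button actually landed here
        System.out.println("compressionLevel set to " + compressionLevel);
        this.compressionLevel = compressionLevel;
    }
}

The GUI writes into this object; the connection code reads from it; and each side can now be exercised without the other.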

How is the application tested today? I'm guessing that the tester sets a few configuration options, makes a connection, and visually inspects the result. This accomplishes two things: verification that the GUI controls are actually hooked up to something, and verification that the controls have some effect. But why should the former be tied to the latter?

Once you have a configuration object, GUI testing is simply a matter of adding logging statements to that object. You click a button, you should see the appropriate log message. While you could automate those tests, there's no reason to do so. Without the overhead of actually making a connection, any rational number of controls can be manually validated in a matter of minutes.

The other side of the boundary is a different matter: rather than force your tester to manually inspect the results of each configuration change, write automated tests that do it for you. Different compression ratios and color depths can be verified using a single test image, with assertions written in code (does a higher compression ratio result in a smaller output? good, the test passed). There's no need to do low-level validation of the library, unless you don't believe that its authors did enough testing (in which case you have other issues).
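In code, that assertion is about as simple as tests get. This sketch assumes a Compressor facade over whatever imaging library is actually in use, and a single test image checked in alongside the tests:

@Test
public void testHigherCompressionProducesSmallerOutput() throws Exception {
    byte[] testImage = loadTestImage();    // the checked-in reference image

    byte[] lightlyCompressed = Compressor.compress(testImage, 1);
    byte[] heavilyCompressed = Compressor.compress(testImage, 9);

    assertTrue("higher compression setting should produce smaller output",
               heavilyCompressed.length < lightlyCompressed.length);
}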

But wait, what if that compression code is part of the communications stack?!? OK, introduce another boundary layer, one that separates data processing from data transmission. And this brings me to my biggest complaint with Jeff and Joel: they seem to feel that testing is only important when you're writing an API.

In my opinion, if you're doing object-oriented programming correctly, you are writing an API. Objects interact with each other through their interfaces: you give them certain inputs, and they produce defined outputs. This doesn't mean that every object needs to have MyIntf and MyImpl classes. It does mean that you need to define a contract that describes how those inputs and outputs are related.

By writing unit tests — true unit tests, not integration tests in disguise — you force yourself to define that contract. Your goal is not to write tests for their own sake, or even for the sake of future quality assurance, but to tell you when the interface meets your requirements.