Thursday, October 20, 2011

Defensive Copies are a Code Smell

This is another posting prompted by a Stack Overflow question. The idea of a defensive copy is simple: you have a method that returns some piece of your object's state, but don't want the caller to be able to mutate it. For example, String.toCharArray():

public char[] toCharArray() {
    char result[] = new char[count];
    getChars(0, count, result, 0);
    return result;
}

If you simply returned the string's internal array, then the caller could change the contents of that array and violate String's guarantee of immutability. Creating a new array preserves the guarantee.

This technique seems to be a good idea in general: it ensures that the only way to change an object's state is via the methods that the object exposes. This in turn allows you to reason about the places where an object can change, and will make it easier to identify bugs caused by changing data. There's even a FindBugs check for code that exposes its internal state this way (along with a related case, where an object maintains a reference to mutable data that was passed to it).
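
In application code, the same idea looks something like this hypothetical Schedule class (the name and fields are invented for illustration): it copies a mutable list both on the way in and on the way out, so the only path to changing its state is through its own methods.

import java.util.ArrayList;
import java.util.List;

// Hypothetical example of defensive copying in an application class:
// copy the caller's list on the way in, and hand out a copy on the way out.
public class Schedule {
    private final List<String> entries;

    public Schedule(List<String> entries) {
        this.entries = new ArrayList<String>(entries);   // defensive copy in
    }

    public List<String> getEntries() {
        return new ArrayList<String>(entries);           // defensive copy out
    }

    public void addEntry(String entry) {
        entries.add(entry);
    }
}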

But are defensive copies really useful in practice?

The core argument in favor seems to be that you can't trust your fellow programmers. In some cases, this is reasonable: security-related classes, for example, should never blindly accept or hand out pieces of their internal state. And in a large organization (or open-source library), it's unlikely that other programmers will understand or care about your intended use of an object — especially if they can save a few lines of code by using it in an unexpected way.

As an argument against, every defensive copy consumes memory and CPU time. String.toCharArray() is a perfect example of this, particularly with large strings, which may be copied directly into the tenured generation. If a programmer blindly calls this method within a loop, it's quite possible for the garbage collector to eat up most of your CPU.

Moreover, there's almost always a better solution. Again using String.toCharArray() as an example, why do you need the character array? I would guess that 99% of the time, the reason is to iterate over the characters. However, String.charAt() will do the same thing without a copy (and Hotspot should be smart enough to inline the array reference). And you should be calling String.codePointAt() anyway, to properly handle Unicode characters outside the Basic Multilingual Plane.
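
To make that concrete, here is a small standalone sketch (the sample string is made up) that iterates over a string without ever copying its backing array:

public class CharIteration {
    public static void main(String[] args) {
        String s = "caf\u00E9 \uD83D\uDE00";   // includes a character outside the BMP

        // charAt() walks the string without a defensive copy
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.print(c);
        }
        System.out.println();

        // codePointAt() correctly handles characters outside the BMP
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.print(Character.toChars(cp));
            i += Character.charCount(cp);       // advance by one or two char positions
        }
        System.out.println();
    }
}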

That's all well and good for strings, but what about your application objects? Continuing the theme of “there's a better way,” I ask: why are your objects providing access to their internal state?

One of the principles of object-oriented programming is the Law of Demeter, which holds that collaborating objects should not know anything about each other's internal state. The goal of Demeter — just like defensive copies — is to allow you to reason about your objects and their interactions within the application. But it also drives your design toward action: rather than simply holding data, an object should do something with that data. To me, this is what separates object-oriented programming from procedural programming.
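
As a hypothetical illustration (the Account class and its methods are invented), compare handing out a balance for callers to inspect and manipulate with simply telling the object to act on it:

import java.math.BigDecimal;

// Hypothetical: rather than exposing its balance for callers to copy and
// manipulate, the account performs the operation and enforces its own rules.
public class Account {
    private BigDecimal balance = BigDecimal.ZERO;

    public void deposit(BigDecimal amount) {
        balance = balance.add(amount);
    }

    public void withdraw(BigDecimal amount) {
        if (balance.compareTo(amount) < 0) {
            throw new IllegalStateException("insufficient funds");
        }
        balance = balance.subtract(amount);
    }
}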

Of course, as with any law, there are times when Demeter can and should be broken (for example, data transfer objects). But before breaking the law, think about the consequences.

Saturday, October 1, 2011

The Role of Automated Tests

Automated testing is moving into the mainstream, adopted as a “best practice” by more companies each year. But why? Here are my reasons, originally intended as bullet points in a presentation on how to write tests.

Tests verify that the program behaves as expected

Let's get one thing out of the way up front: tests can find bugs, but they can't prove that no bugs exist. Or, as my friend Drew puts it: “tests can only show that your incorrect assumptions are internally consistent.”

However, as you increase test coverage, using well-designed tests, you gain confidence that the program will do what you want it to. In other words, that there aren't any obvious bugs. And unless you're writing code for the space shuttle, that's probably good enough.

Tests verify that the program continues to behave as expected when changed

The major portion of a program's development happens after it's released (80% is commonly quoted, but I couldn't find an authoritative reference). The bugs that got through testing will be found by end-users. Requirements will change, ranging from a simple UI facelift, through the addition of new business rules, to the deep structural changes needed to support increased load.

And when you change code, you risk breaking it. Usually in a place that you didn't think would be affected. Even in well-written code, there may be hidden side-effects. A test suite can protect you from the unintended consequences of change, provided again that it has complete coverage and well-designed tests. In my opinion, this is how automated tests provide the most value to the organization.

Of course, a test suite can also become part of a change. If your business rules change, then your tests have to change as well. This should be less of an issue at the level of “unit” tests, but it still happens. Unfortunately, many organizations treat such changes as an undesired cost. Instead, they should view them as a warning that the code may contain hidden dependencies on the old behavior, and budget extra time for the release.

Tests serve as documentation

The idea of test-as-specification has long been part of Agile orthodoxy, although in practice it can take a lot of work to make that happen with mainstream testing tools. I know that I've written more than my share of test methods with names like testOperation(). But if you have the discipline, a method named testFailureWhenArgumentsWouldCauseIntegerOverflow() is far more useful.
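
Here's roughly what that looks like in practice (a hypothetical JUnit 4 sketch; the Calculator class and its add() method are invented for illustration):

import static org.junit.Assert.fail;
import org.junit.Test;

public class CalculatorTest {
    // The method name documents the expected behavior, so a reader can treat
    // the suite as a specification without digging into the assertions.
    @Test
    public void testFailureWhenArgumentsWouldCauseIntegerOverflow() {
        try {
            new Calculator().add(Integer.MAX_VALUE, 1);
            fail("expected ArithmeticException");
        } catch (ArithmeticException expected) {
            // success: overflow was rejected rather than silently wrapping
        }
    }
}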

Tests give you a chance to think about your design

To me, this has always been the main benefit of testing: “if it's hard to test, it will be hard to use.” Of course, you can take this to an extreme: I have actually been asked by a traditional QA developer to store an application's internal data in comma-delimited format so that they could validate it (in that case, the binary format already took over 1GB, and was heavily optimized for access speed). While actively harming your design in the name of testability is foolish, it's not the common case.

More realistic is some code that I recently refactored: a single class that created a listener for external data, applied some business logic to the messages received, and sent messages based on that logic. As written, this code was impossible to test without instantiating the entire messaging framework. After refactoring, the business logic was in its own class, with separate listener and sender objects that could be mocked for testing. And that core business logic could now be tested in the form of a specification, with test names like testIgnoresThirdAndSubsequentDuplicateMessages().
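
The refactored shape was roughly the following (a simplified, hypothetical sketch; the names and the exact duplicate-handling rule are invented, not the production code). Because the business logic depends only on small interfaces, a test can feed it messages directly and assert against a mock sender.

// Hypothetical interfaces standing in for the messaging framework, so the
// business logic can be exercised with simple mock implementations.
interface MessageSender {
    void send(String message);
}

interface MessageHandler {
    void onMessage(String message);
}

// The core business logic has no framework dependency: a test calls
// onMessage() directly and checks what was passed to a mock MessageSender.
public class DuplicateFilter implements MessageHandler {
    private final MessageSender sender;
    private final java.util.Map<String, Integer> counts =
            new java.util.HashMap<String, Integer>();

    public DuplicateFilter(MessageSender sender) {
        this.sender = sender;
    }

    public void onMessage(String message) {
        Integer seen = counts.get(message);
        int count = (seen == null) ? 1 : seen + 1;
        counts.put(message, count);
        if (count <= 2) {
            // forward only the first occurrences; later duplicates are
            // ignored (the exact rule here is illustrative)
            sender.send(message);
        }
    }
}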