Thursday, October 20, 2011

Defensive Copies are a Code Smell

This is another posting prompted by a Stack Overflow question. The idea of a defensive copy is simple: you have a method that returns some piece of your object's state, but don't want the caller to be able to mutate it. For example, String.toCharArray():

public char[] toCharArray() {
    char result[] = new char[count];
    getChars(0, count, result, 0);
    return result;
}

If you simply returned the string's internal array, then the caller could change the contents of that array and violate String's guarantee of immutability. Creating a new array preserves the guarantee.

This technique seems to be a good idea in general: it ensures that the only way to change an object's state is via the methods that the object exposes. This in turn allows you to reason about the places where an object can change, and will make it easier to identify bugs caused by changing data. There's even a FindBugs check for code that exposes its internal state this way (along with a related case, where an object maintains a reference to mutable data that was passed to it).

But are defensive copies really useful in practice?

The core argument in favor seems to be that you can't trust your fellow programmers. In some cases, this is reasonable: security-related classes, for example, should never blindly accept or hand out pieces of their internal state. And in a large organization (or open-source library), it's unlikely that other programmers will understand or care about your intended use of an object — especially if they can save a few lines of code by using it in an unexpected way.

As an argument against, every defensive copy consumes memory and CPU time. String.toCharArray() is a perfect example of this, particularly with large strings, which may be copied directly into the tenured generation. If a programmer blindly calls this method within a loop, it's quite possible for the garbage collector to eat up most of your CPU.

Moreover, there's almost always a better solution. Again using String.toCharArray() as an example, why do you need the character array? I would guess that 99% of the time, the reason is to iterate over the characters. However, String.charAt() will do the same thing without a copy (and Hotspot should be smart enough to inline the array reference). And you should be calling String.codePointAt() anyway, to properly handle Unicode characters outside the Basic Multilingual Plane.

That's all well and good for strings, but what about your application objects. Continuing the theme of “there's a better way,” I ask: why are your objects providing access to their internal state?

One of the principles of object-oriented programming is the Law of Demeter, which holds that collaborating objects should not know anything about each others internal state. The goal of Demeter — just like defensive copies — is to allow you to reason about your objects and their interactions within the application. But it also drives your design toward action: rather than simply holding data, an object should do something with that data. To me, this is what separates object-oriented programming from procedural programming.

Of course, as with any law, there are times when Demeter can and should be broken (for example, data transfer objects). But before breaking the law, think about the consequences.

No comments: