Tuesday, November 10, 2009

Refactoring: Evil, or Just Misunderstood?

How did refactoring get such a bad reputation? Depending on who you talk to, it introduces bugs, makes projects more expensive, breaks unit tests, and helped the Yankees win the World Series. Now, as a Boston native and Philly resident, I hardly want to help the Yankees, but it seems that claim is as overdone as the others.

Let's start with increased project cost. In my prior job, I often had to create estimates for changes to our current codebase. After working through an estimate I would usually have a conversation with the Business Owner that went like this:

BO: You have an estimate of five days for these page changes; can you give a breakdown?

Me: Sure. These are pages that do essentially the same thing, but have about 3,000 lines of code each. Refactoring accounts for four days, and the new additions take a day.

Usually, I was interrupted in the middle of the word “refactoring,” with a loud “We can't afford that!” And while I understood the reaction, the alternate estimate was for eight days, allowing the developers time to figure out where to make yet another set of copy-and-paste changes to those enormous, similar-but-not-the-same JSPs.

In my view, refactoring is a tool to reduce duplication and reduce the amount of code that a developer has to keep in his or her head at one time. This viewpoint is derived largely from the use of the word “factor” in mathematics: the number 12, for example, has the factors 2 and 6 (or 3 and 4), which reduce to the prime factors 2, 2, and 3. Once you reduce to prime factors, you discover duplication.

In software, duplicate code may not be so obvious. Unless, of course, that software was built using “copy and paste coding,” with short deadlines and no refactoring during development. In that case, it becomes just as obvious as the prime factors of 12. Eliminating duplication reduces the amount of code that has to be maintained and changed, consequently reducing the number of places that bugs can hide. The result should therefore be reduced implementation costs (and in the places where I got away with page refactoring, using the technique described here, those cost reductions did indeed appear in future projects).
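To make that concrete, here is a minimal sketch of factoring out copy-and-paste duplication. The names (ReportFormatter, formatUserRow, formatAdminRow) are invented for illustration; imagine that the two callers originally each contained the full string-building logic, differing only in the label:

```java
// Hypothetical example: two near-duplicate methods collapsed into one.
public class ReportFormatter {

    // Before refactoring, formatUserRow() and formatAdminRow() each
    // contained this concatenation logic, with only the label differing.
    // After factoring out the common code, each caller supplies only
    // the part that varies.
    public static String formatRow(String label, String name, int count) {
        return label + ": " + name + " (" + count + ")";
    }

    public static String formatUserRow(String name, int count) {
        return formatRow("User", name, count);
    }

    public static String formatAdminRow(String name, int count) {
        return formatRow("Admin", name, count);
    }
}
```

A bug fix or format change now happens in one place rather than two (or, in the 3,000-line JSP case, dozens).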

Which brings up the second complaint: refactoring introduces bugs. I suppose that's true: every time you edit code, you run the risk of introducing a bug. And if you write bug-free code that never has to be changed, then I fully agree: don't refactor it. For myself, although I've occasionally written (non-trivial) code that is apparently bug-free, I've always had to change it at some point down the road as the business requirements changed.

And as I see it, well-factored code should actually reduce bugs! Consider the following code, something that you might see scattered throughout a web application:

String operation = request.getParameter("operation");
if ((operation != null) && (operation.length() > 0))
{
    // do something with the operation value
}

I look at this code and see two immediate refactorings: replacing the literal value with a constant, and replacing the if condition with a method call like Jakarta Commons' StringUtils.isNotEmpty(). These refactorings would immediately reduce the chance of errors due to typos — including a typo that left out the null check (yes, I've seen this in production code). True, good code would be written like this from the start, but again, that's not code that needs refactoring.
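Here is a sketch of what those two refactorings might look like. To keep the example self-contained, a Map stands in for HttpServletRequest, and a local isNotEmpty() helper mirrors the behavior of Commons Lang's StringUtils.isNotEmpty() so the code runs without the Jakarta Commons jar:

```java
import java.util.Map;

public class OperationParam {

    // Replacing the literal with a constant turns a typo into a compile
    // error, rather than a silent lookup of a nonexistent parameter.
    public static final String PARAM_OPERATION = "operation";

    // Local stand-in for Commons Lang's StringUtils.isNotEmpty(); note
    // that the null check can no longer be accidentally left out at the
    // call site.
    public static boolean isNotEmpty(String s) {
        return (s != null) && (s.length() > 0);
    }

    // A Map stands in for HttpServletRequest.getParameter() here.
    public static String getOperation(Map<String, String> params) {
        String operation = params.get(PARAM_OPERATION);
        if (isNotEmpty(operation)) {
            return operation;
        }
        return null;
    }
}
```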

If I were feeling ambitious — and if this were part of a large method or scriptlet I would almost certainly feel that way — I would extract the entire if block into its own method, with a descriptive name like retrieveOperationData(). Over the long term, this will make the code easier to maintain, because developers can simply skip the implementation if they don't care about operation data. And they're less likely to put some unrelated code inside the if.
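A minimal sketch of that extraction, again with a Map standing in for the servlet request; the body of the method is invented, since the original snippet only says "do something with the operation value":

```java
import java.util.Map;

public class RequestHandler {

    // Hypothetical extraction: the "if" block pulled into a method whose
    // name says what it does. The normalization inside is invented so
    // there is some observable behavior.
    public static String retrieveOperationData(Map<String, String> params) {
        String operation = params.get("operation");
        if ((operation != null) && (operation.length() > 0)) {
            return operation.trim().toLowerCase();
        }
        return null;
    }

    // The caller shrinks to a single descriptive line: a maintainer who
    // doesn't care about operation data can skip right past it.
    public static String handle(Map<String, String> params) {
        return retrieveOperationData(params);
    }
}
```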

But what if my changes introduced a bug? What if the code inside that if already mucked with some unrelated variable? Assuming that the code would actually compile with such a bug, I'd expect my tests to catch it — and if they didn't, I'd write a new test once a person reported the bug.

Which brings up the last complaint (and the one that prompted this article): refactoring breaks unit tests. Before taking on this point, I want to quote Martin Fowler, from the preface to Refactoring:

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.

If you don't alter the external behavior of the system, how do you break tests? Well, one obvious way is to write tests that depend on the internal structure. It's particularly easy to do this when you use mock-object testing, or when you decompose large objects into smaller “helper” objects. If you've unintentionally made the tests dependent on internal structure, then you fix your tests, and hopefully won't repeat the mistake. But a breaking test might indicate that your mainline code has similar dependencies, in which case it would be a Good Thing to eliminate those dependencies everywhere.
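The mock-object trap can be sketched with a hand-rolled recording fake (all names here are invented). The first check asserts internal structure — which collaborator got called, and how often — and will break under a refactoring that caches or inlines the formatting; the second asserts only external behavior and survives any true refactoring:

```java
import java.util.ArrayList;
import java.util.List;

public class BrittleTestDemo {

    interface Formatter {
        String format(String s);
    }

    // A hand-rolled mock that records every call it receives.
    static class RecordingFormatter implements Formatter {
        final List<String> calls = new ArrayList<>();
        public String format(String s) {
            calls.add(s);
            return s.toUpperCase();
        }
    }

    // Code under test: delegates to a Formatter collaborator.
    static String greet(String name, Formatter f) {
        return "Hello, " + f.format(name);
    }

    // Structure-dependent check: asserts how the collaborator was used,
    // so it fails if greet() is refactored, even with unchanged output.
    static boolean internalStructureHeld(RecordingFormatter f) {
        return f.calls.size() == 1;
    }

    // Behavior-dependent check: asserts only the external result.
    static boolean behaviorHeld(String result) {
        return result.equals("Hello, WORLD");
    }
}
```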

A more likely cause, however, is that you're not really refactoring per the definition above, but restructuring: changing the external behavior of the code as well as its internal structure. In mathematical terms, you've decided that rather than 12, you really want 14, which happens to factor into 2 and 7 and can't be reduced further.

While you might have a perfectly valid reason for wanting this, you can't blame the tests for continuing to assert 12. Instead, ask yourself why you once thought 12 — your original design — was correct, and why it's no longer correct. That's the topic of a future post.
