blog.kdgregory.com: refactoring

Showing posts with label refactoring.

Tuesday, July 19, 2011

Remaining Relevant

Yesterday I republished one of the articles on my website, a how-to guide on dealing with memory problems. A few weeks ago I'd been working on a memory leak in a large app, and decided that the section on heap histograms should be expanded. Once I started editing, however, the changes just kept coming. I found some places that were unclear, some that were too 32-bit-centric, and some things that were just plain wrong. I think the only section that remained unchanged was the one on permgen. The structure of the rest of the article remained the same, but almost all of the text is different, and the article doubled in size.

There are a couple of ways that I could look at this. The first, more positive way, is that I've learned a lot about debugging memory errors in the two years since I first published the article. Except … I really haven't. And the tools haven't changed that much in the interim either (although 64-bit machines are becoming ubiquitous). And after twenty-five or so years of writing professionally, I'm not convinced that I've suddenly become better at explaining technical topics.

I think that the answer is that there's always more depth, more niches to explore. But most of them are pointless. I could spend pages on “bugs that I have known,” and the only result would be that the 241 people that Google Analytics says read this blog regularly would stop. So plumbing the depths isn't the right approach.

And yet … the articles on my website are supposed to be compendiums of everything that I know on a topic. Seeing how much that one article changed has me worried that I should go through all the others and at least review them. After all, even Knuth had second editions. And I know there are some things that I'd like to change.

But against that is the philosophy of “good enough,” also known as “ship it!” I could spend hours wordsmithing, trying to get each sentence just right. But I don't think that time would make the underlying message any different. Once you've reached the point of proper grammar and logical sentence structure, you've reached a point of diminishing returns. Taking the next step may be valid if you're looking for a Pulitzer, but they don't give out Pulitzers for technical writing.

Plus, there's a whole backlog of new things to write about.

Tuesday, November 10, 2009

Refactoring: Evil, or Just Misunderstood

How did refactoring get such a bad reputation? Depending on who you talk to, it introduces bugs, makes projects more expensive, breaks unit tests, and helped the Yankees win the World Series. Now, as a Boston native and Philly resident, I hardly want to help the Yankees, but it seems that claim is as overdone as the others.

Let's start with increased project cost. In my prior job, I often had to create estimates for changes to our current codebase. After working through an estimate I would usually have a conversation with the Business Owner that went like this:

BO: You have an estimate of five days for these page changes; can you give a breakdown?

Me: Sure. These are pages that do essentially the same thing, but have about 3,000 lines of code each. Refactoring accounts for four days, and the new additions take a day.

Usually, I was interrupted in the middle of the word “refactoring,” with a loud “We can't afford that!” And while I understood the reaction, the alternative estimate was for eight days, allowing the developers time to figure out where to make yet another set of copy-and-paste changes to those enormous, similar-but-not-the-same JSPs.

In my view, refactoring is a tool to reduce duplication and reduce the amount of code that a developer has to keep in his or her head at one time. This viewpoint is derived largely from the use of the word “factor” in mathematics: the number 12, for example, has the factors 4 and 6, which can be reduced to prime factors 2, 2, and 3. Once you reduce to prime factors, you discover duplication.

In software, duplicate code may not be so obvious. Unless, of course, that software was built using “copy and paste coding,” with short deadlines and no refactoring during development. In that case, it becomes just as obvious as the prime factors of 12. Eliminating duplication reduces the amount of code that has to be maintained and changed, consequently reducing the number of places that bugs can hide. The result should therefore be reduced implementation costs (and in the places where I got away with page refactoring, using the technique described here, those cost reductions did indeed appear in future projects).

Which brings up the second complaint: refactoring introduces bugs. I suppose that's true: every time you edit code, you run the risk of introducing a bug. And if you write bug-free code that never has to be changed, then I fully agree: don't refactor it. For myself, although I've occasionally written (non-trivial) code that is apparently bug-free, I've always had to change it at some point down the road as the business requirements changed.

And as I see it, well-factored code should actually reduce bugs! Consider the following code, something that you might see scattered throughout a web application:

String operation = request.getParameter("operation");
if ((operation != null) && (operation.length() > 0))
{
    // do something with the operation value
}

I look at this code and see two immediate refactorings: replacing the literal value with a constant, and replacing the if condition with a method call like Jakarta Commons' StringUtils.isNotEmpty(). These refactorings would immediately reduce the chance of errors due to typos — including a typo that left out the null check (yes, I've seen this in production code). True, good code would be written like this from the start, but again, that's not code that needs refactoring.
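As a rough sketch of those two refactorings, the snippet might end up looking something like the following. The class name, the PARAM_OPERATION constant, and hasOperation() are hypothetical names chosen for illustration. A Map stands in for the servlet request, and the local isNotEmpty() is a stand-in for the Commons Lang method so that the example compiles without any libraries.

```java
import java.util.Map;

class OperationParams
{
    // Hypothetical constant; replaces the "operation" literal scattered through the app.
    static final String PARAM_OPERATION = "operation";

    // Local stand-in for Commons Lang's StringUtils.isNotEmpty(), so the sketch
    // compiles without the library on the classpath.
    static boolean isNotEmpty(String value)
    {
        return (value != null) && (value.length() > 0);
    }

    // A Map stands in for the servlet request here; params.get() plays the
    // role of request.getParameter().
    static boolean hasOperation(Map<String, String> params)
    {
        return isNotEmpty(params.get(PARAM_OPERATION));
    }
}
```

With the check in one place, a forgotten null test or a misspelled parameter name can only happen once, rather than in every copy of the if.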

If I were feeling ambitious (and if this were part of a large method or scriptlet I would almost certainly feel that way), I would extract the entire if operation into its own method, with a descriptive name like retrieveOperationData(). Over the long term, this will make the code easier to maintain, because developers can simply skip the implementation if they don't care about operation data. And they're less likely to put some unrelated code inside the if.
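One possible shape for that extraction is sketched below. Only the name retrieveOperationData() comes from the text above; the enclosing class, the render() caller, the Map standing in for the request, and the trim() call standing in for "do something with the operation value" are all assumptions made to keep the example self-contained.

```java
import java.util.Map;

class OperationPage
{
    static final String PARAM_OPERATION = "operation";

    // The caller now reads as a sequence of named steps; a reader who does not
    // care about operation data can skip the helper entirely.
    static String render(Map<String, String> params)
    {
        String operationData = retrieveOperationData(params);
        return (operationData == null) ? "no operation" : "operation: " + operationData;
    }

    // Everything that used to live inside the if. Returns null when the
    // parameter is missing or empty.
    static String retrieveOperationData(Map<String, String> params)
    {
        String operation = params.get(PARAM_OPERATION);
        if ((operation != null) && (operation.length() > 0))
        {
            // placeholder for "do something with the operation value"
            return operation.trim();
        }
        return null;
    }
}
```

Because the helper only sees what is passed to it, code inside the old if can no longer quietly touch unrelated local variables of the caller; if it tried to, the extraction would fail to compile.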

But what if my changes introduced a bug? What if the code inside that if already mucked with some unrelated variable? Assuming that the code would actually compile with such a bug, I'd expect my tests to catch it — and if they didn't, I'd write a new test once a person reported the bug.

Which brings up the last complaint (and the one that prompted this article): refactoring breaks unit tests. Before taking on this point, I want to quote Martin Fowler, from the preface to Refactoring:

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.

If you don't alter the external behavior of the system, how do you break tests? Well, one obvious way is to write tests that depend on the internal structure. It's particularly easy to do this when you use mock-object testing, or when you decompose large objects into smaller “helper” objects. If you've unintentionally made the tests dependent on internal structure, then you fix your tests, and hopefully won't repeat the mistake. But a breaking test might indicate that your mainline code has similar dependencies, in which case it would be a Good Thing to eliminate those dependencies everywhere.
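To make the distinction concrete, here is a small, hypothetical example (none of these classes come from a real project). GreeterBefore and GreeterAfter are the same code before and after extracting a private helper, and the check in GreeterCheck looks only at inputs and outputs, so it cannot tell the two apart.

```java
import java.util.function.Function;

// Original version: all logic inline.
class GreeterBefore
{
    static String greet(String name)
    {
        if ((name != null) && (name.length() > 0))
        {
            return "Hello, " + name;
        }
        return "Hello, stranger";
    }
}

// Refactored version: same external behavior, different internal structure.
class GreeterAfter
{
    static String greet(String name)
    {
        return "Hello, " + displayName(name);
    }

    private static String displayName(String name)
    {
        return ((name != null) && (name.length() > 0)) ? name : "stranger";
    }
}

// A check written purely against external behavior: it never mentions
// displayName(), so extracting that helper cannot break it.
class GreeterCheck
{
    static boolean behavesAsExpected(Function<String, String> greet)
    {
        return "Hello, Ann".equals(greet.apply("Ann"))
            && "Hello, stranger".equals(greet.apply(null))
            && "Hello, stranger".equals(greet.apply(""));
    }
}
```

A test that instead asserted that displayName() was called, say through a mock, would be tied to the internal structure and would break the moment the helper was renamed or inlined again.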

A more likely cause, however, is that you're not really refactoring per the definition above, but restructuring: changing the external behavior of the code as well as its internal structure. In mathematical terms, you've decided that rather than 12, you really want 14, which happens to factor into 2 and 7 and can't be reduced further.

While you might have a perfectly valid reason for wanting this, you can't blame the tests for continuing to assert 12. Instead, ask yourself why you once thought 12 — your original design — was correct, and why it's no longer correct. That's the topic of a future post.

Monday, October 26, 2009

Unease

If you really don't care you aren't going to know it's wrong. The thought'll never occur to you. The act of pronouncing it wrong's a form of caring.

Zen and the Art of Motorcycle Maintenance is a book that you'll either love or hate. To some, it is a bombastic restatement of ideas that everyone already knows. To others, a lightweight gateway to heavyweight philosophy. And then there are people who believe it changed their life. I can't say that I'm one of the latter, but the book resonates with me on many levels, and I've reread it every few years since I was 15.

I recently paraphrased the above quote in a presentation on testing and coverage. The main point of my presentation was that 100% coverage does not guarantee sufficient testing; an audience member asked the obvious question “then how do you know that you've written enough tests?” My answer can be paraphrased as “you've written enough tests when you feel comfortable that you've written enough.”

Not a terribly good answer, to be honest. It did, however, set up my closing slide, which pointed out that you need committed, test-infected developers to get good tests. If you don't have that, you might as well buy a tool that throws random data at your code. But how does one arrive at the level of caring needed to write good tests? And is it inborn, or something that comes from experience?

I've been thinking about these questions this morning, because I'm refactoring some code and am left with what I consider an ugly method call: if called in one situation I want it to throw an exception, if called in another I want it to fail silently and return null. I've been puzzling over whether I really need to break the method into two parts, and also whether I should think more about another quote from Pirsig:

What's more common is that you feel unpeaceful even if it's right

Sometimes, perhaps, good enough is good enough.
