Saturday, May 15, 2010

Cleverness

Delta kitchen faucets have a very clever quick-release attachment for their sprayer hose. It uses a couple of pieces of plastic and a spring clip to hold the hose onto a special fitting on the underside of the faucet, with a bushing to make a water-tight seal. To install the hose, all you have to do is slide it over the fitting until you hear it click into place. I'm sure that I appreciated that cleverness five or six years ago when I installed the faucet; it's always painful to deal with threaded fittings under the sink.

Unfortunately, I don't remember much about installing the faucet (trauma can have that effect). So when the hose started leaking a couple of weeks ago, I picked up a generic replacement hose at Home Depot, grabbed a flashlight and basin wrench, and threaded my body around the garbage disposal. And spent nearly half an hour trying to figure out how to remove the hose, with its non-standard attachment mechanism. And then spent another hour driving to Home Depot to find an adapter, only to be told that they didn't have one and I'd have to call Delta. I dreaded this call, but ended up with a very pleasant support rep, who put in an order for a free replacement.

When the new hose arrived, it only took a few minutes to install. Yet I can't help thinking that this special fitting actually cost me several hours, versus the fifteen minutes for a standard threaded fitting. Not to mention a week's worth of annoyance, wiping up leaked water every time I used the faucet.

The lesson for software development should be obvious: clever tricks can save a lot of time in the short term. But unless you remember how they work, they'll probably cost you — or someone else on your team — more in the future.

Thursday, April 15, 2010

intern() and equals()

A lot of programmers believe that String.intern() can be used to speed up if-else chains. So instead of writing this:

public void dispatch(String command)
{
    if (command.equals("open"))
        doOpenAction();
    else if (command.equals("close"))
        doCloseAction();
    else if (command.equals("calculate"))
        doCalculateAction();
    // and so on
}

they write this:

public void dispatch(String command)
{
    command = command.intern();
    if (command == "open")
        doOpenAction();
    else if (command == "close")
        doCloseAction();
    else if (command == "calculate")
        doCalculateAction();
    // and so on
}

Their logic seems to be that it's faster to compare references (in effect, integers) than to compare strings. And while the only way to know for sure is a benchmark, I'm willing to bet that the intern() approach is actually slower (except, of course, for carefully constructed examples). The reason is that string comparisons are quite fast, while intern() has to do a lot of work behind the scenes.

Let's start with a string comparison, since it's available in the src.zip file that comes with the JDK (in this case, JDK 1.6):

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = count;
            if (n == anotherString.count) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = offset;
                int j = anotherString.offset;
                while (n-- != 0) {
                    if (v1[i++] != v2[j++])
                        return false;
                }
                return true;
            }
        }
        return false;
    }

Like all good equals() implementations, this method tests identity equality first. Which means that, intern() or not, there's no reason to get into the bad habit of using == to compare strings (because sooner or later you'll do that with a string that isn't interned, and have a bug to track down).

The next comparison is length: if two strings have different lengths, they can't be equal. Finally, the method iterates through the actual characters of the two strings. Most calls should return from the length check, and in almost all real-world cases the strings will differ in the first few characters. This isn't going to take much time, particularly if the JVM decides to inline the code.

How about intern()? The source code for the JDK is available, and if you trace through method calls, you end up at symbolTable.cpp, which defines the StringTable class. Which is, unsurprisingly, a hash table.

All hash lookups work the same way: first you compute a hashcode, then you use that hashcode to probe a table to find a list of values, then you compare the actual value against every value in that list. Computing the hashcode means reading every character of the string, so in the example above, hashing any of the expected strings will take about as many memory accesses as all of the comparisons in the if-else chain.
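
Those three steps can be sketched in a few lines of Java. To be clear, this is a toy chained hash table written for illustration, not the JVM's StringTable (which is C++); the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy chained hash table of strings. Hypothetical, for illustration only;
// it is not how the JVM's StringTable is actually written.
class ToyStringTable {
    private final List<List<String>> buckets = new ArrayList<>();

    ToyStringTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<String>());
        }
    }

    // Returns the stored instance equal to s, adding s if there is none yet.
    String lookupOrAdd(String s) {
        // Step 1: compute the hashcode (this reads every character of s).
        int hash = s.hashCode();
        // Step 2: use the hashcode to pick a bucket.
        List<String> bucket = buckets.get((hash & 0x7fffffff) % buckets.size());
        // Step 3: compare the actual value against every value in the bucket.
        for (String candidate : bucket) {
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        bucket.add(s);
        return s;
    }
}
```

Note that step 3 ends up calling equals() anyway, on top of the work done in steps 1 and 2.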

So if intern() doesn't speed up comparisons, what is it good for? The answer is that it conserves memory.

A lot of programs read strings from files and then hold those strings in memory. For example, an XML parser builds up a DOM tree in which each Element instance holds its name, namespace, and perhaps a map of attributes. There's usually a lot of duplication in this data: think of the number of times that <div> and href appear in a typical XHTML file. And since you've read these names from a file, they come into your program as separate String instances; nothing is shared. By passing them through intern(), you're able to eliminate the duplication and hold one canonical instance.
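
A small demonstration of that duplication, and of what intern() does about it. The readName() helper is hypothetical; it just stands in for whatever code builds a String from bytes or characters read from a file.

```java
class InternDemo {
    // Hypothetical stand-in for a parser: every call creates a new String instance.
    static String readName(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] raw = {'d', 'i', 'v'};
        String first = readName(raw);
        String second = readName(raw);

        System.out.println(first.equals(second));               // true: same contents
        System.out.println(first == second);                    // false: two separate instances
        System.out.println(first.intern() == second.intern());  // true: one canonical instance
    }
}
```

If the program keeps only the interned references, the duplicate instances become garbage and can be collected.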

The one drawback to String.intern() is that it stores data in the permgen (at least for the Sun JVM), which is often a scarce resource (particularly in app-servers). To get the benefits of canonicalization without risking an OutOfMemoryError, you can create your own canonicalizing map.
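
Here is a minimal sketch of one way such a map could look, assuming Java 7 or later for the diamond syntax. The class name is made up, and this is one possible design rather than the only one: it uses a WeakHashMap so that strings nobody else references can still be garbage collected, and it wraps the value in a WeakReference because a strong reference from the value would keep the key alive forever.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical canonicalizing map that lives on the regular heap.
class StringCanonicalizer {
    // Key and value are the same String; both are held weakly.
    private final Map<String, WeakReference<String>> map = new WeakHashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists.
    synchronized String canonicalize(String s) {
        WeakReference<String> ref = map.get(s);
        String canonical = (ref != null) ? ref.get() : null;
        if (canonical == null) {
            map.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```

The method is synchronized for simplicity; a program that canonicalizes from many threads might want something finer-grained.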

And if you'd like to replace long if-else chains with something more efficient, think about a Map of functors.
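
A rough sketch of that idea, assuming Java 8 or later so that method references can serve as the functors (on an older JDK you'd use anonymous Runnable classes). The doXxxAction() bodies and the lastAction field are placeholders, and unlike the original dispatch() this version returns a boolean so the caller can tell whether the command was recognized.

```java
import java.util.HashMap;
import java.util.Map;

class CommandDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Placeholder: records which action ran, purely for demonstration.
    String lastAction = "";

    CommandDispatcher() {
        actions.put("open", this::doOpenAction);
        actions.put("close", this::doCloseAction);
        actions.put("calculate", this::doCalculateAction);
    }

    // One hash lookup replaces the whole if-else chain.
    // Returns true if a handler was registered for the command and was run.
    boolean dispatch(String command) {
        Runnable action = actions.get(command);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    void doOpenAction()      { lastAction = "open"; }
    void doCloseAction()     { lastAction = "close"; }
    void doCalculateAction() { lastAction = "calculate"; }
}
```

Adding a command is then a matter of adding one entry to the map, and the lookup cost stays roughly constant no matter how many commands there are.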

Tuesday, April 13, 2010

Book Review: Coders at Work / Founders at Work

I just finished reading Coders at Work. I received it as an early Christmas present, and while I've read several other books in the interim, getting to the end was a struggle. This is in sharp contrast to its companion volume, Founders at Work, which I bought last summer and read over the course of a week.

Both books consist of interviews with more-or-less well-known people. In Founders, these range from Steve Wozniak, talking about early days at Apple, to James Hong, who found himself dealing with the viral growth of Hot or Not. In Coders, the interviews range from Simon Peyton Jones (a co-creator of Haskell) to Donald Knuth (who should need no introduction). All of whom have fascinating histories.

So why did I like one book and not the other? At first I thought it was because I was more familiar with the programmers' stories, particularly those who entered the field at the same time I did. In comparison, the founders' stories were new to me: the challenges of dealing with a viral website, the hunt to find funding. Those stories seemed particularly relevant to me at the time, given that I had just left my full-time job with the thought of founding a software business.

But as I plowed through Coders, I realized that the difference was in the interviewers, not the interviewed. Jessica Livingston, the author of Founders, seemed to let her interviewees go wherever they wished: Woz, for example, took three pages to describe how he got the Apple floppy drive to work. Peter Seibel, by comparison, seemed to have a set list of questions, and forced each interviewee to respond to those questions. At one point I thought of turning “literate programming” into a drinking game, but realized I would be too drunk to ever finish the book.

This approach not only made the interviews unfocused, it made them long. If you put the two books side by side, they appear to be the same size. That's misleading: Founders is 466 pages, while Coders is 617. More important, the former has 33 interviews, the latter only 15. It's easy to read 15 pages in a sitting, but you have to plan for 40+ — or put the book down halfway through and try to regain context when you pick it up.

Bottom line: if you want to learn about the history of our industry, both books are a good choice. If you want to enjoy the process of learning, stick with Founders.