Monday, January 25, 2010

The Urge to Optimize

I've a Servlet filter which performs the following type cast [...] What is the performance impact of such a cast? Is it worth to accept this performance degradation for a better architecture?

That's an exact quote from an online Q&A forum. The first response starts with an incredulous “As compared to handling a HTTP request?” (emphasis as written). To an experienced programmer, the questioner's concern is ridiculous: a cast takes nanoseconds, processing a request takes milliseconds; the difference is six orders of magnitude.
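The gap is easy to demonstrate. Here is a minimal sketch (not a rigorous benchmark; it ignores JIT warm-up and dead-code elimination, and the class and method names are my own) that times repeated checked downcasts:

```java
// CastCost: rough timing of a checked Java downcast, (String) Object.
// This is an illustration, not a proper benchmark harness.
public class CastCost {

    // Returns the average nanoseconds per cast over `iterations` casts.
    static double nanosPerCast(int iterations) {
        Object[] pool = new Object[64];
        for (int i = 0; i < pool.length; i++) {
            pool[i] = "item-" + i; // stored as Object, cast back below
        }
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            String s = (String) pool[i & 63]; // the cast in question
            checksum += s.length();          // use the result so the loop stays live
        }
        long elapsed = System.nanoTime() - start;
        if (checksum == 0) {
            throw new AssertionError("loop was optimized away");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        System.out.printf("~%.2f ns per cast%n", nanosPerCast(10_000_000));
    }
}
```

On typical hardware this reports somewhere in the low single-digit nanoseconds per cast, against the roughly millisecond cost of servicing an HTTP request.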

But questions like this aren't rare: on the same day, another user wanted to know the “exact” CPU difference between allocating byte arrays in Java and pooling them. Another was concerned about the performance of unsigned versus signed integers in C++. And yet another wondered how best to translate a Python program (not a module, a program) into C, “so that it can be smaller (and faster).”

Most of the answers to such questions involve platitudes about premature optimization; a few suggest profiling. Both are valid answers, and, more important, well-publicized ones: a Google search on “software profiling” returns 5½ million results. And yet the micro-optimization questions keep coming. Why?

Part of the answer, I think, is that there are also an enormous number of pages devoted to “software bloat” — and an equally large fear that one's own software is bloated. Every culture has its creation myths, and The Story of Mel is a powerful myth for the software industry. Who wouldn't want to be Mel, a person who knew his machine so well that he laid out instructions according to the characteristics of its memory device?

I was one of the “new generation of programmers” that was the audience for that story; I started my professional career at about the same time the story was posted. At the time, I was doing a lot of low-level programming: one job involved inserting NOPs into an 8086 interrupt handler to ensure real-time response characteristics (real-time means predictable, not necessarily fast). Another job involved porting a windowing system that provided Macintosh-level capabilities for the Apple IIe.

To understand the latter project, recognize that this was 1984: the Macintosh had most of its windowing library in ROM, yet developers still struggled to fit a complex program (and user data) into its 128k of RAM. The Apple IIe had 64k of memory, which would have to hold both the windowing library and a real program. The person who developed the library was perhaps the equal of Mel. I was nowhere close, and just porting the library was at the limits of my ability. I came into the project thinking I understood how to cram code into limited memory; I left with a much better understanding of the things I knew, and vague hints that there was a whole realm that I didn't.

I don't remember much of it now. With multiple gigabytes of RAM and megabytes of L2 cache, it just doesn't seem to matter. But one thing that I do remember is that it had nothing whatsoever to do with debates over the performance implications of a cast. You can find that lesson in the Story of Mel, if you look closely enough, yet somehow it always seems to be missed.
