Friday, January 29, 2010

Micro-Optimization is Easy and Feels Good

Have you ever seen code like this?

private String foo(String arg1, int arg2)
{
    StringBuilder buf = new StringBuilder();
    buf.append(arg1).append(" = ").append(arg2);
    return buf.toString();
}

If you ask the person who wrote this why s/he used a StringBuilder rather than simply concatenating the strings, you'll probably hear a long lecture about how it's more efficient. If you're a bytecode geek, you can respond by demonstrating that the compiler generates exactly the same code. But then the argument changes: it's more efficient in some cases, and never less efficient, so it's still a Good Thing. At that point it's usually easiest to (snarkily) say “well, you should at least pre-allocate a reasonable buffer size,” and walk away.
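
Don't take my word for it: compile both versions and run javap -c on them. The concatenating version below (my stand-in for “simply concatenating the strings”) gets rewritten by the javac of this era into the same sort of StringBuilder/append/toString sequence as the hand-written version above.

private String foo(String arg1, int arg2)
{
    // javac turns this concatenation into StringBuilder.append() calls;
    // compare the output of javap -c for this method and the one above
    return arg1 + " = " + arg2;
}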

These arguments are not caused by the time quantum fallacy. The StringBuilder fan recognizes that different operations take differing amounts of time, and is actively trying to minimize the overall time of his/her code. Never mind that, in context, string concatenation is rarely a significant time sink. The rationale is always “it can't hurt, and might help.”

This is the same attitude that drives curbside recycling. It feels good to put an aluminum can in the bin: the can can be melted down and reused, saving much of the energy needed to smelt the aluminum from ore. Doing so, however, avoids the bigger questions of whether aluminum is the most environmentally sound packaging material in the first place, or whether you should even be drinking the contents of that can.

In the same way, micro-optimizations let you feel good while avoiding the bigger question of what parts of your code should be optimized. Experienced programmers know that not all code should be optimized: there's an 80-20 (or more likely 90-10) rule at work. But finding the 10% that should be optimized is, quite frankly, hard.

It's not that we don't have tools: profilers of one form or another have been around since before I started working with computers. Java has had a built-in profiler since at least JDK 1.2, and a good profiler since (I think) 1.4. For that matter, some carefully placed logging statements can give you a good idea of what parts of your code take the longest (just be sure to use a logging guard so that you're not wasting CPU cycles!).
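
In case “logging guard” is unfamiliar, here's the idiom I mean, sketched with log4j (the class and the work being timed are invented for the example):

import org.apache.log4j.Logger;

public class RequestTimer
{
    private static final Logger logger = Logger.getLogger(RequestTimer.class);

    public void process(String requestId)
    {
        long start = System.currentTimeMillis();
        handle(requestId);
        long elapsed = System.currentTimeMillis() - start;

        // the guard: the log message is built only when DEBUG is actually enabled
        if (logger.isDebugEnabled())
        {
            logger.debug("handled " + requestId + " in " + elapsed + " ms");
        }
    }

    private void handle(String requestId)
    {
        // stand-in for the real work being timed
    }
}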

The real barrier to profiling — and by extension, intelligent optimization — is coming up with a representative workload. “Representative” is the key word here: you can write a performance test that spends all of its time doing things that your real users never do. Even when you think that you know what the users will do, it's tempting to take shortcuts: to profile an eCommerce application, you might add 10,000 items to a single cart. But the optimizations that you make after this test could be meaningless if the real world has 10,000 carts containing one item. And even if you get that right, there may be different implications running on an empty database versus one that has lots of data.

So yes, it's hard. But ultimately, the brain cycles that you spend thinking about representative workloads will have a greater payoff than those you spend thinking about micro-optimizations. And the end result feels better too, much like forgoing a can of soda for a glass of water.

Wednesday, January 27, 2010

The Time Quantum Fallacy

So why would someone who's writing a servlet worry about the time taken for a cast? I can't believe it's due to a fear of software bloat, simply because the relative impact is so minimal: a person truly concerned about bloat would think of replacing HTTP with a raw TCP connection. Or maybe UDP, to avoid all the overhead of ack/nack.

However, what if the questioner has a fundamental misunderstanding of how long it takes a computer to do anything? To be more specific, a belief that all operations take the same time, so removing a cast is just as good as removing a database lookup.

Experienced programmers will scoff at that idea. But think about it: on a human scale, all computer operations do seem to take the same amount of time. Google can come up with search suggestions in the time it takes you to type a single character; that involves a round-trip web request and some sort of database call.

Computer science classes reinforce this idea, with their emphasis on “Big O” analysis. The implication of such analysis is that an O(n²) algorithm is worse than an O(n log n) algorithm, regardless of what the code actually does — or the context in which the code is executed.

In the real world, of course, context is everything. If I need to do lookups on thousands of items, a hashed lookup (O(1)) seems far more appropriate than a linear lookup (O(N)). But what if I only have a few items? Say, 10 or 20? Or if I only occasionally do a lookup? In this case, the O(1) versus O(N) analysis just doesn't matter. And a linear structure may be easier to construct and use than a hashed structure.
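
To put some code behind that claim, here's the sort of linear structure I mean (the table contents are invented): nothing to configure, nothing to hash, and for a handful of entries the scan cost is unmeasurable.

public class SizeTable
{
    private static final String[] NAMES = { "small", "medium", "large" };
    private static final int[]    SIZES = { 8, 16, 32 };

    // a linear scan: with 10 or 20 entries, O(N) versus O(1) doesn't matter,
    // and parallel arrays are simpler to build than a hashed structure
    public static int sizeFor(String name)
    {
        for (int i = 0; i < NAMES.length; i++)
        {
            if (NAMES[i].equals(name))
                return SIZES[i];
        }
        throw new IllegalArgumentException("unknown size: " + name);
    }
}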

This brings me back to Mel. Mel's genius was not so much in understanding the instruction set, but understanding overall context. Of knowing how long each instruction took, how quickly the memory drum spun, and most important, when this knowledge made a difference. Perhaps that understanding only comes with experience, but I find that even experienced programmers fall into the micro-optimization trap, so there must be yet another factor at work.

Monday, January 25, 2010

The Urge to Optimize

I've a Servlet filter which performs the following type cast [...] What is the performance impact of such a cast? Is it worth to accept this performance degradation for a better architecture?

That's an exact quote from an online Q&A forum. The first response starts with an incredulous “As compared to handling a HTTP request?” (emphasis as written). To an experienced programmer, the questioner's concern is ridiculous: a cast takes nanoseconds, processing a request takes milliseconds; the difference is six orders of magnitude.

But questions like this aren't rare: on the same day, another user wanted to know the “exact” CPU difference, for allocating byte arrays in Java versus pooling them. Another was concerned about the performance of unsigned versus signed integers in C++. And yet another wondered how best to translate a Python program (not a module, a program) into C, “so that it can be smaller (and faster).”

Most of the answers to such questions involve platitudes about premature optimization; a few suggest profiling. Both of which are valid answers. And more important, well-publicized answers: a Google search on “software profiling” returns 5½ million results. And yet the micro-optimization questions keep coming. Why?

Part of the answer, I think, is that there are also an enormous number of pages devoted to “software bloat” — and an equally large fear that one's own software is bloated. Every culture has its creation myths, and The Story of Mel is a powerful myth for the software industry. Who wouldn't want to be Mel, a person who knew his machine so well that he laid out instructions according to the characteristics of its memory device?

I was one of the “new generation of programmers” that was the audience for that story; I started my professional career at about the same time the story was posted. At the time, I was doing a lot of low-level programming: one job involved inserting NOPs into an 8086 interrupt handler to ensure real-time response characteristics (real-time means predictable, not necessarily fast). Another job involved porting a windowing system that provided Macintosh-level capabilities for the Apple 2e.

To understand the latter project, recognize that this was 1984: the Macintosh had most of its windowing library in ROM, yet developers still struggled to fit a complex program (and user data) into its 128k of RAM. The Apple 2e had 64k of memory, which would have to hold this windowing library and a real program. The person who developed the library was perhaps the equal of Mel. I was nowhere close, and just porting the library was at the limits of my ability. I came into the project thinking I understood how to cram code into limited memory; I left with a much better understanding of the things I knew, and vague hints that there was a whole realm that I didn't.

I don't remember much of it now. With multiple gigabytes of RAM and megabytes of L2 cache, it just doesn't seem to matter. But one thing that I do remember is that it had nothing whatsoever to do with debates over the performance implications of a cast. You can find that lesson in the Story of Mel, if you look closely enough, yet somehow it always seems to be missed.

Thursday, January 21, 2010

Debugging 401: Practicum

OK, enough with the generalities, it's time for a concrete example. And yes, I realize that debugging stories are much like fishing stories — and that fishing stories aren't terribly interesting. I've tried to stay focused, but I'll understand if you fall asleep.

A week or so ago, I saw a question on Stack Overflow that let me put all of these techniques to work. In fact, it was a nearly perfect case study of the debugging process, not to mention interesting enough to take an hour out of my day.

The question seemed simple enough: the person posting it was trying to set the F2 key as a menu accelerator in a Swing application (for those not familiar with GUIs, an “accelerator” is a way to invoke a menu item using a single keystroke; for example, Netscape's “New Window” accelerator is Ctrl-N). On the surface, this seemed like user error, and since I happened to be working on a Swing app at the time, I changed one of my menu items to use F2 so that I could respond with the “correct” code. To my surprise, it didn't work: the F2 keystrokes seemed to vanish.
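
For reference, the “correct” code really is a one-liner (the menu item here is invented; JMenuItem, KeyStroke, and KeyEvent come from the standard Swing/AWT packages):

// assign F2, with no modifier keys, as the accelerator for a menu item
JMenuItem refreshItem = new JMenuItem("Refresh");
refreshItem.setAccelerator(KeyStroke.getKeyStroke(KeyEvent.VK_F2, 0));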

Other responders had suggested that perhaps the operating system was capturing the keypress event before it got to the program, and this seemed like a reasonable hypothesis: think of Alt-Tab or Ctrl-Alt-Delete under Windows. So, applying the Scientific Method, my first task was to falsify that hypothesis, and I created a SSCEP to do so.

This first program was extremely simple: a JFrame with a JLabel as its content, and an event handler that listened for keystrokes and printed them. I ran it, hit F2, and saw that the keystrokes were being reported. Time for a new hypothesis.
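
That first program looked something like this sketch (a reconstruction, not the original; it uses a KeyEventDispatcher so that the report doesn't depend on which component has focus):

import java.awt.KeyEventDispatcher;
import java.awt.KeyboardFocusManager;
import java.awt.event.KeyEvent;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class KeystrokeProbe
{
    public static void main(String[] args)
    {
        SwingUtilities.invokeLater(new Runnable()
        {
            public void run()
            {
                // report every key press, no matter which component has focus
                KeyboardFocusManager.getCurrentKeyboardFocusManager().addKeyEventDispatcher(
                    new KeyEventDispatcher()
                    {
                        public boolean dispatchKeyEvent(KeyEvent e)
                        {
                            if (e.getID() == KeyEvent.KEY_PRESSED)
                                System.out.println("pressed: " + e.getKeyCode()
                                        + " (" + KeyEvent.getKeyText(e.getKeyCode()) + ")");
                            return false;   // let normal dispatching continue
                        }
                    });

                JFrame frame = new JFrame("keystroke probe");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.getContentPane().add(new JLabel("press F2"));
                frame.pack();
                frame.setVisible(true);
            }
        });
    }
}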

My second hypothesis was that the F2 keystroke was being remapped by the operating system, so that it appeared to be some other key. Since my SSCEP was printing the keycodes, and the JDK comes with source code that showed the expected keycodes, I could easily check this. Another hypothesis falsified.

Next hypothesis: the Swing menu system is ignoring F2. Seems unlikely, but I'd apparently exhausted all other answers. So I added a menu to the SSCEP, and printed its invocations. And had another hypothesis shot down: F2 invoked the menu item.

By this point I was hooked: I saw the behavior in my own program, but wasn't able to replicate it in a SSCEP. Time to step back, and think about how my SSCEP differed from my actual application. This was easy: my program's main window was a simple tabular display using a JTable; other than that, just a menu. So I replaced the JLabel in my SSCEP with a JTable, and the F2 key stopped working!

Time for some research, starting with Sun's bug database; nothing there about JTable breaking accelerators. A little more research, and I ended up at the Sun tutorial on key bindings. Oh yeah, I remember that: individual components can intercept keystrokes; it's how the text components implement copy and paste without any program code.

This seemed very promising, so I changed my SSCEP to call getInputMap() on my table, and printed the key bindings. And didn't see a binding for F2. However, I was not yet ready to abandon this hypothesis — largely because I didn't have any others waiting.
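
Here's roughly what that check looks like (a sketch, using a freshly constructed table); note that getInputMap() with no arguments returns only the WHEN_FOCUSED map, while a JComponent actually maintains three:

import javax.swing.InputMap;
import javax.swing.JComponent;
import javax.swing.JTable;
import javax.swing.KeyStroke;

public class BindingDump
{
    public static void main(String[] args)
    {
        JTable table = new JTable(5, 3);
        dump(table, JComponent.WHEN_FOCUSED, "WHEN_FOCUSED");
        dump(table, JComponent.WHEN_ANCESTOR_OF_FOCUSED_COMPONENT, "WHEN_ANCESTOR_OF_FOCUSED_COMPONENT");
        dump(table, JComponent.WHEN_IN_FOCUSED_WINDOW, "WHEN_IN_FOCUSED_WINDOW");
    }

    private static void dump(JComponent comp, int condition, String name)
    {
        System.out.println("--- " + name + " ---");
        InputMap map = comp.getInputMap(condition);
        KeyStroke[] keys = map.allKeys();   // includes bindings inherited from parent maps
        if (keys != null)
        {
            for (KeyStroke key : keys)
                System.out.println(key + " -> " + map.get(key));
        }
    }
}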

So I fired up the debugger. Since I suspected that the table was consuming the keystroke itself, I set my first breakpoint in the processKeyEvent() method. This is a short method, so I stepped through it and saw that yes, the event was being consumed due to a key binding.

It took me a little more time, but I discovered that the key binding in question wasn't in the “normal” binding map, which was why I didn't see it when I first examined the bindings. I won't bore you with the details; suffice to say that this discovery came from continued application of the scientific method, along with a little background reading of code and JavaDoc. And led to an answer that the original poster seems to have ignored … oh well, at least I had fun debugging.

Tuesday, January 19, 2010

Debugging 301: Into the Code

You've tried to write a Small, Self-Contained Example program and failed. Whatever the problem is, it only manifests in your full-scale application. What now?

The Scientific Method remains your best tool, but rather than trying to fit a hypothesis to the available data, it's now time to make predictions based on your hypotheses. And, using either log messages or a debugger, watch your program execute until you find the place where your predictions become incorrect. I've written elsewhere about logging, so this post will focus on running your code in a debugger.

The power of a debugger comes from the fact that you can examine the entire state of your program and then change it at will. In some cases, you'll be able to restart methods to see how their execution changes with different starting conditions. However, this power is also a hindrance: it's easy to get overwhelmed by the amount of information available, and even easier to make assumptions and spend hours attempting to prove them correct.

The key is to form a hypothesis, and then look only at the code and data that are affected by that hypothesis. For example: you've written code that's supposed to upload data to a server, but it doesn't. You could waste a lot of time tracing the code, but it's more efficient to come up with a few hypotheses:

  1. You're not actually running the upload code.
  2. The upload fails due to a communication problem between client and server; the data never gets to the server.
  3. The server is unable to process the file.

Hypothesis #1 is easy to falsify: put a breakpoint at the start of the upload code and run your program. If the breakpoint is hit, then that hypothesis is done. If you don't hit the breakpoint, then you need to come up with hypotheses as to why you didn't (note, however, that falsifying this hypothesis says nothing about hypotheses #2 and #3 — you could have more bugs).

Hypothesis #2 is more difficult to falsify, because there could be many reasons for the upload to appear successful when it isn't. For example, library code that ignores an exception. It might make sense to step through the code, but again, chasing all of the possibilities can waste your time. Better is to use a tool that can directly disprove the hypothesis: a TCP monitor, or perhaps the logging that's built into your communications library.
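
A TCP monitor doesn't have to be fancy, either; even a throwaway one like this sketch (the port numbers and hostname are invented) will relay bytes between client and server while echoing everything to the console:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpMonitor
{
    public static void main(String[] args) throws Exception
    {
        // point your client at localhost:9999; change these to match your server
        ServerSocket listener = new ServerSocket(9999);
        while (true)
        {
            Socket client = listener.accept();
            Socket server = new Socket("real-server.example.com", 8080);
            relay(">>> ", client.getInputStream(), server.getOutputStream());
            relay("<<< ", server.getInputStream(), client.getOutputStream());
        }
    }

    // copy bytes from in to out on a background thread, echoing them to stdout
    private static void relay(final String tag, final InputStream in, final OutputStream out)
    {
        new Thread(new Runnable()
        {
            public void run()
            {
                try
                {
                    byte[] buf = new byte[4096];
                    int len;
                    while ((len = in.read(buf)) >= 0)
                    {
                        System.out.print(tag + new String(buf, 0, len, "ISO-8859-1"));
                        out.write(buf, 0, len);
                        out.flush();
                    }
                }
                catch (Exception ignored)
                {
                    // one side closed the connection; the thread just ends
                }
            }
        }).start();
    }
}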

If you get to hypothesis #3, it will soon be time to generate a whole new set of hypotheses, to cover the different cases that would cause the server to reject your file. And it's likely that the debugger will no longer be the best tool. Because now you have enough information to create test cases to validate your code, or an SSCEP to validate the server code.

Wednesday, January 13, 2010

Debugging 202: The SSCEP

When following the Scientific Method, you need a way to test your hypotheses. And this is where the Small, Self-Contained Example program comes in: it's a program that attempts to isolate the specific problem, moving it out of the larger application. If you can cause a problem to appear in a SSCEP, you can start building new hypotheses about why it occurs. If you can't cause the problem to appear, then you've falsified the hypothesis which led to its creation, and can start formulating a new hypothesis.

Often, while you're creating a SSCEP, the problem will become obvious — you'll move from the realm of science back into the realm of mathematics. It's remarkable how problems simply disappear when you give yourself the opportunity to step back and think about them without distractions.

So how do you start writing a SSCEP? The traditional way is to start with your program and pare it down until the bug stops appearing. As long as you keep track of what you remove at each step, sooner or later you will remove the buggy code. Then you figure out a better way to write it. However, you may need a lot of revisions to reach that point, and such programs tend to be larger than needed.

I prefer to start from the other end: with an empty main() method (or framework-specific class) and — guided by my hypothesis — add small pieces until it breaks. Many times, the SSCEP bears little resemblance to the real program: I'm isolating a hypothesis, not building an application.

One of the nice side-effects of this approach is that the result looks a lot like a unit or integration test. In fact, particularly when writing layered code, I can create the SSCEP using JUnit. And once I have such a test, there's no reason to throw it away. Instead, it goes into the test suite as a regression test.
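
For instance, here's the shape of such a test (a made-up example, written against JUnit 4): the same handful of lines that isolated the question becomes the test that keeps the answer from regressing.

import static org.junit.Assert.assertEquals;

import java.text.SimpleDateFormat;
import java.util.TimeZone;

import org.junit.Test;

public class DateParsingRegressionTest
{
    // started life as an SSCEP: does parse() honor an explicitly set time zone?
    @Test
    public void parseHonorsExplicitTimeZone() throws Exception
    {
        SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        parser.setTimeZone(TimeZone.getTimeZone("UTC"));
        assertEquals(1263384000000L, parser.parse("2010-01-13 12:00:00").getTime());
    }
}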

Monday, January 11, 2010

Debugging 201: Scientific Method

I've never liked the term “computer science.” Science, and the scientific method, is all about increasing our knowledge of the unknown. Computer programs are deterministic, moving from state to state according to specific rules; there's nothing unknown about them. Well, at least until you introduce a bug into your program. And then the scientific method can be extremely useful, helping you to discover the real rules that your program is using.

For those who slept through 8th grade, here's a summary of the scientific method:

  1. Gather data.
  2. Come up with a hypothesis to explain how that data could exist.
  3. Design experiments to prove that hypothesis wrong (this is the hard part).
  4. Once you've run out of experiments, call your hypothesis a theory (8th grade science teachers are now upset).
  5. Repeat.

That's right: science doesn't grow by proving hypotheses correct; it grows by proving them wrong. It's impossible to prove a hypothesis about the natural world correct, because someone might always find new data that disproves it. For example, Newtonian mechanics sufficed for the 17th and 18th centuries, but by the end of the 19th century scientists had gathered evidence showing that it didn't always apply. Einstein replaced Newtonian mechanics with general relativity in the early 20th century, and that hypothesis remains with us today. But it might be overturned tomorrow, next year, or a million years from now.

So what does that have to do with debugging?

It's a way to break free from your assumptions. When following the scientific method, your goal is not to prove your assumptions correct, but to prove them wrong. And you devote all of your effort to that proof. Only when you've exhausted all attempts to prove your assumptions wrong are you allowed to form new assumptions.

The first assumption that you need to prove wrong, of course, is that your code is correct. Which is where the Small, Self-Contained Example Program comes in.