Monday, January 9, 2012

Marking Time

How long does it take your program to do a task? It's not a question that you usually ask — at least not until that task takes “too long.” And while it seems like a simple question, there are two ways to measure program execution time, and the difference between those two measurements is important. For one thing, it can tell you whether you have any chance of making the task take less time.

The typical measurement is “wall clock” time: the time you'd get by watching a physical clock on the wall. Although in the real world, wall-clock measurements usually come from log output:

2012-01-04 17:21:51,434 [main] DEBUG IMAPReader - connecting to imap.gmail.com
2012-01-04 17:21:53,164 [main] DEBUG IMAPReader - login successful
2012-01-04 17:21:53,164 [main] DEBUG IMAPReader - retrieving mailbox list
2012-01-04 17:21:53,286 [main] DEBUG IMAPReader - selecting mailbox: INBOX
2012-01-04 17:21:53,392 [main] DEBUG IMAPReader - executing SEARCH for all messages
2012-01-04 17:21:53,525 [main] DEBUG IMAPReader - fetching message 1
2012-01-04 17:21:53,639 [main] DEBUG IMAPReader - disconnected

This program makes a connection to Google's IMAP mail service and downloads a single message. It takes slightly more than two seconds to do this, which exceeds the one second limit required to maintain the user's train of thought. You might be tempted to fire up a profiler to see where the code is spending its time, and perhaps rewrite some of it.

A better — easier, faster — option is to measure the program's CPU time.

CPU time is the time that the processor actually runs the process. And unless CPU time is a significant fraction of wall-clock time, then any attempts at performance optimization will be wasted. Java lets you measure CPU time — with some caveats — using the ThreadMXBean class from the java.lang.management package.

ThreadMXBean threadMX = ManagementFactory.getThreadMXBean();

long start = threadMX.getCurrentThreadUserTime();
// time-consuming operation here
long finish = threadMX.getCurrentThreadUserTime();

The getCurrentThreadUserTime() method returns the number of nanoseconds that have been spent executing the current thread's code; you'll have to divide by 1000000 to convert to milliseconds. It's user-mode time, so doesn't include time spent by the kernel on behalf of the thread. Since you can only optimize your own code, in my opinion that's the best measurement.

But, as I said, there's a caveat: this method only works if the JVM and OS support thread-level time measurement; if they don't, it will throw UnsupportedOperationException. As long as you're using the Sun JVM, on recent Windows or Linux systems, you shouldn't have a problem. And if you're worried, you can call ThreadMXBean.isCurrentThreadCpuTimeSupported() before getting the CPU time. Also call isThreadCpuTimeEnabled() ; even if the JVM supports thread CPU time, it might not be enabled.

Adding CPU-time logging into the program, we get the following:

2012-01-06 07:28:37,238 [main] DEBUG JakartaIMAPReader - connecting to imap.gmail.com (current cpu = 230000000 nanos)
...
2012-01-06 07:28:39,205 [main] DEBUG JakartaIMAPReader - disconnected (current cpu = 800000000 nanos)

The difference is 570000000 nanoseconds; 570 milliseconds, or about 25% of the overall runtime. Even if you could optimize this code to zero, you'd still have 1.5 seconds of wall-clock time left; the overall speedup would be minimal. While a profiler would have told me this as well, it was a lot faster for me to add a couple of lines of code and edit two logging statements, than to run the profiler and navigate through its results.

In this case, I didn't expect much CPU consumption: I'm connecting to a remote email server, and network calls always take time, during which the program is blocked inside the OS kernel, waiting for data to arrive. And even if I didn't know that network connections behave that way, simply looking at a tool like top or the Windows Task Manager would have told me that my program wasn't consuming much CPU.

A more interesting case is when one thread is taking a lot of wall-clock time but very little CPU time, and the program as a whole is CPU-bound. In this case, the other threads in the program are affecting the thread that you need to optimize. Usually, you'll find some sort of a synchronization barrier is the cause: the thread that you're tracking is waiting for data to be delivered from one or more other threads, which are taking a long time to create that data. By measuring CPU time, you know that you have to (recursively) examine those other threads.

There's one final case that I want to mention, and that's where the CPU is being consumed by the garbage collector. This happens with message-processing applications, where the incoming messages are relatively large (especially large byte[]s that go directly into the tenured generation), and/or generate a lot of temporary objects during processing. Your best tool in this situation is JConsole to see just how much time is being spent in garbage collection.

The bottom line: before thinking about optimizations, first verify whether there's anything you can optimize.