Thursday, December 29, 2011

Why I Don't Use Emacs

I didn't start developing code on *nix systems until around 1987. At the time, I was doing a gig with BBN, which a former coworker described as “a halfway house for MIT postgrads.” As such, emacs was clearly the editor of choice, although we used Sun hardware, so vi was available.

On my first day, I sat down with the emacs tutorial. And after a few minutes, tried to save my file. Nothing happened. In fact, nothing I typed had any effect. It took me a few more minutes to figure out what had happened.

I was using a VT-100-compatible terminal (I can't remember the name, but it was a very nice machine with a rotating display that would either show 48 rows or 120 columns). And a VT-100, like all of the ASCII-based terminals that preceded it, used Ctrl-S and Ctrl-Q to suspend and enable the flow of data.

Emacs uses Ctrl-X Ctrl-S to save a file.

My coworkers tried to convince me that this was not a problem: “just remap your keyboard.” But I decided that any editor that could not be used, as-is, on the world's most popular computer terminal was the product of a design ethos that I wanted nothing to do with. I switched to vi and haven't looked back.

Sunday, December 18, 2011

How a GPS Calculates Routes

I recently bought a Garmin nüvi 40LM GPS, and I've been comparing its routes to the ones that I normally drive. There are differences, of course, some of which seem very strange until I think about the routing algorithms involved. On a whim I Googled for websites that described how those algorithms work in a consumer GPS, and didn't find anything in-depth. So here it is.

Disclaimer: I do not work for Garmin, and do not know the exact algorithms or map formats that they use. Up until this week, I did work for the company that produces the digital maps that Garmin uses. However, you won't find any trade secrets here; everything is based on publicly available documentation, well-known algorithms, and my experience implementing those algorithms.

Now that that's out of the way, the place to start is with how digital maps describe roads. Humans tend to think of roads as single continuous entities: Broad Street in Philadelphia for example. It runs 11.5 miles, from the northern border of the city to the southern border. You might think of it as North Broad and South Broad, centered on City Hall, but you wouldn't divide it further than that.

A digital map, however, divides Broad Street into hundreds of road segments, some only 30 meters long. This is because the GPS computer looks at the entire road network as a graph, with each intersection a separate node. Some maps divide the road network further: the Navteq map, for example, inserts a node wherever the speed limit changes. This makes for a lot of data; Navteq tracks over 35 million distinct road segments in North America, and more are added with each map release.

Which causes a problem for routing algorithms, because they work by walking through the graph to find the best path from start to finish. Take a look at the picture below, which is the “GridWorld” used to explain graph algorithms. There are hundreds of ways to get from point A in the upper left, to point P in the lower right.

Two of those ways are shown in the following pictures. The left picture is a roundabout way, taking 10 steps, while the right picture is one of many shortest paths that take 6 steps. A simplistic algorithm would have to examine every possible route, including the roundabout ones, to discover the short ones. Not insurmountable in this example, which has only 24 streets, but in the real world of 35 million road segments, it would be impossible (or at least computationally infeasible).

Graph Traversal Algorithms

Computer scientists have been researching graph traversal algorithms for over 50 years, and have developed a variety of algorithms that do a better job than looking at every possible path. One of them, A*, is typically used for GPS routing.

The A* algorithm looks at a map much like a human would. When we want to pick a route, we mentally draw a straight line from the start to the destination, and pick roads that are close to that line. A computer can't perceive the map as a whole, but it can do something similar: when it is at an intersection (node) with multiple possibilities, it picks the node that gives the shortest total route length if it could go directly from that node to the destination. This is easier to understand with pictures.

In the first step, we have two choices: we can go from A to B or from A to E. As far as the computer is concerned they're both equivalent: there's no existing path and the direct line from each of those points to P is the same length. So it picks one arbitrarily, in this case the green path via B. But it remembers the blue path through E; you'll see why later.

From B there are again two choices: C and F (going back to A is not an option). However, these two choices are not equivalent. The direct path from C is longer than the path from F, which makes the overall route longer, so the computer chooses F. But again, it remembers the path via C.

At intersection F, there are three choices: E, G, and J. But the computer remembers that it already has a path that goes through E. And since the direct line from E to P is the same no matter how you get there, it will discard — and forget — the A-B-F-E routing, because A-E is more direct. That leaves it with routes through G and J, and again, this is an arbitrary choice.

Eventually, the computer will make its way to P. Along the way it will remember several routes, and discard others, always following the shortest total path.

Of course, as you've seen, this example has several shortest paths, and several places where it made an arbitrary decision. It could have just as easily picked A-B-F-G-O-P as A-B-F-J-K-L-P. What the computer would not do is pick A-B-C-D-H-L-P, even though it's the same length, because that would have taken it away from the direct line from A to P.

Remembered routes are also used to “backtrack” from dead ends, as in the pictures below. When the computer gets to point K, its only option is to go to G. However, it already has a path through G that is shorter. Since there's no other path, it discards the route through K and examines J and G again.

From here things get interesting, because both routes take it away from the direct line between A and P. But I think I've spent enough time in GridWorld. If you're interested, simply repeat the rule: shortest total path, combining the actual path with the calculated direct distance to the goal.
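
Here is a minimal sketch of that rule in Java, using GridWorld coordinates rather than real map data. It's my own illustration, not Garmin's code, but the shape is the same: a priority queue of remembered partial routes, ordered by the distance travelled so far plus the straight-line distance still to go.

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Queue;

// A minimal A* sketch on GridWorld (my illustration, not Garmin's code).
// Intersections are (x, y) points, every step costs 1, and the heuristic is
// the straight-line distance from an intersection to the goal.
public class GridAStar
{
    static class Route implements Comparable<Route>
    {
        final int x, y;          // the intersection this route has reached
        final double travelled;  // length of the route so far
        final double estimate;   // travelled + straight line to the goal
        final Route previous;    // how we got here (null for the start)

        Route(int x, int y, double travelled, double toGoal, Route previous)
        {
            this.x = x;
            this.y = y;
            this.travelled = travelled;
            this.estimate = travelled + toGoal;
            this.previous = previous;
        }

        public int compareTo(Route other)
        {
            return Double.compare(estimate, other.estimate);
        }
    }

    public static Route route(int size, int goalX, int goalY)
    {
        // the "remembered" candidate routes, cheapest estimate first
        Queue<Route> open = new PriorityQueue<Route>();
        // best known distance to each intersection, so worse routes get discarded
        Map<Long,Double> best = new HashMap<Long,Double>();

        open.add(new Route(0, 0, 0, crowFlies(0, 0, goalX, goalY), null));
        while (!open.isEmpty())
        {
            Route current = open.remove();
            if (current.x == goalX && current.y == goalY)
                return current;                         // arrived; walk the previous links to print it

            int[][] steps = { {1,0}, {-1,0}, {0,1}, {0,-1} };
            for (int[] step : steps)
            {
                int nx = current.x + step[0];
                int ny = current.y + step[1];
                if (nx < 0 || ny < 0 || nx >= size || ny >= size)
                    continue;                           // off the edge of the grid

                double travelled = current.travelled + 1;
                Long key = Long.valueOf((long)nx * size + ny);
                Double known = best.get(key);
                if (known != null && known <= travelled)
                    continue;                           // we already remember a better route here

                best.put(key, travelled);
                open.add(new Route(nx, ny, travelled, crowFlies(nx, ny, goalX, goalY), current));
            }
        }
        return null;                                    // no route at all
    }

    static double crowFlies(int x, int y, int goalX, int goalY)
    {
        return Math.sqrt((goalX - x) * (goalX - x) + (goalY - y) * (goalY - y));
    }
}

On a four-by-four grid with A at (0, 0) and P at (3, 3), route(4, 3, 3) comes back with one of the six-step paths; the arbitrary choices described above show up here as ties in the priority queue.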

Graphs and the Real World

GridWorld is a useful way to understand graph traversal algorithms, but the real world isn't a grid — even if some people seem to wish it were. And distance is rarely the most important factor in choosing a route; elapsed time is. Given the choice between driving 50 miles on a highway and 30 on winding back roads, only motorcyclists choose the shorter route.

The simplest way to calculate elapsed time is to look at the speed limit of the road segments, and assume that traffic will move at that speed. More sophisticated implementations take into account stop signs and stoplights. If you have real-time traffic available, that becomes an important consideration as well: you don't want to get on the highway (with no way off) if you know that you'll be stopped in a five mile backup. The most sophisticated algorithms go so far as to weight right-hand turns above left-hand.
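
In graph terms, that just means the edge weight becomes estimated travel time rather than distance. A rough sketch follows; the penalties and numbers are invented for illustration, not anything I know Garmin to use.

// Illustrative edge weight: prefer time over distance, with crude penalties
// for traffic signals and left turns. All of the numbers are invented.
static double costSeconds(double lengthMeters, double speedLimitKph,
                          boolean endsAtTrafficLight, boolean isLeftTurn)
{
    double cost = lengthMeters / (speedLimitKph / 3.6);   // free-flow travel time in seconds
    if (endsAtTrafficLight)
        cost += 30;                                       // assumed average wait at the light
    if (isLeftTurn)
        cost += 10;                                       // weight right-hand turns above left
    return cost;
}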

There's also the problem of CPU and memory. An exhaustive A* search works well for short routes — say five miles or less. But if you're going from Philadelphia to San Francisco, or even Philadelphia to Harrisburg, the number of remembered candidate routes would quickly use up the memory of your GPS.

To avoid this problem, digital maps classify roads based on their suitability for long-distance travel; in the Navteq map, these are known as “functional classes.” The top category (FC1) are generally interstate highways, the lowest (FC5) are neighborhood roads. In the middle are various types of highways and “arterial” roads. Of the 35 million road segments in North America, approximately 1 million are FC1 and FC2, 5 million are FC3 and FC4, and 28 million are FC5.

For long-distance routes, the GPS unit will try to find the shortest path from wherever you are to the nearest arterial. Depending on how far away your destination is, it might then follow the arterial network to the nearest limited-access highway. Once on the arterial and/or highway network, it tries to get you as close to your destination as possible. Then it steps down, from highway, to arterial, to neighborhood roads, until finding your destination. The goal is to find the destination within a hundred or so road segments; after that, the computation time and memory consumption for remembered routes becomes excessive.

Strange Choices: An Example

Once you understand how the GPS computes routes, you can explain why it doesn't do what you think it should. For example, last weekend my wife and I drove about 25 miles from our home in northwestern Philadelphia to a concert that her uncle was giving in one of the far northwestern suburbs.

On the way there, the GPS picked approximately the route that I would have: from my street (FC5) to a nearby arterial (FC4, maybe 3), to a limited-access state highway (FC2), to the PA Turnpike (FC1), exiting onto a state highway (FC3/4) and staying on it until we'd reached our destination. If I were planning the route, however, I would have skipped the arterial nearest our house and instead driven a mile to a parallel one with fewer traffic lights and less traffic. To the GPS, however, given its limited knowledge of local road conditions, the first route was better.

On the way home, the GPS did something else unexpected: it ignored the PA Turnpike, with its 65 mph speed limit, and kept us on state highways (which were generally 45 mph). Looking at a map gave a hint: the state highways almost exactly followed the line between the concert and our house. My colleague Max assures me that A* will pick the shortest route, but that's a theoretical guarantee. Inside the GPS, with its limited memory and CPU, theory has to meet practice.

There are several practical reasons for why the GPS didn't take us on the highway — and again, I don't work for Garmin, so this is pure speculation. One possibility is that it actually considered the state highway route to be shorter. In terms of absolute distance it was (24 miles versus 25), and based on the speed limits stored in the digital map, the travel time may have come out shorter as well (in practice, it took 10 minutes longer). Adding credence to this theory, there were sections of the route where the GPS displayed a 55 mph speed limit when the actual limit was 45 (incidentally, Navteq wants you to tell it where the map is wrong).

Alternatively, given the short trip distance, the GPS may have chosen to ignore the limited-access highway network. Although speed limits are higher, such highways often take you far out of the way. This means more CPU time and memory spent on finding a route from the highway to your destination, and one of the key design goals for a user interface is rapid response. That goal may trump the goal of finding the absolute shortest path. Against this theory is the fact that the GPS routed us on the highway to get to the concert.

A third possibility — and the one that I think most likely — is that the GPS will prune routes from the “open list” to minimize memory consumption. If you recall the diagrams at the start of this post, the number of “blue routes” adds up quickly. Each of these routes consumes scarce memory, so it makes sense to limit the number of paths that the GPS remembers. It's possible that one of the discarded paths is the shortest. But most people care more about getting to their destination, period, than about getting there a few minutes earlier.

Wrap-up

I'm going to finish with a bit of gushing. In my opinion, GPS is one of the most amazing examples of modern technology. A device the size of a deck of cards listens for signals from three or more satellites orbiting 12,000 miles above the earth. Then it calculates its position by comparing the different times that it took those signals (traveling at the speed of light) to arrive. Next, it figures out what road segment you're on, out of 35,000,000 possibilities, and within a fraction of a second it can tell you how to get to any other segment.

Updated December 27. Max complained that I over-simplified the explanation of A* to the point where I “should just say it's magic.” Hopefully he'll be happier with this version.

Tuesday, December 6, 2011

Actron CP9580: How Not To Do An Update

The Actron CP9580 is an automotive scantool. For those who aren't DIY mechanics, it connects to your car's on-board computer and reports on engine operation and trouble codes (ie, why your “check engine” light is on). My car has passed 100,000 miles, and received its first (hopefully spurious) trouble code a few weeks ago; the $200 investment seemed worthwhile.

Except that right now, the tool is an expensive doorstop, sitting in the manufacturer's repair shop, and I wasted a couple of hours last week. All because I ran the manufacturer-supplied update, which failed catastrophically. As I look back on the experience, I see several problems with their update process, some of which are rooted in a 1990-vintage design mentality, but all of which represent some fundamental failing that every developer should avoid.

#1: They used their own protocol to communicate with the device

In 1990, most embedded devices used an RS-232 serial port to communicate with the outside world. Manufacturers had no choice but to develop their own communications protocol, using something like X-Modem for file transfers.

But the CP9580 has a USB port. And I'm betting that it has flash memory to store its data files. Both of which mean that a custom protocol doesn't make sense. Instead, expose the flash memory as a removable drive and let the operating system — any operating system — manage the movement of data back and forth. Doing so should actually reduce development costs, because it would leverage existing components. And it would make user-level debugging possible: simply look at what files are present.

#2: They deleted the old firmware before installing the new

Again, a vestige of 1990, when devices used limited-size EEPROMs for their internal storage. Not only was the amount of space severely limited, but so were the number of times you could rewrite the chip before it failed. Better to simply clear the whole thing and start fresh.

This is another case where flash memory and a filesystem-based design change the game entirely. Consumer demand for memory cards has pushed the price of flash memory to the point where it's not cost-effective to use anything less than a gigabyte. And with a filesystem, version management is as simple as creating a new directory.

It's also a case where the game changed and the system designers half-changed. In the old days, communications code was in permanent ROM. If an update failed, no problem: you could try again (or reload the previous version). However, it seems that the CP9580 stores everything in flash memory, including the loader program (at least, that's what I interpret from the tech support person's comments, but maybe he was just being lazy).

The iPhone is a great example of how to do updates right: you can install a new revision of iOS, run it for months, and then decide that you want to roll back; the old version is still there. But it's not alone; even an Internet radio is smart enough to hold onto its old software while installing an update.

#3: They kept the update on their website, even though they'd had reports of similar failures

The previous two failings can be attributed to engineers doing things the same way they always have, even when the technology has moved forward. This last failure runs a little deeper. After the update failed, the second time I called tech support I was told that “we've had several cases where this happened.” Yet the software was still on the website, without a warning that it might cause problems. And it's still there today.

One of the best-known parts of the Hippocratic Oath is the exhortation to “do no harm.” Programmers don't have to swear a similar oath, but I think they should — if only to themselves. Too often we look at the technical side of a problem, forgetting that there's a human side. Sometimes the result ends up on The Daily WTF, but more often it ends up quietly causing pain to the people who use our software.

Thursday, October 20, 2011

Defensive Copies are a Code Smell

This is another posting prompted by a Stack Overflow question. The idea of a defensive copy is simple: you have a method that returns some piece of your object's state, but don't want the caller to be able to mutate it. For example, String.toCharArray():

public char[] toCharArray() {
    char result[] = new char[count];
    getChars(0, count, result, 0);
    return result;
}

If you simply returned the string's internal array, then the caller could change the contents of that array and violate String's guarantee of immutability. Creating a new array preserves the guarantee.

This technique seems to be a good idea in general: it ensures that the only way to change an object's state is via the methods that the object exposes. This in turn allows you to reason about the places where an object can change, and will make it easier to identify bugs caused by changing data. There's even a FindBugs check for code that exposes its internal state this way (along with a related case, where an object maintains a reference to mutable data that was passed to it).

But are defensive copies really useful in practice?

The core argument in favor seems to be that you can't trust your fellow programmers. In some cases, this is reasonable: security-related classes, for example, should never blindly accept or hand out pieces of their internal state. And in a large organization (or open-source library), it's unlikely that other programmers will understand or care about your intended use of an object — especially if they can save a few lines of code by using it in an unexpected way.

As an argument against, every defensive copy consumes memory and CPU time. String.toCharArray() is a perfect example of this, particularly with large strings, which may be copied directly into the tenured generation. If a programmer blindly calls this method within a loop, it's quite possible for the garbage collector to eat up most of your CPU.

Moreover, there's almost always a better solution. Again using String.toCharArray() as an example, why do you need the character array? I would guess that 99% of the time, the reason is to iterate over the characters. However, String.charAt() will do the same thing without a copy (and Hotspot should be smart enough to inline the array reference). And you should be calling String.codePointAt() anyway, to properly handle Unicode characters outside the Basic Multilingual Plane.
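
For example, iterating a string by code point requires no copy at all; it looks something like this:

// Walk a String's code points without copying the backing array.
static void printCodePoints(String s)
{
    for (int i = 0; i < s.length(); )
    {
        int cp = s.codePointAt(i);              // handles surrogate pairs correctly
        System.out.println(Integer.toHexString(cp));
        i += Character.charCount(cp);           // advances by 1 or 2 chars
    }
}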

That's all well and good for strings, but what about your application objects? Continuing the theme of “there's a better way,” I ask: why are your objects providing access to their internal state?

One of the principles of object-oriented programming is the Law of Demeter, which holds that collaborating objects should not know anything about each other's internal state. The goal of Demeter — just like defensive copies — is to allow you to reason about your objects and their interactions within the application. But it also drives your design toward action: rather than simply holding data, an object should do something with that data. To me, this is what separates object-oriented programming from procedural programming.

Of course, as with any law, there are times when Demeter can and should be broken (for example, data transfer objects). But before breaking the law, think about the consequences.

Saturday, October 1, 2011

The Role of Automated Tests

Automated testing is moving into the mainstream, adopted as a “best practice” by more companies each year. But why? Here are my reasons, originally intended as bullet points in a presentation on how to write tests.

Tests verify that the program behaves as expected

Let's get one thing out of the way up front: tests can find bugs, but they can't prove that no bugs exist. Or, as my friend Drew puts it: “tests can only show that your incorrect assumptions are internally consistent.”

However, as you increase test coverage, using well-designed tests, you gain confidence that the program will do what you want it to. In other words, that there aren't any obvious bugs. And unless you're writing code for the space shuttle, that's probably good enough.

Tests verify that the program continues to behave as expected when changed

The major portion of a program's development happens after it's released (80% is commonly quoted, but I couldn't find an authoritative reference). The bugs that got through testing will be found by end-users. Requirements will change, ranging from a simple UI facelift, through the addition of new business rules, to the deep structural changes needed to support increased load.

And when you change code, you risk breaking it. Usually in a place that you didn't think would be affected. Even in well-written code, there may be hidden side-effects. A test suite can protect you from the unintended consequences of change, provided again that it has complete coverage and well-designed tests. In my opinion, this is how automated tests provide the most value to the organization.

Of course, a test suite can also become part of a change. If your business rules change, then your tests have to change as well. This should be less of an issue at the level of “unit” tests, but it still happens. Unfortunately, many organizations consider such changes as an undesired cost. Instead, they should view them as a warning that the code may contain hidden dependencies on the old behavior, and budget extra time for release.

Tests serve as documentation

The idea of test-as-specification has long been part of Agile orthodoxy. Although, in practice, it can take a lot of work to make that happen with mainstream testing tools. I know that I've written more than my share of test methods with names like testOperation(). But if you have the discipline, a method named testFailureWhenArgumentsWouldCauseIntegerOverflow() is far more useful.

Tests give you a chance to think about your design

To me, this has always been the main benefit of testing: “if it's hard to test, it will be hard to use.” Of course, you can take this to an extreme: I have actually been asked by a traditional QA developer to store an application's internal data in comma-delimited format so that they could validate it (in that case, the binary format already took over 1GB, and was heavily optimized for access speed). While actively harming your design in the name of testability is foolish, it's not the common case.

More realistic is some code that I recently refactored: a single class that created a listener for external data, applied some business logic to the messages received, and sent messages based on that logic. As written, this code was impossible to test without instantiating the entire messaging framework. After refactoring, the business logic was in its own class, with separate listener and sender objects that could be mocked for testing. And that core business logic could now be tested in the form of a specification, with test names like testIgnoresThirdAndSubsequentDuplicateMessages().
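
In outline, the refactored shape was something like this (the names and the business rule are invented stand-ins, not the actual code): the logic class knows nothing about the messaging framework, so a test can hand it a mock sender and call onMessage() directly.

import java.util.HashSet;
import java.util.Set;

// Invented stand-ins; the point is the shape, not the details.
interface MessageSender
{
    void send(String message);
}

class MessageProcessor
{
    private final MessageSender sender;
    private final Set<String> seen = new HashSet<String>();

    public MessageProcessor(MessageSender sender)
    {
        this.sender = sender;
    }

    // called by the (separately constructed, easily mocked) listener
    public void onMessage(String message)
    {
        if (seen.add(message))          // stand-in business rule: drop duplicates
            sender.send(message);
    }
}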

Wednesday, August 31, 2011

Using a Local Repository Server with Gradle

I've been doing a little work with Gradle recently. And one of the things that I find “less than optimal” is that the build script holds far too much knowledge about its environment, which means that you have to jump through some hoops to make those scripts portable. Not a huge problem for in-house development, but if you're making an open-source library, you don't want everyone else to reconfigure their world to match yours.

One particular problem is how to find your dependencies. A typical build script has a repositories section that lists all the places to look. Here's a simple example that looks first in the local Maven repository, then in Maven Central:

repositories {
    mavenLocal()
    mavenCentral()
}

This is a portable build script — although I have no idea how dependencies might find their way to the local Maven repository, since Gradle uses its own dependency cache. A better build script might want to use a local repository server rather than constantly hitting Maven Central:

repositories {
    mavenRepo urls: 'http://intranet.example.com/repository'
}

That works, but now you can't share the build script with anybody else, unless they edit the script to use their own repository server (assuming they have one), and remember not to check in their changes. The solution that I came up with is to store the repository URL in $HOME/.gradle/gradle.properties, which is loaded for every build.

internalRepositoryUrl: http://repo.traffic.com:8081/nexus/content/groups/public/

Then, the build script is configured to add the local server only if the property is defined:

repositories {
    mavenLocal()

    if (project.hasProperty('internalRepositoryUrl') )
        mavenRepo urls: project.internalRepositoryUrl
    else
        mavenCentral()
}

It's portable, but it's ugly. When searching for solutions, I saw a couple of postings indicating that gradle.properties will eventually be allowed to contain expressions as well as properties. That day can't come soon enough.

Wednesday, August 17, 2011

Meta Content-Type is a Bad Idea

Following last week's posting about “text files,” I wanted to look at one of the most common ways to deliver text: the web. The HTTP protocol defines a Content-Type header, which specifies how a user agent (read: browser) should interpret the response body. The content type of an HTML document is text/html; breaking from other “text” types, its default character set is ISO-8859-1. However, you can specify the document's encoding as part of the Content-Type, and most websites do.

All well and good, except that an HTML document can specify its own encoding, using the http-equiv meta tag:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="fr" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Wikipédia, l'encyclopédie libre</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Wikipedia does “meta” Content-Type about as well as you can: the page is delivered with a Content-Type header specifying UTF-8, and it's an XHTML document (which implies UTF-8 encoding in the absence of a prologue). The only questionable practice with this page is the location of the <title> tag: it contains UTF-8 content, but appears before the in-document Content-Type. But in this case the in-document content type specification is superfluous.

Not all non-English pages do as well. The Montreal Craigslist page, for example, specifies ISO-8859-1 in the HTTP response, but UTF-8 in the meta tag.* It is a testament to browser developers adhering to Postel's Law that you can read the site at all.

From a “layered architecture” perspective, the embedded content-type declaration is ugly. You could argue that it self-describes a stand-alone document, much like the prologue in an XML document. But there's an important difference: the bytes of an XML prologue are rigidly specified; the parser doesn't need to know the encoding to read them. The <meta> tag can appear anywhere in the <head> of an HTML document. Including, as shown by Wikipedia, after content that requires knowledge of the encoding.

While writing this post, I did a quick search for a history of the embedded Content-Type specification. I turned up a W3C page that recommended always using it, but did not give a rationale. And I found a page that claimed specifying a character set in the HTTP response would “break older browsers.” As the page did not list those browsers, and did not appear to be written by someone involved in browser development, I'm not sure that I believe it.

For my personal website, I rely on the HTTP header, and don't use the meta tag. But I also limit myself to US-ASCII text, with HTML or numeric entities for anything that isn't ASCII. I'm not going to suggest that you remove the tag from your website (who knows, your biggest customer might have an “older browser”). But if you do use it, it should be the first thing in your <head>.

More important than whether the <meta> tag is present is that you actually get the encoding right, both in the page and in the HTTP headers.

With servlets, it's easy: the first line of your service method should be a call to ServletResponse.setContentType().

response.setContentType("text/html;charset=UTF-8");

This will set the Content-Type header and also configure the object returned by ServletResponse.getWriter(). Don't, under any circumstances, write HTML data via the object returned by ServletResponse.getOutputStream(); it exists for servlets that produce binary content.

With JSP, put the following two directives at the top of each page.

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>

These are translated into a call to ServletResponse.setContentType(), and are also used by the JSP container itself to parse the page. If, after reading this posting, you don't feel comfortable writing self-describing files, you can also use a JSP property group in your web.xml.

One final thing: if you do choose to specify content type via http-equiv, make sure that it matches what your server is putting in the HTTP response. Otherwise, you risk having your site used as an example by someone writing about encodings.


* The Paris Craigslist omits the <meta> declaration, but retains ISO-8859-1 in the HTTP response. Which explains why all of the ads say “EUR” rather than €.

Friday, August 12, 2011

"Text File" is an Oxymoron

Back in the early 1990s, life was easy. If you worked in the United States, “text” meant ASCII. If you worked in Canada or Europe, it might mean ISO-8859-1 or windows-1252, but they were almost the same thing … unless you dealt with currency and needed to display the new Euro symbol. There were a few specialists that thought of text as wchar_t, but they were rare. Companies hired them as contractors rather than full-time employees.

This US-centric view of text is pervasive: any MIME Content-Type that begins with “text” is presumed to be US-ASCII unless it has an explicit character set specifier. Which often trips up people who create XML, which presumes UTF-8 in the absence of an explicit encoding (solution: use application/xml rather than text/xml).

This was the world that Java entered, and it left an indelible imprint. Internally, Java looked to the future, managing strings as Unicode (UCS-2 then, UTF-16 now). But in the IO package, it was firmly rooted in the past, relying on “default encoding” when converting those two-byte Unicode characters into bytes. Even today, in JDK 7, FileReader and FileWriter don't support explicit encodings.

The trouble with a default encoding is that it changes from machine to machine. On my Linux machines, it's UTF-8; on my Windows XP machine at home, it's windows-1252; on my Windows XP machine from work, it's iso-8859-1. Which means that I can only move “text” files between these boxes if they're limited to US-ASCII characters. Not a problem for me, personally, but I work with people from all over the world.

At this point in time, I think the whole idea of “text” is obsolete. There's just a stream of bytes with some encoding applied. To read that stream in Java, use InputStreamReader with an explicit encoding; to write it, use OutputStreamWriter. Or, if you have a library that manages encoding for you, stick with streams.
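
A minimal example, assuming UTF-8 and a couple of hypothetical filenames:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Copy a "text" file using explicit encodings, never the platform default.
public class ExplicitEncoding
{
    public static void main(String[] argv)
    throws IOException
    {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("in.txt"), "UTF-8"));
        Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8"));
        try
        {
            for (String line = in.readLine(); line != null; line = in.readLine())
            {
                out.write(line);
                out.write('\n');
            }
        }
        finally
        {
            in.close();
            out.close();
        }
    }
}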

If you're not doing that, you're doing it wrong. And if you aren't using UTF-8 as the encoding, in my opinion you're doing it poorly.

Wednesday, August 10, 2011

Defining Done

It seems that a lot of Agile teams have a problem defining when a task or story is “done.” I've seen that on the teams that I've worked with, occasionally leading to heated argument at the end of a sprint.

For stories, the definition of done is simple: it's done when the product owner says it is. That may horrify people who live and die by acceptance criteria, but the simple fact is that acceptance criteria are fluid. New acceptance criteria are often discovered after the story is committed, and although most of these should spur the creation of a new story, more likely is that they simply get added to the existing one. And sometimes (not often enough), the product owner decides that the story is “good enough.”

At the task level, however, there are no acceptance criteria. Most teams that I've seen have worked out some measure of doneness that involves test coverage and code reviews. But the problem with such criteria is that they don't actually speak to what the task is trying to accomplish. The code for a task could be 100% covered and peer reviewed, but contribute nothing to the product. I think this is especially likely in teams where individual members go off and work on their own tasks, because peer reviews in that situation tend to be technical rather than holistic. As long as the code looks right, it gets a pass.

In my experience, the last chance for holistic review is the sprint planning meeting. Unfortunately, by the time the team gets to tasking, it's often late in the day and everyone wants to go home. But I've found that by simply asking “how do you plan to test this,” the task descriptions get more exact, and — not surprisingly — the time estimates go up.

Saturday, August 6, 2011

The Horizontal Slice

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

That's one of the basic principles of the Agile Manifesto, and a common approach to satisfying it is the “horizontal slice”: a complete application, which takes its inputs from real sources and produces outputs that are consumed by real destinations. The application starts life as a bare skeleton, and each release cycle adds functionality.

In theory, at least, there are a lot of benefits to this approach. First and foremost is the “for tomorrow we ship” ethos that a partially-functioning application is better than no application at all. Second, it allows the team to work out the internal structure of the application, avoiding the “oops!” that usually accompanies integration of components developed in isolation. And not least, it keeps the entire team engaged: there's enough work for everyone, without stepping on each others' toes.

But after two recent green-field projects that used this approach, I think there are some drawbacks that outweigh these benefits.

The first is an over-reliance on those “real” sources and sinks; the development team is stuck if they become unavailable. And this happens a lot in a typical development or integration environment, because other teams are doing the same thing. Developing mock implementations is one way to avoid this problem, but convincing a product owner to spend time on mocks when real data is available is an exercise in futility.

The second problem is that software development proceeds in a quantum fashion. I've written about this with regards to unit testing, but it applies even more to complete projects. There's a lot of groundwork that's needed to make a real-world application. Days, perhaps weeks, go by without anything that could be called “functional”; everything is run from JUnit. And then, suddenly, there's a main(), and the application itself exists. Forcing this process into a two-week sprint cycle encourages programmers to hack together whatever is needed to make a demo, without concern for the long term.

And that results in the third problem — and in my opinion the worst: high coupling between components. When you develop a horizontal slice, I think there's less incentive to focus on unit tests, and more to focus on end-to-end tests. After all, that's how you're being judged, and if you get the same level of coverage what does it matter?

On the surface, that's a reasonable argument, but unit tests and integration tests have different goals: the latter test functionality, the former lead you to a better design. If you don't have to test your classes in isolation, it's all too easy to rely on services provided by other parts of the application. The result is a barrier to long-term maintenance, which is where most of a team's development effort is spent.

So is there a solution? The best that I can think of is working backwards: creating one module at a time that produces real, consumable outputs from mock inputs. These modules don't have to be full-featured, and in fact shouldn't be: the goal is to get something that is well-designed. I think that working backwards gives you a much better design than working forwards because at every stage you know what the downstream stage needs, even if those needs change.

I want to say again that this approach is only for building on the green field. To maintain the building metaphor, it's establishing a foundation for the complete system, on which you add stories (pun intended).

Tuesday, July 19, 2011

Remaining Relevant

Yesterday I republished one of the articles on my website, a how-to guide on dealing with memory problems. A few weeks ago I'd been working on a memory leak in a large app, and decided that the section on heap histograms should be expanded. Once I started editing, however, the changes just kept coming. I found some places that were unclear, some that were too 32-bit-centric, and some things that were just plain wrong. I think the only section that remained unchanged was the one on permgen. The structure of the rest of the article remained the same, but almost all of the text is different, and the article doubled in size.

There are a couple of ways that I could look at this. The first, more positive way, is that I've learned a lot about debugging memory errors in the two years since I first published the article. Except … I really haven't. And the tools haven't changed that much in the interim either (although 64-bit machines are becoming ubiquitous). And after twenty-five or so years of writing professionally, I'm not convinced that I've suddenly become better at explaining technical topics.

I think that the answer is that there's always more depth, more niches to explore. But most of them are pointless. I could spend pages on “bugs that I have known,” and the only result would be that the 241 people that Google Analytics says read this blog regularly would stop. So plumbing the depths isn't the right approach.

And yet … the articles on my website are supposed to be compendiums of everything that I know on a topic. Seeing how much that one article changed has me worried that I should go through all the others and at least review them. After all, even Knuth had second editions. And I know there are some things that I'd like to change.

But against that is the philosophy of “good enough,” also known as “ship it!” I could spend hours wordsmithing, trying to get each sentence just right. But I don't think that time would make the underlying message any different. Once you've reached the point of proper grammar and logical sentence structure, you've reached a point of diminishing returns. Taking the next step may be valid if you're looking for a Pulitzer, but they don't give out Pulitzers for technical writing.

Plus, there's a whole backlog of new things to write about.

Friday, June 24, 2011

Testing Boundaries

In Programming Pearls, Jon Bentley uses binary search as a case study of how difficult it can be to implement a seemingly simple algorithm (emphasis added).

I've assigned this problem in courses for professional programmers. The students had a couple of hours to convert the description above into a program in the language of their choice […] We would then [validate with] test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs

Bentley then goes on to develop an implementation using formal verification techniques: identifying an “invariant” condition, and proving that that invariant in fact does not vary during execution of the algorithm. Unfortunately, his code has a bug, one that found its way into the JDK.

While one can point to this bug as a refutation of formal correctness proofs,* I'm more interested in whether testing is a valid alternative. Bentley clearly didn't think so: “I wasn't always convinced of the correctness of the code in which no bugs were found.” His concern is reasonable: tests can only verify that a specific input leads to a specific output. It may be sheer coincidence that the code under test produces this output, and it may give incorrect results for different input.

Which means that a simple test, one that creates an array and searches for values in that array, isn't enough. It could be that your code works for an array that has an even number of elements, but fails for one that has an odd number. Or perhaps it works fine for all arrays with more than one element, but throws an exception if passed an empty array. And how do you know that you're really doing a binary search?** One obviously infeasible approach is to do exhaustive testing of all possible arrays.

An alternative is to adopt the “spirit of testivus,” let a tool generate random testcases for you, and assume that a sufficiently high number of tests means everything works. But from the perspective of “this is correct,” random tests are the same thing as arbitrary tests: they may find a bug, they may not; you'll never know if others exist.

A better approach is to adopt the core of Bentley's argument, which is to understand the inflection points of the algorithm: those places where the behavior potentially changes. For binary search, I started with the following:

  • 0: every method that involves indexing should have a test for a zero-length object (and in the case of methods using Strings, a test that passes null).
  • 1: This is the end-point of a binary search loop: you have a single element in your array, and either it's the correct value or it isn't.
  • 2, 3: These are the points at which a binary search divides the array and recurses (or loops).
  • 4: This is a superfluous test. I wrote it because I wanted to verify two passes through the divide-and-conquer logic, but then realized there was no point. Still, every test is sacred, so I checked it in.

If you remember high school algebra, you'll realize this is proof by induction. If you can demonstrate that you have exhaustively tested every array size through all inflection points, then you can reasonably say that it will return the expected value for larger sizes. Add some hook points that record each pass through the loop (this could be a Comparator that counts its invocations), and you can verify that it's an O(logN) search.
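
Here's roughly what those size-driven tests look like, with Arrays.binarySearch standing in for the code under test (remember that it returns -(insertionPoint) - 1 for a missing value):

import java.util.Arrays;

import org.junit.Test;
import static org.junit.Assert.*;

// Inflection-point tests; Arrays.binarySearch stands in for your own implementation.
public class BinarySearchBoundaryTest
{
    @Test public void testZeroLengthArray()
    {
        assertEquals(-1, Arrays.binarySearch(new int[0], 17));
    }

    @Test public void testSingleElement()
    {
        assertEquals(0,  Arrays.binarySearch(new int[] {17}, 17));
        assertEquals(-1, Arrays.binarySearch(new int[] {17}, 3));    // insertion point 0
        assertEquals(-2, Arrays.binarySearch(new int[] {17}, 23));   // insertion point 1
    }

    @Test public void testTwoElements()
    {
        int[] arr = new int[] {3, 17};
        assertEquals(0,  Arrays.binarySearch(arr, 3));
        assertEquals(1,  Arrays.binarySearch(arr, 17));
        assertEquals(-2, Arrays.binarySearch(arr, 10));              // missing, in the middle
    }

    @Test public void testThreeElements()
    {
        int[] arr = new int[] {3, 17, 23};
        assertEquals(1,  Arrays.binarySearch(arr, 17));              // exercises both halves
        assertEquals(0,  Arrays.binarySearch(arr, 3));
        assertEquals(2,  Arrays.binarySearch(arr, 23));
        assertEquals(-4, Arrays.binarySearch(arr, 99));              // insertion point 3
    }
}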

Except … my code doesn't actually test all inflection points. There's another inflection point at Integer.MAX_VALUE / 2 + 2: the possibility of integer overflow. And unless you have a 64-bit JVM and over 4 Gb of physical memory, you have no way to test this case.
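
For the record, the overflow lives in the midpoint calculation, and the fix is cheap even if the test isn't (I believe the JDK patch used the unsigned-shift form):

// Broken: once low and high exceed Integer.MAX_VALUE / 2, the sum overflows
// and the "midpoint" goes negative.
static int brokenMidpoint(int low, int high)
{
    return (low + high) / 2;
}

// Safe: the difference can't overflow ...
static int safeMidpoint(int low, int high)
{
    return low + (high - low) / 2;
}

// ... or treat the overflowed sum as unsigned.
static int alsoSafeMidpoint(int low, int high)
{
    return (low + high) >>> 1;
}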

I don't have the answer. Tests can develop a high level of confidence, but they can't make guarantees. And to develop that level of confidence, you essentially create a formal proof. But compared to the situation with no tests, that level of confidence may be enough.


* Something that I am all too willing to do: I have a degree in History rather than Computer Science because at nineteen years old I realized that there was a big difference between proof on paper and bytes in memory. But that's a topic for another posting.

** I believe that binary search, like quicksort, is an algorithm that cannot be implemented following the strict rules of test-driven-development. Fixing your tests by doing “the simplest thing that could possibly work” will give you a linear search.

See www.youtube.com (NSFW if you don't have headphones, potentially offensive if you do).

Actually, you could test a binary search that works on arbitrary indexable objects rather than Java arrays: create an implementation that stores its data on disk. Although when Java's search methods were being developed (circa 1999), a typical workstation disk wouldn't have had the needed capacity.

Monday, June 20, 2011

Testing the Complete API

In my article on code coverage I was careful to stress that a coverage tool can only tell you what you haven't tested. It can't tell you if you're missing mainline code. Which made a recent bugfix to one of my open-source libraries even more embarrassing: the class had 100% coverage but was missing 7 characters of mainline code. And those seven characters were the difference between correct behavior and a bug that would manifest as non-deterministic behavior in consuming code.

When I have a bug, my first task is to write a unit test that exposes it (and demonstrates that I fixed it). Then I look in all similar code for the same bug (and in this case I found it). Finally, I think about how the bug came into existence, and more importantly, how come I didn't uncover it when I was first implementing the class and writing tests.

In this case, the answer was simple: the bug was in the single-byte read method of an InputStream implementation, and since I rarely use that method I got sloppy when writing tests for it. I wrote “happy path” tests that validated correct behavior, but did not try to trick the class into exhibiting incorrect behavior.
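
I won't reproduce the actual bug here, but here's a hypothetical example of the kind of test I should have written: the single-byte read() contract says the return value is 0 to 255, with -1 reserved for end-of-stream, and a happy-path test with friendly data will never notice a violation.

import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.junit.Test;
import static org.junit.Assert.*;

// Hypothetical edge-case test; ByteArrayInputStream stands in for the class under test.
public class SingleByteReadTest
{
    @Test public void testReadReturnsUnsignedValuesAndMinusOneOnlyAtEof()
    throws Exception
    {
        byte[] data = new byte[] { 0x00, 0x7F, (byte)0x80, (byte)0xFF };
        InputStream in = new ByteArrayInputStream(data);

        assertEquals(0x00, in.read());
        assertEquals(0x7F, in.read());
        assertEquals(0x80, in.read());   // a sign-extension bug returns -128 here
        assertEquals(0xFF, in.read());   // ... or -1, which looks like end-of-stream
        assertEquals(-1,   in.read());   // only now should we see end-of-stream
    }
}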

This experience has pushed me to think, again, about the role of unit tests. And it reminded me that there's a big difference between the way that a software developer thinks and the way that a good QA tester thinks. The former wants the code to work, the latter wants it to fail. Both mindsets are needed, but I'm not sure that any one person can hold both at the same time. And that's what's needed for unit tests to truly exercise code.

And I also wonder how such tests fit into the mindset of test-driven design. In my experience, TDD is very much focused on the happy path. Each test describes a small piece of the overall functionality, a “specification in code.” Can detailed edge-case tests contribute to that specification without cluttering it? And if yes, how does a developer switch mindsets to come up with them?

Monday, May 16, 2011

The difference between a programmer and a software engineer

Is that the engineer thinks of all code as part of a larger system.

And understands the constraints that a piece of code imposes on that system.

As well as the ways that a piece of code can be mis-used.

At least, that's the way I see the difference.

What I don't understand is why anyone would want to be a programmer.

Or why any development organization would hire one.

Monday, May 9, 2011

Big-O Isn't Everything, Part 2

If you need to build a lookup table for some large number of values, what data structure should you choose?

Obviously, you want a hashed data structure. In Java, a HashMap or HashSet. It will provide O(1) performance, versus O(logN) for a binary search and O(N) for a linear search. And the “Big-O” performance of an algorithm becomes increasingly important the larger the table becomes. Or does it?

To find out, I wrote a micro-benchmark that compared different approaches for varying table sizes. One version used a HashSet, another used an Integer[] and Arrays.binarySearch(), and a third used int[] and a home-grown binary search. For each of them, I probed the lookup table 10,000,000 times with random values, and recorded the execution time in seconds.
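
Each run built one table and then hammered it with random probes. This is a reconstruction of the shape of that loop, not the original benchmark code, and it glosses over the usual micro-benchmark caveats (warm-up, dead-code elimination, and so on):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// A reconstruction of the probe loop, not the original benchmark code.
public class LookupBenchmark
{
    public static void main(String[] argv)
    {
        final int tableSize = 1000000;
        final int probes = 10000000;

        Set<Integer> hashed = new HashSet<Integer>(tableSize * 2);
        Integer[] boxed = new Integer[tableSize];
        int[] primitive = new int[tableSize];
        for (int i = 0; i < tableSize; i++)
        {
            hashed.add(i);
            boxed[i] = i;           // already sorted, as binarySearch requires
            primitive[i] = i;
        }

        Random rnd = new Random(12345);
        int hits = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < probes; i++)
        {
            int probe = rnd.nextInt(tableSize * 2);                      // roughly half hit, half miss
            if (hashed.contains(probe)) hits++;                          // O(1)
            // if (Arrays.binarySearch(boxed, probe) >= 0) hits++;       // O(logN), boxed
            // if (Arrays.binarySearch(primitive, probe) >= 0) hits++;   // O(logN), primitive
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(hits + " hits, " + elapsed + " ms");
    }
}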

Table Size       HashSet   Binary Search, Object   Binary Search, Primitive
1,000               0.37                    1.00                       0.88
10,000              0.43                    1.99                       1.19
100,000             0.54                    3.18                       1.60
1,000,000           1.00                    5.75                       2.33
10,000,000          1.19                   12.70                       4.99
100,000,000         1.58                   20.68                       8.50
1,000,000,000        n/a                     n/a                      13.28

No surprises: the execution time for the hashed version remained relatively constant, while the binary search time increased logarithmically with the size of the table.* But this table leaves out two important pieces of information.

The first is that real-world applications rarely spend all their time doing lookups. Instead, they do a lookup and then do something else with the results. And while the HashSet outperformed the Integer[] search by a factor of 13, the absolute performance of the latter is two microseconds per iteration. For a real-world application, this is almost certainly sufficient.

Even so, it seems to make sense to go with the HashSet. But there's another factor at work: absolute performance isn't the only way to measure this benchmark. And if you noticed the “n/a” at the bottom of the first two columns, and the inclusion of a primitive version in the last column, you can probably guess where this post is going.**

Table Size         HashSet   Binary Search, Object   Binary Search, Primitive
1,000,000         178,054K                142,525K                   126,900K
10,000,000        656,742K                318,307K                   162,057K
100,000,000     5,286,458K              2,077,979K                   513,619K
1,000,000,000          n/a                     n/a                 3,969,678K

I ran these tests on an Amazon EC2 double-extra-large instance, with a 16 Gb Java heap. It costs $1 per hour ($8,766/year) to rent this machine. If my application needed a billion-entry lookup table, I would have two choices: either use a sorted array, or upgrade to a quad-extra-large instance and double my costs. For a single machine, maybe not an issue. But if I'm running a cluster of machines, the cost difference would add up quickly.

The bottom line is that absolute CPU performance is only one factor that goes into picking a data structure. Memory consumption is another, and programmer performance is a third — one that's perhaps more important than the other two put together. A HashSet is a well-known, easy to use data structure. A sorted array is less so, and becomes increasingly complex if you have to implement a custom Comparator. Using one will increase your time to implement, and perhaps your maintenance costs. So it wouldn't be my first choice.

But if I were faced with a limited-memory environment, neither would I be blinded by “Big-O” performance.


* If the HashSet implementation is O(1), why did its performance decrease as the table got larger? There's always a risk, when you write a benchmark, that you'll get some apparently anomalous results and will need to explain them. In this case, I believe the performance difference is an artifact of the processor cache. I ran the tests on a machine with an 8MB cache, which is large enough to hold the entire hash array for 100,000 entries — and probably large numbers of the buckets for 1,000 and 10,000 entries. Since the test repeatedly probes the list, the cache is a significant contributor to overall performance. Yet another pitfall in a micro-benchmark …

** You may be wondering why I only show the last four rows of the table. The reason is another pitfall of micro-benchmarks: finding the correct measurement tool. I used MemoryMXBean.getHeapMemoryUsage(), along with a couple of calls to System.gc(). This should have shown me only the memory used by live objects, but instead I saw a minimum of 100Mb for each run. With the larger table sizes, however, the real memory consumption overwhelmed this baseline “noise.”

Thursday, April 28, 2011

JDK Bug 6337981: Writing XML to OutputStream Doesn't Indent

I was recently bitten by JDK bug #6337981. Since it took me a while to narrow down the problem, and longer to find the official bug report, I'm posting it here to hopefully increase its Googleability. And so that I can respond to any complaints about how Practical XML does output.

Here's the code:

public class XmlIndentExample
{
    public static void main(String[] argv)
    throws Exception
    {
        String src = "<root><child>text</child></root>";

        ByteArrayInputStream in1 = new ByteArrayInputStream(src.getBytes("UTF-8"));
        StringWriter out1 = new StringWriter();
        transform(new StreamSource(in1), new StreamResult(out1));
        String result1 = out1.toString();
        System.out.println("writer:\n" + result1);

        ByteArrayInputStream in2 = new ByteArrayInputStream(src.getBytes("UTF-8"));
        ByteArrayOutputStream out2 = new ByteArrayOutputStream();
        transform(new StreamSource(in2), new StreamResult(out2));
        String result2 = new String(out2.toByteArray(), "UTF-8");
        System.out.println("stream:\n" + result2);
    }


    private static void transform(Source source, Result result)
    throws Exception
    {
        TransformerFactory fact = TransformerFactory.newInstance();

        // this is a work-around for bug #6296446; only needed on JDK 1.5
        fact.setAttribute("indent-number", Integer.valueOf(4));

        Transformer xform = fact.newTransformer();

        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        xform.setOutputProperty(OutputKeys.INDENT, "yes");

        // since we set the "indent-number" attribute on the factory, we
        // don't need to set the indent amount here; uncomment if you
        // think it will make a difference

//        xform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

        xform.transform(source, result);
    }
}

When you run this, the version that writes to a ByteArrayOutputStream isn't indented, while the version that writes to a StringWriter is.

The bug is marked as low priority, and has been open since 2005. And I was unable to find any mention of a similar problem in the Xerces bug database. All of which means that it's unlikely to get fixed any time soon.

There is a work-around, mentioned in the bug report: wrap the OutputStream in an OutputStreamWriter. That works, but you need to pay attention to encoding. Always — always — tell the OutputStreamWriter to use UTF-8:

OutputStreamWriter wrapped = new OutputStreamWriter(out, "UTF-8");

Friday, April 22, 2011

I Can Be Replaced

Returning briefly to the “fear” theme: a lot of people have a fear that they can be replaced. That if they don't do exactly what the boss says, he'll find someone who does. I think this fear drives a lot of the negative behavior in programming shops, from writing unmaintainable code to hoarding information. If someone is the only person who can solve a problem, clearly they're irreplaceable.

Not only is this fear misplaced, but the very idea of being irreplaceable is not something that I would wish on anyone.

I have a friend who is irreplaceable. He didn't try to be, but he's smart and manages to consistently meet deadlines in a chaotic environment. He's been in the same job for six or seven years now, and there's no way out. Even when he was “promoted,” it was a new title and more money, but he ended up doing the same work. For him, work has become a living hell.

And the worst part is that the world moved on, his skills have become focused on the one job, and it would be next to impossible for him to move (at least if he wants anything near the same level of compensation).

With another twenty years or more left in my working life, I try very hard to be replaceable.

Thursday, April 21, 2011

Four Data Structures Every Programmer Should Know

I went back to reading Stack Overflow during my convalescence. And saw a bunch of questions and answers that really disturbed me. For those who don't want to click, they're all questions that relate to data structures.

OK, I realize that data structures can be daunting. Knuth took three books just to cover the basics. Cormen is nearly 1,300 pages, and Sedgewick almost 1,000. And I'll be the first to admit that I've not memorized any of these books, or even read significant pieces of them (much less cover-to-cover). They're meant as references: if and when I need to implement an algorithm, I open the book. If you asked me to sit down and implement a red-black tree without a book, I couldn't do it.

But I do know the performance characteristics of binary trees, and I know that java.util.TreeMap uses a binary tree. And as a Java programmer, I know when I should use a TreeMap versus a HashMap, versus an ArrayList that I've sorted. And I think that knowledge is a baseline that every programmer should have.

So, here's a cheat-sheet of the four data structures that every programmer should know, because they'll handle almost all of your in-memory data management needs.

Structure          Insert    Remove    Search    Indexed Access   Example
Hash table         O(1)      O(1)      O(1)      n/a              HashMap
Binary tree        O(logN)   O(logN)   O(logN)   n/a              TreeMap
Linked list        O(1)      O(1)      O(N)      O(N)             LinkedList
Array-backed list  O(N)      O(N)      O(N)      O(1)             ArrayList

Hash table

If you need to maintain a key-value relationship, then a hashed data structure should be your first choice. As I've written elsewhere, a hashed lookup is as fast as you can get: O(1) access for all operations. At least, it's O(1) if you have a good hash function. If you have a bad hash function, performance can drop to O(N).

There is one downside to hash tables: they add a lot of memory overhead. To avoid collisions, a hash table needs to be larger than the number of entries; by default, the Java HashMap resizes when the table becomes 75% full. There's also overhead from the linked lists used to chain entries within each bucket: each entry is a distinct object, and in the Sun implementation, on a 32-bit JVM, takes 24 bytes — not counting the value that it holds.
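
You can mitigate the resizing (though not the per-entry overhead) by presizing the map; a sketch, with numbers that are purely illustrative:

// expecting roughly 1,000,000 entries: pick a capacity so that
// capacity * loadFactor comfortably exceeds that count, and the
// table never has to resize
Map<String,Integer> lookup = new HashMap<String,Integer>(2000000, 0.75f);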

Binary tree

Whenever someone mentions O(logN) performance, they're talking about a tree of some form. Trees implement a “divide and conquer” approach to data: starting at a “root” node, you recursively identify the child node that contains the data you need. At each step, you divide the total search space by some constant value; in the case of a binary tree, that value is 2: you divide the space in half. There are a bunch of different kinds of trees, and a bunch of data structures that implement them, but the trees behind Java's collection classes are all binary (which means that O(logN) is really O(log2N)).

TreeMap and TreeSet use a tree structure in which the value of a node's “left” child is less than the value of the node, and the value of the node's “right” child is greater. This means that not only do you get O(logN) search performance, you can also easily iterate through the elements in order, from smallest to largest.

And this is (with a few exceedingly rare exceptions) the only reason that you would pick TreeMap over HashMap. For example, if you are extracting word counts from a piece of text, it makes sense to manage the words in sorted order, because you'll want to output them in that form. Outside of that use case, the performance benefit of HashMap makes it the better choice.
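
A sketch of that word-count case, assuming text holds the input String (the tokenizing is deliberately naive):

Map<String,Integer> counts = new TreeMap<String,Integer>();
for (String word : text.toLowerCase().split("\\W+"))
{
    if (word.length() == 0)
        continue;
    Integer current = counts.get(word);
    counts.put(word, (current == null) ? 1 : current + 1);
}

// TreeMap keeps its keys sorted, so iteration is already in output order
for (Map.Entry<String,Integer> entry : counts.entrySet())
    System.out.println(entry.getKey() + ": " + entry.getValue());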

The PriorityQueue is another Java class that implements a binary tree structure; however, it does not keep its elements in sorted order. Instead, it uses a tree in which each element is required to have a lower value than either of its children. This means that inserts are O(logN), but searching is O(N): you know that the root is the smallest value in the tree, but you know nothing about the order of the rest.

Priority queues have a very limited set of use cases. They're great when you need to process or initiate time-dependent events, or for graph traversal algorithms where you care about a least-cost path. Because they don't maintain a total ordering over their elements, there's less work to be done for each operation, and there's no need for an explicit node object (even though two algorithms can have the same “big O” complexity, the work that takes place can make them very different in performance).
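
A quick illustration of that trade-off: the head of the queue is always the smallest element, but that's the only ordering guarantee you get.

Queue<Long> eventTimes = new PriorityQueue<Long>();
eventTimes.add(1500L);
eventTimes.add(250L);
eventTimes.add(900L);

// poll() always returns the smallest remaining value: 250, 900, 1500 ...
while (!eventTimes.isEmpty())
    System.out.println(eventTimes.poll());

// ... but iterating the queue directly may visit elements in any order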

I mention priority queues because of one of the SO links above. Because priority queues are binary structures, they take O(logN) time for inserts and removes. Which is a really dumb penalty to pay if you're looking for a simple linear (e.g., FIFO, or first-in-first-out) queue.

Linked list

Which brings me to linked lists, the traditional data structure for queues. As long as you actually hold a reference to a node, insertion and removal are O(1): they're implemented by updating pointers. It's easy to get a reference to either the first or last node (at least in a doubly-linked list, which is what the Java LinkedList class implements). It's not so easy to get access to an arbitrary node, because you need to iterate through the predecessor nodes first, an O(N) activity.
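
A trivial illustration of those end operations:

// O(1): both operations just update pointers at the ends of the list
LinkedList<String> queue = new LinkedList<String>();
queue.addLast("first");
queue.addLast("second");
String head = queue.removeFirst();

// by contrast, get(i) has to walk the list from one end: O(N)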

But linked lists have another dark side: each of the nodes is a distinct object that consumes memory and must eventually be garbage-collected (20 bytes on a Sun 32-bit JVM). And the use case for removing elements from the head or tail of a list tends to be multi-threaded applications, in which case one of the classes in java.util.concurrent will be more appropriate. In fact, the only time that LinkedList would be my first choice is for an application where I need known marginal memory consumption (a fixed cost for each element added).

Array-backed list

For all other generic lists, my first choice is ArrayList. As its name implies, it implements the List interface using a pre-allocated array for storage. This means that you get O(1) access to any value, as long as you know its position in the array. It also means that, while add or remove of the last element is O(1), it's O(N) for any other position, because the other values in the list have to be shifted. However, in practice most additions are to the end of a list. And even where you need to add or remove values from the start of the list, in a small list (a few thousand entries) the CPU cost to do this is lower than the cost to manage the entries of a LinkedList.

The problem with ArrayList is the way that it resizes the underlying array. As I said, it pre-allocates an array for storage; if you want to add more elements than the array can hold, it must allocate a new, larger array, and copy the existing entries into it. The documentation describes this behavior as “amortized” O(1) performance: although the copy is an O(N) operation, it will only happen every N times you add an element, and O(N) / N == O(1).

More important, in practice, is that an ArrayList allocates a new array that is some significant percentage larger than the old array (it's explicitly not documented, but in the Sun 1.6 JDK is 50%). If you're pushing the boundaries of available memory, and need to expand a large list, it's all too easy to get an OutOfMemoryError. There's also the potential for a lot of wasted space, particularly with small lists.
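
If you know roughly how many elements you'll be storing, you can sidestep both problems by presizing the list (rowCount here is hypothetical):

// one allocation up front: no incremental copies, no 50% over-allocation
ArrayList<String> rows = new ArrayList<String>(rowCount);

// and if a long-lived list has stopped growing, trimToSize() releases the slack
rows.trimToSize();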

Summary

I would say that, 99% of the time, when I need a Java collection class I turn to ArrayList and HashMap. To me, they represent the Platonic ideal for List and Map implementations. But I do that with full understanding of their limitations, and am ready to switch to a different data structure (or implement my own) when I run into those limitations.

Thursday, April 14, 2011

There's more than one way ...

Words of wisdom from a former coworker
There's more than one way to skin a cat. And just between you and me, it's a lot easier when you don't care how it looks when you're done.

Friday, April 8, 2011

Aviate, Navigate, Communicate

Houston, we've had a problem

It's the tone of voice, not simply the words. True, the crew of Apollo 13 didn't know how serious their problem was when they calmly notified Mission Control. The crew of US Airways Flight 1549, by comparison, knew exactly how serious their problem was. Yet the transcripts and audio feature the same calm, “let's take care of business” tone.

Compare this to the way that a development organization responds to a problem. In the cases I've seen, there will be a half-dozen people clustered around a desk, maybe more on a conference call. Everyone is talking, everyone is trying to formulate a plan. And surprisingly, most of these people are managers. There may be a single developer. And while she's trying to figure out what went wrong, she has to contend with a barrage of questions from outside.

I've been in that position. Probably most of you have. And when I'm the person at the keyboard, I try to remember the best advice I got from my flight instructor.

Aviate

It's amazing how far an airplane can go without an engine. In the case of a light single-engine plane like a Cessna 172, as long as you fly straight and level at “best glide speed,” you'll travel about a mile horizontally for every 1,000 feet of altitude you lose. The first response to an engine failure is to level the wings and trim for this speed — a response regularly reinforced by your flight instructor, who pulls the throttle to idle while you're in the middle of something else.

The important thing is that the first steps in any crisis should be planned in advance. There's no hesitation when your engine goes silent: you level the wings and trim for airspeed. But the response for an engine fire is completely different: you shut off the fuel supply (causing the engine to stop) and then increase your speed to put out the fire. Once the fire is out you can think about trimming for glide.

I've never known an IT organization that planned for failure — other than disasters, of course. And while it's important to have a plan when your data center goes down, it's just as important to have a plan for when your web server starts sending out “403 - Forbidden” messages to all of your clients. And, just like the procedure for engine fire, sometimes you have to trade one emergency for another.

Navigate

“Keep the airplane flying” is good advice even when you don't have engine problems. And only after you've done that, then it's time to figure out what happened and how to resolve the problem.

Far too many crashes (primarily of private aircraft) happen because pilots try to resolve the problem first. The crash of Eastern 401 is an example: all three people in the cockpit were focused on diagnosing a burnt-out lightbulb while the aircraft descended into the Everglades.

By comparison, the crew of Flight 1549 divided up their tasks: Captain Sullenberger immediately took over the controls, while First Officer Skiles walked through the emergency checklist. This division of responsibilities has a name: Cockpit Resource Management, and it's part of the training for commercial aircrews (in large part due to accidents such as Eastern 401).

Back to an IT organization. In my experience, division of labor is ad hoc: the “first responders” tend to brainstorm ideas and then go off and research them. Meanwhile, who is minding the system?

Communicate

If you watched the NTSB video that I linked above, you may have noticed something: before the emergency, air traffic control initiated all communication and the crew responded very quickly — the New York airspace is one of the busiest in the country, and the controllers have little patience. But after calling the mayday, the communication changed: the crew would initiate requests for information, and ignore any requests initiated by the controller.

Of the three steps, this might be the most important: communicate only when necessary. It's also the most difficult: managers need to communicate; it's how they add value. The trick is to balance their need for communication against your need to avoid distractions. A pre-agreed communication plan is best (and this is how air traffic works: an aircraft with a declared emergency takes precedence over everything else in the sky). But if you don't have one, it's easy enough to create one: say what you're doing and when you expect to have an update. And then ignore interruptions (managers that truly add value via communication will understand).

And above all, keep the tone of your voice level, even if you feel like panicking. Because it's the tone of voice that people remember.

Tuesday, April 5, 2011

Deadlines

When a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully

I doubt that Samuel Johnson had first-hand knowledge of staring down a gallows, but he perfectly captures the effect of an approaching deadline. Unfortunately, all too often the mind is concentrated on the deadline itself, rather than on the problem(s) to be solved.

Deadlines are a fact of life in software development. And I've been lucky enough to have worked at several companies where the deadline is just another day: it arrives, the software is released, everybody leaves work early for beer. At other companies, the deadline is a source of panic, sometimes starting at the project kickoff.

The difference, as everyone knows, is that the latter companies set unrealistic deadlines: too much work, too little time.

Except that, as I look back at the projects where I've experienced deadline pressure, I don't remember that the schedule or work effort were that extreme. More often, the problems came from confusion: bugs (particularly performance problems) that weren't discovered until late in the process, last-minute questions about how a feature should behave (with different team members holding strong, conflicting positions), the undiscovered critical feature, or process delays (QA has to sign off, but the QA lead is sick).

There are lots of techniques to attack these problems, with Agile project management attempting to take on several at once. And I think that any or all can work, even those embodying heavyweight process. The only question is whether they reduce the confusion or add to it.

Thursday, March 31, 2011

Say No

Continuing the “fear” theme with another personal anecdote: I have a water phobia, caused (I think) by almost drowning as a child. I can trigger a panic attack in waist-deep water, simply by putting my head under. For that matter, I have issues putting my face directly into a shower's flow. Not good for someone who grew up less than a mile from the ocean.

So why is there a sea kayak hanging in my garage?

The answer, as you might have guessed from the title, is that I said “no” to my phobia. I took a guided kayak tour once on vacation, and decided that I liked it. I then took a paddling class before buying my own boat. And part of this class involved a “wet exit”: intentionally flipping the boat upside down, undoing the spray skirt, getting out, and surfacing — while still holding onto both boat and paddle. I dreaded the very idea.

As it turned out, I unintentionally flipped a few days before the class (tip: don't grab onto branches overhanging a moving river). And realized that panic takes a backseat to training: before I had a chance to think about what happened, I had pulled the “oh shit strap” on the skirt and pushed out of the boat. However, that didn't make the intentional flip any easier. Or the rolling class that I took a few months later. Phobias don't go away just because you tell them to.

To an outside observer, phobias often appear trivial. OK, perhaps not a fear of drowning, but heights? Open spaces? Small bubbles in pancakes? To the person in the grip of a phobia, however, they're very real and very immediate.

Which brings me back to fear and programming. To an outside observer, the fears that motivate programmers seem very trivial. Will you die if you miss a deadline? No. Will the company go out of business? Probably not. Will you get fired? Probably not, unless you have a history of missing deadlines. Will you get yelled at? Maybe.

Against that, what is the penalty for shipping software with a bug? Depending on the place that software is used, someone might actually die. Or a financial transaction could go wrong, meaning the company loses money. And if you ship sloppy software, even if it doesn't have bugs, you guarantee that the company will lose money when maintaining that software.

The trick to saying “no” is to mentally rehearse the alternatives, so that they take precedence over the fear. Whether you're in an upside-down kayak, an airplane that has just lost its engine, or in your boss' office after missing a deadline, there are always alternatives.

Wednesday, March 30, 2011

Fear

Rufus, my cockatiel, is sitting quietly on top of his cage, preening. Suddenly he lets out three short screeches and bolts down the hallway, flying low and fast. I look out the window and see a hawk circling. I find him in the bathroom, in full alert: feathers slicked down, neck fully extended, head turning quickly, looking for danger. I take him on my hand and quietly say “it's OK, it's a red-tail, they don't eat birds.” He doesn't understand my words, of course, but the tone calms him, and he returns to his cage. Hawks will continue to frighten him, and with good reason: there's also a family of sharpshins in the neighborhood, and they do eat birds.

This is the ur-fear: being torn apart and eaten by another creature. Humans, residing at the top of the food chain, usually only feel this fear after seeing a horror movie. But we still have an amygdala, which looks for reasons to initiate a fear response. So we react to a whole range of stimuli: violent death, injury, losing our job, or just the possibility that our manager and coworkers will be upset at us.

A little fear is perhaps healthy; it's what keeps us from doing stupid things. But there's a darker side to fear: it's focused on the self. All considerations of other people or long-term viability are ignored in the immediate pursuit of whatever it takes to resolve the fear. Or, in the words of the late Warren Zevon:

You're a whole different person when you're scared

Programmers don't often face choices of self-preservation. But every day, on all types of projects, they make fear-driven choices: working extraordinary hours, sacrificing maintainability, adding more people to a late project. In this light, I'm not sure that the “software craftsmanship” movement can be successful. It depends on programmers aspiring to a higher ideal, but it has to contend with a visceral reaction.

Monday, March 28, 2011

Disability

There aren't a lot of disabilities that affect programmers. We occupy a “life of the mind,” with few physical constraints. Early on in my career, I realized that my only real concerns would be major head trauma, repetitive stress injuries, and blindness. For the first, I figured that I wouldn't know that it had happened, and for the others I could take precautions — there are a lot of pairs of safety glasses in my basement.

Then, a couple of weeks ago, I returned home from an extended business trip with a detached retina (for those who aren't squeamish, see the Wikipedia article). Thirty-six hours later, I was in surgery. Two weeks after that, another surgery. Currently, I have a gas bubble in my right eye that requires me to keep my head in one of two positions: face down or sideways. Even though I have one good eye, and a laptop that folds flat, my ability to use a computer is limited: a half-hour at a time before I start feeling dizzy. And daytime TV — even with on-demand Mythbusters — is not something I'd wish on anyone.

But recovery is just a matter of time. Six to eight weeks of not doing what I want, of sleeping on the couch to ensure that my head is in the proper position, of wearing a green band to tell paramedics that my eye will explode if they put me on a medevac helicopter. There's a high risk that I will develop a cataract as a result of the surgery, and a possibility that the same thing will happen again.

What's interesting (to me, at least) is that my attitude about disability hasn't changed, even after realizing that precautions might not be enough. I mention this because I've had a few half-written postings about fear and motivation. And this experience has taught me something about how little fear has to do with reality. I think that now I have the ideas needed to complete those posts.

Friday, January 14, 2011

Big-O Isn't Everything

Question: when does an O(N²) algorithm outperform an O(NlogN) algorithm?

As part of my article on primitives, I decided to implement Heapsort. I needed an in-place sort for primitive integers that used an external comparator, and couldn't find one with a reasonable amount of Googling. I picked Heapsort because it's an easy algorithm to implement, and has O(NlogN) behavior for all inputs (unlike Quicksort, which degrades to O(N²) on already-sorted input unless you take care picking the pivot).
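
For reference, here's a minimal sketch of what such a heapsort looks like; the comparator interface and method names mirror what appears below, but the body is a reconstruction, not necessarily the code being measured:

public class Heapsort
{
    public interface IntComparator
    {
        int compare(int i1, int i2);
    }

    public static void sort(int[] array, IntComparator cmp)
    {
        // build a max-heap: every parent is >= both of its children
        for (int ii = array.length / 2 - 1; ii >= 0; ii--)
            siftDown(array, ii, array.length, cmp);

        // repeatedly move the largest remaining element to the end of the
        // unsorted region, then restore the heap property
        for (int end = array.length - 1; end > 0; end--)
        {
            swap(array, 0, end);
            siftDown(array, 0, end, cmp);
        }
    }

    private static void siftDown(int[] array, int parent, int limit, IntComparator cmp)
    {
        while (true)
        {
            int child = 2 * parent + 1;
            if (child >= limit)
                return;
            // compare against the larger of the two children
            if ((child + 1 < limit) && (cmp.compare(array[child], array[child + 1]) < 0))
                child++;
            if (cmp.compare(array[parent], array[child]) >= 0)
                return;
            swap(array, parent, child);
            parent = child;
        }
    }

    private static void swap(int[] array, int i, int j)
    {
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }
}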

Naturally, I wanted to see how my implementation performed, vis-a-vis the sorts provided by the JDK. Particularly on large datasets, which is the reason that I wanted an in-place sort. So I created a benchmark that measured thread CPU time to sort a 10,000,000 element array (which I knew would fit in RAM). Here are the results, the average of 10 executions on my aged desktop (a 2.4 GHz P4 with only 512 KB of cache):

Implementation            Time (sec)   Compares
Arrays.sort(int[])             2.55
Heapsort.sort(int[])          18.08
Arrays.sort(Integer[])        25.09     223,179,141
Heapsort.sort(Integer[])      79.91     439,021,125

Those are dramatic differences. I knew that Heapsort would make more comparisons than Mergesort (which the JDK will use for Integer[]), so I added a counter to the comparator. But the running time was over three times longer, with only twice the number of comparisons. And Quicksort (used by the JDK for int[]) was over seven times faster. Perhaps some of that difference comes from Heapsort calling a comparator, but clearly, not all O(NlogN) algorithms are alike!

So I decided to look at the JDK source. And found this interesting bit of code:*

private static void sort1(int x[], int off, int len) {
    // Insertion sort on smallest arrays
    if (len < 7) {
        for (int i=off; i<len+off; i++)
            for (int j=i; j>off && x[j-1]>x[j]; j--)
                swap(x, j, j-1);
        return;
    }

Insertion sort is an O(N²) algorithm. But it doesn't have a lot of moving parts — unlike Quicksort, which takes almost 40 lines of code in the JDK implementation, not counting the swap and range-swap methods. And when N is small, lines of code are more important than algorithmic efficiency (or, as my coworker Max points out, when N is bounded, you have a constant-time operation).

There's one more performance-enhancing feature of this snippet: all of the data accesses should be within the L1 cache. On Intel processors, the cache is structured as 64-byte chunks of memory — enough to hold 16 adjacent int values. While Arrays.sort() makes no attempt to ensure alignment of the insertion-sorted sub-arrays, it's a naturally cache-friendly algorithm. As is Quicksort itself, for that matter. And Heapsort, which jumps around the array comparing elements at position N and 2 * N, is decidedly cache-unfriendly.

All of which is to say that sometimes, micro-optimizations have a big effect. And that, rather than implementing Heapsort, I should have simply copied the OpenJDK implementation of Quicksort and added a comparator.**


* This optimization isn't specific to the JDK. The GNU C library implementation also switches to insertion sort, albeit at 4 elements, and credits Sedgewick for that and other optimizations.

** In case you're wondering, the updated Quicksort took 4.25 seconds to run, and made approximately 375,000,000 comparisons. If it weren't covered by GPL2, I'd put it in my toolbox in place of the Heapsort implementation.

Friday, January 7, 2011

Mock Objects and Statistics Gathering

Unit testing tends to focus on the API. The unspoken assumption is that, if the API behaves as expected, then the internals are correct. However, that's not guaranteed; even Uncle Bob concedes that a pure test-driven approach will lead you to BubbleSort rather than QuickSort (warning: it's a bit of a rant). Clearly, the “black box” approach of testing just the API is not sufficient; some tests need a “white box” approach, where you look into the implementation. But how to do that?

One way is to add test hooks: public methods that give access to internal object state. I can see how this might work: you write all your mainline code in terms of interfaces, with factories to give you instances of those interfaces. And as mainstream software development embraces dependency injection frameworks, this might be the right answer. But it still leaves me with a queasy feeling. All it takes is one undisciplined programmer writing a cast to the implementation class so that s/he can use those methods, and you've got brittle code waiting to break.

I recently tried a different approach: using a mock object to gather statistics about the internals of the object under test. Ironically, it was a sort routine.

Mock objects have been a part of test-centric development for a while. There are many libraries that help you create mock objects, and it's easy to create your own using proxies. But most of the articles that I've read on mock-centric tests use them to track simple interactions: you want to verify that your class takes a particular action, so you inject a mock that asserts it occurred.
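
For example, a hand-rolled interaction mock using a JDK dynamic proxy might look something like this (Runnable is just a convenient stand-in for whatever interface your class depends on):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// records every call made through the proxy, so a test can assert
// that a particular interaction happened
public class RecordingHandler implements InvocationHandler
{
    public List<String> calls = new ArrayList<String>();

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable
    {
        calls.add(method.getName());
        return null;   // fine for void methods; a real mock needs per-type defaults
    }
}

// usage: hand the proxy to the object under test, then assert on the calls
RecordingHandler handler = new RecordingHandler();
Runnable mock = (Runnable)Proxy.newProxyInstance(
                    Runnable.class.getClassLoader(),
                    new Class<?>[] { Runnable.class },
                    handler);

mock.run();
assertTrue(handler.calls.contains("run"));

The list of method names is the simplest possible record; a fancier handler could capture arguments and call counts as well.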

When testing my sort routine, I wanted to verify that the sort actually performed in NlogN time. It was a static method, so there was no place to add a test hook even if I believed in doing so. But there was one hook that was a natural part of the API:

public static class CountingIntComparator
implements Heapsort.IntComparator
{
    public int count;
    public int expectedCount;

    public CountingIntComparator(int size)
    {
        // our heapsort should need no more than 3 * N * log2(N) compares in total
        expectedCount = 3 * size * (int)Math.ceil(Math.log(size) / Math.log(2));
    }

    public int compare(int i1, int i2)
    {
        count++;
        return (i1 < i2) ? -1
             : (i1 > i2) ? 1
             : 0;
    }

    public void assertCompareCount()
    {
        assertTrue("expected < " + expectedCount + ", was " + count,
                   count < expectedCount);
    }
}

This works because “O(NlogN)” refers to the number of comparisons made by the sort. And it made for very simple test code (note that I needed a relatively large test array; this is a probabilistic test, and I didn't want it to fail because the random number generator returned a string of “bad” values):

public void testIntSortManyElements() throws Exception
{
    final int size = 10000;

    int[] src = createRandomArray(size);
    int[] exp = createSortedCopy(src);

    CountingIntComparator cmp = new CountingIntComparator(size);
    Heapsort.sort(src, cmp);
    assertEquals(exp, src);

    cmp.assertCompareCount();
}

The take-away from this, if there is one, is that mock objects can do more than simple interaction tests. They can gather statistics that give you insight into the long-term behavior of your objects, without the need to expose object state to the entire world.

The key seems to be whether or not the API that you're building is amenable to injecting such a mock without adding a test-specific variant. Perhaps it's a technique that only works for utility classes. But I'll be looking for similar opportunities in more complex code.