Friday, June 24, 2011

Testing Boundaries

In Programming Pearls, Jon Bentley uses binary search as a case study of how difficult it can be to implement a seemingly simple algorithm (emphasis added).

I've assigned this problem in courses for professional programmers. The students had a couple of hours to convert the description above into a program in the language of their choice […] We would then [validate with] test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs

Bentley then goes on to develop an implementation using formal verification techniques: identifying an “invariant” condition, and proving that that invariant in fact does not vary during execution of the algorithm. Unfortunately, his code has a bug, one that found its way into the JDK.

While one can point to this bug as a refutation of formal correctness proofs,* I'm more interested in whether testing is a valid alternative. Bentley clearly didn't think so: “I wasn't always convinced of the correctness of the code in which no bugs were found.” His concern is reasonable: tests can only verify that a specific input leads to a specific output. It may be sheer coincidence that the code under test produces this output, and it may give incorrect results for different input.

Which means that a simple test, one that creates an array and searches for values in that array, isn't enough. It could be that your code works for an array that has an even number of elements, but fails for one that has an odd number. Or perhaps it works fine for all arrays with more than one element, but throws an exception if passed an empty array. And how do you know that you're really doing a binary search?** One obviously infeasible approach is to do exhaustive testing of all possible arrays.

An alternative is to adopt the “spirit of testivus,” let a tool generate random testcases for you, and assume that a sufficiently high number of tests means everything works. But from the perspective of “this is correct,” however, random tests are the same thing as arbitrary tests: they may find a bug, they may not; you'll never know if others exist.

A better approach is to adopt the core of Bentley's argument, which is to understand the inflection points of the algorithm: those places where the behavior potentially changes. For binary search, I started with the following:

  • 0: every method that involves indexing should have a test for a zero-length object (and in the case of methods using Strings, a test that passes null).
  • 1: This is the end-point of a binary search loop: you have a single element in your array, and either it's the correct value or it isn't.
  • 2, 3: These are the points at which a binary search divides the array and recurses (or loops).
  • 4: This is a superfluous test. I wrote it because I wanted to verify two passes through the divide-and-conquer logic, but then realized there was no point. Still, every test is sacred, so I checked it in.

If you remember high school algebra, you're realize this is proof by induction. If you can demonstrate that you have exhaustively tested every array size through all inflection points, then you can reasonably say that it will return the expected value for larger sizes. Add some hook points that record each pass through the loop (this could be a Comparator that counts its invocations), and you can verify that it's an O(logN) search.

Except … my code doesn't actually test all inflection points. There's another inflection point at Integer.MAX_VALUE / 2 + 2: the possibility of integer overflow. And unless you have a 64-bit JVM and over 4 Gb of physical memory, you have no way to test this case.

I don't have the answer. Tests can develop a high level of confidence, but they can't make guarantees. And to develop that level of confidence, you essentially create a formal proof. But compared to the situation with no tests, that level of confidence may be enough.

* Something that I am all too willing to do: I have a degree in History rather than Computer Science because at nineteen years old I realized that there was a big difference between proof on paper and bytes in memory. But that's a topic for another posting.

** I believe that binary search, like quicksort, is an algorithm that cannot be implemented following the strict rules of test-driven-development. Fixing your tests by doing “the simplest thing that could possibly work” will give you a linear search.

See (NSFW if you don't have headphones, potentially offensive if you do).

Actually, you could test a binary search that works on arbitrary indexable objects rather than Java arrays: create an implementation that stores its data on disk. Although when Java's search methods were being developed (circa 1999), a typical workstation disk wouldn't have had the needed capacity.

No comments: