Wednesday, August 11, 2010

Agile Isn't New

I recently read C. A. R. Hoare's 1980 ACM Turing Award speech, “The Emperor's Old Clothes” (currently downloadable here). The theme of this speech is simplicity, in particular how lack of simplicity in a programming language makes it harder to write error-free code, summarized as “so simple that there are obviously no deficiencies [versus] so complicated that there are no obvious deficiencies” (emphasis as written). This, of course, resonates with my feelings about mental models.

About midway through the speech, Hoare describes a failed project: a new operating system that was to dramatically extend the capabilities of his company's former offering. It reads like a recap of The Mythical Man-Month, right down to the programmers' assumption that memory was infinite. But where Brooks turned to organizational strategies to dig his team out from failure, Hoare did something else:

First, we classified our […] customers into groups […] We assigned to each group of customers a small team of programmers and told the team leader to visit the customers to find out what they wanted […] In no case would we consider a request for a feature that would take more than three months to implement and deliver […] Above all, I did not allow anything to be done which I did not myself understand.

That quote could have come from a book on Extreme Programming. Short iterations, understandable stories, pulling the customer into the development process. It's all there.

Or, I should say, it was all there. In 1965. Presented to a group of practicing programmers in 1980. And then “rediscovered” by Beck, Jeffries, et al. in the 1990s.

Why do we keep forgetting?

Monday, August 9, 2010

Ant, Taskdef, and running out of PermGen

Although I've switched to Maven for building Java projects (convention over configuration ftw), I still keep Ant in my toolbox. It excels at the sort of free-form non-Java projects that most people implement using shell scripts.

One reason that Ant excels at these types of projects is that you can easily implement project-specific tasks such as a database extract, and mix those tasks with the large library of built-in tasks like filter or mkdir. And the easiest way to add your tasks to a build file is with a taskdef:

    <taskdef name="example"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/lib/mytasks.jar"/>

Last week I was working on a custom task that would retrieve data by US state. I invoked it with the foreach task from the ant-contrib library, so that I could build a file from all 50 states. Since I expected it to take several hours to run, I kicked it off before leaving work for the day.
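
To make the setup concrete, here is a minimal sketch of such a loop. It assumes the ant-contrib jar is on Ant's classpath. The state list is abbreviated, and the statedata task name, the StateDataTask class, its state attribute, and the extract-state target are hypothetical stand-ins, not the code from the actual build.

```xml
<project default="all-states" basedir=".">

    <!-- ant-contrib's task definitions; assumes its jar is on Ant's classpath -->
    <taskdef resource="net/sf/antcontrib/antcontrib.properties"/>

    <!-- hypothetical custom task; class name and jar are placeholders -->
    <taskdef name="statedata"
             classname="com.example.ant.StateDataTask"
             classpath="${basedir}/lib/mytasks.jar"/>

    <target name="all-states">
        <!-- list abbreviated; invokes extract-state once per entry -->
        <foreach list="AK,AL,AR" target="extract-state" param="state"/>
    </target>

    <target name="extract-state">
        <statedata state="${state}"/>
    </target>

</project>
```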

The next morning, I saw that it had failed about 15 minutes in, having run out of permgen space. And the error happened when it was loading a class. At first I suspected the foreach task, or more likely, the antcall that it invoked. After all, it creates a new project, so what better place to create a new classloader? Plus, it was in the stack trace.

But as I looked through the source code for these tasks, I couldn't see any place where a new classloader was created (another reason that I like Ant is that its source is generally easy to follow). That left the taskdef; after all, I knew that my code wasn't creating a new classloader. To test, I created a task that printed out its classloader, and used the following build file:

<project default="default" basedir="..">

    <taskdef name="example1"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/classes"/>
    <taskdef name="example2"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/classes"/>

    <target name="default">
        <example1 />
        <example2 />
    </target>

</project>

Sure enough, each taskdef is loaded by its own classloader. The antcall simply exacerbates the problem, because it executes the taskdefs over again.
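
One way to see the effect of antcall is to extend the test build file with a second target, as sketched here. The target name "again" is made up for illustration; the task and classpath are the ones from the build file above. Because antcall creates a new project and the top-level taskdef runs again in it, the two invocations of example1 would be expected to report different classloaders.

```xml
<project default="default" basedir="..">

    <!-- top-level taskdef: executed once per project, including
         the project that antcall creates -->
    <taskdef name="example1"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/classes"/>

    <target name="default">
        <example1 />
        <antcall target="again"/>
    </target>

    <!-- hypothetical target name, used only to trigger a second project -->
    <target name="again">
        <example1 />
    </target>

</project>
```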

It makes sense that Ant would create a new classloader for each project, and even for each taskdef within a project (they can, after all, have unique classpaths). And as long as the classloader is referenced only from the project, it — and the classes it loads — will get collected at the same time as the project. And when I looked in the Project class, I found the member variable coreLoader.

But when I fired up my debugger, I found that the variable was explicitly set to null and never updated. Then I put a breakpoint in ClasspathUtils, and saw that it was being invoked with a “reuse” flag set to false. The result: each taskdef gets its own classloader, and they're never collected.

I think there's a bug here: not only is the classloader not tied to the project object, it uses the J2EE delegation model, in which a classloader attempts to load classes from its own classpath before asking its parent for the class. However, the code makes me think that this is intentional. And I don't understand project life cycles well enough to know what would break with what I feel is the “correct” implementation.

Fortunately, there's a work-around.

As I was reading the documentation for taskdef, I saw a reference to antlibs. I remembered using antlibs several years ago, when I was building a library of a dozen or so tasks, and didn't want to copy-and-paste the taskdefs for them. And then a lightbulb lit: antlibs must be available on Ant's classpath. And that means that they don't need their own classloader.

To use an antlib, you create the file antlib.xml, and package it with the tasks themselves:

<antlib>
    <taskdef name="example1" classname="com.kdgregory.example.ant.ExampleTask"/>
    <taskdef name="example2" classname="com.kdgregory.example.ant.ExampleTask"/>
</antlib>
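
Packaging might look something like the following jar target. This is only a sketch: the classes and src directory names are assumptions about the project layout, and the jar name is borrowed from the earlier taskdef example.

```xml
<project default="package" basedir=".">

    <target name="package">
        <mkdir dir="${basedir}/lib"/>
        <jar destfile="${basedir}/lib/mytasks.jar">
            <!-- compiled task classes (assumed output directory) -->
            <fileset dir="${basedir}/classes"/>
            <!-- antlib.xml, kept in the task library's package directory
                 (assumed to live under src) -->
            <fileset dir="${basedir}/src"
                     includes="com/kdgregory/example/ant/antlib.xml"/>
        </jar>
    </target>

</project>
```

The resulting jar then has to be on Ant's classpath, for example by passing its directory with the -lib command-line option.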

Then you define an “antlib” namespace in your project file, and refer to your tasks using that namespace. The namespace specifies the package where antlib.xml can be found (by convention, the top-level package of your task library).

<project default="default" 
    xmlns:ex="antlib:com.kdgregory.example.ant">

    <target name="default">
        <ex:example1 />
        <ex:example2 />
        <antcall target="example"/>
    </target>

    <target name="example">
        <ex:example1 />
        <ex:example2 />
    </target>   

</project>

It's extra effort, but the output makes the effort worthwhile:

ant-classloader-example, 528> ant -f -lib bin build2.xml 
Buildfile: /home/kgregory/tmp/ant-classloader-example/build2.xml

default:
[ex:example1] project:     org.apache.tools.ant.Project@110b053
[ex:example1] classloader: java.net.URLClassLoader@a90653
[ex:example2] project:     org.apache.tools.ant.Project@110b053
[ex:example2] classloader: java.net.URLClassLoader@a90653

example:
[ex:example1] project:     org.apache.tools.ant.Project@167d940
[ex:example1] classloader: java.net.URLClassLoader@a90653
[ex:example2] project:     org.apache.tools.ant.Project@167d940
[ex:example2] classloader: java.net.URLClassLoader@a90653

BUILD SUCCESSFUL
Total time: 0 seconds

Bottom line: if you're running out of permgen while running Ant, take a look at your use of taskdef, and see if you can replace it with an antlib. (At least one other person has run into similar problems; if you're interested in the sample code, you can find it here.)