Tuesday, July 31, 2012

Taming Maven: Parent POMs

When faced with a development environment that has dozens or hundreds of distinct projects, version properties are only a first step. Even if you only have to look at one place in each POM, it's a headache to update versions. As a solution, you can specify all of your common dependencies and configuration in a “parent” POM, which is then referenced by each project.

Before continuing, I want to clear up a misconception: parent POMs are not the same as multi-module projects, even though they're described together in the Maven documentation. True, the two are often seen together: a multi-module project almost always uses a parent POM to bring order to its modules. But the parent need not be tied to the modules; an example is the Sonatype parent POM, which is used by every open-source project that deploys to Maven Central via the Sonatype repository.

A parent POM looks like a normal POM, but specifies a packaging type of “pom”.
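
As a minimal sketch, a parent POM might look something like the following. The `com.example` coordinates and the version number are invented for illustration; only the `pom` packaging type comes from the description above.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- hypothetical coordinates, for illustration only -->
    <groupId>com.example</groupId>
    <artifactId>example-parent</artifactId>
    <version>1.0.0</version>

    <!-- "pom" packaging: this project produces no JAR, only the POM itself -->
    <packaging>pom</packaging>
</project>
```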


The children of this POM then reference it via a <parent> element, which gives the parent's group ID, artifact ID, and version.
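
A child project could reference a parent roughly as sketched here. All of the coordinates are hypothetical (`example-parent` and `example-service` are made-up names); the point is only the shape of the `<parent>` element.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- hypothetical parent coordinates; the version is a specific release -->
    <parent>
        <groupId>com.example</groupId>
        <artifactId>example-parent</artifactId>
        <version>1.0.0</version>
    </parent>

    <!-- groupId is inherited from the parent unless overridden here -->
    <artifactId>example-service</artifactId>
    <version>1.2.3</version>
    <packaging>jar</packaging>
</project>
```

Because the child names an exact parent version, picking up a newer parent means editing this element and re-releasing the child.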



So what goes into the parent POM? Version properties, of course; one of the main reasons for using a parent POM is to ensure that all projects use the same versions of their dependencies. Also common plugin configuration, such as the compiler, test runner, and any reporting plugins. Finally, any common environment configuration, such as repositories and deployment configuration.
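
To make that concrete, here is a sketch of the sections that might sit inside the parent's `<project>` element. Everything specific in it is an assumption for illustration: the property names, the version numbers (placeholders, not recommendations), and the repository ID and URL. I've put the plugin settings under `<pluginManagement>`, which is one common way to share versions and configuration without forcing the plugins to run in every child.

```xml
<!-- fragment: these elements go inside the parent POM's <project> element -->

<properties>
    <!-- version properties shared by all children (placeholder values) -->
    <org.springframework.version>3.1.1.RELEASE</org.springframework.version>
    <junit.version>4.10</junit.version>
</properties>

<build>
    <pluginManagement>
        <plugins>
            <!-- compiler settings applied wherever the compiler plugin runs -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.5.1</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            <!-- test runner: only the version is pinned here -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.12</version>
            </plugin>
        </plugins>
    </pluginManagement>
</build>

<distributionManagement>
    <!-- hypothetical internal repository for released artifacts -->
    <repository>
        <id>example-releases</id>
        <url>https://repo.example.com/releases</url>
    </repository>
</distributionManagement>
```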

What shouldn't go in the parent POM is an actual <dependencies> section, because that will cause all of your projects to have the same set of dependencies, whether they need them or not. Nor should you add plugins that only run for one or a few projects (although by all means specify the plugin versions). And finally, if your projects use an <scm> section, it needs to go in the individual project POMs — I learned the hard way that Maven won't substitute project-specific values into a section defined by the parent.

The biggest complaint that I've heard about parent POMs is “if we change a dependency, then we have to update all the projects that use that parent!” That's true: the parent is a released artifact, just like the projects themselves; a child specifies a particular version of its parent, and is not automagically updated when the parent changes (unless you use snapshot versions for the parents).

My answer to this complaint is “either it matters or it doesn't, and either way the parent helps you.” There are times when changes don't matter: for example, if you move to a new library version that's backwards compatible. In that case, projects that use the new parent get the new version, as do any projects that link with them, via transitive dependencies. Projects that don't need the new functionality don't need to be updated. Over time, you can migrate these projects to the new POM as you make changes to them.

On the other hand, sometimes the change matters: for example, you've modified your database schema, and need to update all projects that use the affected business objects. In this case, the parent again makes your life easier: once you update the dependency property in the parent, it's a simple matter of grepping for that property to find children that need to be updated and re-released.

Monday, July 30, 2012

Taming Maven: Version Properties

Getting started with Maven is easy, and once you use its dependency management feature, you'll wonder why you waited so long. For simple web-apps or single-module projects, it Just Works.

However, most software developers aren't working on simple, one-module projects. We work in organizations that manage many projects, often depending on one another. And in this situation, the basic Maven project breaks down: you find that every project has a differing set of dependencies, some of which are incompatible. This is the first in a series of postings about taming a multi-project development environment.

To start, replace all of your hardcoded dependency versions with properties.

Projects accumulate dependencies over time: you might start out with a few of the core Spring packages, then add a few of the Apache Commons projects, then a few more Spring projects, then some libraries that another part of your organization maintains. Pretty soon you'll have dozens of dependencies, completely unordered. Just finding a dependency in the mess becomes difficult, even if you have a tool like m2eclipse. And it becomes very easy to have two related dependencies — or even duplicate dependencies — with different versions. Maven can resolve most of these problems automagically, but when it fails, you're in for a long and painful diagnosis session.

But, if you use properties for your dependencies, and adopt a consistent naming strategy for those properties, you may not be able to find your dependency references, but at least the versions will be defined in one place. Start by adding a <properties> section to your POM; I generally place it near the top of the POM, before the <build> and <dependencies> sections (both of these tend to be long).

    <!-- and so on, for all of your dependencies -->

Each property is its own element, and the element name is the property name. You can name your properties anything you want (as long as it's a legal XML element name), but for version properties I think that GROUPID.version makes the most sense. Or use GROUPID.ARTIFACTID.version if there are different artifacts for the same group that don't have a common version (for example, Spring Core and Spring Security).
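
As a sketch of that naming convention, a `<properties>` section might look like this. The version numbers are placeholders that I picked for illustration, and the choice of libraries is likewise just an example.

```xml
<properties>
    <!-- GROUPID.version: every artifact in the group shares one version -->
    <org.springframework.version>3.1.1.RELEASE</org.springframework.version>
    <junit.version>4.10</junit.version>

    <!-- GROUPID.ARTIFACTID.version: artifacts in one group that are versioned separately -->
    <org.apache.httpcomponents.httpclient.version>4.2.1</org.apache.httpcomponents.httpclient.version>
    <org.apache.httpcomponents.httpcore.version>4.2.1</org.apache.httpcomponents.httpcore.version>
</properties>
```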

Next, update the dependency to use that property, rather than a hardcoded version number.
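
For example, assuming a property named `org.springframework.version` has been defined in the `<properties>` section (the name is illustrative), a dependency would refer to it with Maven's `${...}` syntax:

```xml
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <!-- resolved from the <properties> section rather than hardcoded -->
    <version>${org.springframework.version}</version>
</dependency>
```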


Once all of your POMs use version properties, you can start to organize dependencies across projects. This can be as simple as running grep to find all projects that use a particular property. But Maven gives you a better solution, which will be the topic of tomorrow's post.

This series of posts was prompted by several recent projects where I worked with development organizations that had large project bases built on Maven. I was originally planning to show some of the Vim macros that I used to clean up POMs, but decided instead to start work on a tool to clean up POMs.

Wednesday, July 25, 2012

Business Logic and Interactive Web Applications

By the mid-2000s, the structure of an MVC web-app had gelled: business logic belonged in the model (which was usually divided into a service layer and persistent objects), a thin controller would invoke model methods and select data to be shown to the user, and a view held markup and the minimal amount of code needed to generate repeating or optional content. There might be some client-side validation written in JavaScript, but it was confined to verifying that fields were filled in with something that looked like the right value. All “real” validation took place on the server, to ensure that only “good” data got persisted.

Then along came client-side JavaScript frameworks like jQuery. Not only could you make AJAX calls, but they let you easily access field values and modify the rendered markup. A few lines of JavaScript, and you have an intelligent form: if option “foo” is selected, hide fields argle and bargle, and make fields biff and bizzle read-only.

The problem with intelligent front-ends is that they almost always duplicate the logic found in the back end. Which means that, inevitably, the two will get out of sync and the users will get upset: bargle was cleared but they expected it to hold the concatenated values of biff and bizzle.

There's no good solution to this problem, although it's been faced over and over. The standard solution with a “thick client” application was layered MVC: each component had its own model, which would advertise its changes to the rest of the app via events. These events would be picked up by an application-level controller, which would initiate changes in an application-level model, which would in turn send out events that could be processed by the components. If you were fastidious, you could completely separate the business logic of both models from the GUI code that rendered those models.

I don't think that approach would work with a web-app. The main reason is that the front-end and back-end code are maintained separately, using different languages. There's simply no way to look in one place and see all the logic that applies to bizzle.

Another problem is validation. The layered approach assumes that each component sends data that's already been validated; there's no need for re-validation at the lower levels. That may be acceptable for internal applications, but certainly not for something that's Internet-facing.

One alternative is that every operation returns the annotated state of the application model: every field, its value, and a status code — which might be as simple as used/not-used. The front-end code can walk that list and determine how to change the rendered view. But this means contacting the server after every field change; again, maybe not a problem on an internal network, but not something for the Internet.

Another alternative is to write all your code in one language and translate for the front end. I think the popularity of GWT says enough about this approach.

I don't have an answer, but I'm seeing enough twisted code that I think it's an important topic to think about.

Monday, July 9, 2012

Introducing Pathfinder

I've been on a new assignment for the last couple of months, located in Center City Philadelphia. On the one hand, the commute is great: I can walk to the local train station, and have a quiet 25 minutes to read or work on the train. On the other hand, orienting my morning around the train schedule has thrown a monkey wrench into my blog posts. I have a dozen or more half-written ideas waiting to be cleaned up and published. I never realized how much post-production I normally do: adding links, tweaking the HTML once it's on Blogger, and whatnot. Did I mention that SEPTA doesn't have wifi on their trains? (score another point for Boston)

Instead, I've been working on Pathfinder, a tool to examine Java web apps and tell you the URLs that they handle. It was inspired by rake routes, a tool from the Ruby/Rails world. My current job has me enhancing legacy web-apps, and I think that knowing the classes associated with a URL is a good way to start learning a codebase. If I just had to deal with Spring, I could rely on STS; my goal for Pathfinder is to (eventually) handle all web frameworks, obsolete or not.

It's got a way to go. Right now it handles servlets, JSPs, and Spring apps, and the latter must use either SimpleUrlHandlerMapping or a component scan with @RequestMapping. But I think the basic design is solid and extensible, and I will be updating it as I run into things that it can't handle.
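
For readers who haven't seen both styles, here is a rough sketch of a Spring context file that uses each of them. The package name, bean ID, controller class, and URL are all invented for illustration; this shows the kind of configuration being described, not a file from Pathfinder or its tests.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/context
                           http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- style 1: scan a (hypothetical) package for annotated controllers,
         whose URLs come from @RequestMapping -->
    <context:component-scan base-package="com.example.web"/>

    <!-- style 2: explicit URL-to-bean mappings -->
    <bean class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
        <property name="mappings">
            <props>
                <prop key="/hello.html">helloController</prop>
            </props>
        </property>
    </bean>

    <!-- hypothetical controller bean referenced by the mapping above -->
    <bean id="helloController" class="com.example.web.HelloController"/>
</beans>
```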

I've learned an enormous amount about Spring while developing it. For a framework that espouses convention over configuration, I needed to handle an enormous number of special cases. For example, there are at least two default locations where you can save your Spring context files. And did you know that @RequestMapping actually takes an array of URLs?

Speaking of which, I also learned a lot about how annotations are stored in the classfile. It's easy to use reflection to access annotations on classes that are already loaded. But because of some code that I'd seen in a legacy Spring app, I didn't want to actually load the web-app. No problem, I thought, there are a bunch of libraries that already exist for working with classfiles.

As it turned out, not so much. My old standby, BCEL, has some code in the trunk to deal with annotations. But its last released version — from 2006 — doesn't handle them; they're just an “unknown” attribute type. The “new hotness,” ASM, does support annotations, but you wish it didn't: you have to decipher each annotation using a big if-else chain filled with instanceof.

Which brings me to the second introduction of this post: BCELX. The name stands for “BCEL extensions,” and it's built on top of BCEL. Right now it just handles class and method annotations; I'll add parameter annotations when I need them. And it doesn't handle nested annotations — I can't find an example of a nested annotation to build a testcase.

BCELX may expand beyond annotation parsing: it seems that there's a need out there for a simple tree-structured view of a Java classfile, and all the existing libraries are on the visitor wagon. I just have to figure out how to schedule it into my 25 minutes of quiet time each morning.