Showing posts with label maven. Show all posts
Showing posts with label maven. Show all posts

Saturday, April 30, 2016

Taming Maven: Transitive Dependency Pitfalls

Like much of Maven, transitive dependencies are a huge benefit that brings with them the potential for pain. And while I titled this piece “Taming Maven,” the same issues apply to any build tool that uses the Maven dependency mechanism, including Gradle and Leiningen.

Let's start with definitions: direct dependencies are those listed in the <dependencies> section of your POM. Transitive dependencies are the dependencies needed to support those direct dependencies, recursively. You can display the entire dependency tree with mvn dependency:tree; here's the output for a simple Spring servlet:

[INFO] com.kdgregory.pathfinder:pathfinder-testdata-spring-dispatch-1:war:1.0-SNAPSHOT
[INFO] +- javax.servlet:servlet-api:jar:2.4:provided
[INFO] +- javax.servlet:jstl:jar:1.1.1:compile
[INFO] +- taglibs:standard:jar:1.1.1:compile
[INFO] +- org.springframework:spring-core:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-asm:jar:3.1.1.RELEASE:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.springframework:spring-beans:jar:3.1.1.RELEASE:compile
[INFO] +- org.springframework:spring-context:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-aop:jar:3.1.1.RELEASE:compile
[INFO] |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  \- org.springframework:spring-expression:jar:3.1.1.RELEASE:compile
[INFO] +- org.springframework:spring-webmvc:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-context-support:jar:3.1.1.RELEASE:compile
[INFO] |  \- org.springframework:spring-web:jar:3.1.1.RELEASE:compile
[INFO] \- junit:junit:jar:4.10:test
[INFO]    \- org.hamcrest:hamcrest-core:jar:1.1:test

The direct dependencies of this project include servlet-api version 2.4 and :spring-core version 3.1.1.RELEASE. The latter has a dependency on spring-asm, which in turn has a dependency on commons-logging.

In a real-world application, the dependency tree may include hundreds of JARfiles with many levels of transitive dependencies. And it's not a simple tree, but a directed acyclic graph: many JARs will share the same dependencies — although possibly with differing versions.

So, how does this cause you pain?

The first (and easiest to resolve) pain is that you might end up with dependencies that you don't want. For example, commons-logging. I don't subscribe to the fear that commons-logging causes memory leaks, but I also use SLF4J, and don't want two logging facades in my application. Fortunately, it's (relatively) easy to exclude individual dependecial's, as I described in a previous “Taming Maven” post.

The second pain point, harder to resolve, is what, exactly, is the classpath?

A project's dependency tree is the project's classpath. Actually, “the” classpath is a bit misleading: there are separate classpaths for build, test, and runtime, depending on the <scope> specifications in the POM(s). Each plugin can define its own classpath, and some provide a goal that lets you see the classpath they use; mvn dependency:build-classpath will show you the classpath used to compile your code.

This tool lists dependencies in alphabetical order. But if you look at a generated WAR, they're in a different order (which seems to bear no relationship to how they're listed in the POM). If you're using a “shaded” JAR, you'll get a different order. Worse, since a shaded JAR flattens all classes into a single tree, you might end up with one JAR that overwrites classes from another (for example, SLF4J provides the jcl-over-slf4j artifact, which contains re-implemented classes from commons-logging).

Compounding classpath ordering, there is the possibility of version conflicts. This isn't an issue for the simple example above, but for real-world applications that have deep dependency trees, there are bound to be cases where dependencies-of-dependencies have different versions. For example, the Jenkins CI server has four different versions of commons-collections in its dependency tree, ranging from 2.1 to 3.2.1 — along with 20 other version conflicts.

Maven has rules for resolving such conflicts. The only one that matters is that direct dependencies take precedence over transitive. Yes, there are other rules regarding depth of transitive dependencies and ordering, but those are only valid to discover why you're getting the wrong version; they won't help you fix the problem.

The only sure fix is to lock down the version, either via a direct dependency, or a dependency-management section. This, however, carries its own risk: if one of your transitive dependencies requires a newer version than the one you've chosen, you'll have to update your POM. And, let's be honest, the whole point of transitive dependencies was to keep you from explicitly tracking every dependency that your app needs, so this solution is decidedly sub-optimal.

A final problem — and the one that I consider the most insidious — is directly relying on a transitive dependency.

As an example, I'm going to use the excellent XML manipulation library known as Practical XML. This library makes use of the equally excellent utility library KDGCommons. Having discovered the former, you might also start using the latter — deciding, for example, that its implementation of parallel map is far superior to others.

However, if you never updated your POM with a direct reference to KDGCommons, then when the author of PracticalXML decides that he can use functions from Jakarta commons-lang rather than KDGCommons, you've got a problem. Specifically, your build breaks, because the transitive depenedency has disappeared.

You might think that this is a uncommon situation, but it was actually what prompted this post: a colleague changed one of his application's direct dependencies, and his build started failing. After comparing dependencies between the old and new versions we discovered a transitive depenency that disappeared. Adding it back as a direct dependency fixed the build.

To wrap up, here are the important take-aways:

  • Pay attention to transitive dependency versions: whenever you change your direct dependencies, you should run mvn dependency:tree to see what's changed with your transitives. Pay particular attention to transitives that are omitted due to version conflicts.
  • If your code calls it, it should be a direct dependency. Plugging another of my creations, the PomUtil dependency tool can help you discover those.

Thursday, August 2, 2012

Taming Maven: A Local Repository Server

While Maven's automatic dependency retrieval is a great feature, it does have limitations. And one of the biggest of those limitations is Maven's ability to access local projects. Or, really, any projects that aren't found in the Maven Central repository.

Returning to the archetypal IT department of my previous posts, it's really painful if, before you can start working on your own project, you first have to check-out and build multiple dependent projects. Worse is if those dependencies are under development, and you have to rebuild on a regular basis.

The first part of eliminating that pain is to run a local repository server. This server is a place to deploy your local builds, including any third-party software or local patches you've made to open source software. It can also act as a proxy for Maven Central and other external repositories, protecting you from Internet outages and keeping your POMs free of <repositories> entries.

You can create a Maven repository server using a bare-bones Apache web-server: all you need to do is make sure that its document root follows the repository format. However, there are better options: Nexus and Artifactory are both purpose-built servers for managing Maven artifacts, and both come in open-source variants that you can run for free. If you don't have a local machine, or don't want the hassle of administering it, Artifactory provides cloud hosting of your repository (for a fee). Sonatype doesn't go quite that far, instead providing a pre-built EC2 image (hopefully updated since that post).

Once you've got the repository server running, you need to configure Maven to access it. The simplest approach is to add your local server as a mirror for Maven Central, as described here. Note that you can not simply add a <repositories> entry to your parent POM, as you need to deploy that POM to the repository server.

Now you face the question of how to deploy your builds. Both Nexus and Artifactory give you a web interface to manually upload artifacts, but it's far easier to use the Maven deploy goal to deploy directly from your build (using an HTTP connection to the server). Of course, that raises the issue of credentials: do you give each developer his/her own credentials (which are stored in $HOME/.m2/settings.xml), or use a single set of credentials for all?

I'm in favor of the latter: use one set of credentials, stored either in each user's personal settings file, or in the global settings file. While that may make some people cringe, the security risk is non-existent: the repository server is write-only, and it will control where you write. As long as you don't pass out the actual admin login, or use SCP to deploy, the worst a disgruntled ex-employee can do is upload new builds.

And even that that minor risk can be eliminated if your developers never have to type “deploy” — and they'd be happier too. Instead, set up a continuous integration server that examines your source repository for changes and automatically builds and deploys the changed projects. At least for snapshot builds, this ensures that all of your developers will be using the latest codebase, without any manual intervention whatsoever.

Wednesday, August 1, 2012

Taming Maven: Dependency Management

Once you have a parent POM, you can add a <dependencyManagement> section. In my view, it's often more trouble than its worth (in fact, this blog series started out as a single post suggesting properties were usually a better choice). In the worst case, a dependency management section can prevent your child builds from seeing new dependencies. There are, however, some cases where it is useful to prevent Maven from using unwanted transitive dependencies.

For those who haven't seen <dependencyManagement> in use, here's an abbreviated example:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${springframework.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-webmvc</artifactId>
            <version>${springframework.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

With the dependency version specified in the parent's dependency management section, it can be omitted in the child's <dependencies>:

<dependencies>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-webmvc</artifactId>
    </dependency>
</dependencies>

If that was all there was to it, there would be very little reason to use a dependency management section, versus defining version properties in the parent — it might actually be a net increase in the size of your POMs. But dependency management goes a bit deeper: it will override the transitive dependencies associated with your direct dependencies. This bears further examination.

If you use the Spring framework, you know that it's broken into lots of pieces. You don't need to specify all of these as direct dependencies, because each component brings along transitive dependencies (I haven't verified this, but I think spring-mvc the only direct dependency you need for a Spring web-app).

Now consider the case where the parent POM has a dependency management section that lists all of the Spring components, and gives them version 3.0.4.RELEASE, while a child uses version 3.1.1.RELEASE, and just specifies spring-webmvc as a direct dependency. When Maven builds the child and retrieves transitive dependencies, it will ignore the 3.1.1.RELEASE implied by the direct dependency, and instead load the 3.0.4.RELEASE versions specified by the parent.

This is rarely a Good Thing. Sometimes it won't cause an actual bug: if you are using features of the new version that haven't changed since the old version, you have nothing to worry about. But more often, you'll get a NoSuchMethodError thrown at runtime. Or worse, the method is present but does something unexpected. These sorts of bugs can be incredibly painful to track down.

Version properties, of course, go a long way toward keeping these errors from occurring. But some projects will need to specify their own version properties, often because they're trying out some new functionality.

There is, however, one case where a dependency management section is useful: excluding transitive dependencies. Again using Spring as an example: it will use either commons-logging or SLF4J for its internal logging; at runtime, it figures out which is available. However, as of this writing (version 3.1.2.RELEASE), spring-core has a non-optional transitive dependency on commons-logging. Which means that your program will also have a transitive dependency on commons-logging — and if your program is a web-app, you'll find commons-logging in its deployed WAR whether you want it or not.

Perhaps some day the Spring developers will change the scope of this dependency to provided. Until then, if you don't want commons-logging you need to manually break the transitive dependency with an exclusion:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>${springframework.version}</version>
    <exclusions>
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Without a dependency management section in the parent POM, you would have to repeat that exclusion in every project POM. Miss just one, and the transitive dependency appears. Move the exclusion into the parent's dependency management section, and it applies to all children. Of course, this locks down the version number; any child projects that need a different version must specify a direct dependency on spring-core.

Bottom line: don't use a <dependencyManagement> section unless you absolutely have to. And even then, keep it as small as you possibly can.

Tuesday, July 31, 2012

Taming Maven: Parent POMs

When faced with a development environment that has dozens or hundreds of distinct projects, version properties are only a first step. Even if you only have to look at one place in each POM, it's a headache to update versions. As a solution, you can specify all of your common dependencies and configuration in a “parent” POM, which is then referenced by each project.

Before continuing, I want to clear up a misconception: parent POMs are not the same as multi-module projects, even though they're described together in the Maven documentation. True, the two are often seen together: a multi-module project almost always uses a parent POM to bring order to its modules. But the parent need not be tied to the modules; an example is the Sonatype parent POM, which is used by every open-source project that deploys to Maven Central via the Sonatype repository.

A parent POM looks like a normal POM, but specifies a packaging type of “pom

<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>7</version>
<packaging>pom</packaging>

The children of this POM then reference it via a <parent> element:

<parent>
    <groupId>org.sonatype.oss</groupId>
    <artifactId>oss-parent</artifactId>
    <version>7</version>
</parent>

<groupId>net.sf.kdgcommons</groupId>
<artifactId>kdgcommons</artifactId>
<version>1.0.7-SNAPSHOT</version>

So what goes into the parent POM? Version properties, of course; one of the main reasons for using a parent POM is to ensure that all projects use the same set of dependencies. Also common plugin configuration, such as the compiler, test runner, and any reporting plugins. Finally, any common environment configuration, such as repositories and deployment configuration.

What shouldn't go in the parent POM is an actual <dependencies> section, because that will cause all of your projects to have the same set of dependencies, whether they need them or not. Nor should you add plugins that only run for one or a few projects (although by all means specify the plugin versions). And finally, if your projects use an <scm> section, it needs to go in the individual project POMs — I learned the hard way that Maven won't substitute project-specific values into a section defined by the parent.

The biggest complaint that I've heard about parent POMs is “if we change a dependency, then we have to update all the projects that use that parent!” That's true: the parent is a released artifact, just like the projects themselves; a child specifies a particular version of its parent, and is not automagically updated when the parent changes (unless you use snapshot versions for the parents).

My answer to this complaint is “either it matters or it doesn't, and either way the parent helps you.” There are times when changes don't matter: for example, if you move to a new library version that's backwards compatible. In that case, projects that use the new parent get the new version, as do any projects that link with them, via transitive dependencies. Projects that don't need the new functionality don't need to be updated. Over time, you can migrate these projects to the new POM as you make changes to them.

On the other hand, sometimes the change matters: for example you've modified your database schema, and need to update all projects that use the affected business objects. In this case, the parent again makes your life easier: once you update the dependency property in the parent, it's a simple matter of grepping for that property to find children that need to be updated and re-released.

Monday, July 30, 2012

Taming Maven: Version Properties

Getting started with Maven is easy, and once you use its dependency management feature, you'll wonder why you waited so long. For simple web-apps or single-module projects, it Just Works.

However, most software developers aren't working on simple, one-module projects. We work in organizations that manage many projects, often depending on one-another. And in this situation, the basic Maven project breaks down: you find that every project has a differing set of dependencies, some of which are incompatible. This is the first in a series of postings about taming a multi-project development environment.

To start, replace all of your hardcoded dependency versions with properties.

Projects accumulate dependencies over time: you might start out with a few of the core Spring packages, then add a few of the Apache Commons projects, then a few more Spring projects, then some libraries that another part of your organization maintains. Pretty soon you'll have dozens of dependencies, completely unordered. Just finding a dependency in the mess becomes difficult, even if you have a tool like m2eclipse. And it becomes very easy to have two related dependencies — or even duplicate dependencies — with different versions. Maven can resolve most of these problems automagically, but when it fails, you're in for a long and painful diagnosis session.

But, if you use properties for your dependencies, and adopt a consistent naming strategy for those properties, you may not be able to find your dependency references, but at least the versions will be defined in one place. Start by adding a <properties> section to your POM; I generally place it near the top of the POM, before the <build> and <dependencies> sections (both of these tend to be long).

<properties>
    <org.springframework.version>3.1.1.RELEASE</sorg.pringframework.version>
    <!-- and so on, for all of your dependencies -->
</properties>

Each property is its own element, and the element name is the property name. You can name your properties anything you want (as long as it's a legal XML element name), but for version properties I think that GROUPID.version makes the most sense. Or use GROUPID.ARTIFACTID.version if there are different artifacts for the same group that don't have a common version (for example, Spring Core and Spring Security).

Next, update the dependency to use that property, rather than a hardcoded version number.

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
    <version>${org.springframework.version}</version>
</dependency>

Once all of your POMs use version properties, you can start to organize dependencies across projects. This can be as simple as running grep to find all projects that use a particular property. But Maven gives you a better solution, which will be the topic of tomorrow's post.


This series of posts was prompted by several recent projects where I worked with development organizations that had large project bases built on Maven. I was originally planning to show some of the Vim macros that I used to clean up POMs, but decided instead to start work on a tool to clean up POMs.

Friday, December 4, 2009

Integration Tests with Maven

Maven is simultaneously one of the most useful and most frustrating tools that I have ever used. As long as you stay within its “standard build process,” you can do some amazing things with very little configuration. But if you step off that standard, even in a small way, you'll end up in a world of pain. If you're lucky, a short session of Googling will suffice; if unlucky, you'll need to read plugin source code.

In my case, all I wanted to do was create a separate directory of integration tests, and allow them to be invoked from the command line.* I thought that I could get away with just redefining a system property:

mvn -Dproject.build.testSourceDirectory=src/itest/java test

No luck: Maven used its default directory (confirmed with the -X command-line option). After some Googling, which included a bunch of Maven bug reports, I learned that some plugins load system properties before they load the properties defined in the POM — which I think misses the whole point of command-line defines.

My next step was to take a closer look at Surefire's configuration options in the hope that I could override there. And testSourceDirectory looked like it would work, but its datatype is File, and Maven doesn't try to convert non-String properties.

Some more Googling led me to build profiles. This seemed promising: I could create a named profile, and within it set the test directory to whatever I wanted. There was only one problem: profiles can't override the POM's build directories. If this post ever drops into the hands of the Maven team, I'd really like to know the reasoning behind that decision.**

OK, but perhaps I could use a profile to pass my directory directly to the Surefire plugin? This seemed promising, and when I first tried it, even appeared to work (but it turned out that I'd accidentally put source files under the wrong directory tree). Back to the documentation, and I realized that tests are compiled by the compiler plugin, not the Surefire plugin, and the compiler plugin doesn't have any directory config. This is when I downloaded the Surefire plugin code, and grepped for testSourceDirectory. It appears to be used only for TestNG; too bad I'm using JUnit.

In the end, I gave up on using a command-line override to run integration tests. I realized that Maven really, really wanted me to have a single test directory, used for all phases of the build. But to split this directory into unit and integration phases meant I would have to use a naming convention and include/exclude filters. I still rejected this: I have no desire for “big ball of mud” source trees.

Instead I created a sub-project for the integration tests. Of course, it wasn't a true sub-project: Maven is very particular about parent-child relationships. Instead, I just created an "itest" directory within my main project directory, and gave it its own POM; Maven's transitive dependency mechanism really helped here. It took about 10 minutes, versus the 4+ hours I'd spent trying to override the test directory. And it didn't affect the mainline build; for once I was happy that Maven ignores things that it doesn't expect.


* Yes, I know that Maven has an integration-test phase. But after reading about the configuration needed, I decided that it would be a last resort.

** Update (7-Apr-10): last night I attended a presentation by Jason van Zyl, creator of Maven and founder of Sonatype. And his answer to why Maven 2 doesn't allow profiles to override all build elements was that they didn't see a use case at the time: integration tests are outside of the normal build flow — except that now they're finding a need to do just that when building the Hudson build-management tool. For myself, I've come to accept sub-projects as the simplest solution.