Saturday, April 30, 2016

Taming Maven: Transitive Dependency Pitfalls

Like much of Maven, transitive dependencies are a huge benefit that brings with it the potential for pain. And while I titled this piece “Taming Maven,” the same issues apply to any build tool that uses the Maven dependency mechanism, including Gradle and Leiningen.

Let's start with definitions: direct dependencies are those listed in the <dependencies> section of your POM. Transitive dependencies are the dependencies needed to support those direct dependencies, recursively. You can display the entire dependency tree with mvn dependency:tree; here's the output for a simple Spring servlet:

[INFO] com.kdgregory.pathfinder:pathfinder-testdata-spring-dispatch-1:war:1.0-SNAPSHOT
[INFO] +- javax.servlet:servlet-api:jar:2.4:provided
[INFO] +- javax.servlet:jstl:jar:1.1.1:compile
[INFO] +- taglibs:standard:jar:1.1.1:compile
[INFO] +- org.springframework:spring-core:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-asm:jar:3.1.1.RELEASE:compile
[INFO] |  \- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] +- org.springframework:spring-beans:jar:3.1.1.RELEASE:compile
[INFO] +- org.springframework:spring-context:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-aop:jar:3.1.1.RELEASE:compile
[INFO] |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  \- org.springframework:spring-expression:jar:3.1.1.RELEASE:compile
[INFO] +- org.springframework:spring-webmvc:jar:3.1.1.RELEASE:compile
[INFO] |  +- org.springframework:spring-context-support:jar:3.1.1.RELEASE:compile
[INFO] |  \- org.springframework:spring-web:jar:3.1.1.RELEASE:compile
[INFO] \- junit:junit:jar:4.10:test
[INFO]    \- org.hamcrest:hamcrest-core:jar:1.1:test

The direct dependencies of this project include servlet-api version 2.4 and spring-core version 3.1.1.RELEASE. The latter depends on both spring-asm and commons-logging, which makes those two JARs transitive dependencies of the project.

In a real-world application, the dependency tree may include hundreds of JAR files with many levels of transitive dependencies. And it's not a simple tree, but a directed acyclic graph: many JARs will share the same dependencies, although possibly with differing versions.

So, how does this cause you pain?

The first (and easiest to resolve) pain is that you might end up with dependencies that you don't want. For example, commons-logging. I don't subscribe to the fear that commons-logging causes memory leaks, but I also use SLF4J, and don't want two logging facades in my application. Fortunately, it's (relatively) easy to exclude individual dependencies, as I described in a previous “Taming Maven” post.
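
As a minimal sketch of such an exclusion, here is how the spring-core dependency from the tree above might be declared so that commons-logging is left out. The jcl-over-slf4j entry and its version are my assumptions for illustration: Spring still needs the commons-logging classes at runtime, and that SLF4J artifact is one way to supply them.

```xml
<dependencies>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
        <version>3.1.1.RELEASE</version>
        <exclusions>
            <!-- drop the transitive logging facade; no version is given in an exclusion -->
            <exclusion>
                <groupId>commons-logging</groupId>
                <artifactId>commons-logging</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <!-- assumed replacement: provides the commons-logging API on top of SLF4J;
         the version shown is illustrative, use whatever matches your SLF4J release -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>jcl-over-slf4j</artifactId>
        <version>1.7.21</version>
    </dependency>
</dependencies>
```

An exclusion applies only to the dependency that declares it, so if another direct dependency also pulls in commons-logging, it will need the same block. Running mvn dependency:tree afterward is the easiest way to confirm that the JAR is really gone.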

The second pain point, harder to resolve, is knowing what, exactly, is on the classpath.

A project's dependency tree is the project's classpath. Actually, “the” classpath is a bit misleading: there are separate classpaths for build, test, and runtime, depending on the <scope> specifications in the POM(s). Each plugin can define its own classpath, and some provide a goal that lets you see the classpath they use; mvn dependency:build-classpath will show you the classpath used to compile your code.

This tool lists dependencies in alphabetical order. But if you look at a generated WAR, they're in a different order (which seems to bear no relationship to how they're listed in the POM). If you're using a “shaded” JAR, you'll get a different order. Worse, since a shaded JAR flattens all classes into a single tree, you might end up with one JAR that overwrites classes from another (for example, SLF4J provides the jcl-over-slf4j artifact, which contains re-implemented classes from commons-logging).

Compounding classpath ordering, there is the possibility of version conflicts. This isn't an issue for the simple example above, but for real-world applications that have deep dependency trees, there are bound to be cases where dependencies-of-dependencies have different versions. For example, the Jenkins CI server has four different versions of commons-collections in its dependency tree, ranging from 2.1 to 3.2.1 — along with 20 other version conflicts.

Maven has rules for resolving such conflicts. The only one that matters is that direct dependencies take precedence over transitive. Yes, there are other rules regarding depth of transitive dependencies and ordering, but those are only useful for working out why you're getting the wrong version; they won't help you fix the problem.

The only sure fix is to lock down the version, either via a direct dependency, or a dependency-management section. This, however, carries its own risk: if one of your transitive dependencies requires a newer version than the one you've chosen, you'll have to update your POM. And, let's be honest, the whole point of transitive dependencies was to keep you from explicitly tracking every dependency that your app needs, so this solution is decidedly sub-optimal.
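
For the commons-collections case mentioned earlier, a lock-down via dependency management might look like the following sketch. Picking 3.2.1 (the newest of the versions cited above) is my assumption; the right choice depends on what your dependencies actually need.

```xml
<dependencyManagement>
    <dependencies>
        <!-- every occurrence of this artifact in the dependency graph,
             direct or transitive, resolves to this version -->
        <dependency>
            <groupId>commons-collections</groupId>
            <artifactId>commons-collections</artifactId>
            <version>3.2.1</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Unlike a direct dependency, an entry here does not add the JAR to your classpath by itself; it only pins the version if something else brings the artifact in.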

A final problem — and the one that I consider the most insidious — is directly relying on a transitive dependency.

As an example, I'm going to use the excellent XML manipulation library known as Practical XML. This library makes use of the equally excellent utility library KDGCommons. Having discovered the former, you might also start using the latter — deciding, for example, that its implementation of parallel map is far superior to others.

However, if you never updated your POM with a direct reference to KDGCommons, then when the author of Practical XML decides that he can use functions from Jakarta commons-lang rather than KDGCommons, you've got a problem. Specifically, your build breaks, because the transitive dependency has disappeared.

You might think that this is an uncommon situation, but it was actually what prompted this post: a colleague changed one of his application's direct dependencies, and his build started failing. After comparing dependencies between the old and new versions we discovered a transitive dependency that had disappeared. Adding it back as a direct dependency fixed the build.
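
Applied to the Practical XML example, the fix is a sketch like the one below: both libraries are declared, because the code calls both. The group IDs, artifact IDs, and version numbers are assumptions for illustration; check the repository for the real coordinates before copying them.

```xml
<dependencies>
    <!-- the library you went looking for (coordinates and version assumed) -->
    <dependency>
        <groupId>net.sf.practicalxml</groupId>
        <artifactId>practicalxml</artifactId>
        <version>1.1.18</version>
    </dependency>

    <!-- formerly arrived only as a transitive dependency; declared directly
         because application code calls it (coordinates and version assumed) -->
    <dependency>
        <groupId>net.sf.kdgcommons</groupId>
        <artifactId>kdgcommons</artifactId>
        <version>1.0.17</version>
    </dependency>
</dependencies>
```

With the second entry in place, a future Practical XML release can drop KDGCommons without affecting your build.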

To wrap up, here are the important take-aways:

  • Pay attention to transitive dependency versions: whenever you change your direct dependencies, you should run mvn dependency:tree to see what's changed with your transitives. Pay particular attention to transitives that are omitted due to version conflicts.
  • If your code calls it, it should be a direct dependency. Plugging another of my creations, the PomUtil dependency tool can help you discover those.
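
If you'd rather have the build itself flag such cases, the standard maven-dependency-plugin has an analyze-only goal that reports “used undeclared” dependencies. This is an alternative to PomUtil, not a description of how it works, and the plugin version and execution id below are assumptions; treat the fragment as a starting point.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <version>2.10</version>
            <executions>
                <execution>
                    <!-- hypothetical id; analyze-only binds to the verify phase by default -->
                    <id>analyze-dependencies</id>
                    <goals>
                        <goal>analyze-only</goal>
                    </goals>
                    <configuration>
                        <!-- turn the analysis warnings into a build failure -->
                        <failOnWarning>true</failOnWarning>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

Be aware that failOnWarning also trips on dependencies that are declared but appear unused, and the analysis works from bytecode, so anything loaded only by reflection may be misreported.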
