Sunday, June 21, 2020

A History of Java Logging Frameworks, or, Why Commons-Logging is Still So Common

In the beginning there was System.out.println(). Or, for those of us who were purists (or wanted our output immediately), System.err.println(). And every developer had their own preferences for how log messages should appear, leading to some very messy output as the number of developers on a team increased.

To bring order to this chaos, many projects implemented their own logging framework. I worked on one such project in the late 90s: the framework consisted of a single class that implemented a static log() method. I can't remember what the logging output looked like, but I suspect that it included a consistently-formatted timestamp.

According to this article by Ceki Gülcü, a project that he was working on in 1996 also implemented its own logging framework. But unlike the project I worked on, that framework was released to the public in 1999 as Log4J.

Something else that became public in 1999 was the Jakarta project, a collaboration between Sun and the Apache Software Foundation to produce the Tomcat application server. And of course Tomcat, being a large application with contributions by many people, had its own logging framework (and it still does, although the implementation and purpose have changed over time).

And lastly, 1999 was the year that JSR 047, the Java Logging API Specification, was created. It turned into the java.util.logging (JUL) package, released as part of JDK 1.4 in 2002.

A plethora of logging frameworks isn't a problem if you're an application developer: you pick one and stick with it. If you're developing a library that might be used by those applications, however, it's a nightmare. If your library uses Log4J and the application uses JUL, then the output becomes a mess and the developers using your library complain.

At the time, the Jakarta project was arguably the largest single producer of libraries for Java application developers, so they added another: Jakarta Commons Logging (since renamed to Apache Commons Logging, but you'll still see the initials "JCL" in the documentation). The idea of Commons Logging was that you would write your code against the JCL API, add the JCL JAR to your dependencies, and it would figure out what actual logging framework you were using.
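
To make that concrete, here's roughly what library code written against the JCL API looks like. The class itself is hypothetical, but the imports and calls are the actual Commons Logging API:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class WidgetLoader {
    private static final Log logger = LogFactory.getLog(WidgetLoader.class);

    public void load(String name) {
        // JCL has no parameterized messages, so concatenation it is
        logger.debug("loading widget " + name);
    }
}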

Although Commons Logging was intended for libraries, application developers adopted it as well. I can't speak for anyone else, but I looked at it as “won't hurt, and means I don't need to keep track of multiple APIs.” Unfortunately, some developers discovered that it could hurt: they were using Tomcat, regularly redeploying their applications, and had memory leaks that would eventually cause Tomcat to stop working.

Looking back, it appears that these leaks were due to missing transitive dependencies in the deployment bundle.* This took place in the days before Maven 2, when developers were responsible for identifying every JAR that went into their application, and ensuring that it somehow got there (which often meant lots of JARs checked into source control). It wouldn't be obvious that a library used Commons Logging, so the application developer wouldn't bother to add it to the deployed WAR. Unfortunately, Tomcat made it available on the system classpath (because it used Commons Logging internally), so the developers never knew they were missing the JAR. And since Commons Logging needed to know about the actual deployed logging framework, it would establish a strong reference to the Log4J implementation that was in the WAR, preventing the classloader from unloading the classes belonging to the WAR.

That problem was rather quickly resolved: Commons Logging version 1.1 was released in 2006, Tomcat 6 moved it off the public classpath (although Tomcat 5 remained “in the wild” for quite some time), and Maven 2 ensured that a WAR would contain all of the dependencies that it needed. But developers have very long memories for things that go wrong, especially things that happened to someone else who blogged about it.**

At the same time, several popular Java frameworks appeared: Hibernate in 2001 and Spring in 2002 are two of the most familiar. These frameworks were complex enough to need logging, but for obvious reasons didn't want to be tied to a specific implementation. Commons Logging provided that capability (and thus became an untracked dependency for many builds).

Moving forward, the new millennium saw continued change in the logging world. Log4J became an Apache project. Ceki Gülcü left Apache and developed Logback and SLF4J. And in the early 2010s, the remaining Apache Log4J committers decided that the Log4J 1.x implementation couldn't be extended and completely rewrote it as Log4J 2.x.

Of these, SLF4J is particularly interesting because it was a logging facade, in direct competition with Commons Logging. Unlike Commons Logging, which tried to infer what underlying framework you were using, SLF4J required you to explicitly include “bridge” JARs for your actual implementation. SLF4J also provided additional features, such as formatted log messages, that were very attractive to application developers.
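
Here's the same hypothetical class from above, rewritten against the SLF4J API. Note the parameterized message, which is only formatted if the level is enabled:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WidgetLoader {
    private static final Logger logger = LoggerFactory.getLogger(WidgetLoader.class);

    public void load(String name) {
        // the {} placeholder is filled in only if DEBUG is enabled
        logger.debug("loading widget {}", name);
    }
}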

However, adopting SLF4J had its own pain point: if you used Spring (or Hibernate, or any other library that depended on Commons Logging), Maven would add Commons Logging to your build as a transitive dependency, where it might take precedence over the “jcl-over-slf4j” bridge from SLF4J (it all depended on the order in which JARs were given to the classloader). A hallmark of Maven POMs from this era is the multiple <exclusions> entries used to keep such transitive dependencies out of the build.

So here we are in 2020, and the logging frameworks scene is more complex than ever:

  • Log4J 1.x is still used by many projects, even though it was officially end-of-lifed in 2015. One of its most useful features doesn't work under Java 9 (and, I presume, later versions), so its popularity may fade (although it seems that many people, particularly those using OpenJDK, are quite happy with Java 8).
  • SLF4J/Logback is still used by many developers (including myself), even though new releases seem to have stalled at the 1.3.0-alpha stage (after 25 years of writing logging frameworks, I'm guessing Ceki is in need of a break).
  • Log4J 2.x provides “bridge” JARs that let people use Commons Logging and SLF4J as their API, with Log4J2 as the back-end.
  • Commons Logging still exists, but hasn't seen a release since 2014. Nor has its list of supported frameworks changed: Log4J 1.x, JUL, and Avalon LogKit.

Perhaps counter-intuitively, even with all of these changes, Commons Logging is still used by many libraries. However, it's become less visible. Spring Framework, for example, implements the API internally; as an application developer, you no longer need to explicitly exclude the JAR. And if you use Spring Boot, its 3,000+ line dependency-management POM will explicitly exclude Commons Logging from the libraries that use it.

If you're developing a library, I think that Commons Logging is still the best choice for internal logging. It provides a consistent interface, and it's reasonable to expect that the consumers of your library already have the bridge JARs that they need (which might mean the internal implementation in Spring Framework). But there are a few best practices to keep your users from cursing you:

  • Mark your dependency as provided. This tells Maven (or Gradle, or any of the other tools that follow the Maven standard) not to resolve the transitive dependency; it will rely instead on an explicitly-referenced JAR to provide the necessary classes.
  • Ensure that you don't establish a transitive dependency via a package that you depend on, like Apache HttpComponents. Take the time to look at your entire dependency tree (using mvn dependency:tree or equivalent), and add an exclusion if anything tries to pull in Commons Logging.
  • Don't implement your own logging facade. It's tempting, I know: you want to protect the people who haven't configured logging in their application. And it seems easy: two classes (a log factory and a logger), with some reflection to pick an appropriate back end (see the sketch after this list). But there's a lot of room for error. If you get it wrong, you'll introduce subtle bugs and performance issues, and your experienced users will curse you (and look for an alternative library). And if you get it right, you'll find that you've re-implemented Commons Logging. And good or bad, it won't actually help inexperienced users: they should learn to use logging rather than deploy a black box and cross their fingers.
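
For concreteness, here's a deliberately naive sketch of the kind of facade that last bullet warns against, assuming Log4J 1.x is preferred if it's on the classpath and JUL is the fallback. The names (HomegrownLog, HomegrownLogFactory) are hypothetical:

import java.lang.reflect.Method;
import java.util.logging.Level;

interface HomegrownLog {
    void debug(String message);
}

public final class HomegrownLogFactory {
    public static HomegrownLog getLog(Class<?> klass) {
        try {
            // if Log4J 1.x is on the classpath, delegate to it via reflection
            Class<?> loggerClass = Class.forName("org.apache.log4j.Logger");
            Object delegate = loggerClass.getMethod("getLogger", Class.class).invoke(null, klass);
            Method debugMethod = loggerClass.getMethod("debug", Object.class);
            return message -> {
                try {
                    debugMethod.invoke(delegate, message);
                } catch (Exception ignored) {
                    // silently dropping messages is exactly the sort of subtle bug described above
                }
            };
        } catch (Exception e) {
            // otherwise fall back to java.util.logging
            java.util.logging.Logger jul = java.util.logging.Logger.getLogger(klass.getName());
            return message -> jul.log(Level.FINE, message);
        }
    }
}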

Bottom line: if you're writing an application, use whatever logging framework you want. I have a strong preference for SLF4J/Logback, although Log4J 2.x does have some features that SLF4J/Logback lacks. However, if you're implementing a library, stick with Commons Logging. In almost every case it will Just Work™.


* Open Source Archaeology can be a fascinating experience. It's next to impossible to find a source-code repository that gives a full history for older projects: they typically started on CVS, then moved to SVN, and are now on Git. In some cases they also moved between hosting providers (although it looks like Hibernate is still actively maintained on SourceForge, which makes me feel better about a couple of my older projects). Each move lost information such as version tags (assuming they ever existed).

Maven Central is also less helpful than expected, because many projects changed their group or artifact IDs over their lifetime (plus, who wants to dig through 55 pages of org.springframework JAR listings). And a lot of older versions are “retconned”: they were imported into the repository long after release, with a made-up, minimal POM.

Fortunately, most of the larger projects maintain their own archives, so you can download a particular version and unpack it to see what's there. And if you're looking for dependencies, you can pick a likely class and run javap -c to disassemble it and then look at variable declarations. It's a lot of work, and some might call it obsessive. But that's how I learned that Spring Framework 1.0 didn't log at all, while 1.2 used Commons Logging.

** They're also very vocal about it. I wrote an article on logging techniques in 2008 that referenced Commons Logging (because in my world it wasn't an issue), and constantly received comments telling me that it was horrible and should never be used. I eventually updated the article to reference SLF4J, primarily because it was focused on application logging, but also to keep the critics quiet.

Monday, January 6, 2020

The Future of Open Source

The world of open source software seems to be going through a period of soul-searching. On the one hand, individual maintainers have retracted packages, causing disruption for the communities that depended on those packages. On the other, software-as-a-service providers are making more money from some applications than their creators.

This is all happening in a world where businesses depend on open-source software to operate. It doesn't matter whether you're an individual launching a startup with PHP and MySQL, or a multi-national replacing your mainframe with a fleet of Linux boxes running Java. Your business depends on the work of people who have their own motivations, and those motivations may not align with yours. I think this is an untenable situation, one that will eventually be resolved by changing the nature of open source.

Before looking at how I think it will resolve, I want to give some historical perspective. This is one person's view; you may not agree with it.

I date the beginning of “professional” open source as March 1985: that was the month that Dr. Dobb's Journal published an article by Richard Stallman, an article that would turn into the GNU Manifesto. There was plenty of freely available software published prior to that time; my experience was with the Digital Equipment Computer Users' Society (DECUS), which published an annual catalog of programs ranging in complexity from fast Fourier transform routines to complete language implementations. These came with source code and no copyright attached (or, at least, no registered copyright, which was an important distinction in the 1970s and early 1980s).

What was different about the GNU Manifesto, and why I refer to it as the start of “professional” open source, was that Stallman set out a vision of how programmers could make money when they gave away their software. In his view, companies would get software for free but then hire programmers to maintain and enhance it.

In 1989, Stallman backed up the ideas of the GNU Manifesto with the GNU General Public License (GPL), which was applied to the software produced by the GNU project. This license introduced the idea of “copyleft”: a requirement that any “derivative works” also be licensed under the GPL, meaning that software developers could not restrict access to their code. Even though that requirement was relaxed in 1991 with the “library” (now “lesser”) license, meaning that you could use the GNU C compiler to compile your programs without them becoming open source by that act, the GPL scared most corporations away from any use of the GNU tools (as late as 1999, I was met with a look of shock when I suggested that the GNU C compiler could make our multi-platform application easier to manage).

In my opinion, it was the Apache web server, introduced in 1995, that made open-source palatable (or at least acceptable) to the corporate world. In large part, this was due to the Apache license, which essentially said “do what you want, but don't blame us if anything goes wrong.” But also, I think it was because the corporate world was completely unprepared for the web. To give a sense of how quickly things moved: in 1989 I helped set up the DNS infrastructure for a major division of one of the world's largest corporations; I had only a few years of experience with TCP/IP networking, but it was more than the IT team. NCSA Mosaic appeared four years later, and within a year or two after that companies were scrambling to create a web presence. Much like the introduction of PCs ten years earlier, this happened outside of corporate IT; while there were commercial web-servers (including Microsoft and Netscape), “free as in beer” was a strong incentive.

Linux, of course, was a thing in the late 1990s, but in my experience wasn't used outside of a hobbyist community; corporations that wanted UNIX used a commercial distribution. In my view, Linux became popular due to two things. First, Eric Raymond published The Cathedral and the Bazaar in 1997, which made the case that open source was actually better than commercial systems: it has to be good to survive. Second, after the dot-com crash, “free as in beer” became a selling point, especially to the startups that would create “Web 2.0”.

Jumping forward 20 years, open-source software is firmly embedded in the corporate world. While I'm an oddity for running Linux on the desktop, all of the companies I've worked with in the last ten or more years used it for their production deployments. And not just Linux; the most popular database systems are open source, as are the tools to provision and manage servers, and even productivity tools such as LibreOffice. And for most of the users of these tools, “free as in beer” is an important consideration.

But stability is (or should be) another important consideration, and I think that many open-source consumers have been lulled into a false sense of stability. The large projects, such as GNU and Apache, have their own repositories and aren't going anywhere. And the early “public” repositories, such as SourceForge and Maven Central, adopted a policy that “once you publish something here, it will never go away.” But newer repositories don't have such a policy, and as we saw with left-pad in 2016 and chef-sugar in 2019, authors are willing and able to pull their work down.

At the same time, companies such as MongoDB and Elastic N.V. found that releasing their core products as open-source might not have been such a great idea. Software-as-a-service companies such as AWS are able to take those products and host them as a paid service, often making more money from the hosting than the original companies do from the services they offer around the product. And in response, the product companies have changed the license on their software, attempting to cut off that usage (or at least capture a share of it).

Looking at both behaviors, I can't help but think that one of the core tenets of the GNU manifesto has been forgotten: that the developers of open-source software do not have the right to control its use. Indeed, the Manifesto is quite clear on this point: “[programmers] deserve to be punished if they restrict the use of these programs.”

You may or may not agree with that idea. I personally believe that creators have the right to decide how their work is used. But I also believe that releasing your work under an open-source license is a commitment, one that can't be retracted.

Regardless of any philosophical view on the matter, I think there are two practical outcomes.

The first is that companies — or development teams — that depend on open-source software need to ensure their continued access to that software. Nearly eight years ago I wrote about using a local repository server when working with Maven and Java. At the time I was focused on coordination between different development teams at the same company. If I were to rewrite the post today, it would focus on using the local server to ensure that you always have access to your dependencies.

A second, and less happy, change is that I think open-source creators will lose the ability to control their work. One way this will happen is for companies whose products depend on open-source software to provide their own public repositories — indeed, I'm rather amazed that Chef doesn't offer such a repository (although perhaps they're gun-shy after the reaction to their ham-fisted attempt to redistribute chef-sugar).

The other way this will happen is for service-provider companies to fork open-source projects and maintain their own versions. Amazon has already done this, for Elasticsearch and also OpenJDK; I don't expect them to be the only company to do so. While these actions may damage the companies' reputations within a small community of open-source enthusiasts, the much larger community of their clients will applaud those actions. I can't imagine there are many development teams that will say “we're going to self-host Elasticsearch as an act of solidarity”; convenience will always win out.

If you're like me, a person responsible for a few niche open-source projects, this probably won't matter: nobody's going to care about your library (although note that both left-pad and chef-sugar started out as single-maintainer niche projects). But if you're a company that is planning to release your core product as open-source, you should think long and hard about why you want to do this, and whether your plan to make money is viable. And remember these words from the GNU Manifesto: “programming will not be as lucrative on the new basis as it is now.”

Thursday, December 27, 2018

log4j-aws-appenders now supports Logback

I started my Log4J appenders project because I wasn't happy with how the AWS CloudWatch agent broke apart logfiles. It seemed, as I said in the FAQ, like it would be an easy weekend project. That was nearly two years ago.

In the interim, I added support for Kinesis Streams (a year ago), which enabled search-engine-based centralized logging using AWS managed Elasticsearch. Surprisingly, it wasn't until after that implementation effort that I truly “bought into” the benefits of using a search engine to examine logs. Now I can't imagine going back to grep.

After giving a talk on centralized logging to the local Java users' group, some of the feedback that I got was “it's nice, but we're not using Log4J 1.x.” So in the early fall I started to break the library into pieces: a front-end that's tied to a particular logging framework, and a back-end that handles communication with AWS. This turned out to be quite easy, which I think means that I had a good design to start with.

Then it was a matter of picking another logging framework, and learning enough about it to be able to implement appenders. I picked Logback because it's the default logging framework for Spring, and because it's the native back-end for SLF4J (which I've been using with Log4J for around five years now).

One of the interesting things that came out of this work is that I now see a good use case for multiple inheritance. There's an enormous amount of duplicated code because each appender has two is-a relationships: one to the logging framework and another to the back end. It would be nice if Java had something like Scala traits, where each trait would encapsulate one of the is-a relationships, and the appender would just be a combination of traits. On the other hand, I've seen enough ugly code using traits that I still think Gosling et al made the right decision.
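
To illustrate the shape of that split, here's a rough sketch, not the library's actual classes (CloudWatchAppender and LogWriter are placeholder names): the Logback front end is-a AppenderBase, and it hands each event to a framework-neutral back end.

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

// hypothetical back-end interface, shared by the Log4J and Logback front ends
interface LogWriter {
    void addMessage(long timestamp, String message);
}

// Logback front end: the is-a relationship to the framework lives here,
// while the AWS-facing behavior lives behind the LogWriter interface
public class CloudWatchAppender extends AppenderBase<ILoggingEvent> {
    private LogWriter writer;

    public void setWriter(LogWriter writer) {
        this.writer = writer;
    }

    @Override
    protected void append(ILoggingEvent event) {
        writer.addMessage(event.getTimeStamp(), event.getFormattedMessage());
    }
}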

Log4J 2.x is up next, although I think I'm going to take a break for a few months. I have several other projects that have taken a back seat, including a library of AWS utility classes that I started a year ago and that hasn't yet seen its first release.

Happy Holidays!

Monday, July 9, 2012

Introducing Pathfinder

I've been on a new assignment for the last couple of months, located in Center City Philadelphia. On the one hand, the commute is great: I can walk to the local train station, and have a quiet 25 minutes to read or work on the train. On the other hand, orienting my morning around the train schedule has thrown a monkey wrench into my blog posts. I have a dozen or more half-written ideas waiting to be cleaned up and published. I never realized how much post-production I normally do: adding links, tweaking the HTML once it's on Blogger, and whatnot. Did I mention that SEPTA doesn't have wifi on their trains? (Score another point for Boston.)

Instead, I've been working on Pathfinder, a tool to examine Java web apps and tell you the URLs that they handle. It was inspired by rake routes, a tool from the Ruby/Rails world. My current job has me enhancing legacy web-apps, and I think that knowing the classes associated with a URL is a good way to start learning a codebase. If I just had to deal with Spring, I could rely on STS; my goal for Pathfinder is to (eventually) handle all web frameworks, obsolete or not.

It's got a way to go. Right now it handles servlets, JSPs, and Spring apps, and the latter must use either SimpleUrlHandlerMapping or a component scan with @RequestMapping. But I think the basic design is solid and extensible, and will be updating it as I run into things that it can't handle.

I've learned an enormous amount about Spring while developing it. For a framework that espouses convention over configuration, it required me to handle a surprising number of special cases. For example, there are at least two default locations where you can save your Spring context files. And did you know that @RequestMapping actually takes an array of URLs?
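
For example, here's a hypothetical controller (the class, method, and URLs are made up) showing the array form of @RequestMapping:

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class ProductController {
    // one handler method, reachable via two different URLs
    @RequestMapping({"/products", "/products/list"})
    public String listProducts() {
        return "productList";   // logical view name
    }
}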

Speaking of which, I also learned a lot about how annotations are stored in the classfile. It's easy to use reflection to access annotations on classes that are already loaded. But because of some code that I'd seen in a legacy Spring app, I didn't want to actually load the web-app. No problem, I thought, there are a bunch of libraries that already exist for working with classfiles.

As it turned out, not so much. My old standby, BCEL, has some code in the trunk to deal with annotations. But its last released version — from 2006 — doesn't handle them; they're just an “unknown” attribute type. The “new hotness,” ASM, does support annotations, but you wish it didn't: you have to decipher each annotation using a big if-else chain filled with instanceof.

Which brings me to the second introduction of this post: BCELX. The name stands for “BCEL extensions,” and it's built on top of BCEL. Right now it just handles class and method annotations; I'll add parameter annotations when I need them. And it doesn't handle nested annotations — I can't find an example of a nested annotation to build a testcase.

BCELX may expand beyond annotation parsing: it seems that there's a need out there for a simple tree-structured view of a Java classfile, and all the existing libraries are on the visitor wagon. I just have to figure out how to schedule it into my 25 minutes of quiet time each morning.

Monday, August 9, 2010

Ant, Taskdef, and running out of PermGen

Although I've switched to Maven for building Java projects (convention over configuration ftw), I still keep Ant in my toolbox. It excels at the sort of free-form non-Java projects that most people implement using shell scripts.

One reason that Ant excels at these types of projects is that you can easily implement project-specific tasks such as a database extract, and mix those tasks with the large library of built-in tasks like filter or mkdir. And the easiest way to add your tasks to a build file is with a taskdef:

    <taskdef name="example"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/lib/mytasks.jar"/>

Last week I was working on a custom task that would retrieve data by US state. I invoked it using the foreach task from the ant-contrib library, so that I could build a file covering all 50 states. Since I expected it to take several hours to run, I kicked it off before leaving work for the day.

The next morning, I saw that it had failed about 15 minutes in, having run out of permgen space. And the error happened when it was loading a class. At first I suspected the foreach task, or more likely, the antcall that it invoked. After all, it creates a new project, so what better place to create a new classloader? Plus, it was in the stack trace.

But as I looked through the source code for these tasks, I couldn't see any place where a new classloader was created (another reason that I like Ant is that its source is generally easy to follow). That left the taskdef — after all, I knew that my code wasn't creating a new classloader. To test, I created a task that printed out its classloader, and used the following build file:

<project default="default" basedir="..">

    <taskdef name="example1"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/classes"/>
    <taskdef name="example2"
             classname="com.kdgregory.example.ant.ExampleTask"
             classpath="${basedir}/classes"/>

    <target name="default">
        <example1 />
        <example2 />
    </target>

</project>
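
A minimal sketch of such a task (just enough to report the owning project and the classloader that loaded the task class) might look like this:

package com.kdgregory.example.ant;

import org.apache.tools.ant.Task;

public class ExampleTask extends Task {
    @Override
    public void execute() {
        log("project:     " + getProject());
        log("classloader: " + getClass().getClassLoader());
    }
}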

Sure enough, each taskdef is loaded by its own classloader. The antcall simply exacerbates the problem, because it executes the taskdefs all over again.

It makes sense that Ant would create a new classloader for each project, and even for each taskdef within a project (they can, after all, have unique classpaths). And as long as the classloader is referenced only from the project, it — and the classes it loads — will get collected at the same time as the project. And when I looked in the Project class, I found the member variable coreLoader.

But when I fired up my debugger, I found that the variable was explicitly set to null and never updated. Then I put a breakpoint in ClasspathUtils, and saw that it was being invoked with a “reuse” flag set to false. The result: each taskdef gets its own classloader, and they're never collected.

I think there's a bug here: not only is the classloader not tied to the project object, it uses the J2EE delegation model, in which a classloader attempts to load classes from its own classpath before asking its parent for the class. However, the code makes me think that this is intentional. And I don't understand project life cycles well enough to know what would break with what I feel is the “correct” implementation.

Fortunately, there's a work-around.

As I was reading the documentation for taskdef, I saw a reference to antlibs. I remembered using antlibs several years ago, when I was building a library of a dozen or so tasks, and didn't want to copy-and-paste the taskdefs for them. And then the lightbulb went on: antlibs must be available on Ant's classpath. And that means that they don't need their own classloader.

To use an antlib, you create the file antlib.xml, and package it with the tasks themselves:

<antlib>
    <taskdef name="example1" classname="com.kdgregory.example.ant.ExampleTask"/>
    <taskdef name="example2" classname="com.kdgregory.example.ant.ExampleTask"/>
</antlib>

Then you define an “antlib” namespace in your project file, and refer to your tasks using that namespace. The namespace specifies the package where antlib.xml can be found (by convention, the top-level package of your task library).

<project default="default" 
    xmlns:ex="antlib:com.kdgregory.example.ant">

    <target name="default">
        <ex:example1 />
        <ex:example2 />
        <antcall target="example"/>
    </target>

    <target name="example">
        <ex:example1 />
        <ex:example2 />
    </target>   

</project>
It's extra work, but the output makes the effort worthwhile:
ant-classloader-example, 528> ant -lib bin -f build2.xml
Buildfile: /home/kgregory/tmp/ant-classloader-example/build2.xml

default:
[ex:example1] project:     org.apache.tools.ant.Project@110b053
[ex:example1] classloader: java.net.URLClassLoader@a90653
[ex:example2] project:     org.apache.tools.ant.Project@110b053
[ex:example2] classloader: java.net.URLClassLoader@a90653

example:
[ex:example1] project:     org.apache.tools.ant.Project@167d940
[ex:example1] classloader: java.net.URLClassLoader@a90653
[ex:example2] project:     org.apache.tools.ant.Project@167d940
[ex:example2] classloader: java.net.URLClassLoader@a90653

BUILD SUCCESSFUL
Total time: 0 seconds

Bottom line: if you're running out of permgen while running Ant, take a look at your use of taskdef, and see if you can replace it with an antlib. (At least one other person has run into similar problems; if you're interested in the sample code, you can find it here.)

Thursday, December 3, 2009

Why Write Open Source Libraries

I just created my third open-source project on SourceForge, S34J. It's a set of a half-dozen objects that encapsulate the calls for Amazon's Simple Storage Service (S3) — at least, it will be once I finish the code. My other two projects are PracticalXML, a utility library hiding the (often painful) Java XML API, and SwingLib, a library of Swing GUI enhancements that currently has three classes (mostly because I haven't taken the time to upload more).

Other than PracticalXML, I don't expect anyone to ever use these libraries. And for PXML, I don't expect anyone other than the other maintainers, all former coworkers, to use it. So why write them, and why take the time to create a SourceForge project? The answer can be found in the project description for SwingLib:

Classes that I've written for Swing programming. Possibly useful for other people.

Throughout my career, I've written the same code over and over again. Such as a method that creates an XML element that inherits the namespace of its parent. Simple code, a matter of a few minutes to write, but after the third or fourth time it gets annoying. Particularly if you have several dozen such methods. And once I type that code on an employer's computer, it becomes their property; I can't simply take a copy on to my next job (I'll note here that ideas, such as an API, are not protected by copyright; I always make a DomUtil class, and it always has an appendChildInheritNamespace() method, but the implementation has been from-scratch each time).
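
For reference, here's a sketch of what such a method might look like, using the standard org.w3c.dom API (this is not the actual PracticalXML implementation):

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomUtil {
    public static Element appendChildInheritNamespace(Element parent, String localName) {
        Document doc = parent.getOwnerDocument();
        String nsUri = parent.getNamespaceURI();
        String prefix = parent.getPrefix();

        Element child;
        if (nsUri == null) {
            child = doc.createElement(localName);
        } else {
            // reuse the parent's prefix (if any) along with its namespace URI
            String qname = (prefix == null) ? localName : prefix + ":" + localName;
            child = doc.createElementNS(nsUri, qname);
        }
        parent.appendChild(child);
        return child;
    }
}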

An auto mechanic acquires his tools over a lifetime, and takes them from job to job; they don't belong to the shop where he works. By releasing this code as open source, I can do the same. And, who knows, someone else might stumble on it and decide it's useful.

Thursday, September 10, 2009

Building a Wishlist Service: External Libraries

One of the great things about developing Java applications is the wealth of open-source libraries available. The Jakarta Commons libraries have been part of every project that I've done over the past five years; you can find StringUtils.isEmpty() in almost all text manipulation code that I write.

However, the wealth of open-source libraries presents a paradox of choice: which libraries do you use for a project? Each library adds to the memory footprint of your project, either directly as classes are loaded, or indirectly as the JVM memory-maps the library's JAR. External libraries also make dependency management in your build more complex, in some cases forcing you to build the library locally.

More important, every library represents a form of lock-in: once your code is written to conform to the library, it will be expensive to change. And if you discover a bug or missing feature, you'll need to develop a remediation plan. Even if you can code a patch, it will take time to integrate with the mainline code — assuming that it is accepted. In some cases you may find yourself maintaining a private fork of the library over several public releases.

All of which is to say: use open source, but pick your libraries carefully.

In the case of the product list service, one of the places where I considered external libraries was XML management, in particular conversion between XML and Java beans. There are lots of libraries that handle this: XMLBeans and XStream are two that are commonly used, and the JDK provides its own serialization and deserialization classes as part of the java.beans package.

Of these, XStream seemed to be the best choice: XMLBeans requires a separate pre-compile step, while the JDK's serialization format would require a lot of work on the part of any non-Java client. However, I had another alternative: I am the administrator and main developer of Practical XML, an open-source library for XML manipulation. It didn't support XML-object conversion, but I also had some converter classes that I'd written before XStream became popular. I figured that it would take a minimal amount of work to flesh out those classes and integrate them into the library.
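
For comparison, the XStream approach is only a few lines in each direction; a minimal sketch, with a hypothetical wrapper class:

import com.thoughtworks.xstream.XStream;

public class WishlistSerializer {
    private static final XStream xstream = new XStream();

    public static String toXml(Object wishlist) {
        // by default, XStream uses fully-qualified class names as element names
        return xstream.toXML(wishlist);
    }

    public static Object fromXml(String xml) {
        return xstream.fromXML(xml);
    }
}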

I have an incentive to evolve the Practical XML library, and to use it in all of my projects. However, adding this functionality introduced a two-week diversion into my project. In this case the delay didn't matter: I have no hard deadlines on this project. And since I was already using the library in other places, I had the benefit of consistency and reduced footprint. Faced with an unmovable ship date, my decision would have been different.