Monday, June 16, 2014

Finalizers

I recently saw a comment on my reference objects article that said, in effect, “he mentioned finalizers but didn't say DON'T USE THEM EVER, so I stopped reading.” The capitalized words are taken exactly from the original comment; the rest is a paraphrase to tone down the rhetoric. The comment didn't bother me — if you stop reading so easily, you probably won't gain much from my articles anyway — but the attitude did. Because finalizers do indeed have a role to play.

They compensate for sloppy programming.

Good Java programmers know that non-memory resource allocation has to take place in a try / catch / finally construct. Just like good C programmers know that every malloc() must be matched by a free(). Sloppy programmers either don't know or don't care. And good programmers sometimes forget.
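To make that discipline concrete, here is a minimal sketch built around a hypothetical TrackedResource class (not a real library type) that counts how many instances are currently open. The first method uses the classic try/finally. The second uses try-with-resources, available since Java 7, which closes any AutoCloseable for you with the same guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource, for illustration only: it counts how many
// instances are currently open so we can check that cleanup happened.
class TrackedResource implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedResource() {
        OPEN.incrementAndGet();
    }

    String read() {
        return "data";
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class CleanupDemo {
    // Classic pattern: release in a finally block, so the resource is
    // closed whether read() returns normally or throws.
    static String readWithFinally() {
        TrackedResource r = new TrackedResource();
        try {
            return r.read();
        } finally {
            r.close();
        }
    }

    // Java 7+: try-with-resources calls close() automatically on exit.
    static String readWithTryWithResources() {
        try (TrackedResource r = new TrackedResource()) {
            return r.read();
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithFinally());           // data
        System.out.println(readWithTryWithResources());  // data
        System.out.println(TrackedResource.OPEN.get());  // 0
    }
}
```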

The question then becomes: what should the program do?

One alternative is to just explode. And this is not necessarily a bad thing: not closing your resources is a bug, and it's better to expose bugs early in the development process. In this view, forgetting to close your resource is no different from dereferencing a null pointer.

The problem with this view is that null pointer exceptions tend to show up early, the first time through the affected code. By comparison, leaked resources tend to hide until production, because you never generate sufficient load during development. Leaked file handles are a great example: a typical developer environment allows 4,096 open files. It will take a long time to run out of them, especially if you constantly restart the app. A typical server environment might allow twice or four times that much, but if you never shut down the app you'll eventually run out. Probably at your time of highest load.

And that leads to an alternate solution: protect the programmers from themselves by checking for open resources at the time the owning object is garbage collected. This isn't perfect: in a server with lots of memory, the garbage collection interval might exceed the time taken to exhaust whatever resource you're leaking.*
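One way to implement that check is with a finalizer. The sketch below assumes a hypothetical GuardedHandle class wrapping some non-memory resource: close() is the normal path, and finalize() only releases the resource (and complains about it) if the caller forgot. Treat it as an illustration, not production code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around a non-memory resource (think of a file handle).
// Callers are expected to call close(); the finalizer is only a safety net.
class GuardedHandle implements AutoCloseable {
    // How many times the finalizer had to clean up after a careless caller.
    static final AtomicInteger LEAKS_RECOVERED = new AtomicInteger();

    // Atomic because finalize() runs on a JVM-managed thread, not the caller's.
    private final AtomicBoolean closed = new AtomicBoolean();

    boolean isClosed() {
        return closed.get();
    }

    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            releaseResource();
        }
    }

    private void releaseResource() {
        // ... release the underlying resource here ...
    }

    // Runs on the finalizer thread some time after the object becomes
    // unreachable (and possibly never). Keep it quick and non-blocking: a slow
    // finalizer delays every other object waiting to be finalized.
    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated since Java 9
    protected void finalize() throws Throwable {
        try {
            if (closed.compareAndSet(false, true)) {
                LEAKS_RECOVERED.incrementAndGet();
                System.err.println("GuardedHandle was never closed; releasing it in finalize()");
                releaseResource();
            }
        } finally {
            super.finalize();
        }
    }
}
```

The finalizer does nothing more than flip a flag, log, and release. Anything slower ties up the JVM's finalizer thread.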

Once you accept the need for some alternate way to clean up non-memory resources, the only remaining question is how. Java provides two mechanisms: finalizers and (since 1.2) phantom references.

So which should you use? Before answering that question, I want to point out something to the DON'T USE THEM EVER crowd. Finalizers and phantom references are invoked under exactly the same condition: when the garbage collector decides that an object is eligible for collection. The difference is what happens afterward. With phantom references, the object gets collected right away, and the reference is put on a queue for later processing by the application. With finalizers, the object is passed to a JVM-managed thread that runs its finalize() method; actual collection happens once the finalizer runs.

Given the choice between a finalizer and a phantom reference, I will pick the finalizer 95% of the time. Phantom references require a lot of work to implement correctly. In the normal case, the only real benefit they provide is to avoid out-of-memory errors from large objects with long-running finalizers.**
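To give a sense of that work, here is a sketch of phantom-reference cleanup. PhantomCleaner, State, and ManagedHandle are hypothetical names, and the sketch assumes the application calls drain() periodically. Compared with a finalizer, there are extra obligations: the state to clean up must live outside the tracked object, the reference objects themselves must be kept strongly reachable, and something has to poll the queue.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical phantom-reference cleaner, sketching the bookkeeping involved.
class PhantomCleaner {
    // Resource state lives outside the owner: a PhantomReference cannot reach
    // its referent (get() always returns null), and if this state pointed back
    // at the owner, the owner would never become unreachable.
    static final class State {
        final AtomicBoolean released = new AtomicBoolean();

        // Returns true only for the call that actually released the resource.
        boolean release() {
            if (!released.compareAndSet(false, true)) {
                return false; // already released
            }
            // ... release the underlying resource here ...
            return true;
        }
    }

    private static final class Ref extends PhantomReference<Object> {
        final State state;

        Ref(Object owner, State state, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.state = state;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // The Ref objects must stay strongly reachable; if they were collected
    // themselves, they would never be enqueued.
    private final Set<Ref> live = ConcurrentHashMap.newKeySet();

    void register(Object owner, State state) {
        live.add(new Ref(owner, state, queue));
    }

    // Nothing happens automatically: the application must drain the queue,
    // e.g. from a dedicated thread or on each new allocation. Returns how many
    // leaked resources were released by this call.
    int drain() {
        int recovered = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Ref ref = (Ref) r;
            live.remove(ref);
            if (ref.state.release()) {
                recovered++;
            }
            // Before Java 9, the GC did not clear phantom references itself.
            ref.clear();
        }
        return recovered;
    }
}

// Example owner: registers itself on construction; close() is the normal path.
class ManagedHandle implements AutoCloseable {
    private final PhantomCleaner.State state = new PhantomCleaner.State();

    ManagedHandle(PhantomCleaner cleaner) {
        cleaner.register(this, state);
    }

    @Override
    public void close() {
        state.release();
    }
}
```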

A better answer is to avoid both. Far too many people think of finalizers as equivalent to C++ destructors, a place to put important cleanup code such as transaction commit/rollback. This is simply wrong. But it's just as wrong to invoke that code from the code that processes a phantom reference.

Rather than dogmatically insist “no finalizers!”, I think the better approach is to adopt some development practices to prevent their invocation. One approach is to identify resources that are allocated but not released, using an analysis tool like FindBugs.

But if you want to recover from mistakes in the field, and can do so quickly and cleanly (so that you don't block the finalization thread), don't be afraid of finalizers.


* This is unlikely: if you're running through resources quickly, the associated objects will almost certainly be collected while still in the young generation. At this point, a pedantic pinhead will say “but the JLS doesn't guarantee that the garbage collector will ever run.” My best response to that: “well, then you've got bigger problems.”

** A bigger benefit, and this is the 5% where I would use them, is to provide leaked-object tracking. For example, in a database connection pool. But I don't consider that the “normal” case.
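A sketch of that kind of tracking, assuming a hypothetical LeakTracker that a pool calls when it hands out and takes back a connection wrapper. Each Token records a Throwable at borrow time, so when a wrapper is collected without being returned, the pool can report where it was borrowed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical leak tracker for a connection pool.
class LeakTracker {
    static final class Token extends PhantomReference<Object> {
        // Created at borrow time; its stack trace shows who leaked the object.
        final Throwable borrowSite = new Throwable("borrowed here");

        Token(Object borrowed, ReferenceQueue<Object> queue) {
            super(borrowed, queue);
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Set<Token> outstanding = ConcurrentHashMap.newKeySet();

    // Called when the pool hands out a wrapper. The wrapper may hold the
    // token; the token does not keep the wrapper reachable.
    Token onBorrow(Object wrapper) {
        Token token = new Token(wrapper, queue);
        outstanding.add(token);
        return token;
    }

    // Called when the client closes the wrapper and it goes back to the pool.
    void onReturn(Token token) {
        outstanding.remove(token);
        token.clear(); // with no referent left, the GC has nothing to enqueue
    }

    // Borrow sites of wrappers that were collected without being returned.
    List<Throwable> pollLeaks() {
        List<Throwable> leaks = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Token token = (Token) r;
            if (outstanding.remove(token)) {
                leaks.add(token.borrowSite);
            }
        }
        return leaks;
    }
}
```

The wrapper would keep its token and pass it to onReturn() from its own close(). The token must not be stored anywhere that keeps the wrapper reachable, or the leak will never be noticed.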
