Monday, June 16, 2014

Finalizers

I recently saw a comment on my reference objects article that said, in effect, “he mentioned finalizers but didn't say DON'T USE THEM EVER, so I stopped reading.” The capitalized words are taken exactly from the original comment; the rest is a paraphrase to tone down the rhetoric. The comment didn't bother me — if you stop reading so easily, you probably won't gain much from my articles anyway — but the attitude did. Because finalizers do indeed have a role to play.

They compensate for sloppy programming.

Good Java programmers know that non-memory resource allocation has to take place in a try / catch / finally construct. Just like good C programmers know that every malloc() must be matched by a free(). Sloppy programmers either don't know or don't care. And good programmers sometimes forget.
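
For the Java case, here's a minimal sketch of the pattern I mean (the class and method names are mine, made up for illustration). The first method is the classic try / finally form; the second is the Java 7 try-with-resources equivalent, which generates the same cleanup code for you:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ProperCleanup
    {
        // classic form: the finally block runs whether or not an exception is thrown
        public static int readFirstByte(String path) throws IOException
        {
            InputStream in = new FileInputStream(path);
            try
            {
                return in.read();
            }
            finally
            {
                in.close();
            }
        }

        // Java 7+ form: the compiler generates the equivalent try / finally
        public static int readFirstByteV7(String path) throws IOException
        {
            try (InputStream in = new FileInputStream(path))
            {
                return in.read();
            }
        }
    }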

The question then becomes: what should the program do?

One alternative is to just explode. And this is not necessarily a bad thing: not closing your resources is a bug, and it's better to expose bugs early in the development process. In this view, forgetting to close your resource is no different than dereferencing a null pointer.

The problem with this view is that null pointer exceptions tend to show up early, the first time through the affected code. By comparison, leaked resources tend to hide until production, because you never generate sufficient load during development. Leaked file handles are a great example: a typical developer environment allows 4,096 open files. It will take a long time to run out of them, especially if you constantly restart the app. A typical server environment might allow two or four times as many, but if you never shut down the app you'll eventually run out. Probably at your time of highest load.
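
To make that concrete, here's a sketch of the kind of leak I'm describing (the class is hypothetical, but the pattern shows up in real code all the time): it passes every unit test, and it will run for a long while in development before anything goes wrong.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class LeakyConfigReader
    {
        // Looks harmless, and works fine on a developer's machine. But the stream
        // is never closed: each call consumes a file descriptor, which isn't
        // released until the stream object happens to be garbage collected. Call
        // it often enough between collections and the JVM reports "Too many open files".
        public static Properties readConfig(String path) throws IOException
        {
            Properties props = new Properties();
            props.load(new FileInputStream(path));
            return props;
        }
    }

As it happens, FileInputStream in the JDKs of this era has exactly the sort of safety-net finalizer that this article is about, which is part of the reason code like this limps along as well as it does.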

And that leads to an alternate solution: protect the programmers from themselves, by checking for open resources at the time the owning object is garbage collected. This isn't perfect: in a server with lots of memory, the garbage collection interval might exceed the time taken to exhaust whatever resource you're leaking.*

Once you accept the need for some alternate way to clean up non-memory resources, the only remaining question is how. Java provides two mechanisms: finalizers and (since 1.2) phantom references.

So which should you use? Before answering that question, I want to point out something to the DON'T USE THEM EVER crowd: finalizers and phantom references are invoked under exactly the same conditions: when the garbage collector decides that an object is eligible for collection. The difference is what happens afterward. With phantom references the object gets collected right away, and the reference is put on a queue for later processing by the application. With finalizers, the object is passed to a JVM-managed thread that runs the finalize() method; actual collection happens once the finalizer runs.
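
If you haven't seen phantom references in action, here's a rough sketch of what that queue-based cleanup looks like (the class names are mine, and I've left out the explicit-close path, error handling, and cleaner-thread shutdown that a real implementation would need):

    import java.io.Closeable;
    import java.io.IOException;
    import java.lang.ref.PhantomReference;
    import java.lang.ref.ReferenceQueue;
    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // hypothetical wrapper around a non-memory resource: when an instance becomes
    // unreachable, the cleaner thread below closes the underlying resource
    public class TrackedResource
    {
        private static final ReferenceQueue<TrackedResource> QUEUE
                = new ReferenceQueue<TrackedResource>();

        // the phantom references themselves must be strongly held, or they'll be
        // collected before the garbage collector can ever enqueue them
        private static final Set<CleanupRef> REFS
                = Collections.newSetFromMap(new ConcurrentHashMap<CleanupRef,Boolean>());

        // a phantom reference can never reach its referent (get() returns null),
        // so it has to carry its own copy of whatever state cleanup needs
        private static class CleanupRef extends PhantomReference<TrackedResource>
        {
            final Closeable underlying;

            CleanupRef(TrackedResource owner, Closeable underlying)
            {
                super(owner, QUEUE);
                this.underlying = underlying;
            }
        }

        // the application has to provide its own thread to drain the queue
        static
        {
            Thread cleaner = new Thread(new Runnable()
            {
                public void run()
                {
                    while (true)
                    {
                        try
                        {
                            CleanupRef ref = (CleanupRef)QUEUE.remove();
                            try { ref.underlying.close(); } catch (IOException ignored) { /* nothing useful to do */ }
                            REFS.remove(ref);
                        }
                        catch (InterruptedException ex)
                        {
                            return;
                        }
                    }
                }
            });
            cleaner.setDaemon(true);
            cleaner.start();
        }

        private final Closeable underlying;

        public TrackedResource(Closeable underlying)
        {
            this.underlying = underlying;
            REFS.add(new CleanupRef(this, underlying));
        }
    }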

Given the choice between a finalizer and a phantom reference, I will pick the finalizer 95% of the time. Phantom references require a lot of work to implement correctly. In the normal case, the only real benefit they provide is to avoid out-of-memory errors from large objects with long-running finalizers.**

A better answer is to avoid both. Far too many people think of finalizers as equivalent to C++ destructors, a place to put important cleanup code such as transaction commit/rollback. This is simply wrong. But it's just as wrong to invoke that code in a phantom reference.

Rather than dogmatically insist “no finalizers!”, I think the better approach is to adopt some development practices to prevent their invocation. One approach is to identify resources that are allocated but not released, using an analysis tool like FindBugs.

But if you want to recover from mistakes in the field, and can do so quickly and cleanly (so that you don't block the finalization thread), don't be afraid of finalizers.
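
In code, that kind of safety-net finalizer looks something like this (again a sketch, with hypothetical names; the important parts are that the normal path is an explicit close(), and that the finalizer does very little work):

    import java.io.Closeable;
    import java.io.IOException;

    public class ManagedResource implements Closeable
    {
        private final Closeable underlying;
        private volatile boolean closed;

        public ManagedResource(Closeable underlying)
        {
            this.underlying = underlying;
        }

        @Override
        public void close() throws IOException
        {
            closed = true;
            underlying.close();
        }

        @Override
        protected void finalize() throws Throwable
        {
            try
            {
                if (!closed)
                {
                    // recover from the leak, but make some noise so the real bug gets fixed
                    System.err.println("ManagedResource was never closed; cleaning up in finalizer");
                    try { underlying.close(); } catch (IOException ignored) { /* nothing useful to do */ }
                }
            }
            finally
            {
                super.finalize();
            }
        }
    }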


* This is unlikely: if you're running through resources quickly, the associated objects will almost certainly be collected while still in the young generation. At this point, a pedantic pinhead will say “but the JLS doesn't guarantee that the garbage collector will ever run.” My best response to that: “well, then you've got bigger problems.”

** A bigger benefit, and this is the 5% where I would use them, is to provide leaked-object tracking. For example, in a database connection pool. But I don't consider that the “normal” case.

Monday, June 2, 2014

Is that SSD Really Helping Your Build Times?

Update: I ran similar tests with full-disk encryption.

As developers, we want the biggest bad-ass machine that we can get, because waiting for the computer is so last century. And part of a bad-ass machine is having a solid-state drive, with sub-millisecond latency. Spinning platters covered in rust are not just last-century, they're reminiscent of the industrial age. But do we really benefit from an SSD?

This post emerged from a conversation with a co-worker: he was surprised that I encrypted my home directory, because of the penalty it caused to disk performance. My response was that I expected most of my files to be living in RAM, unencrypted, in the disk buffer. That led to a discussion about whether an SSD provided any significant benefit, given enough RAM to keep your workspace in the buffer cache. Turns out I was wrong about unencrypted data in the cache, but not about the SSD.

I was confident about the latter because a year ago, when I built my then-seriously-badass home computer (32 GB RAM — because I could), I ran some performance comparisons against my then-seriously-pathetic vintage 2002 machine. The new machine blew away the old, but much of the performance gain seemed to come from CPU-related items: faster clock speed, faster memory, huge L1 and L2 caches, and so on. Once CPU time was deducted, the difference between spinning rust and SSD wasn't that big.

I started to write a post at that time, but went down a rathole of trying to create a C program that could highlight the difference in L1/L2 cache. Then the old machine suffered an “accident,” and that was the end of the experiments.

Now, however, I have a far simpler task: quantify the difference that an SSD makes to a developer's workload. Which can be rephrased as “will buying an SSD speed up my compile times?” This is particularly important to me right now, because I'm on a project where single-module compiles take around a minute, and full builds are over 30.

Here's the experimental protocol:

Hardware:

  • Thinkpad W520: 4-core Intel Core i7-2860QM CPU @ 2.50GHz, 800MHz FSB. A bad-ass laptop when I got it (I just wish it wasn't so heavy).
  • 8 GB RAM, 8 MB L2 cache
  • Intel “320 Series” SSD, 160 GB, formatted as ext4. This is not a terribly fast drive, but with an average access time of 0.2 ms, and an average read rate of 270 MB/sec (as measured by the Gnome disk utility), it blows away anything with a platter.
  • Western Digital WD2500BMVU, 250 GB, 5400 RPM, formatted as ext4, accessed via USB 2.0. This is a spare backup drive; I don't think I own anything slower unless I were to reformat an old Apple SCSI drive and run it over a USB-SCSI connector (and yes, I have both). Average access time: 17.0 ms; average read rate: 35 MB/sec.
  • Xubuntu 12.04, 3.2.0-63-generic #95-Ubuntu SMP.

Workload:

  • Spring Framework 3.2.0.RELEASE. A large Java project with lots of dependencies, this should be the most disk-intensive of the three sample workloads. The build script is Gradle, which downloads and caches all dependencies.*
  • Scala 2.11.1. I'm currently working on a Scala project, and the Scala compiler itself seemed like a good sample. The main difference between Scala and Java (from a workload perspective) is that the Scala compiler does a lot more CPU-intensive work; in the office I can tell who's compiling because their CPU fan sounds like a jet engine spooling up. The build script is Ant, using Ivy to download and cache dependencies.**
  • GNU C Compiler 4.8.3. Added because not everyone uses the JVM. I didn't look closely at the makefile, but I'll assume that it has optimization turned up. Disk operations should be confined to reading source files, and repeated reads of header files.

Test conditions:

General configuration:

  • Each test is conducted as a distinct user, with its own home directory, to ensure that there aren't unexpected cross-filesystem accesses.
  • Each build is run once to configure (gcc) and/or download dependencies.
  • Timed builds are run from a normal desktop environment, but without any other user programs (e.g., a browser) active.
  • Timed builds run with network (wifi and wired) disconnected.
  • The Spring and Scala times are an average of three runs. The gcc time is from a single run (I didn't have the patience to do repeated multi-hour builds, just to improve accuracy by a few seconds).

Per-test sequence:

  • Clean build directory (depends on build tool).
  • Sync any dirty blocks to disk (sync).
  • SSD TRIM (fstrim -v /)
  • Clear buffer cache (echo 3 > /proc/sys/vm/drop_caches)
  • Execute build, using time.

And now, the results. Each entry in the table contains the output from the Unix time command, formatted real / user / sys. I've converted all times to seconds, and rounded to the nearest second. The only number that really matters is the first, “real”: it's the time that you have to wait until the build is done. “User” is user-mode CPU time; it's primarily of interest as a measure of how parallel your build is (note that the JVM-based builds are parallel, the gcc build isn't). “Sys” is kernel-mode CPU time; it's included mostly for completeness, but notice the difference between encrypted and non-encrypted builds.

                          Spring Framework      Scala                 GCC
  Unencrypted SSD         273 / 527 / 10        471 / 1039 / 13       6355 / 5608 / 311
  Encrypted SSD           303 / 534 / 38        491 / 1039 / 29       6558 / 5682 / 400
  USB Hard Drive          304 / 525 / 11        477 / 1035 / 14       6462 / 5612 / 311
  Encryption Penalty      11 %                  4 %                   3 %
  Spinning Rust Penalty   11 %                  1 %                   2 %

Do the numbers surprise you? I have to admit, they surprised me: I didn't realize that the penalty for encryption was quite so high. I haven't investigated, but it appears that ecryptfs, as a stacked filesystem layered over ext4, does not maintain decrypted block buffers. Instead, the buffered data is encrypted and has to be decrypted on access. This explains the significantly higher sys numbers. Of course, losing my laptop with unencrypted client data has its own penalty, so I'm willing to pay the encryption tax.

As for the difference between the SSD and hard drive: if you look at your drive indicator light while compiling, you'll see that it really doesn't flash much. Most of a compiler's work is manipulating data in-memory, not reading and writing. So the benefit that you'll get from those sub-millisecond access times is just noise.

On the other hand, if you're doing data analysis with large datasets, I expect the numbers would look very different. I would have killed for an SSD 3-4 years ago, when I was working with files that were tens of gigabytes in length (and using a 32 GB Java heap to process them).

Finally, to borrow an adage from drag racers: there's no substitute for RAM. With 8 GB, my machine can spare a lot of memory for the buffer cache: free indicated 750 MB after the Scala build, and several gigabytes after the gcc build. Each block in the cache is a block that doesn't have to be read from the disk, and developers tend to hit the same blocks over and over again: source code, the compiler executable, and libraries. If you have enough RAM, you could conceivably load your entire development environment with the first build Monday morning, and not have to reload it all week.

At least, that's what I told myself to justify 32 GB in my home computer.


* I picked this particular version tag because it mostly builds: it fails while building spring-context, due to a missing dependency. However, it spends enough time up to that point that I consider it a reasonable example of a “large Java app.” I also tried building the latest 3.X tag, but it fails right away due to a too-long classname. That may be due to the version of Groovy that I have installed, but this experience has shaken my faith in the Spring framework as a whole.

** Scala has issues with long classnames as well, which means that the build will crash if you run it as-is on an encrypted filesystem (because encryption makes filenames longer). Fortunately, there's an option to tell the compiler to use shorter names: ant -Dscalac.args='-Xmax-classfile-name 140'

What if you don't have a lot of memory to spare? I also tried running the Spring build on my anemic-when-I-bought-it netbook: an Intel Atom N450 @ 1.66 GHz with 1 GB of RAM and a 512 KB L2 cache. The stock hard drive is a Fujitsu MJA2250BH: 250 GB, 5400 RPM, an average read rate of 72 MB/sec, and an average access time of 18 ms. I also have a Samsung 840 SSD that I bought when I realized just how anemic this machine was, thinking that, if nothing else, it would act as a fast swap device. However, it doesn't help much: with the stock hard drive, Spring builds in 44 minutes; with the SSD, 39. An 11% improvement, but damn! If you look at the specs, that netbook is a more powerful machine than a Cray-1. But it's completely unusable as a modern development platform.