Monday, June 2, 2014

Is that SSD Really Helping Your Build Times?

Update: I ran similar tests with full-disk encryption.

As developers, we want the biggest bad-ass machine that we can get, because waiting for the computer is so last century. And part of a bad-ass machine is having a solid-state drive, with sub-millisecond latency. Spinning platters covered in rust aren't just last-century; they're reminiscent of the industrial age. But do we really benefit from an SSD?

This post emerged from a conversation with a co-worker: he was surprised that I encrypted my home directory, because of the penalty it caused to disk performance. My response was that I expected most of my files to be living in RAM, unencrypted, in the disk buffer. That led to a discussion about whether an SSD provided any significant benefit, given enough RAM to keep your workspace in the buffer cache. Turns out I was wrong about unencrypted data in the cache, but not about the SSD.

I was confident about the latter because a year ago, when I built my then-seriously-badass home computer (32 GB of RAM — because I could), I ran some performance comparisons against my then-seriously-pathetic vintage 2002 machine. The new machine blew away the old, but much of the performance gain seemed to come from CPU-related items: faster clock speed, faster memory, huge L1 and L2 caches, and so on. Once CPU time was deducted, the difference between spinning rust and SSD wasn't that big.

I started to write a post at that time, but went down a rathole of trying to create a C program that could highlight the difference in L1/L2 cache. Then the old machine suffered an “accident,” and that was the end of the experiments.

Now, however, I have a far simpler task: quantify the difference that an SSD makes to a developer's workload. Which can be rephrased as “will buying an SSD speed up my compile times?” This is particularly important to me right now, because I'm on a project where single-module compiles take around a minute, and full builds take over 30 minutes.

Here's the experimental protocol:

Hardware:

  • Thinkpad W520: 4-core Intel Core i7-2860QM CPU @ 2.50GHz, 800MHz FSB. A bad-ass laptop when I got it (I just wish it wasn't so heavy).
  • 8 GB RAM, 8 MB L3 cache
  • Intel “320 Series” SSD, 160 GB, formatted as ext4. This is not a terribly fast drive, but with an average access time of 0.2 ms, and an average read rate of 270 MB/sec (as measured by the Gnome disk utility; there's a command-line equivalent sketched after this list), it blows away anything with a platter.
  • Western Digital WD2500BMVU, 250 GB, 5400 RPM, formatted as ext4, accessed via USB 2.0. This is a spare backup drive; I don't think I own anything slower unless I were to reformat an old Apple SCSI drive and run it over a USB-SCSI connector (and yes, I have both). Average access time: 17.0 ms; average read rate: 35 MB/sec.
  • Xubuntu 12.04, kernel 3.2.0-63-generic #95-Ubuntu SMP.
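
The access-time and read-rate numbers above came from the Gnome disk utility's benchmark. If you want a rough command-line equivalent, something like the following should land in the same ballpark (hdparm and dd are standard on Ubuntu; /dev/sda is just an example device name, and the direct-I/O flags keep the buffer cache from skewing the measurement):

    # sequential read throughput, bypassing the buffer cache
    sudo hdparm -t --direct /dev/sda

    # roughly the same measurement with dd: read 1 GB straight off the device
    sudo dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct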

Workload:

  • Spring Framework 3.2.0.RELEASE. A large Java project with lots of dependencies, this should be the most disk-intensive of the three sample workloads. The build script is Gradle, which downloads and caches all dependencies.*
  • Scala 2.11.1. I'm currently working on a Scala project, and the Scala compiler itself seemed like a good sample. The main difference between Scala and Java (from a workload perspective) is that the Scala compiler does a lot more CPU-intensive work; in the office I can tell who's compiling because their CPU fan sounds like a jet engine spooling up. The build script is Ant, using Ivy to download and cache dependencies.**
  • GNU Compiler Collection (GCC) 4.8.3. Added because not everyone uses the JVM. I didn't look closely at the makefile, but I'll assume that it has optimization turned up. Disk operations should be confined to reading source files, and repeated reads of header files.

Test conditions:

General configuration:

  • Each test is conducted as a distinct user, with its own home directory, to ensure that there aren't unexpected cross-filesystem accesses (a quick sketch of this setup follows the list).
  • Each build is run once to configure (gcc) and/or download dependencies.
  • Timed builds are run from a normal desktop environment, but without any other user programs (e.g., a browser) active.
  • Timed builds run with network (wifi and wired) disconnected.
  • The Spring and Scala times are an average of three runs. The gcc time is from a single run (I didn't have the patience to do repeated multi-hour builds, just to improve accuracy by a few seconds).
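
As a quick sketch of that per-test user setup (the user names are just examples; on Ubuntu, adduser creates the home directory for you):

    # one isolated user per test, so nothing leaks across home directories
    sudo adduser spring-test
    sudo adduser scala-test
    sudo adduser gcc-test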

Per-test sequence (a scripted version follows the list):

  • Clean build directory (depends on build tool).
  • Sync any dirty blocks to disk (sync).
  • SSD TRIM (fstrim -v /)
  • Clear buffer cache (echo 3 > /proc/sys/vm/drop_caches)
  • Execute build, using time.
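
Here's that sequence as a minimal shell sketch. The user, directory, and clean/build commands are placeholders (each workload had its own), but the sync / trim / drop-caches steps are exactly the commands listed above; the script needs to run as root for fstrim and drop_caches:

    #!/bin/bash
    # One timed, cold-cache build run; run this as root.
    BUILD_USER=spring-test                          # placeholder: the per-test user
    BUILD_DIR=/home/spring-test/spring-framework    # placeholder: the workload's source tree
    CLEAN_CMD="./gradlew clean"                     # placeholder: the workload's clean step
    BUILD_CMD="./gradlew build"                     # placeholder: the workload's build step

    su - "$BUILD_USER" -c "cd $BUILD_DIR && $CLEAN_CMD"        # clean the build directory
    sync                                                       # flush any dirty blocks to disk
    fstrim -v /                                                # TRIM the SSD
    echo 3 > /proc/sys/vm/drop_caches                          # clear the buffer cache
    su - "$BUILD_USER" -c "cd $BUILD_DIR && time $BUILD_CMD"   # the timed build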

And now, the results. Each entry in the table contains the output from the Unix time command, formatted real / user / sys. I've converted all times to seconds, and rounded to the nearest second. The only number that really matters is the first, “real”: it's the time that you have to wait until the build is done. “User” is user-mode CPU time; it's primarily of interest as a measure of how parallel your build is (note that the JVM-based builds are parallel, the gcc build isn't). “Sys” is kernel-mode CPU time; it's included mostly for completeness, but notice the difference between encrypted and non-encrypted builds.

                         Spring Framework     Scala                GCC
  Unencrypted SSD        273 / 527 / 10       471 / 1039 / 13      6355 / 5608 / 311
  Encrypted SSD          303 / 534 / 38       491 / 1039 / 29      6558 / 5682 / 400
  USB Hard Drive         304 / 525 / 11       477 / 1035 / 14      6462 / 5612 / 311
  Encryption Penalty     11 %                 4 %                  3 %
  Spinning Rust Penalty  11 %                 1 %                  2 %
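
(The penalty rows compare wall-clock times: for example, the Spring build took 303 seconds of real time on the encrypted SSD versus 273 seconds unencrypted, and 303 / 273 ≈ 1.11, an 11% penalty.)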

Do the numbers surprise you? I have to admit, they surprised me: I didn't realize that the penalty for encryption was quite so high. I haven't investigated, but it appears that ecryptfs (the stacked filesystem that Ubuntu uses for home-directory encryption) does not keep decrypted data in the buffer cache. Instead, the buffered data is stored encrypted and has to be decrypted on every access, which would explain the significantly higher sys numbers. Of course, losing my laptop with unencrypted client data has its own penalty, so I'm willing to pay the encryption tax.

As for the difference between the SSD and hard drive: if you look at your drive indicator light while compiling, you'll see that it really doesn't flash much. Most of a compiler's work is manipulating data in-memory, not reading and writing. So the benefit that you'll get from those sub-millisecond access times is just noise.

On the other hand, if you're doing data analysis with large datasets, I expect the numbers would look very different. I would have killed for an SSD 3-4 years ago, when I was working with files that were tens of gigabytes in length (and using a 32 GB Java heap to process them).

Finally, to borrow an adage from drag racers: there's no substitute for RAM. With 8 GB, my machine can spare a lot of memory for the buffer cache: free indicated 750 MB after the Scala build, and several gigabytes after the gcc build. Each block in the cache is a block that doesn't have to be read from the disk, and developers tend to hit the same blocks over and over again: source code, the compiler executable, and libraries. If you have enough RAM, you could conceivably load your entire development environment with the first build Monday morning, and not have to reload it all week.
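
If you want to watch this on your own machine, free(1) shows how much of your RAM the kernel is currently using for cache; a minimal sketch (the build command is just a placeholder):

    free -m            # note the "cached" column: that's the buffer cache, in megabytes
    ./gradlew build    # placeholder: run whatever your build is
    free -m            # "cached" now includes your sources, the compiler, and libraries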

At least, that's what I told myself to justify 32 GB in my home computer.


* I picked this particular version tag because it mostly builds: it fails while building spring-context, due to a missing dependency. However, it spends enough time up to that point that I consider it a reasonable example of a “large Java app.” I also tried building the latest 3.X tag, but it fails right away due to a too-long classname. That may be due to the version of Groovy that I have installed, but this experience has shaken my faith in the Spring framework as a whole.

** Scala has issues with long classnames as well, which means that the build will crash if you run it as-is on an encrypted filesystem (because encryption makes filenames longer). Fortunately, there's an option to tell the compiler to use shorter names: ant -Dscalac.args='-Xmax-classfile-name 140'

What if you don't have a lot of memory to spare? I also tried running the Spring build on my anemic-when-I-bought-it netbook: an Intel Atom N450 @ 1.66 GHz with 1 GB of RAM and a 512 KB L2 cache. The stock hard drive is a Fujitsu MJA2250BH: 250 GB, 5400 RPM, an average read rate of 72 MB/sec, and an average access time of 18 ms. I also have a Samsung 840 SSD that I bought when I realized just how anemic this machine was, thinking that, if nothing else, it would act as a fast swap device. However, it doesn't help much: with the stock hard drive, Spring builds in 44 minutes; with the SSD, 39. An 11% improvement, but damn! If you look at the specs, that netbook is a more powerful machine than a Cray-1. But it's completely unusable as a modern development platform.
