Monday, August 27, 2018

(Poor) EFS Performance for Software Builds

Elastic File System (EFS) is, on the surface, a nice addition to the AWS storage options: based on the long-lived Network File System (NFS), it lets you share a single volume between multiple EC2 instances. It also has the benefit that you only pay for what you actually use: while more expensive than Elastic Block Store (EBS), you don't have to pay for provisioned-but-unused capacity.

The dark side of EFS is performance. Unlike EBS, it's not measured in IOPS, but in megabytes per second of throughput. And it's a “burstable” performance model: while you can get a peak throughput of 100 MiB/second (or higher, for volumes over a terabyte), you can't sustain that rate. Instead, you get a “baseline” rate that's dependent on your volume size, and a pool of credits that is consumed or replenished depending on whether your actual usage is above or below that baseline rate.

That, in itself, is not so bad, but the baseline rate is 50 KiB/second per gigabyte of storage. So for small volumes, you don't get much throughput at all. Since July 2018 you've been able to buy provisioned throughput, at $6 per MiB/second per month. Historically, the way to get higher throughput has been to create a large empty file (e.g., 100 GiB gets you roughly 5 MiB/second, and costs the same as the equivalent provisioned throughput).
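To make that arithmetic concrete, here is a minimal sketch of the baseline calculation. The 50 KiB/second per gigabyte rate and the $6 provisioned price come from the text above; the storage price of $0.30 per GiB per month is my assumption (it is not stated here), chosen because it makes the "same price" comparison work out. The function names are purely illustrative.

```python
# Baseline (non-burst) EFS throughput as described above: 50 KiB/s per GiB stored.
BASELINE_KIB_PER_SEC_PER_GIB = 50

# From the text: provisioned throughput costs $6 per MiB/s per month.
PROVISIONED_USD_PER_MIBPS_MONTH = 6.00

# ASSUMPTION (not from the text): storage costs $0.30 per GiB per month.
STORAGE_USD_PER_GIB_MONTH = 0.30


def baseline_mib_per_sec(stored_gib):
    """Sustained throughput, in MiB/s, earned by storing `stored_gib` of data."""
    return stored_gib * BASELINE_KIB_PER_SEC_PER_GIB / 1024


def padding_cost_usd(stored_gib):
    """Monthly cost of keeping `stored_gib` of filler data on the volume."""
    return stored_gib * STORAGE_USD_PER_GIB_MONTH


def provisioned_cost_usd(mib_per_sec):
    """Monthly cost of buying `mib_per_sec` of provisioned throughput."""
    return mib_per_sec * PROVISIONED_USD_PER_MIBPS_MONTH


for gib in (1, 10, 100, 1024):
    print(f"{gib:5d} GiB stored -> {baseline_mib_per_sec(gib):6.2f} MiB/s baseline")

print(f"100 GiB of padding: ${padding_cost_usd(100):.2f}/month")
print(f"5 MiB/s provisioned: ${provisioned_cost_usd(5):.2f}/month")
```

Strictly, 100 GiB works out to a little under 4.9 MiB/second, which is the "roughly 5" quoted above. A 1 GiB volume, by contrast, earns a baseline of about 0.05 MiB/second.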

The practical outcome of this behavior is that EFS is inappropriate for many services that need high throughput with relatively small filesystem sizes (unless you buy provisioned throughput). As a result, some software vendors recommend against using EFS with their software (for example, Atlassian BitBucket and Sonatype Nexus).

That said, one of the Amazon-recommended use cases for EFS is for user homes, and this has traditionally been one of the primary use cases for shared filesystems: no matter what machine you log into, you get the same home directory. But what happens if you're a developer and run builds on the shared filesystem? Or, going one step further, is EFS an appropriate choice for the build directories of a fleet of build machines?

Several years ago, I compared the performance of hard disks and SSDs for builds. The outcome of those experiments was that there was a negligible (11%) difference, because software builds are generally CPU-bound rather than IO-bound. So does the same logic apply to building on EFS?

To come up with an answer, I spun up an m5d.xlarge EC2 instance. This instance has 4 virtual CPUs and 16 GB of RAM, so it is a decent representative build machine. More importantly, it has 150 GB of NVMe instance storage, so I figured it would be a good comparison for a similarly-sized developer PC. It was running Amazon Linux 2 (ami-04681a1dbd79675a5) along with OpenJDK 1.8 and Maven 3.5.4.

I also ran the compile tests on my personal PC, which features a Core i7-3770k CPU at 3.5 GHz, 32 GB of RAM, and a Samsung 850 Pro SSD; it runs Xubuntu 18.04, Oracle JDK 8, and Maven 3.5.2. It's now five years old, but is still a competent development box, so I figured it would give me a baseline to compare to running in the cloud.

I did builds with two pieces of software: the AWS Java SDK (tag 1.11.394) and my Log4J AWS appenders project. The former is huge: 146 sub-projects and 39,124 Java source files; it represents a deployable application (although, really, is far larger than most). The latter is much smaller; with 64 source files, it's meant to represent a module that might be used by that application.

In both cases I built using mvn clean compile: I didn't want a slow-running test suite to interfere with the results. My experimental process was to flush the disk cache (see my prior post for how to do that), clone the source repository, run a first build to download any dependencies, flush caches again, then run the timed build. I also timed the clone when running on EC2; I didn't for my PC because it would be limited by the speed of my network connection.
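The procedure above could be scripted along the following lines. This is only a sketch under assumptions, not the harness I actually used: it assumes the repository is cloned with git, that caches are flushed using the standard Linux `/proc/sys/vm/drop_caches` mechanism (which needs root), and the function names are made up for illustration.

```python
import os
import subprocess
import time


def flush_disk_cache():
    """Write out dirty pages, then drop the page cache (Linux only, needs root).

    ASSUMPTION: this uses the standard /proc/sys/vm/drop_caches mechanism,
    which may not be exactly what the original experiments used.
    """
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def timed_run(command, cwd=None):
    """Run a command and return (real, user, system) seconds, like `time`."""
    cpu_before = os.times()
    start = time.perf_counter()
    subprocess.run(command, cwd=cwd, check=True)
    real = time.perf_counter() - start
    cpu_after = os.times()
    # CPU time of waited-for child processes, which is where Maven's work shows up.
    user = cpu_after.children_user - cpu_before.children_user
    system = cpu_after.children_system - cpu_before.children_system
    return real, user, system


def run_experiment(repo_url, checkout_dir):
    """Flush, timed clone, warm-up build, flush again, then the timed build."""
    build = ["mvn", "clean", "compile"]
    flush_disk_cache()
    clone_times = timed_run(["git", "clone", repo_url, checkout_dir])
    timed_run(build, cwd=checkout_dir)  # first build downloads dependencies
    flush_disk_cache()
    build_times = timed_run(build, cwd=checkout_dir)
    return clone_times, build_times
```

Calling `run_experiment(url, path)` as the user that owns the volume under test would return two `(real, user, system)` tuples, one for the clone and one for the timed build.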

To avoid any cross-contamination, I created a separate user for each mounted volume, so all parts of the build would be touching that volume and no other. Here are the details:

  • local: 150 GB of NVMe storage, formatted as ext4
  • ebs: an external EBS volume (so that there wouldn't be contention with the root volume), formatted as ext4
  • efs: an EFS filesystem endpoint from the same availability zone as the EC2 instance. It was mounted using the Amazon-recommended set of mount options.
  • efs-remote: an EFS filesystem with an endpoint in a different availability zone, to see if cross-AZ connections introduced any lag. As with the efs volume, this volume was mounted manually, because the EFS Mount Helper will refuse to mount a volume that doesn't have an endpoint in the current AZ.
  • nfs: to see if there were overheads introduced by EFS on top of the NFS protocol, I spun up an m5d.large instance in the same availability zone as my test machine, and exported its NVMe instance store (formatted internally as ext4). The export options were (rw,sync,insecure,no_root_squash,no_subtree_check,fsid=0), and the mount options were identical to those used for EFS.

One important note: the EFS filesystems were created just for this test. That means that they had a full credit balance, and the ability to run at 100 MiB/sec throughput for the entire test. In other words, these tests show the peak performance of an EFS filesystem.

Here are the results. Each column is a task, each row is a filesystem type, and the three numbers in each cell are the results from the Linux time command: real, user, and system. Real time is the total amount of time that the step took; it's what most people care about. User time is CPU time; Maven can make use of multiple cores, so this is usually higher than real time. System time is time spent in the kernel; I think of it as a proxy for the number of system calls. All times are represented as MINUTES:SECONDS, with fractional seconds rounded up (I don't see benefit to half-even rounding here).
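For reference, the formatting rule is simple enough to sketch in a few lines. The function names and the sample inputs are illustrative only; they are not the raw measurements.

```python
import math


def format_mmss(seconds):
    """Format a duration as MINUTES:SECONDS, rounding fractional seconds up."""
    whole_seconds = math.ceil(seconds)
    minutes, secs = divmod(whole_seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def format_cell(real, user, system):
    """Render one table cell from the three values reported by `time`."""
    return " / ".join(format_mmss(t) for t in (real, user, system))


# Hypothetical raw values, chosen only to show the rounding behavior.
print(format_cell(80.2, 139.5, 7.1))
print(format_mmss(0.3))
```

One consequence of rounding up is that any nonzero time shows as at least 00:01, so the 00:01 entries in the results mean "a second or less" rather than exactly one second.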

              AWS Java SDK - clone    AWS Java SDK - build    AWS Appenders - clone   AWS Appenders - build
Desktop PC    (not timed)             01:21 / 02:20 / 00:08   (not timed)             00:03 / 00:09 / 00:01
Local NVMe    01:21 / 02:47 / 00:08   07:44 / 20:17 / 00:08   00:01 / 00:01 / 00:01   00:18 / 00:56 / 00:01
EBS           01:21 / 02:47 / 00:09   08:14 / 21:58 / 00:08   00:01 / 00:01 / 00:01   00:18 / 00:56 / 00:01
EFS Same AZ   18:01 / 02:48 / 00:15   33:05 / 19:18 / 00:17   00:08 / 00:01 / 00:01   00:29 / 01:11 / 00:01
EFS Cross AZ  19:16 / 02:49 / 00:13   35:17 / 18:36 / 00:17   00:08 / 00:01 / 00:01   00:30 / 01:08 / 00:01
NFS           02:17 / 02:48 / 00:12   08:56 / 22:42 / 00:19   00:01 / 00:01 / 00:01   00:18 / 00:55 / 00:01

The first surprise, for me, was just how poorly the EC2 instance performed compared to my desktop. According to Amazon's docs, an M5 instance uses “2.5 GHz Intel Xeon® Platinum 8175 processors”: several generations newer than the Core i7 in my PC, but running at a lower clock rate. If we assume that my CPU is able to use “Turbo Boost” mode at 3.9 GHz, then the EC2 instance should be roughly 2/3 as fast based just on clock rate. That should mean that builds might take twice as long, but definitely not five times as long.
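Here is that comparison as a quick calculation. It uses the real (wall-clock) build times from the results: 01:21 and 00:03 on the desktop, versus 07:44 and 00:18 on the EC2 instance's local NVMe volume. The variable names are mine, and clock rate alone is of course a crude model of CPU performance.

```python
def to_seconds(mmss):
    """Convert a MINUTES:SECONDS string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


EC2_CLOCK_GHZ = 2.5       # from Amazon's description of the M5 family
DESKTOP_TURBO_GHZ = 3.9   # assumes the desktop CPU runs in Turbo Boost mode

# What clock rate alone would predict.
relative_speed = EC2_CLOCK_GHZ / DESKTOP_TURBO_GHZ
expected_slowdown = 1 / relative_speed

# What the real (wall-clock) build times show.
observed_sdk = to_seconds("07:44") / to_seconds("01:21")
observed_appenders = to_seconds("00:18") / to_seconds("00:03")

print(f"EC2 relative speed by clock rate: {relative_speed:.2f}")
print(f"Expected slowdown:                {expected_slowdown:.2f}x")
print(f"Observed slowdown, AWS Java SDK:  {observed_sdk:.2f}x")
print(f"Observed slowdown, appenders:     {observed_appenders:.2f}x")
```

Clock rate predicts a slowdown of roughly 1.6x; the measured slowdown is closer to 6x for both projects.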

I have no idea what accounts for the difference. That same EC2 doc says that “Each vCPU is a hyperthread of an Intel Xeon core,” so the four vCPUs of the EC2 instance are not the same as the four physical cores (8 hyperthreads) of my PC, and perhaps that's the cause. The ratio of CPU time to real time for the SDK build is certainly higher on the EC2 instance (roughly 2.6, versus 1.7 on my PC), and the difference in number of cores could compound with the difference in CPU clock rate. I also considered the larger memory in my PC, which would give it a larger buffer cache, but 16 GB should have been more than enough for the jobs that I ran. Another possibility was a “noisy neighbor” on the same physical hardware as the EC2 instance, but I re-ran these tests after stopping and restarting the instance (so it was deployed on multiple physical machines). This is a topic for more experiments, but in the meantime I now question the wisdom of developing in the cloud.

The (lack of) performance difference between instance store and EBS wasn't surprising: it basically reiterated the results of my former post, in which I found that drive performance had little effect on compile times. While NVMe instance store may be far faster than EBS in absolute performance, you need a workload that can exploit that. And building isn't it.

The real stunner, however, was how badly EFS performed, especially compared to NFS. To be truthful, it wasn't a complete surprise: I ran these tests because I had seen unexpectedly poor performance from a build machine.

When I first saw the performance problems, I thought it was due to the NFS protocol, which is based on a filesystem abstraction rather than the block-store abstraction of a disk-based filesystem. But as the results show, a generic NFS server can perform almost as well as EBS: it takes a little more time to compile, and significantly more time to check out a large repository, but nowhere near as much as EFS.
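To put numbers on "almost as well," this sketch computes each network filesystem's slowdown relative to EBS, using the real (wall-clock) times for the AWS Java SDK from the results. The dictionary layout and function names are just for illustration.

```python
# Real (wall-clock) times for the AWS Java SDK, copied from the results.
REAL_TIMES = {
    "EBS":          {"clone": "01:21", "build": "08:14"},
    "EFS Same AZ":  {"clone": "18:01", "build": "33:05"},
    "EFS Cross AZ": {"clone": "19:16", "build": "35:17"},
    "NFS":          {"clone": "02:17", "build": "08:56"},
}


def to_seconds(mmss):
    """Convert a MINUTES:SECONDS string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_vs(baseline, times=REAL_TIMES):
    """Return {filesystem: {task: ratio}} relative to the baseline filesystem."""
    base = {task: to_seconds(t) for task, t in times[baseline].items()}
    return {
        name: {task: to_seconds(t) / base[task] for task, t in tasks.items()}
        for name, tasks in times.items()
    }


ratios = slowdown_vs("EBS")
for name, tasks in ratios.items():
    print(f"{name:13s} clone {tasks['clone']:5.2f}x   build {tasks['build']:5.2f}x")
```

By this measure the plain NFS server is about 1.1x slower than EBS for the build and about 1.7x slower for the clone, while same-AZ EFS is about 4x slower for the build and over 13x slower for the clone.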

The Amazon docs don't say much about how EFS actually works, other than “EFS file systems store data and metadata across multiple Availability Zones” and that they “allow massively parallel access” … that sounds a lot like S3 to me. I don't know if EFS is indeed a protocol layer on top of S3, but its performance compared to vanilla NFS tells me that there's a lot happening behind the scenes; it's not a traditional filesystem.

The bottom line: while there may be valid use cases for EFS, developer home directories and build-server storage aren't among them.
