Tuesday, February 16, 2021

EFS Build Performance Revisited

A few years ago I wrote a post calling out the poor performance of Amazon's Elastic File System (EFS) when used as the working directory for a software build. Since then, EFS has seen many performance improvements. Is it now viable for purposes such as developer home directories or build machines?

TL;DR: no.

As before, I'm using an m5d.xlarge EC2 instance running AWS Linux 2 as my testbed (for the record, ami-03c5cc3d1425c6d34 — you'll see later why I want to remember this). It provides four virtual CPUs and 16GB of RAM, so hardware should not be an issue. My test builds are the AWS SDK and my logging appenders project (releasing the latter is why I spun up the instance in the first place). The appenders project is larger than it was last time, but is still a reasonable “small” project. For consistency, I'm using the same tag (1.11.394) for the AWS SDK; it's grown dramatically in the interim.

I've configured the build machine with three users, each of which has their home directory in one of the tested storage types (instance store, EBS, EFS). The EBS test uses a 100 GB gp2 volume that is dedicated to the build user. For the EFS test I created two volumes, to compare the different EFS performance modes.

For each build I took the following steps:

  1. Copy the project from a "reference" user. This user has project directories without the .git directory, along with a fully-populated Maven local repository.
  2. Perform a test build. This is intended to ensure that all dependencies have been downloaded, and that there is nothing that would cause the build to fail.
  3. Run mvn clean in the test directory.
  4. Flush the disk cache (sync).
  5. For instance store, run TRIM (fstrim -v /) to avoid the penalty of SSD write amplification.
  6. Clear the in-memory buffer cache (echo 3 > /proc/sys/vm/drop_caches)
  7. Run the timed build (time mvn compile).

And here's the results. As before, I show the output from the time command: the first number is the "real" time (the wall-clock time it took to build). The second is "user" (CPU) time, while the third is "system" (kernel operation) time. All times are minutes:seconds, and are rounded to the nearest second.

  Appenders AWS SDK
  Real User System Real User System
Instance Store 00:05 00:13 00:00 01:14 02:16 00:06
EBS 00:06 00:15 00:00 01:26 02:13 00:06
EFS General Purpose 00:23 00:20 00:01 15:29 02:22 00:15
EFS Max IO 00:55 00:18 00:01 36:24 02:28 00:15

Comparing these timings to my previous run, the first thing that jumped out at me was how wildly different reported “user” time is. In fact, they are so different that my first step was to fire up an EC2 instance using the same AMI as the previous test (thankfully, AWS doesn't delete anything), and confirm the numbers (and yes, they were consistent). Intuitively, it should take the same amount of CPU time to compile a project, regardless of the performance of the disk storage, so I'm not sure why I didn't do more digging when I saw the original numbers. Regardless, “real” time tells the story.

And that story is that EFS still takes significantly longer than other options.

There have been definite performance improvements: the “general purpose” EFS volume takes 15 minutes, versus the 30+ required by the earlier test (the close correspondence of the earlier test and the “MAX IO” volume type make me think that it might be the same implementation).

But if you're speccing a build machine — or anything else that needs to work with large numbers of relatively small files — EFS remains a poor choice.

No comments: