This is the third time that I'm writing about this topic. The first time was in 2018, the second in 2021. In the interim, AWS has announced a steady stream of improvements, most recently (October) increasing read throughput to 60 MB/sec.
I wasn't planning to revisit this topic. However, I read Tim Bray's post on the Bonnie disk benchmark, and it had the comment “it’d be fun to run Bonnie on a sample of EC2 instance types with files on various EBS and EFS and so on configurations.” And after a few exchanges with him, I learned that the Bonnie++ benchmark measured file creation and deletion in addition to IO speed. So here I am.
EFS for Builds
Here's the test environment (my previous posts provide more information):
- All tests run on an `m5d.xlarge` instance (4 vCPU, 16 GB RAM), running Amazon Linux 2023 (AMI `ami-0453ec754f44f9a4a`).
- I created three users: one using the attached instance store, one using EBS (separate from the root filesystem), and one using EFS. Each user's home directory was on the filesystem in question, so all build-specific IO should be confined to that filesystem type, but they shared the root filesystem for executables and `/tmp`.
- The local and EBS filesystems were formatted as `ext4`.
- The EBS filesystem used a GP3 volume (so a baseline 3000 IOPS).
- The EFS filesystem used Console defaults: general purpose, elastic throughput. I mounted it using the AWS recommended settings.
- As a small project, my AWS appenders library, current (3.2.1) release.
- As a large project, the AWS Java SDK (v1), tag 1.11.394 (the same that I used for previous posts).
- The build command: `mvn clean compile`.
- For each project/user, I did a pre-build to ensure that the local Maven repository was populated with all necessary dependencies.
- Between builds I flushed and cleared the filesystem cache; see previous posts for details.
- I used the `time` command to get timings; all are formatted minutes:seconds, rounded to the nearest second. “Real” time is the elapsed time of the build; if you're waiting for a build to complete, it's the most important number for you. “User” time is CPU time aggregated across threads; it should be independent of disk technology. And “System” time is time spent in the kernel; I consider it a proxy for how complex the IO implementation is (given that the absolute number of requests should be consistent between filesystems).
And here are the results:
| Filesystem | Appenders: Real / User / System | AWS SDK: Real / User / System |
|---|---|---|
| Instance Store | 00:06 / 00:16 / 00:01 | 01:19 / 02:12 / 00:09 |
| EBS | 00:07 / 00:16 / 00:01 | 01:45 / 02:19 / 00:09 |
| EFS | 00:18 / 00:20 / 00:01 | 15:59 / 02:24 / 00:17 |
These numbers are almost identical to the numbers from three years ago. EFS has not improved its performance when it comes to software build tasks.
What does Bonnie say?
As I mentioned above, one of the things that prompted me to revisit the topic was learning about Bonnie, specifically, Bonnie++, which performs file-level tests. I want to be clear that I'm not a disk benchmarking expert. If you are, and I've made a mistake in interpreting these results, please let me know.
I spun up a new EC2 instance to run these tests. Bonnie++ is
distributed as a source tarball;
you have to compile it yourself. Unfortunately, I was getting compiler errors (or maybe warnings) when
building on Amazon Linux. Since I no longer have enough C++ knowledge to debug such things, I switched
to Ubuntu 24.04 (`ami-0e2c8caa4b6378d8c`), which has Bonnie++ as a supported package. I kept
the same instance type (`m5d.xlarge`).
I ran with the following parameters:
- `-c 1`, which uses a single thread. I also ran with `-c 4` and `-c 16`, but the numbers were not significantly different.
- `-s 32768`, to use 32 GB for the IO tests. This is twice the size of the VM's RAM, so the test should measure actual filesystem performance rather than the benefit of the buffer cache.
- `-n 16`, to create/read/delete 16,384 small files in the second phase.
Here are the results, with the command-lines that invoked them:
- Local Instance Store: `time bonnie++ -d /mnt/local/ -c 1 -s 32768 -n 16`

  ```
  Version  2.00a      ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  ip-172-30-1-84  32G  867k  99  128m  13  126m  11 1367k  99  238m  13  4303 121
  Latency              9330us   16707us   38347us    6074us    1302us     935us
  Version  2.00a      ------Sequential Create------ --------Random Create--------
  ip-172-30-1-84      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                   16 +++++ +++ +++++ +++ +++++ +++     0  99 +++++ +++ +++++ +++
  Latency               146us     298us     998us    1857us      18us     811us
  1.98,2.00a,ip-172-30-1-84,1,1733699509,32G,,8192,5,867,99,130642,13,128610,11,1367,99,244132,13,4303,121,16,,,,,+++++,+++,+++++,+++,+++++,+++,4416,99,+++++,+++,+++++,+++,9330us,16707us,38347us,6074us,1302us,935us,146us,298us,998us,1857us,18us,811us

  real    11m10.129s
  user    0m11.579s
  sys     1m24.294s
  ```
- EBS: `time bonnie++ -d /mnt/ebs/ -c 1 -s 32768 -n 16`

  ```
  Version  2.00a      ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  ip-172-30-1-84  32G 1131k  99  125m   8 65.4m   5 1387k  99  138m   7  3111  91
  Latency              7118us   62128us   80278us   12380us   16517us    6303us
  Version  2.00a      ------Sequential Create------ --------Random Create--------
  ip-172-30-1-84      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                   16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
  Latency               218us     303us     743us      69us      15us    1047us
  1.98,2.00a,ip-172-30-1-84,1,1733695252,32G,,8192,5,1131,99,128096,8,66973,5,1387,99,140828,7,3111,91,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,7118us,62128us,80278us,12380us,16517us,6303us,218us,303us,743us,69us,15us,1047us

  real    16m52.893s
  user    0m12.507s
  sys     1m4.045s
  ```
- EFS: `time bonnie++ -d /mnt/efs/ -c 1 -s 32768 -n 16`

  ```
  Version  2.00a      ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  ip-172-30-1-84  32G  928k  98  397m  27 60.6m   6  730k  99 63.9m   4  1578  16
  Latency              8633us   14621us   50626us    1893ms   59327us   34059us
  Version  2.00a      ------Sequential Create------ --------Random Create--------
  ip-172-30-1-84      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                   16     0   0 +++++ +++     0   0     0   0     0   1     0   0
  Latency              22516us      18us     367ms   24473us    6247us    1992ms
  1.98,2.00a,ip-172-30-1-84,1,1733688528,32G,,8192,5,928,98,406639,27,62097,6,730,99,65441,4,1578,16,16,,,,,218,0,+++++,+++,285,0,217,0,944,1,280,0,8633us,14621us,50626us,1893ms,59327us,34059us,22516us,18us,367ms,24473us,6247us,1992ms

  real    23m56.715s
  user    0m11.690s
  sys     1m18.469s
  ```
For the first part, block IO against a single large file, I'm going to focus on the “Rewrite” statistic: the program reads a block from the already-created file, makes a change, and writes it back out. For this test, the local instance store managed 126 MB/sec, EBS 65.4 MB/sec, and EFS 60.6 MB/sec. Nothing surprising there: EFS achieved its recently-announced throughput, and a locally-attached SSD was faster than EBS (although much slower than the 443 MB/sec from my five-year-old laptop, a reminder that EC2 provides fractional access to physical hardware).
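Bonnie++ itself is written in C++, but the shape of that operation is easy to sketch. Here's a rough Java illustration of the read-modify-write pattern, not the benchmark's actual code; the path, offset, and block size are arbitrary:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class RewriteSketch {
    // Illustrative only: read one block at a given offset, tweak it, and write it
    // back in place; this is the pattern that the "Rewrite" test times.
    static void rewriteBlock(String path, long offset, int blockSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            byte[] block = new byte[blockSize];
            file.seek(offset);       // position at the block to rewrite
            file.readFully(block);   // read the existing contents
            block[0] ^= 1;           // make a small change
            file.seek(offset);       // seek back to the same position
            file.write(block);       // write the modified block in place
        }
    }
}
```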
The second section was what I was interested in, and unfortunately, the results don't give much insight. In some documentation I read that “+++++” in the output means the test completed too quickly for the result to be statistically meaningful (I can't find that link now). Perhaps that's because Bonnie++ dates to the days of single mechanical disks, and modern storage systems are simply too fast?
But one number that jumped out at me was “Latency” for file creates: 146us for instance store, 218us for EBS, but a whopping 22516us for EFS. I couldn't find documentation for this value anywhere; reading the code, it appears to measure the longest time taken by a single operation. That could mean that EFS completes the vast majority of requests quickly, with just a few slow outliers, or it could mean that latencies are generally high, and the number reported here is merely the worst. I suspect it's the latter.
I think, however, that the output from the Linux `time` command tells the story:
each of the runs uses 11-12 seconds of “user” time, and a minute plus of “system”
time. But they vary from 11 minutes of “real” time for instance store, up to
nearly 24 minutes for EFS. That says to me that EFS has much poorer performance, and since
the block IO numbers are consistent, it must be accounted for by the file operations (timestamps
on the operation logs would make this a certainty).
Conclusion
So should you avoid EFS for your build systems? Mu.
When I first looked into EFS performance, in 2018, I was driven by my experience setting up a build server. But I haven't done that since then, and can't imagine that too many other people have either. Instead, the development teams that I work with typically use “Build as a Service” tools such as GitHub Actions (or, in some cases, Amazon CodeBuild). Running a self-hosted build server is, in my opinion, a waste of time and money for all but the most esoteric needs.
So where does that leave EFS?
I think that EFS is valuable for sharing files — especially large files — when you want or need
filesystem semantics rather than the web-service semantics of S3. To put this into concrete terms: you can
read a section of an object from S3, but it's much easier, code-wise, to `lseek` or `mmap`
a file (to be fair, I haven't looked at how well Mountpoint for Amazon S3 handles those operations). And if you need the ability to modify
portions of a file, then EFS is the only real choice: to do that with S3 you'd have to rewrite the entire file.
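To illustrate, here's a minimal sketch using the v1 Java SDK (matching the SDK used above); the bucket, key, and file path are placeholders. Reading a byte range from S3 means building a request and draining a stream, while the same read against a file on EFS is a seek and a read:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.util.IOUtils;

import java.io.IOException;
import java.io.RandomAccessFile;

public class RangeReads {
    // Read bytes [offset, offset + length) from an S3 object via a ranged GET.
    static byte[] readRangeFromS3(String bucket, String key, long offset, int length) throws IOException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        GetObjectRequest request = new GetObjectRequest(bucket, key)
                .withRange(offset, offset + length - 1);   // range end is inclusive
        try (S3Object object = s3.getObject(request)) {
            return IOUtils.toByteArray(object.getObjectContent());
        }
    }

    // The same read against a file on EFS (or any mounted filesystem): seek and read.
    static byte[] readRangeFromFile(String path, long offset, int length) throws IOException {
        byte[] buffer = new byte[length];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);
            file.readFully(buffer);
        }
        return buffer;
    }
}
```

The filesystem version also works unchanged for writing a region of the file in place, which is exactly the case where S3 would force a rewrite of the whole object.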
For myself, I haven't found that many use cases where EFS is the clear winner over alternatives. And given that, and the fact that I don't plan to set up another self-hosted build server, this is the last posting that I plan to make on the topic.