blog.kdgregory.com

Tuesday, June 8, 2021

An open letter to the AWS Training organization

You don't have a Feedback link on your site, but it seems that Amazon keeps close tabs on the Blogosphere, so hopefully this reaches you.

I don't know whether you're an actual sub-division of Amazon, but the website URL https://www.aws.training certainly didn't give me a warm fuzzy feeling when it came up in Google. In fact, my first thought was that it was some unaffiliated company that had better SEO.

So, since it was asking me for login credentials, I did what any reasonably cautious technologist would do, and ran whois. And this is what I got back:

Domain Name: aws.training
Registry Domain ID: 8d519b3def254d2f980a08f62416a5b9-DONUTS
Registrar WHOIS Server: whois.comlaude.com
Registrar URL: http://www.comlaude.com
Updated Date: 2019-05-19T19:54:24Z
Creation Date: 2014-03-19T00:32:11Z
Registry Expiry Date: 2024-03-19T00:32:11Z
Registrar: Nom-iq Ltd. dba COM LAUDE
Registrar IANA ID: 470
Registrar Abuse Contact Email: abuse@comlaude.com
Registrar Abuse Contact Phone: +44.2074218250
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: REDACTED FOR PRIVACY
Registrant City: REDACTED FOR PRIVACY
Registrant State/Province: NV
Registrant Postal Code: REDACTED FOR PRIVACY
Registrant Country: US
Registrant Phone: REDACTED FOR PRIVACY
Registrant Phone Ext: REDACTED FOR PRIVACY
Registrant Fax: REDACTED FOR PRIVACY
Registrant Fax Ext: REDACTED FOR PRIVACY
Registrant Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.

That's the sort of whois entry that you get for an individual using a shared hosting service. In fact, it provides less information than you'll see for my domain, which runs on a shared hosting service, even though I pay extra for privacy.

By comparison, the whois entry for Amazon itself looks like this (and note that it's a different registrar, another red flag):

Domain Name: amazon.com
Registry Domain ID: 281209_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2019-08-26T12:19:56-0700
Creation Date: 1994-10-31T21:00:00-0800
Registrar Registration Expiration Date: 2024-10-30T00:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2083895770
Registrant Name: Hostmaster, Amazon Legal Dept.
Registrant Organization: Amazon Technologies, Inc.
Registrant Street: P.O. Box 8102
Registrant City: Reno
Registrant State/Province: NV
Registrant Postal Code: 89507
Registrant Country: US
Registrant Phone: +1.2062664064
Registrant Phone Ext: 
Registrant Fax: +1.2062667010
Registrant Fax Ext: 
Registrant Email: hostmaster@amazon.com

While I'm a little surprised by the Reno address, rather than Seattle, this at least looks like the sort of registration information used by a business rather than somebody who pays $10/month for hosting.

I ended up getting to the training site via a link on the AWS Console, so was able to achieve my goal.

But I think there's a general lesson: don't forsake your brand without good reason.

And at the very least, ask your network administrators to update your whois data.

Friday, February 26, 2021

Java8 Lambda Startup Times

A few months ago I wrote a post about startup times of AWS Lambdas written in Java. This post has a similar title, but a different topic: it looks at the first-run time for lambdas (lowercase) in a Java program, and has nothing to do with AWS. Although I did discover this issue while writing code for AWS. Confused yet?

Lambda expressions were added to the Java language with the release of Java 8 in 2014. By now I'm assuming every Java programmer has used them, if only as arguments to higher-order functions in the java.util.stream package:

List<String> uppercasedNames = names.stream()
                               .map(s -> s.toUpperCase())
                               .collect(Collectors.toList());

You can implement your own higher-order functions, with parameter types from the java.util.function package (or, if you have more complex needs, defining your own functional interfaces). So, for example, you might have a function with this signature:

public static String retryLambda(long interval, long timeout, Supplier<String> lambda) throws Exception

This can be called with any lambda expression that doesn't take arguments and returns a string. For example:

retryLambda(50, 100, () -> Instant.now().toString());

As you might have guessed from the signature, this function retries some operation. But before I dig into the implementation, here's some background about why you'd implement such a function. Most of my recent posts have referenced my AWS logging library, and this one's no different. When working with AWS, you need to be prepared to retry operations: either because AWS throttled the request (returning an error that indicates you should retry after a short delay), or because operations are eventually-consistent (there's a delay between creating something and being able to use it). As a result, AWS code can include a lot of retry loops:*

long timeoutAt = System.currentTimeMillis() + timeout;
while (System.currentTimeMillis() < timeoutAt)
{
    String value = doSomething();
    if (value != null)
        return value;
    Thread.sleep(interval);
}
throw new RuntimeException("timeout expired");

That's seven lines of boilerplate wrapping one line that actually does something. Functional programming is all about getting rid of boilerplate, so I implemented a function that would accept a lambda:**

public static String retryLambda(long interval, long timeout, Supplier<String> lambda) throws Exception
{
    // interval and timeout are both in milliseconds
    long timeoutAt = System.currentTimeMillis() + timeout;
    while (System.currentTimeMillis() < timeoutAt)
    {
        // a null result means "not ready yet": sleep and try again
        String value = lambda.get();
        if (value != null)
            return value;
        Thread.sleep(interval);
    }

    throw new RuntimeException("timeout expired");
}

The hardcoded loops can now be replaced with a call to this function:

retryLambda(50, 250, () -> doSomething());

All well and good, and it reduced the size of the code, but then my tests started failing.

When you're actually talking to AWS, you might need a timeout of 30 seconds or more. But you definitely don't want such a long timeout in a unit test. To solve that problem, I replaced the interval and timeout arguments with much shorter values: 50 and 200 milliseconds. And then my tests would assert the number of times the function was called: based on those values, the operation should be attempted four times before timing out. However, I was seeing that it was only attempted two or three times.

When I dug into the problem, what I discovered is that the first execution of a lambda takes 40 to 50 milliseconds on my Core i7-3770K running Oracle Java 1.8.0_271. I knew there was a lot happening behind the scenes to make lambdas work, but wow, that's nearly infinity!

I also ran on an EC2 m5a.xlarge instance running AWS Linux 2, and saw that it took over 70 milliseconds with OpenJDK 1.8.0_272, but only 18 milliseconds running Corretto 11.0.10.9.1. I have to assume that the performance improvement is similar across Java11 implementations, but haven't tested. If you'd like to try it out yourself, I've created a GitHub Gist with the test program.
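A minimal sketch of this kind of measurement follows. It is not the Gist itself, and the class and method names are hypothetical. It times the creation and first call of one lambda, then does the same for a second, different lambda, using System.nanoTime() and no lambdas in the timing code.

```java
import java.util.function.Supplier;

// Hypothetical demo class; not the test program from the Gist.
class LambdaFirstRunTiming
{
    // Returns elapsed nanoseconds for creating and invoking two different
    // lambdas, in order. The timing code deliberately avoids lambdas itself,
    // so that it does not trigger the first-use cost before measuring.
    static long[] measure()
    {
        long start1 = System.nanoTime();
        Supplier<String> first = () -> "first";
        first.get();
        long elapsed1 = System.nanoTime() - start1;

        long start2 = System.nanoTime();
        Supplier<String> second = () -> "second";
        second.get();
        long elapsed2 = System.nanoTime() - start2;

        return new long[] { elapsed1, elapsed2 };
    }

    public static void main(String[] argv)
    {
        long[] nanos = measure();
        System.out.println("first lambda:  " + (nanos[0] / 1000) + " microseconds");
        System.out.println("second lambda: " + (nanos[1] / 1000) + " microseconds");
    }
}
```

Run it in a fresh JVM: only the first lambda created in a process should pay the start-up cost, so calling measure() a second time in the same process would not show the difference.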

One thing that I do not want you to take from this post is the idea that Java lambdas are bad, or are poorly implemented. I didn't delve too deeply into what happens during that first invocation, but suspect that the JVM is loading something from disk (much like the initial JVM startup time). And in my experiments, invoking additional, different lambdas did not add to the execution time. So, like anything Java, lambdas are best used in a long-running program.

However, if you are in a similar situation, testing timing-dependent code that utilizes lambdas, you need to be prepared. When I ran into the problem, I simply wanted to move on with my life, so I relaxed the assertions (the primary assertion, on elapsed time, didn't change; I only loosened the one on the number of invocations). Now, after thinking about the problem and writing the example program for this post, I think I'd use a @BeforeClass function to “warm up” the lambda mechanism.
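One way such a warm-up might look is sketched below. It assumes, as the experiments above suggest, that the first-use cost is shared by all lambdas, so invoking any throwaway lambda is enough. The class and method names are hypothetical, and the JUnit annotation appears only in a comment so that the example compiles without JUnit on the classpath.

```java
import java.util.function.Supplier;

// Hypothetical helper; names are illustrative only.
class LambdaWarmup
{
    // In a JUnit 4 test class this would be a "public static void" method
    // annotated with @BeforeClass, so that it runs once before any test.
    // It returns the value here only to make the effect easy to check.
    static String warmUp()
    {
        // Creating and invoking any lambda makes the JVM initialize its
        // lambda machinery; the value produced is irrelevant.
        Supplier<String> throwaway = () -> "warm";
        return throwaway.get();
    }

    public static void main(String[] argv)
    {
        warmUp();

        // A different lambda, timed after the warm-up.
        long start = System.nanoTime();
        Supplier<String> timed = () -> "timed";
        timed.get();
        long elapsedMicros = (System.nanoTime() - start) / 1000;

        System.out.println("lambda after warm-up: " + elapsedMicros + " microseconds");
    }
}
```

With the warm-up in place, timing assertions measure the retry logic rather than the one-time initialization.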


* Not all AWS code needs to have retry loops. But, for example, if you create a Kinesis stream you will need to wait until it becomes active before writing to it. I've seen some mock implementations of AWS services that don't accurately reflect these delays, leading to code that fails in the real world.

** Actually, I implemented a class, which was far easier to replace during testing. For an example of its use, look here.

Tuesday, February 16, 2021

EFS Build Performance Revisited

A few years ago I wrote a post calling out the poor performance of Amazon's Elastic File System (EFS) when used as the working directory for a software build. Since then, EFS has seen many performance improvements. Is it now viable for purposes such as developer home directories or build machines?

TL;DR: no.

As before, I'm using an m5d.xlarge EC2 instance running AWS Linux 2 as my testbed (for the record, ami-03c5cc3d1425c6d34 — you'll see later why I want to remember this). It provides four virtual CPUs and 16GB of RAM, so hardware should not be an issue. My test builds are the AWS SDK and my logging appenders project (releasing the latter is why I spun up the instance in the first place). The appenders project is larger than it was last time, but is still a reasonable “small” project. For consistency, I'm using the same tag (1.11.394) for the AWS SDK; it's grown dramatically in the interim.

I've configured the build machine with three users, each of which has their home directory in one of the tested storage types (instance store, EBS, EFS). The EBS test uses a 100 GB gp2 volume that is dedicated to the build user. For the EFS test I created two volumes, to compare the different EFS performance modes.

For each build I took the following steps:

  1. Copy the project from a "reference" user. This user has project directories without the .git directory, along with a fully-populated Maven local repository.
  2. Perform a test build. This is intended to ensure that all dependencies have been downloaded, and that there is nothing that would cause the build to fail.
  3. Run mvn clean in the test directory.
  4. Flush the disk cache (sync).
  5. For instance store, run TRIM (fstrim -v /) to avoid the penalty of SSD write amplification.
  6. Clear the in-memory buffer cache (echo 3 > /proc/sys/vm/drop_caches).
  7. Run the timed build (time mvn compile).

And here are the results. As before, I show the output from the time command: the first number is the "real" time (the wall-clock time it took to build). The second is "user" (CPU) time, while the third is "system" (kernel operation) time. All times are minutes:seconds, and are rounded to the nearest second.

                      Appenders               AWS SDK
                      Real   User   System    Real   User   System
Instance Store        00:05  00:13  00:00     01:14  02:16  00:06
EBS                   00:06  00:15  00:00     01:26  02:13  00:06
EFS General Purpose   00:23  00:20  00:01     15:29  02:22  00:15
EFS Max IO            00:55  00:18  00:01     36:24  02:28  00:15

Comparing these timings to my previous run, the first thing that jumped out at me was how wildly different the reported “user” time is. In fact, the numbers are so different that my first step was to fire up an EC2 instance using the same AMI as the previous test (thankfully, AWS doesn't delete anything), and confirm them (and yes, they were consistent). Intuitively, it should take the same amount of CPU time to compile a project, regardless of the performance of the disk storage, so I'm not sure why I didn't do more digging when I saw the original numbers. Regardless, “real” time tells the story.

And that story is that EFS still takes significantly longer than other options.
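To put a number on "significantly longer", the sketch below converts the "real" times for the AWS SDK build into seconds and divides each by the instance-store time. The times are copied from the results above; the class and method names are hypothetical.

```java
// Hypothetical helper for comparing the "real" times reported above.
class BuildTimeRatios
{
    // Converts a "minutes:seconds" string, as shown in the results, to seconds.
    static int toSeconds(String mmss)
    {
        String[] parts = mmss.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Ratio of one wall-clock time to a baseline wall-clock time.
    static double slowdown(String time, String baseline)
    {
        return (double) toSeconds(time) / toSeconds(baseline);
    }

    public static void main(String[] argv)
    {
        String baseline = "01:14";  // AWS SDK build on instance store
        String[][] rows = {
            { "EBS",                 "01:26" },
            { "EFS General Purpose", "15:29" },
            { "EFS Max IO",          "36:24" }
        };
        for (String[] row : rows)
        {
            System.out.printf("%-20s %5.1fx%n", row[0], slowdown(row[1], baseline));
        }
    }
}
```

For the AWS SDK build this works out to roughly 12.6 times the instance-store time for the general purpose volume and roughly 29.5 times for Max IO, while EBS is only about 1.2 times slower.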

There have been definite performance improvements: the “general purpose” EFS volume takes 15 minutes, versus the 30+ required by the earlier test (the close correspondence of the earlier test and the “Max IO” volume type makes me think that it might be the same implementation).

But if you're speccing a build machine — or anything else that needs to work with large numbers of relatively small files — EFS remains a poor choice.